NihAV — Glue and Hacks

I don’t like writing code that does nothing; it’s the excitement of my code doing at least something that keeps me writing. So instead of designing a lot of new interfaces that can describe all theoretically feasible stuff, plus utility code to handle the things passed through those interfaces, I’ve just added some barely working stuff, written a somewhat working demuxer and made a decoder.

And here it is:

use io::bitreader::*;
use io::codebook::*;
use formats;
use super::*;

static INDEO2_DELTA_TABLE: [[u8; 256]; 4] = [
    [
      0x80, 0x80, [...the rest is skipped for clarity...]
    ]
];

static INDEO2_CODE_CODES: &[u16] = &[
    0x0000, 0x0004, [...the rest is skipped for clarity...]
];

static INDEO2_CODE_LENGTHS: &[u8] = &[
     3,  3,  [...the rest is skipped for clarity...]
];

struct IR2CodeReader { }

impl CodebookDescReader<u8> for IR2CodeReader {
    fn bits(&mut self, idx: usize) -> u8  { INDEO2_CODE_LENGTHS[idx] }
    fn code(&mut self, idx: usize) -> u32 { INDEO2_CODE_CODES[idx] as u32 }
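    // the symbol mapping skips over 0x80 (and 0): indices below 0x7F
    // become 1..=0x7F, the rest map to 0x81 and above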
    fn sym (&mut self, idx: usize) -> u8 {
        if idx < 0x7F { (idx + 1) as u8 } else { (idx + 2) as u8 }
    }
    fn len(&mut self) -> usize { INDEO2_CODE_LENGTHS.len() }
}

struct Indeo2Decoder {
    info:    Rc<NACodecInfo>,
    cb:      Codebook<u8>,
    lastfrm: Option<Rc<NAFrame>>,
}

impl Indeo2Decoder {
    fn new() -> Self {
        let dummy_info = Rc::new(DUMMY_CODEC_INFO);
        let mut coderead = IR2CodeReader{};
        let cb = Codebook::new(&mut coderead, CodebookMode::LSB).unwrap();
        Indeo2Decoder { info: dummy_info, cb: cb, lastfrm: None }
    }

    fn decode_plane_intra(&self, br: &mut BitReader,
                          frm: &mut NAFrame, planeno: usize,
                          tableno: usize) -> DecoderResult<()> {
        let offs = frm.get_offset(planeno);
        let (w, h) = frm.get_dimensions(planeno);
        let stride = frm.get_stride(planeno);
        let cb = &self.cb;

        let mut buffer = frm.get_buffer_mut().unwrap();
        let mut data = buffer.get_data_mut().unwrap();
        let mut framebuf: &mut [u8] = data.as_mut_slice();

        let table = &INDEO2_DELTA_TABLE[tableno];

        let mut base = offs;
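        // the first line has no predecessor: codes >= 0x80 encode runs
        // of grey (0x80) pixels, smaller codes select raw pixel pairs
        // from the delta table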
        let mut x: usize = 0;
        while x < w {
            let idx = br.read_cb(cb)? as usize;
            if idx >= 0x80 {
                let run = (idx - 0x80) * 2;
                if x + run > w { return Err(DecoderError::InvalidData); }
                for i in 0..run {
                    framebuf[base + x + i] = 0x80;
                }
                x += run;
            } else {
                framebuf[base + x + 0] = table[(idx * 2 + 0) as usize];
                framebuf[base + x + 1] = table[(idx * 2 + 1) as usize];
                x += 2;
            }
        }
        base += stride;
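        // the following lines are predicted from the line above: runs
        // copy the pixels directly, other codes add clamped deltas to
        // the previous line's pixels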
        for _ in 1..h {
            let mut x: usize = 0;
            while x < w {
                let idx = br.read_cb(cb)? as usize;
                if idx >= 0x80 {
                    let run = (idx - 0x80) * 2;
                    if x + run > w { return Err(DecoderError::InvalidData); }
                    for i in 0..run {
                        framebuf[base + x + i] = framebuf[base + x + i - stride];
                    }
                    x += run;
                } else {
                    let delta0 = (table[idx * 2 + 0] as i16) - 0x80;
                    let delta1 = (table[idx * 2 + 1] as i16) - 0x80;
                    let mut pix0 = framebuf[base + x + 0 - stride] as i16;
                    let mut pix1 = framebuf[base + x + 1 - stride] as i16;
                    pix0 += delta0;
                    pix1 += delta1;
                    if pix0 < 0 { pix0 = 0; }
                    if pix1 < 0 { pix1 = 0; }
                    if pix0 > 255 { pix0 = 255; }
                    if pix1 > 255 { pix1 = 255; }
                    framebuf[base + x + 0] = pix0 as u8;
                    framebuf[base + x + 1] = pix1 as u8;
                    x += 2;
                }
            }
            base += stride;
        }
        Ok(())
    }

    fn decode_plane_inter(&self, br: &mut BitReader,
                          frm: &mut NAFrame, planeno: usize,
                          tableno: usize) -> DecoderResult<()> {
        let offs = frm.get_offset(planeno);
        let (w, h) = frm.get_dimensions(planeno);
        let stride = frm.get_stride(planeno);
        let cb = &self.cb;

        let mut buffer = frm.get_buffer_mut().unwrap();
        let mut data = buffer.get_data_mut().unwrap();
        let mut framebuf: &mut [u8] = data.as_mut_slice();

        let table = &INDEO2_DELTA_TABLE[tableno];

        let mut base = offs;
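        // inter frames update the copy of the previous frame in place:
        // runs leave the old pixels untouched, other codes add deltas
        // scaled by 3/4 with clamping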
        for _ in 0..h {
            let mut x: usize = 0;
            while x < w {
                let idx = br.read_cb(cb)? as usize;
                if idx >= 0x80 {
                    let run = (idx - 0x80) * 2;
                    if x + run > w { return Err(DecoderError::InvalidData); }
                    x += run;
                } else {
                    let delta0 = (table[idx * 2 + 0] as i16) - 0x80;
                    let delta1 = (table[idx * 2 + 1] as i16) - 0x80;
                    let mut pix0 = framebuf[base + x + 0] as i16;
                    let mut pix1 = framebuf[base + x + 1] as i16;
                    pix0 += (delta0 * 3) >> 2;
                    pix1 += (delta1 * 3) >> 2;
                    if pix0 < 0 { pix0 = 0; }
                    if pix1 < 0 { pix1 = 0; }
                    if pix0 > 255 { pix0 = 255; }
                    if pix1 > 255 { pix1 = 255; }
                    framebuf[base + x + 0] = pix0 as u8;
                    framebuf[base + x + 1] = pix1 as u8;
                    x += 2;
                }
            }
            base += stride;
        }
        Ok(())
    }
}

const IR2_START: usize = 48;

impl NADecoder for Indeo2Decoder {
    #[allow(unused_variables)]
    fn init(&mut self, info: Rc<NACodecInfo>) -> DecoderResult<()> {
        if let NACodecTypeInfo::Video(vinfo) = info.get_properties() {
            let w = vinfo.get_width();
            let h = vinfo.get_height();
            let f = vinfo.is_flipped();
            let fmt = formats::YUV410_FORMAT;
            let myinfo = NACodecTypeInfo::Video(NAVideoInfo::new(w, h, f, fmt));
            self.info = Rc::new(NACodecInfo::new_ref(info.get_name(), myinfo, info.get_extradata()));
            Ok(())
        } else {
            Err(DecoderError::InvalidData)
        }
    }
    fn decode(&mut self, pkt: &NAPacket) -> DecoderResult<Rc<NAFrame>> {
        let src = pkt.get_buffer();
        if src.len() <= IR2_START { return Err(DecoderError::ShortData); }
        let interframe = src[18];
        let tabs = src[34];
        let mut br = BitReader::new(&src[IR2_START..], src.len() - IR2_START, BitReaderMode::LE);
        let luma_tab = tabs & 3;
        let chroma_tab = (tabs >> 2) & 3;
        if interframe != 0 {
            let mut frm = NAFrame::new_from_pkt(pkt, self.info.clone());
            for plane in 0..3 {
                let tabidx = (if plane == 0 { luma_tab } else { chroma_tab }) as usize;
                self.decode_plane_intra(&mut br, &mut frm, plane, tabidx)?;
            }
            let rcf = Rc::new(frm);
            self.lastfrm = Some(rcf.clone());
            Ok(rcf)
        } else {
            let lf = self.lastfrm.clone();
            if let None = lf { return Err(DecoderError::MissingReference); }
            let lastfr = lf.unwrap();
            let mut frm = NAFrame::from_copy(lastfr.as_ref());
            frm.fill_timestamps(pkt);
            for plane in 0..3 {
                let tabidx = (if plane == 0 { luma_tab } else { chroma_tab }) as usize;
                self.decode_plane_inter(&mut br, &mut frm, plane, tabidx)?;
            }
            let rcf = Rc::new(frm);
            self.lastfrm = Some(rcf.clone());
            Ok(rcf)
        }
    }
}

pub fn get_decoder() -> Box<NADecoder> {
    Box::new(Indeo2Decoder::new())
}

#[cfg(test)]
mod test {
    use codecs;
    use demuxers::*;
    use frame::NAFrame;
    use io::byteio::*;
    use std::fs::File;
    use std::io::prelude::*;

    #[test]
    fn test_indeo2() {
        let avi_dmx = demuxers::find_demuxer("avi").unwrap();
        let mut file = File::open("assets/laser05.avi").unwrap();
        let mut fr = FileReader::new_read(&mut file);
        let mut br = ByteReader::new(&mut fr);
        let mut dmx = avi_dmx.new_demuxer(&mut br);
        dmx.open().unwrap();
        let mut dec = (codecs::find_decoder("indeo2").unwrap())();

        let mut str: u32 = 42;
        for i in 0..dmx.get_num_streams() {
            let s = dmx.get_stream(i).unwrap();
            let info = s.get_info();
            if info.is_video() && info.get_name() == "indeo2" {
                str = s.get_id();
                dec.init(s.get_info()).unwrap();
                break;
            }
        }

        loop {
            let pktres = dmx.get_frame();
            if let Err(e) = pktres {
                if (e as i32) == (DemuxerError::EOF as i32) { break; }
                panic!("error");
            }
            let pkt = pktres.unwrap();
            if pkt.get_stream().get_id() == str {
                let frm = dec.decode(&pkt).unwrap();
                write_pgmyuv(pkt.get_pts().unwrap(), &frm);
            }
        }
    }

    fn write_pgmyuv(num: u64, frm: &NAFrame) {
        [...the rest is skipped for clarity...]
    }
}

(In case you wonder what all those .unwrap() calls are for: Rust doesn’t have NULL pointers and uses other means like enum Option, which is either None or Some(x), so in order to access the contents you have to unwrap it. The same goes for results, where you either get the requested output or some error.)
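
A tiny self-contained illustration of both (plain Rust, nothing NihAV-specific):

fn parse_width(s: &str) -> Result<u32, std::num::ParseIntError> {
    s.parse::<u32>()
}

fn main() {
    let maybe_w: Option<u32> = Some(320);
    // unwrap() returns the contents or panics on None;
    // that's acceptable in prototype code like the decoder above
    println!("{}", maybe_w.unwrap());
    // the same for Result: Ok(x) unwraps to x, Err(_) panics
    println!("{}", parse_width("240").unwrap());
}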

Anyway, if you look at the end of the code (at the test function) you can see how it should work in principle:

  1. you request a demuxer by name (in the future it will be possible to get a demuxer by MIME type or file extension plus probing);
  2. you create a new demuxer instance for a certain ByteReader input (in the future it should be easy to add chained demuxers);
  3. you try opening the input (that’s when the demuxer reads the header);
  4. you scan the streams declared by the demuxer and decide how to handle them;
  5. you request decoder(s) for the provided stream(s) in the same fashion as demuxers;
  6. you loop until the demuxer returns an error or runs out of data, feeding packets from the proper stream to the decoder and doing whatever you like with the output.

Each decoder submodule exports just a get_decoder() function, which is used in the main module to create decoder instances on request (an example of its usage is in the test code above):

pub struct DecoderInfo {
    name: &'static str,
    get_decoder: fn () -> Box<NADecoder>,
}

const DECODERS: &[DecoderInfo] = &[
#[cfg(feature="decoder_indeo2")]
    DecoderInfo { name: "indeo2", get_decoder: indeo2::get_decoder },
];

pub fn find_decoder(name: &str) -> Option<fn () -> Box<NADecoder>> {
    for &dec in DECODERS {
        if dec.name == name {
            return Some(dec.get_decoder);
        }
    }
    None
}

The data structures are quite nested: NAFrame and NAPacket hold a reference to NACodecInfo, which contains the codec name, possible extradata and the codec type information. That codec type information carries type-specific data tied to the type itself:

pub enum NACodecTypeInfo {
    None,
    Audio(NAAudioInfo),
    Video(NAVideoInfo),
}

Here NAVideoInfo includes (currently) the frame dimensions and NAPixelFormaton (salvaged from NAScale, which I described a long time ago). NAAudioInfo has the number of channels, sample rate, block size and NASoniton, which follows the same model but for audio samples.
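
A rough sketch of what these two could look like (field names here are assumed from the description above, not copied from the actual source):

pub struct NAVideoInfo {
    width:   usize,
    height:  usize,
    flipped: bool,
    format:  NAPixelFormaton,
}

pub struct NAAudioInfo {
    sample_rate: u32,
    channels:    u8,
    block_len:   usize,
    format:      NASoniton,
}

NASoniton itself looks like this: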

pub struct NASoniton {
    bits:       u8,
    be:         bool,
    packed:     bool,
    planar:     bool,
    float:      bool,
    signed:     bool,
}

Here you have the sample size (in bits) and its type (signed/unsigned integer or float, in BE or LE format) plus two confusing flags: packed signals whether individual samples are stored in packed form, which matters only when you have e.g. 20-bit samples that can be stored either one per 24 bits or as two samples crammed into 5 bytes; planar signals whether channel data is stored in separate buffers or interleaved in a single buffer. Since I don’t care much about audio at this stage, the finer details of obtaining proper buffers and managing proper channel maps are left for later.
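
To make the flags more concrete, here is how a couple of familiar formats could be described (a sketch only; these constant names are made up for the example, not taken from the actual code):

// plain unsigned 8-bit samples, interleaved in one buffer
const SND_U8_FORMAT: NASoniton = NASoniton {
    bits: 8, be: false, packed: false, planar: false, float: false, signed: false
};
// 32-bit little-endian float with each channel in its own buffer
const SND_F32P_FORMAT: NASoniton = NASoniton {
    bits: 32, be: false, packed: false, planar: true, float: true, signed: true
};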

Also, as I said in the original NihAV manifest, decoders and demuxers are identified by text names because I strongly dislike enumerations spanning several screens. So the AVI demuxer calls register::find_codec_from_avi_fourcc() or register::find_codec_from_wav_twocc() and gets the codec name back as a string; you can use that string to search for an appropriate decoder or to retrieve known codec information from the same registry.
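
For illustration, such a fourcc lookup could be as simple as the following sketch (RT21 and IV31 are the real Indeo 2 and Indeo 3 fourccs, but the table and loop here are made up for the example rather than taken from the register module):

pub fn find_codec_from_avi_fourcc(fcc: &[u8; 4]) -> Option<&'static str> {
    // a couple of illustrative entries; a real table would be much longer
    const AVI_FOURCC_TO_NAME: &[(&[u8; 4], &'static str)] = &[
        (b"RT21", "indeo2"),
        (b"IV31", "indeo3"),
    ];
    for &(fourcc, name) in AVI_FOURCC_TO_NAME {
        if fourcc == fcc { return Some(name); }
    }
    None
}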

That’s all for now; the next things that are likely to happen (in no particular order):

  • refactoring data structures, moving them between modules and adding more utility code;
  • work on proper audio support;
  • work on proper video frame management (especially ownership);
  • some Indeo video or audio decoder;
  • utility code for more automated demuxer output handling (automatic stream skipping, better demuxer assignment and such);
  • anything else.

Until next time.

7 Responses to “NihAV — Glue and Hacks”

  1. Luca Barbato says:

    Thank you for documenting all this 🙂

  2. Marcus says:

    You should NOT use the file extension to check the file type; file extensions don’t actually mean anything.

    If I were writing NihAV, I would have it check the first X bits to see if they match any known type, and I’d store the known types in an enum.

    So have PNG’s magic number in there and everything else, and do it that way.

    Otherwise you’re just asking for trouble, and for formats with no real standard extension (like TrueHD or JPEG-LS), you’re giving the user even more trouble.

  3. Kostya says:

    I’d argue that quite often you have certain file extensions associated with a certain format. Also, file extensions are good for cutting down the winnowing process: if the extension matches you can try the specific demuxer, and if it agrees you’re done; otherwise you have to iterate through the probing functions of every demuxer.
    While I’d definitely not use enums, having a magic-like database for quick and dirty format detection (I’d rather not do that with a demuxer instance, really) would be interesting to implement. And the shit that is essentially an elementary stream dumped into a file (MP3, raw H.264/HEVC, speech codecs etc.) will give trouble regardless. In any case it’s not a decision for today or even tomorrow.

  4. Marcus says:

    I just had a crazy idea.

    File extensions don’t actually mean anything, but users associate meaning with them (sometimes erroneously) because it’s convenient, so we can’t really get rid of them.

    What if instead you based your codec selection on MIME types?

    So the user would type something like (based on FFmpeg) `NihAV -i INPUTFILE -c:image/jpeg -o OUTPUTFILE`

    IDK, that kinda makes a crazy amount of sense to me, and you could probably just download a list of the latest MIME definitions as it was built.

  5. Kostya says:

    I’m sorry to break your illusions but that idea is not that crazy. Browsers do that already. Or even RealMedia container.

    There is one issue here: MIME types are defined for generic file formats. That means they don’t cover codecs well and have lots of redundancy, which makes them a bit unwieldy (e.g. video/x-msvideo instead of avi). And since I’m lazy (and this is NihAV) I’d go with my own list of reasonably short names for both codecs and container formats.

  6. Marcus says:

    Obviously it’s used by browsers; I’ve never seen it used by any media app though.

    As for the longer, more complex MIME types like image/x-microsoft-avi (I made that one up, but you get the idea), I think those would be too cumbersome, but I’m thinking about adding a parser for that to BitIO, to extract the type (image, audio or video), then the codec’s name.

  7. Kostya says:

    The problem here is that the image and video types overlap (maybe because video is merely a sequence of images). As a result we have things like Motion JPEG and animated GIF.

    In any case, if you feel this is the right way then write the code or at least a design document; it will be interesting to study (and who knows where it will end up). For example, I once proposed a mechanism for packet side data because I wanted demuxers to transmit palette changes in a clean, non-intrusive way; nowadays it seems to be used for frame output too and for all possible purposes but palette changes.