NihAV — Buffers and Wrappers

It might be hard to believe, but the number of decoders in NihAV has tripled! Now there are three codecs supported in NihAV: Intel Indeo 2, Intel Indeo 3 and PCM.

Before I talk about the design I’d like to say a few things about the Indeo 3 implementation. Essentially it’s an improvement over Indeo 2, which had simple delta compression: now deltas come from one of 21 codebooks and can be applied to both pairs and quads of pixels, there is motion compensation, and planes are split into cells that code their data in 4×4, 4×8 or 8×8 blocks. libavcodec had two versions of the decoder: the first one was submitted anonymously and looks like a direct translation of the disassembly of XAnim’s vid_iv32.so; the second one is still based on binary specifications but also uses some information from the Intel patent. The problem is that both implementations are rather horrible to translate directly into Rust because of all the optimisations (like working with a quad of pixels as a 32-bit integer), lots of macros, and overall control flow resembling a maze of twisty little passages.

As a result I’ve ended up with three main structures: Indeo3Decoder for the main things, Buffers for managing the internal frame buffers and doing pixel operations like block copying, and CellDecParams for storing the current cell decoding parameters: block dimensions, indices of the codebooks used, and pointers to the functions that actually apply deltas or copy lines for the current block (for example, there are two different ways to do that for a 4×8 block).
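To give a rough idea of the last one, here’s a minimal sketch of such a structure with function pointers (the field names and function signatures are my own illustration, not the actual NihAV definitions):

type ApplyDeltaFn = fn(dst: &mut [u8], deltas: &[i16]);
type CopyLineFn   = fn(dst: &mut [u8], src: &[u8]);

// Illustration only: the cell decoder picks block dimensions, codebook
// indices and line functions once per cell and then simply calls
// through the stored pointers for every block in that cell.
struct CellDecParams {
    block_w:     usize,        // block width  (4 or 8)
    block_h:     usize,        // block height (4 or 8)
    codebook1:   usize,        // index of the first codebook used
    codebook2:   usize,        // index of the second codebook used
    apply_delta: ApplyDeltaFn, // applies deltas to a pair or quad of pixels
    copy_line:   CopyLineFn,   // copies a line for the current block
}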

Anyway, back to the overall NihAV design changes. Now there’s a dedicated structure NATimeInfo for keeping DTS, PTS, frame duration and timebase information; this structure is used in both NAFrame and NAPacket for storing timestamp information. And NAFrame is now essentially a wrapper for NATimeInfo and NABufferType plus some metadata.
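Judging by that description it should look roughly like this (a sketch; the optional fields and exact integer types are my assumption):

pub struct NATimeInfo {
    pub pts:      Option<u64>, // presentation timestamp
    pub dts:      Option<u64>, // decoding timestamp
    pub duration: Option<u64>, // frame duration
    pub tb_num:   u32,         // timebase numerator
    pub tb_den:   u32,         // timebase denominator
}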

So what is NABufferType? It is the type-specific frame buffer that stores actual data:

pub enum NABufferType {
    Video      (NAVideoBuffer<u8>),
    Video16    (NAVideoBuffer<u16>),
    VideoPacked(NAVideoBuffer<u8>),
    AudioU8    (NAAudioBuffer<u8>),
    AudioI16   (NAAudioBuffer<i16>),
    AudioI32   (NAAudioBuffer<i32>),
    AudioF32   (NAAudioBuffer<f32>),
    AudioPacked(NAAudioBuffer<u8>),
    Data       (NABufferRefT<u8>),
    None,
}

As you can see it declares several types of audio and video buffers. That’s because you don’t want to mess with bytes in many cases: if you decode 10-bit video you’d better output pixels directly into 16-bit elements, same with audio; for the other cases there’s AudioPacked/VideoPacked. To reiterate: the idea is that you allocate a buffer of a specific type and output native elements into it (floats for AudioF32, 16-bit values for packed RGB565/RGB555 formats etc. etc.) and the conversion interface or the sink will take care of converting the data into the designated format.
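For instance, a sink expecting 16-bit video data could simply match on the variant (a sketch; it returns Option to avoid inventing an error type):

// Sketch: extract the typed 16-bit video buffer from the generic wrapper.
fn get_16bit_video(buf: NABufferType) -> Option<NAVideoBuffer<u16>> {
    match buf {
        NABufferType::Video16(vbuf) => Some(vbuf),
        _                           => None, // wrong buffer flavour
    }
}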

And here’s what the audio buffer looks like (the video buffer is about the same but doesn’t have a channel map):

pub struct NAAudioBuffer<T> {
    info:   NAAudioInfo,
    data:   NABufferRefT<T>,
    offs:   Vec<usize>,
    chmap:  NAChannelMap,
}

impl<T: Clone> NAAudioBuffer<T> {
    pub fn get_offset(&self, idx: usize) -> usize { ... }
    pub fn get_info(&self) -> NAAudioInfo { self.info }
    pub fn get_chmap(&self) -> NAChannelMap { self.chmap.clone() }
    pub fn get_data(&self) -> Ref<Vec<T>> { self.data.borrow() }
    pub fn get_data_mut(&mut self) -> RefMut<Vec<T>> { self.data.borrow_mut() }
    pub fn copy_buffer(&mut self) -> Self { ... }
}

For planar audio (or video) get_offset() allows the caller to obtain the offset of the requested component in the buffer (because everything is stored in a single buffer).
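So reading a single channel of planar 16-bit audio could look like this (a sketch; it assumes the caller knows the number of samples per channel):

// Sketch: sum the energy of the second channel of a planar i16 buffer.
// get_offset(1) gives the position where channel 1 starts inside the
// single backing vector.
fn channel_energy(abuf: &NAAudioBuffer<i16>, nsamples: usize) -> i64 {
    let off  = abuf.get_offset(1);
    let data = abuf.get_data();
    data[off..][..nsamples].iter().map(|&s| i64::from(s) * i64::from(s)).sum()
}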

There are two functions for allocating buffers:

pub fn alloc_video_buffer(vinfo: NAVideoInfo, align: u8) -> Result<NABufferType, AllocatorError>;
pub fn alloc_audio_buffer(ainfo: NAAudioInfo, nsamples: usize, chmap: NAChannelMap) -> Result<NABufferType, AllocatorError>;

The video buffer allocator creates a buffer in the requested format with the provided block alignment (it’s for the codecs that actually code data in e.g. 16×16 macroblocks but still want to report the frame as having e.g. width=1366 or height=1080; and if you think it’s better to constantly confuse avctx->width with avctx->coded_width then you’ve forgotten this project’s name). The audio buffer allocator needs to know the length of the frame in samples instead.
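Typical decoder-side usage would then be along these lines (a sketch; how NAVideoInfo itself is filled in isn’t shown in this post, so it’s taken as a parameter):

// Sketch: allocate a frame buffer for an 8-bit format aligned to
// 16x16 macroblocks and pull the typed buffer out of the wrapper.
fn make_frame_buffer(vinfo: NAVideoInfo) -> Result<NAVideoBuffer<u8>, AllocatorError> {
    match alloc_video_buffer(vinfo, 16)? {
        NABufferType::Video(vbuf) => Ok(vbuf),
        _ => unreachable!("an 8-bit format was requested"),
    }
}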

As for subtitles, they will not be implemented in NihAV beyond demuxing streams with subtitle data. I believe subtitles are a dependent kind of stream and because of that they should be rendered by the consumer (a video player program or whatever). Otherwise you need to take, say, RGB-encoded subtitles, convert them into the proper YUV flavour and draw them in a specific region of the frame, which might not be at the original size if you use e.g. a DVD rip encoded at a different size with the DVD subtitles preserved as is. And for textual subtitles you have even more rendering problems since you need to render them with the proper font (stored as an attachment in the container), apply the proper effects, adjust positions if needed and such. Plus the user may want to adjust them during playback in some way, so IMO this belongs to the rendering pipeline and not NihAV (it’s okay though, you’re not going to use NihAV anyway).

Oh, and the PCM “decoder” just rewraps the buffer provided by NAPacket as NABufferType::AudioPacked; that’s good enough to dump as is, and the future resampler will take care of format conversion.
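In sketch form the whole “decoder” boils down to something like this (get_buffer() and new_from_buf() are hypothetical names here, since the post doesn’t show how packet data is accessed or how the buffer is constructed):

// Sketch: no decoding at all, just rewrap the packet bytes.
// NAPacket::get_buffer() and NAAudioBuffer::new_from_buf() are
// hypothetical names used for illustration.
fn decode_pcm(ainfo: NAAudioInfo, chmap: NAChannelMap, pkt: &NAPacket) -> NABufferType {
    let data = pkt.get_buffer(); // raw PCM bytes as stored in the container
    NABufferType::AudioPacked(NAAudioBuffer::new_from_buf(ainfo, data, chmap))
}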

No idea what comes next: maybe Indeo audio decoders, maybe the Indeo 4/5 video decoder, or maybe a deflate unpacker. Or something completely different. Or nothing at all. Only time will tell.

One Response to “NihAV — Buffers and Wrappers”

  1. Luca Barbato says:

    For my x264-rs wrapper I used the pattern to expose the memory as 1 slice per plane.

    I haven’t added 10-bit support yet, but I’ll probably just make a variant of as_slice and as_mut_slice that provides the buffer as &[u16].