NihAV « Kostya's Boring Codec World

Archive for the ‘NihAV’ Category

NihAV — Concept and Principles

Thursday, June 1st, 2017

Looks like I’m going to repeat the same things over and over in every NihAV-related post so I’d better sum them up and ~~when~~if people ask why some decision was made like that I can point them here.

So, let’s start with what NihAV IS. NihAV is the project started by me and me alone with the following goals:

design multimedia framework from the ground in the way I see fit (hence the NIH in the name);
do that without any burden of legacy (should be obvious why);
implement real working code to both test the concepts and to keep me interested in continuing the project (it gets boring pretty quickly when you design, write code and it still does not do anything visible at all);
ignore bullshit cases like interlaced H.264 (the project is written by me and for myself and I’ll do fine without it, thank you very much);
let me understand Rust better (it’s not that important but a nice bonus nevertheless).

Now what NihAV is NOT and is NOT going to be:

a full-stack multimedia framework (i.e. which lacks only handling user input and audio/video output to become a media player too, more about it below);
transcoder for all your needs (first, I hardly care about my own needs; second, transcoder belongs elsewhere);
supporting things just because they’re standard (you can leave your broadcasting shit to yourself, including but not limited to MXF, interlacing and private streams in MPEG-TS);
designed with the most convenient way of usage for the end user (e.g. in frame management I already output dummy frames that merely signal there was no change from the previous frame; also frame reordering will be implemented outside decoders);
have other FFeatures just because some other project has them;
depend on many other crates (that’s the way of NIH!);
have hacks to support some very special cases (I’m not going to be paid for e.g. fixing AVI demuxer to support some file produced by a broken AVI writer anyway).

What it might become is a foundation for higher level multimedia data management which in turn can be either a library for building transcoder/player or just used directly in such tools. IMO libav* has suffered exactly from the features that should be kept in transcoder creeping into the libraries, the whole libavdevice is an example of that. Obviously it takes some burden off library users (including transcoding tool developers) but IMO library should be rather finished piece with clearly defined functionality, not a collection of code snippets developers decided to reuse or share with the world. Just build another layer (not wrapper, functional layer!) on top of it.

For similar reasons I’m not going to hide serious functionality in utility code or duplicate it in codecs. In NihAV frames will be output in the same order as received and reordering for the display will be done in specific frame reorderer (if needed), same for filling missing timestamps; dummy frame that tells just to repeat the previous frame is used there in GDV decoder already:

    let mut frm = NAFrame::new_from_pkt(pkt, self.info.clone(), NABufferType::None);
    frm.set_keyframe(false);
    frm.set_frame_type(FrameType::Skip);

Some things do not belong to NihAV because they are either too low-level (like protocols) or too high-level (subtitles rendering, stream handling for e.g. transcoding or playback, playlist support). Some of them deserve to be made into separate library(ies) later, others should be implemented by the end user. Again, IMO libav* suffers from exactly this mix of low- and medium-level stuff that feels too low-level and not low-level enough at the same time (just look how much code those ffmpeg or avconv tools have). Same goes for hardware-accelerated decoding where the library should just demux frame data and parse its headers, the rest is up to hwaccel chain in the end application, but instead lazy users prefer libavcodec to try all possible hwaccels on the frame and fall back to multithreaded software decoding automatically if required. And preferably all other processing in e.g. libavfilter should be done using custom hwaccel format too. Since I’m all for this approach (…NOT), NihAV will recognize that the frame uses some hwaccel format and that’s all. It’s up to the upper layer to build custom processing chain.

I hope the domain for NihAV is clear: it will take ByteIO input, demux data using it (packets or elementary stream chunks—if you want them in packet format then use a parser), optionally fill timestamp information, decode frames, reorder them in display order if requested, similar approach for writing data. Anything else will belong to other crates (and they might appear in the future too). But for now this is enough for me.

P.S. If I wanted to have multimedia player I’d write one that can take MP4/FLAC/WV for input and decode AAC/FLAC/WavPack plus feed H.264 to VAAPI. I know my hardware and my content, others can write their own players.

P.P.S. If you want multimedia framework written in Rust for wide userbase just wait until rust-av is ready.

Posted in NihAV | 1 Comment »

NihAV — a Small Update

Wednesday, May 31st, 2017

For testing how well NihAV handles palettised formats I’ve decided to add support for Gremlin Digital Video format (8-bit only). So now I can decode various cutscenes from Normality, one of very few 3D first person adventure games for DOS. I’ve tested my implementation and it works fine.

The funny thing is that this demuxer and decoder for GDV (actually there’s also GDV DPCM but the samples I have seem to use raw PCM) are missing from CEmpeg. Wiki description also has some parts missing.

The first frame I was decoding started with a code for copying 8 bytes from offset -56. The first frame. At the very first pixel. So I’ve consulted the VAG’s code and the original binary specification (even by dumping executed instructions in DosBox and analysing them—it helped me in debugging later) to see where it went wrong. And it turns out the decoder is really supposed to do that because it has specially initialised buffer before the actual frame data (kinda like the original LZHUF did, also there’s no need to check if we copy before the buffer start since it’s not possible) plus some other small issues. I’ll try to correct the Wiki article on GDV in the following days.

And I don’t really plan to add any other old game codecs beside VMD and Smacker (I have soft spot for them after all). Next decoders should be either for audio or more modern ones, like H.26x or Indeo 4/5 since I still have some ideas to test out.

Update to to this update: my decoder code is here.

Posted in NihAV | 1 Comment »

NihAV — Buffers and Wrappers

Saturday, May 27th, 2017

It might be hard to believe but the number of decoders in NihAV has tripled! So now there are three codecs supported in NihAV: Intel Indeo 2, Intel Indeo 3 and PCM.

Before I talk about the design I’d like to say some things about Indeo 3 implementation. Essentially it’s an improvement over Indeo 2 that had simple delta compression—now deltas are coming from one of 21 codebooks and can be applied to both pairs and quads of pixels, there is motion compensation and planes are split into cells that use blocks for coding data in them (4×4, 4×8 or 8×8 blocks). libavcodec had two versions of the decoder: the first version was submitted anonymously and looks like it’s a direct translation of disassembly for XAnim vid_iv32.so; the second version is still based on some binary specifications but also with some information coming from the Intel patent. The problem is that those two implementations are both rather horrible to translate directly into Rust because of all the optimisations like working with a quad of pixels as 32-bit integer plus lots of macros and overall control flow like a maze of twisty little passages. In result I’ve ended with three main structures: Indeo3Decoder for main things, Buffers for managing the internal frame buffers and doing pixel operations like block copy and CellDecParams for storing current cell decoding parameters like block dimensions, indices to the codebooks used and pointers to the functions that actually apply deltas or copy the lines for the current block (for example there are two different ways to do that for 4×8 block).

Anyway, back to overall NihAV design changes. Now there’s a dedicated structure NATimeInfo for keeping DTS, PTS, frame duration and timebase information; this structure is used in both NAFrame and NAPacket for storing timestamp information. And NAFrame now is essentially the wrapper for NATimeInfo, NABufferType plus some metadata.

So what is NABufferType? It is the type-specific frame buffer that stores actual data:

pub enum NABufferType {
    Video      (NAVideoBuffer<u8>),
    Video16    (NAVideoBuffer<u16>),
    VideoPacked(NAVideoBuffer<u8>),
    AudioU8    (NAAudioBuffer<u8>),
    AudioI16   (NAAudioBuffer<i16>),
    AudioI32   (NAAudioBuffer<i32>),
    AudioF32   (NAAudioBuffer<f32>),
    AudioPacked(NAAudioBuffer<u8>),
    Data       (NABufferRefT<u8>),
    None,
}

As you can see it declares several types of audio and video buffers. That’s because you don’t want to mess with bytes in many cases: if you decode 10-bit video you’d better output pixels directly into 16-bit elements, same with audio; for the other cases there’s AudioPacked/VideoPacked. To reiterate: the idea is that you allocate buffer of specific type and output native elements into it (floats for AudioF32, 16-bit for packed RGB565/RGB555 formats etc. etc.) and the conversion interface or the sink will take care of converting data into designated format.

And here’s how audio buffer looks like (video buffer is about the same but doesn’t have channel map):

pub struct NAAudioBuffer<T> {
    info:   NAAudioInfo,
    data:   NABufferRefT<T>,
    offs:   Vec<usize>,
    chmap:  NAChannelMap,
}

impl<T: Clone> NAAudioBuffer<T> {
    pub fn get_offset(&self, idx: usize) -> usize { ... }
    pub fn get_info(&self) -> NAAudioInfo { self.info }
    pub fn get_chmap(&self) -> NAChannelMap { self.chmap.clone() }
    pub fn get_data(&self) -> Ref<Vec<T>> { self.data.borrow() }
    pub fn get_data_mut(&mut self) -> RefMut<Vec<T>> { self.data.borrow_mut() }
    pub fn copy_buffer(&mut self) -> Self { ... }
}

For planar audio (or video) get_offset() allows caller to obtain the offset in the buffer to the requested component (because it’s all stored in the single buffer).

There are two functions for allocating buffers:

pub fn alloc_video_buffer(vinfo: NAVideoInfo, align: u8) -> Result<NABufferType, AllocatorError>;
pub fn alloc_audio_buffer(ainfo: NAAudioInfo, nsamples: usize, chmap: NAChannelMap) -> Result<NABufferType, AllocatorError>;

Video buffer allocated buffer in the requested format with the provided block alignment (it’s for the codecs that actually code data in e.g. 16×16 macroblocks but still want to report frame having e.g. width=1366 or height=1080 and if you think that it’s better to constantly confuse avctx->width with avctx->coded_width then you’ve forgotten this project name). Audio buffer allocator needs to know the length of the frame in samples instead.

As for subtitles, they will not be implemented in NihAV beside demuxing the stream with subtitle data. I believe subtitles are the dependent kind of stream and because of that they should be rendered by the consumer (video player program or whatever). Otherwise you need to take, say, RGB-encoded subtitles, convert them into proper YUV flavour and draw in the specific region of the frame which might be not the original size if you use e.g. DVD rip encoded into different size with DVD subtitles preserved as is. And for textual subtitles you have even more rendering problems since you need to render them with proper font (stored as the attachment in the container), apply using the proper effect, adjust positions if needed and such. Plus the user may want to adjust them during playback in some way so IMO it belongs to the rendering pipeline and not NihAV (it’s okay though, you’re not going to use NihAV anyway).

Oh, and PCM “decoder” just rewraps buffer provided by NAPacket as NABufferType::AudioPacked, it’s good enough to dump as is and the future resampler will take care of format conversion.

No idea what comes next: maybe it’s Indeo audio decoders, maybe it’s Indeo 4/5 video decoder or maybe it’s deflate unpacker. Or something completely different. Or nothing at all. Only the time will tell.

Posted in NihAV | 1 Comment »

NihAV — Glue and Hacks

Saturday, May 20th, 2017

I don’t like to write the code that does nothing, it’s the excitement of my code doing at least something that keeps me writing code. So instead of designing a lot of new interfaces and such that can describe all theoretically feasible stuff plus utility code to handle the things passed through aforementioned interfaces, I’ve just added some barely working stuff, wrote a somewhat working demuxer and made a decoder.

And here it is:
(more…)

Posted in NihAV | 7 Comments »

NihAV – io module

Thursday, May 11th, 2017

I’ve more or less completed nihav::io module, so let’s look at it (or not, it’s your choice).

There are four components there: bytestream reading, bitstream reading, generic integer code reading and codebook support for bitstream reader. Obviously there is no writing functionality there but I don’t need it now and it can be added later when (or if) needed.
(more…)

Posted in NihAV | 5 Comments »

NihAV Development Progress

Saturday, May 6th, 2017

After long considerations and much hesitation NihAV finally accepts its first developer (that would be me). And for a change it will be written in Rust. First, it’s an interesting language worth learning; second, it seems to offer needed functionality without much hassle; third, it offers new features that are tempting to try. By new features I mostly mean enums, traits and functions bound to structures.

Previously I expressed the intent to do a completely new design of multimedia (mostly decoding) framework with decoders being assembled from smaller blocks. For example, if I’d implement VIVO H.263 decoder (just as a troll) it would contain these bits:

generic 8×8 block decoder interface that does common stuff for such decoders (maintaining block indices, filling frame information e.g. block type, motion vectors etc etc);
trait for 8×8 block decoder implementation that does actual bitstream decoding (functions for decoding GOP/picture/slice headers, MV prediction, block data decoding and such);
IPB frame shuffler as implementation of generic frame shuffler (i.e. that piece of code that selects which frames to use as references and which frame to output after decoding the current one);
maybe even custom codebook accessor (the piece of code that tells codebook generator what is the code and symbol at position N) so it doesn’t need to be converted into some fixed form.

There’s not much code written yet but there are some bits implemented: rudimentary universal bitstream reader, bytestream reader (the same for memory and file I/O) and semi-working framework for demuxing. That’s a start at least.

Side rant: it seems that visible languages (i.e. not completely obscure ones) that use := form assignment have rather unpleasant evangelists (full of themselves in the best case, actively pushing their language to replace everything else in the worst case). That includes Go, Oberon and Wolfram Language. I don’t mean that other languages are free of that problem but in these cases looks like the majority of posts or articles about such language are written from this position.

Posted in NihAV, Useless Rants | 2 Comments »

NAScale — Internal Design

Saturday, February 6th, 2016

In my previous post I’ve described NAScale ideas and here I’d like to give more detailed overview on the internal design.

Basically, it builds a processing pipeline (or chain, I insist on terminology being inconsistent) that takes input frame, does some magic on it and outputs the result.
Let’s start with looking at typical pipeline for processing packed formats:

And that should be a pipeline for the worst case since we don’t need input unpacker when it’s not packed (and same for the output), and processing stage might be not needed either (if we simply repack formats), and in some cases one stage will be enough.
My approach for pipeline construction is rather simple:

we have special modules (called kernels) that are used to construct pipeline stages;
all those modules are divided into three categories (input handling, output handling and intermediate filters)
all stages input and output (except for the source and destination handlers) should be planar and either 8- or 16-bit native endian (the less variations in input and output one has to handle the better).

Efficiency considerations tell us there will be special kernels for combining format conversion into one stage (like super-optimised rgb24tovyuy) but generic pipeline able to hold almost any format should have one universal input unpacker and one universal output packer.

Let’s review how pipeline building should work.
Unfortunately, my prototype can build only several pipelines for very specific cases but the principle stays the same.

Zeroeth stage: check if we are dealing with completely the same input and output formats and dimensions and then just apply memcopy (the kernel is obviously called murder).
Even if formats are the same we may need to scale the input or convert it to account for colourspace details etc etc.
Right from the start we need to check if we deal with a packed format and insert unpack stage or skip it and feed input directly to the processing stage.
On one hand there might be need for input stage converting planar 12-bit input into proper 16-bit input for processing stages, on the other hand it might be skipped in some cases (for efficiency reasons).

And then you should add a next stage but don’t forget about intermediate planar buffers—they should be allocated for the stage so that the following stage will know where to read its input from.
Now it’s probably a good time to mention that during pipeline construction we should keep track of current format and thus each stage construction should modify it to let the next stage know what input format it should expect (e.g. input was packed 10-bit BGR, unpack makes it into 16-bit planar RGB, scale changes nothing but dimensions and rgb2yuv converts it to 16-bit YUV and pack converts it into output format like YUYV).
Don’t forget that for clarity and simplicity all stages get pointers to the components in the order of the colourspace model components, i.e. it’s always R,G,B even if the input was packed BGR32 (the pointers will point to the start positions of the component in a packed input in that case), and that applies to both input and output.

Kernels are standalone modules that prepare contexts and processing functions but intermediate and scratch buffers are allocated by NAScale core during pipeline construction.
Often you don’t need scratch buffers but when your stage outputs only three components (i.e. RGB) and the next stage demands four (i.e. RGBA) you should allocate a scratch buffer for that stage to use as an input (again, for efficiency reasons we might want to pass through buffers from the previous stage that are not touched in the current one but I’m not sure how to implement that yet).
Developing all of this should be not that hard though and most time should be spent on optimising common cases instead.
And that’s all for now.

Posted in NihAV | 4 Comments »

NihAV — Notes on Audio

Thursday, December 3rd, 2015

There’s one thing I like in Libav audio design and that’s planar audio. There’s one thing I don’t like about it and that’s handling multichannel audio. As a project with roots intermingled with MPlayer, it has followed M$ definition of multichannel audio and since many multichannel codecs have their own channel order lavc decoders have to define channel reordering tables. And it gets hairier when you have dynamic channel configuration (i.e. not one of the static channels configurations but rather a channel mask defining what channels are present or not). It gets especially fun with codecs that have extensions to the base 5.1 format that redefine existing channels and add new ones (like D*lby and DT$ codecs).

My proposal is that multichannel codecs should not bother with what the library considers The Only True Layout but rather export channel map along with the channel data and let the conversion library figure it out (shuffling pointers is not hard you know).

P.S. It’s hard to blame lu_zero for the current Libav situation but I still shall do it.

Posted in NihAV | 1 Comment »

NihAV — NAScale

Saturday, November 21st, 2015

First, some history. If you don’t like reading about it just skip to the ruler below.

So, NAScale is born after yet another attempt to design a good colourspace conversion and scaling library. Long time ago FFmpeg didn’t have any proper solution for that and used rather rudimentary imgconvert; later it was replaced with libswscale lifted from MPlayer. Unfortunately it was designed for rather specific player tasks (mostly converting YUV to RGB for displaying with X11 DGA driver) rather than generic utility library and some of its original design still shows to this day. Actually, libswscale should have a warm place in every true FFmpeg developer’s heart next to MPEGEncContext. Still, while being far from ideal it has SIMD optimisations and it works, so it’s still being used.

And yet some people unsatisfied with it decided to write a replacement from scratch. Originally AVScale (a Libav™ project) was supposed to be designed during coding sprint in Summer 2014. And of course nothing substantial came out of it.

Then I proposed my vision how it should work and even wrote a proof of concept (i.e. throwaway) code to demonstrate it back in Autumn 2014. I’d made an update to it in March 2015 to show how to work with high bitdepth formats but nobody has touched it since then (and hardly before that too). Thus I’m reusing that failing effort as NAScale for NihAV.

And now about the NAScale design.

The main guiding rule was: “see libswscale? Don’t do it like this.”

First, I really hate long enums and dislike API/ABI breaks. So NAScale should have stable interface and no enumeration of known pixel formats. What should it have instead? Pixel format description that should be good enough to make NAScale convert even formats it had no idea about (like BARG5156 to YUV412).

So what should such description have? Colourspace information (RGB/YUV/XYZ/whatever, gamma, transfer function etc), size of whole packed pixel where applicable (i.e. not for planar formats) and individual component information. That component information includes information on how to find and/or extract such component (i.e. on which plane it is located, what shift and mask is needed to extract it from packed bitfield, how many bytes to skip to find the first and next component etc.) and subsampling information. The names chosen for those descriptors were Formaton and Chromaton (for rather obvious reasons).

Second, the way NAScale processes data. As I remember it libswscale converted input into YUV with fixed precision before scaling and then back into destination format unless it was common case format conversion without scaling (and then some bypass hacks were employed like plane repacking function and such).

NAScale prefers to build filter chain in stages. Each stage has either one function processing all components or a function processing only one component applied to each component — that allows you to execute e.g. scaling in parallel. It also allows to build proper conversion+scaling process without horrible hacks. Each stage might have its own temporary buffers that will be used for output (and fed to the next stage).

You need to convert XYZ to YUV? First you unpack XYZ into planar RGB (stage 1), then scale it (stage 2) and then convert it to YUV (stage 3). NAScale constructs chain by searching for kernels that can do the work (e.g. convert input into some intermediate format or pack planes into output format), provides that kernel with a Formaton and dimensions and that kernels sets stage processing functions. For example, the first stage of RGB to YUV is unpacking RGB data, thus NAScale searches for the kernel called rgbunp, which sets stage processing function and allocated RGB plane buffers, then the kernel called rgb2yuv will convert and pack RGB data from the planes into YUV.

And last, implementation. I’ve written some sample code that would be able to take RGB input (high bitdepth was supported too), scale it if needed and pack back into RGB or convert into YUV depending on what was requested. So test program converted raw r210 frame into r10k or input PPM into PPM or PGMYUV with scaling. I think it’s enough to demonstrate how the concept works. Sadly nobody has picked this work (and you know whom I blame for that; oh, and koda — he wanted to be mentioned too).

Posted in NihAV | 8 Comments »

NihAV — Processing Graph Notes

Saturday, October 10th, 2015

I’m giving only a short overview for now, more to come later.

Basically, you have NAGraph that connects different workers (processing units with queues).
Lock-free NAQueue should accept only objects of certain type (side note: introduce libnaarch/naatomics.h).
Also those objects should have a common base (NAGraphObject) that extends NAClass by adding side data and signaling subtype (NAPacket, NARawData or NAFrame for now).
Multiple object types with a common base allow to have the same processing interface (after all, both encoders and decoders simply take some input data and output something else).
Elementary stream from demuxers can be either fed to parser filter that will produce proper packets or it will be directly accepted by decoder or muxer.
Later I should make those parser filters autoinserted too.
Uniform interface should allow easier integration of third-party components even in binary form (if there’s somebody willing to use not yet existing library like this).

Zonal partitioning of the graph (inputs, processing, outputs) maybe should include generic filters (e.g. de/noise) too, and maybe set a flag on NAFrame if it can be modified or should be kept intact.
Errors should be caught right at graph processing stage (do that with callbacks and have fun).
Roughly, this should be the architecture for NihAV till the end of its days since it should be future proof.

Of course, all things described here should be implemented too eventually (sigh).

Posted in NihAV | 1 Comment »

Kostya's Boring Codec World

Archive for the ‘NihAV’ Category

NihAV — Concept and Principles

NihAV — a Small Update

NihAV — Buffers and Wrappers

NihAV — Glue and Hacks

NihAV – io module

NihAV Development Progress

NAScale — Internal Design

NihAV — Notes on Audio

NihAV — NAScale

NihAV — Processing Graph Notes

Pages

Archives

Categories