Archive for the ‘NihAV’ Category

NihAV — A New Decoder

Saturday, June 10th, 2017

After a lot of procrastination I’ve finally more or less completed decoder for I.263 (Intel version of H.263) in NihAV.

It can decode I-, P- and PB-frames quite fine (though B-frames have some artefacts) and deblock them too (except B-frames because I’m too lazy for that). Let’s have a look at the overall structure of the decoder.

Obviously I’ve tried to make it modular but not exceeding the needs of H.263 decoder (i.e. I’m not going to extend the code to make it work with MPEG-2 part 2 and similar though some code might be reused), so it’s been split into several modules. Here’s a walk on all modules and their functionality review.
(more…)

NihAV — Format Detection

Sunday, June 4th, 2017

So I’ve decided to implement container format detection for NihAV. This is a work of progress and I’m pretty sure I’ll change it later but it should do for now.

The main principles are quite simple: formats are detected by extension and by the contents, so there’s a score for it:

pub enum DetectionScore {
    No,
    ExtensionMatches,
    MagicMatches,
}

I don’t see why some format should not be detected properly if demuxer for it is disabled or not implemented at all. So in NihAV there’s a specific detect module that offers just one function:

pub fn detect_format(name: &str, src: &mut ByteReader) -> Option< (&'static str, DetectionScore)>;

It takes input filename and source stream reader and then tries to determine whether some format matches and returns format name and detection score on success (or nothing otherwise). I might add probing individual format later if I feel like it.

Before I explain how detection works let me quote the source of the detection array (in hope that it will explain a lot by itself):

const DETECTORS: &[DetectConditions] = &[
    DetectConditions {
        demux_name: "avi",
        extensions: ".avi",
        conditions: &[CheckItem{offs: 0,
                                cond: &CC::Or(&CC::Str(b"RIFF"),
                                              &CC::Str(b"ON2 ")) },
                      CheckItem{offs: 8,
                                cond: &CC::Or(&CC::Or(&CC::Str(b"AVI LIST"),
                                                      &CC::Str(b"AVIXLIST")),
                                              &CC::Str(b"ON2fLIST")) },
                     ]
    },
    DetectConditions {
        demux_name: "gdv",
        extensions: ".gdv",
        conditions: &[CheckItem{offs: 0,
                                cond: &CC::Eq(Arg::U32LE(0x29111994))}],
    },
];

So what is the way to detect format? First the name is matched to see whether one of the listed extensions fits, then the file contents are checked for markers inside. These checks are descriptions like “check that at offset X there’s data of type <type> that (equals/less than/greater than) Y”. Also you can specify several alternative checks for the same offset and there’s range check condition too.

This way I can describe most sane formats, like “if at offset 1024 you have tag M.K. then it’s ProTracker module” or “if it starts with BM and 16-bit LE value here is less than this and here it’s in range 1-16 then this must be BMP”.

One might wonder how well it would work on MP3s renamed to “.acm” (IIRC one game did that). I’ll reveal the secret: it won’t work at all. Dealing with raw streams is actually beside format detector because it is raw stream and not a container format. You can create raw stream demuxer, then try all possible chunkers to see which one fit but that is stuff for the upper layer (maybe it will be implemented there inside the input stream handling function eventually). NihAV is not a place for automagic things.

NihAV — Concept and Principles

Thursday, June 1st, 2017

Looks like I’m going to repeat the same things over and over in every NihAV-related post so I’d better sum them up and whenif people ask why some decision was made like that I can point them here.

So, let’s start with what NihAV IS. NihAV is the project started by me and me alone with the following goals:

  • design multimedia framework from the ground in the way I see fit (hence the NIH in the name);
  • do that without any burden of legacy (should be obvious why);
  • implement real working code to both test the concepts and to keep me interested in continuing the project (it gets boring pretty quickly when you design, write code and it still does not do anything visible at all);
  • ignore bullshit cases like interlaced H.264 (the project is written by me and for myself and I’ll do fine without it, thank you very much);
  • let me understand Rust better (it’s not that important but a nice bonus nevertheless).

Now what NihAV is NOT and is NOT going to be:

  • a full-stack multimedia framework (i.e. which lacks only handling user input and audio/video output to become a media player too, more about it below);
  • transcoder for all your needs (first, I hardly care about my own needs; second, transcoder belongs elsewhere);
  • supporting things just because they’re standard (you can leave your broadcasting shit to yourself, including but not limited to MXF, interlacing and private streams in MPEG-TS);
  • designed with the most convenient way of usage for the end user (e.g. in frame management I already output dummy frames that merely signal there was no change from the previous frame; also frame reordering will be implemented outside decoders);
  • have other FFeatures just because some other project has them;
  • depend on many other crates (that’s the way of NIH!);
  • have hacks to support some very special cases (I’m not going to be paid for e.g. fixing AVI demuxer to support some file produced by a broken AVI writer anyway).

What it might become is a foundation for higher level multimedia data management which in turn can be either a library for building transcoder/player or just used directly in such tools. IMO libav* has suffered exactly from the features that should be kept in transcoder creeping into the libraries, the whole libavdevice is an example of that. Obviously it takes some burden off library users (including transcoding tool developers) but IMO library should be rather finished piece with clearly defined functionality, not a collection of code snippets developers decided to reuse or share with the world. Just build another layer (not wrapper, functional layer!) on top of it.

For similar reasons I’m not going to hide serious functionality in utility code or duplicate it in codecs. In NihAV frames will be output in the same order as received and reordering for the display will be done in specific frame reorderer (if needed), same for filling missing timestamps; dummy frame that tells just to repeat the previous frame is used there in GDV decoder already:

    let mut frm = NAFrame::new_from_pkt(pkt, self.info.clone(), NABufferType::None);
    frm.set_keyframe(false);
    frm.set_frame_type(FrameType::Skip);

Some things do not belong to NihAV because they are either too low-level (like protocols) or too high-level (subtitles rendering, stream handling for e.g. transcoding or playback, playlist support). Some of them deserve to be made into separate library(ies) later, others should be implemented by the end user. Again, IMO libav* suffers from exactly this mix of low- and medium-level stuff that feels too low-level and not low-level enough at the same time (just look how much code those ffmpeg or avconv tools have). Same goes for hardware-accelerated decoding where the library should just demux frame data and parse its headers, the rest is up to hwaccel chain in the end application, but instead lazy users prefer libavcodec to try all possible hwaccels on the frame and fall back to multithreaded software decoding automatically if required. And preferably all other processing in e.g. libavfilter should be done using custom hwaccel format too. Since I’m all for this approach (…NOT), NihAV will recognize that the frame uses some hwaccel format and that’s all. It’s up to the upper layer to build custom processing chain.

I hope the domain for NihAV is clear: it will take ByteIO input, demux data using it (packets or elementary stream chunks—if you want them in packet format then use a parser), optionally fill timestamp information, decode frames, reorder them in display order if requested, similar approach for writing data. Anything else will belong to other crates (and they might appear in the future too). But for now this is enough for me.

P.S. If I wanted to have multimedia player I’d write one that can take MP4/FLAC/WV for input and decode AAC/FLAC/WavPack plus feed H.264 to VAAPI. I know my hardware and my content, others can write their own players.

P.P.S. If you want multimedia framework written in Rust for wide userbase just wait until rust-av is ready.

NihAV — a Small Update

Wednesday, May 31st, 2017

For testing how well NihAV handles palettised formats I’ve decided to add support for Gremlin Digital Video format (8-bit only). So now I can decode various cutscenes from Normality, one of very few 3D first person adventure games for DOS. I’ve tested my implementation and it works fine.

The funny thing is that this demuxer and decoder for GDV (actually there’s also GDV DPCM but the samples I have seem to use raw PCM) are missing from CEmpeg. Wiki description also has some parts missing.

The first frame I was decoding started with a code for copying 8 bytes from offset -56. The first frame. At the very first pixel. So I’ve consulted the VAG’s code and the original binary specification (even by dumping executed instructions in DosBox and analysing them—it helped me in debugging later) to see where it went wrong. And it turns out the decoder is really supposed to do that because it has specially initialised buffer before the actual frame data (kinda like the original LZHUF did, also there’s no need to check if we copy before the buffer start since it’s not possible) plus some other small issues. I’ll try to correct the Wiki article on GDV in the following days.

And I don’t really plan to add any other old game codecs beside VMD and Smacker (I have soft spot for them after all). Next decoders should be either for audio or more modern ones, like H.26x or Indeo 4/5 since I still have some ideas to test out.

Update to to this update: my decoder code is here.

NihAV — Buffers and Wrappers

Saturday, May 27th, 2017

It might be hard to believe but the number of decoders in NihAV has tripled! So now there are three codecs supported in NihAV: Intel Indeo 2, Intel Indeo 3 and PCM.

Before I talk about the design I’d like to say some things about Indeo 3 implementation. Essentially it’s an improvement over Indeo 2 that had simple delta compression—now deltas are coming from one of 21 codebooks and can be applied to both pairs and quads of pixels, there is motion compensation and planes are split into cells that use blocks for coding data in them (4×4, 4×8 or 8×8 blocks). libavcodec had two versions of the decoder: the first version was submitted anonymously and looks like it’s a direct translation of disassembly for XAnim vid_iv32.so; the second version is still based on some binary specifications but also with some information coming from the Intel patent. The problem is that those two implementations are both rather horrible to translate directly into Rust because of all the optimisations like working with a quad of pixels as 32-bit integer plus lots of macros and overall control flow like a maze of twisty little passages. In result I’ve ended with three main structures: Indeo3Decoder for main things, Buffers for managing the internal frame buffers and doing pixel operations like block copy and CellDecParams for storing current cell decoding parameters like block dimensions, indices to the codebooks used and pointers to the functions that actually apply deltas or copy the lines for the current block (for example there are two different ways to do that for 4×8 block).

Anyway, back to overall NihAV design changes. Now there’s a dedicated structure NATimeInfo for keeping DTS, PTS, frame duration and timebase information; this structure is used in both NAFrame and NAPacket for storing timestamp information. And NAFrame now is essentially the wrapper for NATimeInfo, NABufferType plus some metadata.

So what is NABufferType? It is the type-specific frame buffer that stores actual data:

pub enum NABufferType {
    Video      (NAVideoBuffer<u8>),
    Video16    (NAVideoBuffer<u16>),
    VideoPacked(NAVideoBuffer<u8>),
    AudioU8    (NAAudioBuffer<u8>),
    AudioI16   (NAAudioBuffer<i16>),
    AudioI32   (NAAudioBuffer<i32>),
    AudioF32   (NAAudioBuffer<f32>),
    AudioPacked(NAAudioBuffer<u8>),
    Data       (NABufferRefT<u8>),
    None,
}

As you can see it declares several types of audio and video buffers. That’s because you don’t want to mess with bytes in many cases: if you decode 10-bit video you’d better output pixels directly into 16-bit elements, same with audio; for the other cases there’s AudioPacked/VideoPacked. To reiterate: the idea is that you allocate buffer of specific type and output native elements into it (floats for AudioF32, 16-bit for packed RGB565/RGB555 formats etc. etc.) and the conversion interface or the sink will take care of converting data into designated format.

And here’s how audio buffer looks like (video buffer is about the same but doesn’t have channel map):

pub struct NAAudioBuffer<T> {
    info:   NAAudioInfo,
    data:   NABufferRefT<T>,
    offs:   Vec<usize>,
    chmap:  NAChannelMap,
}

impl<T: Clone> NAAudioBuffer<T> {
    pub fn get_offset(&self, idx: usize) -> usize { ... }
    pub fn get_info(&self) -> NAAudioInfo { self.info }
    pub fn get_chmap(&self) -> NAChannelMap { self.chmap.clone() }
    pub fn get_data(&self) -> Ref<Vec<T>> { self.data.borrow() }
    pub fn get_data_mut(&mut self) -> RefMut<Vec<T>> { self.data.borrow_mut() }
    pub fn copy_buffer(&mut self) -> Self { ... }
}

For planar audio (or video) get_offset() allows caller to obtain the offset in the buffer to the requested component (because it’s all stored in the single buffer).

There are two functions for allocating buffers:

pub fn alloc_video_buffer(vinfo: NAVideoInfo, align: u8) -> Result<NABufferType, AllocatorError>;
pub fn alloc_audio_buffer(ainfo: NAAudioInfo, nsamples: usize, chmap: NAChannelMap) -> Result<NABufferType, AllocatorError>;

Video buffer allocated buffer in the requested format with the provided block alignment (it’s for the codecs that actually code data in e.g. 16×16 macroblocks but still want to report frame having e.g. width=1366 or height=1080 and if you think that it’s better to constantly confuse avctx->width with avctx->coded_width then you’ve forgotten this project name). Audio buffer allocator needs to know the length of the frame in samples instead.

As for subtitles, they will not be implemented in NihAV beside demuxing the stream with subtitle data. I believe subtitles are the dependent kind of stream and because of that they should be rendered by the consumer (video player program or whatever). Otherwise you need to take, say, RGB-encoded subtitles, convert them into proper YUV flavour and draw in the specific region of the frame which might be not the original size if you use e.g. DVD rip encoded into different size with DVD subtitles preserved as is. And for textual subtitles you have even more rendering problems since you need to render them with proper font (stored as the attachment in the container), apply using the proper effect, adjust positions if needed and such. Plus the user may want to adjust them during playback in some way so IMO it belongs to the rendering pipeline and not NihAV (it’s okay though, you’re not going to use NihAV anyway).

Oh, and PCM “decoder” just rewraps buffer provided by NAPacket as NABufferType::AudioPacked, it’s good enough to dump as is and the future resampler will take care of format conversion.

No idea what comes next: maybe it’s Indeo audio decoders, maybe it’s Indeo 4/5 video decoder or maybe it’s deflate unpacker. Or something completely different. Or nothing at all. Only the time will tell.

NihAV — Glue and Hacks

Saturday, May 20th, 2017

I don’t like to write the code that does nothing, it’s the excitement of my code doing at least something that keeps me writing code. So instead of designing a lot of new interfaces and such that can describe all theoretically feasible stuff plus utility code to handle the things passed through aforementioned interfaces, I’ve just added some barely working stuff, wrote a somewhat working demuxer and made a decoder.

And here it is:
(more…)

NihAV – io module

Thursday, May 11th, 2017

I’ve more or less completed nihav::io module, so let’s look at it (or not, it’s your choice).

There are four components there: bytestream reading, bitstream reading, generic integer code reading and codebook support for bitstream reader. Obviously there is no writing functionality there but I don’t need it now and it can be added later when (or if) needed.
(more…)

NihAV Development Progress

Saturday, May 6th, 2017

After long considerations and much hesitation NihAV finally accepts its first developer (that would be me). And for a change it will be written in Rust. First, it’s an interesting language worth learning; second, it seems to offer needed functionality without much hassle; third, it offers new features that are tempting to try. By new features I mostly mean enums, traits and functions bound to structures.

Previously I expressed the intent to do a completely new design of multimedia (mostly decoding) framework with decoders being assembled from smaller blocks. For example, if I’d implement VIVO H.263 decoder (just as a troll) it would contain these bits:

  • generic 8×8 block decoder interface that does common stuff for such decoders (maintaining block indices, filling frame information e.g. block type, motion vectors etc etc);
  • trait for 8×8 block decoder implementation that does actual bitstream decoding (functions for decoding GOP/picture/slice headers, MV prediction, block data decoding and such);
  • IPB frame shuffler as implementation of generic frame shuffler (i.e. that piece of code that selects which frames to use as references and which frame to output after decoding the current one);
  • maybe even custom codebook accessor (the piece of code that tells codebook generator what is the code and symbol at position N) so it doesn’t need to be converted into some fixed form.

There’s not much code written yet but there are some bits implemented: rudimentary universal bitstream reader, bytestream reader (the same for memory and file I/O) and semi-working framework for demuxing. That’s a start at least.

Side rant: it seems that visible languages (i.e. not completely obscure ones) that use := form assignment have rather unpleasant evangelists (full of themselves in the best case, actively pushing their language to replace everything else in the worst case). That includes Go, Oberon and Wolfram Language. I don’t mean that other languages are free of that problem but in these cases looks like the majority of posts or articles about such language are written from this position.

NAScale — Internal Design

Saturday, February 6th, 2016

In my previous post I’ve described NAScale ideas and here I’d like to give more detailed overview on the internal design.

Basically, it builds a processing pipeline (or chain, I insist on terminology being inconsistent) that takes input frame, does some magic on it and outputs the result.
Let’s start with looking at typical pipeline for processing packed formats:
NAScale pipeline
And that should be a pipeline for the worst case since we don’t need input unpacker when it’s not packed (and same for the output), and processing stage might be not needed either (if we simply repack formats), and in some cases one stage will be enough.
My approach for pipeline construction is rather simple:

  • we have special modules (called kernels) that are used to construct pipeline stages;
  • all those modules are divided into three categories (input handling, output handling and intermediate filters)
  • all stages input and output (except for the source and destination handlers) should be planar and either 8- or 16-bit native endian (the less variations in input and output one has to handle the better).

Efficiency considerations tell us there will be special kernels for combining format conversion into one stage (like super-optimised rgb24tovyuy) but generic pipeline able to hold almost any format should have one universal input unpacker and one universal output packer.

Let’s review how pipeline building should work.
Unfortunately, my prototype can build only several pipelines for very specific cases but the principle stays the same.


Zeroeth stage: check if we are dealing with completely the same input and output formats and dimensions and then just apply memcopy (the kernel is obviously called murder).
Even if formats are the same we may need to scale the input or convert it to account for colourspace details etc etc.
Right from the start we need to check if we deal with a packed format and insert unpack stage or skip it and feed input directly to the processing stage.
On one hand there might be need for input stage converting planar 12-bit input into proper 16-bit input for processing stages, on the other hand it might be skipped in some cases (for efficiency reasons).

And then you should add a next stage but don’t forget about intermediate planar buffers—they should be allocated for the stage so that the following stage will know where to read its input from.
Now it’s probably a good time to mention that during pipeline construction we should keep track of current format and thus each stage construction should modify it to let the next stage know what input format it should expect (e.g. input was packed 10-bit BGR, unpack makes it into 16-bit planar RGB, scale changes nothing but dimensions and rgb2yuv converts it to 16-bit YUV and pack converts it into output format like YUYV).
Don’t forget that for clarity and simplicity all stages get pointers to the components in the order of the colourspace model components, i.e. it’s always R,G,B even if the input was packed BGR32 (the pointers will point to the start positions of the component in a packed input in that case), and that applies to both input and output.

Kernels are standalone modules that prepare contexts and processing functions but intermediate and scratch buffers are allocated by NAScale core during pipeline construction.
Often you don’t need scratch buffers but when your stage outputs only three components (i.e. RGB) and the next stage demands four (i.e. RGBA) you should allocate a scratch buffer for that stage to use as an input (again, for efficiency reasons we might want to pass through buffers from the previous stage that are not touched in the current one but I’m not sure how to implement that yet).
Developing all of this should be not that hard though and most time should be spent on optimising common cases instead.
And that’s all for now.

NihAV — Notes on Audio

Thursday, December 3rd, 2015

There’s one thing I like in Libav audio design and that’s planar audio. There’s one thing I don’t like about it and that’s handling multichannel audio. As a project with roots intermingled with MPlayer, it has followed M$ definition of multichannel audio and since many multichannel codecs have their own channel order lavc decoders have to define channel reordering tables. And it gets hairier when you have dynamic channel configuration (i.e. not one of the static channels configurations but rather a channel mask defining what channels are present or not). It gets especially fun with codecs that have extensions to the base 5.1 format that redefine existing channels and add new ones (like D*lby and DT$ codecs).

My proposal is that multichannel codecs should not bother with what the library considers The Only True Layout but rather export channel map along with the channel data and let the conversion library figure it out (shuffling pointers is not hard you know).

P.S. It’s hard to blame lu_zero for the current Libav situation but I still shall do it.