NihAV « Kostya's Boring Codec World

Archive for the ‘NihAV’ Category

NihAV: Some Progress to Report!

Friday, August 24th, 2018

Finally the large chunk is finished: NihAV has finally got support for RealVideo 3 and 4!

Since I’ve learned a great deal more about codecs since the last time I wrote RealVideo 3/4 decoder (and specifications for both were leaked—they have mistakes but still clarify some things), I was able to write a new decoder that also seems to reconstruct frames better.

Some words on the design: I’ve split it into several parts as usual—common RV3/4 code, RV3/4 DSP, RV3 bitstream parser, RV3 DSP and RV4 bitstream parser and DSP. That’s the approach I’ve been using before and I’ll probably use it in future decoders as well. The only more or less interesting thing is how I did weighted motion compensation: instead of temporary buffer I allocate 16×16 frame that I use for storing temporary results and which is used later to average results (since motion compensation routines in RealVideo 3 and 4 differ while weighted averaging is the same it makes sense to split it into separate operation).

And now for the juicy part: benchmarks and performance. I’ve tested one of the RealVideo 4 trailers (namely swordfish.rmvb) and avconv -threads 1 -cpuflags 0 decodes it in 15 seconds, nihav-tool needs almost 25.
(more…)

Posted in NihAV, RealVideo, Rust | 7 Comments »

NihAV: Progress Report

Monday, July 2nd, 2018

I’m still working (barely) on NihAV and I’ve managed to make my code decode both RealVideo 3 and 4. It’s not always correct, especially B-frames and some corner cases, but at least it produces a sane picture in most cases.

And this time I’d like to write about disadvantages of writing motion compensation functions in Rust instead of C.
(more…)

Posted in NihAV, Rust | 4 Comments »

NihAV: progress report

Sunday, June 10th, 2018

Well, since I had no incentive to work on NihAV and recently the weather is not very encouraging for any kind of intellectual activity there was almost no progress. And yet now I have something to write about: NihAV has finally managed to decode non-trivial (i.e not fully black) RealVideo 3 I-frame properly (i.e. without any visible distortions). Loop filter is still missing but it’s a start. And it’s not a small feat considering one has to implement both coefficients decoding and intra prediction. So essentially it’s just motion vector juggling and motion compensation are all the things that are missing for P- and B-frames support. Maybe it will go faster from here (but most likely not).

And since doing that involved rewriting some C code into Rust here are some notes on how oxidising went:

match is a nice replacement for the cases when you have to partly remap values—in my case I had to adjust intra prediction directions in case top or left or bottom reference were missing and that means changing three or four values into other values, match looks more compact than several } else if (itype == FOO) { and does not lose readability either;
while in C foo = bar = 42; is a common thing, Rust does not allow this (I can understand why) and I’m surprised I ran into it only now (with intra prediction functions that assign the same calculated value to several output pixels at once);
loops in Rust are fine for basic use but when you need to deal with something more complex like for (i = 0; i < block_size; i += 4) or for (i = 99; i > 0; i--) you need either to write a simpler loop and remap indices inside or to remember it's Rust and permute range in less intuitive ways like for i in (0..block_size).filter(|x| x&3 == 0) and for i in (1..99+1).rev(). While this works and even somewhat conveys the meaning it's a bit unwieldy IMO;
and it might be a bit too esoteric but looks like I cannot easily write fn clip_u8(val: N) -> u8 that would take any primitive numeric type as input, do comparisons inside and return value either clipped to converted to u8. The best answer on how to do it I found was "you can't, it's against Rust practices". I don't need it much and I care even less, so I'll just mark it as a neutral language feature and forget about it.

And now the small but constantly irritating thing: arrays. While slices are nice and easy to use (including extracting sub-slices), in my area I often need a slice with arbitrary start and end bounds. To clarify my use case: quite often you need a piece of memory that's addressable with both positive and negative indices and those make sense on certain interval.

One of such common arrays is clipping array which essentially takes input index and returns it clipped usually to 0-255 range. So you have part [-255..-1] filled with zeroes, [0..255] filled with values in the same range and [256..511] filled with 255. I repeat, such clipping table is very common and useful thing that's currently not easy to implement in Rust.

Another less common case is the block of pixels we process may require information from its top, left and top-left neighbours—and those are addressed as src[-stride + i], src[-1 + stride*i] and src[-stride - 1]. Or a whole frame of GDI-related codec (no, not from Westwood) or even simple BMP/DIB that stores lines upside-down so after you process line 0 you have to move to line -1.

I currently deal with it by keeping an additional variable pointing to the current position in array that I use as a reference and from which I can subtract other numbers if needed, but it's a bit clunky and error-prone. Since Rust checks indices on slice access I wonder if extending it to work with e.g. negative indices is possible. IIRC FORTRAN and Pascal allowed you to define an array starting with arbitrary index, it might be possible in Rust too.

Oh well, I'll just keep using my approach meanwhile and waiting to see what rust-av does in this regard.

Posted in NihAV, Rust | 13 Comments »

NihAV: Aten’t Dead

Saturday, March 3rd, 2018

Surprisingly, there’s still some life in NihAV and some progress time from time.

So I’ve debugged RealVideo 2 decoding and verified B-part of PB-frame reconstruction in Intel.263 decoder against the binary specification. Mind you, the latter is not likely to be seen supported by libavcodec ever. First, it’s a fringe feature for extremely old video codecs nobody cares about any more and, second, unlike later codecs, B-part is stored along with P-frame data (i.e. first you have macroblock header for P- or I-macroblock, then macroblock header for B-macroblock, then macroblock coefficients for P-part and then macroblock coefficients for B-part). Other codecs simply pack B-frame along with reference frame but here data is interleaved. I’ve added some support for skipping B-part in libavcodec H.263 decoder (exactly nine years ago!) but decoding two frames in parallel would require some serious hacking of infamous MpegEncContext-using core so it’s very unlikely to happen.

And directions for near future still include RealVideo 3/4 and all RealAudio codecs. Fun fact: two of those are patent-free now—ATSC A/52 aka DNET and AAC-LC (but probably not SBR extension used in racp version). So if you implement them now you can flip a middle finger to both D*lby and Ferkel-herzen-Gesellschaft since new decoders can’t be covered by patent licenses. Not that I cared about it before.

Posted in NihAV | 1 Comment »

Rust: Annoyance-Driven Design

Sunday, December 3rd, 2017

I’ve finally made NihAV decode RealVideo 2 content, including B-frames (there are still 4 video codecs to support (and I don’t have any samples for RMHD) and all audio codecs too so it’s a long way) and so I have some more words to say about Rust and my experience with it.

To me it looks like the most decisions on decompositions in Rust are the consequences of annoyance of making it other way? Too large structures mean you have to either pass too many arguments into new() or fill it with some defaults (and I’m pretty sure that #derive[Default] won’t save you with more complex types) and initialise to sane values later. In result it’s easier to split everything into smaller structures which are (at least) subjectively are much easier to handle, especially if you reference them as Option<YourStruct>. Modules and imports, on the other hoof, are more annoying to manage since you have to take care of proper dependencies, visibility and imports—in result I find it easier to import all stuff from all modules and just keep comment out currently unused imports (because I still can’t bring myself to make it all a single mega-module). And now for the even higher level: crates. Yes, I’m going to beat that undead horse again.

First of all, I’m aware of incremental building enabled in nocturnal Rust but I’m not going to use nightly for rather obvious reasons (mostly because I’m not here to experiment with the all potential bells and whistles of the language but rather what it can offer right out of the box and how it suits my needs). So, the compilation times are horrible: when I change a single non-public function it rebuilds the whole crate (which is supposed behaviour, I know) and it takes 15 seconds to do that. Obviously it’s laughable for people doing “serious” projects but it’s basic fact that humans expect response (any response) in about five seconds after the action or they get impatient. In result instead of one crate with optional features (in my case decoders and demuxers) I’d rather have several smaller crates and that creates new issues too. There’s this obvious npm.js kind of issue of making packages for every small thing so your programs ends with more package dependencies than modern Linux distribution. But there’s also the issue with package splitting: I’d like to split my code into packages that encompass certain family of features—e.g. nihav-core for common stuff, nihav-avi for AVI demuxer, nihav-indeo for all Indeo codecs (audio and video) and nihav-realmedia for RealMedia demuxer and related codecs—then some of them may depend on some common package (like H.263 common core for Intel I.263 and RealVideo 1 and 2 decoders) but probably with different features requested (one of them does not need B-frame support, another one does not need PB-frame support). Since I don’t know quantum cargodynamics I don’t know how it will all be resolved. So it will either end in dead code or code duplication (in an additional crate too, I suppose).

My theory is that people behind Rust are biased by their development environment. In other words you don’t care much about compilation times when you have to build browsers (or compilers) on daily basis. While my main development machine is a laptop I bought in 2010 with 8GB of RAM (which I believed to be future-proof). So the Rust language designers might either have beefy machines to deal with fast compilation or be conditioned to long development cycles. I know that back in the day “start compiling Linux kernel and go make some coffee to pass 45 minutes of compilation time” was quite common but I guess it’s Jevons’ paradox all over again: the more computing power is there the more it’s wasted on compilation times. Like modern C++ or single-header libraries: you actually have to compile a very large corpus of code as single file. Back in the days my laptop with 64MB RAM was spending most of the time compiling libavcodec/dsputil.c (a monstrous file full of templates that old FFmpeg developers might remember even today) so I had to install more RAM in order to make compilation time reasonable. The solution was to split the file instead of upgrading the machines for every developer but nowadays it’d be seen as a ridiculous solution.

And now documentation. I find it rather poor (but that’s common with programming languages). If I know more or less what feature I want I can find it in the standard documentation (if I don’t I would complain about non-overlapping multiple &mut [range] borrows not working instead of using slice.split_at_mut()—and I did) but it does not really tell me what I should be looking for in the first place. I call it Excel complexity. In Excel there’s probably a function that does anything you want but it’s much easier to reimplement it yourself than to look up in the documentation how it’s called and what are its less obvious parameters. And even if you combine both The Rust Programming Language Second Edition and Rust By Example you still won’t get it right. Now that Rust aspires to be a JavaScript replacement it should take an example from it too: provide extensive overview how to do things in it instead of showcasing features. IMO in TRPLv2 there are two chapters—11 and 12—that are close to that ideal: they talk about testing and how to make a console program. In other words, good practical tasks that one would like to achieve with Rust (in other words, not so many people care about features per se, they want something done with a language: build multi-threaded application, parse Web server reply, make an efficient number cruncher etc etc). I can rant more about how it should be organised but nobody reads documentation including me.

There’s still this annoyance with tuples as such too: why I can’t declare let foo, bar; if baz { foo = 4; bar = 2; } else { foo = bar = 0; } and have to use two separate lets? why I can’t have let (foo, bar); if baz { (foo, bar) = (4, 2); } else { (foo, bar) = (0,0); } either? In result while named tuples are there I end up using only unnamed tuples.

So while Rust offers some nice things it has not a very nice way to shape development. And this also explains why C was so popular and still is: it does not enforce any particular behaviour on you (except in recent editions when the standard and compilers suddenly started to care about arithmetic and bit operations being non-portable—you might make your own CPU that does not use two’s complement arithmetic after all), no enforced coding style, you can compile code in any order you like and interface almost anything without special tools or wrappers. And the freedom it offered along with effectiveness is what is often lacking in more modern languages (the saddest thing is that it’s traded not for memory security but rather for sacks of syntactic sugar).

Anyway, I’ll keep experimenting and we’ll see how things will turn out. In either case I should start thinking about splitting NihAV into several crates, registering codecs and such. Too much work, too many opportunities to procrastinate!

Posted in NihAV, Rust, Useless Rants | 4 Comments »

Rust: Optimising Decoder Experience

Thursday, August 3rd, 2017

Okay, I’ve made some changes so hopefully the server will withstand the curiosity of more than two people if it will go like the last time.

So, after implementing Indeo 4/5 decoders for NihAV I nano-benchmarked it and my decoder was about twice as slow compared to libavcodec. And since neither has SIMD optimisations they should be good enough to compare.

The tested file was 00186002.avi — Indeo 4 sample with scalability feature(i.e. luma is split into four bands and uses Haar wavelet to compose the output plane) and duration over ten minutes. The results I got will be given in Linux perf sample counts as those should be representative enough.

avconv — 13.4 seconds, 10K cycles. About 24% spent in luma plane recombination (with Haar wavelet), about 40% of time is taken by bitstream decoding and the rest is mostly transforms and motion compensation.

nihav-tool — 31.6 seconds, 20K cycles. 30% spend in luma plane recombination, 48% of time is taken by bitstream decoding, 11% is for motion compensation and the rest is mostly transforms. Or in samples: recombination — 9900 (against 3300 in libavcodec), bitstream decoding (dirty estimate, it includes some DSP functions inlined) — 15800 against
5600. Motion compensation — 3500 against 1700. Transforms — 1300 against 1500 (they are not equivalent though, my code only transforms the block and output costs are hidden in bitstream decoding). Overall, my code is consistently worse. Is there any way to optimise it a bit?
(more…)

Posted in NihAV, Rust, Useless Rants | 12 Comments »

NihAV — Some News

Sunday, July 30th, 2017

So, despite work, heat, travels, and overall laziness, I’ve managed to complete more or less full-featured Indeo 4 and 5 decoder. That means that my own decoder decodes Indeo 4 and 5 files with all known frame types (including B-frames) and transforms (except DCT because there are no known samples using it) and even transparency!

Here are two random samples from Civilization II and Starship Titanic decoded and dumped as PGM (click for full size):

I’m not going to share the code for transparency plane decoding, it’s very simple (just RLE) and the binary specification is easy to read. The only gotchas are that it’s decoded as contiguous tile aligned to width of 32 (e.g. the first sample has width 332 pixels but the transparency tile is 352 pixels) and the dirty rectangles provided in the band header are just a hint for the end user, not a thing used in decoding.

This decoder was written mostly so that I can understand Indeo better and what can I say about it: Indeo 4/5 is about the same codec with some features fit for more advanced codecs of the later era. While the only things it reuses from the previous frames are pixels and band transform mode, it can reuse decoded quantisers and motion vectors from the first band for chroma bands and luma bands 1-3 in scalability mode too. It has variable block sizes (4×4, 8×8 and 8×8 in 16×16 macroblock) with various selectable transforms and scans (i.e. you can have 2D, row or column Slant, Haar or (theoretically) DCT and scans can be diagonal, horizontal or vertical too). And there were several frame types too: normal I-, P- and B-frames, droppable I- and P-frames, and droppable P-frame sequence (i.e. P-frames that reference the previous frame of such type or normal I/P-frame). Had it had proper stereo support, it’d be still as hot as ITU H.EVC.

The internal design between Indeo 4 and 5 differs in small details, like Indeo 4 having more frame types (like B-frames and droppable I-frames) — but Indeo 5 had introduced droppable P-frame sequence; picture and band headers differ between versions but (macro)block information and actual content decoding is the same (Indeo 5 does a bit trickier stuff with macroblock quantisers but that’s all). Also Indeo 4 had transparency information and different plane reconstruction (using Haar wavelet instead of 5/7 used in Indeo 5). So, in result my decoder was split into several modules reflecting the changes: indeo4.rs and indeo5.rs for codec-specific functions, ivi.rs for common structures and types (e.g. picture header, frame type and such), ividsp.rs for transforms and motion compensation and ivibr.rs for the actual decoding functions.

As with Intel H.263 decoder, Indeo 4/5 decoders provide implementations for IndeoXParser that parse picture header, band header and macroblock information and also recombine back plane in case it was coded as scalable. In result they store not so much information, just the codebooks used in decoding and for Indeo5 the common picture information that is stored only for I-frames (in other words, GOP info).

In result, here’s how Indeo 4 main decoding function looks like:

    fn decode(&mut self, pkt: &NAPacket) -> DecoderResult<NAFrameRef> {
        let src = pkt.get_buffer();
        let mut br = BitReader::new(src.as_slice(), src.len(), BitReaderMode::LE);

        let mut ip = Indeo4Parser::new();
        let bufinfo = self.dec.decode_frame(&mut ip, &mut br)?;
        let mut frm = NAFrame::new_from_pkt(pkt, self.info.clone(), bufinfo);
        frm.set_keyframe(self.dec.is_intra());
        frm.set_frame_type(self.dec.get_frame_type());
        Ok(Rc::new(RefCell::new(frm)))
    }

with the actual interface for parser being

pub trait IndeoXParser {
    fn decode_picture_header(&mut self, br: &mut BitReader) -> DecoderResult<PictureHeader>;
    fn decode_band_header(&mut self, br: &mut BitReader, pic_hdr: &PictureHeader, plane: usize, band: usize) -> DecoderResult<BandHeader>;
    fn decode_mb_info(&mut self, br: &mut BitReader, pic_hdr: &PictureHeader, band_hdr: &BandHeader, tile: &mut IVITile, ref_tile: Option<Ref<IVITile>>, mv_scale: u8) -> DecoderResult<()>;
    fn recombine_plane(&mut self, src: &[i16], sstride: usize, dst: &mut [u8], dstride: usize, w: usize, h: usize);
}

And the nano-benchmarks:
the longest Indeo4 file I have around (00186002.avi) — nihav-tool 20sec, avconv 9sec plus lots of error messages;
Mask of Eternity opening (Indeo 5) — nihav-tool 8.1sec, avconv 4.1sec.
Return to Krondor intro (Indeo 5) — nihav-tool 5.8sec, avconv 2.9sec.
For other files it’s also consistently about two times slower but whatever, I was not trying to make it fast, I tried to make it work.

The next post should be either about the things that irritate me in Rust and make it not so good for codec implementing or about cooking.

Posted in NihAV | Comments Closed

NihAV — Progress Report

Sunday, June 25th, 2017

Obviously it moves very slowly: I spend most of my time on work, sleep, cooking and travelling around. Plus it was too hot to think or do anything productive.

Anyway, I’ve completed IMC/IAC decoder for NihAV. In case you’ve forgotten or didn’t care to find out at all, the names stand for Intel Music Coder and Indeo Audio software with IAC being slightly upgraded version of IMC that allows stereo and has tables calculated for every supported sample rate instead of the set of them precalculated for 22kHz. And despite what you might think it is rather complex audio codec that took a route of D*lby AC-3, G.722.1/RealAudio Cooker and CELT—parametric bit allocation codecs. It’s the kind of audio codecs that I dislike less than speech codecs but more than the rest because they have large and complex function that calculates how many bits/values should be spent on each individual coefficient or subband. In IMC/IAC case it gets even worse since the codec uses floating point numbers so the results are somewhat unstable between implementations and platforms (a bit more on that later). Oh, and this codec has I- and P-frames since some blocks are coded as independent and others are coded using information from the previous block.

Rust does not have much to do with C so you cannot simply copy-paste code and expect it to work and it’s against the principles of the project anyway. Side note: the only annoying Rust feature so far is array initialisation, I’d like to be able to fill array in a loop before using it without initialising array contents to some default value (which I can’t do for some types) or resorting to mem::uninitialized() and ptr::write(). Anyway, I had to implement my own version of the code so it’s structured a bit differently, has different names, uses bitstream reader in MSB16LE mode instead of block swapping and decodes most files I could test without errors unlike libavcodec—so it’s NIH all the way!

I wasted time mostly on validating my code against the binary specifications so this version actually decodes most files as intended while libavcodec fails to do that. To describe the problem briefly, it all comes from the same place: the codec first produces bit allocation for all bits still available then determines how to read flags for skipping coefficients in some bands, reads those flags and adjusts bit allocation for the number of bits freed by this operation; the problem is that bit allocation may go wrong and in result skip flags take more bits than the coefficients that would be coded otherwise and decoder would fail to adjust bit allocation for that case (it’s not supposed to do that in the specification) and will read more bits than the block contains. For the thirty-something IMC and IAC in AVI samples only one fails now for me because in bit allocation the wrong band gets selected for coefficient length decreasing. And the reason is the difference in the fourth or fifth digit after the decimal point in one array of constants that makes the wrong value minimum (and thus selected for coefficients length decreasing). Since it takes several minutes with gdb+mplayer2 to get information at this point (about at 10-second position in 14-second audio) I decided not to dig further.

Also I had to write other pieces of code like split-radix FFT, byte writer and WAV dumper that accepts audio packets and writes them with the provided ByteWriter.

P.S. Nanobenchmarks ahoy: decoding the longest IMC stream that I had (a bit more than two minutes) takes 0.124s with avconv and 0.09s with nihav-tool. Actual decoding functions take about the same time though Rust implementation is still faster by couple percents and my FFT implementation is slower (but on the other hoof it’s called for every frame since it decodes that file without errors).

P.P.S. So next is Indeo 4/5 with all wonderful features like scalable decoding, B-frames and transparency (that reminds me that Libav and ScummVM had a competition who would be the last to implement proper transparency support for Indeo 4, now they both might win). And then I’d probably go back to implementing the features I wanted: being able to tell the demuxer to discard and don’t demux certain streams, better streams reporting from the demuxer, seeking and decoder reset, frame reordering functionality, maybe WAV support too. And then maybe back to decoders. I want to have several codec families fully implemented, like RAD (Smacker, Bink and Bink2), Duck/On2 (TM1, TM-RT, TM2, TM2X, TM VP3, VP4, VP5, AVC, VP6 and VP7) and RealMedia (again). But I’m not in a hurry.

P.P.P.S. I’m not going to publish all source code but bits of it may be either posted when relevant or leaked to rust-av, its developer(s) has(have) shown some interest, so enquire there.

Posted in NihAV | 9 Comments »

NihAV — A New Decoder

Saturday, June 10th, 2017

After a lot of procrastination I’ve finally more or less completed decoder for I.263 (Intel version of H.263) in NihAV.

It can decode I-, P- and PB-frames quite fine (though B-frames have some artefacts) and deblock them too (except B-frames because I’m too lazy for that). Let’s have a look at the overall structure of the decoder.

Obviously I’ve tried to make it modular but not exceeding the needs of H.263 decoder (i.e. I’m not going to extend the code to make it work with MPEG-2 part 2 and similar though some code might be reused), so it’s been split into several modules. Here’s a walk on all modules and their functionality review.
(more…)

Posted in NihAV | 1 Comment »

NihAV — Format Detection

Sunday, June 4th, 2017

So I’ve decided to implement container format detection for NihAV. This is a work of progress and I’m pretty sure I’ll change it later but it should do for now.

The main principles are quite simple: formats are detected by extension and by the contents, so there’s a score for it:

pub enum DetectionScore {
    No,
    ExtensionMatches,
    MagicMatches,
}

I don’t see why some format should not be detected properly if demuxer for it is disabled or not implemented at all. So in NihAV there’s a specific detect module that offers just one function:

pub fn detect_format(name: &str, src: &mut ByteReader) -> Option< (&'static str, DetectionScore)>;

It takes input filename and source stream reader and then tries to determine whether some format matches and returns format name and detection score on success (or nothing otherwise). I might add probing individual format later if I feel like it.

Before I explain how detection works let me quote the source of the detection array (in hope that it will explain a lot by itself):

const DETECTORS: &[DetectConditions] = &[
    DetectConditions {
        demux_name: "avi",
        extensions: ".avi",
        conditions: &[CheckItem{offs: 0,
                                cond: &CC::Or(&CC::Str(b"RIFF"),
                                              &CC::Str(b"ON2 ")) },
                      CheckItem{offs: 8,
                                cond: &CC::Or(&CC::Or(&CC::Str(b"AVI LIST"),
                                                      &CC::Str(b"AVIXLIST")),
                                              &CC::Str(b"ON2fLIST")) },
                     ]
    },
    DetectConditions {
        demux_name: "gdv",
        extensions: ".gdv",
        conditions: &[CheckItem{offs: 0,
                                cond: &CC::Eq(Arg::U32LE(0x29111994))}],
    },
];

So what is the way to detect format? First the name is matched to see whether one of the listed extensions fits, then the file contents are checked for markers inside. These checks are descriptions like “check that at offset X there’s data of type <type> that (equals/less than/greater than) Y”. Also you can specify several alternative checks for the same offset and there’s range check condition too.

This way I can describe most sane formats, like “if at offset 1024 you have tag M.K. then it’s ProTracker module” or “if it starts with BM and 16-bit LE value here is less than this and here it’s in range 1-16 then this must be BMP”.

One might wonder how well it would work on MP3s renamed to “.acm” (IIRC one game did that). I’ll reveal the secret: it won’t work at all. Dealing with raw streams is actually beside format detector because it is raw stream and not a container format. You can create raw stream demuxer, then try all possible chunkers to see which one fit but that is stuff for the upper layer (maybe it will be implemented there inside the input stream handling function eventually). NihAV is not a place for automagic things.