Starting work on VP7 encoder

January 26th, 2022

As I said in the previous post, currently I don’t have any features or decoders to add to NihAV (because Paul has not finished his work on the Bink2 decoder yet) besides some encoders that nobody will use.

Thus I decided to work on something more advanced than VP6, something that would let me play with fancier features (like EPZS motion estimation, per-macroblock quantiser selection and such). For that I needed to pick a codec, probably one based on H.264, and there was not that much to pick from:

  • ITU H.264—first and foremost, I don’t have a properly working decoder for it (yet?); second, the format is too complex, so just thinking about writing all those SPSes, PPSes and various lists discourages me from even considering writing an encoder for it;
  • RealVideo 3 or 4—tempting but that means I also need to write a RealMedia muxer and the format lacks dquant (in theory it’s supported, in practice it’s never happened). Maybe one day I’ll make my own NihAV-Really? encoder for RV3+Cooker but not today;
  • Sorenson SVQ3—same problems essentially;
  • VP8—Mike did it over a decade ago;
  • VX—this is a custom game codec which is simplified (even quantiser is rather implicit).

The rough roadmap is the following:

  1. make an intra-only encoder that encodes a picture somehow;
  2. improve it to select the best whole macroblock prediction mode;
  3. add 4×4 prediction mode and make it select the best mode;
  4. add inter-frame support along with motion compensation;
  5. add EPZS-based motion estimation (a rough sketch of the idea follows this list);
  6. introduce a rough motion search over a group of frames to determine a good golden frame candidate and the macroblocks that should be coded with higher quality;
  7. actually code those macroblocks with higher quality using MB features;
  8. use trellis-based quantiser search for improved coding of frames;
  9. speed it up by using various heuristics instead of brute force search for coding parameters.
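
Since EPZS is the centrepiece of the plan, here is a minimal sketch of the idea (the SAD function, the frame layout and the fixed threshold are simplifications for illustration; real EPZS uses adaptive thresholds and better cost functions, and the predictors would typically be the median predictor, the neighbouring block vectors and the co-located vector from the previous frame):

    #[derive(Clone, Copy)]
    struct MV { x: i16, y: i16 }

    const ZERO_MV: MV = MV { x: 0, y: 0 };

    // Sum of absolute differences for a 16x16 block at (x, y) displaced
    // by mv in the reference frame (bounds checks omitted for brevity).
    fn sad(cur: &[u8], refp: &[u8], stride: usize, x: usize, y: usize, mv: MV) -> u32 {
        let mut sum = 0u32;
        for row in 0..16 {
            let coff = (y + row) * stride + x;
            let roff = ((y + row) as isize + mv.y as isize) as usize * stride
                       + ((x as isize + mv.x as isize) as usize);
            for i in 0..16 {
                sum += (i32::from(cur[coff + i]) - i32::from(refp[roff + i])).unsigned_abs();
            }
        }
        sum
    }

    // EPZS in a nutshell: evaluate the predictors first, stop early if one
    // of them is already good enough, otherwise refine the best candidate
    // with a small diamond pattern until it stops improving.
    fn epzs_search(cur: &[u8], refp: &[u8], stride: usize, x: usize, y: usize,
                   predictors: &[MV]) -> MV {
        const GOOD_ENOUGH: u32 = 256; // adaptive in real EPZS, fixed here
        let mut best_mv = ZERO_MV;
        let mut best_sad = sad(cur, refp, stride, x, y, ZERO_MV);
        for &cand in predictors {
            let d = sad(cur, refp, stride, x, y, cand);
            if d < best_sad {
                best_sad = d;
                best_mv = cand;
            }
        }
        if best_sad < GOOD_ENOUGH {
            return best_mv;
        }
        loop {
            let mut improved = false;
            for (dx, dy) in [(-1i16, 0i16), (1, 0), (0, -1), (0, 1)] {
                let cand = MV { x: best_mv.x + dx, y: best_mv.y + dy };
                let d = sad(cur, refp, stride, x, y, cand);
                if d < best_sad {
                    best_sad = d;
                    best_mv = cand;
                    improved = true;
                }
            }
            if !improved {
                break;
            }
        }
        best_mv
    }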

This should take some time…

Looking at SMUSH/INSANE formats

January 6th, 2022

As some of you might know, I have had an interest in various game formats for decades (and that’s one of the reasons that brought me into opensource multimedia). Those formats include videos from LucasArts games as well. Actually SMUSH is not an ordinary video format but rather a sub-engine where both audio and video are objects (background, sprites, main audio, sound effects) that should be composed into the final audiovisual experience. INSANE is the next iteration of the engine, which became simpler (coding full frames, only one object per frame, just one codec, 16-bit video instead of paletted) but still has a lot in common with its predecessor.

As expected, the main source of information about those is ScummVM (and one of their developers made smushplay to play the files in a stand-alone manner). There’s a personal story related to that: one Cyril Zorin meddled with some formats from LucasArts games and wanted to add INSANE support (for Grim Fandango, but it’s the same for all other games using the SNM format) to FFmpeg; sadly he could not stomach the review process there (which is hard to blame him for) and abandoned it. Some time later I picked it up, added support for SMUSH codecs 37 and 47 (the ones used in the adventure games) and got it committed; years later Paul B. Mahol (of future Bink2 decoder fame) added VIMA audio support to it.

Yet there are more games out there and some of them use different codecs, the details of which were not previously known. So I decided to finally reverse engineer them to see how the development went. My implementation in NihAV is far from perfect (there are many issues with transparency and coordinates) but it can decode all the files I could find, with very few exceptions.

So, let’s look at the codecs used for image coding. Audio is rather boring: there’s a very old PCM format in SAUD chunks, scaled PCM audio in IACT chunks, and VIMA, which is IMA ADPCM with 2-7 bits per step.
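
To illustrate what “IMA ADPCM with 2-7 bits per step” means, here is a sketch of the classic IMA decoding step generalised to a variable code width (the step table is the standard IMA one; the index adaptation rule below is a simplification for illustration, not VIMA’s actual per-width tables):

    // The standard IMA ADPCM step size table.
    const IMA_STEP_TABLE: [i32; 89] = [
            7,     8,     9,    10,    11,    12,    13,    14,    16,    17,
           19,    21,    23,    25,    28,    31,    34,    37,    41,    45,
           50,    55,    60,    66,    73,    80,    88,    97,   107,   118,
          130,   143,   157,   173,   190,   209,   230,   253,   279,   307,
          337,   371,   408,   449,   494,   544,   598,   658,   724,   796,
          876,   963,  1060,  1166,  1282,  1411,  1552,  1707,  1878,  2066,
         2272,  2499,  2749,  3024,  3327,  3660,  4026,  4428,  4871,  5358,
         5894,  6484,  7132,  7845,  8630,  9493, 10442, 11487, 12635, 13899,
        15289, 16818, 18500, 20350, 22385, 24623, 27086, 29794, 32767
    ];

    struct ImaState { predictor: i32, step_idx: i32 }

    // Decode one sample from a code of `bits` width (2-7 in VIMA's case):
    // the top bit is the sign, the rest is the magnitude. For bits == 4
    // this reduces to the classic IMA formula diff = ((2*m + 1) * step) >> 3.
    fn ima_step(st: &mut ImaState, code: u32, bits: u32) -> i16 {
        let step = IMA_STEP_TABLE[st.step_idx as usize];
        let sign = (code >> (bits - 1)) & 1;
        let mag  = (code & ((1u32 << (bits - 1)) - 1)) as i32;
        let diff = ((2 * mag + 1) * step) >> (bits - 1);
        st.predictor = if sign != 0 { st.predictor - diff } else { st.predictor + diff };
        st.predictor = st.predictor.clamp(-32768, 32767);
        // simplified adaptation: small magnitudes shrink the step,
        // large ones grow it (the real codec uses per-width tables)
        let adj = if mag * 4 < (1 << (bits - 1)) { -1 } else { mag };
        st.step_idx = (st.step_idx + adj).clamp(0, 88);
        st.predictor as i16
    }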

Looking at Aware MotionWavelets

December 26th, 2021

I wanted to reverse-engineer and implement some wavelet codec just for the sake of it. And finally I’ve managed to do that.

Initially I wanted to finish the Rududu Video codec (I’ve looked at it briefly, and one of the funny things is that the opensource release of the Rududu Image codec does not match the actual binary specification; even the arithmetic coder is different), but it turned out there are no samples in the usual place, so I just picked something that has some samples already.

The codec turned out to employ some tricks, so I had to resort to collecting debug information in order to understand the band structure (all band dimensions are implicit: you need to know them and their order to decode it all successfully). Then it turned out that band data is coded in boustrophedon order instead of the usual raster scan (see the sketch below). And finally there’s fun with scaling: the vertical transform is the same as the horizontal one but its output is scaled by 128. Besides that it’s rather unremarkable.
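
For those unfamiliar with the term, here is what the boustrophedon (“as the ox ploughs”) order looks like in code (just the scan pattern itself; the band layout details are the codec’s own business):

    // Generate coefficient indices in boustrophedon order: even rows are
    // scanned left to right, odd rows right to left, so the scan snakes
    // through the band without jumping back to the row start.
    fn boustrophedon_order(width: usize, height: usize) -> Vec<usize> {
        let mut order = Vec::with_capacity(width * height);
        for row in 0..height {
            if row % 2 == 0 {
                order.extend((0..width).map(|col| row * width + col));
            } else {
                order.extend((0..width).rev().map(|col| row * width + col));
            }
        }
        order
    }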

Anyway, I got slightly deeper knowledge about the inner workings of wavelet codecs and it should not bother me any longer. It’s time to slack off before doing something else.

On Bluetooth codecs

December 15th, 2021

I got a strange request for LDAC decoder as it may help to “…verify (or debunk) Sony’s quality claims.” This made me want to write a post about the known BT codecs and what’s my opinion on them.

Bluetooth codecs can be divided into three categories: the standard ones defined in A2DP that nobody uses (MP3 and ATRAC), the standard ones that are widely used (AAC and SBC) and custom codecs supported by specific vendors.

So, let’s start with mandatory A2DP codecs:

  • SBC—a codec designed specifically for Bluetooth. It works like MPEG Audio Layer II but with 4 or 8 sub-bands and parametric bit allocation instead of fixed tables. This allows it to change bitrate at any frame, which lets it adapt to changing transmission quality (a toy illustration of the bit allocation idea follows this list). I’ve heard an opinion that it beats newer codecs at their bitrates in quality but the standard intentionally capped it to prevent that. I find that not that hard to believe;
  • MPEG-1,2 Audio—I’ve not heard that anybody actually uses them and it’s for the best;
  • MPEG-2,4 AAC—it should give better quality than SBC but at the cost of a much larger delay and higher decoding complexity;
  • ATRAC family—this feels like a proprietary replacement of AAC to me. I’ve not heard that anybody actually supports any of the codecs in their products (it’s not that I’ve heard much about BT in general though).
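
That toy illustration of parametric bit allocation I promised above (this is not the actual SBC algorithm from the A2DP specification, just the general idea: louder bands get more bits, and the allocation is recomputed for whatever bit pool the current frame can afford):

    // Greedily hand out bits from the pool, always to the band whose
    // scale factor exceeds its current word length the most.
    fn allocate_bits(scale_factors: &[u8], bit_pool: u32) -> Vec<u8> {
        let mut bits = vec![0u8; scale_factors.len()];
        let mut pool = bit_pool;
        while pool > 0 {
            let band = match (0..bits.len())
                .filter(|&i| bits[i] < 16) // cap the per-band word length
                .max_by_key(|&i| i32::from(scale_factors[i]) - i32::from(bits[i]))
            {
                Some(b) if scale_factors[b] > bits[b] => b,
                _ => break, // every band is satisfied (or capped)
            };
            bits[band] += 1;
            pool -= 1;
        }
        bits
    }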

Here I should also mention a candidate codec named LC3 (and LC3plus). Whatever audio codec FhG IIS develops, it’ll be AAC. LC3 is no exception: at first glance it looks like AAC LC with arithmetic coding and some additional coding tools glued to it.

There’s CVSD codec for speech transmission over BT. It’s a speech codec and that’s enough about it.

Now let’s move to the proprietary codecs:

  • aptX—a rather simple codec with a 4:1 compression ratio (four 16-bit samples into a single 16-bit word). The codec works by splitting audio into four sub-bands, applying ADPCM and quantising to a fixed number of bits (the overall structure is sketched right after this list). Besides its inability to adapt to bad channels it should produce about the same quality as SBC (at least from a feature comparison point of view);
  • aptX HD—the same as non-HD version but works on 24-bit samples (and probably the only honest high-res/high-definition codec here);
  • aptX other variants—they exist but there’s no solid information about them;
  • LDAC—will be discussed below in more detail. For now suffice it to say that it’s at MP2 level and the hi-res claims are just marketing;
  • LHDC and LLAC—not much is known about the codecs but after seeing quality comparison picture (with a note) on the official website I don’t expect anything good;
  • Ultra Audio Transmission—there’s no information about it except for the name mentioned in the Wikipedia list of BT codecs and some marketing materials on a smartphone description page by the same vendor;
  • Samsung BT codecs—see above.
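
And here is the promised sketch of the aptX-style structure: a two-stage split into four sub-bands, each of which then goes through its own ADPCM coder. The 2-tap half-sum/half-difference pair below is the simplest possible QMF, standing in for aptX’s real (and much longer) filters:

    fn split2(input: &[i32]) -> (Vec<i32>, Vec<i32>) {
        let mut low = Vec::with_capacity(input.len() / 2);
        let mut high = Vec::with_capacity(input.len() / 2);
        for pair in input.chunks_exact(2) {
            low.push((pair[0] + pair[1]) >> 1);  // crude low-pass + decimation
            high.push((pair[0] - pair[1]) >> 1); // crude high-pass + decimation
        }
        (low, high)
    }

    // Roughly 0-5.5, 5.5-11, 11-16.5 and 16.5-22 kHz bands for 44.1 kHz
    // input (very roughly, since the toy filters leak badly across bands).
    fn split4(input: &[i32]) -> [Vec<i32>; 4] {
        let (low, high) = split2(input);
        let (ll, lh) = split2(&low);
        let (hl, hh) = split2(&high);
        [ll, lh, hl, hh] // each band then gets its own ADPCM coder
    }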

Now let’s review LDAC specifically. I’m somewhat surprised nobody has written a decoder for it yet: it’s so easy to reconstruct the format from the open-source encoder that Paul B. Mahol could do it in a couple of days (before returning to the Bink2 decoder, hopefully). aptX has only a binary encoder and yet people have managed to RE it. I’m not going to do it myself because I don’t care much about Bluetooth codecs in general and it’s not a good fit for NihAV either.

Now to the technical details. The codec frame is either one long MDCT or two interleaved half-size MDCTs (just like ATSC A/52B), and the coefficients are coded as pairs, quads or larger single values (which reminds me of MP3 and MP2; the quantisation is very similar as well). The coefficients (in pairs and quads as well) are stored as plain bit fields; the only variable-length codebooks are used to code quantiser differences. Bit allocation information is transmitted for each frame so different coefficients can have different bit sizes (and thus precision); see the sketch below. Nevertheless the maximum it can have is just 15 bits per coefficient (plus sign), which makes it hardly any hi-resier than AAC LC or SBC. And the only excuse that can be offered here is the one I heard about MP3 being hi-res: with large scales and coefficients you can have almost infinite precision. Disproving it is left as an exercise for the reader.
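
To make the “bit fields, not variable-length codes” point concrete, here is roughly what reading one band’s coefficients looks like (the bit reader and the exact field layout are assumptions for illustration, not LDAC’s actual bitstream syntax):

    // A minimal MSB-first bit reader, good enough for the sketch.
    struct BitReader<'a> { data: &'a [u8], pos: usize }

    impl<'a> BitReader<'a> {
        fn read(&mut self, bits: u8) -> u32 {
            let mut val = 0u32;
            for _ in 0..bits {
                let byte = self.data[self.pos >> 3];
                val = (val << 1) | u32::from((byte >> (7 - (self.pos & 7))) & 1);
                self.pos += 1;
            }
            val
        }
    }

    // Each coefficient is a plain sign + magnitude field whose width comes
    // from the per-band bit allocation transmitted in the frame.
    fn read_band(br: &mut BitReader<'_>, count: usize, wordlen: u8) -> Vec<i32> {
        (0..count).map(|_| {
            let sign = br.read(1);
            let mag = br.read(wordlen) as i32;
            if sign != 0 { -mag } else { mag }
        }).collect()
    }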

I hope now it’s clear why I don’t look at the development of Bluetooth codecs much. Back to slacking.

Looking at Voxware MetaVoice

December 13th, 2021

Since there’s not much I’d like to do with NihAV, I decided to revisit one old family of codecs.

It seems that they had several families of codecs and most (all?) of them are licensed from some other company, sometimes with some changes (there are four codecs licensed from Lernout & Hauspie, MetaSound is essentially TwinVQ with a different set of codebooks, RT2x and VR1x are essentially different flavours of the same codec, SC3 and SC6 might be related to Micronas codec though Micronas SC4 decoder does not look similar at all).

So here’s a short review of those various codecs that I have some information about:

  • L&H CELP 4.8kbps—this is a rather standard CELP codec with no remarkable features (and I’ve even managed to write a working decoder for it);
  • L&H SBC 8/12/16kbps—that one is a sub-band coder with variable frame size (and amount of bits allocated per band);
  • RT24/RT28/RT29HQ and VR12/VR18—all these codecs share a common core: essentially a variable-bitrate LPC-based speech codec with four different frame modes, with no information transmitted beside the frame mode, pitch information and the filter coefficients (for CELP you’d also have pulse information; the basic synthesis filter such codecs are built around is sketched after this list);
  • SC3/SC6—this one seems to be more advanced and, by the look of it, it uses order 12 LPC filter (usually speech codecs use either LPC of order 10 or 16).
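
That basic LPC synthesis filter, for reference: each output sample is the excitation plus a weighted sum of the previous output samples, the weights being the transmitted filter coefficients. Sign conventions and fixed-point details vary from codec to codec; this is the floating-point textbook form:

    fn lpc_synth(excitation: &[f32], coeffs: &[f32], hist: &mut [f32]) -> Vec<f32> {
        // hist holds the last coeffs.len() output samples, newest first
        let order = coeffs.len();
        let mut out = Vec::with_capacity(excitation.len());
        for &e in excitation {
            let mut sample = e;
            for k in 0..order {
                sample += coeffs[k] * hist[k];
            }
            // shift the history and store the new sample at the front
            for k in (1..order).rev() {
                hist[k] = hist[k - 1];
            }
            hist[0] = sample;
            out.push(sample);
        }
        out
    }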

I’ll try to document it for The Wiki but don’t expect much. And I’m not going to implement decoders for these formats either (beside the already implemented 4.8k CELP one). First, the codecs have variable bitrate, so you need to decode a frame (at least partially) in order to tell how many bytes it will take, and I don’t want to introduce a hack in NihAV to support such a mode (either the demuxer should serve variable-length frames or the decoder should expect fixed-size frames). Even worse, they are speech codecs, which I don’t understand well (and there’s a lot of obscure code there); it took me more than a week to implement and debug the CELP decoder. Fun story: I could not use the MPlayer2 binary loader because the codec was misdetected as MPEG Audio Layer II. The cause of that was libavformat and its “helpful” tag search: when twocc 0x0070 was not found, it tried the upper-cased 0x0050, which belongs to MP2. And after I’d finally made it work I discovered a fun bug in the reference decoder: while calculating a cosine, the difference can overflow and thus the resulting value is somewhat wrong (it could be easily fixed by changing a “less or equal” condition to “less” in the table search refinement step).

Anyway, it’s done and now I can forget about it.

A new software category?

December 9th, 2021

There are two specific software categories where competition is reduced: category-killer software (i.e. the kind that discourages others from creating an alternative because it’s a lot of work and the existing thing works well enough) and software with vendor lock-in (i.e. it works only with the vendor-approved components or interfaces). Now, do we have open-source software that fits both categories?

The answer is, sadly, yes. For instance, there’s Chromium, de facto the only Internet browser. You can point out that there are other browsers (which are based either on it or on WebKit, and Chromium is a fork of WebKit) and there’s still Firefox (probably only because the management is not trying hard enough to drive the company into the ground). Again, it would be a perfect example of category-killer software if not for the fact that it changes the playing field by introducing new features that other browsers have to support in order to stay relevant. Not to mention that it’s a tool of a certain Internet company which can spend lots of money and manpower on updating it while making life harder for other browsers on popular websites (you can’t prove malice, but there have been too many subtle bugs breaking or degrading the experience in other browsers, always working in Chromium’s favour).

But you should not forget IBM and the ecosystem its employees have built on Linux, where you have lots of poorly documented (if documented at all) components tied together with constantly changing interfaces, and a desktop environment relying on kernel-specific features to work (so you can neither easily port it to another OS like BSD nor make other things interoperate with it properly; I’ve had trouble trying to use twm in a recent Ubuntu). So I think this kind of software deserves to be named after its most prolific creator.

Lennartware.

Looking at Flash Traffic videos

November 28th, 2021

There’s this game (or interactive movie rather) called Flash Traffic: City of Angels from Tsunami Media that had two DOS editions: one that used BFI videos and one that relied on a special MPEG accelerator card for its MPEG-1 videos.

The Multimedia Mike had some interest in it, so I decided to look at what’s so special about those MPEG videos and why they can’t be played normally (they produce a lot of garbage if played with the usual players).

Those files seem to be intended for playback only on special hardware, namely the RealMagic card (I’d rather have an MPEGMagic card for hardware-accelerated playback of RealVideo, thank you very much), which has a special driver for interfacing it (RMDEV.SYS and FMPDRV.EXE). The game engine simply invokes FMPDRV.EXE for MPEG playback and there’s no software playback path (considering how powerful the machines were back in the day, playing back 352×240 MPEG-1 video on them was next to impossible, hence the need for special cards offloading the work).

So I had to look inside in faint hope to see what can go wrong.

First of all, I don’t have the MPEG-1 specifications and I don’t want to pay ISO hundreds of francs for a copy (and I could not find them online). So I downloaded the ITU H.222.0 and H.262 standards (which correspond to the MPEG-2 systems and video standards but are free) and used them as the reference (H.262 even lists the changes from MPEG-1 video). Then I tried to hack up a simple raw stream demuxer, packetiser and video decoder in NihAV to see what goes on there.

The container format seems to be standard MPEG PS, so demuxing was not a big problem. The video has some problems though. The first sign is the framerate code: by the standard it should be 1-8, in this video it’s 12. In libavcodec this code maps to an unofficial 12fps mode introduced by libmpeg3, but the file obviously predates that library by many years (and avconv reports 13.3 fps anyway for some reason). Also, analysing the group start timecodes and the GOP structure suggests that the real framerate is the standard 30fps. Thus my conclusion is that those files were coded in a slightly wrong way and marked with an invalid framerate code to make sure that compliant players won’t try to play them (and only the special video cards will). And considering that even I-frames can’t always be decoded properly, the encoder they used was probably not compliant either.
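
For reference, these are all the frame rates MPEG-1 can signal (the table survives unchanged in H.262); everything outside codes 1-8 is reserved:

    // frame_rate_code -> (numerator, denominator)
    const MPEG1_FRAME_RATES: [(u32, u32); 9] = [
        (    0,    1), // 0: forbidden
        (24000, 1001), // 1: ~23.976
        (   24,    1), // 2
        (   25,    1), // 3
        (30000, 1001), // 4: ~29.97
        (   30,    1), // 5
        (   50,    1), // 6
        (60000, 1001), // 7: ~59.94
        (   60,    1), // 8
    ];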

It is rather hard to find out what’s wrong with the bitstream so I’m not going to continue the efforts but at least I checked it out and verified that the files are not fully compliant and can be decoded correctly only by chance.

Also now I have even less desire to play with MPEG formats but I’ve never been a big fan of them anyway.

Fun with LGT 5/3 wavelet transform

November 20th, 2021

LGT 5/3 wavelet transform is the second simplest lossless wavelet transform (the first one is the Haar transform of course), so it’s used in a variety of image formats (most famously in JPEG-2000) and intermediate video codecs. Recently I helped one guy with implementing it, and while explaining things about it I came to understand it properly myself, so here I’m going to write it down for posterity.
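
The heart of the matter in code form: the 1-D reversible 5/3 lifting as defined in JPEG-2000. The predict step turns the odd samples into the high band, the update step smooths the even samples into the low band, and since everything is integer the transform is exactly invertible (a sketch assuming an even-length input and symmetric border extension):

    // Forward 1-D LGT 5/3: input -> (low band, high band).
    fn lgt53_forward(input: &[i32]) -> (Vec<i32>, Vec<i32>) {
        let half = input.len() / 2;
        let mut low = vec![0; half];
        let mut high = vec![0; half];
        // predict: d[n] = x[2n+1] - floor((x[2n] + x[2n+2]) / 2)
        for n in 0..half {
            let next_even = input[(2 * n + 2).min(input.len() - 2)]; // mirror at the border
            high[n] = input[2 * n + 1] - ((input[2 * n] + next_even) >> 1);
        }
        // update: s[n] = x[2n] + floor((d[n-1] + d[n] + 2) / 4)
        for n in 0..half {
            let prev_d = high[n.saturating_sub(1)]; // d[-1] mirrors to d[0]
            low[n] = input[2 * n] + ((prev_d + high[n] + 2) >> 2);
        }
        (low, high)
    }

The inverse simply runs the same two steps backwards: subtract the update from the low band to recover the even samples, then add the prediction back to the high band to recover the odd ones.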

Raw streams support in NihAV

November 18th, 2021

Sadly there are enough MP3s in my music collection that I can no longer ignore the format, so I’ve finally implemented MP3 decoding support for NihAV. That involved introducing several new concepts which I’d like to review in this post.

Previously NihAV operated on a simple approach: there’s a demuxer that produces full packets, those packets are fed to the corresponding decoder, and the decoded audio/video data is used somehow. With MP3 you have a raw stream of audio packets instead (sometimes with additional metadata). While I could pretend to have a demuxer (one that would simply read data and form packets), I decided to do it differently.
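
The rough concept looks like this (the names are invented for illustration and are not NihAV’s actual interfaces): a packetiser is fed arbitrary chunks of bytes and returns properly split packets once it has seen enough data to find the frame boundaries:

    trait Packetiser {
        /// Feed another chunk of raw bytes from the input.
        fn add_data(&mut self, src: &[u8]);
        /// Try to extract the next complete packet; None means
        /// "not enough data yet, feed me more".
        fn get_packet(&mut self) -> Option<Vec<u8>>;
    }

    // Driving loop: read fixed-size chunks and pump them through.
    fn demux_raw(reader: &mut dyn std::io::Read, pkt: &mut dyn Packetiser) -> std::io::Result<()> {
        let mut buf = [0u8; 4096];
        loop {
            let n = reader.read(&mut buf)?;
            if n == 0 { break; }
            pkt.add_data(&buf[..n]);
            while let Some(packet) = pkt.get_packet() {
                // hand the packet over to the decoder here
                let _ = packet;
            }
        }
        Ok(())
    }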

NihAV: now with Flash support

November 2nd, 2021

During my work on the VP6 encoder I was contacted by a Ruffle developer who was interested in it; one thing led to another and I licensed my decoder for use there (the main issues were cutting off all the NihAV interfaces it does not need and selecting the license). But that’s done now and they say it’s working fine. Meanwhile I got curious and decided to finally do what no other bit of open-source code could do: encode VP6 to FLV without relying on any external software.

In addition to the FLV muxer I also implemented all the known decoders, and that was an uneven load. One evening was enough to implement two and a half codecs: FLV1 (it’s just H.263 with a slightly different header and block format), Flash ADPCM (a slight variation of IMA ADPCM) and a bit of ASAO. Another day was spent on trying to make ASAO work properly (I dislike codecs with parametric bit allocation like this one; at least it’s not a typical speech codec). The VP6 modifications took minutes, Flash Screen Video was done in less than an hour, and Flash Screen Video 2 took the rest of a day (because I had completely forgotten how priming works there). I wasted another day on hacking up barely enough support for onMetaData packet parsing and the other codec-specific bits in the FLV demuxer.

And now it’s ready and more or less working. It can even play the H.264+AAC combination (remember when that was popular?). The only codecs it does not support are Speex (I’m not sure I ever want to touch it) and MP3 (that one I’ll deal with eventually, and FLV will provide me with nicely split MP3 packets for testing before the infrastructure for handling raw streams is ready).

Now what to do next? It would be nice to have SANM/SMUSH support, maybe get to MP3 already (so nihav-sndplay is even more usable for me) or RE all those VoxWare codecs (I hope I can find the samples). There’s some interest in a barely functioning VP7 encoder too.

But who cares about that? I can encode VP6 into FLV now (even if I have no reasons to do so).