Kostya's Boring Codec World

Looking at Aware MotionWavelets

December 26th, 2021

I wanted to reverse-engineer and implement some wavelet codec just for the sake of it. And finally I’ve managed to do that.

Initially I wanted to finish Rududu Video codec (I’ve looked at it briefly and one of the funny things is that the opensource release of Rududu Image codec does not match the actual binary specification, even arithmetic coder is different), but it turns out there’re no samples in the usual place so I just picked something that has some samples already.

The codec turned out to employ some tricks so I had to resort to collecting debug information in order to understand band structure (all band dimensions are implicit, you need to know them and the order to decode it all successfully). Then it turned out that band data is coded in boustrophedon order instead of the usual raster scan. And finally there’s fun with scaling: vertical transform is the same as horizontal one but the output is scaled by 128. Beside that it’s rather unremarkable.
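For readers unfamiliar with boustrophedon order, here is a minimal sketch of undoing it for one band. It assumes even rows run left to right and odd rows right to left; which direction the codec actually starts with, and the band dimensions themselves, are not stated here, so the function name and the layout are illustrative only.

```python
def boustrophedon_to_raster(coeffs, width, height):
    """Rearrange a flat list stored in boustrophedon order into raster rows.

    Assumption: even rows are stored left to right, odd rows right to left.
    """
    if len(coeffs) != width * height:
        raise ValueError("coefficient count does not match band size")
    rows = []
    for y in range(height):
        row = list(coeffs[y * width:(y + 1) * width])
        if y % 2 == 1:
            row.reverse()  # odd rows were stored right to left
        rows.append(row)
    return rows


# A 3x3 band whose raster content is 0..8, as it would be stored in the stream.
band = boustrophedon_to_raster([0, 1, 2, 5, 4, 3, 6, 7, 8], 3, 3)
```

Since the band dimensions are implicit, a decoder has to supply `width` and `height` itself, which is why the band order had to be recovered from debug information first.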

Anyway, I got slightly deeper knowledge about the inner workings of wavelet codecs and it should not bother me any longer. It’s time to slack off before doing something else.

Posted in Various Video Codecs | 2 Comments »

On Bluetooth codecs

December 15th, 2021

I got a strange request for LDAC decoder as it may help to “…verify (or debunk) Sony’s quality claims.” This made me want to write a post about the known BT codecs and what’s my opinion on them.

Bluetooth codecs can be divided into three categories: the standard ones defined in A2DP that nobody uses (MP3 and ATRAC), the standard ones that are widely used (AAC and SBC) and custom codecs supported by specific vendors.

So, let’s start with the codecs defined in A2DP (of which only SBC is mandatory):

SBC—a codec designed specifically for Bluetooth. It works like MPEG Audio Layer II but with 4 or 8 sub-bands and parametric bit allocation instead of fixed tables. This allows it to change bitrate at any frame (which allows it to adapt to changing transmission quality). I heard an opinion that it beats newer codecs at their bitrates in quality but the standard intentionally capped it to prevent that. I find that not that hard to believe;
MPEG-1,2 Audio—I’ve not heard that anybody actually uses them and it’s for the best;
MPEG-2,4 AAC—it should give better quality than SBC but for a much larger delay and decoding complexity;
ATRAC family—this feels like a proprietary replacement of AAC to me. I’ve not heard that anybody actually supports any of the codecs in their products (it’s not that I’ve heard much about BT in general though).

Here I should also mention a candidate codec named LC3 (and LC3plus). Whatever audio codec FhG IIS develops, it’ll be AAC. LC3 is no exception as at first glance it looks like AAC LC with arithmetic coding and some additional coding tools glued to it.

There’s CVSD codec for speech transmission over BT. It’s a speech codec and that’s enough about it.

Now let’s move to the proprietary codecs:

aptX—a rather simple codec with 4:1 compression ratio (four 16-bit samples into a single 16-bit word). The codec works by splitting audio into four sub-bands, applying ADPCM and quantising to a fixed number of bits. Besides its inability to adapt to bad channels it should produce about the same quality as SBC (at least from a feature comparison point of view);
aptX HD—the same as non-HD version but works on 24-bit samples (and probably the only honest high-res/high-definition codec here);
aptX other variants—they exist but there’s no solid information about them;
LDAC—will be discussed below in more detail. For now suffice to say it’s on MP2 level and hi-res claims are just marketing;
LHDC and LLAC—not much is known about the codecs but after seeing quality comparison picture (with a note) on the official website I don’t expect anything good;
Ultra Audio Transmission—there’s no information about it except for a name mentioned in Wikipedia list of BT codecs and some marketing materials on the page with smartphone description by the same vendor;
Samsung BT codecs—see above.
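The fixed 4:1 ratio of aptX translates directly into a constant bitrate. A small arithmetic sketch, assuming stereo input and that aptX HD keeps the same 4:1 ratio on 24-bit samples (the function name and the chosen sample rates are only for illustration):

```python
def fixed_ratio_bitrate(sample_rate, bits_per_sample, channels=2, ratio=4):
    """Bits per second after fixed-ratio compression of PCM input."""
    return sample_rate * bits_per_sample * channels // ratio


aptx = fixed_ratio_bitrate(44100, 16)      # 352800 bits per second
aptx_hd = fixed_ratio_bitrate(48000, 24)   # 576000 bits per second
```

With no way to lower that number on the fly, the codec cannot react to a degrading channel the way SBC can.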

Now let’s review LDAC specifically. I’m somewhat surprised nobody has written a decoder for it yet. It’s so easy to reconstruct the format from the open-source encoder that Paul B. Mahol could do it in a couple of days (before returning to Bink2 decoder hopefully). aptX has only binary encoder and yet people have managed to RE it. I’m not going to do it because I don’t care much about Bluetooth codecs in general and it’s not a good fit for NihAV either.

To the technical details. The codec frame is either one long MDCT or two interlaced half-size MDCTs (just like ATSC A/52B), coefficients are coded as pairs, quads or larger single values (which reminds me of MP3 and MP2, quantisation is very similar as well). Coefficients (in pairs and quads as well) are stored in bit fields, the only variable-length codebooks are used to code quantiser differences. There’s bit allocation information transmitted for each frame so different coefficients can have different bit sizes (and thus precision). Nevertheless the maximum it can have is just 15 bits per coefficient (plus sign), which makes it hardly any hi-resier than AAC LC or SBC. And the only excuse that can be said here is the one I heard about MP3 being hi-res: with the large scales and coefficients you can have almost infinite precision. Disproving it is left as an exercise to the reader.
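A back-of-the-envelope comparison of what those bit counts mean, assuming the usual rule of roughly 6.02 dB per bit for the ratio between full scale and one quantisation step. It deliberately ignores scale factors, which is exactly the loophole behind the "almost infinite precision" excuse, so treat it as an illustration and not a measurement of the codec.

```python
import math


def range_db(bits):
    """Ratio between full scale and one quantisation step, in dB."""
    return 20 * math.log10(2 ** bits)


ldac_coef = range_db(15 + 1)  # 15 magnitude bits plus sign: about 96.3 dB
pcm24 = range_db(24)          # 24-bit PCM word: about 144.5 dB
```

In other words, a single coefficient carries no more resolution than a 16-bit PCM sample does.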

I hope now it’s clear why I don’t look at the development of Bluetooth codecs much. Back to slacking.

Posted in Audio, Speech Codecs | 8 Comments »

Looking at Voxware MetaVoice

December 13th, 2021

Since there’s not much I’d like to do with NihAV, I decided to revisit one old family of codecs.

It seems that they had several families of codecs and most (all?) of them are licensed from some other company, sometimes with some changes (there are four codecs licensed from Lernout & Hauspie, MetaSound is essentially TwinVQ with a different set of codebooks, RT2x and VR1x are essentially different flavours of the same codec, SC3 and SC6 might be related to Micronas codec though Micronas SC4 decoder does not look similar at all).

So here’s a short review of those various codecs that I have some information about:

L&H CELP 4.8kbps—this is rather standard CELP codec with no remarkable features (and I’ve even managed to write a working decoder for it);
L&H SBC 8/12/16kbps—that one is a sub-band coder with variable frame size (and amount of bits allocated per band);
RT24/RT28/RT29HQ and VR12/VR18—all these codecs share the common core and essentially it’s a variable-bitrate LPC-based speech codec with four different frame modes with no information transmitted beside frame mode, pitch information and the filter coefficients (for CELP you’d also have pulse information);
SC3/SC6—this one seems to be more advanced and, by the look of it, it uses order 12 LPC filter (usually speech codecs use either LPC of order 10 or 16).

I’ll try to document it for The Wiki but don’t expect much. And I’m not going to implement decoders for these formats either (beside already implemented 4.8k CELP one): the codecs have variable bitrate so you need to decode a frame (at least partially) in order to tell how many bytes it will take—and I don’t want to introduce a hack in NihAV to support such mode (either the demuxer should serve variable-length frames or the decoder should expect fixed-size frames); and even worse thing is that they are speech codecs that I don’t understand well (and there’s a lot of obscure code there). It took me more than a week to implement and debug CELP decoder. Fun story: I could not use MPlayer2 binary loader because the codec was misdetected as MPEG Audio Layer II. The cause of that was libavformat and its “helpful” tag search: when twocc 0x0070 was not found, it tried upper-case 0x0050 which belongs to MP2. And after I’ve finally made it work I discovered a fun bug in the reference decoder: while calculating cosine, the difference can overflow and thus the resulting value is somewhat wrong (and it could be easily fixed by changing “less or equal” condition to “less” in table search refinement step).
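The misdetection is easy to reproduce in isolation. A minimal sketch, assuming the fallback simply upper-cases each byte of the tag as if it were an ASCII letter; the table and function names are hypothetical and are not libavformat's actual API.

```python
# Hypothetical, deliberately partial tag table: only the MP2 entry is present.
KNOWN_TAGS = {0x0050: "MPEG Audio Layer II"}


def upper_tag(tag):
    """Upper-case each byte of a 16-bit tag as if it were an ASCII letter."""
    out = 0
    for shift in (0, 8):
        byte = (tag >> shift) & 0xFF
        if 0x61 <= byte <= 0x7A:  # 'a'..'z'
            byte -= 0x20
        out |= byte << shift
    return out


def lookup(tag):
    """Exact match first, then the 'helpful' case-insensitive retry."""
    return KNOWN_TAGS.get(tag) or KNOWN_TAGS.get(upper_tag(tag))


codec = lookup(0x0070)  # 0x70 is 'p', which upper-cases to 'P' = 0x50
```

The retry makes sense for four-character codes that really are text, but a numeric twocc only looks like a letter by accident.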

Anyway, it’s done and now I can forget about it.

Posted in Speech Codecs | 4 Comments »

A new software category?

December 9th, 2021

There are two specific software categories where competition is reduced: category-killer software (i.e. the one that discourages others from creating an alternative because it’s a lot of work and it works good enough) and the software with vendor lock-in (i.e. it works only with the vendor-approved components or interfaces). Now, do we have open-source software that fits both categories?

The answer is, sadly, yes. For instance, there’s Chromium, de facto the only Internet browser. You can point out that there are other browsers (which are based either on it or on WebKit, and Chromium is a fork of WebKit) and there is still Firefox (probably only because the management is not trying hard enough to drive the company into the ground). Again, it would be a perfect example of category-killer software if not for the fact that it changes the playfield by introducing new features that other browsers have to support in order to stay relevant. Not to mention that it’s a tool of a certain Internet company which can both spend lots of money and manpower on updating it while making life harder for other browsers on the popular websites (you can’t prove malice, but there were too many subtle bugs breaking or degrading experience with other browsers, always working in Chromium’s favour).

But you should not forget IBM and the ecosystem its employees have built on Linux, where you have lots of poorly documented (if at all) components tied together with constantly changing interfaces and desktop environment relying on kernel-specific features to work (so you can neither easily port it to another OS like BSD nor make other things interoperate with it properly—I’ve had troubles trying to use twm in recent Ubuntu). So I think this kind of software deserves to be named after its most prolific creator.

Lennartware.

Posted in Useless Rants | 4 Comments »

Looking at Flash Traffic videos

November 28th, 2021

There’s this game (or interactive movie rather) called Flash Traffic: City of Angels from Tsunami Media that had two editions: one DOS edition that used BFI videos and another DOS edition that relied on a special MPEG accelerator for its MPEG-1 videos.

The Multimedia Mike had some interest in it so I decided to look what’s so special about those MPEG videos and why they can’t be played (they produce a lot of garbage if played with the usual players).

Those files seem to be intended for playback only on the special hardware, namely RealMagic card (I’d rather have MPEGMagic card for hardware-accelerated playback of RealVideo thank you very much) that has a special driver for interfacing it (RMDEV.SYS and FMPDRV.EXE). The game engine simply invokes FMPDRV.EXE for MPEG playback and there’s no software playback (considering how powerful machines were back in the day, playing back 352×240 MPEG-1 videos on them was next to impossible, hence the need for special cards offloading the work).

So I had to look inside in faint hope to see what can go wrong.

First of all, I don’t have MPEG-1 specifications and I don’t want to pay ISO hundreds of francs for a copy (and I could not find them online). So I downloaded ITU H.222.0 and H.262 standards (that correspond to MPEG-2 systems and video standards but are free) and used them as the reference (H.262 even lists the changes from MPEG-1 video). Then I tried to hack a simple raw stream demuxer, packetiser and video decoder in NihAV to see what goes there.

The container format seems to be the standard MPEG PS so demuxing was not a big problem. Video has some problems though. The first sign is framerate code: by the standard it should be 1-8, in that video it’s 12. In libavcodec this code maps to an unofficial 12fps mode introduced by libmpeg3 but the file obviously predates that library by many years (and avconv reports 13.3 fps anyway for some reason). Also by analysing group start timecode and GOP structure it seems that the real framerate is standard 30fps. Thus my conclusion is that those files were coded in slightly wrong way and marked with an invalid framerate code to make sure that compliant players won’t try to play them (and only the special videocards will). And considering that even I-frames can’t be always decoded properly, the encoder they used probably was not compliant either.
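The framerate check can be sketched as follows. The table holds the eight frame rate codes defined by H.262; the parser reads only the first fields of a sequence header, and the sample bytes are synthetic (built to match the 352×240 size and code 12 mentioned in this post), not taken from a game file.

```python
from fractions import Fraction

# frame_rate_code values 1-8 as defined by H.262; everything else is reserved.
FRAME_RATES = {
    1: Fraction(24000, 1001),
    2: Fraction(24),
    3: Fraction(25),
    4: Fraction(30000, 1001),
    5: Fraction(30),
    6: Fraction(50),
    7: Fraction(60000, 1001),
    8: Fraction(60),
}


def parse_sequence_header(data):
    """Return (width, height, frame_rate_code) from a video sequence header."""
    if data[:4] != b"\x00\x00\x01\xb3":
        raise ValueError("not a sequence header")
    width = (data[4] << 4) | (data[5] >> 4)           # 12 bits
    height = ((data[5] & 0x0F) << 8) | data[6]        # 12 bits
    frame_rate_code = data[7] & 0x0F                  # low nibble, after aspect ratio
    return width, height, frame_rate_code


# Synthetic header: 352x240, aspect ratio code 1, frame rate code 12.
width, height, code = parse_sequence_header(b"\x00\x00\x01\xb3\x16\x00\xf0\x1c")
rate = FRAME_RATES.get(code)  # None: code 12 is outside the standard range
```

A strictly compliant player has no rate to use for such a stream, which fits the theory that the code was chosen on purpose.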

It is rather hard to find out what’s wrong with the bitstream so I’m not going to continue the efforts but at least I checked it out and verified that the files are not fully compliant and can be decoded correctly only by chance.

Also now I have even less desire to play with MPEG formats but I’ve never been a big fan of them anyway.

Posted in Game Video | 2 Comments »

Fun with LGT 5/3 wavelet transform

November 20th, 2021

LGT 5/3 wavelet transform is the second simplest lossless wavelet transform (the first one is Haar transform of course) so it’s used in a variety of image formats (most famously in JPEG-2000) and intermediate video codecs. Recently I helped one guy with implementing it and while explaining things about it I understood it myself, so here I’m going to write it down for posterity.
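As a taste of why it is lossless, here is a minimal one-level sketch in the lifting form commonly used for the reversible 5/3 transform in JPEG-2000. It assumes an even-length input and mirrored edges; the full post may present it differently, and the function names are made up for this example.

```python
def forward_53(x):
    """One level of the reversible 5/3 lifting transform (even-length input)."""
    n = len(x) // 2
    even, odd = x[0::2], x[1::2]
    # Predict step: high-pass is the odd sample minus the mean of its even neighbours.
    high = [odd[i] - ((even[i] + even[min(i + 1, n - 1)]) >> 1) for i in range(n)]
    # Update step: low-pass is the even sample plus a rounded quarter of nearby high-pass.
    low = [even[i] + ((high[max(i - 1, 0)] + high[i] + 2) >> 2) for i in range(n)]
    return low, high


def inverse_53(low, high):
    """Undo forward_53 by running the same lifting steps backwards."""
    n = len(low)
    even = [low[i] - ((high[max(i - 1, 0)] + high[i] + 2) >> 2) for i in range(n)]
    odd = [high[i] + ((even[i] + even[min(i + 1, n - 1)]) >> 1) for i in range(n)]
    x = [0] * (2 * n)
    x[0::2], x[1::2] = even, odd
    return x


signal = [3, -7, 12, 12, 0, 255, -128, 4]
low, high = forward_53(signal)
restored = inverse_53(low, high)
```

Each step only adds or subtracts a value computed from samples the inverse still has, so the integer rounding cancels out exactly.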
Read the rest of this entry »

Posted in Intermediate Video Codecs | 2 Comments »

Raw streams support in NihAV

November 18th, 2021

Sadly there are too many MP3s in my music collection to ignore the format, so I’ve finally implemented MP3 decoding support for NihAV. That involved introducing several new concepts which I’d like to review in this post.

Previously NihAV operated on a simple approach: there’s a demuxer that produces full packets, those packets are fed to the corresponding decoder and the decoded audio/video data is used somehow. With MP3 you have a raw stream of audio packets (sometimes with an additional metadata). While I could pretend to have a demuxer (that will simply read data and form packets) I decided to do it differently.
Read the rest of this entry »

Posted in NihAV | 2 Comments »

NihAV: now with Flash support

November 2nd, 2021

During my work on VP6 encoder I was contacted by Ruffle developer who was interested in it, one thing led to another and I licensed my decoder for the use there (the main issues were cutting off all the interfaces from NihAV that are not needed for it and selecting the license). But it’s over and they say it’s working fine. Meanwhile I got curious and decided to finally do what no other bit of open-source code could do: encode VP6 to FLV without relying on any external software.

In addition to the FLV muxer I also implemented all known decoders as well and that was uneven load. One evening was enough to implement two and half codecs: FLV1 (it’s just H.263 with slightly different header and block format), Flash ADPCM (a slight variation of IMA ADPCM) and a bit of ASAO. Another day was spent on trying to make ASAO work properly (I dislike codecs with parametric bit allocation like this one, at least it’s not a typical speech codec). VP6 modifications took minutes, Flash Screen Video was done in less than an hour, Flash Screen Video 2 took the rest of a day (because I completely forgot how priming works there). I wasted another day on hacking barely enough support for onMetaData packet parsing and the other codec-specific bits in FLV demuxer.

And now it’s ready and more or less working. It can even play H.264+AAC combination (remember when it was popular), the only codecs it does not support are Speex (I’m not sure if I ever want to touch it) and MP3 (this one I’ll deal with eventually and FLV will provide me with nicely split MP3 packets for testing before the infrastructure for handling raw streams is ready).

Now what to do next? It would be nice to have SANM/SMUSH support, maybe get to MP3 already (so nihav-sndplay is even more usable for me) or RE all those VoxWare codecs (I hope I can find the samples). There’s some interest in a barely functioning VP7 encoder too.

But who cares about that? I can encode VP6 into FLV now (even if I have no reasons to do so).

Posted in NihAV | Comments Closed

Looking at RoQ

October 27th, 2021

Recently The (Multimedia) Mike contacted me and asked if I can look what’s wrong with Clandestiny videos. I did.

First of all it’s worth mentioning that the format obviously originates from Trilobyte as it’d been used in its games years before Quake III was released, it has more features and the decoder in open-sourced Q3 engine still calls it trFMV.

Then I should mention that RoQ support in Libav is not great (and in FFmpeg it’s exactly the same) as the demuxer lacks support for some packet types (like 0x1030 used to signal that it’s a good time to prefetch data), there’s no support for JPEG frames and it goes crazy on files extracted from Clandestiny demo because of all that.

Thus I decided to quickly hack my own decoder for it based on the original description by Dr. Tim Ferguson (yet another forgotten researcher who REd several VQ-based video formats) and played with it to see what’s going wrong.

And it seems the problem was mostly in motion compensation. In some conditions you should double the motion vectors (I think it’s when the first chunk size is zero instead of minus one); also some files have alpha information in the codebook (this is detected by video properties chunk argument being set to one) as it’s apparent from ScummVM code. In either case it’s just the minor details that make things complicated (and I was lucky not to encounter interlaced mode files).

It was a nice distraction but I guess it’s time to do something else.

Posted in Game Video | 6 Comments »

NihAV: now with lossless audio encoder

October 26th, 2021

Since I wanted to try my hoof at various encoding concepts it’s no wonder that after lossy audio encoder (IMA ADPCM with trellis encoding), lossless video encoder (ZMBV, using my own deflate implementation for compressing), lossy video encoder (Cinepak, for playing with vector quantisation, and VP6, for playing with many other concepts) it was time for a lossless audio encoder.

To remind you, there are essentially two types of lossless audio compressors—fast and asymmetric (based on LPC filters) and slow and symmetric (based on adaptive filters, usually long LMS ones). The theory behind them is rather simple and described below.
Read the rest of this entry »

Posted in Lossless Audio, NihAV | 2 Comments »
