Author Archive

Money and Multimedia

Tuesday, November 14th, 2023

Inspired by recent events.

It is no secret that sometimes (or rather often, I’d say) political and business considerations prevail over technical ones. A persistent rumour says that the MP3 format was not so bad originally but during the standardisation phase it was changed to contain QMF in addition to MDCT because a certain company still held a patent on it. We have a couple of video codec standards developed not for any technical merit but rather in an attempt to create patent-free formats (and failing at that). We see how many modern formats (not just audio or video, but streaming protocols as well) are essentially “one of everything” because each company tries to put its own technology there (probably for patent considerations), and then even more companies appear with a claim to own a patent on the same technology (some of them form a patent pool, some act on their own). And of course we see Nokia (not the dead phone company and not the tyre producing one either) trying to become the SCO of this decade.

You know, the modern patent system was formed with the intent of sustaining development of new inventions: an inventor brings benefit to society with new inventions, society repays by granting that inventor protection of exclusive rights to those inventions, allowing the inventor to profit from them. In theory it is a mutually beneficial scheme but people always find a way to game the system and here we are. IMO the best patch to the legal system would be to strip those abusing their rights of those rights, be it copyright (material part), industrial property rights or anything else. But as an optimist I expect the legions of lawyers to find a workaround for it rather fast.

Anyway, I wanted to demonstrate how political and financial interests spoiled an already undead (I’ll elaborate below why I think so) project. And how a certain Frenchman paved a road with good intentions there. Of course I’m talking about FFmpeg (or jbmpeg as I name it after the current most influential person).

NihAV: nothing left to do

Saturday, November 11th, 2023

If anybody read my previous posts, they might’ve picked up the notion of me complaining that there’s nothing left to do for NihAV, and it is really a problem I have.

Since the (re)start of the project in 2017 it grew from a small package that could only read bits and bytes to a collection of crates supporting various multimedia formats and a set of tools to use them. I had two principal goals: to experiment with the framework design and learn how various multimedia concepts are implemented, and also (ideally) to make an independent converter and player so I don’t have to rely on external projects for most of my multimedia needs.

A look at Winnov WINX

Friday, November 3rd, 2023

It is really a coincidence that about a week after I looked at their Pyramid codec I got reminded that another codec of theirs exists, probably related to the WNV1 codec I REd back in 2005.

So apparently the codec codes YUY2 in 8×8 blocks. Each block is prefixed with a bit telling whether it’s a coded or skipped block. Coded blocks have an additional 4-bit mode that seems to determine which quantisation they’ll use. The data is packed as deltas to the previously decoded value (per-component) using a static codebook with values in the -7..7 range (plus scaling by shifting left). There’s also an escape value in case a raw value should be read instead. Overall it feels like Winnov Video 1 coding.

In other words, nothing remarkable but still a bit more advanced than usual DPCM-based intermediate codecs.

A look at Winnov Pyramid codec

Friday, October 27th, 2023

Since I still have nothing better to do, I decided to take a look at some old codec. Apparently I tried looking at it before and abandoned it because Ghidra cannot disassemble its code properly let alone decompile. I think this is a recurring theme with the old 16-bit code, especially the one reading data using non-standard segments.

So I located Sourcer, the best disassembler of the era (that seems to be abandonware nowadays but I cannot swear on that) and used it to disassemble the binary, referring to Ghidra database to locate the functions I should care about. It is not that much fun to translate assembly by hand but at least there was not that much of it.

The codec itself turned out to be a moderately complex DPCM codec compressing 7-bit YUV 4:1:1 data using a per-frame codebook and not so trivial delta compression. Codebooks contain pairs of delta values calculated depending on the number of bits per delta. The data is coded per plane with prediction running continuously for all pixels in the plane:

  // before decoding data
  (delta0, delta1) = get_code();
  pprev = 64;
  prev = 64 + delta0;
  pdelta = delta1;
  for each pixel pair {
    (delta0, delta1) = get_code();
    delta = ((prev + delta0 - pprev) >> 3) + pdelta;
    pix0 = clip_uint8((prev + delta) * 2);
    pix1 = clip_uint8((prev - delta) * 2);
    pprev = prev;
    prev += delta0;
    pdelta = delta1;
  }

Normally such codecs would not bother to generate a codebook for the specific delta size or use something more complex than pix = prev + delta; so this was a rather interesting codec to look at. Hopefully there will be more interesting formats to study even if sometimes I get the feeling that all undiscovered formats are either trivial or rip-offs of some standard.
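For those who prefer something compilable, here is a minimal Rust sketch of the per-plane prediction loop from the pseudocode above. It assumes the codebook lookups have already been done, so the input is just a sequence of (delta0, delta1) pairs; the name decode_plane and that interface are mine for illustration, not something taken from the original binary.

```rust
/// Clamp an intermediate value into the 0..=255 pixel range.
fn clip_uint8(v: i32) -> u8 {
    v.clamp(0, 255) as u8
}

/// Hypothetical helper: decode one plane from already looked-up
/// (delta0, delta1) code pairs. The first pair only primes the
/// predictor, every following pair produces two pixels.
fn decode_plane<I: Iterator<Item = (i32, i32)>>(mut codes: I) -> Vec<u8> {
    let mut out = Vec::new();
    let (delta0, delta1) = match codes.next() {
        Some(pair) => pair,
        None => return out,
    };
    let mut pprev = 64;
    let mut prev = 64 + delta0;
    let mut pdelta = delta1;
    for (delta0, delta1) in codes {
        // arithmetic shift, as in the pseudocode
        let delta = ((prev + delta0 - pprev) >> 3) + pdelta;
        out.push(clip_uint8((prev + delta) * 2));
        out.push(clip_uint8((prev - delta) * 2));
        pprev = prev;
        prev += delta0;
        pdelta = delta1;
    }
    out
}
```

The multiplication by two at the end is what turns the internal 7-bit values into ordinary 8-bit samples.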

Looking at Motion Pixels

Tuesday, October 24th, 2023

There is this very Sirius (or Sirius Publishing, more precisely) family of video codecs (plus one container format) apparently developed by two guys (who like to spam their name even in junk sections of AVI files). Also initially it had its own container format but later they started to target AVI.

Another peculiarity of this format is that initially it targeted games but later was also used as a crappy Video CD alternative.

Back in the day Gregory Montoir REd the original game format for one of the game engine re-implementations he’s famous for and donated the code to FFmpeg as well. Since that time I was curious whether that code can be adapted to play MVI1 and MVI2 as well but the codec itself turned me off.

The codec itself is perverted, both in code and interface. Also it’s inherently interlaced. Normally video codecs in AVI can be recognized by their FOURCC and pass additional configuration parameters in the additional header data. Here they decided to use half of the FOURCC to pass configuration flags to the codec and use the stream handler FOURCC (that most apps ignore) to tell that their decoder should be used to handle it. This alone would make me want to not support it ever, but the binary specification is worse.

Looks like the code consists mostly of handwritten assembly because I don’t know which compiler may generate this madness. There are many versions of the codecs, most of them are 16-bit and the 32-bit version is no better. For starters, it uses segments.

Not so many people remember DOS times and its memory models, even fewer remember them fondly. And almost nobody remembers that in 32-bit mode you can also use FS and GS registers to have custom addressing modes. Well, this codec uses them: it sets FS to the context pointer so context fields are accessed as mov EAX, dword ptr[1A8h] while global variables are accessed as mov EAX, dword ptr GS:[SYM] and of course no decompiler likes that. I was able to work around it in Ghidra by creating a new segment starting from zero but it’s still annoying.

Another thing is (ab)using registers to the full extent. Functions pass their parameters implicitly in the registers, using the stack only to save those values before a loop or to form a list of rectangles to process. And of course it uses this annoying (for the decompiler) trick of using the same register for two loop counters (e.g. top byte for the outer loop and low byte for the inner loop). As a result Ghidra can’t decompile it properly or even ignores whole blocks of the code because to its belief they can’t be invoked, and it’s still better than decompiling the 16-bit version of MVI1 which made the decompiler commit suicide. As a result some functions are easier to hand-translate from the assembly.
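To show what the shared loop counter trick looks like once it is spelled out, here is a small Rust sketch that emulates a 16-bit register whose top byte counts the outer loop and whose low byte counts the inner one. It is only an illustration of the idea; the function name and the iteration counting are made up and do not correspond to any actual routine in the codec.

```rust
/// Emulates two nested loops driven by one 16-bit "register":
/// the high byte is the outer counter, the low byte the inner one.
/// Returns the total number of inner iterations performed.
fn packed_loop(outer: u8, inner: u8) -> u32 {
    if outer == 0 || inner == 0 {
        return 0;
    }
    let mut iterations = 0;
    let mut reg: u16 = (outer as u16) << 8;
    while reg >> 8 != 0 {
        // reload the inner counter into the low byte
        reg = (reg & 0xFF00) | inner as u16;
        while reg & 0xFF != 0 {
            iterations += 1;
            reg -= 1; // decrement the low byte only (it is non-zero here)
        }
        reg -= 0x100; // decrement the high byte
    }
    iterations
}
```

A decompiler sees a single variable being masked, shifted and decremented in two different ways, which is why the recovered control flow tends to look like nonsense.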

In either case it looks like despite all the improvements it remains about the same as the initial version: data is coded as 5-bit YUV internally and stored using Huffman codes, quantisation and change maps (rectangles that tell which areas to update/fill). MVI2 can use ten different frame decoding modes that differ in how the deltas are coded but essentially it remains the same. They have not even gotten to introducing proper motion compensation it seems.

So, now I’ve had a good long look at the codec, found nothing interesting there that was not known before and can forget about it. If only there was something more interesting to look at…

HW accel for NihAV player: fully done

Saturday, October 21st, 2023

As mentioned in the previous post, I’ve managed to make hardware acceleration work with my video player and there was only some polishing left to be done. Now that part is complete as well.

The worst part was forking the cros-libva crate. Again, I could do without that but it was too annoying. For starters, it had rather useless dependencies for error handling for the cases that either are too unlikely to happen (e.g. destroying some buffer/config failed) or rather unhelpful (i.e. it may return a detailed error when opening a device has failed but for the rest of operations it’s a rather unhelpful “VA-API error N” with an optional error explanation if libva bothered to provide it). I’ve switched it to enums because e.g. VAError::UnsupportedEntrypoint is easier to handle and understand when you actually care about return error codes.
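A rough sketch of what such enum-based error handling can look like is given below. Only the VAError::UnsupportedEntrypoint name comes from my fork as mentioned above; the other variants, the check_status helper and the numeric values (which I believe match the VA_STATUS_* constants in va.h, but check your own header) are assumptions made for the sake of the example.

```rust
/// Illustrative subset of VA-API errors as a plain enum.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum VAError {
    OperationFailed,
    AllocationFailed,
    UnsupportedProfile,
    UnsupportedEntrypoint,
    /// Anything this sketch does not name explicitly.
    Unknown(i32),
}

/// Hypothetical helper converting a raw VAStatus into a Result.
/// The numeric values are assumed to match va.h.
fn check_status(status: i32) -> Result<(), VAError> {
    match status {
        0x00 => Ok(()),
        0x01 => Err(VAError::OperationFailed),
        0x02 => Err(VAError::AllocationFailed),
        0x0C => Err(VAError::UnsupportedProfile),
        0x0D => Err(VAError::UnsupportedEntrypoint),
        other => Err(VAError::Unknown(other)),
    }
}
```

The caller can then match on the variant it cares about (say, falling back to software decoding on UnsupportedEntrypoint) without pulling in any error-handling crates.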

The other annoying part was all the bindgen-produced enumerations (and flags). For example, surface allocation is done with:

display.create_surfaces(
                bindings::constants::VA_RT_FORMAT_YUV420,
                None, width, height,
                Some(UsageHint::USAGE_HINT_DECODER), 1)

In my slightly cleaned version it now looks like this:

display.create_surfaces(
                RTFormat::YUV420,
                None, width, height,
                Some(UsageHint::Decoder.into()), 1)

In addition to less typing it gives better argument type checking: in some places you use both VA_RT_FORMAT_ and VA_FOURCC_ values and they are quite easy to mix up (because they describe about the same thing and are stored as 32-bit integers). VAFourcc and RTFormat are distinct enough even if they get cast back to u32 internally.
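The idea can be sketched as follows. The RTFormat and VAFourcc type names are the ones mentioned above, but the variant lists, the surface_request function and the numeric values (assumed to match VA_RT_FORMAT_* and VA_FOURCC_* from va.h) are merely illustrative and not the actual definitions from the fork.

```rust
/// Render target formats (assumed values of VA_RT_FORMAT_*).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum RTFormat {
    YUV420,
    YUV422,
}

/// Pixel format FOURCCs (little-endian packed ASCII).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum VAFourcc {
    NV12,
    YV12,
}

impl From<RTFormat> for u32 {
    fn from(fmt: RTFormat) -> u32 {
        match fmt {
            RTFormat::YUV420 => 0x0000_0001,
            RTFormat::YUV422 => 0x0000_0002,
        }
    }
}

impl From<VAFourcc> for u32 {
    fn from(fcc: VAFourcc) -> u32 {
        match fcc {
            VAFourcc::NV12 => u32::from_le_bytes(*b"NV12"),
            VAFourcc::YV12 => u32::from_le_bytes(*b"YV12"),
        }
    }
}

/// Hypothetical call site: it accepts only an RTFormat, so passing
/// a VAFourcc by mistake is a compile error instead of a silent bug.
fn surface_request(format: RTFormat, width: u32, height: u32) -> (u32, u32, u32) {
    (format.into(), width, height)
}
```

Both types still end up as plain u32 values when handed to libva, so the extra safety costs nothing at run time.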

And finally, I don’t like libva init info being printed every time a new display is created (which happens every time a new H.264 file is played in my case) so I added a version of the function that does not print it at all.

But if you wonder why fork it instead of improving the upstream, beside the obvious considerations (I forked off version 0.0.3, they’re working on 0.0.5 already with many underlying things being different), there’s also CONTRIBUTING.md that outright tells you to sign a Contributor License Agreement (no thanks) that would also require using their account (which was so inconvenient for me that I’ve moved from it over a year ago). At least the license does not forbid creating your own fork, which I did, mentioning the original authorship and source in two or three places and preserving the original 3-clause BSD license.

But enough about it, there’s another fun thing left to be discussed. After I’ve completed the work I also tried it on my other laptop (also with Intel® “I can’t believe it’s not GPU”, half a decade newer but still with slim chances to get hardware-accelerated decoding via Vulkan API on Linux in the near future). Surprisingly the decoding was slower than software decoder again but for a different reason this time.

Apparently accessing decoded surfaces is slow and it’s better to leave processing and displaying them to the GPU as well (or offload them into main memory in advance) but that would require too many changes in my player/decoder design. Also Rust could not optimise the chroma deinterleaving code (in NV12 to planar YUV conversion) and loads/stores data byte-by-byte, which is extremely slow on my newer laptop. Thus I quickly wrote some simple SSE assembly to deinterleave data reading 32 bytes at once and it works many times faster. So it’s good enough and I’m drawing a line.
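For reference, the operation in question is tiny. Below is a plain scalar Rust sketch of NV12 chroma deinterleaving; it is not the SSE version I ended up using and not necessarily how the player code is laid out, and the function name and slice-based interface are assumptions for illustration.

```rust
/// Split an interleaved NV12 chroma row (U0 V0 U1 V1 ...) into
/// separate U and V rows. Scalar reference version.
fn deinterleave_chroma(uv: &[u8], u: &mut [u8], v: &mut [u8]) {
    assert_eq!(u.len(), v.len());
    assert_eq!(uv.len(), u.len() * 2);
    for ((pair, du), dv) in uv.chunks_exact(2).zip(u.iter_mut()).zip(v.iter_mut()) {
        *du = pair[0];
        *dv = pair[1];
    }
}
```

A SIMD version does the same thing on a whole block of bytes per iteration, which matters a lot when the source is a slow-to-read mapped surface.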

So while this has been a rather useful experience, it was not that fun and I’d rather not return to it. I should probably go and reverse engineer some obscure codec instead, I haven’t done that for long enough.

Hardware acceleration for NihAV video player

Wednesday, October 18th, 2023

Since I was not fully satisfied with the CPU load from my H.264 decoder (and optimising it further is too tedious), I decided to take a look at VA-API hardware accelerated decoding once again (no Vulkan for me).

It turned out that documentation is not as lacking as I expected it to be, it’s just most of it was eaten by bindgen so e.g. you can get VAImage from the decoded surface but you have to look into source code for its definition because it’s just an alias for semi-hidden _VAImage. And even if you look at the original header files from libva, that documentation is rather scarce anyway.

Encoding Bink Audio

Wednesday, October 11th, 2023

As I mentioned in the introduction post, Bink Audio is rather simple: you have audio frames overlapped by 1/16th of their size with the previous and the following frame, data is transformed either with RDFT (stereo mode) or DCT-II (per-channel mode), quantised and written out.

From what I can tell, there are about three revisions of the codec: version 'b' (and maybe 'd') was RDFT-only and first two coefficients were written as 32-bit floats. Later versions shaved three bits off exponents as the range for those coefficients is rather limited. Also while the initial version grouped output values by sixteen, later versions use grouping by eight values with possibility to code a run for the groups with the same bit width.

The coding is rather simple, just quantise bands (that more or less correspond to the critical bands for the human ear), select the bit width of the coefficient groups (that are fixed-width and not related to the band widths) and code them without any special tricks. The only trick is how to quantise the bands.

Since my previous attempts to write a proper psychoacoustic model for an encoder failed, I decided to keep it simple: the encoder simply tries all possible quantisers and selects the one with the lowest value of A·log2(dist) + λ·bits. This may be slow but it works fast enough for my (un)practical purposes and the quality is not that bad either (as much as I can be trusted on judging it). And of course it allows controlling the bitrate in a rather natural way.
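The brute-force search described above can be sketched in Rust roughly like this. Only the cost formula comes from the text; the bit estimate, the small epsilon guarding against log2(0), and the names pick_quantiser and estimate_bits are assumptions made up for the example and are not the actual encoder code.

```rust
/// Hypothetical bit-cost estimate: every coefficient is stored with
/// the width of the largest magnitude, plus a sign bit per non-zero.
fn estimate_bits(quantised: &[i32]) -> u32 {
    let max = quantised.iter().map(|v| v.unsigned_abs()).max().unwrap_or(0);
    let width = 32 - max.leading_zeros(); // 0 when everything is zero
    let signs = quantised.iter().filter(|&&v| v != 0).count() as u32;
    width * quantised.len() as u32 + signs
}

/// Try every quantiser and return the index of the one minimising
/// a * log2(dist) + lambda * bits.
fn pick_quantiser(band: &[f32], quants: &[f32], a: f32, lambda: f32) -> usize {
    let mut best = (0usize, f32::INFINITY);
    for (idx, &q) in quants.iter().enumerate() {
        let quantised: Vec<i32> = band.iter().map(|&c| (c / q).round() as i32).collect();
        // squared error between the original and reconstructed values
        let dist: f32 = band
            .iter()
            .zip(&quantised)
            .map(|(&c, &v)| {
                let err = c - v as f32 * q;
                err * err
            })
            .sum();
        let bits = estimate_bits(&quantised);
        // epsilon keeps log2 finite for a perfectly coded band
        let cost = a * (dist + 1.0e-6).log2() + lambda * bits as f32;
        if cost < best.1 {
            best = (idx, cost);
        }
    }
    best.0
}
```

Raising λ makes bits more expensive so coarser quantisers win, which is exactly the knob used for bitrate control.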

There’s one other caveat though: Bink Audio frames are tied to Bink Video frames (unless it’s the newer Bink Audio-only container) and thus the codec should know the video framerate in order to match it. I worked around it by introducing yet another nihav-encoder hack to set the audio timebase from the video so I don’t have to provide it by hand.

So that’s it. It was a nice experiment and I hope (but do not expect) to think of something equally fun to do next.

Bink encoder: coefficients coding

Tuesday, October 10th, 2023

Somewhat unrelated update: I’ve managed to verify that the output of my encoder works in Heroes of Might and Magic III properly even with sound after I fiddled with the container flags. The only annoyance is that because of DCT discrepancies sometimes there are artefacts in the form of white or black dots but who would care about that?

At last let’s talk about one of the most original things in the Bink Video format (and considering the rest of the things it has, that’s saying something).

Bink encoder: doing DCT

Monday, October 9th, 2023

Again, this should be laughably trivial for anybody familiar with that area but since I lack the mathematical skills to do it properly, here’s how I wrote a forward DCT for the inverse one.