NihAV: giving up on hardware acceleration

August 3rd, 2023

After several attempts to add hardware-accelerated decoding support to NihAV I’m giving up, the reason being the sorry state of it in general.

I’m aware of two major APIs for hardware-accelerated video decoding for Linux, those are VDPAU and VA-API. Plus there are some specific toolkits e.g. from Intel but from what I remember those are even more complicated.

So, VDPAU has only bare-bones documentation with no actual explanation of what each codec expects in order to be decoded. VA-API turned out to be even worse: it points to 01.org for documentation, which no longer exists (and redirects to some Intel page blurbing how great they are at open source). And web.archive.org shows that the page essentially contained a link to the libva and libva-utils repositories plus some references to the projects that have VA-API support implemented. “…so shut up and go away” was not written but implied.

At least VA-API has three crates implementing its bindings in Rust, and not just one that has not been updated in four years like VDPAU, but how usable are those? There’s FeV that seems to support JPEG decoding only (and has a strict warning against AMD GPUs), there’s libva-sys that is a pile of auto-generated bindings and there’s cros-libva. The latter seems to be the cleanest one and the most actively developed (too actively developed for my taste as it changes base APIs every couple of months). Unfortunately it’s still not exactly clear how to use it for H.264 decoding (and the cros-codecs crate provides an equally confusing API). And the final straw is that it seems to be intended for single-threaded use only by design, which means it’s not possible to use it with my library (e.g. my player uses separate threads for audio and video decoding, so I can’t use the standard decoder interface for hardware-accelerated decoding without some ugly hacks).
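For illustration, here is a minimal sketch of the kind of hack that would be needed: the single-threaded decoder lives on its own worker thread and the rest of the player talks to it over channels. Everything here is hypothetical: `HwDecoder` is a stand-in and not the cros-libva API, and `DecoderProxy` is not an existing NihAV type.

```rust
use std::marker::PhantomData;
use std::sync::mpsc;
use std::thread;

// Hypothetical stand-in for a hardware decoder handle that must stay on one thread.
// The raw-pointer PhantomData makes the type neither Send nor Sync.
struct HwDecoder {
    frames_done: u32,
    _not_send: PhantomData<*const ()>,
}

impl HwDecoder {
    fn new() -> Self {
        Self { frames_done: 0, _not_send: PhantomData }
    }

    // Pretend to decode: return the frame number and the packet size.
    fn decode(&mut self, packet: &[u8]) -> (u32, usize) {
        self.frames_done += 1;
        (self.frames_done, packet.len())
    }
}

// Handle that can be moved freely between threads.
struct DecoderProxy {
    tx: mpsc::Sender<Vec<u8>>,
    rx: mpsc::Receiver<(u32, usize)>,
}

impl DecoderProxy {
    fn spawn() -> Self {
        let (pkt_tx, pkt_rx) = mpsc::channel::<Vec<u8>>();
        let (frm_tx, frm_rx) = mpsc::channel();
        thread::spawn(move || {
            // The decoder is created on this thread and never leaves it.
            let mut dec = HwDecoder::new();
            for pkt in pkt_rx {
                if frm_tx.send(dec.decode(&pkt)).is_err() {
                    break;
                }
            }
        });
        Self { tx: pkt_tx, rx: frm_rx }
    }

    // Send one packet to the worker and wait for the result.
    fn decode(&self, packet: Vec<u8>) -> Option<(u32, usize)> {
        self.tx.send(packet).ok()?;
        self.rx.recv().ok()
    }
}
```

The proxy can be handed to the video decoding thread while the real decoder stays put, at the price of an extra thread, extra copies and a blocking round trip per packet.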

Oh well, I’ll work on improving my own H.264 decoder performance—while it’s not much fun either at least it’s clear what I can do with it and how it can be done.

P.S. This reminds me of the situation with ALSA. From what I heard it’s one of the worst documented subsystems in Linux, with an interface so flexible that it took help from ALSA developers to make at least MPlayer support ALSA output. The most probable reason is that it’s common to smoke weed in Czechia (where ALSA was developed), but what is the excuse for the other libraries?

Why I work on NihAV

July 30th, 2023

I started NihAV as a more or less toy project to play with different concepts and try new stuff like finding out how vector quantisation works or attempting to write an encoder. Having enough experience with libavcodec and libavformat, I did not want to touch them again (and still don’t) and there was a hope that rust-av would provide a viable albeit limited alternative for multimedia playback (it still hasn’t). In theory I’ve achieved my original goals: NihAV supports decoding a lot of exotic formats (some of which are not handled by any other open-source project), it even has some encoders and its own transcoder tool and there are even two players (one for audio files, another one can also play videos). So I could relax and do something else entirely and yet I’m working on adding new features to NihAV that take a lot of effort and do not bring me joy. Why?


NihAV: updated for Rust 1.69

July 27th, 2023

Since I had nothing better to do I decided to optimise my H.264 decoder a bit more, and that required a rather recent version of rustc that supports the sym construct in asm!{} (so I can reference data tables in the inline assembly). Why this specific version though? I picked whatever was both recent enough to support the aforementioned feature (the older versions had multiple micro version releases, which hints at some problems with them) and not too recent either (again, I’m no beta tester of the compiler and I don’t need other shiny features).

And while at it I decided to make the code a bit more up to date. cargo-clippy is still annoying with its default warning about all-caps names, and some lints have changed names so their suppressors no longer work. Getting rid of some leftover hints for the old versions of the compiler (like explicit drop()s for the objects borrowing code and some type hints) was nice though. Inline assembly is still only halfway done, especially considering that using const in it won’t be possible in stable for a long time and sym sucks compared to GCC inline assembly (it provides just a symbol name and you should magically know for yourself how the target platform works in order to load it correctly; on AMD64 it’s rather simple but on aarch64 and on 32-bit ARMs that depends on the target OS and PIC mode). Who would’ve thought that assembly may be platform-dependent! Looks like the current solution to that problem is to expose the current configuration to the user so it’s up to you to check all environment variables and write the appropriate code. And of course even that solution will be available some time in the future since the developers haven’t thought about it at all.
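To show what the sym operand looks like in practice, here is a minimal sketch for AMD64 only, where a RIP-relative lea is enough to get the table address. `TABLE` and `load_entry` are made-up names and not code from NihAV; other targets simply fall back to plain indexing because, as said above, the correct addressing sequence there depends on the OS and PIC mode.

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::asm;

// Hypothetical data table referenced from inline assembly.
static TABLE: [u32; 4] = [10, 20, 30, 40];

#[cfg(target_arch = "x86_64")]
fn load_entry(idx: usize) -> u32 {
    assert!(idx < TABLE.len());
    let val: u32;
    unsafe {
        asm!(
            // sym only substitutes the symbol name; the RIP-relative
            // addressing around it is the programmer's responsibility.
            "lea {tmp}, [rip + {table}]",
            "mov {val:e}, dword ptr [{tmp} + {idx} * 4]",
            table = sym TABLE,
            tmp = out(reg) _,
            idx = in(reg) idx,
            val = out(reg) val,
            options(nostack, readonly),
        );
    }
    val
}

// Fallback for targets where the addressing sequence would differ.
#[cfg(not(target_arch = "x86_64"))]
fn load_entry(idx: usize) -> u32 {
    TABLE[idx]
}
```

With this, `load_entry(2)` reads the third entry of the table on either path.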

Anyway, now my H.264 decoder features some more assembly optimisations and decodes video even faster than before. Though I fear it still takes too much CPU for the comfortable playback of my typical content so I’ll have to dabble in the hardware video acceleration. NihAV is a learning project after all.

Simple things

July 20th, 2023

Simple things are usually the hardest to accept and follow. Here I’ll list the things considering the current situation and we’ll see how those will be understood by the world:

  1. There’s currently a world war going on. The definition of world war concerns mostly not the number of countries involved but rather that the goals of war (or its consequences) affect the world order in general. In this case if russia wins it means that the old system built on respecting country sovereignty and resolving international conflicts in a peaceful manner via Useless Nations (formerly United Nations) does not work and any country with nukes can do whatever it likes. And when the temporarily existing despicable mistake known as russia loses the war, this may lead to its dissolution as well as making entities like UN and NATO reform or perish. And it’ll impact the future of China too;
  2. russia has demonstrated that it is a terrorist state countless times (trying to disrupt the world order by force is the very definition of international terrorism), but in addition to that it demonstrates that it’s not above the economic blackmailing as well. Just look at the recent development of the grain deal—it did not merely stop participating in it until its simple demand of fulfilling its countless demands is met but also started missile strikes at Ukrainian ports (again) and threatening to start a war with the countries that will keep participating in the grain deal without russia. And of course spewing obvious lies instead of saying directly that it’s racketeering;
  3. People who commit such crimes are either arrested and isolated in prisons or executed, so they can bring no harm to society. Armed people (especially if they’re shooting during the arrest) are often shot on the spot to eliminate the immediate danger (that’s not the best outcome but it’s an acceptable one). Countries should have the same treatment, out of self-preservation if nothing else (and stop pointing at nukes, russia demonstrated that it poses more nuclear threat when nothing is done about it);
  4. Speaking of isolation, it should be maintained airtight instead of trying to earn money while hoping that whatever russia does with your resources won’t be used against you later. I’m not so sure about the business risks of (usually French) companies that still have their russian subsidiaries operating as usual but if they suffer reputational losses in Europe and their businesses get confiscated in russia, that would be a completely foreseeable outcome. Also considering the current isolation and slow implosion of the russian economy, it’s hard to tell what good the income earned there can bring (as you can’t transfer that money from russia and there’s a risk of losing it entirely);
  5. When NATO talks about eliminating corruption as one of the demands for the candidates, it should serve as an example and do something about the glaring example of Hungary. EU should take note as well.

Again, those are very simple things to understand but apparently not for the countries or large businesses. For now though, I find it ironic that I could travel with fewer restrictions and was significantly less ashamed of my country (and even its government) when I had Ukrainian citizenship than now when I’m a German citizen.

TM1 encoder: probably done

July 19th, 2023

After some trial I decided to release what I’ve done and probably not return to it ever again.

Currently my encoder can encode 15-bit TrueMotion 1 format with different block sizes. It’s probably not very adjustable but there’s not that much to adjust really. I’ll explain why I gave up on 24-bit mode (again!) below, for the other options here’s a condensed version: it does not matter. I’ve tried encoding files with an alternative delta set and it resulted in significantly worse picture quality (but at least encoded frames were usually larger as well); as I mentioned in the previous post, only the first codebook makes sense for 15-bit data (as the other two codebooks waste space on coding delta value 7 which is not used in 15-bit mode). Inter mode uses a simple skip block as I didn’t bother to think about a possible threshold but it works well enough anyway. In theory I could calculate gradients to determine what sub-block sizes to use for each frame (as I did in the Indeo 3 encoder) but again, I decided not to bother.

Now, here are the reasons why 24-bit mode is much harder. For 15-bit mode you can calculate deltas for each (decorrelated) component independently rather easily, and the coding method allows selecting deltas in a fine-grained way too. In 24-bit mode you have a chroma delta pair that updates red and blue components and a luma delta pair that updates the red component with one value and the green and blue components together with another value. In theory decorrelating just green and blue components should help but there we hit another issue: the amount of possible deltas is not good enough to represent the different delta values occurring during the prediction stage. Essentially you can’t process each component independently and should rather apply deltas as 32-bit values to the 32-bit pixel value, then unpack it and check that the individual components aren’t too far from the desired ones. It is not that hard to implement but it essentially means writing a second TrueMotion 1 encoder that processes 24-bit data in an entirely different way. That is hard to justify considering its limited use and the fact that it halves the horizontal resolution: the coduck (that’s their very original name for it) always processes blocks of two 32-bit words but now those are two 24-bit pixels instead of four 15-bit ones. In either case, even if I see how it should be solved I’m not going to actually do it.
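The easy per-component case can be sketched roughly like this: for each sample pick the delta closest to the difference from the running reconstruction, then update the reconstruction with the delta actually chosen so the error does not accumulate. The `DELTAS` table is made up for the example (the real TrueMotion 1 delta sets differ) and the function names are hypothetical, not taken from the NihAV encoder.

```rust
// Hypothetical delta set; the real TrueMotion 1 tables are different.
const DELTAS: [i32; 7] = [-9, -4, -1, 0, 1, 4, 9];

// Index of the delta closest to the wanted difference (first one wins on ties).
fn nearest_delta(diff: i32) -> usize {
    let mut best = 0;
    for (i, &d) in DELTAS.iter().enumerate() {
        if (diff - d).abs() < (diff - DELTAS[best]).abs() {
            best = i;
        }
    }
    best
}

// Encode one component: returns the chosen delta indices and the values
// the decoder would reconstruct from them.
fn encode_component(samples: &[i32], start: i32) -> (Vec<usize>, Vec<i32>) {
    let mut pred = start;
    let mut indices = Vec::with_capacity(samples.len());
    let mut recon = Vec::with_capacity(samples.len());
    for &s in samples {
        let idx = nearest_delta(s - pred);
        // Track what the decoder will see, not the original sample.
        pred += DELTAS[idx];
        indices.push(idx);
        recon.push(pred);
    }
    (indices, recon)
}
```

In 24-bit mode this per-component search no longer works because one delta touches several components at once, which is exactly the complication described above.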

I need to find myself a better task to undertake.

Restarting the work on TM1 encoder

July 15th, 2023

Back in February I wrote about my failed attempt to write a TrueMotion 1 encoder. And since I was bored and really had nothing better to do, I tried my hoof at it again.

Last time it was 24-bit encoding, now I tried to approach 15-bit encoding instead and got some results. I guess the moral of the story is that you should not overthink it and use the simplest approach to coding.

Looking at NUVision

July 7th, 2023

Since I still have nothing better to do, I decided to look at some obscure video codec I had lying around for a really long time. And it turned out to be simple yet rather original.

Unlike many other codecs, this one codes YUV 4:2:2 (at least it looks like that) line by line in chunks of 24, 16 or 8 elements (essentially 24-pixel chunks plus a shorter tail, with the line width being a multiple of eight). Each chunk can be coded using one of four modes (leave it as is, decode and apply delta, copy chunk from the previous line with or without delta). Deltas are coded as a chunk quantiser, delta codes (-q, 0, q, escape) plus escape values. And since those mode/delta values can fit into two bits, they’re packed together into 16-bit words.
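A rough sketch of how such packed two-bit codes could be unpacked and applied. The bit order (lowest bits first), the mapping of the numeric values to -q/0/q/escape and all the names here are assumptions made for the example, not the actual NUVision bitstream layout.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum DeltaCode {
    Zero,
    Plus,
    Minus,
    Escape,
}

// Split a 16-bit word into eight two-bit codes.
// Assumption: the first code sits in the lowest bits.
fn unpack_codes(word: u16) -> [DeltaCode; 8] {
    let mut out = [DeltaCode::Zero; 8];
    for (i, slot) in out.iter_mut().enumerate() {
        *slot = match (word >> (2 * i)) & 3 {
            0 => DeltaCode::Zero,
            1 => DeltaCode::Plus,
            2 => DeltaCode::Minus,
            _ => DeltaCode::Escape,
        };
    }
    out
}

// Apply one code to the predicted value; escapes take the next raw value
// from a separate stream and return None if that stream ran out.
fn apply_delta(
    prev: u8,
    code: DeltaCode,
    q: u8,
    escapes: &mut impl Iterator<Item = u8>,
) -> Option<u8> {
    match code {
        DeltaCode::Zero => Some(prev),
        DeltaCode::Plus => Some(prev.wrapping_add(q)),
        DeltaCode::Minus => Some(prev.wrapping_sub(q)),
        DeltaCode::Escape => escapes.next(),
    }
}
```

Under these assumptions one 16-bit word covers eight elements, so a 24-element chunk needs three words of delta codes.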

If you look at it, there’s nothing really inventive: short slices are present in many codecs, lossy delta coding is common too. But together they create a combination that I’ve not seen anywhere else. And that’s why looking at older codecs is pleasing: besides seeing rip-offs of the standard codecs you sometimes encounter such original ideas as well.

Looking at the (un)Original Sound Quality format

June 27th, 2023

I was asked to look at it and since I have nothing better to do, I finally did.

OSQ was a format used by WaveLab mastering software for storing lossless compressed audio (they seem to have dropped support for it in the more recent versions). Internally it’s a clone of ages-old Shorten which does not even have LPC mode, only fixed predictors.

Of course the format differs somewhat and there are more prediction modes but they still use a maximum of three previously decoded samples and the filter coefficients are constant. There are other original additions such as stereo decorrelation or adaptive Rice coding with the parameter selected based on previous statistics and some floating-point calculations that look more complex than they should be. Overall though it looks like somebody took Shorten and decided not to do the hard part (LPC), maybe in order to make files save faster, and tried to compensate for it with a couple of simple tricks.
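For reference, this is roughly what fixed prediction with at most three previous samples looks like. The coefficients below are the classic Shorten polynomial predictors of orders 0 to 3; OSQ has more modes with its own constants, so treat this as a sketch of the idea and not of the actual OSQ filters. The function names and the all-zero starting history are assumptions as well.

```rust
// Classic Shorten-style fixed predictors of orders 0..=3.
// `history` must hold at least three previously decoded samples.
fn predict(history: &[i32], order: usize) -> i32 {
    let n = history.len();
    match order {
        0 => 0,
        1 => history[n - 1],
        2 => 2 * history[n - 1] - history[n - 2],
        3 => 3 * history[n - 1] - 3 * history[n - 2] + history[n - 3],
        _ => panic!("unsupported predictor order"),
    }
}

// Rebuild samples from residuals, starting from an all-zero history.
fn reconstruct(residuals: &[i32], order: usize) -> Vec<i32> {
    let mut samples = vec![0i32; 3];
    for &res in residuals {
        let pred = predict(&samples, order);
        samples.push(pred + res);
    }
    // Drop the three artificial history samples.
    samples.split_off(3)
}
```

Since the coefficients are constant, the encoder only has to try each mode and keep the one with the smallest residuals, which is much cheaper than computing LPC coefficients.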

I’ll probably document it at my leisure but overall it’s a rather silly format.

Looking at Digital Pictures video format(s)

June 26th, 2023

Since OSQ format requires obtaining a copy of some expensive software of unknown old version, I’ll leave it to somebody else. Meanwhile I’ve looked closer at the AVC format mentioned in the previous post and its relatives.

For those of you who don’t recognize the name immediately, Digital Pictures is the company responsible for some FMV-based action games, including the infamous Night Trap. As I rediscovered previously, the AVC files they use are really SGA files with compression method 0x81 but what about the other formats?

About half of the games I could look at contain an archive occupying most of the CD space, and inside that archive are the same AVC files. The other half of the games usually have one or two AVC files with a company logo and one megamovie in various formats. And after some research it turned out to be the same 0x81 compression format but with audio data and varying headers.

And since nobody bothered to document it for The Wiki, I’ll explain it here.

Looking at even more game formats

June 23rd, 2023

Since I have nothing better to do as usual, I decided to look at some game formats.

For instance, there’s a game called The Fuel Run promoting a product from a Swiss company supporting russian war crimes. This game has animations in VDO format. The VDO file header starts with the string “Artform” and the format employs RLE compression, with each line of the frame prefixed with its size. What’s funny is that its RLE uses only the bottom 6 bits for the run length and the top bit is completely unused. For obvious reasons the format is not worth documenting further.

Or there’s a game called Double Switch, this one has AVC videos. It uses an RGB555 palette and 8×8 tiles that, depending on the opcode, may be either coded with one of 25 predefined patterns and 1-7 colours or split into tiles of a smaller size. And only afterwards I decided to look into The Wiki and it seems to match the SGA format description (except that this particular format variant is not documented). I don’t know if I should bother writing a decoder for it but with the lack of PC codecs to RE I might try my hoof at console ones instead.
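As a small aside, expanding an RGB555 palette entry to 8-bit components usually looks like the sketch below. The component order (red in the top bits) is an assumption, the actual files may store it differently, and `rgb555_to_rgb888` is just a name picked for the example.

```rust
// Expand a 15-bit RGB555 colour to three 8-bit components.
// Assumption: bits 14-10 are red, 9-5 green, 4-0 blue; bit 15 is ignored.
fn rgb555_to_rgb888(colour: u16) -> [u8; 3] {
    // Replicate the top bits into the low bits so that 31 maps to 255.
    let expand = |v: u16| -> u8 {
        let v = (v & 0x1F) as u8;
        (v << 3) | (v >> 2)
    };
    [expand(colour >> 10), expand(colour >> 5), expand(colour)]
}
```

Replicating the top bits instead of a plain shift keeps full white at 255 rather than 248.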