Archive for August, 2021

VP6 — bool coder

Sunday, August 29th, 2021

Today I’ll try to tell the principles behind bool coder in VP6 (actually VP5-VP9) and how it all should work in the encoder. As usual, let’s start with the theory.
(more…)

VP6 encoding — DCT

Friday, August 27th, 2021

Transform is one of the essential parts of typical video codec, lots of them can be described as e.g. “DCT-based video codec using X coding [and additional features like …]”. That is why I’m starting with it.
(more…)

Starting work on VP6 encoder

Thursday, August 26th, 2021

It is no secret (not even to me) that I suck at writing encoders. But with NihAV being developed exactly for trying new things and concepts, why not go ahead and try writing an encoder? It is not for having an encoder per se but rather a way to learn how things work (the best way to learn things is to try them yourself after all).

There are several reasons why I picked VP6:

  • it is complex enough to have different concept of encoding to try on it;
  • in the same time it’s not that complex (just DCT, MC, bool coder and no B-frames, complex block partitioning or complex context-adaptive coding);
  • there’s no opensource encoders for it;
  • there’s a decoder for it in NihAV already;
  • this is not a toy format so it may be of some use for me later.

Of course I’m aware of other attempts to bring us an opensource VP6 encoder and that they all failed, but nothing prevents me from failing at it myself and documenting my path so others might fail at it faster and better.

Speaking of documenting, here’s a roadmap of things I want to play with (or played with already) and report how it went:

  1. DCT;
  2. bool coder;
  3. simple intraframe coding;
  4. motion estimation (including fast search and subpixel precision);
  5. rate distortion optimisation;
  6. rate control.

Hopefully the post about DCT will come tomorrow.

P.S. Why I declare this in public? So that I won’t chicken out immediately.

PACo done!

Tuesday, August 24th, 2021

As I said in the previous post, I was looking at PACo format with an enhanced RLE compression. It turned out to be even more curious than I expected.

Beside the normal RLE mode it has pair and quad RLE (where you repeat two or four pixels) and long operation mode where you can do single/pair/quad RLE, copy or skip using 12-bit number. But it turns out it also has a compact RLE mode that works on a single line using a set of 16 colours and putting operation code, length and colour index into a single nibble instead of a byte (and it also has its own long mode where lengths are using a whole byte!).

That’s the variety I miss in modern video codecs.

Looking at PACo

Monday, August 23rd, 2021

I was asked to look at this format used at least in Iron Helix game and it’s a somewhat interesting format.

It turned out to come from Macintosh. There are two small signs hinting on it: big-endian numbers inside the file and the fact it uses default QuickTime palette.

The container is simple but functional: there’s a header containing frame sizes among other thing, frame consisting of several records (usually it’s just video data and frame data end marker; the first frame has initalisation data chunk as well). Frames can have one of two compression methods and coded area size and offset.

Compression is just a slightly advanced RLE: codes 0x010x7F mean copying data, codes 0x800xFD are used to signal runs, code 0x00 is used to code long operations, code 0xFE is used to signal skips, code 0xFF is used for either runs of pairs or quads of pixels (depending on the run length). Each line is coded independently (i.e. runs or copies can’t go past the current line end). So what’s the tricky part there?

That’s compression method 1 and it works quite well. Compression method 2 is essentially the same but it codes lines in interlaced manner for which I haven’t managed to get a good picture in all cases yet (it seems to code more lines than declared sometimes and interlacing seems to be dependent on both decoded are position and height). But hopefully it won’t take long and I can document it in The Wiki.

Some words on QT Animation (SMC) codec

Tuesday, August 10th, 2021

A recent question about buggy SMC decoding led me deep into QuickTime specification to look at the codec missing opcode. And there are some noteworthy things here as well.

Back in the day there was the multimedia player for Unix called XAnim. Its last release was in 1999—before other opensource multimedia player projects have started! It was both feature-rich (e.g. it could step frames forward and backwards, something that not all current media players can do) and had an excellent codec support for the time.

Somehow its author reverse engineered (long before the era of decompilers too) a lot of codecs and somehow managed to obtain the sources for e.g. Indeo and while he could not provide them, he offered them for a wide variety of architectures—Alpha, MIPS, Sparc, PowerPC, x86. It was a treasure trove for formats and lots of the decoders were ported to other projects (even I did that for one or two codecs) and binary codecs were a great help in reverse-engineering efforts as well.

Now to SMC itself. Formally it’s QuickTime Animation codec but people call it after its FOURCC which is “smc “, probably after the author’s initials.

Opensource SMC decoders come from the same source (I based mine on the description in The Wiki but you can guess what that description is based on; and yes, back in the day e.g. MPlayer and Xine had their own decoders for various codecs before relying on libavcodec for everything). After looking at the binary specification I can say it looks exactly like it was reverse engineered from it directly (it has the same logic and data types but lacks sensible names). Anyway, the thing is that it does not handle opcode 0xF0 and I finally had an occasion to look at it.

I took QuickTime 6.3 binary specification for Windows (somehow the decoder ended in QuickTimeInternetExtras.qtx) and looked inside. It turns out that there are several decoding functions there (for different output formats) but they all do the same: handle 0xF0 opcode in exactly the same way as 0xE0 opcode (raw blocks), there are no differences there whatsoever.

That’s one mystery less, even if the answer is a bit disappointing. At least I could reminisce about good old times hardly anybody else remembers.

Playing with trellis and encoding

Sunday, August 8th, 2021

I said before I want to play with encoding algorithms within NihAV and here’s another step (a previous major step was vector quantisation and a simple Cinepak encoder using it). Now it’s time for trellis search for encoding.

The idea by itself is very simple: you want to encode a sequence optimally (or decode transmitted sequence with distortions), so you represent your data as a set of possible states for each sample and search a path from one state to another with the minimum error. Since each state of the sample is connected with all states of the previous samples, its graph looks like a trellis:

The search itself is performed by selecting for each state a transition from a previous state that gives minimal error, then selecting a state with the least error for the last sample and tracing back the path that lead to it from the beginning. You just need to store the pointer to the previous state, error value and whatever decoder state you require.

I’ve chosen IMA ADPCM encoder as a test playground since it’s simple but useful. The idea of the format is very simple: you have a state consisting of current sample value and step size used as a multiplier for the transmitted 4-bit difference value; you reconstruct the difference, add it to the previous stored value, and correct step size (small delta—decrease step, large delta—increase step). You have 16 possible states for each sample which makes the search take not so long time.

There’s another tricky question of selecting initial step size (it will adapt to the samples but you need to start with something). I select it to be close to the difference between first and second samples and actually abuse first state to store not the index of the previous state but rather a step index. This way I start with (ideally) 16 different step sizes around the current one and can use the one that gives slightly lower error in the end.

And another fun fact: this way I can use just the code for decompression of single ADPCM sample and I don’t require actual code for compression—it traverses through all possible compressed codes already.

I hope this demonstrates that it’s an easy method that improves quality significantly (I have not conducted proper testing but a from a quick test it reduced mean squared error for me by 10-20%).

It should also come in handy for video compression but unfortunately rate distortion optimisation does not look that easy…

About upcoming AV2…

Friday, August 6th, 2021

So today I’ve seen an article titled AV2 Video Codec — Early Performance Evaluation of the Research which of course has drawn my attention.

Fun things are that it is a sponsored article and that it’s written by three engineers from ViCueSoft. This is strange, but so far it still looks more promising than the original AV1 feature review article with over 20 authors and too much marketing in it (my review of it is here; and to be fair it was followed by more serious paper with less authors but this one exists as well). Anyway, let’s see what is presented here.

I don’t care about the performance much so I just quote the phrase from the conclusion: “…rough approximation shows only 1.2x times encoding complexity increase and 1.4x time decoding”. I find the increase in decoding complexity being larger than the increase of encoding complexity a bit strange, normally you’d expect encoding difficulty rising faster because of the nature of the coding approach in modern codecs (normally an encoder needs to search for the best combination of encoding tools and their parameters and then apply the same steps as decoder does in order to have a coded frame in the same state as decoder would have it). Let’s look at the features then, it’s the most interesting part to me anyway.

  • distant weighted compound mode and dual interpolation filter are removed;
  • semi-decoupled partitioning is introduced—this feature allows splitting luma and chroma blocks and code their contents independently under certain level. The paper also says there’s Dual Tree feature in VVC that does the same;
  • quantiser step overhaul—instead of six tables in AV1 now you have just one simple formula for all quantiser step;
  • extending motion sample selection to work with compound blocks as well;
  • more partitioning modes to be more like HEVC;
  • multiple reference line selection for intra prediction—allows you to select not just neighbouring row/column for directional intra prediction. The same tool exists in VVC. And it also reminds me of X8 frames in WMV2/WMV9, that is the first case of intra prediction using more than one line known to me;
  • offset-based intra prediction refinement—adding some offset to the top/left intra predicted edge of the block to make it even smoother (the offset is calculated from the neighbouring blocks as well);
  • intra secondary transform—this tool tries to improve compression by applying a special secondary transform to the low-frequency coefficients. VVC has low-frequency non separable transform doing the same;
  • simplifications in intra mode signalling;
  • some improvements in motion prediction coding;
  • cross-component sample offset—another chroma-from-luma tool: for the whole CTU between deblocking and CDEF stages a DC offset is calculated from the luma values and applied to chroma values.

Essentially there are three kinds of improvements: simplification or generalisation of the existing feature (including complete removal of it—I approve either), picking the tool used by VVC/H.266 (that approach works but lacks originality) and an occasional improvement of an existing tool (too few and not too original). Of course nobody knows when AV2 will be declared finished and some things will surely have changed by then, but I don’t expect radical changes.

Once I said that I’ll review H.266 when AV2 is released but these guys has essentially done my work instead of me. Thanks!

Why codecs are designed like this and why they are not very interchangeable

Monday, August 2nd, 2021

Sometimes I have to explain the role of various codecs and why it’s pointless in most cases to adapt compression tricks from image codecs to audio codecs (and vice versa) and even from lossy to lossless codecs in the same content. If you understand that already then you’ll find no new information here.

Yours truly
Captain Obvious
(more…)

Looking at Tsunami games

Sunday, August 1st, 2021

You may remember Tsunami Media as a company founded by ex-Sierra people that released a couple of games and ceased its existence.

MobyGames lists the following titles (characteristic is mine):

  • Ringworld: Revenge of the Patriarch—an adventure game based on Ringworld novel by Larry Niven. ScummVM supports its but I played it long before that.
  • Wacky Funsters! The Geekwad’s Guide to Gaming—a collection of arcade games with wacky design (I’ve only played its sequel though).
  • Protostar: War on the Frontier—a reportedly good strategy game inspired by the same source as Star Control II so they share a lot of game design (and yes, they have a common ancestor so it’s not a rip-off. I’ve never played it myself but what I saw looks interesting.
  • Blue ForcePolice Quest in anything but name. ScummVM supports it but again, I played it long before that.
  • The Geekwad: Games of the Galaxy—another collection of standard arcade games but with wacky design. I especially liked quizzes there.
  • Flash Traffic: City of Angels—one of the first interactive movies. The Mike has blogged about it.
  • Return to Ringworld—a sequel to the Ringworld obviously. ScummVM supports it (so I can re-play it and check whether that empty platform where you seek for the details is really that horrible).
  • Man Enough—FMV dating game.
  • Silent Steel—yet another interactive movie.
  • Free Enterprise—some business simulator.

As you can see, some of the adventure games are supported by ScummVM already but FMV-based ones are not which makes me wonder why.

Man Enough uses the same engine as the previous games (I checked personally. FMV sections there turned out to be animations in the same format as in Ringworld II (I hacked a quick RLB extractor and animation decoder to check that). Side note: whoever added a support for the engine was tired (for a very good reason given below) and hadn’t recognized that it uses LZW compression (so the decompressor is still slightly beautified REd code). Also in this game you have RLBs with elements following each other and those aligned to 16-byte boundaries. So maybe it’d be better just to check if you have first entry right after the library header or not (they all have TMI- header so you can’t mistake it for anything else).

Flash Traffic is special since while it uses the same engine it has all resources stored separately instead of a single library archiving them all. Additionally while TMI format they use is the same, the compression is different. Here you have LZ77-like method which can copy verbatim, copy 32-bit words from already decoded area, fill region with repeating pattern of two bytes, or simply zero region. I have a working implementation for it so I can unpack various resources from the game just fine (but beside BFI or MPEG files that have been supported since long time why should I bother?).

Silent Steel is their newer FMV game that’s mostly just one large MPEG video file and some logic around it (that’s why there was an interactive DVD re-release of the game later).

And now here’s the most important reason why Flash Traffic and Man Enough are not supported up to this day: Tsunami Media hardcoded game logic into the binaries, so while for SCI or SCUMM games you can write a virtual machine and interpret original bytecodes, here you need to decompile game logic from the DOS executable (it’s not an easy task even today) and re-implement it yourself. That’s why tsage engine in ScummVM is full of ${game_name}/${game_name}_scenesN.cpp files for each of the three games it supports. I’m pretty sure the developers won’t refuse somebody else’s contribution for the other tsage games support so you’re welcome to try.