Archive for the ‘Screen Codecs’ Category

Some Notes on Lossless Video Codecs

Saturday, March 21st, 2015

While reading a nice PhD thesis from 2014 (in Russian) about a new lossless video compression method, I laughed hard at its descriptions of lossless video codecs (here’s the link for unbelievers –, the translation below is mine):

To date various lossless videostream compression methods and algorithms have been developed. They are used in the following widespread codecs:

* CorePNG employs deflate algorithm for independent compression of every frame. Theoretically the codec supports delta frames but this option is not used.

* FFV1 employs prediction coding with following entropy coding of prediction error.

* Huffyuv, like FFV1 algorithm, employs predictive coding but prediction error is effectively coded with Huffman algorithm.

* MSU Lossless Video Codec has been developed for many years at Moscow State University labs.

And yet some real world tasks demand more effective compression and thus a challenge of developing new more effective lossless video compression methods still remains actual.

Readers are welcome to find inaccurate, erroneous and outright bullshit statements in this quote themselves. I’d rather talk about lossless video codecs as I know them.

A Codec Family Proposal

Monday, September 29th, 2014

There are enough general-use standardised codecs, and there’s even the VPx family for those who want more. But there are not enough niche codecs with free/open specifications.

One such niche codec would be an intermediate codec, suitable for capturing and quick editing of video material. The main requirements are a modest compression ratio and fast processing (scalability is a plus too). Maybe SMPTE VC-5 will be the answer, maybe Ogg Chloe, maybe something completely different. Let’s discuss it some other time.

Another niche codec that desperately needs an open standard is a screen video codec. Such a codec may also be used for recording webcasts, presentations and such. Here I’d like to discuss a whole family of such codecs based on the same coding principles.

It makes sense to make the codec fast by employing multithreading where possible. That’s why a frame should be divided into tiles that are neither too large nor too small, maybe 192×128 pixels or so.
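A minimal sketch of such a tiling in C, assuming a fixed 192×128 tile size with smaller tiles along the right and bottom edges. `TILE_W`, `TILE_H`, `TileRect`, `tiles_along()` and `tile_rect()` are hypothetical names, not part of any existing codec; each resulting rectangle could then be handed to a separate worker thread.

```c
/* Hypothetical tile size taken from the proposal; any value would work. */
enum { TILE_W = 192, TILE_H = 128 };

typedef struct {
    int x, y;   /* top-left corner inside the frame */
    int w, h;   /* tile size; edge tiles may be smaller */
} TileRect;

/* Number of tiles needed to cover `len` pixels with tiles of `tile` pixels. */
static int tiles_along(int len, int tile)
{
    return (len + tile - 1) / tile;
}

/* Compute the rectangle of tile `idx` (row-major order) in a width x height
 * frame. Returns 0 on success or -1 if idx is out of range. */
static int tile_rect(int width, int height, int idx, TileRect *r)
{
    int cols = tiles_along(width, TILE_W);
    int rows = tiles_along(height, TILE_H);

    if (idx < 0 || idx >= cols * rows)
        return -1;
    r->x = (idx % cols) * TILE_W;
    r->y = (idx / cols) * TILE_H;
    /* clip the last column/row to the frame edge */
    r->w = (width  - r->x < TILE_W) ? width  - r->x : TILE_W;
    r->h = (height - r->y < TILE_H) ? height - r->y : TILE_H;
    return 0;
}
```

For a 1920×1080 frame this gives a 10×9 grid in which the bottom row of tiles is only 56 pixels tall.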

Each tile should be coded independently, preferably with its distinct features coded separately too. It makes sense to separate tile data into smooth features (like gradients and real-life pictures) and sharp transitions (like text and UI elements). Let’s call the former a natural layer and the latter a synthetic layer. We’ll also need a mask telling which layer to use for each pixel. By combining these main blocks with different coding methods we can make a whole family of codecs.
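Putting a tile back together from its layers could then be as simple as the sketch below. It assumes both layers have already been decoded to full tile size as packed 0xRRGGBB values and that the mask holds one byte per pixel; these layouts and the `compose_tile()` name are illustrative assumptions, since a real codec would probably store the layers more compactly.

```c
#include <stddef.h>
#include <stdint.h>

/* Rebuild a tile: where the mask byte is non-zero the pixel comes from the
 * synthetic layer, otherwise from the natural layer. */
static void compose_tile(const uint32_t *natural, const uint32_t *synthetic,
                         const uint8_t *mask, uint32_t *out, size_t npixels)
{
    for (size_t i = 0; i < npixels; i++)
        out[i] = mask[i] ? synthetic[i] : natural[i];
}
```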

Here’s the list of example codecs (with a random FOURCC assigned):

  • J-B0 — employ JPEG for natural layer and GIF/PNG for mask and synthetic layer coding;
  • J-B1 — employ Snow for natural layer coding and FFV1 for synthetic layer coding;
  • J-B2 — employ JPEG-2000 for natural layer coding, JBIG for mask coding and something like PPM modeller for synthetic layer;
  • J-BG — employ WebP for natural layer and WebP LL for synthetic layer.

As one can see, it’s rather easy to build such a codec since all the coding blocks already exist and only the natural/synthetic layer separation might need a bit of research. I see no reason why, say, VLC couldn’t use it for recording and streaming the desktop, e.g. for virtual meetings.
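As a purely hypothetical starting point for that separation research, one naive heuristic is to count distinct colours in small blocks: text and UI elements tend to use few colours, while photos and gradients use many. The 8×8 block size, the four-colour threshold and `block_is_synthetic()` are assumptions for illustration, not a tested design.

```c
#include <stdint.h>

/* Hypothetical parameters: block size and max colours for a "synthetic" block. */
enum { BLK = 8, MAX_SYNTH_COLOURS = 4 };

/* Returns 1 if the BLK x BLK block at (bx, by) has at most MAX_SYNTH_COLOURS
 * distinct colours (treated as synthetic), 0 otherwise (treated as natural).
 * `pix` is a row-major frame of packed RGB values; the caller must make sure
 * the block lies entirely inside the frame. */
static int block_is_synthetic(const uint32_t *pix, int stride, int bx, int by)
{
    uint32_t seen[MAX_SYNTH_COLOURS];
    int nseen = 0;

    for (int y = by; y < by + BLK; y++) {
        for (int x = bx; x < bx + BLK; x++) {
            uint32_t c = pix[y * stride + x];
            int found = 0;
            for (int i = 0; i < nseen; i++)
                if (seen[i] == c) { found = 1; break; }
            if (!found) {
                if (nseen == MAX_SYNTH_COLOURS)
                    return 0;   /* too many colours: looks natural */
                seen[nseen++] = c;
            }
        }
    }
    return 1;
}
```

A real encoder would likely refine this per pixel to build the mask, but per-block decisions keep the mask cheap to code.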

On some screen codecs

Tuesday, March 5th, 2013

Recently my attention was drawn to two screen codecs, so I’ve looked a bit closer at them (I’m not going to implement them, at least not now: the simian audio codec is waiting), and here are the bits I found most interesting.

Dxtory codec. I don’t know whether it uses prediction, but the coding is extremely simple: read a unary code (1-8 bits); if the code is 8 (escape), read a byte for the new value, otherwise retrieve the value from the LRU list, and of course move/put the value to the beginning of that list.
The organisation into slices and the decoding for different colourspaces are trivial.
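The Dxtory value coding described above can be sketched like this. Several details are assumptions not confirmed by the description: bits are read MSB first, the unary code counts zeros terminated by a one (so eight zeros form the escape), the LRU list holds eight entries, and an escaped value evicts the oldest entry. `BitReader`, `get_unary8()` and `decode_lru_value()` are hypothetical helpers.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Minimal MSB-first bit reader (bit order is an assumption). */
typedef struct {
    const uint8_t *buf;
    size_t size;    /* in bytes */
    size_t pos;     /* in bits */
} BitReader;

static int get_bit(BitReader *br)
{
    if (br->pos >= br->size * 8)
        return 0;                           /* read zeros past the end */
    int bit = (br->buf[br->pos >> 3] >> (7 - (br->pos & 7))) & 1;
    br->pos++;
    return bit;
}

static unsigned get_bits(BitReader *br, int n)
{
    unsigned v = 0;
    while (n--)
        v = (v << 1) | (unsigned)get_bit(br);
    return v;
}

/* Truncated unary code of 1-8 bits: count zeros until a one, at most 8.
 * Returns 0..7 for a terminated code or 8 for eight zeros (the escape). */
static int get_unary8(BitReader *br)
{
    int n = 0;
    while (n < 8 && !get_bit(br))
        n++;
    return n;
}

/* Decode one value and move it to the front of the 8-entry LRU list. */
static uint8_t decode_lru_value(BitReader *br, uint8_t lru[8])
{
    int code = get_unary8(br);
    int idx;
    uint8_t val;

    if (code == 8) {                        /* escape: explicit new value */
        val = (uint8_t)get_bits(br, 8);
        idx = 7;                            /* drop the least recent entry */
    } else {
        val = lru[code];
        idx = code;
    }
    memmove(lru + 1, lru, (size_t)idx);     /* shift entries 0..idx-1 down */
    lru[0] = val;
    return val;
}
```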

Mirillis FIC Video. This one is more complex: it operates on 8×8 blocks. A block is coded in the following way: if the skip flag is present, the block is skipped; otherwise read 7 bits for the number of coefficients to decode, followed by the coefficients themselves (signed exp-Golomb coded). Then zigzag scan, IDCT, output.
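Here is a sketch of the FIC coefficient parsing for one block, stopping before the IDCT. It assumes MSB-first bit order, a skip flag of 1 meaning "skipped", the H.264-style signed exp-Golomb mapping (1, -1, 2, -2, ...) and the standard JPEG zigzag order; none of these are confirmed above, and all function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct { const uint8_t *buf; size_t size, pos; } BitReader;

static int get_bit(BitReader *br)
{
    if (br->pos >= br->size * 8)
        return 0;
    int bit = (br->buf[br->pos >> 3] >> (7 - (br->pos & 7))) & 1;
    br->pos++;
    return bit;
}

static unsigned get_bits(BitReader *br, int n)
{
    unsigned v = 0;
    while (n--)
        v = (v << 1) | (unsigned)get_bit(br);
    return v;
}

/* Unsigned exp-Golomb: N zeros, a one, then N more bits (N capped at 16). */
static unsigned get_ue(BitReader *br)
{
    int zeros = 0;
    while (zeros < 16 && !get_bit(br))
        zeros++;
    return (1u << zeros) - 1 + get_bits(br, zeros);
}

/* Signed mapping 0, 1, -1, 2, -2, ... (assumed H.264-style). */
static int get_se(BitReader *br)
{
    unsigned k = get_ue(br);
    return (k & 1) ? (int)((k + 1) / 2) : -(int)(k / 2);
}

/* Standard JPEG zigzag: scan position -> natural (row-major) index. */
static const uint8_t zigzag[64] = {
     0,  1,  8, 16,  9,  2,  3, 10, 17, 24, 32, 25, 18, 11,  4,  5,
    12, 19, 26, 33, 40, 48, 41, 34, 27, 20, 13,  6,  7, 14, 21, 28,
    35, 42, 49, 56, 57, 50, 43, 36, 29, 22, 15, 23, 30, 37, 44, 51,
    58, 59, 52, 45, 38, 31, 39, 46, 53, 60, 61, 54, 47, 55, 62, 63
};

/* Returns 1 if the block is skipped, 0 if coef[] (natural order, ready for
 * an IDCT that is not shown here) was filled in. */
static int decode_block_coeffs(BitReader *br, int16_t coef[64])
{
    if (get_bit(br))
        return 1;                       /* skip flag set */
    memset(coef, 0, 64 * sizeof(*coef));
    unsigned n = get_bits(br, 7);       /* number of coded coefficients */
    if (n > 64)
        n = 64;                         /* 7 bits allow up to 127: clamp */
    for (unsigned i = 0; i < n; i++)
        coef[zigzag[i]] = (int16_t)get_se(br);
    return 0;
}
```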


A new record

Monday, March 4th, 2013

I thought that there was no codec more horrible than Go2Despair and was proved wrong again. Here’s an even worse pair: the Dxtory codec and its standalone companion PackBit.

DxtoryCodec.dll is 8.2MB, which rivals Go2C++Hell even though it’s a standalone decoder without any additional code. PackBitCodec.dll is even more elegant: of its total size of 672256 bytes, more than 357000 bytes are occupied by the code of a single function (that’s 3-6 times the size of the complete DLL of an ordinary zlib-based codec). Truly some people should not be allowed to program.

Now (hopefully) the last post about Go2Disassembly details

Monday, February 18th, 2013

I’ll update codec details on our wiki and hopefully can forget about it. Let somebody French-speaking complete it.

There are many codecs out there to reverse engineer, including worthier ones. On second thought, that describes almost every codec.

Update: I’ve filled in all the information I had; the rest is up to somebody with a debugger and the motivation to do it.

Another piece of digital archeology news

Thursday, February 14th, 2013

After investigating the smallest available pile of fossilised dung called Go2Cesspit (only 2.5MB instead of 15MB for the current version), G2M1 can be reconstructed. It had the same tiled structure as its successors but coded all tiles with ELS only, without combining them with JPEG data.

And here’s the full history of Go2UnwantedPlaces evolution (there definitely was no intelligent design, for obvious reasons):

  1. Citrix licenses (ELS-based) image coding from Accusoft and uses it to release G2M1.
  2. They want to improve compression and try to code some blocks in a different way. JPEG to the rescue! That’s how G2M2 was born (compression method 2).
  3. G2M3 looks like a marketing bump since the image coding has not changed.
  4. For some reason they replace the ELS part with a simple deflated raw bitmap. That’s G2M4 now (and compression method 3).

Further findings may of course correct this history, but so far it looks like this.

P.S. If you want to have G2M1 supported then send samples and support requests to VideoLAN. They will be grateful for sure.

Now the final words about Go2UselessTalk

Wednesday, February 13th, 2013

Now I can officially say that G2M4 is essentially reverse engineered. It indeed uses a zlib-compressed image for sharp details (called the “synthetic layer” internally) and JPEG compression for smooth details (called the “natural layer” internally). The only catch: it’s not plain JPEG coding, since it codes only some blocks with JPEG and uses a special way to restore the sparse image.

The idea of this J-BKoding (not the same as J-BKoder in Go2Coven!) is simple: only the blocks referenced from the top layer are coded, and to save space and improve compression they are coded contiguously. So how do you know how to restore the picture? Easy! The JPEG data is preceded by the number of coded macroblocks and by flags (packed in bytes, LSB first) telling whether each block is decoded or skipped. I suspect that something similar might be true for the previous versions of the codec too, because quite often the decoded JPEG data turned out narrower than expected.
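The block map needed to put the contiguous JPEG blocks back in place could be built as sketched below. It assumes that a set flag bit means the macroblock is coded and that macroblocks are numbered in raster order; `build_block_map()` is a hypothetical helper. The returned count can be checked against the number of coded macroblocks stored before the flags.

```c
#include <stdint.h>

/* Walk the per-macroblock flags (packed LSB first; bit set = coded is an
 * assumption) and record, for each block in the contiguous JPEG stream,
 * which tile macroblock it belongs to. Returns the number of coded blocks. */
static int build_block_map(const uint8_t *flags, int total_blocks, int *map)
{
    int ncoded = 0;
    for (int i = 0; i < total_blocks; i++)
        if ((flags[i >> 3] >> (i & 7)) & 1)
            map[ncoded++] = i;
    return ncoded;
}
```

The k-th block decoded from the JPEG stream then goes to macroblock `map[k]`; every macroblock not listed is left untouched.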

Here’s the output of the previous version with the synthetic layer only (demonstrated at FOSDEM to a close circle of trolled people):

G2M4 decoder output

New version (quick hack):


Now any VideoLAN guy should be able to implement a decoder for it.

A few words about Go2WasteTime version 2/3

Saturday, February 9th, 2013

As some people know, there was a Go2UselessActivity decoder for Libav showcased at FOSDEM (even on an ARM-based laptop). It decoded all known variants of the codec with varying degrees of success (i.e. the output ranged from complete garbage to slightly recognizable). Some guy nicknamed “j-b” was really happy for some unknown reason.

Let’s consider a purely theoretical situation where somebody needs Go2EncountersOfTheWorstKind 2/3 and wants to know how it works (usually it’s either one or the other, or neither). As is already known, it combines ELS-coded images with JPEG-coded data for the pixels that are coded as transparent in the ELS image. JPEG data, to quote j-b, is boring, so let’s look at the ELS image coding.

The coder used is the augmented ELS coder by Douglas M. Withers (still available somewhere on the Internet inside OSAUGELS.ZIP) with the same tables (36 jots per byte). The only interesting thing is how the image itself is coded with this binary coder.

  • Unsigned values are decoded as a unary prefix giving the number of bits to follow and then the actual value; signed values are coded as unsigned with the usual scheme x -> 2*x, -x -> 2*x - 1 (zero is of course explicitly signalled by the number of bits being zero).
  • Pixels are coded as RGB triplets, using median prediction from the top, top-left and left neighbours if possible. If prediction is used, pixels are coded as differences to the predicted value in (G, R-G, B-G) form.
  • Overall image coding is conceptually simple: the image is coded as runs of RGB triplets if possible; in some cases, if the pixel can be decoded from the cache, that is done instead. If that’s not possible either, a single pixel is coded as described above.
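The value mapping and prediction from the list above can be sketched as follows. The signed mapping is taken directly from the description; the predictor uses the common LOCO-I/Huffyuv form median(L, T, L + T - TL), which is an assumption since the exact median is not spelled out, and so is the modulo-256 wraparound in `reconstruct()`. The ELS bit decoding itself is omitted, and all names are hypothetical.

```c
#include <stdint.h>

typedef struct { uint8_t r, g, b; } RGB;

/* Undo the mapping x -> 2x, -x -> 2x - 1 (0 stays 0). */
static int unsigned_to_signed(unsigned u)
{
    return (u & 1) ? -(int)((u + 1) / 2) : (int)(u / 2);
}

/* Median of three integers. */
static int median3(int a, int b, int c)
{
    if (a > b) { int t = a; a = b; b = t; }   /* ensure a <= b */
    if (b > c) b = c;                          /* b = min(b, c) */
    return a > b ? a : b;                      /* max(a, min(b, c)) */
}

/* Per-component predictor from left, top and top-left neighbours
 * (assumed MED form, as in LOCO-I and Huffyuv). */
static int predict_median(int left, int top, int topleft)
{
    return median3(left, top, left + top - topleft);
}

/* Rebuild a pixel from residuals coded as (G, R-G, B-G) relative to the
 * predicted pixel; conversion to uint8_t wraps modulo 256 (assumed). */
static RGB reconstruct(RGB pred, int dg, int drg, int dbg)
{
    int g  = pred.g + dg;
    int rg = (pred.r - pred.g) + drg;
    int bg = (pred.b - pred.g) + dbg;
    RGB out = { (uint8_t)(rg + g), (uint8_t)g, (uint8_t)(bg + g) };
    return out;
}
```

For example, a coded value of 5 maps back to -3, and with neighbours L = 10, T = 20, TL = 15 the predictor yields 15.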

Here’s a slightly more detailed description of the image decoding:

for (y = 0; y < height; y++) {
  x = 0;
  while (x < width) {
    // special case: the left pixel differs from all its causal neighbours
    // but is in the cache, so the current pixel is decoded with prediction
    if (x > 1 && y > 0 &&
        rgb[x - 1][y] != rgb[x - 2][y] &&
        rgb[x - 1][y] != rgb[x    ][y - 1] &&
        rgb[x - 1][y] != rgb[x - 1][y - 1] &&
        rgb[x - 1][y] != rgb[x - 2][y - 1] &&
        pixel_in_cache(rgb[x - 1][y])) {
      rgb[x][y] = decode_pixel_with_prediction(x, y);
      x++;
      continue;
    }
    decode_run(x, y, &run_length, &pix);
    if (run_length > 0) {
      // pixel value may get changed here
      decode_modified_pix(x, y, run_length, &pix);
      while (run_length--)
        rgb[x++][y] = pix;
    } else if (decode_from_cache(&pix)) {
      // single pixel taken from the cache
      rgb[x][y] = pix;
      x++;
    } else {
      // single pixel coded with prediction
      rgb[x][y] = decode_pixel_with_prediction(x, y);
      x++;
    }
  }
}

Hopefully no more information will follow soon.

A few words about G2M4 (that were not censored)

Sunday, November 4th, 2012

Okay, I looked into G2M4 more closely; here’s the output:

As with the previous beast, there are two types of images combined in a single tile: the so-called synthetic layer and natural layer. What you see is the first layer decoded.

Here’s the general tile structure:

  • Compression subtype (top bits from the first byte).
  • Transparency colour (three bytes)
  • Number of palette entries minus one (one byte)
  • Palette entries (byte triplets)
  • Synthetic layer (16-bit BE chunk size plus deflated data, may not be present)
  • Natural layer (probably headerless JPEG data, too lazy to verify)
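A parsing sketch for the tile header fields listed above. The byte order of the transparency and palette colours, the reading of the 16-bit size as the length of the deflated data alone, and all names (`TileHeader`, `parse_tile_header()`, `read_synth_chunk()`) are assumptions. How an absent synthetic layer is signalled is not known, so the caller decides whether to call `read_synth_chunk()`.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint8_t type_byte;      /* compression subtype lives in its top bits */
    uint8_t transp[3];      /* transparency colour (component order assumed) */
    int     pal_size;       /* 1..256 entries */
    uint8_t pal[256][3];    /* palette triplets */
} TileHeader;

/* Parse the fixed part of a tile up to the end of the palette.
 * Returns the number of bytes consumed, or -1 if the data is truncated. */
static int parse_tile_header(const uint8_t *p, size_t len, TileHeader *h)
{
    if (len < 5)
        return -1;
    h->type_byte = p[0];
    h->transp[0] = p[1];
    h->transp[1] = p[2];
    h->transp[2] = p[3];
    h->pal_size  = p[4] + 1;            /* stored as entries minus one */
    size_t need = 5 + (size_t)h->pal_size * 3;
    if (len < need)
        return -1;
    for (int i = 0; i < h->pal_size; i++)
        for (int c = 0; c < 3; c++)
            h->pal[i][c] = p[5 + i * 3 + c];
    return (int)need;
}

/* Read a synthetic layer chunk: 16-bit big-endian size, then deflated data
 * to be passed to zlib. Returns the total chunk length or -1 on truncation. */
static int read_synth_chunk(const uint8_t *p, size_t len,
                            const uint8_t **data, size_t *data_len)
{
    if (len < 2)
        return -1;
    size_t size = ((size_t)p[0] << 8) | p[1];
    if (len < 2 + size)
        return -1;
    *data = p + 2;
    *data_len = size;
    return (int)(2 + size);
}
```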

After decompression, the synthetic layer image contains a packed bitmap that uses the palette from above; each row is coded as an 8-bit flag followed by [packed row data]. If the flag is zero then row data is present (that’s my guess; it always seems to be zero). Row data is just palette indices stored with 1, 2, 4 or 8 bits per index depending on the palette size. You can see sample output above.
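Unpacking one row of the inflated synthetic layer might look like the following. The palette size thresholds follow from the 1/2/4/8-bit widths (2, 4, 16 and 256 entries), while MSB-first packing within a byte and treating a non-zero flag as an error are assumptions; `bits_per_index()` and `unpack_row()` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* Index width in bits for a given palette size. */
static int bits_per_index(int pal_size)
{
    if (pal_size <= 2)  return 1;
    if (pal_size <= 4)  return 2;
    if (pal_size <= 16) return 4;
    return 8;
}

/* Unpack one row: an 8-bit flag followed by packed palette indices
 * (MSB-first packing assumed). Returns the number of bytes consumed, or -1
 * if the data is too short or the flag is non-zero (meaning unknown). */
static int unpack_row(const uint8_t *src, size_t len, int width, int pal_size,
                      uint8_t *indices)
{
    int bpp = bits_per_index(pal_size);
    size_t row_bytes = ((size_t)width * bpp + 7) / 8;

    if (len < 1 + row_bytes || src[0] != 0)
        return -1;
    const uint8_t *row = src + 1;
    int mask = (1 << bpp) - 1;
    for (int x = 0; x < width; x++) {
        size_t bitpos = (size_t)x * bpp;
        int shift = 8 - bpp - (int)(bitpos & 7);   /* MSB-first within byte */
        indices[x] = (uint8_t)((row[bitpos >> 3] >> shift) & mask);
    }
    return (int)(1 + row_bytes);
}
```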

Feel free to complete RE.

FnAQ about G2M2/G2M3

Saturday, November 3rd, 2012

Just to clarify status: I’m not working on this anymore so anyone can pick it up and finish.

And here are some possible questions that might be asked but more probably won’t be.

Q: who cares about this codec anyway?

A: Not me. VLC does.

Q: so why don’t you do it?

A: there are several reasons. First, now I have an idea how it works, so it’s not that interesting anymore. Second, it would require some debugging and I cannot run that decoder under MPlayer2 (and I don’t use Windows at all).

Q: but wait, there’s G2M4!

A: right, and it uses completely different coding. I might look at it but no promises either.

Q: so, how does it work?

A: the idea is simple. Every frame is divided into tiles, and each tile may or may not be updated relative to the previous frame; some additional information (e.g. mouse cursor shape and position) is also stored in the frame.
G2M2/G2M3 use technology licensed from Accusoft that combines JPEG and ELS-coded images.

Q: how do they do it?

A: the approach (I call it JPEG-Binary Koder or J-BK for short) is quite simple. Every tile has an ELS-coded picture with possible transparency. Transparent areas should be replaced with headerless JPEG data (i.e. only scan data without any markers but with escapes).
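Since the JPEG data comes without markers but with escapes, a decoder first has to undo them. Assuming the escapes are the standard JPEG byte stuffing (every 0xFF in scan data followed by a 0x00), a simple pre-pass could look like this; `unescape_scan()` is a hypothetical helper, and real decoders often handle stuffing inside their bit reader instead.

```c
#include <stddef.h>
#include <stdint.h>

/* Remove JPEG byte stuffing (0xFF 0x00 -> 0xFF) from headerless scan data.
 * `dst` must hold at least `len` bytes; returns the unescaped length. */
static size_t unescape_scan(const uint8_t *src, size_t len, uint8_t *dst)
{
    size_t out = 0;
    for (size_t i = 0; i < len; i++) {
        dst[out++] = src[i];
        if (src[i] == 0xFF && i + 1 < len && src[i + 1] == 0x00)
            i++;                        /* skip the stuffed zero byte */
    }
    return out;
}
```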

Q: sounds easy, where’s the catch?

A: I’m too lazy to catch bugs in my quick JPEG decoder reimplementation, and the ELS-coded image requires some debugging which I can’t do.

Q: okay, I want to do it, where shall I start?

A: is it the first of April? No? Hmm… Okay, here’s what I would do: grab a copy of g2m.dll (there are enough of them around, in various sizes too) and disassemble it.
Find the ELS thresholds table (referenced values are 0x10000, 0x12A00, 0x15C00, 0x19600, 0x1DA00, 0x22800, 0x28500, ...); the function referencing it is the one used to update the ELS coder state, so go up from there. Feel free to look at the wiki entry about G2M. Bonne chance!