Recently Paul B. Mahol drew my attention to the fact that there are codecs with human names, so why not take a look at them?
The first disappointing thing is that you need to register in order to receive a link for downloading that software. Oh well, throwaway mail services exist and it’s not the first codec I’ve seen doing that (though I’ve forgotten its name already, like I’ll forget about this codec). Then there’s the binary size. I remember thinking that Go2Somewhere’s single 15MB .dll was too much, but this one is even larger (because they’ve decided to bundle a bunch of other decoders and demuxers instead of just one codec).
In either case, what can I say about the codec? Nothing much really. They both seem to be DCT-based intermediate codecs that group blocks into slices. Daniel2 uses larger tiles in slices, probably to accommodate the wider variety of supported chroma formats (unlike its predecessor it supports different colourspaces, chroma subsamplings and bitdepths). The claimed high (de)coding speed comes from the same approach as in VBLE (does anybody remember it? I still remember how Derek introduced its author to me at one of the VDDs). Yes, they simply store coefficients with a fixed number of bits and transmit the bit length to use for each tile.
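To make the idea concrete, here is a minimal sketch of such a VBLE-like scheme: each tile signals one bit width and all its coefficients are stored with exactly that many bits. The bit reader, the 5-bit length field and the lack of sign/prediction handling are my own assumptions for illustration, not the actual Daniel2 bitstream layout.

<pre>
#include <stdint.h>
#include <stddef.h>

typedef struct {
    const uint8_t *buf;
    size_t pos;   /* current position in bits */
    size_t size;  /* total size in bits       */
} BitReader;

/* read n bits MSB-first */
static uint32_t get_bits(BitReader *br, int n)
{
    uint32_t v = 0;
    while (n-- > 0 && br->pos < br->size) {
        v = (v << 1) | ((br->buf[br->pos >> 3] >> (7 - (br->pos & 7))) & 1);
        br->pos++;
    }
    return v;
}

/* Decode one tile: read the per-tile bit length (hypothetical 5-bit field),
 * then read every coefficient with exactly that many bits. */
static void decode_tile(BitReader *br, int16_t *coeffs, int ncoeffs)
{
    int nbits = get_bits(br, 5);
    for (int i = 0; i < ncoeffs; i++)
        coeffs[i] = (int16_t)get_bits(br, nbits);
}
</pre>

The appeal for speed is obvious: there is no per-symbol table lookup or code-length search, just fixed-width reads that vectorise or map to GPU threads trivially.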
The only really curious thing I’ve found is some combinatorial coding approach I’ve never seen anywhere else. Essentially it stores something like sums in a table, and for each value only the number of table entries involved is transmitted. The actual value is decoded as (max(table[0], ..., table[len - 1]) + min(table[0], ..., table[len - 1]) + 1) / 2 and then the decoded value is subtracted from all table elements used in the calculation. I have no idea why it’s there or what it’s good for, but it exists.
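In code the step looks something like the following rough illustration, assuming table holds the accumulated values and len is the transmitted count of entries taking part in the decode; the surrounding bitstream handling is not shown.

<pre>
/* decode one value from the first len table entries and fold it back */
static int decode_combinatorial(int *table, int len)
{
    int min = table[0], max = table[0];
    for (int i = 1; i < len; i++) {
        if (table[i] < min) min = table[i];
        if (table[i] > max) max = table[i];
    }
    int value = (max + min + 1) / 2;   /* midpoint of min and max, rounded up */
    for (int i = 0; i < len; i++)      /* subtract from the entries used */
        table[i] -= value;
    return value;
}
</pre>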
Overall, it was not a complete waste of time.
There is a Linux package available too, and a manageable decode/encode-only CUDA lib of 1.4M size. So the question is: does it use special VLC tables at all? I see some LUT tables in the PTX CUDA binaries when disassembled, so perhaps that is enough to write a decoder. I already know what some spots in the bitstream of an encoded file represent; it supports 422 and 4444 only, with bitdepth >= 8 and < 16. My main concern is that it may use some CPU code to handle the bitstream instead of fully using the GPU to produce frames. It also has some stuff to build octave segments and octave shifts, I’ve never encountered such stuff before.
I’ve not encountered any tables in it, and I took the Mac binary because it had a non-CUDA path (it seems to be limited to just the couple of formats that the CPU-only decoder supports). And it’s easier to find interesting code in it since they have left all those error messages intact.