If it looks like I’m not doing anything, that’s about right. Nevertheless, I’d like to discuss two exotic formats that I want to write decoders for.
The first one is unlike most of the video codecs I’ve seen so far. For starters, it uses fractal compression—not surprising, since it comes from Iterated Systems. And unlike the later ClearVideo, it is a real fractal codec. From what I can see, it works exactly like the textbook example of fractal compression: split the frame into small fixed-size blocks, search for a domain block for each of them, apply a simple affine transform to a scaled-down version of it plus brightness scaling, and output the result. There are additional possible operations like leaving a block unchanged or reading raw data for it. Since this works only on greyscale data, the frame is stored in YUV420 format with the planes coded sequentially. Unfortunately, since the binary specification is a mixed 16/32-bit VfW driver that Ghidra can’t decompile properly, the work on it goes at a glacial pace.
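To illustrate the textbook scheme (this is not the actual FRAC bitstream syntax—the opcode layout, parameter ranges and fixed-point scaling here are all made up), decoding one such block boils down to something like this:

```rust
// A minimal sketch of textbook fractal block decoding: copy a domain block
// twice the range block's size, downscale it 2:1, apply a simple isometry
// (only the four rotations shown) and a brightness transform.
fn apply_block(
    plane: &mut [u8], stride: usize,    // destination plane
    src: &[u8],                         // plane the domain block is taken from
    bx: usize, by: usize, bsize: usize, // range (destination) block
    dx: usize, dy: usize,               // domain block origin, twice the size
    rot: u8,                            // which isometry to apply
    scale: i32, offset: i32,            // brightness scale (1/256 units) and offset
) {
    for y in 0..bsize {
        for x in 0..bsize {
            // map the output coordinate through the selected isometry
            let (sx, sy) = match rot & 3 {
                0 => (x, y),
                1 => (bsize - 1 - y, x),             // 90 degrees
                2 => (bsize - 1 - x, bsize - 1 - y), // 180 degrees
                _ => (y, bsize - 1 - x),             // 270 degrees
            };
            // 2:1 downscale of the domain block by averaging a 2x2 area;
            // the downscale is what makes the mapping contractive
            let pos = (dy + sy * 2) * stride + dx + sx * 2;
            let avg = (i32::from(src[pos]) + i32::from(src[pos + 1])
                     + i32::from(src[pos + stride]) + i32::from(src[pos + stride + 1])) >> 2;
            let pix = (avg * scale >> 8) + offset;
            plane[(by + y) * stride + bx + x] = pix.clamp(0, 255) as u8;
        }
    }
}
```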
The other codec is like the previous one, but it has its own container format and a DOS player. It comes from TMM—not The Multimedia Mike but rather the company known for the RLE-based PH Video format. I see no mentions of Iterated Systems in the binary specification, but considering how similar this FRAC codec is to theirs (it uses the same bitstream format with the same opcode meanings—and the same assembly instructions), I expect they licensed it from Iterated Systems.
So hopefully when I actually finish it I’ll have two decoders for the price of one.
Update: while refreshing my knowledge of fractal compression, I discovered from the Wickedpedia article on it that two companies claimed to have got an exclusive license for the fractal compression algorithm from Iterated Systems—TMM and Dimension. The latter licensed it to Spectrum Holobyte to be used for FMV. And what do you know, that explains why FVF is named the way it is and why its video bitstream syntax is the same as in the other two (and the code seems to be the same too). So I guess it means I’ll have almost the same decoder (but with different containers) in NihAV, na_game_tool and na_eofdec.
Why aren’t fractals used more in codecs today? Aren’t the CPUs fast enough for it now?
Probably for the same reason as wavelets: it may give a decent-looking picture at a low bitrate, but it does not compete that well when you want more detail; and then there’s the whole question of handling motion compensation.
I suspect that they will actually converge, with motion compensation becoming a lot like fractal compression—it’s halfway there already (copying blocks is here, affine transforms are somewhat here, scaling probably comes next); the only thing that is radically different is using the reference picture as the source instead of the image being coded itself.
The Duck codecs already use the idea of a “reference picture” with their golden frames, no? Copying parts of the image is done by Interplay’s MVE codec (Fallout 1 & 2, for example), and a similar idea is used for intra coding in H.264 etc. Even PNG’s predictors are similar. Nothing new under the Sun…
Something of a cross between MVE and H.264 is not hard to imagine, possibly with an affine transform: fill in the missing pixels à la H.264 in the negative direction of the copy vector. The block size probably has to be rather big to amortize the transform bits—between 8×8 and 32×32, perhaps?
Residual detail can be added with ye olde DCT, same as with intra coding today.
Actually, AV1 has re-introduced intra block copy already, though it interferes with other tools quite a bit. ClearVideo works about the same way as you suggested: DCT-coded intra frames, and affine transforms with scaling (and nothing else) on inter frames—with quadtrees too, for better efficiency.
There’s still one thing that distinguishes them: in proper fractal compression you find a source block (which should be larger than the destination one) for each of the coded blocks and code only the transform parameters, not the image itself. You can then restore the image from e.g. a completely black frame by repeating the transformation operations on it several times until the image converges (this reminded me of ray tracing for some reason).
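In pseudo-Rust the decoding loop is essentially this (`Transform` and `apply()` are placeholders for whatever parameters the bitstream actually codes):

```rust
// Iterative fractal decoding: start from an arbitrary image (here, black)
// and keep applying the whole set of coded block transforms until the
// image stops changing.
struct Transform { /* domain position, isometry, contrast, brightness, ... */ }

fn apply(dst: &mut [u8], src: &[u8], xform: &Transform) { /* as sketched above */ }

fn decode_frame(transforms: &[Transform], width: usize, height: usize) -> Vec<u8> {
    let mut cur  = vec![0u8; width * height]; // any starting image will do
    let mut next = vec![0u8; width * height];
    for _ in 0..16 { // contractive maps converge geometrically, so a few passes suffice
        for xform in transforms {
            apply(&mut next, &cur, xform);
        }
        if next == cur { break; } // converged
        std::mem::swap(&mut cur, &mut next);
    }
    cur
}
```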
Obviously, back in the day the search for the parameters was prohibitively expensive, as was iterating the picture several times for decompression, so they had to resort to various tricks to cut the time down to somewhat acceptable levels (combinatorial explosion is no joke). Legend says that the first practical algorithm was “put a student at a graphics workstation and make him find the appropriate transforms”, while a student, maybe the one from that algorithm, proposed the second one: “let’s just limit ourselves to square tiles of a fixed size with 2:1 scaling and 90-degree rotations and let the graphics workstation simply iterate over all possible combinations”.
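That second approach maps to a very simple (if very slow) exhaustive search: for every candidate domain block and every orientation, fit the contrast and brightness by least squares and keep the best match. The inner step is plain linear regression—a sketch with an invented `fit_block` helper:

```rust
// Least-squares fit of contrast `s` and brightness `o` so that
// s * domain + o approximates the range block; returns (s, o, error).
// The encoder runs this for every domain position and every orientation
// and keeps the candidate with the smallest error.
fn fit_block(range: &[f32], domain: &[f32]) -> (f32, f32, f32) {
    let n = range.len() as f32;
    let sum_r: f32 = range.iter().sum();
    let sum_d: f32 = domain.iter().sum();
    let sum_dd: f32 = domain.iter().map(|d| d * d).sum();
    let sum_dr: f32 = domain.iter().zip(range).map(|(d, r)| d * r).sum();
    let denom = n * sum_dd - sum_d * sum_d;
    let s = if denom != 0.0 { (n * sum_dr - sum_d * sum_r) / denom } else { 0.0 };
    let o = (sum_r - s * sum_d) / n;
    let err: f32 = domain.iter().zip(range)
        .map(|(d, r)| { let e = s * d + o - r; e * e })
        .sum();
    (s, o, err)
}
```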
Apparently all the tricks were not enough to make it compete with DCT-based codecs, and it was forgotten. Some elements of it have been re-discovered and may yet be used, but I doubt that fractal compression in its pure form will ever be used again. Maybe as an enhancement-layer technology for LCEVC, but that’s it.
It can start from black because of the contrast and brightness transform at each iteration, right?
Personally, I’m betting on KLT making a comeback. DCT is popular because it’s cheap to compute and a good general-purpose transform that tends to concentrate most of the energy in a few coefficients. But a proper KLT, including one applied on the color components (rather than relying on YUV), should be quite doable on modern hardware. If you allow picking the transform on a per-block basis then it would also have shades of VQ. One could go really crazy and allow combining bases, like “50% DCT + 50% Hadamard on this block, please”.
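For what it’s worth, deriving a KLT basis from sample blocks is simple enough with modern libraries—a sketch using nalgebra’s symmetric eigendecomposition (the `klt_basis` helper and its calling convention are invented; note that nalgebra does not sort eigenvectors by eigenvalue, which a real coder would want to do):

```rust
use nalgebra::DMatrix;

// The KLT basis is the set of eigenvectors of the covariance matrix of
// the sample data: it optimally compacts energy for exactly that data.
fn klt_basis(blocks: &[Vec<f64>], n: usize) -> DMatrix<f64> {
    // mean vector of all sample blocks
    let mut mean = vec![0.0; n];
    for b in blocks {
        for (m, &v) in mean.iter_mut().zip(b) { *m += v; }
    }
    for m in mean.iter_mut() { *m /= blocks.len() as f64; }
    // covariance matrix
    let mut cov = DMatrix::<f64>::zeros(n, n);
    for b in blocks {
        for i in 0..n {
            for j in 0..n {
                cov[(i, j)] += (b[i] - mean[i]) * (b[j] - mean[j]);
            }
        }
    }
    cov /= blocks.len() as f64;
    // eigenvectors of the covariance matrix form the transform basis
    cov.symmetric_eigen().eigenvectors
}
```

The catch, as the reply below notes, is that the decoder has to learn this basis somehow too.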
It can start from anything in theory as the operations will make it converge eventually (for the reasons you stated). It may just take more iterations at that.
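To spell the argument out (a standard contraction-mapping argument, not anything codec-specific): the brightness offset cancels when you compare two different starting images, so only the contrast factors matter, and as long as every block’s contrast factor is below one in magnitude the whole frame update is a contraction in the maximum norm:

```latex
\| T(x) - T(y) \|_\infty \le s_{\max} \, \| x - y \|_\infty ,
\qquad s_{\max} = \max_k |s_k| < 1
```

so by the Banach fixed-point theorem the iteration converges from any starting image to the same unique fixed point, at a geometric rate.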
As for KLT, it poses a metadata problem (i.e. would the savings from using the ideal transform outweigh the bits required to transmit that transform’s coefficients?). Maybe it will catch on again, maybe some hybrid scheme combining different transform bases (like HVQM on consoles) will re-appear—who knows. Indeo 4/5 are famous for allowing different tiles to use different transforms (of different sizes and types—Haar, slant, even DCT in theory).
Component decorrelation is almost there though.