As mentioned in the previous post, Indeo 3 splits a frame into cells using binary trees, and those cells are coded using one of several possible modes. In reality it's more complex than that: there's a primary tree that splits the frame into regions and tells how to code them (intra or inter), and those regions themselves can be split with another binary tree that tells which coding method to use for each cell (or whether to skip decoding it entirely). See, it had tree coding, prediction units and coding units two decades before H.265! And slices as well, since it divides the data into strips 160 pixels wide too.
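To make the layout more concrete, here's a rough sketch (in Rust) of how those two trees could be modelled; the type and field names are my own invention and this is not how the actual bitstream stores anything:

```rust
// A rough model of the two-level cell tree (names are mine, this is
// not how the actual bitstream represents it).

// Primary tree: splits the frame into regions and assigns intra or
// inter coding to each of them (the real format also attaches motion
// information to inter regions, omitted here for brevity).
enum PrimaryNode {
    Split { horizontal: bool, children: [Box<PrimaryNode>; 2] },
    Intra(SecondaryNode),
    Inter(SecondaryNode),
}

// Secondary tree: splits a region further and selects the coding
// method for each final cell (or skips decoding it entirely).
enum SecondaryNode {
    Split { horizontal: bool, children: [Box<SecondaryNode>; 2] },
    Coded { mode: u8 }, // one of the several possible coding modes
    Skip,
}
```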
Splitting the frame optimally is a practically impossible task (because of its combinatorial complexity). In reality though it's much simpler: first we split the plane into 160-pixel wide strips (or 40-pixel wide ones for chroma), then we split those strips along their largest dimension until we get cells no larger than the maximum acceptable size (which seems to be 767 pixels, though the encoder appears to handle up to 2048 pixels in a coded cell). After that it's up to the secondary cell coding.
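In code this stage could look something like the following sketch. It is simplified: the constants come from my observations above, real cells have alignment requirements that I'm glossing over, and halving at the midpoint is my own simplification rather than the reference encoder's exact split point selection.

```rust
// Simplified sketch of the initial splitting stage: cut the plane
// into fixed-width strips, then halve each strip along its largest
// dimension until every cell is small enough.
const STRIP_WIDTH: usize = 160;   // 40 for chroma planes
const MAX_CELL_SIZE: usize = 767; // maximum pixels per coded cell

#[derive(Clone, Copy, Debug)]
struct Cell { x: usize, y: usize, w: usize, h: usize }

fn split_cell(cell: Cell, out: &mut Vec<Cell>) {
    if cell.w * cell.h <= MAX_CELL_SIZE {
        out.push(cell);
    } else if cell.w >= cell.h {
        // split along the width, it is the largest dimension
        let w1 = cell.w / 2;
        split_cell(Cell { w: w1, ..cell }, out);
        split_cell(Cell { x: cell.x + w1, w: cell.w - w1, ..cell }, out);
    } else {
        // otherwise split along the height
        let h1 = cell.h / 2;
        split_cell(Cell { h: h1, ..cell }, out);
        split_cell(Cell { y: cell.y + h1, h: cell.h - h1, ..cell }, out);
    }
}

fn split_plane(width: usize, height: usize) -> Vec<Cell> {
    let mut cells = Vec::new();
    let mut x = 0;
    while x < width {
        let w = STRIP_WIDTH.min(width - x);
        split_cell(Cell { x, y: 0, w, h: height }, &mut cells);
        x += STRIP_WIDTH;
    }
    cells
}
```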
From what I could gather, the reference encoder also tries to split secondary cells when they're above the size limit, but it uses the same limit value for them as for the primary cells, even though it could have been set separately.
Since my goal is to learn something new instead of re-creating something existing, I use a different approach: the initial mode is selected based on the relation between horizontal and vertical differences in the cell (and if both are too high, I try to split the cell and decide again). Similarly, for inter mode I first check whether the cell can be coded as inter (or whether splitting it will make at least one of the sub-cells codable as inter), and if not I resort to intra coding.
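Expressed as code, the decision logic is roughly the following. This is a condensed sketch: the thresholds and helper names are illustrative placeholders, not my actual tuned values.

```rust
// Condensed sketch of the mode decision heuristic described above;
// thresholds are hypothetical, not real tuned values.

#[derive(Clone, Copy, Debug)]
enum CellMode { Intra, Inter, Split }

const DIFF_THRESHOLD: u32 = 8;  // hypothetical per-pixel detail limit
const INTER_THRESHOLD: u32 = 4; // hypothetical inter-coding limit

// Average absolute differences between horizontal and vertical
// neighbours inside the cell (a crude detail measure).
fn avg_diffs(data: &[u8], stride: usize, w: usize, h: usize) -> (u32, u32) {
    let (mut hsum, mut vsum) = (0u32, 0u32);
    for y in 0..h {
        for x in 0..w {
            let cur = u32::from(data[y * stride + x]);
            if x + 1 < w {
                hsum += cur.abs_diff(u32::from(data[y * stride + x + 1]));
            }
            if y + 1 < h {
                vsum += cur.abs_diff(u32::from(data[(y + 1) * stride + x]));
            }
        }
    }
    let npix = (w * h) as u32;
    (hsum / npix, vsum / npix)
}

// Intra decision: if both directions show too much detail, request a
// split (when still allowed) and decide again on the sub-cells.
fn decide_intra(data: &[u8], stride: usize, w: usize, h: usize,
                can_split: bool) -> CellMode {
    let (hd, vd) = avg_diffs(data, stride, w, h);
    if hd > DIFF_THRESHOLD && vd > DIFF_THRESHOLD && can_split {
        CellMode::Split
    } else {
        CellMode::Intra
    }
}

// Inter decision: code as inter if the cell is close enough to the
// previous frame, otherwise try splitting (the caller would check
// whether at least one sub-cell turns out inter-codable and fall
// back to intra coding otherwise).
fn decide_inter(cur: &[u8], prev: &[u8], stride: usize,
                w: usize, h: usize, can_split: bool) -> CellMode {
    let mut sum = 0u32;
    for y in 0..h {
        for x in 0..w {
            sum += u32::from(cur[y * stride + x])
                .abs_diff(u32::from(prev[y * stride + x]));
        }
    }
    if sum / ((w * h) as u32) <= INTER_THRESHOLD {
        CellMode::Inter
    } else if can_split {
        CellMode::Split
    } else {
        decide_intra(cur, stride, w, h, false)
    }
}
```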
There is probably a better way than brute force to find the optimal splitting, but for lack of one a simple heuristic should do.
Cell coding mode and codebook selection are topics best left for the next time.
I can’t shake the feeling that AI/ML could help with the combinatorial decisions. But that’s probably just me falling for the hype. 🙂
Well, yes and no: some things there are done by a similar process (analysing statistics gathered over a huge set of encoded data), but with real intelligence substituted at the last stage (so that they could use some simpler rule with magic numbers instead of having to carry a huge model along).
And the result of the AI/ML work will also be a set of magic numbers, albeit a slightly larger one 😉