Before moving on to improving the parts of NihAV not related to decoding, I decided to implement some small family of formats, and I picked VivoActive since somebody complained that some of it was unsupported.
This family consists of one custom container format and three codecs based on ITU standards. The container format is simple, intended for just one video and one audio stream, with video frames most likely split into 128-byte chunks (probably for better streaming). The only interesting thing is that it stores the header in text form, which is too flexible compared to the rest of the format.
The first audio codec is ITU G.723.1, and it was painful to implement. As a proper speech codec it has a lot of proper speech codec math like “multiply 32-bit value A by 16-bit value B and shift result by 15 bits”, which requires explicit casts in Rust. On the other hoof, Rust has saturating_add() and friends, which help in many other cases. There are places where functions take the same data as input and output, while in other places the same functions have different input and output arrays. Plus I wanted to have a slightly better design structure, so there are functions inherent to subframes, some functions belong to the decoder instance, and some are used by both. And then I had to debug it. To put it in perspective, the G.723.1 decoder takes 110 kB in source form and the code part is 37 kB; for Siren the numbers are 45 kB and 15 kB respectively; the Vivo video decoder is merely 19 kB because most of the decoding is done by the base H.263 decoder in nihav-codec-support.
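To illustrate the kind of arithmetic mentioned above, here is a minimal sketch of a “multiply 32-bit by 16-bit and shift by 15” helper together with a saturating accumulation. The function names are made up for this example and are not taken from the NihAV sources; the clamping of the result is also my assumption about how an overflowing corner case might be handled.

```rust
/// Hypothetical helper: multiplies a 32-bit value by a 16-bit value
/// and returns the product shifted right by 15 bits.
fn mul32_16_shift15(a: i32, b: i16) -> i32 {
    // The explicit widening casts are what Rust demands here:
    // the intermediate product needs up to 47 bits.
    let prod = i64::from(a) * i64::from(b);
    // Arithmetic shift, then clamp the single overflowing corner case
    // (i32::MIN * i16::MIN) back into the 32-bit range.
    (prod >> 15).clamp(i64::from(i32::MIN), i64::from(i32::MAX)) as i32
}

/// Hypothetical helper: accumulates a term without wrapping around.
fn accumulate(acc: i32, term: i32) -> i32 {
    acc.saturating_add(term)
}
```

With these, something like accumulate(sum, mul32_16_shift15(sample, coef)) stays within 32 bits without a single unchecked overflow, at the cost of the casts being spelled out.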
Siren (or more officially Polycom Siren 7) is a codec that served as a base for ITU G.722.1. Since RealAudio Cook is based on G.722.1 and I’ve written a decoder for it already, this one was quite easy to implement, especially considering that some guy wrote an open-source decoder and encoder for it back in the early 2000s. Also this might be the case where having a 5*2^N FFT finally paid off: Siren frames are 320 samples long, so I can still use my standard IMDCT implementation here (it outputs samples in reverse order but that’s no problem).
And finally Vivo Video. It’s yet another codec based on H.263 (but with slightly different headers) and notable mostly for how it represents codebooks. The codebooks are stored as a single set (but not in order, e.g. codebook definition number two is used for codebook number fourteen), and each codebook can represent codes up to eight bits long (for longer codes there is an escape prefix, which means that e.g. codes starting with 0000 10 have their tails defined in another codebook set). Another interesting feature is that the codes are stored as text strings with ones, zeroes, and spaces (yes, the decoder parses them to get the actual code). Additionally it has a weird decoding mode where you keep a state ID, there’s a special table to map it to the actual codebook number, and the codebook tells you how to change the state ID when you have decoded a new code. This mode can be used to decode the whole stream or just macroblock coefficients.
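Parsing codes that are stored as text can be sketched roughly like this. It is only an illustration under my own assumptions: the Code type and parse_code() are hypothetical names, spaces are treated as separators to skip, and the 16-bit limit is arbitrary; the actual string layout in the Vivo decoder may differ.

```rust
/// One parsed variable-length code: its bit pattern and its length in bits.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
struct Code {
    bits: u16,
    len: u8,
}

/// Parses a code written as text, e.g. "0000 10".
/// Spaces are skipped (an assumption of this sketch).
/// Returns None for an empty code, an unexpected character,
/// or a code longer than 16 bits.
fn parse_code(text: &str) -> Option<Code> {
    let mut bits = 0u16;
    let mut len = 0u8;
    for ch in text.chars() {
        match ch {
            '0' | '1' => {
                if len == 16 {
                    return None;
                }
                // Append the new bit at the least significant end.
                bits = (bits << 1) | u16::from(ch == '1');
                len += 1;
            }
            ' ' => {}
            _ => return None,
        }
    }
    if len == 0 {
        None
    } else {
        Some(Code { bits, len })
    }
}
```

For example, parse_code("0000 10") gives a six-bit code with the value 2, which is the kind of (bits, length) pair a codebook builder usually wants as input.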
As for the codec itself, there are two flavours of it: Vivo/1.0 (or Vivo/0.90) and Vivo/2.0. The first version is plain H.263 that does not use any special features, while the second version has PB-frames (i.e. frames where B-frame macroblock data is stored together with P-frame macroblock data) and employs AIC (advanced intra coding mode). It’s probably the only codec I’ve seen that actually has AIC in P-frames and not just in I-frames. Because of this AIC mode the reconstruction of P-frames is not perfect, but as with the G.723.1 decoder it’s good enough to demonstrate that it works and I don’t want to waste more time on it.
All in all it was a meh-y experiment with mediocre results and I should move on.