Since I wanted to try my hoof at various encoding concepts, it’s no wonder that after a lossy audio encoder (IMA ADPCM with trellis encoding), a lossless video encoder (ZMBV, using my own deflate implementation for compression) and lossy video encoders (Cinepak, for playing with vector quantisation, and VP6, for playing with many other concepts) it was time for a lossless audio encoder.
To remind you, there are essentially two kinds of lossless audio compressors: fast asymmetric ones (based on fixed LPC filters, where the encoder does most of the work) and slow symmetric ones (based on adaptive filters, usually long LMS ones, where the encoder and decoder do the same work). The theory behind both is rather simple and described below.
Adaptive filters are called so because they’re updated after each processed sample, depending on the error between the actual and the predicted sample value. Those filters tend to be long (lengths from 16 to 1280 taps are seen in the wild) and often there are several layers of them to make the prediction even better. So the encoder does the same thing as the decoder: it filters input samples, updates the filter coefficients and codes the prediction errors. Thus an encoder for such a format is not that hard to implement, but it’s rather boring.
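To give an idea, here is a minimal sketch of such a predictor (a hypothetical sign-sign LMS filter of my own making, not taken from any real codec; the 16-bit fixed-point scale and the update step are arbitrary choices for illustration):

```rust
// A hypothetical sign-sign LMS predictor: predict the next sample from the
// last `order` samples, then nudge each coefficient towards reducing the
// error. The 16-bit fixed-point scale and the step `mu` are arbitrary here.
struct LMSFilter {
    coeffs:  Vec<i32>,
    history: Vec<i32>,
    mu:      i32, // adaptation step
}

impl LMSFilter {
    fn new(order: usize, mu: i32) -> Self {
        Self { coeffs: vec![0; order], history: vec![0; order], mu }
    }
    // Returns the prediction and updates the filter state from the real
    // sample, exactly as the decoder would do with the restored sample.
    fn predict_and_update(&mut self, sample: i32) -> i32 {
        let pred = (self.coeffs.iter().zip(self.history.iter())
            .map(|(&c, &h)| i64::from(c) * i64::from(h))
            .sum::<i64>() >> 16) as i32;
        let err = sample - pred;
        // sign-sign update: cheap, needs no multiplications by the error
        for (c, &h) in self.coeffs.iter_mut().zip(self.history.iter()) {
            *c += self.mu * err.signum() * h.signum();
        }
        // shift the new sample into the history
        self.history.rotate_right(1);
        self.history[0] = sample;
        pred
    }
}
```

The encoder codes `err` as the residue; the decoder computes the same prediction, adds the decoded residue to restore the sample and performs exactly the same update, so both stay in sync.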
The approach with fixed filters is different: you take a rather short block of data (usually several thousand samples; adaptive filters work on tens of thousands of samples or more, since they adapt to the data better over a long run), calculate a short filter (the usual LPC order is 6-32), apply it to the data and code the remaining residues. During encoding you can spend a lot of time selecting the optimal filter order; for decoding you simply read the filter coefficients and apply the filter to the decoded residues. For a long time though I had a question: how is that filter calculated (and why that way)?
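As an illustration of the decoder-side simplicity (the encoder side is just its mirror image), here is a sketch of the residue calculation, assuming FLAC-style integer coefficients with a right shift (the function name and conventions are mine):

```rust
// A sketch of turning a block into residues with a short fixed-point LPC
// filter (FLAC-style: integer coefficients plus a right shift). The first
// `order` samples have no history, so they are passed through verbatim.
fn lpc_residues(samples: &[i32], coeffs: &[i32], shift: u8) -> Vec<i32> {
    let order = coeffs.len();
    let mut res = samples[..order].to_vec();
    for i in order..samples.len() {
        let pred = coeffs.iter().enumerate()
            .map(|(j, &c)| i64::from(c) * i64::from(samples[i - 1 - j]))
            .sum::<i64>() >> shift;
        res.push(samples[i] - pred as i32);
    }
    res
}
```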
I don’t know where the actual answer is written, so here’s how I understand it: the input data is arbitrarily long while the filter is short, so we need some metric to map one onto the other. For that, correlation is used (essentially a sum of samples multiplied by the same samples shifted by several positions). The filter is then the set of coefficients that, when multiplied by a run of consecutive correlation values, gives the next correlation value. This can be expressed in matrix form, and luckily the matrix of correlation values has a special structure (it is a so-called Toeplitz matrix), so the equation can be solved by several methods, with the Levinson-Durbin recursion and Cholesky decomposition being the most famous ones. I chose the Levinson-Durbin method as it’s the simplest (and it seems to be the fastest for the usual lossless audio filter sizes).
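In code, the correlation part may look roughly like this (a naive floating-point autocorrelation; real encoders usually apply a window to the block first):

```rust
// Autocorrelation of a block: autoc[lag] = sum of samples[i] * samples[i + lag].
// These values fill both the Toeplitz matrix and the right-hand side of the
// normal equations: sum over j of coef[j] * autoc[|i - j|] = autoc[i + 1].
fn autocorrelation(samples: &[f64], max_lag: usize) -> Vec<f64> {
    (0..=max_lag).map(|lag| {
        samples.iter().zip(samples[lag..].iter())
            .map(|(&a, &b)| a * b)
            .sum()
    }).collect()
}
```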
So I read the method description in Wikipedia and implemented it after that description. After I’d finally managed to make it work as intended, I discovered that since the matrix is symmetric, I can use just the backward vectors, as the forward vectors are merely mirrored versions of them.
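Here is roughly what that simplified recursion looks like (a textbook-style floating-point sketch for the symmetric Toeplitz case, written from the same description; no guard against the prediction error reaching zero, which a real encoder should have):

```rust
// A sketch of the Levinson-Durbin recursion for the symmetric Toeplitz case:
// solves the normal equations from the autocorrelation values directly,
// without ever building the matrix.
fn levinson(autoc: &[f64], order: usize) -> Vec<f64> {
    let mut err = autoc[0];
    let mut lpc = vec![0.0; order];
    for i in 0..order {
        // reflection coefficient for this step
        let mut acc = autoc[i + 1];
        for j in 0..i {
            acc -= lpc[j] * autoc[i - j];
        }
        let k = acc / err;
        // extend the filter by one coefficient and update the old ones
        // in place, using the mirrored (backward) vector trick
        lpc[i] = k;
        for j in 0..i / 2 {
            let tmp = lpc[j];
            lpc[j] -= k * lpc[i - 1 - j];
            lpc[i - 1 - j] -= k * tmp;
        }
        if i % 2 == 1 {
            lpc[i / 2] -= k * lpc[i / 2];
        }
        err *= 1.0 - k * k;
    }
    lpc
}
```

Feeding it the autocorrelation values from the previous sketch gives the filter coefficients directly, with the mirrored update done in place.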
And with a new routine to calculate an LPC filter at my disposal, I decided to try it out by writing some simple lossless audio encoder. FLAC was a passable candidate (not the best format, but overall the codec is simple and I’ve written a decoder for it already). And, after a lot of debugging, it works as expected (i.e. it compresses data with a ratio comparable to the official FLAC encoder at a not very high compression level).
Of course it’s far from being on par with any serious FLAC encoder (and IIRC nothing beats flake in terms of compression ratio), but my goal was to learn how LPC calculation works and to test it in some lossless audio codec, and that goal has been accomplished. Now I should move on to something else. For instance, The Mike has asked me to look at some RoQ files…
Write wavpack encoder.
I considered doing that, but WavPack is based on adaptive filters, so it’s not that good for my purposes.