A bit more

August 8th, 2008

With the low-pass filter in place, my AAC encoder is more or less feature-complete. Of course there’s still room for improvement, but it’s pretty fine now. I’d like to submit it for review, but it depends on some parts of the AAC decoder, which is itself still under review :-(. So I don’t have much to do until then.

So I switched to the last GSoC task and hacked again at the RV40 loop filter. Well, the filter invoking pattern is almost there and I’ve fixed several bugs in the actual filtering code. But it’s not there yet. Maybe in a month it will be, if the AAC encoder doesn’t take all my time again.

News + Extra

August 3rd, 2008

AAC front: to compete with other encoders I have to implement a low-pass filter. Benjamin suggested a Butterworth filter, so I will try it next week. Hopefully that will be the last big feature to do.
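For reference, here is a minimal sketch of what such a filter could look like: a second-order Butterworth low-pass designed with the bilinear transform and applied as a direct-form biquad. The function names, the filter order and the 16 kHz cutoff used in the usage note are all assumptions made for illustration, not what the encoder actually uses.

```python
import math


def butterworth_lowpass_coeffs(cutoff_hz, sample_rate_hz):
    """Second-order Butterworth low-pass via the bilinear transform.

    Returns (b, a) with a[0] normalised to 1. The order (2) is an
    assumption made for this sketch.
    """
    if not 0.0 < cutoff_hz < sample_rate_hz / 2.0:
        raise ValueError("cutoff must lie between 0 and Nyquist")
    # Pre-warped analogue cutoff.
    k = math.tan(math.pi * cutoff_hz / sample_rate_hz)
    norm = 1.0 / (1.0 + math.sqrt(2.0) * k + k * k)
    b0 = k * k * norm
    b1 = 2.0 * b0
    b2 = b0
    a1 = 2.0 * (k * k - 1.0) * norm
    a2 = (1.0 - math.sqrt(2.0) * k + k * k) * norm
    return (b0, b1, b2), (1.0, a1, a2)


def filter_samples(samples, b, a):
    """Apply the biquad to a sequence of samples (direct form I)."""
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x0 in samples:
        y0 = b[0] * x0 + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        out.append(y0)
        x2, x1 = x1, x0
        y2, y1 = y1, y0
    return out
```

Calling `butterworth_lowpass_coeffs(16000.0, 44100.0)` and feeding the result to `filter_samples` would cut everything above roughly 16 kHz before the samples reach the encoder; a steeper roll-off would need a higher order, for instance several such sections in cascade.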

RV front: it looks like the deblocking pattern is generated by comparing motion vectors: if the difference for subblocks is greater than 3, then the edge between them is scheduled for loop filtering. Don’t expect a working loop filter implementation too soon though; I still have to deal with the AAC encoder and it’s more important.
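A rough sketch of that motion vector rule follows. Only the threshold of 3 comes from the observation above; the 4x4 subblock grid, the per-component comparison, the bit layout and all the names are assumptions made for illustration.

```python
MV_THRESHOLD = 3  # threshold from the post; everything else is assumed


def mv_differs(mv_a, mv_b, threshold=MV_THRESHOLD):
    """True if either component differs by more than the threshold."""
    return (abs(mv_a[0] - mv_b[0]) > threshold
            or abs(mv_a[1] - mv_b[1]) > threshold)


def deblock_edges(mvs):
    """Build edge masks for a grid of subblock motion vectors.

    mvs is a list of rows of (x, y) tuples. Returns (left_mask, top_mask),
    where bit row * cols + col is set when the left (or top) edge of that
    subblock is scheduled for filtering. Only edges inside the grid are
    checked in this sketch.
    """
    rows, cols = len(mvs), len(mvs[0])
    left_mask = 0
    top_mask = 0
    for r in range(rows):
        for c in range(cols):
            bit = 1 << (r * cols + c)
            if c > 0 and mv_differs(mvs[r][c], mvs[r][c - 1]):
                left_mask |= bit
            if r > 0 and mv_differs(mvs[r][c], mvs[r - 1][c]):
                top_mask |= bit
    return left_mask, top_mask
```

For a macroblock whose left and right halves move differently, only the vertical edge down the middle ends up in `left_mask`; a real decoder would also have to compare against neighbouring macroblocks.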

Extra: I’ve finally decided to buy an ASUS Eee. It was an easy thing to do: there’s only one model (Eee 701 4G with Win XP installed) for about the same price of four hundred bucks (maybe $450 in greedy shops). So the first thing I did with it was installing Linux and tearing off that stupid “Designed for Windows” label (which was a surprisingly easy thing to do and left no marks on the laptop surface).

Now here are my complaints about Ubuntu Eee (I don’t have a USB DVD drive and Xandros hasn’t worked from a USB flash drive for me): it requires some hacking of the system configuration to make it work (like shutting down properly), which I can live with, but the braindead thing is that gcc is installed (why?) without any development headers or libraries, so you can’t compile even a “Hello, world!” program. Both of those issues are resolved now, so I just need to make this toy more useful to me 🙂

AAC: encoder progress

August 1st, 2008

If there are still people interested in my work on AAC encoder (I doubt though), here is a status report for you.

Tasks completed:

  • Make encoder provide lookahead samples
  • Make psy model use these samples to produce block switching decision
  • M/S case handling (at least I hope so)
  • Multichannel coding (from mono to 5.1, seems to work fine)
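As a reminder of what the M/S item involves, here is a minimal sketch of the mid/side transform applied to one band of spectral coefficients and its inverse. It assumes the usual convention M = (L + R) / 2, S = (L - R) / 2; the function names are made up, and the per-band decision of when to use M/S is not shown.

```python
def ms_encode(left, right):
    """Convert left/right coefficients to mid/side."""
    mid = [(l + r) / 2.0 for l, r in zip(left, right)]
    side = [(l - r) / 2.0 for l, r in zip(left, right)]
    return mid, side


def ms_decode(mid, side):
    """Inverse transform: recover left/right from mid/side."""
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right
```

When both channels are nearly identical the side signal is close to zero and costs very few bits, which is the whole point of the mode.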

So there are only two bits left: tune model to produce better sound and commit that all to FFmpeg SVN.

The problem with current implementation is that it follows given bitrate too closely, so some frames may be coded too badly resulting in audible artifacts. I will add ABR mode and quality-based mode to make it better. Another issue is how to use codec options to turn them on, so Robert Swain can add profiles for them ;).

In related news: AAC decoder is slowly migrating to FFmpeg SVN, so I should be able to submit my encoder for review and inclusion as well.

A bit of AAC news

July 20th, 2008

I’ve not abandoned work on AAC encoder completely.
For example, I think my psy model now calculates and handles bitrate better. Now the only goals I’ve set for myself are:

  • Make encoder provide psy model lookahead samples for block switching
  • Finetune psy model, or more specifically:
    • block switching decision
    • finish M/S case handling (for now it does not update all psy model information, which results in artifacts on several test samples)
    • adjust scalefactors to reduce quantisation distortion
    • some other tricks from 3GPP TS26.403?

After that there are several ways to go: improve quality until it beats other encoders (at least libfaac), implement SBR/PS encoding (the latter is easier to do), introduce multichannel coding.

I suspect that at least one person will suggest doing it all.

Turtles All the Way

July 20th, 2008

Just when I thought I’d fully understood the RV40 loop filter. It uses both the coded block pattern and some other pattern. I thought it was the CBP from the previous frame, but it turned out to be a special deblocking pattern calculated for each block in interframes after decoding that block. That calculation is easy: it just selects a set of subblocks to check, compares some values and, if the difference is less than 3, sets a bit in the deblock pattern. Now the only thing left is to find out where those values come from.

Again and again on RV40 loop filter

July 15th, 2008

I’ve mostly understood how RV40 loop filter works.

Just not to forget main principles I document it here (this blog was created for such things after all).

  • CBPs from left, top and bottom neighbours are used in filter, and if frame type is interframe then CBPs for those blocks in that reference frame are used as well.
  • There are two actual filter types – weak and strong, both are described in H.264 drafts.
  • Edges in a subblock are filtered in the following order: bottom, left, top. The top edge is filtered only for the subblocks in the first row.
  • There are many filter parameters passed: dither argument (for strong filter, depends on subblock position), two thresholds taken from ClipTable, threshold taken from alpha_tab, threshold taken from beta_tab and the same value multiplied by 3 or 4 (four is for Y plane filters in not extremely big pictures).
  • The problem was to determine what ClipTable parameters should be used, as it has an additional dimension, more on it below.

There are seven values taken from ClipTable total:

  1. ClipTable[0][current block quantiser]
  2. ClipTable[2][current block quantiser]
  3. ClipTable[2][global quantiser set in header]
  4. ClipTable[x][current block quantiser]
  5. ClipTable[x][top neighbour quantiser]
  6. ClipTable[x][left neighbour quantiser]
  7. ClipTable[x][bottom neighbour quantiser]

That x value is 2 for the intra block types and P-frame interblock with DCs coded separately, 1 otherwise.
As I understand it, ClipTable[x][current block quantiser] is used by default and the other values are used for corner cases (the subblock on the other side of the edge is uncoded, belongs to another macroblock or does not exist at all).
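To keep the seven lookups straight, here they are as a small sketch. The table is passed in as a plain list of rows indexed by quantiser; the function and parameter names are hypothetical and the real ClipTable contents are not reproduced.

```python
def clip_row(is_intra, is_p_inter_with_separate_dc):
    """Pick the extra ClipTable dimension x described above."""
    return 2 if (is_intra or is_p_inter_with_separate_dc) else 1


def clip_lookups(clip_table, x, q_cur, q_global, q_top, q_left, q_bottom):
    """Return the seven ClipTable values in the order listed above."""
    return [
        clip_table[0][q_cur],     # 1. row 0, current block quantiser
        clip_table[2][q_cur],     # 2. row 2, current block quantiser
        clip_table[2][q_global],  # 3. row 2, global quantiser from header
        clip_table[x][q_cur],     # 4. row x, current block (the default)
        clip_table[x][q_top],     # 5. row x, top neighbour
        clip_table[x][q_left],    # 6. row x, left neighbour
        clip_table[x][q_bottom],  # 7. row x, bottom neighbour
    ]
```

Which of the last four values is used for a given edge depends on the corner cases mentioned above, and that selection is not modelled here.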

I should look at H.264 loop filter description (thanks to all who sent me the pointers to the book by Iain Richardson), it seems suspiciously similar.

Troubles with Psy Model

July 14th, 2008

I’m battling with psychoacoustic model. 3GPP TS26.403 seemed clear at the start but then problems have begun.

The main problem is perceptual entropy estimation. Since MDCT coefficients differ by magnitude in FFmpeg and 3GPP, perceptual entropy estimated by formulae from 3GPP differs much from actual number of bits to code. Also 3GPP encoder always makes scalefactor lie in range 104…164, beats me why (and its center does not correspond to scale = 1.0 either).
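To put numbers on that scalefactor range, here is a tiny sketch that assumes the usual AAC convention where a scalefactor sf maps to a gain of 2^((sf - 100) / 4). The offset of 100 is that assumption; the scaling inside the FFmpeg code may differ, as noted above.

```python
SF_OFFSET = 100  # assumed: the standard AAC scalefactor offset


def scalefactor_gain(sf, offset=SF_OFFSET):
    """Gain for a scalefactor under the 2^((sf - offset) / 4) convention."""
    return 2.0 ** (0.25 * (sf - offset))
```

Under that assumption the range 104…164 corresponds to gains from 2 to 65536, and its centre (134) sits at about 362 rather than at 1.0.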

I have to investigate further before continuing work on psy model. I also hope to see AAC decoder in FFmpeg SVN soon and push my work on encoder there as well.

Again on RV40 loop filter

July 10th, 2008

While work on AAC encoder is slowly progressing (now it’s mostly psychoacoustics left to do and maybe HE-AAC if somebody will convince me), I’m looking at side tasks to make my life a bit more colourful.
For now those tasks are writing SSE2 optimization for Monkey’s Audio decoder (and that is the first piece of SIMD assembly I’ve ever written) and working on RV40 loop filter.
To give people false hope, it’s more understandable by now. Only one function argument is not obvious. And Dark Shikari, you were wrong: RV40 is 99.5% like the H.264 draft (not the 99% you said), as the loop filter is suspiciously similar to the H.264 one.

AAC: Nachrichten pro Woche

July 5th, 2008

Here is this week’s portion of AAC-related news:

  • I was working on psychoacoustic model and fixes for it. Now encoder should always produce correct files (i.e. decodable without bitstream errors). Sound quality may be low though.
  • There was a bug in MDCT calculation which resulted in wrong spectrum.
  • My test device for AAC has broken 🙁 Where can I find a decent pair of headphones that won’t break that easily? Especially in this country.

And just in case my mentor’s reading this, here are my plans:

  • Improve and finish 3GPP TS26.403-based psychoacoustic model.
  • Implement block switching.
  • Add sine windows.
  • Sync my encoder with current AAC decoder code (maybe it will be committed by then?)

AAC: weekly report

July 1st, 2008

I’m working on creating psychoacoustic model from recommendations presented in 3GPP TS26.403. Implementation is very rough but at least it can produce the files with desired bitrate (not quite that bitrate but ~2kbps around it).

Now the tasks are to eliminate noise from encoded material and add block switching. Maybe window switching as well.
Oh, and commit that all to FFmpeg SVN.