Some Thought on Future FFmpeg Audio API

After taking part in some discussions on IRC, I’d like to present a few ideas here for further discussion.

  1. The audio API should mirror the video API as much as possible. Currently the decoder outputs 16-bit native-endian audio into a raw buffer.
  2. Introduce audio formats. I’d like to be able to decode old 8-bit codecs into bytes, newer 24-bit audio into 32-bit ints, other codecs into floats if they need it, etc.
  3. Planar format for multichannel codecs. It will simplify downmixing and channel reordering. (This is not my idea, but it is worth mentioning.)
  4. A swscaler-like structure for format handling and negotiation between audio filters.
  5. Block-based audio processing. Audio should be processed as a whole number of blocks, each with a fixed number of samples (much as video is processed by frames and only rarely by slices). Why not always one block at a time? Because some formats emit chunks containing multiple blocks to decode (Monkey’s Audio, Musepack SV8), and some have blocks so small that processing them one at a time causes too much overhead (most speech codecs and (AD)PCM). This is just a bit stricter than the current scheme.
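
To make points 2 and 3 a little more concrete, here is a rough sketch in C of what a set of sample formats and a planar layout could look like. Every name in it (the `SKETCH_FMT_*` values and the `sketch_*` functions) is made up for illustration and is not existing FFmpeg API; it only shows why planar data makes downmixing simple.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sample formats; the names are illustrative, not FFmpeg API. */
enum SketchSampleFormat {
    SKETCH_FMT_U8,   /* old 8-bit codecs decoded into bytes */
    SKETCH_FMT_S16,  /* 16-bit native-endian, what decoders output today */
    SKETCH_FMT_S32,  /* container for 24-bit audio */
    SKETCH_FMT_FLT   /* float output for codecs that need it */
};

/* Bytes occupied by one sample of the given format, or 0 if unknown. */
static size_t sketch_bytes_per_sample(enum SketchSampleFormat fmt)
{
    switch (fmt) {
    case SKETCH_FMT_U8:  return sizeof(uint8_t);
    case SKETCH_FMT_S16: return sizeof(int16_t);
    case SKETCH_FMT_S32: return sizeof(int32_t);
    case SKETCH_FMT_FLT: return sizeof(float);
    }
    return 0;
}

/* Split interleaved 16-bit samples (L R L R ...) into one plane per channel. */
static void sketch_deinterleave_s16(const int16_t *src, int16_t *const *planes,
                                    int channels, size_t samples_per_channel)
{
    assert(channels > 0);
    for (size_t i = 0; i < samples_per_channel; i++)
        for (int ch = 0; ch < channels; ch++)
            planes[ch][i] = src[i * (size_t)channels + ch];
}

/* With planar data, a stereo-to-mono downmix is a plain loop over two arrays. */
static void sketch_downmix_stereo_to_mono(const int16_t *left,
                                          const int16_t *right,
                                          int16_t *mono, size_t n)
{
    for (size_t i = 0; i < n; i++)
        mono[i] = (int16_t)(((int32_t)left[i] + right[i]) / 2);
}
```

Channel reordering in this layout amounts to swapping plane pointers, with no sample data touched at all.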

Now, who wants to implement this?

3 Responses to “Some Thought on Future FFmpeg Audio API”

  1. Can you perhaps give an overview on what a successful implementation would entail?

  2. Kostya says:

    Well, mostly it will require a new structure, AVAudioFrame, à la AVFrame, holding channel data, downmixing coefficients, channel positions, and maybe the number of blocks.
    Plus a common API which will take those frames and process them block by block, changing format, up/downmixing and resampling (like swscaler does).

    It’s important to preserve 16-bit short mono/stereo passthrough though.
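
As a rough illustration of the structure and block-by-block processing described in this reply, here is a minimal C sketch. `SketchAudioFrame` stands in for the proposed AVAudioFrame; all field and function names are assumptions made up for this example, not actual FFmpeg API, and only a mono downmix of planar 16-bit data is shown (no format conversion or resampling).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_MAX_CHANNELS 8

/* Hypothetical stand-in for the proposed AVAudioFrame. */
typedef struct SketchAudioFrame {
    int16_t *data[SKETCH_MAX_CHANNELS];          /* one plane per channel */
    int      channels;                           /* number of planes in use */
    int      channel_pos[SKETCH_MAX_CHANNELS];   /* made-up position IDs */
    float    downmix_coeff[SKETCH_MAX_CHANNELS]; /* per-channel downmix weight */
    int      block_size;                         /* samples per block, per channel */
    int      nb_blocks;                          /* blocks carried by this frame */
} SketchAudioFrame;

/* Downmix one block to mono using the per-channel coefficients. */
static void sketch_downmix_block(const SketchAudioFrame *f, int block,
                                 int16_t *dst)
{
    size_t off = (size_t)block * (size_t)f->block_size;
    for (int i = 0; i < f->block_size; i++) {
        float acc = 0.0f;
        for (int ch = 0; ch < f->channels; ch++)
            acc += f->downmix_coeff[ch] * (float)f->data[ch][off + i];
        /* Clip to the 16-bit range before converting back. */
        if (acc > 32767.0f)  acc = 32767.0f;
        if (acc < -32768.0f) acc = -32768.0f;
        dst[i] = (int16_t)acc;
    }
}

/* Process a whole frame block by block; returns samples written to dst. */
static size_t sketch_process_frame(const SketchAudioFrame *f, int16_t *dst)
{
    size_t total = (size_t)f->nb_blocks * (size_t)f->block_size;
    assert(f->channels > 0 && f->channels <= SKETCH_MAX_CHANNELS);

    /* Passthrough: 16-bit mono input is copied unchanged. */
    if (f->channels == 1) {
        memcpy(dst, f->data[0], total * sizeof(int16_t));
        return total;
    }
    for (int b = 0; b < f->nb_blocks; b++)
        sketch_downmix_block(f, b, dst + (size_t)b * (size_t)f->block_size);
    return total;
}
```

The passthrough branch is there only to show where an untouched 16-bit path would sit; a real design would need the same shortcut for stereo output.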

  3. […] multichannel (more-than-stereo) audio very robustly in its present incarnation. Yeah, that’s another item on the TODO list. Check out the complete specs for CAFF, however. I think if we made it a goal to support CAFF to […]