Following some IRC discussions I took part in, I'd like to present a few ideas here for further discussion.
- The audio API should mirror the video API as much as possible. Currently the decoder outputs 16-bit native-endian audio into a raw buffer.
- Introduce audio sample formats. I'd like to be able to decode old 8-bit codecs into bytes, newer 24-bit audio into 32-bit ints, floats for codecs that need them, and so on.
- Planar formats for multichannel codecs. They would simplify downmixing and channel reordering. (This is not my idea, but it is worth mentioning.)
- A swscale-like structure for format handling and negotiation between audio filters.
- Block-based audio processing. Audio should be processed in multiples of blocks with a fixed number of samples (much as video is processed by frames and only rarely by slices). Why not always a single block at a time? Because some formats deliver chunks containing multiple blocks to decode (Monkey's Audio, Musepack SV8), and some have blocks so small that processing them one at a time causes too much overhead (most speech codecs and (AD)PCM). This is only slightly stricter than the current scheme.
Now, who wants to implement this?