As I wrote in my previous post, I had a functioning audio player nearing completion. Now I’ve finally added all the features I wanted and can call it done.
While previously I mostly ranted about the bloat introduced by the authors of the components, here I’d like to describe the design and the reasoning behind it.
First of all, what I wanted to have: a player for audio that uses NihAV for decoding it, a minimum of outside dependencies (i.e. just the audio output interface and nothing else), a simple design, and the ability to pause and seek. Support for certain kinds of corner cases was sacrificed for simplicity. After all, I’m using just Linux and I know what kind of content I play and how, so why bother about whether it will work as well when compiled for Windows or under some exotic terminal? The same applies to custom output, filters and such.
So, how does it work? About as simply as you can make it: first it spawns a separate thread to read key presses from the terminal. Then for each provided input it opens the file, configures SDL output (16-bit mono or stereo; I have neither multichannel headphones nor ears good enough to enjoy the intricacies of 24-bit audio) and then simply loops: it displays the current time, reacts to the commands sent by the terminal reader thread, and refills the audio queue (implemented by SDL2) when its fill level drops too low.
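The loop described above can be sketched roughly as follows. This is a minimal illustration and not the actual nihav-sndplay code: `Command`, `FakeQueue`, `FakeDecoder`, `spawn_key_reader` and `step` are hypothetical names, the key bindings are made up, the audio queue and the decoder are replaced by stand-ins that only count samples, and the time display is left out.

```rust
use std::io::Read;
use std::sync::mpsc::{self, Receiver};
use std::thread;

/// Commands sent by the key-reading thread (a hypothetical set).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Command {
    TogglePause,
    Seek(i64), // relative seek in seconds
    Quit,
}

/// Spawns the reader thread; each byte read from `input` is treated as one key press.
pub fn spawn_key_reader<R: Read + Send + 'static>(input: R) -> Receiver<Command> {
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        for byte in input.bytes() {
            let cmd = match byte {
                Ok(b' ') => Command::TogglePause,
                Ok(b'f') => Command::Seek(10),
                Ok(b'b') => Command::Seek(-10),
                Ok(b'q') => Command::Quit,
                Ok(_) => continue, // unknown key, ignore it
                Err(_) => break,
            };
            if tx.send(cmd).is_err() {
                break; // the player has gone away
            }
        }
    });
    rx
}

/// Stand-in for the audio output queue: it only counts queued samples.
pub struct FakeQueue {
    pub queued: usize,
}

/// Stand-in for demuxer plus decoder: each call yields one frame of `frame_len` samples.
pub struct FakeDecoder {
    pub frame_len: usize,
    pub frames_left: usize,
}

impl FakeDecoder {
    pub fn decode_frame(&mut self) -> Option<usize> {
        if self.frames_left == 0 {
            return None;
        }
        self.frames_left -= 1;
        Some(self.frame_len)
    }
}

/// One iteration of the player loop; returns `false` when playback should stop.
pub fn step(
    rx: &Receiver<Command>,
    queue: &mut FakeQueue,
    dec: &mut FakeDecoder,
    paused: &mut bool,
    low_mark: usize,
) -> bool {
    // Handle every command received so far without blocking.
    loop {
        match rx.try_recv() {
            Ok(Command::Quit) => return false,
            Ok(Command::TogglePause) => *paused = !*paused,
            // A real player would also seek in the demuxer here.
            Ok(Command::Seek(_)) => queue.queued = 0,
            Err(_) => break, // nothing pending (or the reader thread has exited)
        }
    }
    // Refill the queue when its fill drops below the low-water mark.
    while !*paused && queue.queued < low_mark {
        match dec.decode_frame() {
            Some(samples) => queue.queued += samples,
            None => return queue.queued > 0, // end of input
        }
    }
    true
}
```

A real player would call `step` repeatedly with a short sleep in between, passing `spawn_key_reader(std::io::stdin())` as the command source after switching the terminal to unbuffered mode. Note that decoding happens inside the loop, so one slow frame blocks command handling for as long as it takes to decode.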
While this is enough to make a good enough player, it has some drawbacks too. For starters, certain formats may take a significant time to decode one frame (we all know about the insane mode of Monkey’s Audio), and while a frame is being decoded the screen is not updated, which may be a bit irritating. But compared to the alternative of having another thread for audio decoding (with additional logic to control it) I’d rather pick the simpler solution. Another drawback is that you cannot do anything to the already queued data, so if you change the volume it is applied only to newly queued data, and it may take a second or more before you hear the change (again, a problem mostly for a certain codec in a certain mode). As I mentioned before, the proper solution would involve a separate object for handling audio callbacks that maintains its own queue and applies volume modification only when output samples are requested, but I’ll probably leave that to the upcoming video player.
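For illustration, here is a minimal sketch of such a callback-side object. It is not part of the player: `VolumeQueue` and its methods are hypothetical names, volume is assumed to be a percentage, and in a real program the object would have to be shared between the decoding code and the audio callback (for example behind a mutex).

```rust
use std::collections::VecDeque;

/// Hypothetical callback-side buffer: samples are stored unscaled and the
/// volume is applied only when the audio device asks for data.
pub struct VolumeQueue {
    samples: VecDeque<i16>,
    volume: u32, // percent, 100 means unchanged
}

impl VolumeQueue {
    pub fn new() -> Self {
        Self { samples: VecDeque::new(), volume: 100 }
    }

    /// Called by the decoding side to queue freshly decoded samples.
    pub fn push(&mut self, data: &[i16]) {
        self.samples.extend(data.iter().copied());
    }

    /// Takes effect on the very next `fill` call, even for samples queued earlier.
    pub fn set_volume(&mut self, percent: u32) {
        self.volume = percent;
    }

    /// Number of samples still waiting to be played.
    pub fn len(&self) -> usize {
        self.samples.len()
    }

    /// What an audio callback would call: fills `out`, padding with silence on underrun.
    pub fn fill(&mut self, out: &mut [i16]) {
        for dst in out.iter_mut() {
            let src = self.samples.pop_front().unwrap_or(0);
            let scaled = i64::from(src) * i64::from(self.volume) / 100;
            // Clamp instead of wrapping when amplification overflows 16 bits.
            *dst = scaled.clamp(i64::from(i16::MIN), i64::from(i16::MAX)) as i16;
        }
    }
}
```

Since scaling happens at output time, the delay between a volume change and hearing it depends only on the size of the device buffer and not on how much audio a single decoded frame has put into the queue.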
Another thing is that I have not particularly tried to optimise the codecs, yet their speed seems to be satisfactory for my needs. On my more than ten-year-old laptop I get these results for various lossless audio decoders on audio CD rips (my music collection is mostly in lossless formats after all):
- 16-minute FLAC: 1.3s with avconv, 6.3s with nihav-tool (mostly because I don’t have any optimisation for unary code reading and that’s where most of the time is wasted). Still it’s 150 times real-time decoding so it’s good enough;
- 52-minute Monkey’s Audio in normal mode: 23s with avconv and 33s with nihav-tool;
- 78-minute Monkey’s Audio with insane compression: 2:30 with avconv, 7:10 with nihav-tool (and it spends over 90% of time performing adaptive filtering which I perform with 32-bit ints instead of 16-bit ones and no SIMD except the one from compiler);
- 54-minute TTA: 13s with avconv, 18s with nihav-tool;
- 69-minute WavPack: 46s with avconv and 54s with nihav-tool.
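To put these timings into perspective, the real-time factor (audio duration divided by decoding time) can be computed with a few lines of code. The figures are the ones from the list above, with 2:30 and 7:10 converted to 150 and 430 seconds; `RESULTS`, `realtime_factor` and `nihav_factors` are names made up for this sketch.

```rust
/// (format, audio length in seconds, avconv time, nihav-tool time), all in seconds.
pub const RESULTS: [(&str, f64, f64, f64); 5] = [
    ("FLAC", 16.0 * 60.0, 1.3, 6.3),
    ("Monkey's Audio, normal", 52.0 * 60.0, 23.0, 33.0),
    ("Monkey's Audio, insane", 78.0 * 60.0, 150.0, 430.0),
    ("TTA", 54.0 * 60.0, 13.0, 18.0),
    ("WavPack", 69.0 * 60.0, 46.0, 54.0),
];

/// How many seconds of audio are decoded per second of wall-clock time.
pub fn realtime_factor(audio_secs: f64, decode_secs: f64) -> f64 {
    audio_secs / decode_secs
}

/// Real-time factors for the nihav-tool column.
pub fn nihav_factors() -> Vec<(&'static str, f64)> {
    RESULTS
        .iter()
        .map(|&(name, len, _avconv, nihav)| (name, realtime_factor(len, nihav)))
        .collect()
}
```

With these inputs FLAC comes out at a bit over 150 times real time, while the insane mode of Monkey’s Audio manages only about 11 times real time.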
So while it’s not as fast as the usual alternative, it is fast enough on my hardware for practical purposes (i.e. playing audio without loading the CPU too much, with one notable exception). Another problem with Monkey’s Audio insane mode is latency. A frame in this mode contains by default a whopping 1179648 samples (over 26 seconds of audio at 44.1kHz) and it takes about two and a half seconds to decode such a frame. unmac, on which the libavcodec decoder is based, exploits the fact that the newer APE format codes samples in an interleaved manner (yes, old versions coded all samples for a single channel together; but then again, insane frames there were about six seconds long) and decodes it in blocks of 4608 samples by default, so you can have 1/256th of an insane frame decoded immediately and the rest decoded later. It also requires the data of a whole frame to be buffered at once and the decoder to be called again and again until the whole frame is decoded. Since that is not possible with the current NihAV design, I can probably make a special hacky demuxer and decoder specifically for the player, based on a sequence of a single data frame and dummy frames that just tell the decoder to decode more of the first frame. At least nothing prevents me from bundling such a demuxer and decoder with the player and registering them just there. But that should be done only if the problems with playing back such files irritate me enough to implement it.
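The arithmetic and the "one data frame plus dummy frames" idea can be sketched like this. Nothing here is the NihAV API: `Packet`, `BlockwiseDecoder` and the other names are hypothetical, no actual decoding is done, and the frame length is hard-coded while a real decoder would take it from the frame header.

```rust
/// Samples in one insane-mode frame and in one decoding block (figures from the text).
pub const INSANE_FRAME_SAMPLES: usize = 1_179_648;
pub const BLOCK_SAMPLES: usize = 4608;

/// Duration of `samples` samples at `rate` Hz, in seconds.
pub fn duration_secs(samples: usize, rate: u32) -> f64 {
    samples as f64 / f64::from(rate)
}

/// What the hacky demuxer would hand to the decoder (hypothetical type).
pub enum Packet {
    Data(Vec<u8>), // the whole compressed frame, sent once
    More,          // dummy packet: keep decoding the frame you already have
}

/// Decoder stand-in that buffers one frame and produces one block per call.
pub struct BlockwiseDecoder {
    frame: Vec<u8>,   // buffered compressed data (not actually decoded here)
    remaining: usize, // samples of the current frame not yet produced
}

impl BlockwiseDecoder {
    pub fn new() -> Self {
        Self { frame: Vec::new(), remaining: 0 }
    }

    /// Size of the buffered compressed frame in bytes.
    pub fn buffered_bytes(&self) -> usize {
        self.frame.len()
    }

    /// Returns how many samples this call produced; 0 means the frame is finished.
    pub fn decode(&mut self, pkt: Packet) -> usize {
        if let Packet::Data(data) = pkt {
            self.frame = data;
            self.remaining = INSANE_FRAME_SAMPLES;
        }
        let produced = self.remaining.min(BLOCK_SAMPLES);
        self.remaining -= produced;
        produced
    }
}
```

One `Packet::Data` followed by 255 `Packet::More` packets yields the 256 blocks of a full frame. If the two and a half seconds of decoding time were spread evenly over the blocks (an assumption), the first samples would be available after roughly 10 milliseconds.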
In either case, nihav-sndplay is done and works sufficiently well, so it’s time to move on to writing a satisfactory video player. A good task for the rest of this year and maybe some chunk of the next one.