Pending work for FFmpeg

December 19th, 2009

Here are some pictures decoded with game decoders I’ve more or less finished in my free time:

While the logotype in the middle should be recognisable to almost everybody (it’s from a video file embedded in another player/converter for that format), the others are not so famous.
Yes, colour planes are swapped but that’s not critical.

The left picture is taken from a Wing Commander IV trailer packed with the Xan codec. The decoder has a very long history: it was 90% complete even before I joined the FFmpeg project. The only caveat was that it outputs YUV while Mike thought it was 16-bit RGB. Also nobody was interested in completing it (including Mike and me). Well, it’s almost there.

The right picture is from the Descent III intro encoded with the 16-bit version of Interplay Video. I’ve looked at it once and almost got it right. The main thing I missed is that it stores motion vector data at a certain offset, not along with the other data as the 8-bit version did. Now it plays fine though.

Another funny thing I remember is that there were complaints about detection of the 16-bit variant. And what do you know? That information was available for ages on the container description page. Sometimes it’s useful to read the Multimedia Wiki too, not only write to it.


What next? I don’t know, there are so many things to do: finish the Flash Video 2 decoder, integrate the Auravision 1/2 decoder before it rots, have another stab at some formats like Apple Intermediate Codec or some codecs from the Windows Media family.

At least I know that FFmpeg may be a bit closer to one of its unofficial ultimate goals: converting everything.

Gdium optimizations

November 24th, 2009

Since I’m not going to work on this soon (I have more stuff to do), I’m publishing what I did. Grab the tgzipped sources here. Most of it does not give any significant speedup because of the internal Loongson structure, so it’s just a proof of concept.

A joy of underpowered hardware

October 21st, 2009

I prefer to develop on underpowered hardware since it makes you want to squeeze all you can from it. The Gdium netbook looks like an ideal candidate when it comes to being underpowered (BeagleBoard is too underpowered for that).

What really sucks in Gdium (to my taste):

  • video card performance: in MPlayer, video output tends to take more time than decoding (and the decoders mostly don’t have SIMD optimisations for Loongson). Watching anything larger than 512×384 MPEG-4 video is not comfortable. Floating-point audio codecs also take a lot of CPU.
  • there is audible noise in the headphones during audio playback; since the video card, audio chip and some other things are integrated into a single SM501 chip, I can add that it seems to be the suckiest part of the netbook.

Some things are annoying too: the 16-bit display (the chip supports 24-bit output, and that does affect the picture), a battery charge limit of 97-98% (so it’s always charging and never completely charged, probably some glitch in my sample), and fan and temperature issues (the specs say the CPU dissipates up to 4 watts, so where does all that heat come from?). Having an internal drive instead of a USB key should probably increase performance greatly too.

I still hope for something like a notebook with a multi-core MIPS or ARM and (preferably) 1 GB of RAM.

A Bit of New Hardware

September 28th, 2009

I’ve finally got a SheevaPlug, which will be my new server instead of the Artigo 1000, which seems to have broken internal power management. It will also help with my plans of decreasing the x86 share among my boxes. The only uses I find for the x86 netbook now are reverse-engineering and running an occasional game; everything else I do on other boxes as well.

Looks like FedEx, at least here, is going downhill. While two weeks of delay (aka “customs clearance”) is pretty usual for me, since this year one has to go to their office to sign some papers and pay the customs fee before they finish customs clearance and deliver the package to your town. I had to go there a second time to pick up the package (before that, packages were delivered straight to my place, except for one case when a package went back to the USA). Not that a 2.5 km walk can harm.

Another thing worth mentioning is that my Gdium now probably has the fastest MPlayer: I’ve ported several lavc MMX-accelerated functions to it, so now H.264, RV3/4 and H.26[13]-based formats decode faster (the latter by a couple of tens of percent, the others by 5-10%), not to mention Monkey Audio, which can now be listened to in realtime even for files packed at the insane level. Maybe in the distant future the patches will hit SVN (if I clean them up and Måns finds time for review).

F2 40 01 2A (some notes on SIMD, instruction sets and everything)

September 15th, 2009

I’ve been following in the steps of Måns and got myself a Gdium too. Since it’s no fun just owning a less widespread computer architecture and not writing anything for it, I’ve tried SIMDifying the easiest operations one can do on it: vector sum, vector subtraction and vector scalar product. And I have a decoder that uses those operations extensively, so why not try to benchmark it a bit?
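
In scalar form the three operations look roughly like this. This is a minimal Python sketch with made-up function names, not the actual decoder code; a SIMD version does the same work on several elements per instruction instead of one.

```python
def vector_add(dst, src):
    """In-place element-wise sum: dst[i] += src[i]."""
    for i in range(len(dst)):
        dst[i] += src[i]


def vector_sub(dst, src):
    """In-place element-wise difference: dst[i] -= src[i]."""
    for i in range(len(dst)):
        dst[i] -= src[i]


def scalar_product(a, b):
    """Sum of element-wise products of two equally long vectors."""
    return sum(x * y for x, y in zip(a, b))
```

All three are simple loops with no dependency between iterations (apart from the final sum), which is what makes them the easiest candidates for SIMD.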

The test sample was the first 26 seconds of a Monkey Audio file with insane compression, since this mode uses the longest filters and benefits from SIMD the most (and is slow enough even for short samples ;). In all cases I’m the one who has written the SIMD code, so it’s fair 🙂

PowerPC (Freescale 7447A 1.42 GHz): 25 seconds and 6 seconds
MIPS (Loongson 2F 900 MHz): 37 seconds and 7 seconds
ARM (Cortex A8 600 MHz): 138 seconds and 22 seconds
x86 (Intel Atom N270 800 MHz): 50 seconds and 9 seconds
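
The list does not label the two numbers in each pair. Assuming the first is the decoding time without SIMD and the second the time with it, the speedups can be worked out like this (the dictionary and function names are only for illustration):

```python
# (seconds before, seconds after) per CPU, copied from the list above.
# Treating the pair as "without SIMD" / "with SIMD" is an assumption.
timings = {
    "PowerPC (Freescale 7447A 1.42 GHz)": (25, 6),
    "MIPS (Loongson 2F 900 MHz)": (37, 7),
    "ARM (Cortex A8 600 MHz)": (138, 22),
    "x86 (Intel Atom N270 800 MHz)": (50, 9),
}


def speedup(before, after):
    """How many times faster the second run is than the first."""
    return before / after


for cpu, (before, after) in timings.items():
    print(f"{cpu}: {speedup(before, after):.1f}x")
# PowerPC (Freescale 7447A 1.42 GHz): 4.2x
# MIPS (Loongson 2F 900 MHz): 5.3x
# ARM (Cortex A8 600 MHz): 6.3x
# x86 (Intel Atom N270 800 MHz): 5.6x
```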

Mind you, SIMD instructions in Loongson are custom for that CPU and modelled after MMX (64-bit registers, actually reusing FPU regs, similar names) but at least they are done in RISC fashion, i.e. you can store result in some other register.


I’ve also looked out of interest at binary representation of SIMD. On x86 the principle is to prefix SIMD instruction (usually with 0x66 “opcode for CPU with half of current bits” byte) so SSE7 instructions will look like instructions for 1-bit FPU on Intel 4004 predecessor and will take 8-16 bytes to represent.

Other architectures use simple 32-bit word for any instruction. NEON (on ARM) and AltiVec (PowerPC) use some opcodes in general instruction space, Loongson 2 SIMD are custom calls to the second co-processor.

Talking about instruction sets, I cannot omit the fact that IDA 5.2 sucks at disassembling PowerPC code (not only AltiVec but some of the core instructions too) and objdump sucks at disassembling the MacOSX format (it ignores the internal structure and disassembles it as a raw file). That looks like the reason why we don’t have Apple Intermediate Codec RE’d yet.


P.S. I would love to get AVR32, BlackFin, ColdFire and other exotic CPUs. Alpha or Sparc would be good too, but that’s just unrealistic, I think.

Tell me how you pronounce ‘g’ and I’ll tell who you are

September 7th, 2009

As some of you may already know, I have a bit of interest in linguistics. Here I’ll try to describe an interesting (for me) fact. While some of the letters are read virtually the same in any language, some differ greatly. It looks to me that ‘g’ is the telltale letter because its pronunciation differs most in different languages.

Let’s see:

  • English: djee
  • French: may sound more like ‘z’ in “azure” (Je ne parle pas français, though)
  • German: IIRC, in words ending with “-ig” it’s read as soft ‘h’ or something (Ich spreche Deutsch nicht)
  • Hungarian: sometimes it’s read as ‘d’ (for example, in the name of country — Magyar)

And now for more exotic languages:

  • Ukrainian: it’s more like voiced ‘h’ or French ‘r’. For ‘g’ sound in loanwords another letter is used.
  • Belarusian: resembles Ukrainian but less voiced.
  • Japanese: it’s easy — you’ll never see it alone since they use syllable-based system, not letter-based.

And finally, in my homeland (and I know a little Swedish) it may also sound two different ways: more like in other languages (for example: “gamla”) and more like ‘j’. Listen to the example from Wikipedia of how to pronounce Göteborg correctly (you can hear ‘g’ at the beginning and at the end of the word).

FFmpeg: providing better alternative since 2000

September 4th, 2009

A few days ago FFmpeg finally got a WMA3 decoder. This event gives me an opportunity to look at our achievements.

  1. Popular and/or standard codecs: supported except for the newest stuff (AAC-HE[2], H.264 interlaced modes, VC-1 interlaced modes).
  2. Windows Media: WMV1-WMV3 are supported (except for the beta version of WMV3 and other WMV3 spinoffs). WMA1-WMA3 are supported too. We still have WMA Lossless and WMA Voice to RE, and our top men are working on it (do you remember the ending of “Raiders of the Lost Ark”? Neither do I).
  3. Real Media: RV1-RV4 are supported; of the variety of audio codecs only Sipro and Real Lossless support is missing. Sipro is in the works and nobody (including RealNetworks itself) cares about RALF.
  4. Intel codecs: Indeo 1-3 are supported, a patch for Indeo 4-5 is available, IMC is supported, IAC is not REd (and not in the queue).
  5. RAD codecs: REd, there are still some issues with Bink to sort out before inclusion.
  6. AVI codecs: that’s a mess. There are simply too many codecs and new ones still continue to appear. Some are supported, most are not.
  7. Lossless audio codecs: some are supported, some are not. Again, it looks like everybody writes their own lossless audio or video codec. I’d like to get support for TAK though.
  8. Game video codecs: we still have a lot of them to RE. Personally I want Discworld III video (BMV, but it differs from the format used in Discworld II) support. *sigh*

If you think there’s some codec we definitely should support, please tell us (preferably with a specification or decoder sources 😉). If you just want to have some codec supported in FFmpeg, make us interested in it; support for some codecs appeared in FFmpeg after somebody had said “can play that file?”.

Bink: pattern-run blocks

September 4th, 2009

And now for something completely the same.

Let’s talk about the most interesting block type in Bink. I don’t know the official name for it, but I call it a pattern-run block because of the way it’s coded. The idea is simple: there are runs of a single colour and blocks of different colours like in your ordinary RLE; what can be interesting in that? But there is one thing: the block is filled with runs/copies not in the usual scan order but following one of 16 predefined patterns – columns, spirals, Hilbert curve (Zelda pattern for some of us), whatever.

Here’s an example: scan pattern #13 (image, with an SVG version).

I think it’s obvious how this helps block compression. The only bad thing about it is the fact it did not appear in Smacker (mostly because Smacker uses 4×4 blocks).
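
To make that a bit more concrete, here is a minimal Python sketch of filling an 8×8 block along a scan pattern. Everything in it is hypothetical: the `("run", ...)` / `("copy", ...)` operation format, the function names and the column pattern are made up for illustration and are not Bink’s real bitstream layout or one of its 16 actual patterns.

```python
def column_pattern(size=8):
    """Illustrative scan pattern: visit the block column by column."""
    return [y * size + x for x in range(size) for y in range(size)]


def fill_block(pattern, ops, size=8):
    """Fill a size*size block following `pattern`.

    ops is a list of ("run", length, colour) or ("copy", [colours]).
    Returns the block as a list of rows.
    """
    block = [None] * (size * size)
    pos = 0  # index into the scan pattern, not into the block
    for op in ops:
        if op[0] == "run":
            _, length, colour = op
            values = [colour] * length
        else:
            values = list(op[1])
        for value in values:
            block[pattern[pos]] = value
            pos += 1
    if pos != size * size:
        raise ValueError("operations do not cover the whole block")
    return [block[r * size:(r + 1) * size] for r in range(size)]


# One solid column, one column of distinct colours, the rest background.
ops = [("run", 8, 1), ("copy", [2, 3, 4, 5, 6, 7, 8, 9]), ("run", 48, 0)]
rows = fill_block(column_pattern(), ops)
```

With the column pattern the solid first column costs a single run; in ordinary raster order the same block would break into many short runs, one or more per row.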


This concludes my series of posts about Bink.
“Works for me” patch against FFmpeg r19754 is located here.

Bink: a bunch of peculiarities

September 3rd, 2009

I’ve mentioned before that Bink differs greatly from other codecs. Now I want to walk over general structure of it and mark all peculiarities I’ve seen so far.

  1. Huffman coding. I think I’ve mentioned it enough times.
  2. Data coding. Different values (block types, colours, run values) are coded in so-called bundles (i.e. groups) for at least one row of blocks at once. So when decoding of a new row starts, the bundles are checked for whether there’s enough data, and more is decoded if needed.
  3. 16×16 and 8×8 block mix. Sometimes the encoder inserts a 16×16 block into the usual array of 8×8 blocks. It looks like those blocks can occur only at even positions, which eases skipping the already decoded part of it. 16×16 block contents are actually 8×8 block contents scaled twice.
  4. Coding modes. There are 10 block types; three of them belong to vector quantisation techniques (I’ll write another post about the special run-length pattern block), two block types use DCT (more below) and another block type uses special coding for residue without any additional transform.
  5. DCT coefficients coding. I’ve written a bit about it already. Have I mentioned they also use non-standard scan order (designed for pairs of coefficients)?
  6. Coefficients quantising. There are 16 possible quantisers – 1, 1 1/3, 1 2/3, 2, 2 2/3, 3 1/2, 4, 5, 6, 8, 12, 17, 22, 28, 34 and 44.
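
As an illustration of item 3, here is a minimal Python sketch of turning decoded 8×8 contents into a 16×16 block. It assumes that “scaled twice” means plain pixel doubling in both directions; the post does not say how the scaling is actually done, and the function name is made up.

```python
def upscale_2x(block):
    """Double a block in both directions by repeating every pixel.

    `block` is a list of rows; an 8x8 input gives a 16x16 output.
    Plain pixel doubling is an assumption, not confirmed Bink behaviour.
    """
    out = []
    for row in block:
        wide = [pixel for pixel in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))  # second copy of the widened row
    return out
```

For example, `upscale_2x([[1, 2], [3, 4]])` gives four rows: `[1, 1, 2, 2]` twice, then `[3, 3, 4, 4]` twice.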

I suspect that some of the things are legacy of Smacker and really clean design would go in slightly other direction – it’s not pure vector quantisation as it was but it’s not pure DCT-based codec either.

As for the progress: I have a more or less working decoder in my own build of FFmpeg. When somebody kicks certain devs to push the Bink demuxer and audio decoder into the SVN codebase, I’ll release my decoder along with them. Until then just wait.

Bink: ‘lossless’ block coding

September 2nd, 2009

First of all, I’d like to note that those names are taken from Bink code. In reality ‘lossy’ block is used as is and ‘lossless’ block is DCT coefficients.

And now, the differences:

  • in ‘lossless’ mode coefficients are decoded until mask becomes zero, there’s no explicit number of coefficients
  • coefficient bits are stored explicitly, not as several masks: coef[x] = mask | get_bits(log2(mask));
  • starting list somewhat differs
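
The explicit-bits formula from the second point can be sketched in Python as follows. The `BitReader` class, its MSB-first bit order and the function name are assumptions made for the example; the real bitstream reader may order bits differently, and sign handling is left out.

```python
class BitReader:
    """Hypothetical MSB-first bit reader over a bytes object."""

    def __init__(self, data):
        self.data = data
        self.pos = 0  # current position in bits

    def get_bits(self, n):
        """Read n bits (n may be 0) and return them as an integer."""
        value = 0
        for _ in range(n):
            byte = self.data[self.pos >> 3]
            bit = (byte >> (7 - (self.pos & 7))) & 1
            value = (value << 1) | bit
            self.pos += 1
        return value


def read_coefficient(reader, mask):
    """coef = mask | get_bits(log2(mask)); mask must be a power of two."""
    nbits = mask.bit_length() - 1  # log2(mask) for a power of two
    return mask | reader.get_bits(nbits)
```

So with a mask of 8 the top bit is implied and three more bits are read, which gives a magnitude in the range 8..15; with a mask of 1 nothing is read at all and the result is 1.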

For those who for some unknown reason are interested in RE progress, I can say that my implementation is still far from perfect. It crashes on 640×480 BIKi files, and for the two files it does play (BIKf and BIKi) it gives a barely recognisable image. I blame DCT and dequantisation (I haven’t looked at them yet).