I’ve been following in Måns’s footsteps and got myself a Gdium too. Since it’s no fun owning a less common computer architecture and not writing anything for it, I’ve tried SIMDifying the easiest operations one can do on it: vector sum, vector subtraction and vector scalar product. And I have a decoder that uses those operations extensively, so why not benchmark it a bit?
The test sample was the first 26 seconds of a Monkey’s Audio file compressed in insane mode, since this mode uses the longest filters and benefits from SIMD the most (and is slow enough even for short samples ;). In all cases I’m the one who wrote the SIMD code, so it’s fair 🙂
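For reference, here is a minimal plain C sketch of the three operations that get SIMDified. The function names and the 16-bit element type are assumptions made for illustration, not necessarily what the decoder uses.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names; 16-bit elements are an assumption. */

/* Vector sum: v1[i] += v2[i] */
void vec_add_int16(int16_t *v1, const int16_t *v2, int len)
{
    assert(len >= 0);
    for (int i = 0; i < len; i++)
        v1[i] = (int16_t)(v1[i] + v2[i]);
}

/* Vector subtraction: v1[i] -= v2[i] */
void vec_sub_int16(int16_t *v1, const int16_t *v2, int len)
{
    assert(len >= 0);
    for (int i = 0; i < len; i++)
        v1[i] = (int16_t)(v1[i] - v2[i]);
}

/* Scalar product: sum of v1[i] * v2[i], accumulated in 32 bits */
int32_t vec_scalarproduct_int16(const int16_t *v1, const int16_t *v2, int len)
{
    int32_t sum = 0;
    assert(len >= 0);
    for (int i = 0; i < len; i++)
        sum += v1[i] * v2[i];
    return sum;
}
```

A SIMD version does the same work on several elements per instruction, which is why long filters gain the most.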
PowerPC (Freescale 7447A 1.42 GHz): 25 seconds and 6 seconds
MIPS (Loongson 2F 900 MHz): 37 seconds and 7 seconds
ARM (Cortex A8 600 MHz): 138 seconds and 22 seconds
x86 (Intel Atom N270 800 MHz): 50 seconds and 9 seconds
Mind you, the SIMD instructions in Loongson are custom for that CPU and modelled after MMX (64-bit registers that actually reuse the FPU registers, similar names), but at least they are done in RISC fashion, i.e. you can store the result in some other register.
Out of interest I’ve also looked at the binary representation of SIMD instructions. On x86 the principle is to prefix the SIMD instruction (usually with the 0x66 byte, the “opcode for CPU with half of current bits”), so SSE7 instructions will look like instructions for a 1-bit FPU on some Intel 4004 predecessor and will take 8-16 bytes to represent.
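To make the prefix trick concrete, here is a small C sketch using `paddw` as the example: the SSE2 register form is the MMX encoding with 0x66 in front. The array and function names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* paddw mm0, mm1   (MMX):  0F FD C1 */
const unsigned char paddw_mmx[]  = { 0x0F, 0xFD, 0xC1 };
/* paddw xmm0, xmm1 (SSE2): 66 0F FD C1 */
const unsigned char paddw_sse2[] = { 0x66, 0x0F, 0xFD, 0xC1 };

/* Returns 1 if `ext` is exactly `prefix` followed by the bytes of `base`. */
int is_prefixed_form(const unsigned char *base, size_t base_len,
                     const unsigned char *ext, size_t ext_len,
                     unsigned char prefix)
{
    assert(base != NULL && ext != NULL);
    return ext_len == base_len + 1 &&
           ext[0] == prefix &&
           memcmp(ext + 1, base, base_len) == 0;
}
```

Calling `is_prefixed_form(paddw_mmx, sizeof paddw_mmx, paddw_sse2, sizeof paddw_sse2, 0x66)` returns 1: same opcode bytes, one more byte of prefix.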
Other architectures use a simple 32-bit word for any instruction. NEON (on ARM) and AltiVec (on PowerPC) use some opcodes in the general instruction space, while Loongson 2 SIMD instructions are custom calls to the second co-processor.
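As a rough illustration for the PowerPC and MIPS cases, both keep a 6-bit primary opcode in the top bits of the 32-bit word: AltiVec instructions sit under PowerPC primary opcode 4 and MIPS coprocessor 2 is primary opcode 0x12. The sketch below only extracts that field; the helper names are hypothetical, and ARM is left out since its layout differs.

```c
#include <assert.h>
#include <stdint.h>

enum {
    PPC_OP_ALTIVEC = 4,    /* PowerPC primary opcode used by AltiVec */
    MIPS_OP_COP2   = 0x12  /* MIPS primary opcode for coprocessor 2 */
};

/* Top 6 bits of a 32-bit instruction word (PowerPC and MIPS layout). */
unsigned primary_opcode(uint32_t insn)
{
    return insn >> 26;
}

/* Build a word from a primary opcode and the remaining 26 bits. */
uint32_t with_primary_opcode(unsigned opcode, uint32_t low26)
{
    assert(opcode < 64 && low26 < (1u << 26));
    return ((uint32_t)opcode << 26) | low26;
}
```

For example, the AltiVec instruction `vaddubm v1,v2,v3` encodes as 0x10221800, and `primary_opcode(0x10221800)` gives 4.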
Talking about instruction sets, I cannot omit the fact that IDA 5.2 sucks at disassembling PowerPC code (not only AltiVec but some of the core instructions too) and objdump sucks at disassembling the Mac OS X format (it ignores the internal structure and disassembles the file as raw data). That looks like the reason why we don’t have Apple Intermediate Codec RE’d yet.
P.S. I would gladly get AVR32, BlackFin, ColdFire and other exotic CPUs. Alpha or Sparc would be good too, but that is just unrealistic, I think.