OK, the 8×8 transform is slightly optimized too.
This version gives a 7x gain on a PII and 15x on a G4.
Here is the vc1itrans.c file from the VC-1 reference decoder with the transform replaced.
See here how to use the VC-1 reference decoder in FFmpeg, and just replace one file in libvc1 with my version before compiling.
Very nice optimization! It speeds things up by almost 50% on my P-M 2.0 (from 60 sec to 40 sec). I’m interested in further optimizations. Can you give me some information about where to optimize? I think
vc1cropmv.c
vc1deblock.c
vc1tools.c
have priority.
I think you just got this list in reverse order.
First is GetVLC from vc1decbit.c
Then picture and ME/MC opts
Then deblocking
and then other functions. But YMMV.
Is GetVLC time-consuming? I saw many small operations. GetVLC calls ReadBits, and ReadBits calls ReadBytes; ReadBytes’ inner loop runs only when
while (BitsValid
You can use a profiler to determine which operations are the most time-consuming.
GetVLC is called many times while decoding each macroblock, so small time gains will add up.
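To illustrate why a small bit-reading routine can still matter, here is a minimal sketch of a cached bit reader in C. This is not the reference decoder's code: BitReader, br_init and br_read are hypothetical names, and the refill condition is an assumption made for the example. The point is only that the refill loop runs rarely, while the few operations around it run on every single call.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bit reader, for illustration only.
 * None of these names come from the VC-1 reference decoder. */
typedef struct {
    const uint8_t *data;  /* input buffer */
    size_t size;          /* buffer length in bytes */
    size_t pos;           /* index of the next byte to load */
    uint32_t cache;       /* loaded bits; the low bits_valid bits are unread */
    int bits_valid;       /* number of unread bits in cache */
} BitReader;

static void br_init(BitReader *br, const uint8_t *data, size_t size)
{
    br->data = data;
    br->size = size;
    br->pos = 0;
    br->cache = 0;
    br->bits_valid = 0;
}

/* Read n bits (1..24), most significant bit first.
 * Reads past the end of the buffer return zero bits. */
static uint32_t br_read(BitReader *br, int n)
{
    /* Refill: entered only when the cache holds fewer than n bits. */
    while (br->bits_valid < n) {
        uint32_t byte = (br->pos < br->size) ? br->data[br->pos] : 0;
        br->pos++;
        br->cache = (br->cache << 8) | byte;
        br->bits_valid += 8;
    }
    /* This part runs on every call, so it is where small savings add up. */
    br->bits_valid -= n;
    return (br->cache >> br->bits_valid) & ((1u << n) - 1u);
}
```

Even in a sketch like this, every call pays for a compare, a subtract, a shift and a mask. Whether that dominates in the real decoder is something only a profiler run can confirm.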