VC-1: 8×8 transform

OK, the 8×8 transform is slightly optimized too.

This version gives a 7x gain on a PII and a 15x gain on a G4.

Here is the vc1itrans.c file from the VC-1 reference decoder with the transform replaced.

See here for how to use the VC-1 reference decoder in FFmpeg, and just replace one file in libvc1 with my version before compiling.
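
For readers curious what kind of saving is available in an 8-point inverse transform, here is a minimal sketch. It is not the code from the linked file. The coefficient table is an assumption (recalled from the VC-1 specification and worth checking against it), the names `T8`, `itrans8_direct`, `itrans8_butterfly` and `itrans8_selfcheck` are made up for illustration, and the rounding and shifts a real decoder applies after each pass are left out. It shows one 1-D pass written as a plain matrix product and as a butterfly that exploits the even/odd symmetry of the basis.

```c
#include <assert.h>

/* Assumed 8-point basis, one basis function per row. Recalled from the
 * VC-1 specification; check it against the spec before relying on it. */
static const int T8[8][8] = {
    { 12,  12,  12,  12,  12,  12,  12,  12 },
    { 16,  15,   9,   4,  -4,  -9, -15, -16 },
    { 16,   6,  -6, -16, -16,  -6,   6,  16 },
    { 15,  -4, -16,  -9,   9,  16,   4, -15 },
    { 12, -12, -12,  12,  12, -12, -12,  12 },
    {  9, -16,   4,  15, -15,  -4,  16,  -9 },
    {  6, -16,  16,  -6,  -6,  16, -16,   6 },
    {  4,  -9,  15, -16,  16, -15,   9,  -4 },
};

/* Reference form: direct matrix product, 64 multiplications per pass. */
static void itrans8_direct(const int in[8], int out[8])
{
    for (int n = 0; n < 8; n++) {
        int sum = 0;
        for (int k = 0; k < 8; k++)
            sum += T8[k][n] * in[k];
        out[n] = sum;
    }
}

/* Butterfly form: even-indexed inputs give a part that is mirrored across
 * the outputs, odd-indexed inputs give a part that is mirrored with the
 * sign flipped. This needs 22 multiplications per pass. */
static void itrans8_butterfly(const int in[8], int out[8])
{
    int e0 = 12 * (in[0] + in[4]);
    int e1 = 12 * (in[0] - in[4]);
    int e2 = 16 * in[2] +  6 * in[6];
    int e3 =  6 * in[2] - 16 * in[6];

    int a0 = e0 + e2, a1 = e1 + e3, a2 = e1 - e3, a3 = e0 - e2;

    int o0 = 16 * in[1] + 15 * in[3] +  9 * in[5] +  4 * in[7];
    int o1 = 15 * in[1] -  4 * in[3] - 16 * in[5] -  9 * in[7];
    int o2 =  9 * in[1] - 16 * in[3] +  4 * in[5] + 15 * in[7];
    int o3 =  4 * in[1] -  9 * in[3] + 15 * in[5] - 16 * in[7];

    out[0] = a0 + o0;  out[7] = a0 - o0;
    out[1] = a1 + o1;  out[6] = a1 - o1;
    out[2] = a2 + o2;  out[5] = a2 - o2;
    out[3] = a3 + o3;  out[4] = a3 - o3;
}

/* Compares both forms on 8 unit impulses and one mixed vector.
 * Returns the number of vectors checked. */
static int itrans8_selfcheck(void)
{
    int checked = 0;
    for (int v = 0; v < 9; v++) {
        int in[8], a[8], b[8];
        for (int k = 0; k < 8; k++)
            in[k] = (v < 8) ? (k == v) : (k * 37 - 100);
        itrans8_direct(in, a);
        itrans8_butterfly(in, b);
        for (int n = 0; n < 8; n++)
            assert(a[n] == b[n]);
        checked++;
    }
    return checked;
}
```

A full 8×8 block would apply such a pass to every row and then to every column, so any saving per pass is multiplied by 16.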

4 Responses to “VC-1: 8×8 transform”

  1. Netex says:

    Very nice optimization! It speeds things up by almost 50% on my P-M 2.0 (from 60 sec to 40 sec). I’m interested in further optimizations. Can you give me some information about where to optimize? I think


    has the priority.

  2. Kostya says:

    I think you just got this list in reverse order.
    First is GetVLC from vc1decbit.c
    Then picture and ME/MC opts
    Then deblocking
    and then other functions. But YMMV.

  3. Netex says:

    Is GetVLC time-consuming? I saw many small operations. GetVLC calls ReadBits, and ReadBits calls ReadBytes; the inner loop of ReadBytes runs only when

    while (BitsValid

  4. Kostya says:

    You can use a profiler to determine which operations are the most time-consuming.
    GetVLC is called many times for each decoded macroblock, so small time savings add up.
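
The point about call counts can be made concrete with a little arithmetic. The sketch below is only an illustration: `read_symbol_stub`, `decode_macroblock_stub`, `symbol_calls` and `seconds_saved` are hypothetical names, the stubs do no real VLC decoding, and none of this comes from the reference decoder. It counts how often a small routine is called and converts a per-call saving into a total. A real profiler remains the proper tool for finding hot spots.

```c
#include <assert.h>

/* Hypothetical call counter; the reference decoder has no such hook. */
static unsigned long symbol_calls;

/* Stand-in for a small, frequently called routine such as GetVLC.
 * It does no real decoding, it only counts the call. */
static int read_symbol_stub(unsigned x)
{
    symbol_calls++;
    return (int)(x & 7u);
}

/* Stand-in for decoding one macroblock: it just reads `symbols` symbols.
 * The number of symbols per macroblock is chosen by the caller. */
static int decode_macroblock_stub(unsigned seed, int symbols)
{
    int sum = 0;
    for (int i = 0; i < symbols; i++)
        sum += read_symbol_stub(seed + (unsigned)i);
    return sum;
}

/* Total time saved if every call becomes `ns_saved_per_call` faster. */
static double seconds_saved(unsigned long calls, double ns_saved_per_call)
{
    assert(ns_saved_per_call >= 0.0);
    return (double)calls * ns_saved_per_call * 1e-9;
}
```

With these definitions, a routine called a billion times over a clip saves about one second in total for every nanosecond shaved off a single call.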