RISC-V: still not ready for multimedia?

A year ago I wrote why I’d like to be excited by RISC-V but can’t. I viewed it (and still view it) as a slightly freshened-up MIPS (with the same problems MIPS had). But the real question is: what does it offer me as a developer of a multimedia framework?

Not that much, as it turns out. RISC-V is often lauded as a free ISA any vendor can modify, but what does that give me as an end user? It’s not as if I could build a powerful enough chip myself even with the hardware specifications available (flashing an FPGA is an option, but see “powerful enough” above), so I’m still dependent on what various vendors can offer me, and from that point of view almost all CPU architectures are the same. The notable exceptions are the Russian Elbrus-2000, where the instruction set documentation is under NDA (because its primary application is in Russian military systems), and some Chinese chips that are not sold abroad (was it the Loongson family?).

Anyway, as a multimedia framework developer I care about the SIMD capabilities offered by the CPU. Modern codecs are too complex to write entirely in assembly, and with a medium- or high-level language the compiler hides most CPU details, except for the SIMD-accelerated blocks that make sense to write in assembly (or intrinsics for POWER). And that’s where RISC-V sucks.

In theory RISC-V supports the V extension (for variable-length SIMD processing); in practice hardly any CPUs support it. Essentially there is only one core on the market that supports the RISC-V V extension (or RVV for short): the C920 from T-Head, and it’s v0.7.1 only (here’s a link to Rémi’s report on what’s changed between RVVv0.7.1 and RVVv1.0). Of course there’s a newer revision of that core that features RVVv1.0 support, but currently there’s only one (rather underpowered) board using it and it’s not possible to buy anyway. I also heard about SiFive designing a CPU with RVVv1.0 support but I don’t remember seeing products built on it.
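
For those who have not met the term, “variable-length” means the code asks the hardware how many elements it may process per iteration instead of hard-coding a register width. Below is a minimal sketch of that loop shape in plain C. Note that `fake_vsetvl` and `HW_LANES` are hypothetical stand-ins for the real `vsetvli` instruction and the hardware vector length, and the inner loop stands in for a single vector instruction, so this shows only the structure and says nothing about actual performance.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: how many 8-bit elements the "hardware" handles per step.
 * On real RVV this is implementation-defined and discovered at run time. */
#define HW_LANES 16

/* Hypothetical stand-in for vsetvli: given the number of elements left,
 * return how many will be processed in this iteration. */
static size_t fake_vsetvl(size_t remaining)
{
    return remaining < HW_LANES ? remaining : HW_LANES;
}

/* Saturating addition of two byte arrays (typical pixel arithmetic).
 * The outer loop is the part that does not depend on the vector length. */
static void add_u8_sat(uint8_t *dst, const uint8_t *a, const uint8_t *b,
                       size_t n)
{
    while (n > 0) {
        size_t vl = fake_vsetvl(n);

        /* On real hardware this inner loop would be one vector
         * load/add/store sequence operating on vl elements. */
        for (size_t i = 0; i < vl; i++) {
            unsigned sum = (unsigned)a[i] + b[i];
            dst[i] = sum > 255 ? 255 : (uint8_t)sum;
        }

        dst += vl;
        a   += vl;
        b   += vl;
        n   -= vl;
    }
}
```

The same source handles any array length and any value of `HW_LANES` without a separate tail loop, which is the appeal of the approach; whether it is fast is exactly what cannot be judged without real hardware.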

And before you offer to use an emulator: emulation gives a skewed picture and proper simulation is too slow for practical purposes. Here’s a real-world example: when Macs migrated from PowerPC to x86, developers discovered that the vector shuffle instruction that was fast on PowerPC was much slower under Rosetta emulation (unlike the rest of the code). Similarly, there’s a story about NEON optimisations that gave no speed-up when tested in QEMU but gave a significant performance boost on real hardware. That’s why I’d rather have a small development board (something like the original BeagleBoard) to test the optimisations on if I can’t get a proper box to develop on directly.

This also raises the question not only of when CPUs with RVV support will become more accessible but also of why they are so rare. I can understand the problems of designing a modern performant CPU in general, let alone one with a vector extension and on a rather short schedule, but since some have accomplished it already, why is it not more common? SiFive in particular: if you have one chip with RVV, what prevents you from adding it to other chips that are supposedly desktop- and server-oriented? I have only one answer and I’d like to be proven wrong (as usual): while the chip designers can implement RVV, they were unable to make it performant without hurting the rest of the CPU (either the transistor budget or the power requirements are too large, or maybe its design interferes with the rest of the core too much), so we have it mostly on underwhelming Chinese cores and some SiFive CPU not aimed at a general user. Hopefully in the future the problems will be solved and we’ll see more mainline RISC-V CPUs with RVV. Hopefully.

So far though it reminds me of a story about Nv*dia and its first Tegra SoCs. From what I heard, the company managed to convince various vendors to use it in their infotainment systems, and those who used it discovered that its hardware H.264 decoder worked only for files with certain resolutions and that the SoC somehow used a CPU without SIMD (IIRC the first Tegra lacked even an FPU), so you could not even attempt to decode stuff there with a software decoder. As a result those vendors were disappointed and passed on the following SoCs (resulting in a rather funny Tegra-powered microwave oven). I fear that RISC-V might lose the interest of multimedia developers, with both the need to rewrite code from RVVv0.7.1 to RVVv1.0 and the lack of appealing hardware supporting RVVv1.0 anyway, so that when it’s ready nobody will be interested any longer. And don’t repeat the same words about an open and royalty-free ISA again. We have the free Theora format that sucked and was kept alive because “it’s free”; by the time it was improved to be about as good as MPEG-4 ASP there was a much better open and free VP8 codec available. Maybe somebody will design a less fragmented ISA targeting more specific use cases than “anything from simple microcontrollers to server CPUs” and RISC-V will join OpenRISC and others (…and nothing of value will be lost).

P.S. Of course multimedia is far from the most important use case but it involves a good deal of technologies used in other places. And remember, SSE was marketed as something that speeds up working with the Internet (I like to end posts on a baffling note).

11 Responses to “RISC-V: still not ready for multimedia?”

  1. -.- says:

    Actually I think the free/open ISA part is likely the *problem* here. With x86/ARM, you tend to only hear about things once they’re mostly complete. But with RISC-V, you hear about them during development. And development of these things typically takes several years.

    In other words, I think you need to readjust your expectations. It’s still *very* early days regarding RVV, so you shouldn’t expect much of it at the moment. This isn’t helped by the RISC-V fanboys out there making it seem that RISC-V is far more mature than it actually is. Combine this with the fact that RV is most attractive in the embedded space, where SIMD is often less important (or constrained), hence most cores will focus on that use case.

    My qualms with RVV are with how awfully complex the ISA is – don’t be fooled by the “RISC” name. Yet shuffle functionality is basically a joke (“we give you a general shuffle instruction, you figure out the rest” seems to be their mantra).
    As far as optimisations go, maybe you focus on 128-bit implementations, yet you still need to think about what a 65536-bit processor would do.

  2. Mike says:

    Hmm, I actually have a RISC-V board, but I hadn’t thought to check whether it offers SIMD. It’s called the MangoPi MQ Pro and has an AllWinner D1 chip. I had assumed this thing had an ARM SoC when I first booted it up. I think maybe I was trying to install some ARM binary when I finally figured out it was RISC-V.

    Looks like the D1 offers all the usual cadre of codec decoders on board (and just M/JPEG encoding). But no SIMD. The unit is very inexpensive, though.

  3. Kostya says:

    @-.-

    That’s what I complained about before: RISC-V is so fragmented that it tries to cover everything but nothing concretely (i.e. if you ask about something then of course RISC-V should support it, but when you ask about precise details then RISC-V is not about it). At least ARM has clearly defined profiles so you know what to expect from Cortex-M cores compared to Cortex-A (they also had alphabet soup in core names like ARM1176JZF-S but they had enough common sense to drop it).

  4. Kostya says:

    @Mike
    Yes, that board uses an SoC based on an older version of that T-Head core and I guess it’s good enough to blink LEDs like any other board.

    The real question is what it offers me compared to non-RISC-V boards. They all feature hardware decoders and the usual I/O interfaces, and the compiler takes care of instruction set differences, so one board is about as good as the next one.

  5. -.- says:

    Actually, I think RVV itself manages to avoid the fragmented nature of RISC-V instruction sets. Basically, you’ve got *one* extension (V) which supports a complete set of SIMD functionality (compared to other RV extensions) – that’s over 100 instructions, including support for int, FP ops etc.

    On the other hand, RVV is complex because it tries to support everything, but not due to fragmenting the instruction set into multiple extensions. The vector configuration can be challenging to deal with. Add on top features like LMUL which adds a lot of further complexity, for dubious benefit.

    (RV does have subsets of RVV for embedded processors, but you don’t have to worry about those when targeting RVV)

  6. Kostya says:

    I’m pretty sure that if they tried to fragment RVV like the base set Intel would sue them for violating their patented business practice.

    And for comparison ARM ARM lists about 350 NEON instructions. Of course many of those are duplicates for different modes but overall a hundred instructions for a vector extension set seems reasonable.

    Still, the main problem for me is that I can’t play with it properly and I’ll probably lose all faith in it by the time there are available options.

  7. Luca says:

    Until there is something to the level of the sifive boards with an RVV 1.0 it is safe to say that the thing is immature.

    the kendryte-k230 is good for tiny benchmarks but I doubt it could have enough ram to decode high res av1 with enough tools on…

  8. cancername says:

    You’re right that RISC-V is not ready for large multimedia applications yet, but I do take issue with parts of your post.

    “Maybe somebody will design a less fragmented ISA […]”

    I don’t see how RISC-V is particularly fragmented. In general, they have one extension per feature, not multiple competing ones.

    “Essentially there is only one core on the market that support RISC-V V extension (or RVV for short)—C920 from T-Head […]”

    The CV1800B also supports RVV 0.7.1, and I’m sure you can find several other cores :).

    As for 0.7.1 vs 1.0, rvv-rollback claims to backport lots of 1.0 code to 0.7.1. I haven’t personally verified this.

    RISC-V is in its infancy, but steadily becoming viable for more use cases, and CPUs like the 64-core SG2042 apparently already deliver 1/4 to 1/8 the performance of fast desktop x86 CPUs. RVV and SVE’s runtime vector lengths are, in my naive opinion, a more future-proof paradigm than rewriting code for each vector length increase, which we already experienced from 64 to 128 to 256 to 512 bits.

  9. cancername says:

    ^ Apologies for the formatting.

  10. Kostya says:

    > I don’t see how RISC-V is particularly fragmented.

    Have you considered the sheer number of those extensions? E.g. the M extension is for integer multiplication and division, except that there is also the Zmmul extension, which provides just the multiplication part… So with every new CPU it’s like a game to find out which of the instructions will be supported there.

    > The CV1800B also supports RVV 0.7.1…

    And it’s C906-based. That’s my point – many vendors just package T-Head cores.

    As for your (rather optimistic) claims, I agree that having variable-length vector processing is the future (even if it complicates things for high-level programming languages). As for the performance, this new 64-core CPU is comparable to my over-a-decade-old four-core laptop CPU, which is rather disappointing. I understand that a CPU architecture and its implementation are different things, so we’re yet to see if RISC-V implementations will get significantly better. But if they keep preferring brawn to brains, it may end with some other competitor winning.

  11. […] So I now have my first piece of RISC-V hobbyist kit, although I learned recently from Kostya that it’s not that great for multimedia. […]