Rust needs proper stand-alone assembler support

Back when I gave my arguments for why I don’t consider Rust a mature language, one of them was that it lacks proper assembler support, and a systems programming language requires it since some of the tasks you need to perform (including optimisation) require as low-level access as you can get. Here I would like to argue why, while asm!{} may be enough for most cases, it’s definitely not enough for mine.

Assembly languages are quite different in structure and design from Rust or any other high-level programming language. Since their intent is to be as close to the hardware as possible, you have to do everything manually: declare which segment each global variable belongs to, write function prologues and epilogues that deal with calling conventions and so on. Some of the stupider assemblers can’t even calculate the jump offsets for local labels, so you need to do that by hand. And because it’s assembly language, usually with just one simple statement per line, you end up with enormous amounts of source code for anything non-trivial.
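To make the jump-offset point concrete, here is a minimal two-pass sketch of what a smarter assembler does for you: the first pass assigns an address to every label, the second computes each jump’s displacement relative to the end of the jump instruction (as x86 relative jumps do). The `Item` type and `resolve_jumps` function are hypothetical, and the sketch assumes every instruction has a fixed, known size, so it ignores the short/near jump relaxation real assemblers perform.

```rust
use std::collections::HashMap;

// One line of a toy program: a label, a plain instruction, or a relative jump.
// Sizes are assumed to be fixed and known in advance (a simplification).
enum Item<'a> {
    Label(&'a str),
    Insn { size: u32 },
    Jump { size: u32, target: &'a str },
}

// Returns the displacement of every jump, in program order.
fn resolve_jumps(items: &[Item<'_>]) -> Result<Vec<i64>, String> {
    // Pass 1: assign an address to every label.
    let mut labels: HashMap<&str, u32> = HashMap::new();
    let mut addr = 0u32;
    for item in items {
        match item {
            Item::Label(name) => {
                labels.insert(*name, addr);
            }
            Item::Insn { size } | Item::Jump { size, .. } => addr += size,
        }
    }

    // Pass 2: now that all labels are known, compute the jump displacements.
    let mut offsets = Vec::new();
    addr = 0;
    for item in items {
        match item {
            Item::Label(_) => {}
            Item::Insn { size } => addr += size,
            Item::Jump { size, target } => {
                let dest = *labels
                    .get(target)
                    .ok_or_else(|| format!("undefined label `{target}`"))?;
                let next = addr + size;
                // x86-style: the displacement is relative to the next instruction.
                offsets.push(i64::from(dest) - i64::from(next));
                addr = next;
            }
        }
    }
    Ok(offsets)
}
```

For a loop like `top: <3-byte insn>; jmp top; jmp done; <4-byte insn>; done:` with 2-byte jumps, the backward jump gets -5 and the forward one gets 4; forward references are exactly why the first pass is needed.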

In order to mitigate many of those deficiencies the macro assembler was invented, probably still in the era of punchcards. There you have various levels of macro substitution: simple macro substitution (where a macro term is expanded into its definition), parametrised macro substitution (where a macro term may have several arguments, so FOO r1 and FOO r2 will be expanded into e.g. mov r1, #42 and mov r2, #42) and a full macro language where your macro definition may create new macro definitions during the expansion process (something the C preprocessor, for example, can’t do).
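Parametrised macro substitution of the kind described above can be sketched in a few lines. This is an illustrative toy, not any real assembler’s implementation: the `Macro` struct and `expand` function are hypothetical, parameters are referenced as `\name` in the body (borrowing the GAS `.macro` convention), and there is no support for macros that define other macros.

```rust
use std::collections::HashMap;

// A parametrised macro: formal parameter names and a body template.
struct Macro {
    params: Vec<String>,
    body: Vec<String>,
}

// Expands one source line. Lines whose first word is not a known macro
// are passed through unchanged.
fn expand(macros: &HashMap<String, Macro>, line: &str) -> Vec<String> {
    let mut words = line.split_whitespace();
    let Some(name) = words.next() else {
        return vec![line.to_string()];
    };
    let Some(mac) = macros.get(name) else {
        return vec![line.to_string()];
    };
    // Actual arguments, with any separating commas stripped.
    let args: Vec<&str> = words.map(|w| w.trim_end_matches(',')).collect();
    mac.body
        .iter()
        .map(|tmpl| {
            let mut out = tmpl.clone();
            // Replace each `\param` reference with the matching argument.
            // (Naive: a parameter name that prefixes another would clash.)
            for (param, arg) in mac.params.iter().zip(&args) {
                out = out.replace(&format!("\\{param}"), arg);
            }
            out
        })
        .collect()
}
```

With `FOO` defined as one parameter `reg` and the body `mov \reg, #42`, expanding `FOO r1` yields `mov r1, #42`, while an ordinary line such as `add r0, r1` comes back as-is.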

This macro part is what allows you to write code in assembly language efficiently and for obvious reasons it can’t be used with asm!{} (unless you implement a special macro substitution language for those blocks—but then why not make it into an external assembler?). Now I’ll try to show why I need that.

I have a multimedia project, and most of the optimisations there can be expressed in the form of stand-alone functions performing tight loops over fixed-size regions of memory. In most cases you can use the same parametrised macro definition to generate functions working on 8×4 and 8×8 blocks, doing block averaging with or without rounding, which means writing one macro and instantiating it four times. And since those functions usually consist of just a handful of operations, every instruction counts, including those the compiler may generate around an asm!{} block or intrinsics.
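For comparison, the one-template-four-instances structure can be expressed in plain Rust with `macro_rules!`. This is only a sketch of the structure, not code from the project: the macro name `avg_block_fn` and the function signatures are assumptions, and the body is scalar Rust, so the compiler still decides which instructions end up in each function, which is exactly the part hand-written assembly is meant to control.

```rust
// One template, parametrised by block size and rounding mode.
macro_rules! avg_block_fn {
    ($name:ident, $w:expr, $h:expr, $round:expr) => {
        // Averages `src` into `dst` in place over a w x h block; strides are in bytes.
        pub fn $name(dst: &mut [u8], dstride: usize, src: &[u8], sstride: usize) {
            for y in 0..$h {
                let drow = &mut dst[y * dstride..][..$w];
                let srow = &src[y * sstride..][..$w];
                for (d, &s) in drow.iter_mut().zip(srow) {
                    // With rounding: (a + b + 1) >> 1, without: (a + b) >> 1.
                    let sum = u16::from(*d) + u16::from(s) + u16::from($round);
                    *d = (sum >> 1) as u8;
                }
            }
        }
    };
}

// Four instantiations of the same template.
avg_block_fn!(avg_8x4, 8, 4, false);
avg_block_fn!(avg_8x4_rnd, 8, 4, true);
avg_block_fn!(avg_8x8, 8, 8, false);
avg_block_fn!(avg_8x8_rnd, 8, 8, true);
```

Averaging a block of 10s with a block of 13s gives 11 without rounding and 12 with it; the 8×4 variants touch only the first four rows.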

So why not simply call an external assembler? Sure, but which one? On ARM (both 32- and 64-bit architectures) you have nice GAS with powerful macro system and no issues I can think of. But x86 (also 32- and 64-bit) is an unholy mess.

For starters, there’s no single assembler syntax. Instead you have AT&T syntax used by GCC and Intel syntax used by most of the other assemblers and compilers (each with its own quirks, of course; those who remember TASM and MASM won’t be surprised). Then there are the two assemblers most commonly used for x86 code on various operating systems, namely NASM and YASM. I still remember projects switching from one to the other and then back again: one of them was abandoned while the other got new features, then the first project was resurrected and the other one became abandoned. And it mattered because Intel still occasionally introduces new instructions for the AMD64 architecture. It’s also worth mentioning that the macro processor in both is not very good: I remember FFmpeg having to introduce workarounds to make files assemble much faster (though I think the situation has improved since).
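Rust’s own `asm!` shows the syntax split in miniature: on x86 it uses Intel syntax by default, and `options(att_syntax)` switches to AT&T, where the operand order is reversed, register operands are substituted with a leading `%` and instructions take size suffixes. The sketch below adds two numbers both ways; the function names are illustrative, and on non-x86_64 targets it falls back to plain Rust so it still compiles.

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::asm;

// Intel syntax (the asm! default on x86): destination operand first.
#[cfg(target_arch = "x86_64")]
fn add_intel(a: u64, b: u64) -> u64 {
    let mut acc = a;
    // SAFETY: pure register arithmetic, no memory or stack access.
    unsafe {
        asm!("add {acc}, {b}", acc = inout(reg) acc, b = in(reg) b,
             options(pure, nomem, nostack));
    }
    acc
}

// AT&T syntax: source first, destination last, explicit `q` size suffix.
#[cfg(target_arch = "x86_64")]
fn add_att(a: u64, b: u64) -> u64 {
    let mut acc = a;
    // SAFETY: pure register arithmetic, no memory or stack access.
    unsafe {
        asm!("addq {b}, {acc}", acc = inout(reg) acc, b = in(reg) b,
             options(att_syntax, pure, nomem, nostack));
    }
    acc
}

// Fallbacks so the sketch builds on other architectures.
#[cfg(not(target_arch = "x86_64"))]
fn add_intel(a: u64, b: u64) -> u64 {
    a.wrapping_add(b)
}

#[cfg(not(target_arch = "x86_64"))]
fn add_att(a: u64, b: u64) -> u64 {
    a.wrapping_add(b)
}
```

Both functions compute the same thing; only the spelling of the instruction differs, which is the whole problem when code written for one assembler has to be fed to another.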

And there’s another very x86 thing. There’s a special header file for NASM/YASM called x86inc.asm (originally developed by the x264 developers but now used in many other projects) with macro definitions that deal with two issues: various SIMD sets and function calls. The SIMD part is simple to understand: you may want to generate code for MMX, SSE or AVX from the same template. Since various x86 CPUs supported different instruction sets, you needed that, and you may still need it for SSE and AVX versions of the same code on AMD64. Function calls are the main pain in the ass on x86 and AMD64.
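Rust has no x86inc.asm, but the closest compiler-side analogue of one template instantiated per SIMD set is a shared body compiled under different `#[target_feature]` attributes, plus runtime selection with `is_x86_feature_detected!`. A minimal sketch, assuming a sum-of-absolute-differences kernel as the example (the function names are hypothetical). Note that here the compiler, not you, decides which instructions each variant uses, a much weaker guarantee than hand-written SSE and AVX versions.

```rust
// The shared "template": a portable sum of absolute differences.
#[inline(always)]
fn sad_body(a: &[u8], b: &[u8]) -> u32 {
    a.iter().zip(b).map(|(&x, &y)| u32::from(x.abs_diff(y))).sum()
}

// Stamps out one copy of the template per instruction-set extension.
macro_rules! simd_variant {
    ($name:ident, $feature:tt) => {
        #[cfg(target_arch = "x86_64")]
        #[target_feature(enable = $feature)]
        unsafe fn $name(a: &[u8], b: &[u8]) -> u32 {
            sad_body(a, b)
        }
    };
}

simd_variant!(sad_sse41, "sse4.1");
simd_variant!(sad_avx2, "avx2");

// Picks the best variant the running CPU supports, else the baseline build.
fn sad(a: &[u8], b: &[u8]) -> u32 {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx2") {
            // SAFETY: AVX2 support was just checked at runtime.
            return unsafe { sad_avx2(a, b) };
        }
        if is_x86_feature_detected!("sse4.1") {
            // SAFETY: SSE4.1 support was just checked at runtime.
            return unsafe { sad_sse41(a, b) };
        }
    }
    sad_body(a, b)
}
```

Every variant must return the same result; only the instruction selection, and hence the speed, is allowed to differ.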

Unlike on many other platforms, there is no single ABI for function calls. In the old days (32-bit x86) arguments were usually passed on the stack; on AMD64 the first few integer arguments are passed in registers (four on Windows, six in the System V ABI used by Linux and macOS) and the rest go on the stack, while the first several floating-point arguments are passed in SIMD registers. In order to deal with this, x86inc.asm has a platform-specific macro that creates a function prologue for you depending on the number of arguments and the general-purpose and SIMD registers you use in the function. It also creates aliases for arguments and registers, so you can use them without caring too much about which physical registers are actually used. This first reminds me of C, and then makes me think that it should be a standard feature of any assembler that deals with such platforms.
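To show why such a prologue macro has to be platform-specific, here is a small table-driven sketch of where the N-th integer argument arrives in the common x86 conventions, which is roughly what argument aliases have to resolve behind the scenes. The `Abi` and `ArgLoc` types and `int_arg_location` are hypothetical names. The sketch covers only integer and pointer arguments (on Win64 floating-point arguments share the same four positional slots, while System V counts them separately), and `Stack(n)` simply means the n-th stack-passed argument, ignoring Win64’s 32-byte shadow space.

```rust
// Calling conventions covered by this sketch.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Abi {
    Cdecl32, // 32-bit x86 cdecl: every argument on the stack
    SysV64,  // System V AMD64 (Linux, macOS, BSDs)
    Win64,   // Microsoft x64
}

// Where an argument lives on function entry.
#[derive(Debug, PartialEq, Eq)]
enum ArgLoc {
    Reg(&'static str),
    Stack(usize), // n-th stack-passed argument slot
}

// Location of the `idx`-th (0-based) integer/pointer argument,
// assuming all arguments of the function are integers or pointers.
fn int_arg_location(abi: Abi, idx: usize) -> ArgLoc {
    const SYSV64: [&str; 6] = ["rdi", "rsi", "rdx", "rcx", "r8", "r9"];
    const WIN64: [&str; 4] = ["rcx", "rdx", "r8", "r9"];
    let regs: &[&'static str] = match abi {
        Abi::Cdecl32 => &[],
        Abi::SysV64 => &SYSV64,
        Abi::Win64 => &WIN64,
    };
    match regs.get(idx) {
        Some(&reg) => ArgLoc::Reg(reg),
        None => ArgLoc::Stack(idx - regs.len()),
    }
}
```

The same fifth argument is thus in `r8` on Linux but on the stack on Windows, and on 32-bit x86 every argument is on the stack, which is why hand-written prologues multiply quickly without a macro layer.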

Thus, in my opinion, it would be better to have an external assembler specifically for Rust (and with GCC-RS coming, you should not rely on rustc and LLVM behaviour). This external assembler should at least make it easy to interface with the platform-specific ABI without having to write extra code to account for the various ABIs. Plus it should be easy to integrate into Cargo, so that it can be used as the default assembler and not merely as yet another external tool. Probably this will never happen (there’s not much interest in it, and I know only one guy actively evading working on it), but a dream is a dream.

2 Responses to “Rust needs proper stand-alone assembler support”

  1. e71 says:

    Oh boy, Media Mike’s going to get mad, you’re going to crash the server again. That’s if the article gets widespread attention, of course.

    But isn’t there

  2. Kostya says:

    It probably won’t. This is a very specific topic after all.