Archive for the ‘Useless Rants’ Category

ZMBV support in NihAV and deflate format fun

Saturday, May 22nd, 2021

As I said in the previous post, I wanted to add ZMBV support to NihAV, mostly because it is rather simple codec (which means I can write a decoder and an encoder for it without spending too much time), it’s lossless and supports various bit-depths too (which means I can encode various content into it preserving the original format).

I still had to improve my deflate support (both decompressing and compressing) a bit to support the way the data is stored there. At least now I mostly understand what various flags are for.

First of all, by itself deflate format specifies just a bitstream split into blocks of data that may contain any amount of coded data. And these blocks start at the next bit after the previous block has ended, no byte aligning except by chance or after a copy block (which aligns bitstream before storing length and block contents).

Then, there is raw format used in various formats (like Zip or gzip) and there’s zlib format used for most cases data is stored as part of some other format (that means you have two initial bytes like 0x78 0x5E and 2×2 bytes of checksum in the end).

So, ZMBV uses unterminated stream format: first frame contains zlib header plus one or several blocks of data padded with an empty copy block to the byte limit, next frame contains continuation of that stream (also one or more blocks padded to the byte boundary) and so on. This is obviously done so you can decode frames one after another and still exploit the redundancy from the previously coded frame data if you’re lucky.

Normally you would start decoding data and keep decoding it until the final block (there’s a flag in block header for that) has been decoded—or error out earlier for insufficient data. In this case though we need to decode data block, check if we are at the end of input data and then return the decoded data. Similarly during data compression we need to encode all current data and pad output stream to the byte boundary if needed.

This is not hard or particularly tricky but it demonstrates that deflated data can be stored in different ways. At least now I really understand what that Z_SYNC_FLUSH flag is for.

Missing optimisation opportunity in Rust

Wednesday, May 12th, 2021

While I’m struggling to write a video player that would satisfy my demands I decided to see if it’s possible to make my H.264 decoder a bit faster. It turned out it can be done with ease and that also raises the question concerning the title of this post.

What I did cannot be truly called optimisations but rather “optimisations” yet they gave a noticeable speed-up. The main optimisation candidates were motion compensation functions. First I shaved a tiny fraction of second by not zeroing temporary arrays as their contents will be overwritten before the first read.

And then I replaced the idiomatic Rust code for working with block like

    for (dline, (sline0, sline1)) in dst.chunks_mut(dstride).zip(tmp.chunks(TMP_BUF_STRIDE).zip(tmp2.chunks(TMP_BUF_STRIDE))).take(h) {
        for (pix, (&a, &b)) in dline.iter_mut().zip(sline0.iter().zip(sline1.iter())).take(w) {
            *pix = ((u16::from(a) + u16::from(b) + 1) >> 1) as u8;
        }
    }

with raw pointers:

    unsafe {
        let mut src1 = tmp.as_ptr();
        let mut src2 = tmp2.as_ptr();
        let mut dst = dst.as_mut_ptr();
        for _ in 0..h {
            for x in 0..w {
                let a = *src1.add(x);
                let b = *src2.add(x);
                *dst.add(x) = ((u16::from(a) + u16::from(b) + 1) >> 1) as u8;
            }
            dst = dst.add(dstride);
            src1 = src1.add(TMP_BUF_STRIDE);
            src2 = src2.add(TMP_BUF_STRIDE);
        }
    }

What do you know, the total decoding time for the test clip I used shrank from 6.6 seconds to 4.9 seconds. That’s just three quarters of the original time!

And here is the problem. In theory if Rust compiler knew that the input satisfies certain parameters i.e. that there’s always enough data to perform full block operation in this case, it would be able to optimise code as good as the one I wrote using pointers or even better. But unfortunately there is no way to tell the compiler that input slices are large enough to perform the operation required amount of times. Even if I added mathematically correct check in the beginning it would not eliminate most of the checks.

Let’s see what happens with the iterator loop step by step:

  1. first all sources are checked to be non-empty;
  2. then in outer loop remaining length of each source is checked to see if the loop should end;
  3. then it is checked if the outer loop has run not more than requested number of times (i.e. just for the block height);
  4. then it checks line lengths (in theory those may be shorter than block width) and requested width to find out the actual length of the inner loop;
  5. and finally inside the loop it performs the averaging.

And here’s what happens with the pointer loop:

  1. outer loop is run the requested amount of times;
  2. inner loop is run the requested amount of times;
  3. operation inside the inner loop is performed.

Of course those checks are required to make sure you work only with the accessible data but it would be nice if I could either mark loops as “I promise it will run exactly this number of times” (maybe via .take_exact() as Luca suggested but I still don’t think it will work perfectly for 2D case) or at least put code using slices instead of iterators into unsafe {} block and tell compiler that I do not want boundary checks performed inside.

Update: in this particular case the input buffer size should be stride * (height - 1) + width i.e. it is always enough to perform operation in the way described above but if you use .chunks_exact() the last line might be not handled which is wrong.

The former is rather hard to implement for the common case so I don’t think it will happen anywhere outside Fortran compilers, the latter would cause conflicts with different Deref trait implementation for slices so it’s not likely to happen either. So doing it with pointers may be clunky but it’s the only way.

Le spam

Saturday, May 1st, 2021

Sometimes I look inside Baidu Mail spam folder to see if there’s anything useful got there by mistake (notifications from various shops with purchase confirmations end there quite often, to give one example). And there’s a weird tendency I’ve spotted recently.

In the last five days I’ve received 47 spam mails. 37 of them were in French. I’m used to receiving spam in various European languages (including but not limited to Bulgarian, German, Italian, Russian and Spanish) but before last year it was mostly in English. Additionally a good deal of them now is about some promotional actions from supermarket chains like Aldi, Carrefour or Lidl (and I’ve never considered either of them to be some luxury store).

What’s wrong with this world?

Things I want in Ghidra

Wednesday, April 14th, 2021

Ghidra, as you should know already, is a disassembler and decompiler for unprofessional folks who can’t cough up a couple thousands dollars and pass background check to buy a disassembler and decompiler the experts use. And here’s a list of things that would make Ghidra better or much better for me. I know it’s opensource but I’d rather not touch large codebases written in Java (or any codebases written in Java really). Disclaimer: by the time of writing this I’m still using Ghidra 9.2 even if 9.2.2 has been available for months, but I doubt any of the things mentioned here are implemented already.

First of all, there’s a bug with x86 disassembly: while rep movsX is recognized and works fine, some of the code I’ve seen uses repne movsX which is treated as simple movsX even by disassembler. Initially I was confused by this bug and only disassembling instruction bytes with ndisasm proved that it’s not a compiler (or assembly author) missed a prefix but rather Ghidra ignoring it entirely. Some other rarely seen instructions that involve FPU operation and an addressing with segment registers (e.g. ES, FS or GS; if you don’t have an idea what those are for—be thankful for that) and explicit offsets are disassembled incorrectly, consuming a byte more than required (and thus making the following instruction to be disassembled incorrectly as well).

And speaking about assembly, current program text search is nice since you can search inside a specific part of an instruction (e.g. MOVSD.REP in instruction name or 0x800 in its operand) but it would be even better to have a more generic search by a regular expression. Quite often I want to locate a specific instruction doing e.g. shift left by five. The problem is that there are many shift instructions and even more instructions with an operand having 5 somewhere in it. And I don’t know the exact operand register so searching for shl eax, 5 first, then shl ebx, 5 and all the way to shl dh, 5 is tedious. The same can be said about dumping listing and searching there. It will work as intended though.

Beside the issues above and idiosyncratic x86 assembly syntax (it does not bother me much though) I have nothing else to complain in disassembler, so let’s move to the decompiler issues.

I suspect that decompiler output is not stored permanently, but it would be nice to mark some function as being “decompiled, it’s fine, do not touch it, I mean it”. Looks like the process of function being decompiled again and again even if you change something not related to it in any way is annoying not just to me. So it would be nice to mark some large decompiled function as permanently decompiled so it’s not re-decompiled on a subsequent visit to it.

And speaking of functions, it would be nice if functors (aka function pointers) would be supported instead of just detecting that this variable is a pointer to function. When arguments are passed by stack, decompiler usually can detect that. But when arguments are passed in registers you have to trace the registers and their values by hand. Of course you’d need a monstrous syntax to specify a type for e.g. a function that accepts three arguments in designated registers but I can still wish for it, can’t I?

Another thing I often wish for is being able to tell decompiler that after a certain point the variable is no longer valid and it should treat subsequent uses of that register or stack as a new variable. A very common example is when a first argument (usually some context pointer) is moved to a register (or it is passed in a register already), some fields are read from the context, context value is stored to some local variable and the initial register is used for example as a loop variable that gets decompiled to something monstrous like for (ctx = (Context*)0; ctx < (Context*)42; ctx = (Context*)((int)ctx + 1)) { ... } and it also screws types for variables involved in the same expression as this loop counter.

Of course not all of these things can be easily implemented, and maybe some of them would require architectural changes. But I prefer to cherish my ignorance on Ghidra internal details and just point out what I’d find good to have in principle.

I feel old

Sunday, April 4th, 2021

Probably it’s exactly when you start complaining about things changing you realize you’ve become old. And as you can guess this is my turn to complain about things changing.

The last tipping point for me was when I tried to update rustc from version 1.33 I’d been using. I wanted to do it mostly for three things: atomic types beside bool and usize (for the use in NihAV video player that should be written eventually), replacing now deprecated mem::uninitialized(), and being able to switch to a newer version of SDL2 bindings for the future video player (because old version depends on about 25 crates more than a newer ones).

So I tried latest-and-greatest rustc 1.51 and ran cargo clippy to see what has changed meanwhile. And beside seeing that now there’s a special macro matches!() as a convenient shorthand for checking object type to be some variant(s) I could not see anything useful because the linter spammed everything with “selected name is bad, you should change NASomething into NaSomething“. Which looks ugly and dropping the NA prefix entirely may make names too generic and collide with the standard ones. And before you tell me: I’m aware of the lint suppressor flag and that there’s a fix for it that makes it not to complain on public names. Still, I find this brain-damaged and thus I’ve settled on rustc 1.46 for now.

Then there’s an even funnier thing. I’ve discovered that crates.io (the site for navigating Rust packages) has stopped working in Firefox 52 (docs.rs still works fine—for now). Probably because of some JavaScript new feature (which is invoked inside rather large bundle) the scripts cannot be parsed and it simply gives you a cryptic message “Sorry, it looks like we were not able to load the page.” without explaining much. The problem is most likely at your side, you find it out.

My friend Luca actually wasted some time to create an issue for that and got an obvious reply that the browser is outdated and this is expected. For the record GitHub at least prints a message that the browser version is too old and not supported (yet the main functionality is still working), the same is true for WordPress instance where I’m typing this post.

So why don’t I update this four-year old browser? Because it’s painful. The browser is tied to the Linux distribution so either I should compile it myself (I tried that once with earlier Firefox, ran out of disk space and almost of swap space too; so I’m not eager to compile any browser more complex than elinks). The other alternative is updating the distribution and that is even more painful because of the drastic changes in software.

I’m not talking about desktop environments per se even if they expose the problem. I don’t care much about how the windows look like or where launch menu is located. But I do care about the programs I used to being no longer the part of the distributive or changing their interface in radical way. Simple example: for various reasons I like to have several input layouts and methods (for English, Ukrainian, Russian and Japanese language). On Ubuntu 12.04 I have keyboard layout switching between English and Ukrainian layouts (the latter can also produce Russian or Belarussian letters with AltGr) plus mozc to input Japanese (previously it was Anthy and who knows what will be the default Japanese IME in 2024). On Ubuntu 18.04 (which I use on a different machine) you can’t set it up the same way, Ukrainian layout does not have AltGr support and mozc by default outputs Latin characters with no setting or key combination to make it output kana by default (of course I can simply press kana mode change key on my Japanese keyboard—but that means connecting an external keyboard just for that). And even if my fingers are the wurst I have sausage fingers hardly hitting a correct key on the first time, I still want to perform keyboard actions using keyboard.

The sad thing is that I somewhat understand why this happens. Web sites get bloated and do not work with older browsers because of the browser war (no plural, you know The Browser) and shiny features developers are eager to try (and testing for older versions takes time and efforts better spent elsewhere—if people remember about it at all). Some programs need to be updated because of security issues (e.g. I upgraded from Ubuntu 10.04 mainly because of TLS troubles leading to programs not being able to open half of the sites over HTTPS). Some programs get replaced because the maintainer left and there’s nobody to step in. Some interfaces need to be adapted to the new reality (while Debian on Nokia smartphones is not so popular now, Ubuntu on tablets seems to be the next popular thing). And there’s GNOME and freedesktop.org which seem to be the main sources of disruptions. I can’t explain the logic by which they change things but it’s because of their view the mountpoint for network shares in the next release will probably be different from two previous places (and when you’re still using command line to access files on those shares like I do this feels annoying).

The sad thing is that all other non-toy OSes have the same problems. Windows users might still remember Windows Vista GUI affectionately and like how Windows 10 changes GUI elements time from time. *BSD seems to be more stable but they still support GNOME. macOS users might be still ashamed for their muscle memory failing them when OS update decided to reverse scroll direction. This is also why Linux kernel is still precious: it still cares about its interfaces being backward compatible so the userspace can rely on them while you can’t say the same about glibc 2.16.

And here we have it. As I said in the beginning this whining about programs and environments changing constantly mostly tells that I’ve become old. On the other hand if you think this “move fast and break things” motto is a good idea—remember who coined it and think again.

Why I don’t consider Monkey Island games to be the best adventure games

Monday, March 15th, 2021

Disclaimer: I try to write posts mostly about multimedia stuff but some things are nagging me for a long time. So I write this to get rid of that nagging and move along with my lack of life.

First of all, I need to say that even if grew up in a land that mostly favoured Sierra games, I don’t think low of Monkey Island series (the first three games for sure). The setting is interesting and fun, the graphics was good (for the first three games at least), the music is memorable (even if iMUSE system was under-appreciated), the characters became part of the culture. I’ve played them countless times (MI1 both floppy and CD versions too) and would probably play again soon. I agree that those are very good games but there’s one thing about them that prevents me from agreeing with the majority that MI 1 and MI 2 are one of the best adventure games of all time.
(more…)

Videos in Russian quests – FLIC and Indeo

Friday, February 19th, 2021

It should be obvious by the fact that in ex-USSR countries there are usually “quest games” or simply “quests” instead of adventure games that Sierra games were favoured way more than anything else (while Germany seems to be more Lucas Arts land). But unlike either of these companies, Russian quest games history is much more limited both in time and quantity.

There are just a couple of games that can be called Russian quests. They appeared in 1997-1998 and remembered dearly up to this day. Almost anything coming has been forgotten and for good reasons too.

First I’d like to give a short review of the games:
(more…)

Hopefully final stupid question

Saturday, February 13th, 2021

Since my previous question got a satisfying answer (even two) maybe I’ll get some answers for this question. It’s not been bugging me but it’s been with me for a long time.

First, the background. What we have:

  • there’s a significant demand for video editing software for all those content creators on various video hosting platforms;
  • there are some commercial products that got mentioned rather often (usually with complaints about their stability);
  • there are some freeware or shareware products (less mentioned but you can see them in paid advertisements) that are built on opensource technologies (many of those have libavcodec in some form, one even turned MEncoder into .dll form; I checked the programs specifically to see if they have some interesting decoders bundled with, no luck so far);
  • there are some open-source video editors as well like Cinelerra, Kdenlive or LiVES that keep existing in obscurity (AviSynth or Blender are more popular but they’re a different class of programs);
  • most of the building blocks for input/output formats support or filters are already there (and if not I’m pretty sure Paul B. Mahol can add more on request);
  • there’s a certain traffic cone shaped foundation (maybe VLC secretly stands for cône de Lübeck video) that is known for making an opensource video player that majority of people have heard of and use—on various platforms too;
  • beside the player (which can also stream and do other things) they have supported other projects (like x264 or dav1d).

So the question (or rather a tree of related questions) is: why can’t those people organise and make a popular open-source video editor? Are they not interested in a product with less then ten million potential users? Are there any principal issues in creating a video editor that I’ve not seen (I remember VLME which did not even look like a serious attempt)? Or maybe there are some ideological reasons preventing its creation?

To me it looked like it’s possible to build an opensource video editor, it should use mostly the same blocks as video player, it should have some demand and it is possible to make a popular open-source product. So why it is not done, what I’m missing in the picture?

…and silence will be the answer.

Another stupid question

Wednesday, February 10th, 2021

Here’s another thing that has been bothering me for some time. As we all know RDBMSes more complex than SQLite have data management that is more complex than simply writing all data into one file sequentially plus you have database users belonging to different roles with different permissions on different databases. And you’re supposed to connect it via the usual networking mechanism even if you’re running it on the same machine. And advanced features include multi-machine interaction like sharding, replication and such.

So the question is: if this all looks like a standalone operating system then why enterprise-grade RDBMS are running as applications instead of their own OSes build on some kernel to manage CPU cores, storage and network interface? And in the age of containers we don’t really need to have an app version even for legacy version, just replace the driver (or merely its settings).

But maybe like the last time I wondered why GPUs do not have raytracing, I’ll learn that it’s a trend already and I’m just oblivious to it.

Ghidra and DPMI

Sunday, January 24th, 2021

When trying to look inside an old game you might have a problem if the executable is 32-bit one running with some DOS extender.

Most people do not know (and those who know probably want to forget it) that beside the usual 16-bit COMs and EXEs (that can have several segments and sometimes an overlay too—an external file for loading code or data from disk) you can have 32-bit DOS applications with a very special interface to access so called protected mode. Essentially you have two executables bundled together: actual 32-bit executable in its own format and the loader for it (or a stub to call an external loader usually called dos4gw.exe). Kinda like PE format but more flexible.

So what’s the problem with loading them? For starters Ghidra does not recognize the extended format and loads just the loader (an external LX loader plugin did not help either; but when I tried to load some ActiveX components for XVD decoding, their format was detected as LX). Another problem is that some extenders (like the one from CauseWay) also compress the payload. Luckily enough there’s DOS/32A extended that is shipped with a utility to extract the so-called linear executable which you can’t run but you can study it.

And this is not hard but somewhat tedious since (at least in my cases) it can be loaded only as raw image. From the LX header you can extract the position of actual segment data, segment sizes and their base addresses; then you can use that information to create custom memory segments for them. Now you have some (usually two) segments and one problem: the segments do not address the full virtual address but rather an offset e.g. if you have code segment starting at 0x10000 and data segment starting at 0x90000 you might have mov eax, [0x4bc] which really should be mov eax, [0x904bc]. That is why the first hundred kilobytes or more of LX is dedicated to so-called “fixups” that tell you how you should correct the addresses at the specific places in loaded data (or code). Luckily Ghidra allows patching the addresses in the instructions and re-assigning segment base addresses to have less places to patch (and calls/jumps almost always use relative offsets so it’s not an issue).

And how to find those wrong addresses for specific data item to patch? Memory search for low word part of the address works just fine.

Overall, maybe it is not the most convenient way to deal with this format but it works sufficiently good for me. Who knows, maybe in the future I’ll look at more games for not yet covered formats. At least now I know how to do that.