NihAV experiments: multi-threaded decoder
June 1st, 2023
In my efforts to have an independent player (one that relies on third-party libraries merely for input and output while the demuxing and decoding are done purely by NihAV) I had to explore the way of writing a multi-threaded H.264 decoder. And while it’s not working perfectly it’s a good proof of concept. Here I’ll describe how I hacked my existing decoder to support multi-threading.
A quick glance at the original Cinepak encoder
May 26th, 2023
Since I don’t have anything to do with NihAV at the moment (besides two major tasks that always make me think about doing anything else but them) I decided to look at what tricks the original Cinepak encoder had.
Apparently it has essentially three settings: interval between key frames (with maximum and minimum values), temporal/spatial quality (for deciding which kinds of coding should be used) and neighbour radius (probably for merging close enough values before actual codebook is calculated).
Skip blocks are decided by sum of squared differences being smaller than the threshold (calculated from the time quality); V1/V4 coding is decided by calculating sum of 2×2 sub-block variances and comparing it against the threshold (calculated from spatial quality).
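A rough sketch of those two decisions for a single 4×4 luma block could look like the code below. All names and both thresholds are hypothetical (the original encoder derives its thresholds from the quality settings in a way not described here), and the direction of the V1/V4 test is an assumption: V1 renders each 2×2 sub-block with a single luma value, so high sub-block variance is taken to mean that V4 is needed.

```rust
/// Possible coding decisions for one 4x4 block (hypothetical names).
#[derive(Debug, PartialEq)]
enum BlockMode {
    Skip,
    V1,
    V4,
}

/// Sum of squared differences between two 4x4 blocks stored in raster order.
fn ssd(cur: &[u8; 16], prev: &[u8; 16]) -> u32 {
    cur.iter()
        .zip(prev.iter())
        .map(|(&a, &b)| {
            let d = i32::from(a) - i32::from(b);
            (d * d) as u32
        })
        .sum()
}

/// Variance of four samples multiplied by 16 (to stay in integers):
/// 16 * var = 4 * sum(x^2) - sum(x)^2.
fn var2x2_scaled(p: [u8; 4]) -> u32 {
    let sum: u32 = p.iter().map(|&v| u32::from(v)).sum();
    let sq: u32 = p.iter().map(|&v| u32::from(v) * u32::from(v)).sum();
    4 * sq - sum * sum
}

/// Assumed decision order: skip test first, then V1 versus V4.
fn choose_mode(cur: &[u8; 16], prev: &[u8; 16], skip_thr: u32, v4_thr: u32) -> BlockMode {
    if ssd(cur, prev) < skip_thr {
        return BlockMode::Skip;
    }
    let mut total = 0u32;
    for sy in 0..2 {
        for sx in 0..2 {
            // Top-left sample of the 2x2 sub-block inside the 4x4 block.
            let o = sy * 8 + sx * 2;
            total += var2x2_scaled([cur[o], cur[o + 1], cur[o + 4], cur[o + 5]]);
        }
    }
    if total > v4_thr {
        BlockMode::V4
    } else {
        BlockMode::V1
    }
}
```

With this sketch a block identical to the previous one is skipped, a flat block that differs from the previous frame goes to V1, and a block with strong detail inside its 2×2 sub-blocks goes to V4.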
Codebook creation is done by grouping all blocks into five bins (by logarithm of the variance) and trying to calculate a smaller codebook for each bin independently (so together they’ll make up the full 256-entry codebook).
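The binning step can be illustrated with the sketch below. The actual bin boundaries of the original encoder are not given here, so this version simply uses the bit length of the variance as an integer stand-in for its logarithm and spreads it evenly over five bins; `variance_bin`, `split_into_bins` and `max_bits` are made-up names.

```rust
const NUM_BINS: u32 = 5;

/// Map a block variance to one of five bins by its (approximate) logarithm.
/// `max_bits` is the largest bit length expected for a variance value.
fn variance_bin(variance: u32, max_bits: u32) -> usize {
    // Bit length of the value: 0 for zero, otherwise floor(log2(v)) + 1.
    let bits = (32 - variance.leading_zeros()).min(max_bits);
    (bits * NUM_BINS / (max_bits + 1)) as usize
}

/// Group block indices by bin so that a small codebook can be trained
/// on each group independently.
fn split_into_bins(variances: &[u32], max_bits: u32) -> [Vec<usize>; 5] {
    let mut bins: [Vec<usize>; 5] = Default::default();
    for (idx, &v) in variances.iter().enumerate() {
        bins[variance_bin(v, max_bits)].push(idx);
    }
    bins
}
```

How the 256 codebook entries are shared between the bins is a separate decision that this sketch does not attempt to guess.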
Overall even if I’m not going to copy that approach it was still interesting to look at.
On the origins of ruscism
May 17th, 2023
A couple of weeks ago the Ukrainian parliament finally recognized this term on the official level and listed several telltale signs of it. But in my opinion they can be boiled down to two main actions: disregarding the laws, agreements and traditions (if some suckers believe in those, then it’s just easier to swindle them) and constantly lying, often in an unconvincing way and usually by attributing own deficiencies to somebody else. They’ve been behaving like that throughout their history (which is partly stolen and partly fictitious), the wars just make it more visible. So, why do russians behave like that?
Fascism and Nazism grow to power using the support of the second-worst kind of people: people who feel offended or wronged and do not think for themselves. That sort of folks would never blame themselves for their own faults and will gladly follow a leader who has simple answers to questions like who’s guilty and what to do (those answers are usually “that certain group of people” and “unite around me and do what I tell”). In case of ruscism, I believe it’s not merely an ideology that unites the nation but rather the idea that defines this entity (you’ll see why I don’t consider them a nation soon).
One researcher described russians as a dynamic community where everybody can belong to it or fall from it depending on circumstances (or rather benefits it gives: if I need something from you then you’re my brother, if you need something from me then I don’t know you). From this a rather obvious conclusion follows: russians have failed to develop as a nation—even small tribes usually have clear definition of who belongs to them and who are outsiders—and it must be something immaterial uniting them (i.e. an idea). Nations have not merely clearly defined rules of belonging but also clearly defined territory (no matter if it’s the historical settlement are or pieces of land wrestled from somebody else)—russians claim that russia has no borders and that any territory where a russian has been is a part of russia (IIRC just last year some russian dropped a piece of dirt on Dubai beach and claimed that now it’s all russian soil; I’ve encountered many more examples where common russians believed that some place is russian because they’ve been there).
If you look at the real russian history, it starts with the principality of Suzdal, created on the territories inhabited mostly by Finnic and Ugrian people, conquered by the Golden Horde and after its fall proclaiming itself a legitimate successor and capturing other lands (usually not inhabited by Slavic people either) and yet they tried to turn this multi-ethnic mix into “russians”, partially succeeding at that. Last year the russian führer made a speech that he belongs to all nationalities living in russia—what has not been said is that all those nations are russian only as long as they’re going to war, if they try to move to moscow they’ll be greeted with the traditional “go back to your shithole you non-russian hick” (but if they die at war they’ll be called as “true russian heroes” anyway).
It is hard to define the idea that unites them though. It is not a religion since the original pagan beliefs were replaced by the state-controlled Christian church (unlike many countries where the Church was an independent powerful player, in russia it was created by the state—two or three times even—always to serve the state interests). It is not the idea of exclusivity: such ideas are usually created to support the nation while in russia it’s mostly used to sacrifice russians for that very idea. There’s a difference “you’re the best so everything belongs to you, you just need to go and take it” and “you’re the best so keep living in shit until you’re sent to die for defending that belief somewhere abroad”. Sure, a deep spirituality of russian people is usually mentioned in connection to that but no concrete examples are ever given.
You know, there exists such thing as russian nationalists whose ideas can be boiled down to “russians are being offended; and usually it’s Britain that offends them by acting as a puppeteer of russian government since long ago”. Even funnier that until very recently they were prosecuted by the government—I suppose not for the incompatibility of views but rather because they formed those views independently instead of following the official guidelines.
I propose a different explanation: because of the vague dynamic community russians lost incentive to work themselves (a lot like with socialistic system: why bother if everybody around belong to the same community and you can benefit from them working while not benefiting from working hard yourself? See kulak for an example of russian peasants who worked slightly better than the rest and what happened to them; russian national symbol should’ve been a crab bucket instead), in the same time they believed they can take anything because they all belong to the same community. And the refusal offends them. The same story with them believing that whatever they sell or give as a gift still belongs to them (so they can always take it back or tell what you can do or not with it). That may also be the reason behind russians ignoring all kinds of agreements—they’ve been trained only to recognize “might makes right” rule. Yet it does not prevent them from trying to take what belongs to somebody else again and again (like Ukraine). Why don’t they stop attempts? Because they essentially live off selling natural resources (back in the day it was wax, fat and furs, nowadays it’s oil, gas and metals) and they need somebody to actually mine those resources (usually foreigners) and when the old sources get depleted of course they want to capture a new source of income.
Now consider what happens when such creature feels that everything should belong to it and denied those things, feels that others are more developed in many aspects (not just, say, advanced electronics, but having a functioning society too), feels that others have no respect for them (the archetypical question of a drunk russian is “do you respect me?” hints on it)? You’ll get a gamut of emotions, from the desire to present themselves as much better than in reality to drag others down by attributing them all your own bad features. That is how we get claims that Europe will freeze without russian gas (even in summer—they really claimed that), the claims about famous russian culture (it was created by a small strata of elites, often not of russian origin; for the most of russian population their own culture remained alien and forced from above; russians love to present exceptional cases as the general rule), the claims about Western level of quality of life (in moscow—do not look at the rural area that lacks gas, sewer system and roads) and evil godless Westerners want to occupy and destroy them (they’ve looked in the mirror while creating this lie).
And that’s how we get ruscism: psychological complexes of something not deserving to be called a nation, which realizes and resents that. Throw in their sociopathic disregard for honouring agreements (nothing demonstrates it better than the Budapest Memorandum but they’ve been inventing pretexts or outright violated international treaties for centuries) and the lack of thinking (critical or otherwise—there are countless examples that the discussions with common russians fail because those accept ideas selectively and refuse to see connections between different facts) and you get the perfect mix for disaster.
The sad thing is that all russians are infected by it in one form or another. Some may demand nuclear holocaust for all countries that do not ally with them, others merely cheer at the news of russian war criminals killing civilians. Some want russia to conquer the whole world (or at least restore its borders to the times of USSR or russian empire), others simply want russia to end war and not get punished for all its war crimes. Some want to destroy USA, others believe that USA will collapse soon anyway (and they all secretly want to move there regardless). Some hate all other nations, others don’t (but still despise Jews, people from Asia and Caucasus).
I think now it’s more or less clear what the idea unites russians and creates ruscism: russians are those who cast away thinking for a feeling of inferiority. Now, what to do with all that? The realistic way is demonstrated by the Ukrainian Army: over two hundred thousand russians will no longer force their opinions onto others. In theory occupation and re-education might work—it worked for Japan which behaved rather similarly in 20th century—but considering the sheer area of russia and the lack of interest I doubt that even China will attempt it. Meanwhile the best you can do is not to listen to russians at all and check the information you get. Keep thinking, that’s what distinguishes a normal human from russian.
rv4enc: magic numbers
May 16th, 2023
While there’s nothing much to write about the encoder itself (it should be released and forgotten soon), it’s worth writing down how some magic numbers in the code (those not coming from the specification) were obtained. Spoiler: it’s mostly statistics and gut feeling.
rv4enc: probably done
May 13th, 2023
In one of the previous posts I said that this encoder will likely keep me occupied for a long time. Considering how bad that estimate was I must be a programmer.
Anyway, there were four main issues to be resolved: compatibility with the reference player, B-frame selection and performing motion estimation for interpolated macroblocks in them, and rate control.
I gave up on the compatibility. The reference player is unwieldy and I’d rather not run it at all let alone debug it. Nowadays the majority of players use my decoder anyway and the produced videos seem to play fine with it.
The question of motion vector search for interpolated macroblocks was discussed in the previous post. The solution is there but it slows down encoding by several times. As a side note, by omitting intra 4×4 mode in B-frames I’ve got a significant speed-up (ten to thirty percent depending on quantiser) so I decided to keep it this way by default.
The last two issues were resolved with the same trick: estimating frame complexity. This is done in a relatively simple way: calculate SATD (sum of absolute values of Hadamard-transformed block) of the differences between current and some previous frame with motion compensation applied. For speed reasons you can downsample those frames and use a simpler motion search (like with pixel-precision only). And then you can use calculated value to estimate some frame properties.
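For reference, here is a minimal sketch of SATD for one 4×4 block, assuming the motion-compensated prediction has already been produced. It uses an unnormalised Hadamard transform and hypothetical function names; the block size and normalisation in the actual encoder may differ.

```rust
/// 4-point Hadamard butterfly (unnormalised).
fn hadamard4(v: [i32; 4]) -> [i32; 4] {
    let (s0, d0) = (v[0] + v[1], v[0] - v[1]);
    let (s1, d1) = (v[2] + v[3], v[2] - v[3]);
    [s0 + s1, d0 + d1, s0 - s1, d0 - d1]
}

/// SATD of the difference between a 4x4 block and its prediction
/// (both stored in raster order).
fn satd4x4(cur: &[u8; 16], pred: &[u8; 16]) -> u32 {
    let mut tmp = [0i32; 16];
    // Transform the rows of the difference block.
    for row in 0..4 {
        let mut d = [0i32; 4];
        for x in 0..4 {
            d[x] = i32::from(cur[row * 4 + x]) - i32::from(pred[row * 4 + x]);
        }
        tmp[row * 4..row * 4 + 4].copy_from_slice(&hadamard4(d));
    }
    // Transform the columns and accumulate absolute values.
    let mut sum = 0u32;
    for col in 0..4 {
        let t = hadamard4([tmp[col], tmp[col + 4], tmp[col + 8], tmp[col + 12]]);
        sum += t.iter().map(|c| c.unsigned_abs()).sum::<u32>();
    }
    sum
}
```

Summing this value over all blocks of a (possibly downsampled) frame gives one number per frame pair that can serve as the complexity estimate.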
For example, if the difference between frames 0 and 1 is about the same as the difference between frames 1 and 2 then frame 1 should probably be coded as B-frame. I’ve implemented it as a simple dynamic frame selector that allows one B-frame between reference frames (it can be extended to allow several B-frames but I didn’t bother) and it improved coding compared to the fixed frame order.
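The selection rule can be sketched as a comparison of two complexity values. What counts as "about the same" is not specified above, so the percentage tolerance here is purely an assumption, as is the function name.

```rust
/// Decide whether the middle frame of three should become a B-frame.
/// `c01` is the complexity (e.g. SATD) between frames 0 and 1,
/// `c12` the same for frames 1 and 2; `tolerance_percent` is a made-up knob.
fn should_be_b_frame(c01: u32, c12: u32, tolerance_percent: u32) -> bool {
    let (lo, hi) = if c01 < c12 { (c01, c12) } else { (c12, c01) };
    // "About the same": the values differ by at most the given share of the larger one.
    u64::from(hi - lo) * 100 <= u64::from(hi) * u64::from(tolerance_percent)
}
```

For example, with a 10% tolerance complexities of 1000 and 1050 would make frame 1 a B-frame while 1000 and 2000 would not.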
Additionally there seems to be a correlation between frame complexity and output frame size (also depending on the quantiser of course). So I reworked the rate control system to rely on those factors to select the quantiser for I- and P-frames (adjusting them if the predicted and the actual sizes differ too much). B-frames simply use the P-frame quantiser plus a constant offset. The system seems to work rather well except that it tends to assign too high quantisers for some frames, resulting in a rather crisp I-frame followed by more and more blurry frames.
I suppose I’ll play with it for a week or two, hopefully improving it a bit, and then I shall commit it and move to something else.
P.S. the main goal of NihAV is to provide me with a playground for learning and testing new ideas. If it becomes useful beside that, that’s a bonus (for example, I’m mostly using nihav-sndplay to play audio nowadays). So RealVideo 4 encoder has served its purpose by allowing me to play more with various concepts related to B-frames and rate control (plus there were some other tricks). Even if its output makes RealPlayer hang, even if it’s slow, that does not matter much as I’m not going to use it myself and nobody else is going to use it either (VP6 encoder had some initial burst of interest from some people but none afterwards, and nobody cares about RV4 from the start).
Now the challenge is to find myself an interesting task, because most of the tasks I can think about involve improving some encoder or decoder or (shudder) writing a MOV/MP4 muxer. Oh well, I hope I’ll come up with something regardless.
rv4enc: B-frame experiments
May 6th, 2023
As I mentioned in the previous post, one of the problems is to find a good motion vector for B-frame interpolated macroblock. Since I had nothing better to do I’ve decided to try motion vector search in the same style as the usual motion estimation: start from the candidate motion vector pair and try adjusting both vectors using diamond pattern (since it’s the simplest one).
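A possible shape of such a search is sketched below. It is not the encoder's actual code: `refine_pair` is a hypothetical name, the cost function is passed in as a closure (in a real encoder it would interpolate both references and measure the distortion of the averaged prediction), and moving only one vector of the pair per step is my simplification.

```rust
type Mv = (i16, i16);

/// Greedy diamond refinement of a forward/backward motion vector pair.
/// Returns the refined pair and its cost.
fn refine_pair<F>(mut fwd: Mv, mut bwd: Mv, cost: F) -> (Mv, Mv, u32)
where
    F: Fn(Mv, Mv) -> u32,
{
    const DIAMOND: [(i16, i16); 4] = [(0, -1), (-1, 0), (1, 0), (0, 1)];
    let mut best = cost(fwd, bwd);
    loop {
        let mut step = None;
        for &(dx, dy) in DIAMOND.iter() {
            // Try moving either the forward or the backward vector by one step.
            let cands = [
                ((fwd.0 + dx, fwd.1 + dy), bwd),
                (fwd, (bwd.0 + dx, bwd.1 + dy)),
            ];
            for &(f, b) in cands.iter() {
                let c = cost(f, b);
                if c < best {
                    best = c;
                    step = Some((f, b));
                }
            }
        }
        match step {
            // Re-centre the diamond on the best candidate and continue.
            Some((f, b)) => {
                fwd = f;
                bwd = b;
            }
            // No candidate improved the cost: the search has converged.
            None => return (fwd, bwd, best),
        }
    }
}
```

Every iteration evaluates eight candidate pairs, and each evaluation means building a new interpolated prediction, which is where the encoding time goes.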
The results are not exciting: while it slightly improves PSNR and reduces file size (on lower quantisers), encoding time explodes. My 17-second 320×240 test clip encoded with quant=16 and two B-frames between I/P-frames takes 40 seconds without that option and 136 seconds with it (3.4 times longer). Meanwhile average PSNR improves only from 38.0446 to 38.0521 and the size decreases from 1511843 bytes to 1507224 (about 0.3%).
That’s the law of diminishing returns in action. Of course it can be made significantly faster by e.g. using pre-interpolated set of reference frames but why bother in this case? I’ve put this under an option (i.e. be satisfied with the initial guess or try to search for a better pair of motion vectors) but I doubt anybody will ever use it (the same applies to the whole encoder as well).
rv4enc: somewhat working
May 3rd, 2023
I’ve finally managed to implement a more or less working RealVideo 4 encoder with all the main features (yeah, I’m also surprised that I’ve got to this stage this fast). As usual, it’s the small details that will take a lot of time to make decent, let alone good.
So, what can my encoder actually do now? It can encode video with I/P/B-frames using provided order, it can encode all possible macroblock types and has some kind of rate control.
What does it not have then? First of all, I don’t know yet how it would fare with the original RealPlayer (I also need to modify RMVB muxer to output improper B-frame timestamps and maybe write the additional streaming-related information). Then there’s a question of having a proper rate control. And finally there are a lot of questions related to B-frames.
Currently my rate control is implemented as a system that keeps statistics on how large an encoded frame is on average for a given frame type and quantiser and tries to find the best fitting quantiser. If there are still no statistics (or not enough of them) I resort to a simpler quantiser guessing, adjusting the quantiser depending on how different the projected and actual frame sizes are. Of course it can be tuned to behave better (the question is how though). And I’m not going to touch the two-pass encoding (theoretically it’s rather simple: you log various encoder information in the first pass and use it to select quantisers better in the second pass; in practice it means messing with text data and doing additional guesstimates, so pass).
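The idea can be sketched roughly as follows. The structure, the names, the 25% dead zone in the fallback and the quantiser range of 0..31 are all assumptions made for illustration, not the actual implementation.

```rust
const NUM_QUANTS: usize = 32; // assumed quantiser range 0..=31

#[derive(Clone, Copy)]
enum FrameType {
    I = 0,
    P = 1,
    B = 2,
}

/// Running statistics of encoded frame sizes per frame type and quantiser.
struct SizeStats {
    sum: [[u64; NUM_QUANTS]; 3],
    count: [[u32; NUM_QUANTS]; 3],
}

impl SizeStats {
    fn new() -> Self {
        Self { sum: [[0; NUM_QUANTS]; 3], count: [[0; NUM_QUANTS]; 3] }
    }
    /// Record the size of a frame that has just been encoded.
    fn update(&mut self, ftype: FrameType, quant: usize, size: u32) {
        self.sum[ftype as usize][quant] += u64::from(size);
        self.count[ftype as usize][quant] += 1;
    }
    /// Average frame size seen so far, if there is any data.
    fn average(&self, ftype: FrameType, quant: usize) -> Option<u64> {
        let n = self.count[ftype as usize][quant];
        if n == 0 {
            None
        } else {
            Some(self.sum[ftype as usize][quant] / u64::from(n))
        }
    }
    /// Quantiser whose average frame size is closest to the target size,
    /// or `None` when there are no statistics for this frame type yet.
    fn best_quant(&self, ftype: FrameType, target: u32) -> Option<usize> {
        (0..NUM_QUANTS)
            .filter_map(|q| {
                self.average(ftype, q)
                    .map(|avg| (q, avg.abs_diff(u64::from(target))))
            })
            .min_by_key(|&(_, diff)| diff)
            .map(|(q, _)| q)
    }
}

/// Fallback guess: nudge the quantiser when the actual size is more than
/// 25% away from the projected one (a higher quantiser gives smaller frames).
fn adjust_quant(quant: usize, projected: u32, actual: u32) -> usize {
    let (proj, act) = (u64::from(projected), u64::from(actual));
    if act * 4 > proj * 5 {
        (quant + 1).min(NUM_QUANTS - 1)
    } else if act * 4 < proj * 3 {
        quant.saturating_sub(1)
    } else {
        quant
    }
}
```

A caller would try `best_quant()` first and fall back to `adjust_quant()` on the previous quantiser when it returns `None`.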
With B-frames there are two main issues to deal with: which frames to select and how to perform motion estimation. I read the first can be achieved by performing motion compensation against neighbouring frames and calculating SATD (often done on scaled-down frames to be faster). The second question is how to search for bidirectional block vectors. Currently I have a very simple approach: I search for forward and backward motion vectors independently and check which combination of them works the best. I suspect there may be an approach specifically for weighted bi-directional search but I could not find anything (and I’m not desperate enough to dive into the codebase of MPEG-4 ASP/AVC encoders).
And finally there’s the whole question of quality. I suspect that my encoder is far from being good because it should not merely transform-quantise-code blocks but also perform some masking (i.e. set some higher-frequency coefficients to zero instead of hoping that they’ll be quantised to zero).
So this will be long and boring work…
A quick look at a watery game video format
April 24th, 2023
As you can guess, my work on Real Video 4 encoder goes so well that I’d rather look at virtually anything else.
Probably some of you are aware of the Tex Murphy game series (maybe not even just Mike) and I got reminded of the existence of The Pandora Directive game that supposedly has a lot of videos in it, so I downloaded a demo and decided to explore it a bit.
Game data is packed into extremely simple archive format (just the number of offsets and the offsets) so you need to know what archives to look into (hint: the larger ones).
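Reading such an archive header could look like the sketch below. The field widths and byte order are not stated above, so treating both the count and the offsets as 32-bit little-endian values is an assumption, and `read_offsets` is a hypothetical name.

```rust
/// Parse an archive header consisting of an entry count followed by
/// that many offsets (assumed: 32-bit little-endian values).
/// Returns `None` if the data is too short.
fn read_offsets(data: &[u8]) -> Option<Vec<u32>> {
    let hdr = data.get(0..4)?;
    let count = u32::from_le_bytes([hdr[0], hdr[1], hdr[2], hdr[3]]) as usize;
    let table_len = count.checked_mul(4)?;
    let table = data.get(4..table_len.checked_add(4)?)?;
    Some(
        table
            .chunks_exact(4)
            .map(|c| u32::from_le_bytes([c[0], c[1], c[2], c[3]]))
            .collect(),
    )
}
```

Since there are no names or sizes in such a table, the size of an entry can only be derived from the difference between neighbouring offsets (or the file size for the last one).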
So what about videos? They start with H2O magic (hence me calling it a watery format) and contain video frames and audio data (in an embedded WAV file). The video compression is both nothing special and somewhat original: it’s Huffman-compressed RLE. Video frame can contain several chunks, first there’s a Huffman tree description, then palette data and finally there’s compressed RLE data. The originality lies in the details: Huffman coding is often used to code single bytes while here a single symbol represents both run length and value (also IIUC zero value signals skips), additionally the tree is stored as the list of symbols for each codeword length. I don’t think I’ve seen anywhere either RLE+Huffman used in such a manner or such a tree description format.
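To illustrate how a tree described as "symbols for each codeword length" can be turned into codes, here is a sketch that assigns them in the canonical way (shorter codes first, consecutive values within one length). Whether H2O assigns codes exactly like this is an assumption, as are the function name and the symbol type; how a symbol packs its run length and value is left opaque.

```rust
/// Build codes from lists of symbols grouped by codeword length.
/// `symbols_by_len[0]` holds the symbols with 1-bit codes, `[1]` the 2-bit ones etc.
/// Returns (symbol, code length, code) triples.
fn build_codes(symbols_by_len: &[Vec<u16>]) -> Vec<(u16, u8, u32)> {
    let mut out = Vec::new();
    let mut code = 0u32;
    for (i, syms) in symbols_by_len.iter().enumerate() {
        let len = (i + 1) as u8;
        for &sym in syms {
            out.push((sym, len, code));
            code += 1;
        }
        // Moving to the next length appends one bit to the running code.
        code <<= 1;
    }
    out
}
```

For instance, one symbol of length 1, one of length 2 and two of length 3 would get the codes 0, 10, 110 and 111.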
This was a fun distraction even if I don’t care about the game series (I tried playing their other games like Countdown and Amazon: Guardians of Eden but the interface was too clunky and the action sequences were too annoying). It’s always nice to see some originality in video coding even if it is nothing advanced.
rv4enc: limping forward
April 18th, 2023
So far there’s not much to write about: I’ve dealt with the second most annoying thing (writing coefficients) and now my encoder can produce a valid stream of I- and P-frames. Probably it’s a good place to define a road map though.
First of all, there’s still the most annoying thing pending—in-loop filtering. And I can’t reuse that much code from the decoder since it was written back in the day when my knowledge of Rust was even less than now, so I have to partly rewrite some of the parts to make them fit my current approaches to the interfaces (that means a lot of DSP functions among other things). At least it can be disabled for now but I’ll have to return to it sooner or later.
Then there’s 4×4 intra prediction mode still waiting to be implemented. Again, I know how to pick the good enough prediction modes, it’s context-dependent coding of those modes that is going to be annoying.
Another thing missing is estimating the number of bits required to encode a single block. There are about five codebooks involved and those are selected depending on macroblock type, quantiser, coefficient block type (luma, chroma or luma DCs) and the global set index. I guess I’ll resort to gathering statistics for all possible coding modes and seeing if I can make some heuristic estimation out of it. And there’s still a task of selecting the best coding set for a slice…
After all that it will be mostly B-frame related things left to implement. I also need to make sure the muxer will write them out properly (for historical reasons in that case it mangles the timestamp to be last frame timestamp plus one). That’s not counting all possible enhancements like deciding the frame type for coding. Most likely I’ll be annoyed by it and keep the fixed coding order instead.
There are too many things to do and considering my pace it may keep me busy for a good part of the year (I mean a large part of a calendar year; the current date is still February 24).
Starting yet another failure of an encoder
April 6th, 2023
As anybody could’ve guessed from the Cook encoder, I would not want to stop at that and would rather write some video encoder to accompany it. So here I declare that I’m starting work on a RealVideo 4 encoder (again, making it public should prevent me from chickening out).
I can salvage some parts from my VP7 encoder but there are several things that make it different enough (besides the bitstream coding): slices and B-frames. Historically RealMedia packets are limited to 64kB and they should not contain partial slices (grouping several slices or even frames in one packet is fine though), so the frame should be split during coding. And while Duck codecs re-invent B-frames to make them still be coded in sequence, RealVideo 4 has honest B-frames that should be reordered before encoding.
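The packet constraint can be sketched as a simple greedy grouping of whole slices. The function name is hypothetical, packet header overhead is ignored, and the exact size limit is left as a parameter since only "64kB" is given above.

```rust
/// Group consecutive slices into packets so that no packet exceeds
/// `max_packet` bytes and no slice is split between packets.
/// Returns slice indices per packet, or `None` if some slice alone is too large
/// (which means the frame has to be cut into smaller slices during coding).
fn pack_slices(slice_sizes: &[usize], max_packet: usize) -> Option<Vec<Vec<usize>>> {
    let mut packets: Vec<Vec<usize>> = Vec::new();
    let mut cur: Vec<usize> = Vec::new();
    let mut cur_size = 0usize;
    for (idx, &size) in slice_sizes.iter().enumerate() {
        if size > max_packet {
            return None;
        }
        // Start a new packet when the next slice would not fit.
        if !cur.is_empty() && cur_size + size > max_packet {
            packets.push(std::mem::take(&mut cur));
            cur_size = 0;
        }
        cur.push(idx);
        cur_size += size;
    }
    if !cur.is_empty() {
        packets.push(cur);
    }
    Some(packets)
}
```

In practice the encoder has to watch the slice size while coding macroblocks and start a new slice before the limit is reached, which is the part that makes slices a real difference from the VP7 encoder.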
So while the core is pretty straightforward (try different coding modes for each macroblock, pick the best one, write bitstream), it gives me enough opportunity to try different aspects of H.264 encoding that I had no reason to care about previously. Maybe I’ll try to see if automatic frame type selection makes sense, maybe I’ll experiment with more advanced motion search algorithms, maybe I’ll try better heuristics for e.g. quantiser selection.
There should be a lot to keep me occupied (but I expect to spend even more time on evading that task for the lack of inspiration or a sheer amount of work to do demotivating me).