Archive for the ‘Useless Rants’ Category
NihAV: towards an audio player
Sunday, October 4th, 2020
So after weeks of doing nothing and looking at lossless audio codecs (in no particular order) I’m going back to developing NihAV, and more particularly an audio player.
(more…)
Lossless audio codecs were more advanced than I thought
Wednesday, September 23rd, 2020
As I’d mentioned in a previous post on lossless audio codecs, I wanted to look at some of the ones that are still not reverse engineered, for documentation’s sake. And I did exactly that, so now the entries on LA, OptimFROG and RK Audio are no longer stubs but contain some information on how those codecs work.
And if you look at the LA structure you see a lot of filters of various sizes and structures, plus an adaptive weight used to select certain parameters. If you look at other lossless audio codecs with high compression and slow decoding, like OptimFROG or Monkey's Audio, you’ll see the same picture: several filters of different kinds and sizes layered over each other, plus adaptive weights also used in residual coding. Of course that reminded me of AV2 and more specifically of neural networks. And what do you know, Monkey's Audio actually calls its longer filters neural networks (hence the name NNFilter.h in the official SDK, and you can spot it in the version history as well, leaving no doubt that the name really does refer to neural networks).
Which leads me to the only possible conclusion: lossless audio codecs had been using neural networks for compression before they became mainstream, and that is what gives them the best compression ratios in their class.
And if we apply all this knowledge to video coding, then maybe in AV4 we’ll finally see some kind of convolution filters processing whole tiles and then the smaller blocks to remove spatial redundancy, maybe with some compaction layers like many neural network designs have (or transforms for the largest possible block size, as in H.265/AV1/AVS2) and expansion layers (well, what do you think motion interpolation actually does?), and RNNs to code the residues left from all the prediction.
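To make the above a bit more concrete, here is a minimal sketch of the kind of filter stage such codecs stack: a sign-sign LMS adaptive predictor with integer weights. This is my own illustration, not code from LA, OptimFROG or Monkey's Audio; the real filters differ in orders, step sizes and fixed-point details.

```rust
// A toy adaptive prediction filter: predict the next sample from recent
// history, output the residual and nudge the weights by the signs of the
// error and the inputs (sign-sign LMS). The 12-bit fixed-point shift is
// an arbitrary choice for this sketch.
struct AdaptiveFilter {
    weights: Vec<i32>,
    history: Vec<i32>,
    mu:      i32, // adaptation step
}

impl AdaptiveFilter {
    fn new(order: usize, mu: i32) -> Self {
        Self { weights: vec![0; order], history: vec![0; order], mu }
    }

    fn filter(&mut self, sample: i32) -> i32 {
        let dot: i64 = self.weights.iter()
            .zip(self.history.iter())
            .map(|(&w, &h)| i64::from(w) * i64::from(h))
            .sum();
        let pred = (dot >> 12) as i32;
        let err  = sample - pred;
        let step = self.mu * err.signum();
        for (w, &h) in self.weights.iter_mut().zip(self.history.iter()) {
            *w += step * h.signum();
        }
        self.history.rotate_right(1);
        self.history[0] = sample;
        err
    }
}

// Several such filters of different orders run back to back, each stage
// whitening the residual left by the previous one a bit more.
fn cascade(filters: &mut [AdaptiveFilter], sample: i32) -> i32 {
    filters.iter_mut().fold(sample, |res, f| f.filter(res))
}
```

A decoder runs the same adaptation in lockstep and adds the predictions back, which is why the weights themselves never need to be stored in the file.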
Why Rust is not a mature programming language
Friday, September 18th, 2020
While I have nothing against Rust as such and keep writing my pet project in Rust, there are still some deficiencies that I find prevent Rust from being a proper programming language. Here I’d like to present them and explain why I deem them as such, even if not all of them have any impact on me.
(more…)
A Modest Proposal for AV2
Wednesday, September 16th, 2020
Occasionally I look at the experiments in the AV1 repository that should be the base for AV2 (unless Baidu rolls out VP11 from its private repository to replace it entirely). A year ago they added an intra mode predictor based on a neural network, and in August they added a neural-network-based loop filter experiment as well. So, to make AV2 both simpler to implement in hardware and improve its compression efficiency, I propose to switch all possible coding tools to misapplied statistics. This way it can also attract more people from the corresponding field to compensate for the lack of video compression experts. Considering the number of pixels (let alone the ways to encode them) in a modern video, it is BigData™ indeed.
Anyway, here is what I propose specifically:
- expand intra mode prediction neural networks to predict block subdivision mode and coding mode for each part (including transform selection);
- replace plane intra prediction with a trained neural network that reconstructs a block from its neighbours (a toy sketch of this follows after the list);
- switch motion vector prediction to use neural network for prediction from neighbouring blocks in current and reference frames (the schemes in modern video codecs become too convoluted anyway);
- come to think about it, neural network can simply output some weights for mixing several references in one block;
- maybe even make a leap and ditch all the transforms for reconstructing block from coefficients directly by the model as well.
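Purely as an illustration of the block-from-neighbours idea above, here is a toy of what such a predictor boils down to: a single dense layer mapping the neighbouring pixels to a predicted block. Everything here (the struct, the shapes, the absence of any nonlinearity or convolution) is made up for the sketch; the actual AV1 experiments use trained models of their own.

```rust
// Hypothetical dense "intra predictor": the weights would come from offline
// training, the decoder just runs the matrix-vector product.
struct DensePredictor {
    weights: Vec<f32>, // block*block rows, each of length `context`
    bias:    Vec<f32>, // one bias per predicted pixel
    block:   usize,    // predicted block is block x block pixels
    context: usize,    // number of neighbouring pixels fed in
}

impl DensePredictor {
    fn predict(&self, neighbours: &[f32]) -> Vec<f32> {
        assert_eq!(neighbours.len(), self.context);
        (0..self.block * self.block)
            .map(|i| {
                let row = &self.weights[i * self.context..(i + 1) * self.context];
                let dot: f32 = row.iter().zip(neighbours).map(|(w, x)| w * x).sum();
                (dot + self.bias[i]).clamp(0.0, 255.0)
            })
            .collect()
    }
}
```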
As a result we’ll have a rather simple codec with most blocks being neural networks doing specific tasks, an arithmetic coder to provide input values, some logic to connect those blocks together, and some leftover DSP routines, though I’m not sure we’ll need them at this stage. This will also greatly simplify the encoder, as it will be more about producing fitting model weights than about trying some limited set of encoding combinations. And it may be the first truly next-generation video codec after H.261, paving the road to radically different video codecs.
From a hardware implementation point of view this is a win too: you just need some ROM and RAM for the models plus a generic tensor accelerator (which are becoming common these days), and no need to design those custom DSP blocks.
P.S. Of course it may initially be slow and work in the range of thousands of FPS (frames per season), but I’m not going to use AV1, let alone AV2, so why should I care?
A Quality Video Hosting
Friday, July 31st, 2020
A brief context: I watch videos from BaidUTube (name slightly altered just because) and my preferred way to do that is to grab the video files with youtube-dl in 720p quality so I can watch them later at my leisure, in the way I like (i.e. without a browser), and re-watch them later even if they get taken down. It works fine, but in recent weeks I’ve noticed that some of the downloaded videos are unplayable. Of course this can be fixed by downloading them again in a slightly different form (separate video and audio streams muxed locally, youtube-dl can do that), but today I was annoyed enough to look at the problem.
In case it’s not obvious, I’m talking about MP4 files encoded and muxed at BaidUTube without any modifications by youtube-dl, which merely downloaded them. So, what’s the problem?
Essentially an MP4 file contains a header with metadata telling at which offset and of what size the frames for each codec are, while the actual data is stored in the mdat atom. Not here. First you have lots of 12-byte sequences like 90 00 00 00 00 0X XX XX 00 02 XX XX, then a moof atom (used in fragmented MP4) and then another mdat. And another. I’ve tried to avoid streaming stuff, but even to me it looks like somebody put all the fragments prepared for HLS streaming into a single MP4 file, making an unplayable mess.
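Out of curiosity, here is a rough sketch of how one could walk the top-level atoms of such a file to tell a plain moov+mdat MP4 from a fragmented one stuffed with moof/mdat pairs. This is my own std::io-based illustration, not code from youtube-dl or NihAV, and it does no validation of malformed atom sizes.

```rust
use std::io::{self, Read, Seek, SeekFrom};

// Walk the top-level MP4 atoms, returning (name, size) pairs.
fn list_top_atoms<R: Read + Seek>(mut src: R) -> io::Result<Vec<(String, u64)>> {
    let mut atoms = Vec::new();
    loop {
        let mut hdr = [0u8; 8];
        match src.read_exact(&mut hdr) {
            Ok(()) => {}
            Err(e) if e.kind() == io::ErrorKind::UnexpectedEof => break,
            Err(e) => return Err(e),
        }
        let mut size = u64::from(u32::from_be_bytes([hdr[0], hdr[1], hdr[2], hdr[3]]));
        let name = String::from_utf8_lossy(&hdr[4..8]).into_owned();
        if size == 0 { // atom extends to the end of the file
            atoms.push((name, size));
            break;
        }
        if size == 1 { // 64-bit size follows the atom name
            let mut big = [0u8; 8];
            src.read_exact(&mut big)?;
            size = u64::from_be_bytes(big);
            src.seek(SeekFrom::Current(size as i64 - 16))?;
        } else {
            src.seek(SeekFrom::Current(size as i64 - 8))?;
        }
        atoms.push((name, size));
    }
    Ok(atoms)
}

// A top-level moof, or more than one mdat, means the "MP4" is really a pile
// of streaming fragments glued together.
fn looks_fragmented(atoms: &[(String, u64)]) -> bool {
    atoms.iter().any(|(name, _)| name == "moof")
        || atoms.iter().filter(|(name, _)| name == "mdat").count() > 1
}
```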
Overall this happens only on a few random videos, and probably most browsers would not pick that variant anyway (since VP9 or VP10 in WebMKV is the suggested format), so I don’t expect it to be fixed. My theory is that they decided to roll out a new version of the encoding software with a broken muxer library or muxing mode. And if you ask “What were they thinking? You should run at least some tests to see if it encodes properly.”, one wise guy has an answer for you: they weren’t thinking about that, they were thinking about how long until the lunch break and when it’s time to go home. This is the state of enterprise software and I have no reason to believe the situation will ever improve.
And there’s a fact that may be related to it. Random files starting from roughly 2019 also show the marker “x264 – core 155 r2901 7d0ff22” in the encoded frames, while most of the files have no markers at all. While I don’t think they violate the license, it still looks strange that a company known for not admitting that it uses open-source projects (“for their own protection”, as it was explained once) lets such a marker slip through.
Well, that was an even more useless rant than usual.
#chemicalexperiments — Bread
Saturday, May 9th, 2020
It seems that as a programmer, and especially during these days, you have an obligation to bake bread (the same way that if you belonged to the MPlayer community you had to watch anime). So here’s me doing it:
It’s made after a traditional recipe from Norrland: barley flour, wheat flour, milk, yeast, cinnamon, a bit of salt and molasses. IMO it goes fine with some gravad lax or proper cheese.
P.S. And if you think I should have made a sour-dough bread—I can always order some from Sweden instead.
Reviewing AV1 Features
Saturday, March 21st, 2020
Since we have this wonderful situation in Europe and I need to stay at home, why not do something useless and comment on the features of AV1, especially since there’s a nice paper from (some of?) the original authors here. In this post I’ll try to review it and give my comments on various details presented there.
First of all I’d like to note that the paper has 21 authors for a review that could be done by a single person. I guess this was done to give academic credit to the people involved and I have no problem with that (also I should note that even though two of its fourteen pages are short authors’ biographies, they were probably the most interesting part of the paper to me).
(more…)
Om marsipangrisorna
Sunday, February 9th, 2020
Since I have nothing better to do (obviously) I want to talk about the marzipan pig situation in Sweden.
(more…)
Railways of Baden and Württemberg
Sunday, November 17th, 2019
First of all, unlike with the Rhineland-Palatinate railways, I have not travelled on all the local railways here: because of certain geographic peculiarities it takes less time to get to the farthest corners of Rheinland-Pfalz from here than to places around the Bodensee. And I can’t visit all the historic railways either, because one of them is located in the middle of nowhere with the only connection being a bus that comes there maybe twice a day.
Nevertheless, I’ve seen most of them and here I’d like to describe my impressions about them.
(more…)
Dingo Pictures: A Book
Saturday, August 24th, 2019
Whenever I think I’ve learned enough about Dingo Pictures, something new appears. From the Toys review I’ve learned that Roswitha Haas also published a book (or maybe several) directly related to the subject at hoof. And of course, being curious, I decided to buy it.
Amazon mentions two other books, an adaptation of Land Before Time and Jimmy Button, but I dared not buy them (one definitely has nothing to do with Dingo Pictures, and I’m afraid to find out that the first one is not about Tio).
Anyway, let’s look at the alternative version of King of the Animals Part 2 (spoiler: it’s still much better than the Disney remake).