Here’s another thing that has been bothering me for some time. As we all know, RDBMSes more complex than SQLite have data management that goes well beyond simply writing all data into one file sequentially, plus you have database users belonging to different roles with different permissions on different databases. You’re supposed to connect to the server via the usual networking mechanisms even if you’re running it on the same machine. And advanced features include multi-machine interaction like sharding and replication.
So the question is: if this all looks like a standalone operating system, then why do enterprise-grade RDBMSes run as applications instead of as their own OSes, built on some kernel that manages CPU cores, storage and the network interface? And in the age of containers we don’t really need an app version even for legacy setups: just replace the driver (or merely its settings).
But maybe, like the last time, when I wondered why GPUs don’t have raytracing, I’ll learn that it’s already a trend and I’m just oblivious to it.
I used to be a DBA, and my best answer is: you want to be able to manage and monitor the database software, and in some cases extend the code (plugins/exits), and all of that is dramatically easier if the database runs on a standard platform.
For example, centralized installation/provisioning/license management is a known quantity on Windows or Linux, but not on PostgreSQL-OS. Your auditors, security people, and accountants will all freak out if they have to handle a new “platform”.
Yes, learning a new custom platform would be a huge detriment. As for extending it, you could customise such an OS by adding drivers or service programs, not to mention shell scripts written in PL/SQL. Let’s wait and see whether any major vendor will be brave enough to try such an approach…
I guess you could flip the question around and ask why RDBMSes should be their own OS. Would it give enough of a benefit to be worth the effort?
I believe some systems are somewhat like that, as far as the user is concerned, in the sense that you just rent a box with the DBMS software on it, or you’re given an OS image instead of a software installer. However, it’s very likely that they’re just regular software packaged with a Linux install, albeit with some vendor-specific customizations.
There are DBMSes that will manage raw disks via direct I/O, bypassing filesystems and OS-level caching, so some degree of OS bypass already exists.
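For the curious, here’s roughly what that looks like on Linux. A minimal sketch, assuming a hypothetical raw partition dedicated to the database; the device path and block size are made up:

```c
/* Minimal sketch of direct I/O, the kind of page-cache bypass a DB
 * engine might use. /dev/sdb1 and the 4 KiB block size are
 * hypothetical; O_DIRECT is Linux-specific and needs _GNU_SOURCE. */
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define BLOCK_SIZE 4096 /* O_DIRECT requires aligned offsets and buffers */

int main(void) {
    int fd = open("/dev/sdb1", O_RDWR | O_DIRECT);
    if (fd < 0) { perror("open"); return 1; }

    /* The buffer itself must be block-aligned. */
    void *buf;
    if (posix_memalign(&buf, BLOCK_SIZE, BLOCK_SIZE) != 0) { close(fd); return 1; }

    /* Read one page straight from the device: the kernel page cache is
     * not involved, so the engine's own buffer pool is the only cache. */
    if (pread(fd, buf, BLOCK_SIZE, 0) < 0) perror("pread");

    free(buf);
    close(fd);
    return 0;
}
```

The catch is that once the page cache is out of the picture, the engine has to do all of its own caching and write scheduling, which is exactly the kind of OS-like machinery the original question points at.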
Running the DB as its own OS could yield small gains through fewer context switches and by removing one level of memory paging/translation, though I’m not sure there’s much benefit in general.
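On the paging/translation point, a user-mode engine can already claw back most of the TLB overhead with huge pages, no custom kernel required. A minimal sketch, assuming Linux with huge pages reserved (a non-zero vm.nr_hugepages); the pool size is arbitrary:

```c
/* Sketch: a buffer pool backed by 2 MiB huge pages to reduce TLB
 * pressure. Assumes Linux with huge pages reserved; the 128 MiB
 * pool size is arbitrary. */
#define _GNU_SOURCE
#include <stdio.h>
#include <sys/mman.h>

#define POOL_SIZE (128UL * 1024 * 1024) /* must be a multiple of the huge page size */

int main(void) {
    void *pool = mmap(NULL, POOL_SIZE, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
    if (pool == MAP_FAILED) { perror("mmap"); return 1; }

    /* A real engine would carve this region into fixed-size DB pages
     * and manage them itself, much like an OS manages physical frames. */
    munmap(pool, POOL_SIZE);
    return 0;
}
```

PostgreSQL’s huge_pages setting and similar knobs in other engines do essentially this.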
Running it as a user-mode application does provide more flexibility, a big one being deployment on developer machines without needing VMs. You can also more easily use third-party tools (such as system monitoring) rather than rely on the database vendor to provide similar functionality.
A concept that caused a bit of a stir years ago, in the land of servers, is the “unikernel”, where the idea is that every application runs as its own OS. With that, running a DB as an OS seems much more feasible, if someone wanted to try it out.
I’ve been skeptical of the idea of unikernels myself, but more importantly, it hasn’t really taken off, so I’m not sure many people out there are that interested.
I understand the reasons why they are applications and what benefits this brings. My question was more the following: since a modern RDBMS is a lot like an OS in its design and essentially needs little from the actual OS besides I/O and multi-threading, has any vendor realized this and tried to get rid of the intermediate layer between the RDBMS and the hardware entirely? The unikernel approach by itself is silly, since there are not that many application types that would benefit from it. IIRC, game engines like the original Doom or Quake III effectively ran as their own OSes, but when your application requires external daemon processes and installed environments, you’d better choose some other approach.
Again, I understand it may be more or less convenient for the users (on the one hand you have to configure a new environment, on the other it’s a standalone environment that doesn’t depend on a third-party OS), but it gives better control to the RDBMS vendor and may make more sense for enterprise-level setups (i.e. when you have a whole cluster dedicated to the database, and not just a single MySQL database running alongside your site written in PHP). And you can guess what wins when vendor money conflicts with the interests of users, who’ll still have to pay even if they’re displeased.
I’m not advocating for this approach, I just wonder why it doesn’t fly. So far there seem to be two main reasons: tradition (people are used to the old ways) and reduced extensibility (various plug-ins and monitoring systems). Besides the rise of NoSQL I see no radical changes in the DB world over the last decades, so quite probably nothing will change in the upcoming decades either.
Software generally marches towards more abstraction layers, not fewer. If you really think about it, many things could do with fewer layers, but whether out of convenience, practicality, stupidity or laziness, that’s generally not the direction people are heading in.
I/O and CPU scheduling are pretty big reasons to rely on an OS, but it also provides other things, such as hardware interaction (drivers, interrupts, etc.), memory management (disk paging, ECC handling), networking, process management (because it may be easier to run things as separate processes than to shoehorn them into one program) and isolation (which may help with bugs/security). DB developers would likely rather focus on their application than deal with the nitty-gritty details of a kernel. Even if, from a high level, it doesn’t look like much work, details matter, and I’m sure they’d much rather let Linux deal with most of the Spectre/Meltdown mitigations than have to handle all of that themselves.
In terms of RDBMS vendor control, vendors already get it by offering only hosted solutions or by being quite restrictive around on-prem hosting (though I’d say the primary reason for this is likely licensing fees rather than technical utility). You don’t need the database itself to be an OS to do this.
Enterprises will have dedicated clusters for databases, but they’ll also need developer machines and testing environments. With cloud hosting becoming the norm, I suppose this is less of an issue, but being able to run the DB in a small-scale environment is still a plus. Also, don’t forget the educational and hiring benefits of a system that can be run by hobbyists.
The only benefit I can see in dropping the OS is some minor efficiency gains. DBs are probably mostly I/O constrained, followed by CPU, and an OS doesn’t really get in the way of either (particularly if the DB is doing its own I/O).
Practically speaking, if you want efficiency gains, it probably makes more sense to spend the effort tuning your DB engine. There’s still plenty to gain from query optimisation, scheduling, caching and so on, so it’s not worth the effort to scrape out gains by running in kernel mode.
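As a crude sanity check on how small the kernel-mode saving is, you can time a near-no-op syscall; a sketch, with machine-dependent results (the iteration count is arbitrary):

```c
/* Crude measurement of raw syscall overhead, the main per-call cost a
 * kernel-mode DB would avoid. Results are machine-dependent; the point
 * is that it's tens to hundreds of nanoseconds, dwarfed by disk latency. */
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <time.h>
#include <unistd.h>

int main(void) {
    const long iters = 1000000; /* arbitrary */
    struct timespec t0, t1;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (long i = 0; i < iters; i++)
        getppid(); /* a near-no-op syscall, not cached by glibc */
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double ns = (t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec);
    printf("~%.0f ns per syscall\n", ns / iters);
    return 0;
}
```

Compare that with the tens of microseconds even a fast NVMe read takes, and the “fewer context switches” saving looks pretty marginal.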
Makes sense, thanks for the explanation.