DBOS: A database-oriented operating system (dbos-project.github.io)
166 points by KraftyOne on Sept 3, 2022 | 110 comments



I understand conceptually how you could architect a desktop/server OS based on a local database instead of a local filesystem, and it's deeply intriguing to me in terms of how everything could share a common data language that is far more flexible than files but far more structured than text. Presumably things like terminal output would be formatted as query results instead of text. The database wouldn't reside in files on disk, it would reside directly in blocks on disk and in-memory as required.

But this seems to be proposing a distributed database that runs across computers, which confuses me. Anything that runs across computers seems to me to be application-level, not OS-level, by definition. What does it even mean to be running a distributed database on "microkernel services" instead of on fully-fledged OS's? And then... where is the OS-level CPU coming from? If the database is distributed among computers then fine, but which computer is "running" the OS?

Both the nomenclature and distributed aspect are really throwing me off here.


So I would strongly disagree with your notion that an OS cannot be a distributed system. Several OS's, such as Plan 9, were explicitly designed as distributed systems from the beginning. For Plan 9, running it on a solo computer is more the edge case than the common case they designed for, which was a small team with workstations sharing a central server.

All of the computers are "running" the OS. The OS is more than one service on more than one machine.


Ah thank you. I had no idea. Turns out distributed operating systems were/are a thing:

https://en.wikipedia.org/wiki/Distributed_operating_system

"Research and experimentation efforts began in earnest in the 1970s and continued through the 1990s, with focused interest peaking in the late 1980s. A number of distributed operating systems were introduced during this period; however, very few of these implementations achieved even modest commercial success."

So looks like they're trying to keep the idea going?


I think there is a very real need for a distributed operating system. Operating systems ultimately just virtualize hardware, and it's clear to anyone that has used VMware or Xen that we're building such a system piecemeal and constantly reinventing the operating system wheel. I think this is absolutely where OS design is inevitably destined to evolve.


There's a need, but don't forget that all the previous distributed OSes (Plan 9, MOSIX, ...) didn't succeed.

So 'inevitably destined to evolve'? No..


It wasn't the right time yet. Virtualization has only just come to the data center and taken off. People no longer have their single expensive PC, but utilize a multitude of devices. Cloud storage is just making its way onto people's phones and devices. The full suite of VMware products is essentially already a commercially lucrative operating system for the data center / cloud. I think this is a long-term trend rather than a fad. It's not the hottest new framework, but rather the culmination of incremental changes over decades. We might not call it an operating system - we'll probably give it a fancier name to generate hype. But the idea is the same nonetheless.


Your comment is about 10 years out of date. Possibly more.

VMware usage is actually on the decline now with many companies switching to cloud services (which are generally running Xen). And containerisation, while not a like-for-like replacement for virtualisation, has also eaten away at some of VMware's market share. Particularly with services like k8s. Those that are dependent on running Windows might still use VMware but Microsoft Hyper-V has also eaten away at VMware's market share too.


* Conditions apply. Use the comment in the context of the HN bubble, and ignore that a huge part of the industry is still deploying WordPress on a VPS, using Excel spreadsheets as a database and Gmail for bug tracking.


The GP was talking about data centres. I’ve worked in data centres and thus replied with the same context as the GP was discussing.

Also people deploying Wordpress on a VPS and using Excel are categorically NOT doing any form of distributed computing, which is what this discussion is about. So it’s correct to discount those contexts.

What I’m noticing here is a lot of people are confused about the terms “distributed” and “data centre”.


I mean, alright. Xen and Hyper-V then. VMware is just what I'm most familiar with in the data center.


But even in cloud computing, the really compelling offerings (i.e. serverless) are all containers. Xen is only used as an additional security boundary (i.e. Docker running on top of Xen). Containers are the software abstraction providing the “distributed” aspect of the GP's argument.

And as for Hyper-V, that’s only there for people locked into the Windows ecosystem. You rarely see much distributed computing happening in Windows. Frankly it’s silly to even mention Windows in the same conversation as Plan 9 and other distributed systems.


Okay, but let's say containers are how we manage applications at scale. And so now you want these containers to still be distributed among regions, among different computers, to be increasingly interconnected, online, to be efficient. Maybe you even want Kubernetes on bare metal for extra performance. Let's say that's the trend. Where is it going? It's moving towards grouping together heterogeneous hardware in different locations as a single abstraction and managing that. You're going to want to abstract away various forms of memory, processors, GPUs, networking, file systems, and you're going to want something like a task manager, you're going to want to manage permissions and licenses. That is, you want the next layer of abstraction beyond that of a single computer. Docker, virtualization, hyperconverged disaggregated infrastructure - whatever the specific incarnation is, they represent different solutions to the same underlying, long-term trends.


You’re making a really vague meta point here. Yes, I do agree that scaling horizontally is presently more economical than scaling vertically. But that has also been true for years — for about as long as x86 servers have been around. So even longer than your original argument about virtualisation. Furthermore, you don’t really need any distributed layer to achieve that either. In the early days I used to manage fleets of x86 bare-metal servers with little more than a few shell scripts I hastily cobbled together.

Saying “infrastructure needs to scale” is such a generalised truism that it doesn’t really contribute anything to the discussion. And the way Plan 9 manages scale is vastly different from how Docker, VMware and other solutions manage scale.

I do get the general point you’re making. I honestly do. But as I said in my initial post, it’s not a new or emerging trend like you claimed. It’s already been the industry norm for the lifetime of most engineers’ careers.

So the real crux of our disagreement isn’t about technology nor whether infrastructure at scale is even needed, it’s the timeline you suggested. You’re out by a couple of decades — so far out that entire architectural designs to solve these problems have come and gone.


> Your comment is about 10 years out of date. Possibly more.

This feels perhaps a bit too dismissive?

Just in the past few months the VPS vendor that I get most of my VPSes from introduced a VMware cloud offering: https://www.time4vps.com/vmware-cloud/

If you were looking for a technology that is inevitably on its way out, I'd reckon that something like OpenVZ might be a better fit for such criteria, albeit for different reasons (being pinned to an older kernel version for the most part).

> VMware usage is actually on the decline now with many companies switching to cloud services (which are generally running Xen). And containerisation, while not a like-for-like replacement for virtualisation, has also eaten away at some of VMware's market share. Particularly with services like k8s. Those that are dependent on running Windows might still use VMware but Microsoft Hyper-V has also eaten away at VMware's market share too.

Containers are pretty great, however I wouldn't say that they're mutually exclusive with virtualization - I've seen plenty of useful setups that split physical hardware into virtual machines per project/team for access control and hard resource limits, with the team then using container orchestrators inside of those for easier deployment/management of apps and further per-service resource limits (e.g. how one would otherwise use something like systemd slices).

Sure, there are benefits to running containers directly on hardware, but also some challenges compared to the VM-based setup. VMs give you more flexibility (including running legacy systems and/or teams where containers aren't a good fit alongside everything else) and the benefit of a technology that has been around for a long time. Running on bare metal instead means worrying about picking the correct rootless runtime and figuring out how to enforce resource limits across different teams with varying quality of engineering standards (Kubernetes namespaces are helpful here, but Kubernetes is not the only orchestrator you might use, and the requirements might also vary on a per-project basis).

I don't really have a horse in the race; I merely enjoy the benefits of the various virtualization solutions, container orchestrators and runtimes, as well as any number of storage abstractions (GlusterFS and Ceph come to mind) and networking solutions (WireGuard seems like a pretty cool recent one). What's more, many of those can easily work in tandem. Building on the progress of the past few decades seems like a pretty decent idea and I don't really see VMware or other solutions as on their way out anytime soon.

None of that might be very relevant for the serverless cloud, though, or for people who just run managed Kubernetes in the cloud and don't care about the actual infrastructure, but that is nowhere near everyone.


> This feels perhaps a bit too dismissive?

I'm just posting my observations, having managed on-prem infrastructure and data centres over the last couple of decades.

> Containers are pretty great, however I wouldn't say that they're always exclusive to virtualization

I literally said it’s not a like for like replacement. Plenty of SaaS offerings run virtualisation for security with containers on top. But in those instances the VMs aren’t used for distributed computing; they’re used as a security layer. It’s the containers that perform the distributed aspects.


I think it's gonna be like steam engines...lots of historical attempts until the stars align and suddenly people see the need.


Funny thing about steam engines is that we think about them as a thing of the past. But in reality steam engines just scaled up and turned into steam turbines, which are a backbone of society


They succeeded just fine. Most people don't have a use for a distributed operating system, so they were not commercially viable. Since many of them were never meant to be commercial products, it's not really accurate to claim they were unsuccessful. The principles involved work fine to this day.


I would not consider VMS and Aegis/Domain OS unsuccessful. However, they ended up in the hands of HP, who killed them.


A file system is a database though. Not a relational one, granted, but it's still basically NoSQL before it was cool.


Today’s filesystems are more like the NoSQL equivalent of hierarchical databases, which were the very first database design, created in the 1960s, preceding RDBMSs.


I don't think you can really call something a database if it doesn't have a query language, at least in contemporary practice.

And I wouldn't call "cd", "ls", "mkdir", "rm", "cat" and "echo" a query language.

Even the first navigational databases had a proper query language, although radically different from SQL.


Would you call "get", "set", "del" and "rename" a query language? Because they are commonly used commands for Redis.


There are plenty of databases without associated query languages. RocksDB, FoundationDB, and LMDB to name three. "Queries" are generally "list keys in this range" and all of the higher-level metadata you would want to query on (e.g. relations) are encoded in the key. I would absolutely put filesystems in this camp.
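A toy sketch of that pattern in Python (a plain in-memory stand-in, not RocksDB/FoundationDB/LMDB themselves, and the user/<id>/post/<ts> key layout is purely an illustrative convention):

    import bisect

    # Toy ordered key-value store: a sorted key list plus a dict, standing in
    # for an LSM/B-tree engine. "Relations" live in the key layout itself.
    class ToyKV:
        def __init__(self):
            self._keys, self._data = [], {}

        def put(self, key: bytes, value: bytes):
            if key not in self._data:
                bisect.insort(self._keys, key)
            self._data[key] = value

        def scan(self, prefix: bytes):
            """The only 'query': list keys in the range covered by a prefix."""
            i = bisect.bisect_left(self._keys, prefix)
            while i < len(self._keys) and self._keys[i].startswith(prefix):
                yield self._keys[i], self._data[self._keys[i]]
                i += 1

    kv = ToyKV()
    kv.put(b"user/42/post/1693700000", b"hello")
    kv.put(b"user/42/post/1693786400", b"world")
    kv.put(b"user/43/post/1693700001", b"other user")
    print(list(kv.scan(b"user/42/")))  # all posts for user 42, no secondary index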


"Language" is just an interface. You can SQL a filesystem if you want, and also can build 3rd party indexers that query engines can benefit from.


Exactly! https://osquery.io is one example of that.


Path syntax with shell globbing is the query language.
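For instance (a rough analogy using Python's stdlib glob module rather than a shell; the paths are made up):

    import glob

    # "Query": every Markdown file under any user's notes directory.
    # The pattern is declarative; the traversal strategy is left to the
    # engine, much like a WHERE clause.
    for path in glob.glob("/home/*/notes/**/*.md", recursive=True):
        print(path)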


Now add find, grep, xargs, sed, awk and put all those together with pipes


Considering things like wildcards and glob matching, these don't seem far off from what you'd use in a graph DB query language (e.g. something like Neo4j).


Databases have schemas that can change (and be understood without having to scan content) and indexes beyond trivial ones like "name; content" and "order written to disk".


> Anything that runs across computers seems to me to be application-level, not OS-level, by definition

Novell Netware was also marketed as a Network Operating System, in the sense that none of its services were confined to the local machine: its entire purpose was to combine a network of computers into a single management unit.


Have a look at IBM i (née AS/400); it uses a database instead of files, aptly named catalogs.


Disagree, IBM i uses files: how do you create a database table in it? CRTPF command ("Create Physical File"). You create a file.

And I don't know what you mean by "catalogs", in the context of IBM i. Are you talking about DB2 catalog views? (Which exist in DB2 on every other platform, and most other RDBMS have something equivalent, such as the ANSI standard INFORMATION_SCHEMA)

Or are you confusing IBM i with MVS (in which the OS contains databases called "catalogs", in which you lookup a file name or file name prefix to find out which disk volume a file is stored on?)


I might have mixed it up with object libraries.


Makes sense. Although, what's the difference between a "library" and a "directory", other than the name? I think the difference is more terminological than conceptual.


There's a cool little demo [0] of AS/400's files-as-databases concept:

[0] https://www.youtube.com/watch?v=CDSgJE5mPJM


"OS-level CPU" is just normal CPU that's been configured and handed out by some program (conventionally, the OS) for other things to use. If that initial process management program (call it an OS) also supports communicating with similar setups and receiving/dispatching jobs and queries then that's precisely the distributed OS you're looking for.

You could _also_ architect a comparable system as user-space networking on top of a conventional OS, but that's by no means required.


> Anything that runs across computers seems to me to be application-level, not OS-level, by definition.

Distributed OSes have been an active area of research since the 1970s at least.


There will be a day where the cloud becomes something like the OS


I have been working for years on a system designed to do much of what is described in their 'Level 2' layer. It is a single system that can effectively manage unstructured (i.e. file) data, semi-structured data (NoSql), and highly structured data (RDBMS).

I don't have the resources available to me like this team, so there are still a lot of features on my TODO list that would enable some of the things they are looking for, but it can do many things now. It can manage 200M+ files and find subsets of them in sub-second speed. It can build relational tables with hundreds of millions of rows and thousands of columns and perform queries faster than many conventional DBs.

www.Didgets.com is where you can download the beta software. Demo videos at https://www.youtube.com/channel/UC-L1oTcH0ocMXShifCt4JQQ


How do you deal with the aspects of CAP or ACID?


As many people know, CAP stands for Consistency, Availability, and Partition tolerance, and applies to distributed systems. You get to pick two. I chose the AP part with eventual consistency. 'Eventual' is the term with variability. Some systems are faster than others in bringing the distributed nodes up to date with data changes across the system. I believe I have some algorithms that complement the structure of the data to make the window of inconsistency as small as possible.

The system is also designed for full ACID compliance, but there is still some work to be done in this area.


There are already some DBOS-type systems out there. I built one which stores program state in SQLite databases; process state and programs are also stored in SQLite. In the past I believe things like SilverStream did the same too. The project I made is open source too: https://github.com/yazz/yazz
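For anyone curious what "program state in SQLite" can look like in miniature, here is a hedged sketch using Python's stdlib sqlite3 (this is not yazz's actual schema, just the general idea of keeping programs and their state in one database file):

    import sqlite3

    # Program source and program state live side by side in one SQLite file.
    db = sqlite3.connect("appstate.db")
    db.executescript("""
        CREATE TABLE IF NOT EXISTS programs (name TEXT PRIMARY KEY, source TEXT);
        CREATE TABLE IF NOT EXISTS state    (key  TEXT PRIMARY KEY, value TEXT);
    """)
    db.execute("INSERT OR REPLACE INTO programs VALUES (?, ?)",
               ("counter", "state['n'] = int(state.get('n', 0)) + 1"))

    # Load the stored program and its state, run it, then persist the new state.
    state = dict(db.execute("SELECT key, value FROM state"))
    (source,) = db.execute("SELECT source FROM programs WHERE name = 'counter'").fetchone()
    exec(source, {"state": state})
    db.executemany("INSERT OR REPLACE INTO state VALUES (?, ?)", state.items())
    db.commit()
    print(state)  # {'n': 1} on the first run, then 2, 3, ... across restarts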


I expect they will acknowledge the Pick OS? "a demand-paged, multiuser, virtual memory, time-sharing computer operating system based around a MultiValue database."

[0] https://en.wikipedia.org/wiki/Pick_operating_system


Pick's "multi-value database" is essentially just a flat-file database. I don't see how – if we put aside the marketing – it was really any more "database-oriented" than your average mainframe/minicomputer operating system with a record-oriented filesystem, such as MVS (especially VSAM), VM/CMS, OpenVMS (in particular its Files-11 RMS component), etc.


From my (very dim) memory, Pick was popular because it did everything some users needed and was easy to use (relatively speaking).

I never met anyone who used it, though.


It was built on BASIC – and it really took off at a time (late 1970s / early 1980s) when BASIC was still a very major programming language.

I think its database features have been somewhat overhyped – but no doubt an area of strength when compared to other lower-end systems of the same time period.

The fact that the OS, programming language and database were all a single software package, developed together and sold as a unit, attracted people.

Much of its initial success was also due to its marketing strategy – target ISVs who would resell it as the basis of turnkey solutions.

It was based on a virtual machine, which made it easy to port to disparate hardware platforms – everything from PCs, to multiple incompatible minicomputer lines, to IBM mainframes – a key selling point in its day – although not a unique approach (see UCSD Pascal). Other approaches to portability – such as the C programming language – hadn't really taken off yet.


I used one of the more modern Pick variants (D3) at a job during the first internet bubble. It did support a SQL query syntax, which I think makes it distinct from MVS et al. In any case, its much-lauded nesting feature in practice just ends up being a different way to do joins, and not particularly superior as I recall.


> It did support a SQL query syntax, which I think makes it distinct from MVS

SQL query support wasn’t added to PICK platforms until the 1990s or thereabouts - at least 10 years after SQL became available on MVS (via DB2, Oracle, etc)


Yuuup. I currently work on multivalue systems and it is quite fun to think about an alternate reality where Pick popped off instead of Linux and SQL. Though I do think we are in the better timeline today.

If anyone wants to try out UniVerse, which is one of the truly more modern multivalue databases, I wrote an installation guide.

https://nivethan.dev/devlog/installing-universe.html


"Pick was originally implemented as the Generalized Information Retrieval Language System (GIRLS) on an IBM System/360 in 1965 by Don Nelson and Richard (Dick) Pick"

I'm not sure the acronym would stand today, nor indeed Richard Pick's nickname.


This is the future, but they're doing it backwards.

Start at the top with SQLite compiled to WebAssembly as a virtual machine and prove that the tabular paradigm is superior to the scalar, vector, and object paradigms.

It'll be more fun that way.

The way that WebAssembly and WASI are designed not to fall back on Unix tradition creates the opening to escape from the traditional files, streams, and objects universe.


"A database-oriented operating system" -- where have I heard this before?

Oh, right: https://en.wikipedia.org/wiki/Pick_operating_system

(No, I never used this.)

Incidentally, the name of its creator would be pretty unfortunate these days: Dick Pick.


You could start doing an open-source version of this by basically installing a barebones Linux kernel, installing PostgreSQL on it, then writing all services to run within PG. If you want multi-server, distributed, there are already ways to do that with PostgreSQL. It would get you 90% of the way there...
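As a very rough sketch of what a "service living in PG" could look like, here is a worker driven by the database's own IPC (LISTEN/NOTIFY) via psycopg2. The connection string and the "jobs" channel are placeholders, and a real service would obviously need much more (supervision, retries, etc.):

    import select
    import psycopg2

    conn = psycopg2.connect("dbname=dbos_demo")  # placeholder DSN
    conn.autocommit = True  # don't sit in an open transaction while waiting
    cur = conn.cursor()
    cur.execute("LISTEN jobs;")

    while True:
        # Block until the server has something for us, then drain notifications.
        if select.select([conn], [], [], 60) == ([], [], []):
            continue
        conn.poll()
        while conn.notifies:
            note = conn.notifies.pop(0)
            print("job received:", note.payload)

Another session (or a trigger) would then run NOTIFY jobs, 'some payload' to hand work to the service.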


The second and third 90% can be challenging.


It's funny, but it's true!


PostgreSQL multi-server for reads, yes. For writes it is a bit harder; there are commercial offerings for that, last I checked.


So it's time to create fuse-postgres?


I doubt this would run well on a current version of PostgreSQL, but might have some ideas... https://www.linuxjournal.com/article/1383 (from 1997!)


Is there any work or implementation? Edit: yes, this would be a better link: https://dbos-project.github.io/


""" At Level 1, a kernel provides low-level OS services such as device drivers and memory management. At Level 2, a distributed DBMS runs on those services. At Level 3, we build high-level OS services such as a distributed file system, cluster scheduler, and distributed inter-process communication (IPC) subsystem on top of the DBMS. At Level 4, users write applications. """ How would it be different from installing today's kernel and then installing postgres, Gluster, docker and etcd on top of it?


Probably shortens the path between the applications (built on their distributed database manager) and the hardware/network compared to running applications on top of Postgres or other DBs today (which still end up calling out to the kernel and other subsystems). If all your applications (as seems to be their intent) don't need the Linux kernel, but only the DB, then push the DB service into the operating system.

Building on top of too many layers increases the overall complexity and reduces overall performance. Periodically chopping out part of the system and creating what you actually need, fresh, is sometimes necessary. Even if it doesn't produce a new product or final system on its own, it gives you a direction for moving other systems if the theory pans out in practice.


I guess the changes will be mostly within the kernel and not outside it (possibly some new APIs). I thought applications would have to change, which is probably too hard to do. So for anyone who misunderstood it the same way I did: nothing will change for an application developer.


Would you force Level 4 applications to only access Level 3 services, and model everything the OS does (device management, process management, memory management, etc.) as a layer on top of it? So essentially every device is a table? I think just modeling memory access as a table would be a big win. I'm not sure exactly how atomicity and consistency would help applications with every memory access, though. Would love to know.


I read this overview several times and I remain confused.

It seems like a rehash of Plan 9 / network-is-the-computer paradigms. I get all the reasons that is interesting now - networking is REALLY fast, and lots of cache-hierarchy and other assumptions in kernels just don't apply any more - but to place the emphasis for innovation on the use of tables and transactions as core distributed primitives for applications... is strange.

While maybe beneficial for system programmers, it definitely isn't a more understandable paradigm for application programmers - very few of those who write code that saves data understand the varieties of transaction semantics.

And in terms of applications, what decades of failed "database under the filesystem with typed object storage" work has also shown is that the data types and semantics applications need are always more imaginative than those exposed at lower levels. Someone used to working with graphs, when given tables, is going to walk away.

Where is the ML layer that optimizes under a "save this thing" surface that actually presents a new stable abstraction?

But in true armchair architecture sniper fashion, I'm going to leave my confusion as is and move on. Best wishes to them.


I feel like we are getting close to something approximating this with an experimental branch of our B2B product.

We designed a schema and range of application-defined functions that allow for us to use SQLite as a domain-specific language. In addition to domain data, the schema also stores the authoritative view state as well as any intermediate function arguments.

The really nice thing about this is all of the support around the edges. You can use SQLite’s EXPLAIN keyword to get an output of how the virtual machine would execute your statements, including any custom functions.

You can also trace the result of every command in separate tables for troubleshooting - each command starts with SELECT, so we might as well do something useful with this.

It’s a “reflective” approach, since the scripts for handling the various events reside in the same database upon which they execute and can be inspected or altered by script execution.

There’s also the magic of the WAL. If you are doing everything in the database (presumably because it’s your operating system) then things like incremental replication of state to another computer become trivial.
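For readers who haven't seen this pattern, here is a rough, generic sketch with Python's stdlib sqlite3 (not the actual product schema described above): register an application-defined function, drive it from SQL stored in the same database, and use EXPLAIN to inspect the bytecode the VM will run.

    import sqlite3

    db = sqlite3.connect(":memory:")
    db.create_function("apply_discount", 2,
                       lambda price, pct: price * (1 - pct / 100.0))

    db.executescript("""
        CREATE TABLE orders  (id INTEGER PRIMARY KEY, price REAL);
        CREATE TABLE scripts (event TEXT PRIMARY KEY, sql TEXT);
        INSERT INTO orders  VALUES (1, 100.0), (2, 250.0);
        INSERT INTO scripts VALUES
            ('quote', 'SELECT id, apply_discount(price, 10) FROM orders');
    """)

    (script,) = db.execute("SELECT sql FROM scripts WHERE event = 'quote'").fetchone()
    print(db.execute(script).fetchall())         # run the stored "program"
    for row in db.execute("EXPLAIN " + script):  # the opcodes the VM will execute
        print(row)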


AS/400 follows a similar idea if I recall correctly, on top of DB2


Dubious. IBM's marketing wants to convince you it does, but as far as I can work out, the integration of DB2 into the system is nowhere near as deep as the marketing makes it sound.


It isn't actually Db2. They call it "Db2 for i," but it's a completely different codebase from what you'll find on z/OS and elsewhere.

That said, the object-relational database goes down pretty deep in the system. IBM i (as OS/400 is now called) relies on it for storing things like source code and bound programs in addition to, well, data. Aside from IFS, which seems seldom used except for Java, it's basically the way to store things.

If you have a library (IBM i-ese for a directory, kinda, but it's actually more like a schema in something like PostgreSQL), say, called YCBR1 that contains a physical file named QRPGLESRC, you can run `SELECT srcseq, srcdta FROM ycbr1/qrpglesrc` against it even though the members in that physical file are RPG IV source code, and you'll get the usual kind of resultset that you'd expect from a relational database. Likewise, you'll get something similar for binaries.


> It isn't actually Db2. They call it "Db2 for i," but it's a completely different codebase from what you'll find on z/OS and elsewhere.

From what I understand, the DB2/400 code base was forked from SQL/DS – so its closest relative is Db2 for VM/VSE – and Db2 for i is as much Db2 as Db2 for VM/VSE or Db2 for LUW is. (Db2 for z/OS is itself descended from SQL/DS – it started out as a port of SQL/DS from VM/CMS to MVS, although I believe most or all of the SQL/DS code was rewritten early in its development history and there may be little or none left by now.)

> IBM i (as OS/400 is now called) relies on it for storing things like source code and bound programs in addition to, well, data

Not really. You are talking about storing objects (including files) in the classic OS/400 filesystem – which hasn't fundamentally changed from that of S/38 – and S/38 didn't have SQL. During the development of OS/400, they took the higher layers of SQL/DS and ported them on top of the S/38 file system. But if we are talking about how RPG code is stored, that's all about those lower layers which predate SQL support

> YCBR1 that contains a physical file named QRPGLESRC, you can run `SELECT srcseq, srcdta FROM ycbr1/qrpglesrc` against it even though the members in that physical file are RPG IV source code

Nowadays, many other RDBMS systems support querying OS files as if they were DB tables.

I won't deny that IBM i makes it somewhat more seamless than most other systems – but I think that's mainly because it is seen as somewhat of a niche requirement, other systems could easily make it more seamless if it was seen as a priority. No "deep" integration with the OS is necessary to implement it, either


Aha, that's just an interface for interrogating objects in the traditional QSYS.LIB filesystem. Thanks for that clarification.

Though, that does seem to imply that running a CREATE TABLE in STRSQL actually creates a physical file in a library just the same, doesn't it? Beyond CRTSRCPF/CRTPF vs STRSQL (or similar), what's the difference between using DDS and DDL for that?

I'm going to have to poke around and see if I can glean more insight into the SQL/DS lineage. Do you perhaps remember where you learned about its relation to Db2 for i? I'm finding that the details are scant.


> Aha, that's just an interface for interrogating objects in the traditional QSYS.LIB filesystem. Thanks for that clarification.

From what I understand, to create SQL/400, they used the SQL parser and query engine out of SQL/DS, but not the lower level storage code; instead, they used the existing QSYS.LIB database file code as a storage engine.

It is worth pointing out that OS/400 V1R1 did not include SQL, SQL/400 was a separately licensed add-on product; I'm not sure at which point SQL/400 got integrated into OS/400 – but this very fact shows that it is not quite as integrated as some of the hype suggests – https://www.ibm.com/common/ssi/ShowDoc.wss?docURL=/common/ss...

> Though, that does seem to imply that running a CREATE TABLE in STRSQL actually creates a physical file in a library just the same, doesn't it?

Yes, "CREATE TABLE" and "CRTPF" both create a physical file. So in principle they are equivalent.

> Beyond CRTSRCPF/CRTPF vs STRSQL (or similar), what's the difference between using DDS and DDL for that?

That said, in practice they aren't 100% equivalent. They set different defaults. I also believe there are certain features exposed through CREATE TABLE that are not exposed via DDS. I believe they ultimately reach the same lower-level internal code, but they get there via different code paths.

> I'm going to have to poke around and see if I can glean more insight into the SQL/DS lineage. Do you perhaps remember where you learned about its relation to Db2 for i?

I read it somewhere, unfortunately I can't remember where any more.


Hey, thanks again for following up.

I have the feeling that the hype about the integrated database functionality might have more to do with the fact that the storage engine goes pretty deep than the fact that there's a frontend for it that speaks SQL.

That was kind of my point in originally noting that it isn't _really_ Db2: It's the object-relational database management system built into QSYS.LIB with an alternative frontend built from the bits that look like Db2. The rest of it just feels like conflation with perhaps what was a bit of initial upselling before SQL/400 became taken for granted enough such that IBM stopped making it a chargeable feature.

As a consequence, it all smells (at least to me) wildly different from the Db2 on z that uses VSAM under the hood. That last part is notable: VSAM is just a dataset access method, not something like i's ORDBMS that allows for, e.g., more reasonable ad hoc reporting. Doing that with ordinary VSAM datasets is kind of a pain, particularly if you need to do a lot of joins to get what you're after.

Additionally, where you have to migrate data out of ordinary VSAM datasets into Db2 on z, the data is already there in Db2 for i since it's all the same object-relational storage engine underneath. The other details seem to hinge on DDS-defined PFs allowing invalid data on write where DDL-defined tables validate data as it's written, differences in how keys are handled, differences in allowed field lengths, and so on. The two are, nevertheless, allowed to coexist much more easily.


> That was kind of my point in originally noting that it isn't _really_ Db2

Well, "Db2" isn't really a thing. There is the original DB2 for MVS (now Db2 for z/OS). And then there are three other codebases which have (at least partially) independent origins and were later rebranded as "DB2"/"Db2" – SQL/DS (now Db2 for VM and VSE – which actually came out before DB2 for MVS did), OS/2 Extended Edition Database Manager (now Db2 for Linux/Unix/Windows aka LUW), and SQL/400 (later DB2/400 and then Db2 for IBM i). The IBM i "edition" is just as much DB2 as the VM/VSE and Linux/Unix/Windows editions are, and if there is such a thing as true/real Db2, it is only Db2 for z/OS.

> It's the object-relational database management system built into QSYS.LIB

QSYS.LIB is not an "object-relational database". IBM makes a lot out of its "object-oriented" nature, but it really isn't "OO" in the sense that an OORDBMS is:

* OS/400 never allowed anybody other than IBM to define object types. We are talking about a form of "OO" in which only the OS vendor can create classes.

* OS/400 has never had inheritance, in any sort of generic way. There are certain ad hoc notions of inheritance in the system – for example, *FILE objects have a type field (the object attribute aka OBJATR) which distinguishes various types (physical files, logical files, device files, ICF files, DDM files, etc). Similarly, file members are classified into different types (such as source vs data) by their FILEATR/FILETYPE/SRCTYPE attributes. But this is a long way off the more general idea of inheritance in a true OORDBMS – and as in my first point, only IBM can define new "subclasses", customer and ISV applications can only use the ones defined by IBM

(Historically, there have been a few ISV-related object types – for example, the Novell Netware object types found in older OS/400 releases – but they were added by IBM engineering due to some special partnership between IBM and that ISV, IBM has never offered ISVs the ability to create object types themselves. There is also an immense profusion of object types for IBM's own add-on products, both current and defunct – once an object type is added to the system, its definition can never be removed, although all the code which actually uses it can be excised.)

> As a consequence, it all smells (at least to me) wildly different from the Db2 on z that uses VSAM under the hood.

No, it is actually quite similar to VSAM. For storing customer data, only a small set of object types are actually relevant – *FILE, *USRIDX, etc – most of which have pretty direct equivalents in VSAM.

The biggest difference with VSAM, is that VSAM doesn't have any standardised way to specify the structure of a record (how to break it down into fields). COBOL copybooks are commonly used, but you also see people doing it using assembler macros, PL/I includes (inside IBM, PL/X too), and more recently C headers. There are tools available to convert between these different formats, but they cost extra $$$. By contrast, with DDS, you get a standard way to specify record structures which works cross-language, and all the compilers support importing DDS definitions. That is a real advantage over MVS – but actually has nothing to do with object-orientation at all – none of these record definitions are actually object-oriented.

> Additionally, where you have to migrate data out of ordinary VSAM datasets into Db2 on z, the data is already there in Db2 for i since it's all the same object-relational storage engine underneath

Db2 on z/OS's native storage mechanism is VSAM LDS. So it does actually use VSAM, but you are right the way it uses it is incompatible with legacy COBOL/CICS/etc applications which use VSAM directly (which would be KSDS, ESDS or RRDS not LDS). However, there are add-ons for Db2 for z/OS which enable it to access those KSDS/ESDS/RRDS datasets, for example IBM QMF – https://www.ibm.com/docs/en/qmf/12.1.0?topic=data-creating-v...

This has nothing to do with how "deeply integrated" Db2 for z/OS (or other z/OS RDBMS such as Oracle) is with the OS compared to Db2 for IBM i. IBM could have made DB2 for z/OS use KSDS/ESDS/RRDS instead of LDS, which would have produced roughly the same situation as you get on IBM i. And it has nothing to do with anything being "object-relational", given IBM i's "object-orientation" is more superficial than real, and this design decision has nothing really to do with object-orientation anyway. I think the real difference is, Db2 for z/OS prioritised performance over backward compatibility (VSAM LDS is faster, and by keeping the details of the storage architecture undocumented, IBM could change it in backward-incompatible ways across DB2 releases) – Db2 for IBM i made the opposite choice. Both could well be legitimate choices in their respective business contexts, but put that way, IBM i's choice doesn't appear more "advanced" than z/OS's.


When I read the title I immediately thought of PalmOS, and indeed it seems to be similar in architecture, with the major difference being that the database is hosted elsewhere. I'm interested to know if the advantages that came from PalmOS' database-focused APIs carry over into the distributed systems world!


I journal computer ideas, and the ideas from database engineering are yet to percolate everywhere, especially to the desktop environment. Why is every company building frontends and backends when the CRUD problem could be solved properly once and for all and reused everywhere? We did the same for communication and kernels with Linux, Windows and BSD, and with BSD sockets, which are shared by practically everybody. Your React frontend is legacy and shall be rewritten in 5 years. But BSD sockets and the Linux kernel don't get rewritten every day.

Rather than writing hand-rolled code for querying data structures and manipulating them, as Linux does, we can define queries that retrieve data structures in the shape we're looking for.

To put it simply, this is extremely high level: the idea is that data layout, data structure and algorithm can be disaggregated for cache locality, performance and developer experience. We can form materialized views on top of other materialized views and calculate the most efficient retrieval and storage format based on the structure of the data.

I suspect a materialized view, as in the data structures of the Linux kernel, is more efficient than materializing a join at runtime.

One of my ideas is "ideas4 9. Query for data structure", https://github.com/samsquire/ideas4#9-query-for-data-structu... which is the idea we should be capable of querying to retrieve data structure in the shape we want. The shape of the data lends itself to solving certain kinds of problems.

An ideas3 is "Query database" https://github.com/samsquire/ideas3#17-query-database, we persist queries as we persist data and use them to optimise data storage format. Rather than using a cost based optimiser to optimise how to retrieve data in the same uniform structure, we optimise how to store data itself. For example, for write heavy loads, we use LSM trees, we persist columnular when a query uses every field of every row and so on.

I also had idea #10 in ideas4, to persist data access patterns directly and optimise for that. https://github.com/samsquire/ideas4#10-access-pattern-serial...


Neat idea, but so many questions . . .

how are process boundaries on data preserved?

If data can be shared between processes via the DB, how do they enforce clean, clear, testable interfaces (which monoliths lack)?

And given all that, how do they manage data schema changes that we'd handle now with API versioning?


> clean, clear, testable interfaces (like monoliths lack)

What prevents a monolith from having these things?


Other than Conway’s law; nothing


Sounds like something you could implement on top of an existing OS like Linux (even in userspace) and get mostly the same advantages.


Sounds like that's actually their plan for the first pass (reading their papers now):

> While the DBMS engine will need some basic resource management functionality to bootstrap its execution, this could be done over a cluster of servers running current OSs, and eventually bootstrapped over the new DBOS.

https://arxiv.org/abs/2007.11112


Yeah, and TBH this is not a good reason to forge a completely new OS. Even though the idea itself will surely improve performance, the immaturity alone will likely offset the benefits. It'll need years of improvements and tuning, which directly translates to $$$. Worse, normal people will not test the new OS for free, which is how things worked for Linux.


This was my first thought, and probably should be the first pass IMO. If they have to implement an entire novel OS to support this it will never be anything beyond a lab experiment.


Windows Registry doesn't count?


Perhaps not, but WinFS (Windows Future Storage)[0] comes closer.

> In 2013 Bill Gates cited WinFS as his greatest disappointment at Microsoft and that the idea of WinFS was ahead of its time, which will re-emerge.

WinFS was supposed to be released with the codename Longhorn version of Windows, but Vista was released without it. WinFS was disbanded and parts incorporated into other products and features.

[0] https://en.wikipedia.org/wiki/WinFS


OK, so I've got 50GB of audio samples. Does anyone actually believe that these 50GB (400 giga-bits) are more efficiently stored in a database designed for "stuff" than they are in a database designed for files ("a filesystem") ?


Can be; it’s all contextual.

https://www.sqlite.org/fasterthanfs.html
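A crude way to poke at this yourself with Python's stdlib sqlite3 (illustrative only; results depend heavily on blob size, filesystem, page-cache state, sync settings and hardware):

    import os, sqlite3, time

    N, BLOB = 1000, os.urandom(8 * 1024)   # 1000 small 8 KiB blobs

    db = sqlite3.connect("blobs.db")
    db.execute("CREATE TABLE IF NOT EXISTS blobs (id INTEGER PRIMARY KEY, data BLOB)")
    t0 = time.perf_counter()
    with db:  # one transaction for all inserts
        db.executemany("INSERT OR REPLACE INTO blobs VALUES (?, ?)",
                       ((i, BLOB) for i in range(N)))
    print("sqlite:", time.perf_counter() - t0)

    os.makedirs("blobdir", exist_ok=True)
    t0 = time.perf_counter()
    for i in range(N):
        with open(f"blobdir/{i}.bin", "wb") as f:
            f.write(BLOB)
    print("files: ", time.perf_counter() - t0)

For 50GB of audio samples specifically, SQLite's own guidance is that blobs much larger than the page size tend to be faster as plain files, so it really is contextual.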


Fair enough. The problem is that at some point, the data has to hit some sort of storage hardware. Presumably between the DB and the hardware, there's some layer that somewhat abstracts the storage hardware. Isn't that ... a filesystem?


That's a storage volume (partition, raid volume, zfs pool, etc), not a filesystem. A filesystem is the abstraction layer on top of the storage volume translating the user-assigned data identifiers (aka file names) to byte ranges.

Talking specifically about databases: they often implement their own data organization. Oracle and Sybase famously performed better when working on raw partitions than with files.


This 1981 paper by Stonebraker, Operating System Support for Database Management [1], explains why conventional file systems are not a good foundation for databases. He writes:

"The bottom line is that operating system services in many existing systems are either too slow or inappropriate. Current DBMSs usually provide their own and make little or no use of those offered by the operating system. ..."

Now, 40+ years later, Stonebraker is a participant in the present DBOS project!

1. http://www.cs.fsu.edu/~awang/courses/cop5611_s2022/os_databa...


If I’m reading you right, you’re correct that the database is still technically passing its data to the filesystem at the end of the day.

However, databases generally subsume most responsibility for the on-disk representation of data as well as I/O patterns. What’s really being compared here is the performance of the database as a storage engine vs. the file system itself as a storage engine - not the raw I/O potential of the filesystem itself.

https://en.wikipedia.org/wiki/Database_engine


Many databases implement their own filesystem internally, heavily optimized for database-y use cases and access patterns while missing standard POSIX and other features a "real" filesystem would have. When this filesystem is installed on top of the OS filesystem, there is a cost due to duplication of effort, design impedance mismatch, limitations of the OS filesystem, etc. This is partially mitigated by turning the files in the OS filesystem into a giant block store to minimize interaction with the OS filesystem.

Some database filesystems can be installed directly on raw block devices if you desire with no OS filesystem in the middle. This usually offers significant performance and efficiency gains since everything above the raw hardware is purpose-built for the requirements of optimal database performance.
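As a heavily hedged illustration (Linux-specific, needs root, and it overwrites data on the target device; "/dev/sdb1" is a placeholder for a scratch or loop device), this is roughly what talking to the block device directly looks like, bypassing the page cache and any filesystem:

    import mmap
    import os

    BLOCK = 4096
    fd = os.open("/dev/sdb1", os.O_RDWR | os.O_DIRECT | os.O_SYNC)

    buf = mmap.mmap(-1, BLOCK)  # anonymous mapping: page-aligned, as O_DIRECT requires
    buf.write(b"\x42" * BLOCK)
    os.pwrite(fd, buf, 0)       # write our first "database page" at offset 0
    os.preadv(fd, [buf], 0)     # read it back into the same aligned buffer
    print(bytes(buf[:8]))
    os.close(fd)

Everything above that - allocation, free-space tracking, caching, recovery - then has to be purpose-built by the database itself.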


A filesystem offers a hierarchical interface. Meanwhile, a DBMS needs nothing more from the OS than access to blocks and preferably information about HDD layout. That's a level below.


Can you clarify what point you're making? If you're trying to argue that adding an extra layer can only reduce performance, any cache is an obvious exception to that. Are you saying it's extraneous to use a database as a storage abstraction because they have to sit on top on filesystems, and filesystems already exist?


The point I'm making (and I'm not certain that it is true) is that ultimately if you want to store raw data, a filesystem seems more likely to be what you want to use. Put differently, BLOBs in the DB end up (necessarily) as blobs on the disk, and managing blobs on a disk is precisely what filesystems are intended for.

But yes, on top of that, there's the question that in the end even the DB will need something very, very much like a filesystem between it and the storage hardware ... which opens up the question of whether this should remain hidden from every other application, or whether it makes sense for certain kinds of applications to use it too (i.e. just like today).


> managing blobs on a disk is precisely what filesystems are intended for.

A filesystem does much more, e.g. providing naming and management (directories, symlinks, access control, extended attributes, cache management, …) of files for manipulation by humans and applications, whereas RDBMSs only need fixed-size blocks of storage.

Some databases actually support using raw disks without a normal filesystem, which can have advantages by removing the extra layer of abstraction, e.g.:

https://dev.mysql.com/doc/refman/8.0/en/innodb-system-tables...

https://docs.oracle.com/en/database/oracle/oracle-database/2...

https://www.ibm.com/docs/de/db2/9.7?topic=creation-attaching...


> But yes, on top of that, there's the question that in the end even the DB will need something very, very much like a filesystem between them and the storage hardware

So the answer to this question is no. The “filesystem” that a relational database uses - i.e. how it organizes and allocates storage at the block layer - is so different from DOS/POSIX semantics that you wouldn’t recognize it as a filesystem, so to say it is very, very much like a filesystem is dubious.


I created a kind of object store (https://www.Didgets.com) that originally was designed to replace file systems. It manages the data streams (i.e. blobs) for each object very much like a file system does for each file. Although I have a few algorithms that make allocation and management of all the blocks very efficient, my testing shows almost equivalent I/O speed for reading/writing the data.

It is in the metadata management where my system excels. The table of file records for a volume with over 200M files only needs 13GB read from disk, and that much RAM to cache it all. Contextual metadata tags can be attached to each object, and lightning-fast queries executed against them. The objects (Didgets) can be arranged in a hierarchical folder tree just like file systems use, but they don't need to be.


The page says:

> SQLite reads and writes small blobs 35% faster¹ than the same blobs can be read from or written to individual files on disk using fread() or fwrite().

This is likely because of the hierarchical nature of the filesystem. A filesystem path lookup can be slower than a DB index lookup (it depends on the actual implementation). I haven't tried this, but something like getdents(2) would improve the performance of the test C code here, as one can skip the full path lookup.
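Loosely illustrating the same point in Python (the linked test is C, so this is only an analogy): os.scandir() reads directory entries via readdir()/getdents() and exposes the d_type field, so filtering by file type usually needs no extra stat() call or repeated full path lookup per entry. The "/var/log" path is just an example:

    import os

    def regular_files(dirpath):
        with os.scandir(dirpath) as entries:
            for entry in entries:
                # d_type from the directory stream; usually no stat() needed
                if entry.is_file(follow_symlinks=False):
                    yield entry.name

    print(sum(1 for _ in regular_files("/var/log")))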


I'm just going to be the first to propose it:

SQLiteOS


Every "subcollection" is just another SQLite DB embedded as a (searchable) BLOB.


OS/400?


or Mesos?


Well, Oracle APEX applications written in PL/SQL are a way to approach this.


The cloud is already a security nightmare. I don't want to put more stuff in it.


Remember AIX? Its big claim to fame was that it was built around a DB. Nothing new under the sun, I guess.


Honest question, what are some (or at least one) distributed databases that actually work? Asking for.. a friend


To what degree? I assure you, there are plenty of options that work. Maybe your concern is regarding operational complexity?


Depends on the use case. What use case/features is the friend looking for?


I understand an OS to be something like the "base" software that runs on a single computer to unify the various pieces of hardware that are in it. I was surprised that this is a distributed system, but on second thought, it makes sense: "multiple CPUs" is really just "more hardware". The network is just another bus, in a sense. In my opinion it's an extra "innovation token" though, since it's not standard practice to build distributed OSes. Adding the goal of building a database-oriented system on top of that is going to be a hell of a challenge.




