Hacker News
Posix Abstractions in Modern Operating Systems: The Old, the New, the Missing [pdf] (columbia.edu)
113 points by ryancox on May 8, 2016 | hide | past | favorite | 51 comments



It shows that we need a better IPC API, and also some primitives for dealing with graphics cards (Linux, for example, deals with them through ioctl, which is a hack).


Frankly everything is a hack at some level or other.


Then we can say it is a rather poor hack.


Yep, just the usual portability issues.

POSIX is stuck in the days when UNIX software meant plain CLI programs or daemons, with keyboards and teletypes as the devices.


The good old days, you mean. Long may they continue.

On a more serious note, this is not happening because POSIX is bad or irrelevant but because people a) don't know about it and b) even if they did, reinventing the wheel is more fun.


Not everyone agrees those were the good old days.

For me the good old days were the time spent with the Amiga 500, discovering the world of Smalltalk, Oberon, and all the Xerox PARC research and other pioneers.

I got into UNIX via Xenix, and used almost every commercial flavour of it, but I don't consider those the good old days.

The fact that POSIX is stuck in a PDP-11 world is proof that no big player in the industry, with the power to drive POSIX forward, sees it as a relevant OS API for anything besides writing daemons and CLI applications accessed via SSH.


It's not really proof of that tho', it's proof that they see advantages in less portability between Unix variants, which is not entirely the same thing. There's nothing in it for Red Hat or Canonical shareholders if you decide to run FreeBSD instead, so why would they support the standard?


Red Hat and Canonical rely on third party software to boost their software ecosystem, that's why it's in their interest to support POSIX standards.


Yes and no. They both have the resources (money, engineers) to do a lot in-house, and enough market share that vendors will support them specifically. They can't not do this; they are under the same commercial pressure as other OS vendors.

If you want proof, just look at the systemd fiasco. Red Hat's way or the highway.


> If you want proof, just look at the systemd fiasco. Red Hat's way or the highway.

What kind of proof is this? systemd is the new standard across all major distros (except for stuff like Gentoo or Slackware, which is irrelevant in the targeted enterprise market anyway). And in the process, systemd steam-rolled over lots and lots of bizarre inconsistencies between distros.

If that's Red Hat's version of EEE, I'm very happy with it.


Fair enough, but let's not pretend we're in Unix-land anymore.


Yeah, I feel like there was a golden age between 1975 and 1985, and then things got stale.

Things started picking up around Linux time, though that age of goodness started getting stale around 2006 (maybe because Linux started to get commercialized? Or maybe because everyone started getting interested in handheld devices for their 'fun' programming?)


As far as I can tell, POSIX was an attempt to codify existing practice amongst Unix vendors, picking the best of the most widely supported features, as opposed to actually trying to come up with a good standard. From reading the Unix-Haters Handbook, one might almost be tempted to say that the only thing worse than POSIX was leaving things up to the vendors. Either way, POSIX seems to have dealt with a lot of interop issues, but it has fallen so far behind the curve that it hardly even makes sense to talk about. The article compares Android, OS X, and Ubuntu. If you were going to have a conversation about interoperability between those platforms, would you be thinking about POSIX? Or perhaps some web standard?

Standards probably shouldn't sit still, especially if they hew more closely towards existing practice than ideal practice. I think POSIX is overdue for an update.

As an aside though, did people ever voluntarily use csh? Aside from committed masochists, that is.


To me, POSIX always looked like the part of the C runtime that vendors didn't want to make part of ANSI C, creating a separate standard for it instead.


> I think POSIX is overdue for an update.

POSIX is being updated constantly. It is just not as noticeable because POSIX is, as you say,

> an attempt to codify existing practice amongst Unix vendors, picking the best of the most widely supported features, as opposed to actually trying to come up with a good standard.


> did people ever voluntarily use csh?

Yes, for an interactive shell, back when the choice was between csh that had command-line history and sh that didn't.


I would argue passing around all data as text is suboptimal, not only from a performance point of view but also in terms of robustness. Parsing plain text using regular expressions and suchlike can lead to some useful outcomes but it's easy to derive false positives from this. On the other hand a binary-based object-oriented approach can give better performance and make it easier to parse data.

This is just one example of where the old ways aren't necessarily the best. I'm fairly confident we could come up with something better than POSIX if we were willing to make the effort.


> On the other hand a binary-based object-oriented approach can give better performance and make it easier to parse data

Please, no. An OOP approach means that my process needs to know how to communicate with other processes via very specific protocols, which aren't very well defined (any process could define its own object types). This is the exact opposite of flexibility. The usefulness of tools like grep or sed would drop drastically, and we would fall back to big blobs of software.

Since I'm just passing data, do I really need the behavior attached to it? How would state persistence be handled?

The text (bytes as ASCII until line ending) approach may seem ugly and dirty, but in fact you can pass lists, tuples, maps, trees, or just text, and it is up to the receiver to make sense of the data.

Need data from A but B can't understand it? Use C (which operates on text) to format A's output as B needs. With objects, how many "translators" (C in the example) would you need to achieve the same result?
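The translator idea above can be sketched in a few lines of Python. Everything here is hypothetical (the field names and the column order are made up for illustration); the point is that the receiver imposes whatever structure it needs on plain tab-separated text:

```python
# Hypothetical: process A emits tab-separated text; the receiver decides
# whether to treat it as a list of tuples, a map, or something else.
raw = "alice\t42\nbob\t7\n"  # what A might write down the pipe

# Receiver view 1: a list of records (tuples).
records = [line.split("\t") for line in raw.splitlines()]

# Receiver view 2: a map, with types applied by the consumer.
as_map = {name: int(count) for name, count in records}

# A "translator" C is just another text filter, e.g. swapping the columns
# into the order some process B expects.
translated = "\n".join(f"{count}\t{name}" for name, count in records)
```

In a real pipeline the same reshaping would be one awk or cut invocation; no shared schema or generated glue is needed on either side.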


> "please, no. OOP approach means that my process needs to know how to communicate with other processes via very specific protocols, which aren't very well defined"

It's not as hard as you make out. Take a look at how PowerShell works for an idea of how an OOP-based approach can work for CLIs.

http://www.computerworld.com/article/2954261/data-center/und...


> OOP approach means that my process needs to know how to communicate with other processes via very specific protocols, which aren't very well defined (any process could define its own object types).

That's why we have IDL -- interface description language for RPC calls. An IDL-to-X (usually C) compiler generates the necessary glue so that anybody can talk to the program in question. IDL is also the basis of MS COM, which I quite like from the design standpoint.

> The usefulness of tools like grep or sed would drop drastically, and we would fall back to big blobs of software.

So you teach grep to take an IDL file, invoke an IDL compiler and dynamically load the parser for the protocol in question. Also, if the broker were a standardized, perhaps in-kernel component (dbus, kdbus), you could attach "idlgrep" to any process to trace its calls. You wouldn't be restricted to pipes.

> Since I'm just passing data, do I really need the behavior attached to it? How would state persistence be handled?

The parent's wording was a bit unfortunate. You can have interfaces, interface inheritance, and versioning (the "OOP" part), but there's no behavior sent between processes.

> With objects, how many "translators" (C in the example) would you need to achieve the same result?

Exactly one: the IDL compiler.
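A toy sketch of the idea, not any real IDL, CORBA, or COM: the essence is that a single shared description of the message layout drives both packing and unpacking, so the "translator" count stays at one. The format string and field names below are invented for illustration; a real IDL compiler would generate this glue from a .idl file.

```python
import struct

# Toy stand-in for an IDL: one shared description of the wire layout.
# Real systems generate this from an interface description file.
MESSAGE_FMT = "<I16s"  # version (uint32 LE) + fixed-width name field

def pack(version, name):
    """Sender-side glue: serialize a message per the shared description."""
    return struct.pack(MESSAGE_FMT, version, name.encode().ljust(16, b"\0"))

def unpack(blob):
    """Receiver-side glue: parse per the same shared description."""
    version, name = struct.unpack(MESSAGE_FMT, blob)
    return version, name.rstrip(b"\0").decode()

# Any two processes that agree on MESSAGE_FMT can interoperate without
# knowing anything else about each other.
blob = pack(1, "grep")
```

The design point: the description, not each pair of programs, carries the knowledge of the format, which is why exactly one translator suffices.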


> So you teach grep to take an IDL file, invoke an IDL compiler and dynamically load the parser for the protocol in question. Also, if the broker were a standardized, perhaps in-kernel component (dbus, kdbus), you could attach "idlgrep" to any process to trace its calls.

So you introduce a huge load of accidental complexity because the text interface has some perceived inefficiency? I'll choose simplicity and accessibility over this mess anytime.


OK, now I am confused. On the one hand you rail against complexity, while on the other you favor systemd.


Yes, systemd is complex. But much of its complexity is justified. I rail against unnecessary complexity.


Heh, you will only be able to openly advocate IDL once the last CORBA programmer is dead. We've been down that route. It didn't work.


It is called REST and microservices nowadays.


Actually, as noted parenthetically above, it's called D-Bus (Desktop Bus) and ... erm ... IDL.

* http://dwheeler.com/dbus/


>>process needs to know how to communicate with other processes via very specific protocols

>>and is up to the receiver the responsibility to make sense out of the data

That is exactly the same thing.

Either way you're throwing bytes from one process to another and hoping the second one can do something useful with it.


It's about the failure modes you anticipate. If a text file becomes corrupt a human can probably still make some sense of it. If a binary file does, you may well be completely out of luck. I mean try it for yourself, truncate a 4G core dump by a couple of bytes and watch GDB try to open it. Now imagine that all your system and application logs are vulnerable to this.

In a world where things are "fixed" by just blowing away a VM and spinning up a new one, you might not care but in that case why bother to log anything at all...?
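The failure modes being contrasted can be demonstrated with a contrived sketch (toy log formats, not GDB or real core dumps): truncate a text log and you lose only the damaged tail; truncate a naively framed binary log and a strict reader rejects it entirely.

```python
import struct

entries = ["boot ok", "disk error", "shutdown"]

# Text log: one entry per line.
text_log = "\n".join(entries).encode()

# Toy binary log: an entry count, then length-prefixed entries.
binary_log = struct.pack("<I", len(entries))
for e in entries:
    binary_log += struct.pack("<I", len(e)) + e.encode()

# Corrupt both the same way: drop the last few bytes.
t = text_log[:-4]
b = binary_log[:-4]

# Text: everything before the damaged tail is still readable by a human.
recovered_text = t.decode(errors="replace").splitlines()

def read_binary(buf):
    """A strict reader that trusts the declared lengths."""
    (n,) = struct.unpack_from("<I", buf)
    off, out = 4, []
    for _ in range(n):
        (ln,) = struct.unpack_from("<I", buf, off)
        chunk = buf[off + 4 : off + 4 + ln]
        if len(chunk) != ln:
            raise ValueError("truncated entry")  # whole parse aborts here
        out.append(chunk.decode())
        off += 4 + ln
    return out
```

Running this, `recovered_text` still begins with the intact "boot ok" and "disk error" lines, while `read_binary(b)` raises on the truncated tail.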


> "truncate a 4G core dump by a couple of bytes and watch GDB try to open it"

This comes down to a failure of the binary design. It's possible to design a binary format that is uniform enough to handle truncation of a few bytes. One example: give the file a header with pointers to the start and end of each data block, and keep multiple copies of that header. Another option is to require each block to record where its own data ends. That way, even with partial corruption, you can still read the uncorrupted blocks.


That's already been invented; it's called ASCII.


No it's not. ASCII doesn't give you enough metadata to permit efficient and reliable parsing of data; you can do a lot better without using ASCII.


ASCII solved the "a few bits/bytes lost" problem, didn't it? The binary stream is divided into bytes; bytes are divided into control characters and text characters. The control characters include separators for columns (the tab character) and records (the newline character). Moreover, all of that was standardized across various platforms. It was a huge improvement, and it is why UNIX sticks to ASCII: to be portable.


> passing around all data as text is suboptimal

POSIX is not Unix.

Passing data around as text is a tenet of the Unix philosophy, while AFAIK POSIX doesn't mandate any of that.


Actually it does, POSIX also specifies many of the standard (text-based) UNIX utilities:

http://pubs.opengroup.org/onlinepubs/9699919799/utilities/co...


> "POSIX is not Unix."

"This is just one example of where the old ways aren't necessarily the best."


Performance as an absolute number vs. scaling.

The lowest-hanging fruit gets picked. "Let's dump absolutely everything and NIH the whole thing for a one-time 5% performance increase" doesn't sell well when that one-time gain is expressed in the amount of time it takes hardware or network capacity to improve 5%, or to fix poorly scaling algos. Also insert the usual analogy of the cost of microscopically faster hardware vs. the labor cost of extremely expensive rockstar ninja programmers.

Also it's assumed that change will lead to improvement because anecdote, or because change is always good. However, "the thing that won uses text, so naturally we gotta get rid of text" doesn't sound like a wise plan.


> "However, "the thing that won uses text, so naturally we gotta get rid of text" doesn't sound like a wise plan."

I'd argue the Unix command line ecosystem has only 'won' in the sense that many developers are familiar with it. I don't think it is technically the best we could do if we were starting from a blank slate.


Maybe many developers are familiar with it because it is text and thus accessible. Technical excellence matters not if no one is using it. Although I think that accessibility to humans is a major contributor to technical excellence.


Look at how Bash scripts are used. You can hack together something quick with them, but by the time you want to do something harder or bigger, people end up reaching for a language like Python or Perl.

Now look at what happens on Windows. Devops can use PowerShell to hack something together quickly, but if they want to do something more complex, they can easily add new PowerShell commands and data types: it's all based on .NET, so you can pull in any .NET code you want, whether that's something you wrote yourself or an existing library.

You could do the same with Bash, but people don't as often. I would suggest this is because it would tend to rely on plain text and regular expressions, and programmers tend to prefer better-specified data types.


Microsoft PowerShell does this, and it's pretty neat. Of course it's less popular than bash so now they're bringing bash and text-based pipes onto Windows! Ergh.


Bash + coreutils is powerful. Putting that on Windows is a very good thing. The "performance increase" is irrelevant compared to the ease of use, available documentation (man pages), interoperability, and maturity.


It's worth noting that the traditional UNIX text-based utilities are designed for reporting to a user. Using text to generate reports is the most intelligent thing they could have done, because humans universally process text. It's not a superior programming environment, but it's a superior reporting environment, which is helpful when you're trying to reverse-engineer some third-party ELF binary or debug why your shared memory object sometimes causes clients to crash. That we sometimes take strings from our reports and pass them into our shell is accidental.


The same can be said of the REPL environments of Smalltalk, Interlisp-D, the Lisp Machines, Mesa/Cedar, and Oberon.

All more powerful than UNIX shell languages, while using structured data.

Coming from the Spectrum, MS-DOS, and the Amiga, the UNIX shell seemed powerful until I discovered what a development workstation should look like, through the eyes of Xerox PARC.


At which point you switched to emacs and never looked back? (-:

Amiga REXX should have militated against that view of Unix shells slightly.


The biggest problem with Bash on Windows is that it has poor interop with Windows tools (to the point of almost no interop). In that sense, Cygwin is already better than the recent Microsoft offering.


What does binary vs text have to do with object orientation?


It doesn't, but it gets around the extra overhead you get with the added metadata.

OO on its own is slower to parse than plain text, but binary is faster to parse than plain text. If you combine the two, you can get the best of both worlds: something that's both reliable and fast.
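The parse-cost part of that claim is easy to try for yourself (a micro-sketch; the relative numbers depend entirely on workload and runtime, so none are asserted here): the same integers carried as text must be split and converted digit by digit, while a fixed binary layout is a single unpack.

```python
import struct

values = list(range(1000))

# Text framing: newline-separated decimal strings.
as_text = "\n".join(map(str, values)).encode()

# Binary framing: fixed-width little-endian uint32s.
as_binary = struct.pack(f"<{len(values)}I", *values)

# Text parse: scan for separators, then convert each token.
parsed_text = [int(x) for x in as_text.split(b"\n")]

# Binary parse: one fixed-layout unpack, no scanning or conversion.
parsed_binary = list(struct.unpack(f"<{len(as_binary) // 4}I", as_binary))
```

Both paths decode to identical data; wrapping the two parse lines in `timeit` shows the relative cost on a given machine.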


I see that binary might be interesting. But why add OO to the mix? There are other worlds to mix the best from.

As one silly example, you can consider Google's protocol buffers. They are not particularly OO-y. (I have great sympathies for functional languages, but I don't think they offer too much insight into how to format your data files. And neither does OO?)


It is an "enemy of my friend is my enemy" sort of thing: the minimization of a shortcoming. OOP stumbles on text (because of object methods) in a way that no other programming paradigm does. So obviously that means text is stupid and we never really liked the benefits of text anyway!


The cascade has to continue cascading...


Thanks for sharing this!

Here is our project site: https://columbia.github.io/libtrack/

and our GitHub repo: https://github.com/columbia/libtrack




