Leave your OS at home: the rise of library operating systems

mwcampbell · on Sept 15, 2017

The article says that in data center workloads, 15-20% of CPU cycles are spent in the kernel. Contrast that with Brendan Gregg's statement (for Netflix running on EC2): "Most microservices we have run hot in user-level applications (think 99% user time), not the kernel, so it's difficult to find large gains from the OS or kernel." [1] I wonder why the OP and Brendan had such different observations.

[1]: https://news.ycombinator.com/item?id=13081465

brendangregg · on Sept 15, 2017

Right. Most cloud systems we have are user heavy, there's not much kernel, usually <3%. They cited a Google paper with 15-20% kernel, which is a lot. Although I've heard from other sources that Google is kernel heavy, so this further confirms that. Different workloads.

It'd be interesting if someone like sysdig would share (if they could) the usr/sys breakdown for their entire customer base (I seem to recall they were sharing other such statistics).

It's a bit hard to see how much an exokernel or unikernel would help given, say, a workload that was 5% kernel. You might say it would only improve performance by less than 5%. But that's just considering the cycles saved. Such kernels should also be smaller, and consume the CPU hardware caches less, leaving more room for the application to execute on those precious resources. That would improve the performance gain further than just the reduction in cycles alone.

pjmlp · on Sept 15, 2017

Security and boot times are also relevant, no?

Unikernels are much easier to manage than containers.

brendangregg · on Sept 15, 2017

Sure, both are relevant. Cutting boot time only helps so far: we also have to tune application warmup to be faster as well. This is a little dated (we've tuned it down since then), but here's a microservice with a 10 minute application warmup time: http://www.brendangregg.com/blog/2016-09-28/java-warmup.html

yaantc · on Sept 15, 2017

Could reduced MMU overhead be significant on a server?

I work in the embedded space, so very different with small CPUs. Using an MMU compared to a simple MPU can waste 25 to 30% of a CPU cycles easily. I wouldn't expect such overhead on a big server CPU with much bigger TLB (but also bigger working sets?), still a unikernel could use a very simple mapping with in some case a single large page, and see very little MMU management overhead. I have no idea on the possible gain on a server, but I'm curious. Any hindsight welcome.

jnwatson · on Sept 15, 2017

25-30% is highly unusual.

There's something extraordinarily inefficient going on in your system.

yaantc · on Sept 15, 2017

This was on an ARM926 running a quite large code base. The ARM926 TLB has only 64 entries, which is not much when running an OS with 4 kB pages. So that's quite an extreme (and old...) case, but interesting when dealing with rather small CPU with not so small context. The MMU has a performance cost to keep in mind.

A server is very different. Still, a blog on transparent huge pages on Linux [1] shows an example where a high load JVM application server spends over 10% of its time doing page walk: "Yes, you see it right! More than 10% of CPU cycles were spent doing the page table walking.". So there's a massively bigger TLB, but also the server run much larger application with a large footprint. In the end, for some apps the MMU overhead can be significant in the server space.

In a unikernel, there's a single address space used and no isolation. You could map it with only huge pages, and dramatically reduce TLB misses even for a large footprint application (assuming a server running mostly such unikernels, as another regular OS apps could trash the TLB otherwise I guess). Now, I'm not sure there's any such large footprint application running an a unikernel yet, but it may be a possible gain for unikernels on principle.

[1] https://alexandrnikitin.github.io/blog/transparent-hugepages...

AndrewDavis · on Sept 15, 2017

I wonder how this would compare to Netflix's CDN machines. Where Netflix moved TLS encryption into the kernel.

brendangregg · on Sept 15, 2017

CDNs are kernel heavy. :)

rtpg · on Sept 15, 2017

Doesn't Brendan Gregg work on things like Illumos/SmartOS? I think it's probably using lightweight OSs like that which help

Meanwhile most people just kinda take out of the box Ubuntu

brendangregg · on Sept 15, 2017

I've primarily worked on Linux for 3+ years.

buster · on Sept 15, 2017

In any case, I wouldn't consider illumos much more lightweight as Linux or bsd.. It's Solaris at its heart.. (not to disrespect Solaris, it is/was a great OS)

dogruck · on Sept 15, 2017

Great question. I'd guess that Google's workload differs from Netflix.

What's your guess?

Artlav · on Sept 15, 2017

Funny thing is, my old OS project [1] is more or less that - it could be a typical OS with separate tasks and so on, but it is effectively a single program, and can easily be compiled as a single program with all the standard calls replaced with OS-level implementations.

In practice, these are two different sets of use cases - in one you want clean abstractions, process separation, storing message passing and so on, in the other you want to get the call from your code into the libservice ASAP. Trying to (unwittingly) combine the two gave me quite a bit of grief.

Another disadvantage is a lack of any protection (at least in my implementation), since everything shares the address space.

Anyway, it's odd to see that a random idea i had 17 years ago is actually starting to get used.

[1] http://orbides.org/aprom.php

rwmj · on Sept 15, 2017

Yours sounds like a Language-based Operating System, except you didn't implement protection in the language. Anyway they've been around for a very long time. The Burroughs system that I'm familiar with was around in the 1960s (from 1961 according to Wikipedia).

https://en.wikipedia.org/wiki/Language-based_system

beefhash · on Sept 15, 2017

Isn't that basically what SQLOS is doing for Microsoft SQL Server[1]?

[1] cf. https://blogs.msdn.microsoft.com/sqlosteam/2010/06/23/sqlos-...

pishpash · on Sept 15, 2017

Back to real mode we go.

gravypod · on Sept 15, 2017

I guess we should have been listening to Terry all of this time!

throwaway7645 · on Sept 15, 2017

I really wish I could do a good bit of my work on his OS. I'd much rather a Kawasaki than an aircraft carrier that is half sinking.

Charles Moore of Forth fame has an amazing interview where he argues that the OS is really not necessary on modern hardware. Of course this man built his own language, CAD software, and chips...some of which are in outer space. Impressive man.

noblethrasher · on Sept 15, 2017

I don't suppose you have a link to the Chuck Moore interview; I would love to take a look at it.

(Alan Kay and co. also argued that point.)

throwaway7645 · on Sept 15, 2017

http://www.ultratechnology.com/moore4th.htm

https://www.red-gate.com/simple-talk/opinion/geek-of-the-wee...

The first one has where he mentions the OS. Personally, I do think the OS is needed, but not in its present form. Instead, something far far simpler than even Linux. Yes, I've seen Alan Kay's similar presentation on the amount of code in windows and the F-22 raptor. Stacked it is supposedly unbelievably high like many stories, while their Smalltalk Machine built at Xerox had the entire language, GUI, text editor, image programs...etc in a tiny amount of code. In the modern day I think of Rebol which has a very powerful and simple DSL for so many tasks such as creating GUIs and the full download is like a few MB with zero install. Truly a marvelous piece of software.

mwcampbell · on Sept 15, 2017

> the entire language, GUI, text editor, image programs...etc in a tiny amount of code

But it wasn't internationalized, accessible to people with various disabilities (e.g. blind people via a screen reader), automatically installing updates over the network, encrypting all communication over the network (as in TLS today), etc.

No doubt some bloat comes from accumulated incidental complexity. But to some extent (I don't know how much), one person's bloat is another person's essential features. For example, if you tell me about some bloat-free GUI toolkit that draws its own widgets (as opposed to a wrapper over multiple platform native toolkits), my immediate reaction will be "What about accessibility?".

mpweiher · on Sept 15, 2017

[missing features]

Well, MS Office, for example, is around 20000x the code of early Smalltalk "Personal Computing" systems. Let's assume that that core code covers only 1% of the features, so you'd need 100x more code to implement all of them. I think this is being very generous, but let's stick with it.

That would mean you still have a factor 200 unaccounted for. In other words, even when being very generous, we still haven't accounted for 99.5% of the code.

throwaway7645 · on Sept 15, 2017

Thank you, that's what I was trying to say. I'd argue that 95% of all Word users only use a tiny fraction of what the product can do and if they had just written that part, it wouldn't need to be that insane amount of code. Even if you added all the network features...etc to their Smalltalk machine, I bet they would've done it with less code than MSWord, not to mention the size of the rest of Windows which is insane. I think his point might have been that society is on this path with ridiculous software size, complexity, bloat...etc. This software is enormously difficult to maintain, impossible to rewrite, and too hard for even a group of people to understand. Kay's talks are on YouTube and definitely worth a go. He also posts on here sometimes I believe, so maybe he will correct my assumptions.

Qwertious · on Sept 15, 2017

>I'd argue that 95% of all Word users only use a tiny fraction of what the product can do and if they had just written that part, it wouldn't need to be that insane amount of code.

Problem is, they use different tiny fractions.

com2kid · on Sept 15, 2017

> I'd argue that 95% of all Word users only use a tiny fraction of what the product can do and if they had just written that part, it wouldn't need to be that insane amount of code.

I think you'd be surprised.

Insert a graph into Word, now all of a sudden you need OLE so that when the underlying data changes the graph is automatically updated, axis are renumbered properly, the legend is in place, etc.

Insert a picture. Now it needs a caption. How does that picture flow with the rest of the text?

Just looking at the "Insert" tab in Word, I've used:

1. Page Break 2. Cover Page 3. Tables 4. Pictures 5. Shapes. 6. Smart Art 7. Chart 8. Links 9. Bookmarks 10. Cross-references 11. Comments 12. Header, footer, page numbering 13. Drop Caps 14. Interacting with Signature Lines 15. Date & Time 16. Symbols 17. Equations

That is 70% of the features of that one tab.

Feature code is expensive, it costs to write, and it has a heavy cost to maintain. It rarely gets written unless there is a demonstrated customer need for it.

There have been a lot of competing word processors over the years that pop up with just a minimal set of features, some of them even had Word support. These word processors end up being discarded the second someone comes across a feature it doesn't have, they need, and that Word does have.

Excel is even more this way. Look at a power user of Excel, a huge % of the program is exercised on any given day. Some groups use more of the numerical abilities, but a huge number of features are going to get exercised by a user on any given day.

> I think his point might have been that society is on this path with ridiculous software size, complexity, bloat...etc.

Maybe. But what is bloat? Apps written in Electron are easier to maintain and update despite the underlying platform being huge. I'd say it is because Electron solves a lot of small problems for you, but perhaps it is also because people give up on some things. Electron apps don't have the theming support that Winamp did, but adding real time streaming support for audio/video takes months instead of years, and has become table stakes in the modern world instead of an amazing differentiator.

I'm not really sure where I stand on all this anymore. :)

mwcampbell · on Sept 16, 2017

> Electron apps don't have the theming support that Winamp did

Pardon me for answering this at length, though it's tangential. But I actually know quite a bit about Winamp.

On the contrary, Electron, being based on Web technologies, offers a much better way to do theming, via CSS. Of course, most apps don't expose this capability to the user, but they could.

Winamp's theming support, at least in Winamp 2, was actually very crude. A Winamp skin was just a collection of bitmap images (BMPs, I think) that made up the main Winamp window, the EQ window, and (if I remember correctly) the borders of the playlist editor and media library windows. And some windows, like the main preferences dialog and the windows opened by various plug-ins (e.g. SHOUTcast), weren't themed at all; they had a generic Windows look.

And it's fortunate that the most complex parts of the Winamp UI (ignoring the media library) were not themed, because as far as I recall, Winamp never implemented proper accessibility for its custom UI. (I don't have Winamp on my current machine, so I can't easily verify that now.)

The Winamp main window was completely inaccessible with a screen reader, except for the title, and I guess we wouldn't have even had that if it wasn't needed for the Windows task bar. That wasn't so bad, because every function in that window had a keyboard command, and because Winamp is an audio player, the user could usually hear the result immediately. When toggling shuffle and repeat, Winamp provided a way to get the state of those flags through a window message; I don't know if they did that for accessibility or some other reason.

I don't recall if blind people had a good way to use the EQ. Perhaps they could adjust it with the keyboard and just listen to the results rather than looking at the sliders.

IIRC, the playlist window didn't use a standard Windows list view for its contents, but it was still more or less accessible in practice. All actions in that window could be done with the keyboard. And it used standard GDI text rendering functions to display the contents of the playlist. All good Windows screen readers could intercept GDI function calls (or their counterparts in kernel mode) and build an "off-screen model". This was good enough for knowing what text was in a window, and which text was highlighted. So a blind user could access the playlist editor. (Note that no screen reader has attempted to do this for more advanced graphics technologies such as Direct2D and DirectWrite.)

But if the rest of the UI, such as the preferences dialog and the SHOUTcast plugin windows, had been skinned... well, blind people probably wouldn't have adopted SHOUTcast so eagerly, from the day it was first released. As far as I know, no screen reader, even with an advanced off-screen model based on intercepted GDI calls, could provide good access to a fully custom UI, with buttons, text fields, combo boxes, check boxes, and so on, unless the application implements an accessibility API (for Windows, either Active Accessibility or UI Automation). In fact, the reason why Microsoft created Active Accessibility, the first accessibility API, was because they needed it for the dialogs in Office, since those dialogs use windowless controls.

But, bringing this back to the original comparison with Electron, with Electron you can have an application that's both fully skinned and accessible, as long as you follow web accessibility guidelines. So keep that in mind if you're inclined to reject the bloated Electron and use (or create) your idea of the perfect, lightweight, themable GUI toolkit.

com2kid · on Sept 17, 2017

Electron Apps I've used so far (Spotify, Slack) have had minimal theming support. Winamp had some crazy themes, especially the 3.x and 5.x series. Accessible no, little from that era was, but freeform window madness? Quite doable.

mwcampbell · on Sept 17, 2017

Electron seems to be getting there in terms of freeform window support:

https://github.com/electron/electron/blob/master/docs/api/fr...

A developer so inclined could produce a Winamp-like player based on Electron that's themable and accessible.

throwaway7645 · on Sept 18, 2017

And also take an absurd amount of resources.

mpweiher · on Sept 15, 2017

> 95% of all Word users only use a tiny fraction of what the product can do

Right. That was the first 100x I was talking about.

It's the second 100x (well: 200x) that's the issue ;-)

adrianN · on Sept 15, 2017

If only code complexity would scale linearly with features...

mpweiher · on Sept 15, 2017

So why doesn't it?

[My working hypothesis: glue code]

beagle3 · on Sept 15, 2017

kOS from http://kparc.com boots on metal, has a workable GUI, network, file system, text editor, database, programming language and a few other bells and whistles, in less than 500KB uncompressed.

No, it doesn't have accessibility or internationalization, but I suspect if it did (by the same people) it would not be much larger beyond the inherent complexity - e.g. Unicode directionality and glyph data; or sound waveforms for text to speech.

Most of the size of modern software, OS included, is accidental

throwaway7645 · on Sept 15, 2017

For those not aware, this is a bare-metal system for the k-language which is what you use with the kdb+ database system. It allows the user to program in an array based language (ASCI APL descendent) so they can write very terse and fairly fast programs with a GUI and when kOS comes out it wont need a traditional OS. The fintech industry pays kdb+ developers quite well. The entire language is probably only a few pages of C too as Arthur Whitney is a genius and his last language was like that. The only issue i have is it is really expensive and closed source.

feelin_googley · on Sept 15, 2017

Which particular metal has been tested?

I would like to buy that metal now.

Are they making this compliant with the multiboot specification so the user can use her own choice of bootloader?

If it is ARM, which bootloader are they using?

beagle3 · on Sept 15, 2017

Some asus model iirc; user geocar on HN is actually involved with the project, I believe - I'm just following progress.

wolfgke · on Sept 15, 2017

> http://www.ultratechnology.com/moore4th.htm

> https://www.red-gate.com/simple-talk/opinion/geek-of-the-wee...

For a different perspective on Forth and Chuck Moore read

> http://yosefk.com/blog/my-history-with-forth-stack-machines....

(also read the comments).

wolfgke · on Sept 15, 2017

> Instead, something far far simpler than even Linux.

Who said that Linux is simple?

throwaway7645 · on Sept 16, 2017

Good point. I guess in comparison to Windows.

wolfgke · on Sept 16, 2017

If you want an "academically fair" (e.g. only compare parts that implement related functionality) comparison for Linux (the kernel!), you should compare it to ntdll.dll+hal.dll (these implement the kernel; cf. https://en.wikipedia.org/w/index.php?title=Microsoft_Windows...). Here I think Linux would not be fare better (in terms of size) against Windows.

You say this is (from a practical perspective) an unfair comparison for Linux, since under Linux, your applications can (in principle) call the kernel directly, while under Windows, they are linked against the subsystem that one wants to use - so one has to include the subsystem, too. I agree here - but this already makes a comparison much harder, since there is no related concept to "subsystem" under GNU/Linux. Indeed what people describe as "bloated" under Windows is not the kernel itself, but the perhaps Windows subsystem and even more the immense amount of intertwined components that are installed by default on a Windows installation.

So a more fair comparison (from practical considerations) for Windows against GNU/Linux in terms of size would be a minimal system with which you can work properly and is supported by some vendor (to exclude some really experimental hacks of some Linux nerds which are really small in size, but not suitable for proper work). For Linux one can surely find such a configuration. For Windows the proper configuration to choose is "Windows Server 2016" in the "Server Core" or even "Nano Core" configuration: https://en.wikipedia.org/w/index.php?title=Server_Core&oldid...

Here I am sure the Linux sample would still be smaller than the "Server Core"/"Nano Core" installation of Windows Server - but Microsoft has learned their lesson that there exist users who want a really slicked down Server version of Windows and has delivered.

ioddly · on Sept 15, 2017

Where is Chuck Moore now? He used to write a blog that was sometimes quite educational.

Jtsummers · on Sept 15, 2017

https://groups.google.com/forum/#!original/comp.lang.forth/L...

Per this, he's retired now.

fish_fan · on Sept 15, 2017

I believe he's still with GreenArrays.

throwaway7645 · on Sept 15, 2017

He retired a week ago apparently.

olewhalehunter · on Sept 15, 2017

Chuck and Terry's work prove that most kernels are just really terrible compiler systems

wolfgke · on Sept 15, 2017

Terry A. Davis' TempleOS does not run in real mode, but in 64 bit mode (long mode) in ring 0.

emteycz · on Sept 15, 2017

I think what he really meant was real >time<.

wolfgke · on Sept 15, 2017

> I think what he really meant was real >time<.

On "normal" x86 processors (say: the CPU that will be inside your PC/laptop if you buy one) it is really hard (I don't want to claim "impossible", but at least really hard) to do real-time stuff. Intel knows that and this is also among the reasons why they released their Intel Quark SoC/micro controller, which is perfectly suitable for real-time stuff.

Why is this the case? One obvious reason is that the caches (D$, I$, micro op cache etc.) und the pipeline stages (EDIT: including out-of-order execution) make it really hard to predict/prove/test strict upper bounds on the performance of some code fragment on "normal" x86 processors.

Another also well-known "issue" is that many modern x86 clock up or down depending on core temperate (I think this behavior can at least partly controlled by the firmware (UEFI)).

But there is a much more subtle reason, too: It is SMM (system management mode): https://en.wikipedia.org/w/index.php?title=System_Management...

For backward compatibility a lot of legacy functions are implemented via SMM calls such as emulating a PS/2 mouse/keyboard (while really a USB mouse/keyboard is connected). As long as you cannot guarantee that the code of your OS will not trigger such a function that is emulated via SMM, it is very hard to ensure real-time guarantees.

EDIT: To quote directly from the Wikipedia article: "Operations in SMM take CPU time away from the applications, operating system kernel and hypervisor, with the effects magnified for multicore processors since each SMI causes all cores to switch modes. There is also some overhead involved with switching in and out of SMM, since the CPU state must be stored to memory (SMRAM) and any write-back caches must be flushed. This can destroy real-time behavior and cause clock ticks to get lost. The Windows and Linux kernels define an ‘SMI Timeout’ setting a period within which SMM handlers must return control to the operating system or it will ‘hang’ or ‘crash’.

The SMM may disrupt the behavior of real-time applications with constrained timing requirements."

Zaak · on Sept 15, 2017

> Another also well-known "issue" is that many modern x86 clock up or down depending on core temperate (I think this behavior can at least partly controlled by the firmware (UEFI)).

From what I've read, current Intel processors simply can't be set to run at a single constant frequency.

cbetti · on Sept 15, 2017

If I can offload work to the OS, great! I don't want to manage the virtual memory backing my mmap when the kernel is really good at it. It is not inherently bad when CPU time is spent in the kernel.

The point I'm taking away from this article is that some workloads (probably not mmap) trigger bottlenecks in the kernel, and would benefit if it were possible to replace those kernel components via app library.

That said, we do things like this all the time today. For example, I can provide my own memory allocation library instead of depending on the one provided by the operating system.

So I'm unconvinced there are many real world bottlenecks without a descent solution available today.

I wonder: Is the real point that leveraging custom implementations is hard and we could all benefit from a nice central repository and a reasonable dependency manager?

Animats · on Sept 15, 2017

Well, having an entire copy of Linux in a container is a bit much.

mgraczyk · on Sept 15, 2017

This seems like a trade-off we're already okay with making.

    $ du -h /boot/vmlinuz-4.4.0-93-generic 
    6.8M    /boot/vmlinuz-4.4.0-93-generic
    $ du -h /usr/bin/docker
    12M     /usr/bin/docker

skrebbel · on Sept 15, 2017

I think you're misunderstanding what GP means by "linux" on purpose.

orbat · on Sept 15, 2017

And conveniently forgetting that docker is written in Go, so the binary also has the runtime and all used libraries baked in

bluedino · on Sept 15, 2017

I thought this was going to be about things like Dynix

known · on Sept 16, 2017

I'm currently using https://askubuntu.com/questions/138356/how-do-i-get-a-live-u... and really like it