I have a dumb question. Why can “we” write pretty good apps in languages other t...

immibis · on Jan 21, 2022

It boils down to abstractions papering over ABI details.

How do you write "put 0x12345678 to register 0x04000001" in assembler? mov eax, 0x04000001 / mov dword [eax], 0x12345678

How do you write it in C-as-portable-assembler? You write *(u32*)0x04000001 = 0x12345678;

How do you write it in Java? You can't, the language has no such ability and if you try it's a syntax error. You have to call into a routine written in a lower-level language.

How do you write it in C-as-abstract-machine? You can't, the language has no such ability and if you try it's undefined behaviour. You have to call into a routine written in a lower-level language.

By the way, you can't write an operating system in C-as-portable-assembler either. No access to I/O port space, no way to define the headers for the bootloader, no way to execute instructions like LGDT and LIDT, no way to get the right function prologues and epilogues for interrupt handlers and system calls, no way to invoke system calls. All those things are usually written in assembler. Writing operating systems in C has always been a lie. Conversely, you can extend the compiler to add support for those things and then you can write an operating system in extended-C!
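The register write from the "C-as-portable-assembler" line above can be sketched as a pair of helpers. This is only a minimal sketch: the names `mmio_write32` and `mmio_read32` are hypothetical, and `volatile` is an addition (the one-liner above omits it) so that the compiler has to perform the access. Converting an integer to a pointer and dereferencing it is exactly the kind of thing the abstract machine does not define, which is the point being made.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t u32;

/* Assumption: a raw address fits in uintptr_t on the target. */
static_assert(sizeof(uintptr_t) >= sizeof(volatile u32 *),
              "uintptr_t too small to hold a pointer");

/* Hypothetical helper: store a 32-bit value at a raw address. */
static inline void mmio_write32(uintptr_t addr, u32 value)
{
    *(volatile u32 *)addr = value;
}

/* Hypothetical helper: load a 32-bit value from a raw address. */
static inline u32 mmio_read32(uintptr_t addr)
{
    return *(volatile u32 *)addr;
}
```

On real hardware the call would look like `mmio_write32(0x04000001u, 0x12345678u)`. In a hosted program the helpers can be tried against the address of an ordinary `u32` variable instead.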

wruza · on Jan 21, 2022

This addresses a part of my question, which I didn’t make clear, thanks! I mean, after all this assembler stuff one could just use BASIC or similar. Yes, Java has no concept of PEEK/POKE or IN/OUT, but it just wasn’t designed for that. Meanwhile, 1980s small systems were all assembly + BASIC/Fortran. Of course they had no kernel in the modern sense, but all the devices were there: a speaker (SOUND), a serial line to a streamer/recorder (BLOAD), a graphics controller. No DMA though, but that’s just a controller with the same “ports” as well, which can r/w memory and generate interrupts. I don’t get why we don’t just skip C and go to something high-level after wrapping all this pio/dma/irq/gdt/cr3 stuff into __cdecl/__stdcall format, and then use the FFI of a language that decides to support that.

I also don’t understand the GC arguments down the thread, because GC over malloc seems to be just a synthetic detail. You definitely can implement GC over a linear address space: just bump-allocate until the limit, or page-table it however you want for DMA. Malloc is not hardware, it isn’t even a kernel thing. Apps run on mmap and brk, which are similar to what kernels have hardware-wise. Mmap is basically a thin layer over paging and/or DMA.
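The "just bump alloc it until the limit" idea can be sketched in a few lines of C. This is an illustration of the allocation side only, under assumptions: the `bump_arena` type and the `bump_*` names are hypothetical, the arena is a caller-supplied block of memory, and `bump_reset` stands in for whatever a collector would do to reclaim space.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical arena over one linear region of memory. */
typedef struct {
    unsigned char *base;  /* start of the region */
    size_t size;          /* total bytes available */
    size_t used;          /* bytes handed out so far */
} bump_arena;

static inline void bump_init(bump_arena *a, void *mem, size_t size)
{
    a->base = mem;
    a->size = size;
    a->used = 0;
}

/* Returns n bytes aligned to `align` (a power of two), or NULL when full. */
static inline void *bump_alloc(bump_arena *a, size_t n, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);

    uintptr_t cur = (uintptr_t)a->base + a->used;
    uintptr_t aligned = (cur + (align - 1)) & ~(uintptr_t)(align - 1);
    size_t start = (size_t)(aligned - (uintptr_t)a->base);

    if (start > a->size || n > a->size - start)
        return NULL;  /* limit reached: a GC would collect here */

    a->used = start + n;
    return a->base + start;
}

/* Throw everything away at once; a stand-in for a real collection cycle. */
static inline void bump_reset(bump_arena *a)
{
    a->used = 0;
}
```

Nothing here depends on malloc: the region could equally be a range of pages the kernel mapped for itself.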

It was so easy and then blasted into something unbelievably complex in just a few years. Maybe the 80386 wasn’t a good place to run a typescript-over-asm kernel, but do we still have this limitation today?

pjmlp · on Jan 22, 2022

We don't do that mostly because many communities cargo-cult C and C++, so nothing changes unless companies like Apple, Microsoft and Google step in and assert "this is how we do it now if you want to play on our platform".

Arguably, with their push for Swift and Java/Kotlin, Apple and Google are much further ahead than Microsoft on this matter, given that .NET tends to suffer from WinDev worshiping C++ and COM.

You can get that BASIC experience nowadays when playing with uLisp, MicroPython and similar environments for embedded platforms, most of them more powerful than 16-bit home computers.

phicoh · on Jan 21, 2022

Let's assume you want to write most of the operating system in the high-level language and as little as possible in assembler.

For most languages, writing a hardware trap handler becomes quite a bit of an issue. In a trap handler you cannot rely on an extensive runtime system. Anything that does garbage collection is probably out. Anything that does dynamic memory allocation is out as well.

Beyond that, how easy is it to create pointers, create data structures that match a specific memory layout, etc.? Low-level device drivers need to talk to hardware in very specific ways. If it is hard to talk to the hardware, most people are probably not going to bother using that language for an operating system.
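As a rough illustration of "data structures that match a specific memory layout", here is how that is commonly pinned down in C11. The descriptor below is hypothetical (field names and sizes are made up, not taken from any real device). The compile-time checks are the part that matters: they turn a silent layout mismatch into a build error.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical DMA descriptor, invented for illustration only. */
typedef struct {
    uint64_t buffer_addr;  /* offset 0:  address of the data buffer */
    uint32_t length;       /* offset 8:  buffer length in bytes */
    uint16_t flags;        /* offset 12: ownership/status bits */
    uint16_t next;         /* offset 14: index of the next descriptor */
} dma_descriptor;

/* Fail the build if the compiler lays the struct out differently. */
static_assert(sizeof(dma_descriptor) == 16, "descriptor must be 16 bytes");
static_assert(offsetof(dma_descriptor, buffer_addr) == 0, "buffer_addr at 0");
static_assert(offsetof(dma_descriptor, length) == 8, "length at 8");
static_assert(offsetof(dma_descriptor, flags) == 12, "flags at 12");
static_assert(offsetof(dma_descriptor, next) == 14, "next at 14");
```

The fields are ordered so that natural alignment introduces no padding. A layout that needs padding removed would have to rely on compiler extensions rather than ISO C.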

In theory you could mix and match languages in a kernel. For example, a filesystem could be written in a language that has an extensive runtime system.

nine_k · on Jan 21, 2022

I'd say that Rust (and, to a smaller extent, Zig, and, AFAICT, Ada) allow you to write code that is guaranteed not to allocate, and to define the exact memory layout of certain structures, all while offering much tighter protections than C.

Of course, there are things that cannot be expressed in safe code in either language. But marking fragments of code as unsafe, where C-like unconstrained access is allowed, helps a lot to minimize such areas and make them explicit.

There is definitely room for more expressive and safer languages in the kernel-level space. We can look at Google Fuchsia or maybe at Redox OS; both are very real operating systems trying to use safer languages, with some success.

zdragnar · on Jan 21, 2022

I think stability plays a big role in C continuing to remain dominant. Rust and Zig aren't there yet, and wrt Rust in particular, the ownership model doesn't play nearly as nicely in non-deterministic environments. Dealing with hardware such as an external display or printer that might, for example, get randomly unplugged at any point in time takes far more code and works against the grain of a static memory ownership analysis.

nine_k · on Jan 21, 2022

I'd say it's a good example why more static checks like lifetimes are useful.

With them, you can at least tell apart data structures that are fleeting and can disappear when a cable is unplugged, and those which should stay put. This might help avoid another double-free or another dereference of a freed pointer.

zdragnar · on Jan 22, 2022

You may be interested in this:

https://way-cooler.org/blog/2019/04/29/rewriting-way-cooler-...

int_19h · on Jan 21, 2022

Ada still has plenty of UB; it's just that the cases where it arises are usually more explicit.

pjmlp · on Jan 22, 2022

If we don't adopt X because it doesn't solve 100% of the problems, then we will never improve.

bell-cot · on Jan 21, 2022

If anything in the kernel is written in a language that has an extensive runtime system... Well, extensive runtime systems are pretty reliably resource hungry. And when they might suddenly need which sorts of resources tends to be unpredictable.

Vs. the kernel must keep working reliably when resources are running low.

wruza · on Jan 21, 2022

But today Linux simply kills any process to free memory. What could prevent a GC (which also serves allocations, not only collects them back) from just doing that on an emergency cycle? Destroy a record in a process array and reclaim its pages (of course without doing any allocations on the way, or by using an emergency pool). Or even just reclaim its pages and wait for it to crash on a page fault if you feel lazy.

> which sorts

Dynamic languages only do memory-related ops unpredictably, or is it more than that?

rocqua · on Jan 21, 2022

I would guess that the big difference between an app and an OS is that the OS needs to do more complicated things with memory addresses.

An app that runs has its own nicely mapped address space. And it interfaces with devices through system calls. An operating system has to keep the actual addresses of everything in mind, and it usually has to talk to devices through virtual addresses.

As an example of what I think might be the problem: if the OS wants to read data from a device, it might allocate a buffer, wait for the device to write into that buffer, and then later read it. For the compiler, that is essentially "reading uninitialized memory" and thus undefined behavior.

nine_k · on Jan 21, 2022

The example works because the compiler has no way to know that the programmer intends the memory to be filled by e.g. a DMA transfer from a device.

If a programmer could communicate this idea to the compiler, it would be somewhat safer to write such code. There is a big difference between intentionally reading what looks like uninitialized memory, and doing so by an oversight.
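The closest thing C offers for "this memory is changed by something the compiler cannot see" is the `volatile` qualifier. The sketch below is an assumption about how one might express the read side, not how any particular kernel does it: the function name is hypothetical, and a real driver also needs the device's completion signal, memory barriers and cache maintenance, none of which ISO C describes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: copy n bytes out of a buffer that a device is
 * assumed to have filled. Because src is volatile-qualified, the compiler
 * must perform every load instead of assuming the bytes are unchanged.
 */
static inline void read_device_buffer(uint8_t *dst,
                                      const volatile uint8_t *src,
                                      size_t n)
{
    assert(dst != NULL && src != NULL);
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i];
}
```

In a hosted test an ordinary array can play the role of the device buffer, since any `uint8_t *` converts implicitly to `const volatile uint8_t *`.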

rocqua · on Jan 21, 2022

It's not so much about 'intent'. The spec simply says this operation is undefined behavior. You could have a compiler that you could somehow inform "please just define this behavior as reading whatever is in memory there". But that supports the original point of the article, that plain ISO C is not suitable for OS programming.

pjmlp · on Jan 22, 2022

Which is why many who learn "my compiler C" then get surprised by what happens when their code runs somewhere else, and then most likely blame the other compiler instead of blaming themselves for not learning the differences between ISO C and their daily compiler.

tenebrisalietum · on Jan 21, 2022

> Is talking to hardware so much different than talking to APIs?

It depends. If your hardware is behind a bus or controller device that's serviced by a separate driver, then you're using APIs of that bus/controller driver.

But think of having to talk to the TCP/IP stack using system calls - you are using an API but you'll still need to have some structure just beyond moving data back and forth over a bus. A USB mass storage driver is going to need different data moving over the USB interface than a USB network interface driver.

Different buses work differently as well - USB device addressing is different than SATA or PCI-E device addressing.

If you are really talking directly to a device, you're manipulating registers, bits, ports, etc. You may have to involve IRQs, etc. Your serial port, for example, can hold 16 bytes before it generates an IRQ to tell the CPU it has data if it's a 16550 I think. Your SATA interface doesn't work like that, it can actually DMA data directly to memory. But both of these could be streamable devices to an operating system.

rowls66 · on Jan 21, 2022

I think that the statement should be refined to say that it is not possible to develop a monolithic kernel based OS in ISO standard C. A monolithic kernel generally relies on passing pointers to memory between components with few restrictions. This can be problematic for some languages/compilers. However a microkernel OS that provides more structure around how data is shared between OS components can support development of many OS components in multiple languages. Even languages requiring significant runtimes like Java or C# could be used for many OS components like file systems, network stacks or device drivers.

Historically, it has been difficult to beat monolithic kernels for performance and efficiency, and through significant effort, monolithic-kernel-based OSes exist that are reliable enough to be useful. However, the monolithic kernel is not the only OS architecture.

adamc · on Jan 21, 2022

While maybe theoretically of interest, it is far afield of the pragmatic considerations that underlie the paper: work on operating systems in common use TODAY.

pjmlp · on Jan 22, 2022

Ever heard of Android?

pjmlp · on Jan 22, 2022

microEJ, Meadow Project, Android are such examples.

_3u10 · on Jan 21, 2022

Yes, C provides a way to talk to ABIs, in addition to APIs. It's not just "talking to hardware" it's talking to other software in a reliable way, such that you can upgrade your C compiler and have code written in C89 talk to code written in C11 which is unheard of in most of the other languages that don't support an ABI. (Think Python2 being incompatible with Python3)

Software has gotten much faster. Yes, almost all software benefits from -O3. What do you mean "hardcode"? as far as I know libjpeg can be linked statically...

UB is easy to maintain. Let's take integer addition and overflow: you just issue an ADD instruction, and however that CPU executes the ADD instruction is how integer overflow works on that platform. Then in the C standard you write "integer overflow is undefined behavior".
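For what it's worth, here is a small sketch of what that rule means for code that cannot simply "issue an ADD and see". Signed overflow is undefined in ISO C, so a portable check has to happen before the addition, while unsigned arithmetic is defined to wrap. The function names are hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Adds a and b only if the result fits in an int; never overflows. */
static inline bool checked_add(int a, int b, int *out)
{
    assert(out != NULL);
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b))
        return false;  /* would overflow: refuse instead of invoking UB */
    *out = a + b;
    return true;
}

/* Unsigned addition is defined to wrap modulo UINT_MAX + 1. */
static inline unsigned wrapping_add(unsigned a, unsigned b)
{
    return a + b;
}
```

`checked_add(INT_MAX, 1, &r)` returns false and leaves `r` untouched, whereas `wrapping_add(UINT_MAX, 1u)` is well defined and yields 0.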

eatbitseveryday · on Jan 21, 2022

> Why can “we” write pretty good apps in languages other than C, but can’t write operating systems? Is talking to hardware so much different than talking to APIs?

Operating systems are written in other languages, such as C++ and Rust.

One requirement is that a language must be compiled and thus cannot rely on a runtime. That excludes Go and Java.

The language needs to support direct memory manipulation.

The compiled binary cannot be emitting system calls since that binary IS the kernel. Thus the compiler must be told to not link or include standard libraries.

You need to disable certain optimizations like advanced vectorization extensions and red zone use on the stack.

There are others. Lots of specific control needed.
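To make a couple of these points concrete, here is a minimal sketch of code meant to be built without a standard library. The build line in the comment is an assumption about a typical GCC/Clang x86-64 setup, and the file name and `k_` prefix are hypothetical. Kernels usually supply their own versions of these routines because there is no libc to link against.

```c
/*
 * Possible build line (an assumption, not a canonical recipe):
 *   cc -ffreestanding -nostdlib -mno-red-zone -mno-sse -mno-mmx -c kmem.c
 *
 * -ffreestanding / -nostdlib : no hosted environment, no libc
 * -mno-red-zone              : interrupts may clobber the area below the stack pointer
 * -mno-sse / -mno-mmx        : keep the compiler from using vector registers
 */
#include <stddef.h>  /* one of the headers available when freestanding */

/* Fill n bytes at dst with the byte value c. */
void *k_memset(void *dst, int c, size_t n)
{
    unsigned char *p = dst;
    while (n--)
        *p++ = (unsigned char)c;
    return dst;
}

/* Copy n bytes from src to dst; the regions must not overlap. */
void *k_memcpy(void *restrict dst, const void *restrict src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    while (n--)
        *d++ = *s++;
    return dst;
}
```

The functions are named `k_memset` and `k_memcpy` here only so the sketch can also be compiled in a hosted program without clashing with libc.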

matthewaveryusa · on Jan 21, 2022

This is why rust is so exciting: it's the first new language that's graduated from toy-language space we've seen in a while without a runtime. (python, ruby, go and typescript-nodejs are the other graduates I'm thinking about.)

ArbixMenix · on Jan 21, 2022

>One requirement is that a language must be compiled and thus cannot rely on a runtime. That excludes Go and Java.

Maybe I'm wrong, but I know that there exist CPUs made specifically to natively execute Java bytecode, so in reality if the hardware has a baked-in language interpretation it would be actually possible to write an OS completely in Java

pjc50 · on Jan 21, 2022

ARM "Jazelle" was capable of this, but it required a C implementation of a JVM. Any GC-dependent language has this problem.

pjmlp · on Jan 21, 2022

Please find the C code here,

https://people.inf.ethz.ch/wirth/ProjectOberon/Sources/Kerne...

bluGill · on Jan 21, 2022

True, you can design a CPU for anything. However, an OS that depends on such a CPU is not portable to anything else, and can't easily run most programs that people depend on (emulators are possible, but not easy). Also, most CPU advances haven't gone into such a thing, and it is tricky to apply those advances while also providing what the language needs. None of this is impossible, but it makes such CPUs of questionable value in today's world.

Note that you can port any OS written in C to such a CPU with "just" a new compiler backend and a few drivers. Your OS won't take advantage of the features the CPU provides, but it will work.

3836293648 · on Jan 21, 2022

Eh, can you really properly implement a CPU without interrupts? I wouldn't categorise anything in that space as a driver

bluGill · on Jan 21, 2022

Good point. I assumed there was some form of interrupt system, but not all CPUs need to have it, and lacking that your OS choices will be limited.

gpderetta · on Jan 21, 2022

Running Java bytecode natively is neither necessary nor sufficient, as you can compile Java to any other native ISA, but you do still need a relatively heavy runtime for GC.

Having said that, there have been OSs written in languages with heavy runtimes, even GC.

pjmlp · on Jan 21, 2022

> That excludes Go and Java.

Only for those that cargo cult against using them.

https://www.f-secure.com/en/consulting/foundry/usb-armory

https://developer.arm.com/solutions/internet-of-things/langu...

https://www.ptc.com/en/products/developer-tools/perc

https://www.aicas.com/wp/products-services/jamaicavm/

eatbitseveryday · on Jan 21, 2022

I was answering the question in a general sense for the more prolific operating systems and on generic commonly available general-purpose processors.

Yes one can implement a CPU that natively executes a runtime for a high-level language, make your own ASIC, or FPGA, etc. that does this. That is a more advanced response to the general question.

Knowing the detailed points I mentioned will help understand why specialization of processors is needed to support other higher-level languages that do not meet the requirements I laid out.

pjmlp · on Jan 21, 2022

Which just proves your lack of knowledge that those runtimes target generic commonly available general-purpose processors.

None of those products use FPGAs or custom ASICs.

eatbitseveryday · on Jan 21, 2022

> just proves your lack of knowledge

Tone is not needed.

For TamaGo, it seems to allow developers to run their application, not build an OS on the hardware. But I have not played with it, you are right.

> TamaGo is a framework that enables compilation and execution of unencumbered Go applications on bare metal

The environment does not seem to allow building a generic operating system [1]. F-Secure ported the runtime itself to boot natively. But please correct me.

> There is no thread support

The environment you run in is specifically curated for Go applications, such as the memory layout. I'd call this an "appliance" rather than enabling Go to be used for full-fledged generic operating system implementations.

[1] https://github.com/f-secure-foundry/tamago/wiki/Internals

pjmlp · on Jan 21, 2022

Tone follows the last paragraph, returning ball.

An OS is an OS, regardless of the userspace.

MauranKilom · on Jan 21, 2022

https://xkcd.com/801/

user-the-name · on Jan 21, 2022

> I have a dumb question. Why can “we” write pretty good apps in languages other than C, but can’t write operating systems? Is talking to hardware so much different than talking to APIs?

To some small extent, yes. But I don't think that is the main issue here.

The real issue is that the stakes are much, much higher when implementing an operating system than when writing, say, an image editor. You can live with an occasional crash in a userland app. But the same crash in an operating system may open the door to taking over the entire computer, possibly even remotely.

pjc50 · on Jan 21, 2022

There are some other candidates, as expressed below, but really the main problem is how difficult it is to write and deploy an operating system that's of usable capability. Even just hitting enough of POSIX to get a GUI and a browser up is a pretty huge amount of work.

How many operating systems do we use that are less than 20 years old?

tyingq · on Jan 21, 2022

I suppose because there aren't many languages that allow you to manipulate arbitrary memory locations and cast portions of that to arbitrary types, and also allow relatively easy inline ASM. Which maybe isn't 100% necessary, but seems to be helpful at an OS level.

GoblinSlayer · on Jan 21, 2022

There are operating systems written in other languages than C.

A driver doesn't need to be implemented in C, but the kernel API is likely written in C, and your code needs to talk to it somehow. If your driver is written in C, it's as simple as #include.

wruza · on Jan 21, 2022

Isn’t this sort of circular? If very low-level, I mean in/out, lgdt, etc, were exposed as v8-bare-metal modules, it would be as simple as require() then.

  t = require("awesome-8253")
  p = require("pio-node")
  // cast your usual 0x42, 0x43, 0x61 spells

pjmlp · on Jan 22, 2022

Yes, contrary to C myths, any language can have those primitives as Assembly libs, which ISO C also requires anyway.

gpderetta · on Jan 21, 2022

A lot of C++ codebases benefit greatly from -O3 due to the more aggressive inlining and interprocedural optimizations.

Also, many UB-exploiting optimizations, such as strict aliasing, are already enabled at -O2 in GCC, not only at -O3.
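A short sketch of what strict aliasing means in practice: reading a `float` through a `uint32_t *` cast is undefined behaviour the optimizer may exploit, while copying the bytes with `memcpy` is well defined. The function name is hypothetical, and the sketch assumes a 32-bit IEEE 754 `float`.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static_assert(sizeof(float) == sizeof(uint32_t),
              "sketch assumes a 32-bit float");

/*
 * Returns the bit pattern of f.
 * The tempting version, *(uint32_t *)&f, violates the aliasing rules;
 * memcpy expresses the same thing without UB and typically compiles
 * down to a single register move.
 */
static inline uint32_t float_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}
```

Codebases that rely on the cast form usually build with GCC's `-fno-strict-aliasing` instead of rewriting every access.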