Crun: Fast and lightweight OCI runtime and C library for running containers (github.com/containers)
109 points by nateb2022 on June 4, 2023 | 49 comments



systemd has nspawn, often overlooked in the container discussion.

https://wiki.archlinux.org/title/systemd-nspawn

"systemd-nspawn is like the chroot command, but it is a chroot on steroids.

systemd-nspawn may be used to run a command or OS in a light-weight namespace container. It is more powerful than chroot since it fully virtualizes the file system hierarchy, as well as the process tree, the various IPC subsystems and the host and domain name. "


nspawn (and LXC/LXD) is overlooked because people mostly want application containers not system containers.


I actually disagree that this is the reason for the aversion. Instead I think it comes down to a couple of things:

1. Containers are a commodity. For commodities, price wins first, then marketing. Docker is equally free (as in beer) and is far better marketed.

2. There are very few people working on systemd-nspawn and a very large number of people working on Docker (and its ecosystem).

3. Dockerfiles and images are ubiquitous. They're easy to support from other runtimes, but if you're already in the ecosystem, what's the incentive to change?


There is also the fact that some people have a knee-jerk reaction against anything associated with systemd. Although I think that is less significant than the reasons you listed.


I think people in general are happy that systemd happened.


Yeah, that's my general understanding too: people want the most shrink-wrapped, minimal virtualization boundary possible.


> crun could go much lower than that, and require < 1M. The used 4MB is a hard limit set directly in Podman before calling the OCI runtime.

Enough room to hide a lisp implementation!


For those of us not into this space, does this replace docker or kubernetes or just some piece of the puzzle that could be bolted into both to replace some component?


Take a car: the engine is the OCI runtime (crun, containerd, etc.). The whole car is Docker, or Podman. Kubernetes is the city where the cars run.

---

EDIT: For more details, OCI (Open Container Initiative) defines a few things:

  - an image format (how to create them)
  - a runtime specification (how to run them)
  - ...
A container runtime is an implementation of one of those things.

Docker is a set of tools hidden under a unified CLI which gives you the ability to:

  - create images
  - upload/download images to/from a registry
  - run images with volumes, networking configuration, environment variables, ...
But Docker manages only a single host.

Kubernetes is an interface to abstract a cluster of hosts. Everything is described as a "resource" which goes through the "control loop":

  1. the user uses the k8s REST API to create/read/update/delete the resource
  2. the k8s API server contacts admission controllers (via webhooks) to authorize (and/or mutate) the action
  3. the action is persisted to a distributed database (usually, etcd)
  4. then, controllers are notified of the change and will run the side effects
This is the simplified version. But what is stored in the distributed database is called the "desired" state, and controllers have the duty of observing the real state (the "observed" state) and making it converge towards the "desired" state.

So a "Pod" controller's job will be to observe Docker instances, to check what containers are running, and start/stop the containers based on what "Pod" resources exists in k8s's database.

A "Deployment" controller's job will be to observe the "Pod" resources in the k8s database and create/update/delete them based on what "Deployment" resources exists in k8s's database.

etc...
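To make the loop concrete, here is a toy model in C (the types and functions are hypothetical, for illustration only; real controllers talk to the k8s API server and are notified of changes rather than polling like this):

  #include <stdio.h>

  struct state { int replicas; };

  static int running = 0;  /* simulated "real world" */

  static struct state desired_state(void)  { return (struct state){ .replicas = 3 }; }
  static struct state observed_state(void) { return (struct state){ .replicas = running }; }

  static void start_container(void) { running++; printf("start container (%d running)\n", running); }
  static void stop_container(void)  { running--; printf("stop container (%d running)\n", running); }

  int main(void) {
    for (;;) {
      struct state want = desired_state();
      struct state have = observed_state();
      if (have.replicas == want.replicas) break;  /* converged */
      if (have.replicas < want.replicas) start_container();
      else stop_container();
    }
    printf("observed state matches desired state\n");
    return 0;
  }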

In theory, Kubernetes does not need Docker. You could have a "proxmox" controller which would start/stop virtual machines instead.

Kubernetes provides a lot of tooling for storage management, secret management, networking, workload management, etc... so that you can manage it all with a unified REST API.

The very nature of the "control loop" makes it very extensible, allowing you to build layers of abstraction on top of layers of abstraction on top of layers of abstraction ... A real "onion cloud" if I dare say it.


> The whole car is Docker, or Podman. Kubernetes is the city where the cars run.

Kubernetes doesn't use Docker or Podman.


And there are more than 2 models of cars. The metaphor is still valid.


What does Kubernetes use instead?


Kubernetes can use anything that conforms to the CRI interface, which in practice is either CRI-O (Red Hat) or containerd (Docker, Inc.). Podman and Docker are also consumers of both of those engines.


Kubernetes needs an OCI runtime to run containers with. Crun is one implementation it can use.

Docker appears to be able to use crun for its engine as well. https://github.com/containers/crun/issues/37


It replaces `runc`, which is used by most non-Docker container runtimes to actually start the container. Thus the punny name.

When using kubernetes, the hierarchy is as follows:

  1. kubernetes master tells kubelet what to do (sort of, not important here)
  2. kubelet uses CRI-compatible runtime to start containers
  3. containerd or CRI-O handle management of containers and start them using runc or crun
  4. runc/crun are the applications that set up the final environment for the application running in the container, using resources (mounts, devices, etc.) provided to them by the upper layers. They also handle things like sending stdout/stderr to logs, or setting up a pseudoterminal to talk to a program in the container.
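To give a feel for step 4, here is a drastically simplified sketch of what an OCI runtime does. This is a hypothetical toy, not crun's actual code: real runtimes parse the OCI config.json and also set up cgroups, mounts, users, capabilities, seccomp, and use pivot_root rather than chroot. It needs root to run:

  #define _GNU_SOURCE
  #include <sched.h>
  #include <signal.h>
  #include <stdio.h>
  #include <sys/wait.h>
  #include <unistd.h>

  static char child_stack[1024 * 1024];

  static int child(void *arg) {
    const char *rootfs = arg;
    /* enter the container's root filesystem */
    if (chroot(rootfs) != 0 || chdir("/") != 0) { perror("chroot"); return 1; }
    /* exec the container's "entrypoint" */
    execl("/bin/sh", "sh", (char *)NULL);
    perror("execl");
    return 1;
  }

  int main(int argc, char **argv) {
    if (argc < 2) { fprintf(stderr, "usage: %s <rootfs>\n", argv[0]); return 1; }
    /* new PID, mount, UTS and IPC namespaces for the "container" */
    pid_t pid = clone(child, child_stack + sizeof(child_stack),
                      CLONE_NEWPID | CLONE_NEWNS | CLONE_NEWUTS | CLONE_NEWIPC | SIGCHLD,
                      argv[1]);
    if (pid < 0) { perror("clone"); return 1; }
    waitpid(pid, NULL, 0);
    return 0;
  }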


I mean, Docker is using runc by default as well.


Kubelet speaks CRI to containerd, which speaks OCI to runc (or crun).

Docker can be wedged in there between Kubernetes and containerd (which was originally part of Docker).

The OCI implementation is the lowest-level component in the “container stack”.


> A Lua binding is available.

Bliss. Fantastic. I'm going to write something with this. I remain delighted every single time somebody rewrites a thing in C.


I try to avoid setuid binaries written in memory-unsafe languages.

This feels like the wrong direction.


Looks like there is youki [1] for that.

[1] https://github.com/containers/youki


crun is not a setuid program


I can’t go into too much detail, but: IME the benefits are somewhat doubtful, and it can run into weird issues that are hard to debug (because C) at scale. I wouldn’t use it unless I had a maintainer or a C expert on my team. The other OCI runtimes are written in Go and are (generally) easier to debug.


Why is Go easier to debug? It kinda sounds like you're more familiar with one language than another and are basing your assessment of the tool on that.


Compare these two programs:

   char foo[123];
   int x;
   foo[124] = 10;
and:

   var foo [123]byte
   var x int
   foo[124] = 10
In C, this surprisingly changes the value of x. (Well, it's undefined behavior, so it could do anything!) In Go, the out-of-bounds index is caught: with a constant index like this, the compiler rejects the program outright, and an index computed at runtime would make it panic with an index-out-of-range error.

C is the absolute best programming language for programmers who don't make mistakes. I'm not one of them, and I've never met one. If it works for you, that's impressive!


You know sanitizers and static analysis tools exist, and have existed for decades, and have been the basis for the work done in Rust and other "safe" languages?

Also, a disciplined C programmer will always keep the size of the buffer near the buffer itself.

For example, using something like this is perfectly safe (though the extra if-statements carry a measurable cost if used in hot paths):

  #include <stdbool.h>
  #include <stdlib.h>

  struct buffer {
    size_t size;
    char *data;
  };

  void buffer_alloc(struct buffer *buf, size_t size) {
    if (buf != NULL) {
      buf->data = calloc(size, sizeof(char));
      if (buf->data != NULL) {
        buf->size = size;
      }
    }
  }

  void buffer_free(struct buffer *buf) {
    if (buf != NULL && buf->data != NULL) {
      free(buf->data);
      buf->data = NULL;  /* avoid a dangling pointer and double free */
      buf->size = 0;
    }
  }

  /* bounds-checked read: returns false instead of reading out of range
     (offset is unsigned, so only the upper bound needs checking) */
  bool buffer_read(struct buffer buf, size_t offset, char *dest) {
    if (dest != NULL && buf.data != NULL && offset < buf.size) {
      *dest = buf.data[offset];
      return true;
    }

    return false;
  }

  int main(void) {
    struct buffer buf = {0};
    buffer_alloc(&buf, 123);
    char c;
    buffer_read(buf, 124, &c);  /* out of bounds: safely returns false */
    buffer_free(&buf);
    return 0;
  }
It is possible to write safe C code, but it takes an understanding of the language, and sometimes of the compiler as well.

The whole "there is no programmer who doesn't make mistakes" argument is fallacious at best.

C is like a chainsaw: there are certain rules on how to use it safely. Yes, you can cut off limbs with a chainsaw; that does not make the chainsaw useless.


This comes up from time to time. Surely there's some caveat here, either for performance or other reasons (I'm not a solid enough C programmer to know whether this hypothesis is viable). If it were so simple, this approach would be ubiquitous and C would be safe. What am I missing?


> If it were so simple, this approach would be ubiquitous and C would be safe.

There are some unavoidable footguns (notably around unintended integer promotions and overflow), but for four decades life-critical machinery like rockets, munitions, airplanes, heavy industrial equipment, automotive control systems, and more has been controlled by C code, and the number of lives lost to the kind of bugs you are complaining about is statistical noise.

It's been used extensively in products that could never be patched or updated after release, only recalled, and yet I can recall only a few instances where bugs led to lives lost; in at least one of those cases the culprit was identified as something other than the language (i.e. the same errors or worse would have occurred in a different language, given the dev process and architecture).

These bugs are not even a rounding error! So it would seem that writing safe C is ubiquitous. You're seeing the statistical noise and concluding it is representative of all software written in C, when you should be looking at all that noise and saying "is this all there is?"


A chainsaw is not safe. But you can learn to use it safely.

Replace chainsaw with:

  - knives
  - guns
  - the C programming language
There are tools and methods (and a lot of discipline) to ensure safety in C:

  - compiler's sanitizers[0]:
    - address: to detect out-of-bounds and use-after-free bugs
    - pointer-compare, pointer-subtract: to detect erroneous comparison/subtraction of pointers to different memory objects
    - shadow-call-stack: to detect return address overwrites (stack buffer overflows)
    - thread: to detect data races
    - leak: to detect memory leaks
    - undefined: to detect undefined behaviors
    - ...
  - static analysis tools, like Splint[1]
Would you ride a motorbike without proper protection (helmet, heavy jacket, ...)?
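For instance, here is a variant of the overflow from upthread as a self-contained program (a minimal sketch; the compile line assumes gcc, but clang works the same):

  /* bug.c -- compiled normally this may appear to run fine while
     corrupting memory; compiled with
       gcc -g -fsanitize=address bug.c
     AddressSanitizer aborts with a heap-buffer-overflow report
     pointing at the faulty line. */
  #include <stdlib.h>

  int main(void) {
    char *buf = malloc(123);
    buf[124] = 10;  /* out-of-bounds write */
    free(buf);
    return 0;
  }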

[0] - https://gcc.gnu.org/onlinedocs/gcc/Instrumentation-Options.h...

[1] - https://github.com/splintchecker/splint


Unless the unsafe package is used (which is rare) or there's a bug in the compiler (rarer still), you will not get silent memory corruption bugs in Go. Those are stupidly common in C, and hard to debug to boot.


So your whole argument isn’t that crun itself is unstable, but that C can be hard for non-experts to write programs in. That could be said of tons of stuff, including the OS kernel you’re using to run containers, so I’d be curious to hear what you’re using to replace Linux in your work.

The principal issue with Go applications in my work continues to be the massive size of the executables.


I can't speak for crun specifically since I haven't used it personally, but almost every single C application I've had to deal with had these hard-to-debug memory corruption bugs, which required an expert (me) days/weeks to fix. And that included the Linux kernel itself, btw; the difference being that it was almost always fixed upstream by the time I found it.

> The principal issue with Go applications in my work continues to be the massive size of the executables.

That is certainly true in the general case; however, out of curiosity I went to look at the size of the containerd runtimes on my machine:

  -rwxr-xr-x 1 1001 121 46889792 Sep 22  2022 /usr/local/bin/containerd
  -rwxr-xr-x 1 1001 121 9699328 Sep 22  2022 /usr/local/bin/containerd-shim-runc-v2
Does not seem that bad... (note that only the shim is launched per container; the containerd process is a single systemd service)


Some systems I’ve worked in have entire root filesystems that are merely twice the size of that one binary. And in those cases sending an extra 40 megabytes over the network connection is a big imposition. So there are still places where executable size matters and that’s why we’d want crun and not runc. If someone wrote an alternative in Rust I’d be interested but golang is just too piggy.


Sure, but I’m having a hard time imagining a scenario where a full-blown OCI runtime (I assume you’ll run it with kubelet/Nomad) is required on such constrained hardware and a simple nspawn or systemd container won’t do.


Easier to track allocations and tie them to structures.


Technically, if you use jemalloc (which most everyone should do anyway), it comes with built-in instrumentation, but you need to enable it at compile time, and generally not many are aware of this.
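A minimal sketch of what that looks like (assuming jemalloc was built with --enable-prof, the process runs with MALLOC_CONF=prof:true, and you link with -ljemalloc):

  #include <stdlib.h>
  #include <jemalloc/jemalloc.h>

  int main(void) {
    void *p = malloc(1 << 20);  /* served by jemalloc when linked in */
    /* ask jemalloc to write a heap profile to disk (inspect with jeprof) */
    mallctl("prof.dump", NULL, NULL, NULL, 0);
    free(p);
    return 0;
  }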


I work on many C/C++ projects and none of them use jemalloc. I also use several databases written in C or C++, and none of them use jemalloc either.

I do however wish Mongo offered the option: https://jira.mongodb.org/browse/SERVER-39325


ClickHouse uses jemalloc as the only option: https://github.com/ClickHouse/ClickHouse/

We have contributed patches and bugfixes to jemalloc as well.


I'd expect nothing less for clickhouse :)


I think tcmalloc will output protos too now, which Google's pprof tool understands. If you're using the standard glibc malloc, you're probably leaving a lot of performance on the table.


Performance-wise it doesn't matter for most projects, like games, where allocations aren't done in hot paths anyway.

The problem with tcmalloc is fragmentation. Luckily, I can turn on aggressive decommit in some scenarios, and that helps.


Less undefined behavior I would assume.


>> hard to debug (because C) at scale

I think what the crun author is positing is that the container runtime is closer to the kernel (cgroups) than it is to the orchestrator (Kubernetes). I tend to agree.

Of course the kernel / cgroups / container runtime need to be rock solid. There should be no need to debug these in most use cases.


> I wouldn’t use it unless I had a maintainer or a C expert on my team

I suppose you keep a Linux kernel dev or C expert on your team when you run on Linux too?


There's also one in Rust: https://github.com/containers/youki


I'm no fan of Go, but "I used C instead" is not a selling point


Being the main author of crun, I can clarify that statement: I am not a fan of Go _for this particular use case_.

Using C instead of Go avoided a bunch of the workarounds that exist in runc to work around the Go runtime, e.g. https://github.com/opencontainers/runc/blob/main/libcontaine...


Indeed for all its design issues, it is still way better than using plain old C, unless there is no way around it.


I went looking for an answer to the obvious question, and there is indeed a Rust version. https://github.com/containers/youki#motivation has a nice comparison with both runc and crun.


I'd be very cautious about running such a core piece of infrastructure written in a memory-unsafe language. Especially one without the level of scrutiny that other projects like Linux have.



