Rust and Nix = easier Unix systems programming (kamalmarhubi.com)
255 points by kalmar on April 14, 2016 | 81 comments



I always find it a bit unfair when I see sloppy C programs used for shock value. What if the Rust developer uses fork().unwrap_or(default_value) in a hurry, or writes

    if let Some(child) = fork() {
        do_only_child_stuff(); 
    } else {
        do_only_parent_stuff();
    }
or

    if let Some(ForkResult::Child) = fork() {
        do_only_child_stuff();
    } else {
        do_only_parent_stuff();
    }
Now, if you're about to tell me that the examples above are totally stupid and no developer would do such a thing, then you know how I feel about the sloppy C versions. Doing a system call and not checking for error is totally stupid as well.

By the way, you can also write your own wrapper functions in C that transform the return value into something like

    struct fork_status {
        enum { ERROR, PARENT, CHILD } state;
        int ret;
    };
Then Clang and GCC will warn you about missing switch cases.

That said, the libc bindings in Rust are pretty low-level and a project that offers higher-level wrappers can be very helpful, so I hope my comment doesn't create the impression that I'm ripping on the project itself.


> What if the Rust developer uses fork().unwrap_or(default_value) in a hurry

The point here is that the language's tools and APIs can do a significantly better job of driving the developer towards the safe/right solution; that's a large part of the point of type theory and static type systems, after all. In this case Rust's type system is used to split out the various "result cases" and notify the developer upfront of the cases they have to handle. The return type pretty much tells you how the function will behave and what you need to take care of as the caller.
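
For reference, a self-contained sketch of that shape (the names mirror the nix crate's fork/ForkResult, but treat this as an illustration rather than the library's exact definitions):

    // The pid only exists inside the Parent variant, and the whole thing
    // is wrapped in a Result, so the caller is told about every case.
    enum ForkResult {
        Parent { child: i32 }, // the child's pid lives here and only here
        Child,
    }

    // Stand-in for the real syscall wrapper; it just pretends to be the
    // parent so the example runs anywhere.
    fn fork() -> Result<ForkResult, String> {
        Ok(ForkResult::Parent { child: 12345 })
    }

    fn main() {
        // Leave out any of these arms and the compiler rejects the match.
        match fork() {
            Ok(ForkResult::Parent { child }) => println!("parent of {}", child),
            Ok(ForkResult::Child) => println!("in the child"),
            Err(e) => panic!("fork failed: {}", e),
        }
    }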

That aside, why would you unwrap_or(default_value) if you're in a hurry when unwrap() is shorter (and you can later grep for "unwrap()" to find dodgy/hurried code, whereas unwrap_or is a perfectly legitimate recovery strategy)?

> Now, if you're about to tell me that the examples above are totally stupid and no developer would do such a thing, then you know how I feel about the sloppy C versions. Doing a system call and not checking for error is totally stupid as well.

The issue is that even though you have a compiled, statically typed language, it's of absolutely no help in "checking for error". And interactions between syscalls can be hard to predict: not checking for fork(2)'s error isn't the end of the world... until you pass its result to kill(2), for instance (it might also give strange results if you pass specific pids to waitpid).


This has been proven to be wrong over and over again with the many variations of languages that came after C. Newer "safer" languages don't reduce bugs.


On the contrary, empirical studies done in the 90's by the likes of MITRE showed people using Ada and C++ were more productive than C developers while producing far fewer defects. That the languages and libraries were designed specifically to counter hard-to-track issues was one of the reasons why. That languages could improve safety was known as far back as MULTICS with PL/I, where its length-prefixed strings and reverse stack growth prevented two of the most common crashes/hacks in UNIX/C land.

The evidence is on our side that a well-designed language significantly reduces the number of defects in production code if other variables are equal.


Where's this "proof" you speak of?

By contrast, strong static typing does in fact prove the absence of certain errors.


Where are all the use-after-free bugs leading to remote code execution (for example) in projects written in languages other than C or C++?


That is blatantly false. Type-safety can prove (as in a mathematical proof) that whole classes of bugs are impossible.


Author here. This is a really valid criticism of the post, and I think it'll inform the next post I write on the topic. Thanks!


The difference is that in C the bad way is often the most concise/easiest way. So e.g. snippet examples tend to not check for errors, and the language guides you towards not checking. http://www.haskellforall.com/2016/04/worst-practices-should-... . IIRC C simply doesn't have a nice, standardized way to do "compose several possibly-error calls, resulting in either a single error or the successful result".
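
For contrast, here's roughly what that composition looks like in Rust (a self-contained sketch with made-up file names, using `?` where 2016-era code would have written try!(...)): each fallible call either produces a value or makes the whole function return its error.

    use std::fs::File;
    use std::io::{self, Read, Write};

    // Any failing call makes the whole function return that error;
    // the success path reads straight through.
    fn copy_config(src: &str, dst: &str) -> io::Result<usize> {
        let mut data = String::new();
        File::open(src)?.read_to_string(&mut data)?;
        File::create(dst)?.write_all(data.as_bytes())?;
        Ok(data.len())
    }

    fn main() {
        match copy_config("in.conf", "out.conf") {
            Ok(n) => println!("copied {} bytes", n),
            Err(e) => println!("copy failed: {}", e),
        }
    }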


The point is that nix-rs doesn't let you kill all the processes by accident; you have to be trying to do that in Rust. You also have to explicitly ignore the error. Whereas in C it's pretty easy to forget to check for the error and subsequently crash everything.

Sure, you can have the same higher-level API in C. Rust won't compile if the match isn't exhaustive, though, and you cannot access the `int ret` if it's in the wrong state, but it's close enough.
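
To make that concrete, a small sketch of the Rust analogue of that struct (hand-rolled names, nothing from nix): the payload lives inside the variant, so there is no `ret` field you can read in the wrong state, and dropping an arm is a compile error rather than a warning.

    enum ForkStatus {
        Error(i32),            // errno, only meaningful on failure
        Parent { child: i32 }, // the pid, only reachable in this arm
        Child,
    }

    fn describe(s: ForkStatus) -> String {
        // Delete any arm and rustc reports a non-exhaustive match.
        match s {
            ForkStatus::Error(errno) => format!("fork failed, errno {}", errno),
            ForkStatus::Parent { child } => format!("parent of pid {}", child),
            ForkStatus::Child => String::from("in the child"),
        }
    }

    fn main() {
        let cases = vec![
            ForkStatus::Error(3),
            ForkStatus::Parent { child: 4242 },
            ForkStatus::Child,
        ];
        for s in cases {
            println!("{}", describe(s));
        }
    }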


  struct fork_status {
      enum { ERROR, PARENT, CHILD } state;
      int ret;
  };
I think the point of tagged unions (as in OP's example) is that you cannot access `child` if the call failed. Your example does have an exhaustive check (by a compiler warning), but it doesn't prevent you from misinterpreting `ret` as a pid. Does C have a method to enforce this restriction?


Well pids should be pid_t not int. System calls should really return two values, return value and errno, but C doesn't handle that very well either.


> Well pids should be pid_t not int.

pid_t isn't a newtype, it's just a typedef, so that doesn't really make a difference.

> System calls should really return two values, return value and errno, but C doesn't handle that very well either.

Most system calls want to return either, not both. And C doesn't handle that at all.


Appropriate tangent: Remember that part where null-terminated strings save at most three bytes per string? And if the allocator is long-word aligned they save no bytes at all. The first design to arrive sets a disproportionate amount of the direction.


Not sure how you get that. Let's assume that memory is allocated in sizes that are multiples of 4, and length-prefixed strings have a 4 byte length prefix.

  str len:    c-str bytes:    size-prefix bytes:  difference:
  1           1+1 (4)         4+1 (8)             4
  2           2+1 (4)         4+2 (8)             4
  3           3+1 (4)         4+3 (8)             4
  4           4+1 (8)         4+4 (8)             0
  5           5+1 (8)         4+5 (12)            4
  6           6+1 (8)         4+6 (12)            4
  7           7+1 (8)         4+7 (12)            4
  8           8+1 (12)        4+8 (12)            0
So, for example: "ABCDE" as a C-style string would require 5 bytes for the string plus one for the null terminator, which would be satisfied by an allocation of 8 bytes. An equivalent length-prefixed string would require 4 bytes for the prefix plus 5 for the string, which would then be rounded up to 12.

The only time the size-prefixed variant doesn't require more memory is when the string length is a multiple of 4. So the length-prefixed version requires an additional 3 bytes on average.
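
If you want to play with the arithmetic, a throwaway sketch under the same assumptions (4-byte allocation granularity, 4-byte length prefix):

    // Round an allocation request up to the allocator's granularity.
    fn round_up(n: usize, align: usize) -> usize {
        (n + align - 1) / align * align
    }

    fn main() {
        for len in 1..13 {
            let c_str = round_up(len + 1, 4);    // bytes + NUL terminator
            let prefixed = round_up(4 + len, 4); // 4-byte length + bytes
            println!("len {:2}: c-str {:2}, prefixed {:2}, diff {}",
                     len, c_str, prefixed, prefixed - c_str);
        }
    }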


Meanwhile this extra memory usage _might_ be relevant on an 8k microcontroller, but even that is questionable given such an application is likely not storing many strings. And then there's the time (and battery! and sanity!) saved by not performing strlen-type operations.

Worth every byte.


Thanks for the clarification, but I believe neither has said anything in error (replace "at most" with "on average"). I wonder what the breakdown of string lengths in, say, clang or nginx is?


I don't think this is unfair. In order to fail as badly as the sloppy C program, the Rust developer would have to explicitly pass in -1 as the default_value, like fork().unwrap_or(-1), right? The developer might do that anyway, but that's certainly an improvement over -1 happening implicitly; they'd have to think about it, and alarm bells would go off as soon as they re-read the kill() manpage, rather than only upon re-reading both the kill() and fork() manpages.

Similarly, neither of your "if let Some() = fork()" examples would fail as badly as the sloppy C program, right?

All the examples you could come up with of sloppy Rust code fail less poorly or make it harder to fail as poorly as the sloppy C code, and you were deliberately trying to make your Rust code sloppy, whereas the sloppy C code was based on examples from the wild: http://rachelbythebay.com/w/2014/08/19/fork/

Still seems like a good argument that Rust's design is an improvement.


I think the C code presented and the Rust code presented are approximately equal in their levels of quick and dirty. Using "expect" is the quick and dirty way of ignoring the error case in Rust, and doing nothing is the quick and dirty way of ignoring the error in C. Perhaps "unwrap" is more quick and dirty, but it still fails much faster and with less damage than the quick and dirty C code.
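
For concreteness, a minimal sketch of what "fails faster" means (a stand-in function instead of the real fork, so it runs anywhere): the panic happens at the call site, instead of a -1 quietly flowing on to kill().

    // Stand-in for a fallible syscall wrapper that happens to fail.
    fn fork_stub() -> Result<i32, &'static str> {
        Err("EAGAIN: resource temporarily unavailable")
    }

    fn main() {
        // Quick and dirty: this panics right here with the message below,
        // rather than handing a bogus pid to the rest of the program.
        let pid = fork_stub().expect("fork failed");
        println!("never reached, pid = {}", pid);
    }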


It's trivial to write an incorrect program in C because it's a language that expects you, the programmer, to know what you're doing. I assume that if you're going to use the `fork()` and `kill()` syscalls on your OS that you're going to read about them and consider their consequences... not write the first trivial example that comes to mind and ship it.

An example program written like that is not endemic to C or a logical consequence of its design. It's ignorance. Such examples frustrate me too. It's practically a straw-man argument.

> By the way, you can also write your own wrapper functions in C

And your own abstractions to provide more safety guarantees... it really depends on your judgement to balance risk, reward, and goals. It's possible to write a system that does all of the checks for you but you'll sacrifice the time and energy to do so (and will still find users blowing off their feet regardless of your best efforts).

Rust is doing very cool things but it's not going to make systems programming magically easier. Someone still had to write nix and you should still understand your programs.


I often see these precise points in code review arguments, by developers doing something 'risky' in their code: "It's only risky if you don't know what you're doing."

Half the time the same developer has made a mistake in the same code.

With software being so complex, so full of human error -- almost any tool and practice that can help remedy this situation is welcome in my books.


Programming is hard because thinking is hard.

I think it's entirely reasonable to write security-focused software in a highly restricted language with concrete formal semantics.

The problem with trivial examples is that they're often constructed to prove a point and don't represent what a real developer would write if they were to seriously consider the problem in C.


Tell that to the silly mistakes I've made a career out of fixing (many of them mine!)


I was very confused. I thought this had something to do with Rust and nix[1].

[1]: https://nixos.org/nix/


Indeed, using "nix" as a (Rust) library name is a very bad choice, given the increasing popularity of the Nix package manager.


I think we're now at the point where it is very difficult for any name not to clash with the name of something else.

So I no longer worry about it.


It's still easy enough, there are many hundreds of geographical locations in many tens of languages alone. Just make some effort.


Me too!

It's actually about a library, nix, whose mission is "to provide ‘Rust friendly bindings to *nix APIs’".

https://github.com/nix-rust/nix


Same here. To be fair, the Nix package manager has a name that is actively harming its growth, since most people think of *nix, as in Unix, when you say Nix.


Google does too to an extent, making troubleshooting more of a pain than it should be.


Same here.

I thought "A match made in heaven"

But it will probably never happen, because Cargo is too good, haha


What do you mean, it will never happen?

The nix package manager has (admittedly, undocumented) support for Rust / Cargo projects.


There is also another OS called NIX, based on Plan 9 [1]

[1] http://lsub.org/ls/nix.html


Very excited, and then very confused.

Because nix and Rust are both great.


Neat library, way too already-overloaded name.


Yeah as a Rust and NixOS fan, I was let down.


I maintain LuaJIT syscall bindings https://github.com/justincormack/ljsyscall - they cover quite a lot, namespaces, netlink and so on. I spent quite a bit of time making them more intuitive than the raw bindings, with consistent error handling, also namespacing constants and so on. It is definitely useful to have these types of interfaces not in C.


This project looks really cool! I'm very curious to find out more about how you make sure constants are correct across platforms and architectures. I will be poking around!


There are a bunch of tests but the whole Linux ABI spec is a mess.


Minor nitpick, but don't you typically do something like this in C?

  pid_t childPid;

  switch (childPid = fork()) {
  case -1: ... /* error handling */;
  case 0:  ... /* child specific */
  default: sleep(5);
  }

edit - seems to mangle formatting but something like that seems fairly clean.


You're missing the point in actually a really important way.

Nobody is claiming that C makes it impossible to cleanly do the right thing—obviously the whole world runs on C.

The point is that nothing about the C language, libraries, or toolchain discourages the example given in the blogpost compared to your more correct code. Unless you remember exactly the right details from the manpages, there's nothing about the example in the blogpost that's less natural to write than your more correct code. (And people do forget those details: http://rachelbythebay.com/w/2014/08/19/fork/ )

By contrast, as illustrated in the blogpost, the most natural way to do the same thing in Rust turns out to be the more correct thing. If you wanted the bad behavior to happen, you'd have to go out of your way to pass -1 to kill(). Hence in this example, Rust's design is an improvement.

It's great that C gives you enough rope to hang yourself with, but it's even better if tying yourself to things safely is easy, and to hang yourself you have to really go out of your way.


Typically, I would hope so. A good code review process should typically catch when it isn't done like this.

In context, "typically" means your devops people get to work on Christmas to patch a critical CVE being actively exploited in the wild, pissing off all your customers, that one time someone didn't do the typical pattern anywhere in your codebase or the codebases of any of your 3rd party libraries, frameworks, applications...

Where "didn't do the typical pattern" might be if conditions as shown, or forgetting the "case -1", or missed a key and typed "case 1", or elided the "break;" from the case above (I note no "break;"s in your switch ;)), or didn't rtfm closely enough to see -1 was a special exit code, or mis-assumed kill(-1,...) was a noop, or ...


It's not that you can't write the code properly in C, it's that the language and function interfaces give you no handrails to protect you from a mistake. And in this case, doing the wrong thing could be disastrous to your system, especially if you had to run it as root.

Great intro to Rust and Unix post!


yes, but quite a lot of tutorials i've seen / intro to systems courses seem to think it looks "cleaner" to use an if statement; switch is definitely the move here but i've seen the if version quite a lot


I can't help but think they're trying to fix something that isn't broken at all.

Adding new abstraction layers rarely helps when doing systems programming. You (as in "the developer") want to be as near to the machine as possible. C does this pretty well.

Perhaps I'm just getting old :-(


In this case it seems like a very thin wrapper that leverages the type system to allow catching a whole class of errors at compile time, like using exhaustiveness checks to make sure a function call handles all possible return values. I think the small overhead is well worth it.

The original API is not "broken" per se, it's just limited by the language features ("magical" return values vs. tagged unions or whatever they're called in Rust, I don't remember.)


It's not even clear to me that there's any overhead to the Rust version. Checking error codes that should be checked isn't overhead. Checking them inefficiently would be overhead, but the Rust version looks like it should compile down to something pretty similar to what the equivalent C switch blocks would produce.


> It's not even clear to me that there's any overhead to the Rust version.

There is a slight bit of stack overhead: Option<ForkResult> is at least {tag:u8, {tag:u8, pid:i32}}, and due to alignment constraints it's actually {tag: u32, {tag: u32, pid: i32}}. A nonzero wrapper[0] would allow folding either ForkResult or Option into a 0-valued pid_t and remove one level of tagging: http://is.gd/yxStW1

Beyond that you'd need generalised enum folding in order to fold two tags into the underlying value (you'd denote that pid_t is nonzero and nonnegative for instance)

[0] which is unstable, so not really an option
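
If anyone wants to check what their compiler actually does here, a quick sketch (local enum shaped like ForkResult; the exact numbers depend on your rustc version and its layout optimisations):

    use std::mem::size_of;

    // Local stand-in shaped like nix's ForkResult.
    #[allow(dead_code)]
    enum ForkResult {
        Parent { child: i32 },
        Child,
    }

    fn main() {
        println!("i32:                {} bytes", size_of::<i32>());
        println!("ForkResult:         {} bytes", size_of::<ForkResult>());
        println!("Option<ForkResult>: {} bytes", size_of::<Option<ForkResult>>());
    }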


We do have a planned optimization that would fold the tags for cases like `Option<ForkResult>` to give a word pair, which should be returned in %eax:%edx (or %rax:%rdx).


Really? That's exciting. Missed enum layout optimizations are one of my few issues with Rust right now.


But if the wrappers get inlined (which they should be) then SROA (scalar replacement of aggregates) kicks in and promotes the tags to SSA values, where other optimizations such as SCCP (sparse conditional constant propagation) can eliminate them. Optimizing compilers are awesome :)


Theoretically it should be possible to have a union discriminated by the value of the int itself ({-1, 0, positive}), which would use only one 32-bit integer.
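
Something along those lines can be sketched by keeping the raw C encoding and only materialising the enum view on inspection (illustrative names, nothing from nix):

    use std::mem::size_of;

    // The stored representation stays one i32 with the C encoding:
    // -1 = error, 0 = child, anything else is treated as the child's pid.
    struct RawFork(i32);

    enum ForkView {
        Error,
        Child,
        Parent(i32),
    }

    impl RawFork {
        fn view(&self) -> ForkView {
            match self.0 {
                -1 => ForkView::Error,
                0 => ForkView::Child,
                pid => ForkView::Parent(pid),
            }
        }
    }

    fn main() {
        println!("RawFork is {} bytes", size_of::<RawFork>());
        match RawFork(123).view() {
            ForkView::Parent(pid) => println!("parent of {}", pid),
            ForkView::Child => println!("child"),
            ForkView::Error => println!("error"),
        }
    }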


I'm not talking about the efficiency of the resulting binary, but the "distance" from what the programmer is thinking to what the machine will really do.

Compiler optimizations aside, C does a pretty good job at this. It's way more efficient than writing assembly, but you're still basically just moving memory around while doing some arithmetic. Easy to understand in "machine" terms.

Of course, this is only relevant when you're doing low-level stuff, like kernel or drivers programming. For the userland, Rust really looks like a nice language (I've played with it just a bit), and I'd be really happy if it pushes C++ away ;-)


Compiler optimizations included, C does a terrible job at this. It puts forward a seductive but terrible mirage of simple mappings and understandings which are just plain broken. And then you add multithreading into the mix and it gets even worse, even without optimizations.

We live in a world of many cores, and multiple CPUs all over the place - in your GPUs, your hard drives, motherboard controllers - and the intrinsic language support for multithreading literally does not exist as part of the C99 standard? One has to reach out to a mixture of POSIX, and the compiler extensions the POSIX implementation uses to annotate memory barriers so the optimizer won't break things, and intrinsics that introduce atomic operations, and... gah!

C and C++ do such a terrible job of this I have to resort to disassembly to debug program behavior far too frequently. These are the only languages I'm forced to do this with. If C or C++ were really "close to what the machine will really do", I'd expect the opposite result.

Even simple things like class and structure layouts and type sizes are controlled by a mess of compiler and architecture specific rules and extensions to control the application of those rules with regards to padding, alignment, etc. which I get to debug. Ever had to debug differences in class layout between MSVC and Clang due to handling EBCO (the empty base class optimization) differently in a multiple inheritance environment? What about handling alignment of 8-byte types on 32-bit architectures differently? At least you've replaced all uses of "long" because of the mixture of LP64 and LLP64 compilers out there...? And what about when two incompatible versions of the standard library with different type layouts get linked in by a coworker? These are the symptoms of a language that doesn't control what the machine is really doing very well at all.

When I really need tight control over what the machine will do at a low level, my tools are actual (dis)assembly, intrinsics, an understanding of the underlying hardware itself, and simple code that eschews features requiring significant runtime support or underpinnings. None of those are C or C++ specific. The last one requires some knowledge of how a language's features are implemented - C and C++ might be broken enough that you're forced to wrestle with that topic, when it's more optional in other languages, but... that still doesn't make it C or C++ specific.

</rant>


Rust is just as close to the hardware as C, it just checks your code more.


Then you are missing the whole point of Rust. The point of Rust IS to allow you to be close to the machine while maintaining a much higher level of safety. Rust is designed to do this with little overhead.

In this day and age, with big software packages and security being an increasing concern, it really is high time programming languages did more to help us avoid bugs which expose us to hackers and crackers.

I do have an affinity for C, but as an Objective-C programmer currently coding in Swift, I am really seeing how many more bugs the compiler helps me uncover.

I think Rust is on the right track. It is a long overdue change to systems programming.


The proposition that Rust is offering is not new. In the 90s Modula-2 was touted as "a better, safer way" of doing system programming than C. It failed to get traction outside of education because it failed to offer a compelling reason for people to migrate. Those that do not study history are doomed to repeat its mistakes.

In the example given it's possible to write a similar library in C to protect against unwanted side effects or bad API design. I'm sure several have been written over the years.

Rust is a great language with lots of improvements over other system programming languages, but that is not going to be enough to get people to switch. You have to show that it's good enough to be worth throwing away 40 odd years of experience and well understood best practice. Something that is going to take a long time and big public projects to do. If just being better was good enough Plan 9 would have been a roaring success and Linux (if it happened) would probably be a footnote in history.

C and UNIX have survived as long as they have not because better alternatives haven't come along, but because the alternatives haven't offered a compelling reason to switch. Unfortunately, at least for now, Rust is falling into the same category.

See also: Niccolo Machiavelli, The Prince


Safe systems languages already existed before C was a thing.

Modula-2 is just one example.

The Burroughs B5000 was programmed in a safe systems programming language in 1961.

https://en.wikipedia.org/wiki/Executive_Systems_Problem_Orie...

https://en.wikipedia.org/wiki/NEWP

"NEWP is a block-structured language very similar to Extended ALGOL. It includes several features borrowed from other programming languages which help in proper software engineering. These include modules (and later, super-modules) which group together functions and their data, with defined import and export interfaces. This allows for data encapsulation and module integrity. Since NEWP is designed for use as an operating system language, it permits the use of several unsafe constructs. Each block of code can have specific unsafe elements permitted. Unsafe elements are those only permitted within the operating system. These include access to the tag of each word, access to arbitrary memory elements, low-level machine interfaces, etc. If a program does not make use of any unsafe elements, it can be compiled and executed by anyone. If any unsafe elements are used, the compiler marks the code as non-executable. It can still be executed if blessed by a security administrator."

Sounds similar to modern practices? Done before C and UNIX were a thing.

C and UNIX have survived this long because they go together as one: just as JavaScript is the king of the browser, C was the only way to go when coding on UNIX systems.


> Unfortunately at least now Rust is falling into the same category.

Rust offers one major thing that Modula-2 never did: eliminating memory management problems (also concurrency problems) with zero overhead. In the '80s and '90s it was not known just how dangerous memory management problems could be (use-after-free was thought to be a harmless annoyance). Not now in 2016, with every single browser engine falling to remote code execution via UAF in Pwn2Own.


Modula-2 was from an era before the internet was ubiquitous and everyone had a computer in their pocket. Comparing the lack of uptake of a "safer language" from a time when the internet and the attack surface were so much smaller to now seems disingenuous. C and UNIX go hand in hand; nobody is disputing their worth or tenacity. I fail to see how the proposition not being new detracts from Rust.


Different language, but Modula-3 is actively maintained again. https://github.com/modula3/cm3


Modula-3 descends from Modula-2, although not directly.

Some of the Xerox PARC Mesa/Cedar researchers went to work for DEC (later Compaq) and created Modula-2+ with feedback from Niklaus Wirth, who had himself used Mesa as inspiration for Modula and Modula-2.

Eventually Modula-2+ evolved into Modula-3.

Nowadays I would say part of its ideas live on in C#.


I don't think you want to be as near to the machine as possible, otherwise all system programmers would write machine code. You want to have powerful abstractions that the compiler can see through to produce optimal code.


I tend to agree with you, with one caveat. We did a C-like scripting language and added one thing that C was missing: a variable value that is "undefined", which is not the same as zero. Really simple, but now you can do stuff like

    pid_t p;

    if (defined(p = fork())) {
        // parent / child stuff here
    } else {
        // fork error here
    }

It's pretty much the same as try / catch, we just implemented it as part of the variable. And any scalar or complex type can be undefined.

I suspect if C had this a lot of these code samples would be a little more clear. Maybe? Dunno, it's worked well for us. And we like C a lot.


This gets me thinking how awesome it would be to have functional programming on *nix systems, like Haskell (specifically). At least then it might be forcibly designed to be made more useful and ultimately get more people on board. One can dream.


Kinda like turtle! https://hackage.haskell.org/package/turtle

Oh by the way, I accidentally hit downvote on your post and HN doesn't let me undo that action... I was just trying to hide it! Sorry!


Hey, thanks! Hadn't heard of turtle before, I don't believe. Time to install it on my crouton chroot and see what it can do. :)


In reliability theory "X failed" is a poor error message. What we want to know is which failure mode has been triggered.

The function of kill is to kill a given pid, so there are two failure modes: "the pid didn't exist" or "the pid didn't die".


"kill failed" isn't actually the message which is printed. What you get is

    thread '<main>' panicked at 'kill failed: Sys(ESRCH)', ../src/libcore/result.rs:746
So you know that it failed because of ESRCH (no such process).


Ah, that makes much more sense, thanks


> The function of kill is to kill a given pid

1. there are more failure modes than these (POSIX kill(2) can set 3 different errnos)

2. kill(2) signals processes, it doesn't usually kill them (let alone kill them outright in such a way that this information could ever be returned)

3. kill(2) can signal process sets of cardinality > 1

4. the `unwrap` panic message will print the error value, which includes the errno


It's poorly named, because the function of kill isn't to kill a given PID; it's to send a signal to a given PID.


kill(2), despite its name, sends any signal to a pid, not only SIGKILL. Other signals are used for various purposes. SIGINT, for example, is what gets sent when you press Ctrl-C in a terminal.




I got exposed to reliability theory in Logistics Engineering.

Mostly: Reliability Centered Maintenance

https://en.wikipedia.org/wiki/Reliability-centered_maintenan...


The term Nix is becoming a bit overloaded - we've got the Nix package manager which runs on NixOS, Nix the Rust library, all of which can run on most *nix systems.


I feel there are more promising options for a name than "Nix".


type systems are great.


We have switched to Nix as the internal dependency manager for our C++ project. It is really exciting! No more "after commit XXX you need to (re)build/update YYY with ZZZ". Developers just type `nix-shell` and get a sane, guaranteed-to-work environment on their local machines corresponding to git HEAD. If we need to add or patch a dependency, we just edit and commit the nix file. And if a developer needs to roll back to an old commit/branch, they get the old/custom environment from the cache without submodule rebuilds.


That's pretty cool... but has nothing to do with the different project named "nix" discussed in the post.


"the return value is conveying three different things all at once. [...] That’s a lot of information for one poor little pid_t—usually a 32-bit integer—to convey!"

Someone never had to bit-pack their programs to save memory, disk space, or bandwidth. In fact, it's a huge waste of memory; if you only need 3 bits, a 'char' would have sufficed. Saves 24 bits!

Of course, we could use nibbles to make data structures where the fork return value only takes up 3 bits instead of a whole byte, but that could be considered micro-optimizing. (the compiler may do this for us anyway, though)


The return value is going to stay in a register the whole time anyway, so a char won't save you anything.

But regardless, the point of that sentence is nothing to do with memory usage, but with semantics. Whether you or the compiler packs all the information into 3 bits or 3 words, that's fine, as long as the language helps you distinguish the parts.



