"But the biggest potential is in ability to fearlessly parallelize majority of Rust code, even when the equivalent C code would be too risky to parallelize. In this aspect Rust is a much more mature language than C."
Yes. Today, I integrated two parts of a 3D graphics program. One refreshes the screen and lets you move the viewpoint around. The other loads new objects into the scene. Until today, all the objects were loaded, then the graphics window went live. Today, I made those operations run in parallel, so the window comes up with just the sky and ground, and over the next few seconds, the scene loads, visibly, without reducing the frame rate.
This took about 10 lines of code changes in Rust. It worked the first time it compiled.
Since this got so many upvotes, I'll say a bit more. I'm writing a viewer for a virtual world. Think of this as a general-purpose MMO game client. It has no built-in game assets. Those are downloaded as needed. It's a big world, so as you move through the world, more assets are constantly being downloaded and faraway objects are being removed. The existing viewers are mostly single-threaded, in C++, and they run out of CPU time.
I'm using Rend3, which is a 3D graphics library for Rust that uses Vulkan underneath. Rend3 takes care of memory allocation in the GPU, which Vulkan leaves to the caller, and it handles all the GPU communication. The Rend3 user has to create all the vertex buffers, normal buffers, texture maps, etc., and send them to Rend3 to be sent to the GPU. It's a light, safe abstraction over Vulkan.
This is where Rust's move semantics (ownership transfer) help. The thread that's creating objects to be displayed makes up the big vertex buffers, etc., and then asks Rend3 to turn them into a "mesh object", "texture object", or "material object". That involves some locking in Rend3, mostly around GPU memory allocation. Then the loader puts them together into an "object" and tells Rend3 to add it to the display list. This puts it on a work queue. At the beginning of the next frame, the render loop reads the work queue, adds and deletes items from the display list, and resumes drawing the scene.
Locking is brief, just the microseconds needed for adding things to lists. The big objects are handed off across threads, not recopied. Adding objects does not slow down the frame rate. That's the trouble with the existing system: redraw and new-object processing were done in the same thread, and incoming updates stole time from the redraw cycle.
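To make the "handed off, not recopied" point concrete, here's a minimal sketch with a made-up Mesh type (nothing Rend3-specific): sending a value through a channel moves it, so only a small handle crosses threads, and the loader loses access at compile time.

    use std::sync::mpsc;
    use std::thread;

    // Hypothetical stand-in for a big engine asset.
    struct Mesh {
        vertices: Vec<[f32; 3]>,
    }

    fn main() {
        let (tx, rx) = mpsc::channel::<Mesh>();

        let loader = thread::spawn(move || {
            let mesh = Mesh { vertices: vec![[0.0; 3]; 1_000_000] };
            // `send` moves the Mesh: the megabytes of vertex data stay where
            // they are, and the loader can no longer touch them (using `mesh`
            // after this line would be a compile error).
            tx.send(mesh).unwrap();
        });

        let mesh = rx.recv().unwrap(); // the render side now owns the buffers
        println!("received {} vertices", mesh.vertices.len());
        loader.join().unwrap();
    }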
If this were in C++, I'd be spending half my time in the debugger. In Rust, I haven't needed a debugger. My own code is 100% safe Rust.
Wonderful! Thanks for sharing. This sounds like the exact sort of work that Rust is perfect for.
I'm making a game in Rust and Godot (engine) and since it's a factory game the simulation performance is important. Rust means I worry far less about stability and performance.
I bet if you wrote a good blog entry with screenshots and an explanation of how your code loads and renders, it would do well on HN.
Too soon. Someday perhaps a Game Developers Conference paper/talk. I was considering one, but live GDC has been cancelled for 2021. My real interest in this is how do we build a big, seamless metaverse that goes fast. I'm far enough along to see that it's possible, but not far enough along that people can use the client.
Rust is good for this sort of thing. It's overkill for most web back end stuff. That's where Go is more useful. Go has all those well-used libraries for web back end tasks. Parallelism in web back ends tends to be about waiting for network events, not juggling heavy compute loads of coordinated disparate tasks. Hence all the interest in "async" for web servers. As I've said before, use the right tool for the job.
You have an authoritative world simulation server, as usual.
You then have several servers whose chief job is to keep clients in sync with the authoritative server.
Most network games combine these two roles, but there is a lot of processing and network traffic required to keep clients in sync. For massive multiplayer there is a benefit to scaling the "client-interaction" servers.
My question is probably off because I lack the knowledge, but how do commercial games/game engines do this, if it's such rocket science? Something like Fortnite or an aged GTA has been doing what you've described (downloading assets on demand without any fps drop) for quite some time now.
The claim isn't that it's impossible, or "rocket science"; it's that it's hard to do right, and Rust made it much easier. You're bringing up for comparison a game engine that has been in constant development by experts for over two decades (Unreal Engine) and a game engine in constant development for over a decade (RAGE). Just because someone makes a professional product using tens or hundreds of millions of dollars doesn't mean it was easy.
There's a reason why Epic is able to charge a percentage of sales for their engine, and companies still opt for it. That's because it's hard to do reliably and with good performance and visuals.
Yeah, but just picking one of many requirements in game dev and advocating why language X can do this better than Y ignores all the other checkboxes. Yeah, C++ is nerve-wracking, but Rust can be even more so. IIRC there was a thread about Zig vs Rust and why Rust is just the wrong tool (for the OP's use case in that case). IDK, but there is a reason why C++ dominates game dev, and a reason why Rust still struggles with mainstream adoption compared to languages of the same age like Go or TS.
The claim was that Rust made making something parallel easy/easier, and the illustrative example given was someone trying to parallelize something in a game engine. Whether Rust is good for game development or not is irrelevant to the point that was being made.
Even if everyone accepted that Rust was horrible for game development on most metrics, the example given would still carry the exact same weight, because it's really about "parallelizing this would be very hard in C++, but it was very easy because of Rust."
I think you mean that Arc is an atomic reference counter (it uses atomic CPU instructions to prevent race conditions when incrementing and decrementing the ref count)
Eh, with Arc you can share ownership easily, and there are probably a lot of cleverer concurrent data structures or entity component kinda things that'd just work too. But maybe you can arrange things so that one thread owns the scene but the other thread can still do useful work?
This typically isn't possible because the rendering context is global and is needed for both loading and rendering. You need an Arc to guarantee the correct Drop mechanism.
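As a toy illustration of the shared-ownership and Drop point (the Context type is made up, nothing engine-specific): clones of an Arc bump the refcount atomically, and cleanup runs exactly once, when the last owner goes away.

    use std::sync::Arc;
    use std::thread;

    // Stand-in for a rendering context whose cleanup must run exactly once.
    struct Context;

    impl Drop for Context {
        fn drop(&mut self) {
            println!("context destroyed by the last owner");
        }
    }

    fn main() {
        let ctx = Arc::new(Context);
        let ctx_for_loader = Arc::clone(&ctx); // atomically bumps the refcount

        let loader = thread::spawn(move || {
            // ... load assets using ctx_for_loader ...
            drop(ctx_for_loader); // decrement; not the last owner, so no Drop yet
        });

        loader.join().unwrap();
        drop(ctx); // last owner: Context::drop runs here, exactly once
    }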
I'm not sure what your architecture looks like, but you might not even need to lock things. I find that using mpsc channels lets me get around like 60% of locking. Essentially, you have some sort of main loop, then you spawn a thread, load whatever you need there, and then send it to the main thread over mpsc. The main thread handles it on the next iteration of the main loop.
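A minimal sketch of that pattern, with a String standing in for whatever gets loaded:

    use std::sync::mpsc;
    use std::thread;
    use std::time::Duration;

    fn main() {
        let (tx, rx) = mpsc::channel::<String>();

        // Loader thread: does the slow work, sends results to the main loop.
        thread::spawn(move || {
            for i in 0..3 {
                thread::sleep(Duration::from_millis(50)); // pretend to load
                tx.send(format!("asset {i}")).unwrap();
            }
        });

        // Main loop: drain whatever has arrived without blocking, then carry
        // on with the rest of the iteration. No mutex anywhere.
        for _frame in 0..20 {
            while let Ok(asset) = rx.try_recv() {
                println!("integrating {asset}");
            }
            thread::sleep(Duration::from_millis(16)); // stand-in for frame work
        }
    }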
C being barebones does not mean it is faster. Because it has such weak typing and gives a huge amount of programmer freedom, compilers have to do a lot of work to be able to understand a C program well enough to optimise it.
Rust, on the other hand, requires the programmer to give the compiler more information about what they're doing.
Consider a function that does a series of `f->a += ...` updates with calls to foo() and bar() in between. Because C pointer types are so barebones, the compiler can't tell whether foo() and bar() can modify f->a just from looking at that code. So it will always have to load and store f->a around each += operation.
Rust, on the other hand, has two kinds of references rather than raw pointers: shared references (&T), which allow aliasing but not mutation, and mutable references (&mut T), which allow mutation but not aliasing.
This is more high-level. But it's good for performance! Rust has a rule that you can only have one mutable reference to a struct at a time. Therefore, foo() and bar() can't be modifying f.a, and the compiler can simplify the whole sequence to `f.a += 6;`.
(You can see it in action for yourself here: https://godbolt.org/z/hWs67P. Sadly, Rust doesn't do this by default due to problems with LLVM, but eventually it will.)
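The snippet under discussion didn't survive into this thread, so here is a hedged reconstruction of its gist (F, foo, and bar are assumed names):

    struct F {
        a: i32,
    }

    // Imagine these live in another translation unit / crate, so the
    // optimizer can't see their bodies.
    fn foo() { /* opaque */ }
    fn bar() { /* opaque */ }

    fn update(f: &mut F) {
        f.a += 1;
        foo();
        f.a += 2;
        bar();
        f.a += 3;
        // `f` is the only mutable reference in existence, so foo() and bar()
        // can't be reaching f.a behind our back; the compiler may fold the
        // three updates into a single `f.a += 6;`. The equivalent C, with a
        // plain `struct F *f`, must reload and store f->a around each call
        // unless the programmer promises `restrict`.
    }

    fn main() {
        let mut f = F { a: 0 };
        update(&mut f);
        assert_eq!(f.a, 6);
    }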
Lately on another Rust thread somebody pointed out that C programs use a lot of indirection like pointers and vtable dispatches, which actually detracts from the supposed mega-speed of low-level C. I found that to be mind-blowing and felt stupid for not remembering that earlier.
I think they meant function pointers in general (callbacks and the like, see qsort). Also, a lot of C codebases end up implementing manual vtables of some sort where polymorphism is required.
It's true they aren't that common, but have you looked at the Linux kernel code? It's written in C, and vtables are everywhere.
I've also used them on occasion. They do a particular job, called pluggable implementations / dynamic typing / sub-classing depending on people's background. If there isn't some central place that knows all the types (if there is, you can use switch/case/match), there isn't much choice: it's vtables or nothing. But needing to do that job is rare, so seeing it in languages that don't use vtables promiscuously to implement OO is correspondingly rare.
Which is as it should be, because vtables are more expensive than a direct call, very expensive if they prevent inlining. One of Rust's triumphs is that it provides them, but you have to go out of your way to use them, so it encourages avoiding their overheads.
Dynamic dispatch (vtables are one way to do it, and so is a function pointer) is incredibly common in C, because there's no language-built-in way to perform monomorphization.
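In Rust the two flavors sit side by side; a hand-rolled example (not from the thread):

    trait Shape {
        fn area(&self) -> f64;
    }

    struct Circle {
        r: f64,
    }

    impl Shape for Circle {
        fn area(&self) -> f64 {
            std::f64::consts::PI * self.r * self.r
        }
    }

    // Dynamic dispatch: one compiled body, every call goes through a vtable.
    fn total_dyn(shapes: &[Box<dyn Shape>]) -> f64 {
        shapes.iter().map(|s| s.area()).sum()
    }

    // Monomorphization: a separate copy is compiled per concrete T, and
    // area() can be inlined directly; no vtable anywhere.
    fn total_static<T: Shape>(shapes: &[T]) -> f64 {
        shapes.iter().map(|s| s.area()).sum()
    }

    fn main() {
        let boxed: Vec<Box<dyn Shape>> = vec![Box::new(Circle { r: 1.0 })];
        let plain = vec![Circle { r: 1.0 }];
        println!("{} {}", total_dyn(&boxed), total_static(&plain));
    }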
You have correctly identified it as FUD. People have a bone to pick with Pin, so they irrationally latch on to it, but the general problem is that the &mut invariants cannot currently permit any self-referential data, which is a useful concept in general (for intrusive data structures, etc.) and whose lack people have been hacking around since before 1.0, with crates like rental and owning_ref. The plan to fix this is to make self-referentiality a first-class concept in the language, as a principled exception to the usual &mut uniqueness invariant that preserves memory safety while properly encoding the aliasing guarantees.
> The plan to fix this is to make self-referentiality a first-class concept in the language, as a principled exception to the usual &mut uniqueness invariant that preserves memory safety while properly encoding the aliasing guarantees.
Are there issues I can subscribe to or RFCs for this?
Code affected by unsafe already has to go through `UnsafeCell`, which disables most aliasing optimizations, and the Stacked Borrows semantics have explicit opt-outs for such cases. I don't believe there are any Rust semantic issues standing in the way of exploiting aliasing information in this way, just LLVM bugs (and the fact that its restrict model isn't currently equipped to handle the information at such a fine granularity).
There is undefined behavior in Rust affecting real-world code, including Tokio's scheduler, and code produced by async fn definitions. UnsafeCell doesn't solve the problem. There's more information at https://gist.github.com/Darksonn/1567538f56af1a8038ecc3c664a....
This is the `Pin<&mut>` example, which I'm not that familiar with but was aware of. I think it's highly unlikely that this one relatively niche use case is going to prevent Rust from ever being able to safely turn on aliasing optimizations. There have been several solutions proposed, e.g. adding a stricter UnsafeCell to the language; they may technically not be backwards-compatible, but given how `Pin` is used and the fact that this is a soundness issue, I think it should be fine.
The HN thread is mostly unrelated. I agree that it would have been better to integrate `Pin` directly into the language, though, but mostly for ergonomic reasons; it could still probably happen in an edition upgrade.
The github discussion thread where that originates from also contains the language designers optimistically discussing how, in the worst case, `Pin` just has to be hardcoded to exclude these optimizations. That wouldn't make it the first type in the std lib that works similarly (see interior mutability and UnsafeCell), so I personally interpret that as "unlikely to be an issue".
> There's a significant difference between what these languages can achieve in theory, and how they're used in practice.
Theoretically, in C you can use restrict for this. Practically nobody does (Rust users keep finding bugs and miscompilations in LLVM's noalias support), and it's a huge footgun, because you're completely on your own and all bets are off if you misuse it.
Meanwhile in rust land, it's the default and the compiler checks your homework, and while you can always lie to the compiler it's much rarer that you'd even get in a position to do so.
But restrict is very hard to reason about correctly, and the consequences for making a mistake are potentially catastrophic.
There's a very real difference between what's possible in theory and what humans do in practice. I wish Hacker News people engaged more with the practice.
Yes, C99 added that keyword, and you can use it in C code. (C++ is a more complicated matter, I believe…)
But I think it's pretty uncommon in practice. One reason is that the C compiler has no borrow checker to help you notice when you're using `restrict` unsafely.
If you can have "one mutable" and "more than one non-mutable" reference (not sure if this is the case), you still cannot safely do that optimization, as foo() and bar() could in principle be reading f.a and the optimized version would then not have the correct values of f.a when foo() and bar() are called.
Note, I genuinely don't know if this caveat applies to Rust. But, m general, not only mutable references need to be considered.
> My overall feeling is that if I could spend infinite time and effort, my C programs would be as fast or faster than Rust, because theoretically there's nothing C can't do that Rust can.
The exact same argument applies to assembly code. There are very good reasons that it's not used nowadays except in incredibly rare circumstances or in the embedded world.
It doesn't matter in the slightest how fast your language is in theory. Not one iota.
The only thing that matters is how fast the programs you write with it are in practice. The evidence is clear: it is significantly easier to write faster programs in Rust than it is in C, and this applies to even the most skilled developers.
Rust won’t prevent all manner of bugs, but if you need to prevent multiple parts of your application from accessing a resource concurrently, it’s fairly trivial to design types that would guarantee that at compile time.
If you need to do it across processes, then you need a file system that supports exclusive file opens.
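Within a single process, the canonical compile-time guarantee is Mutex: the data lives inside the lock, so there's no way to reach it without holding the guard. A minimal sketch:

    use std::sync::{Arc, Mutex};
    use std::thread;

    fn main() {
        let resource = Arc::new(Mutex::new(0u32));

        let handles: Vec<_> = (0..4)
            .map(|_| {
                let r = Arc::clone(&resource);
                thread::spawn(move || {
                    // The value is only reachable through the guard returned
                    // by lock(), so "forgot to take the lock" cannot compile.
                    *r.lock().unwrap() += 1;
                })
            })
            .collect();

        for h in handles {
            h.join().unwrap();
        }
        assert_eq!(*resource.lock().unwrap(), 4);
    }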
The db question is a red herring, because to properly solve for concurrency in the db, you need to use locks in the db (table and row locks), not in the code accessing the db.
You don't necessarily need to use locks in the db. You can use transactions as well, unless you're making major changes to the DB where transactions would be in a constant conflict state.
You’re right, though that wasn’t really the point of my reply. More that concurrency in the DB is a DB design choice and not the responsibility of the language accessing the DB. This is especially true when accessing the DB from multiple nodes, which most deployed software does.
Locks and transactions are intertwined. Write locks are typically released only when the transaction ends, and read locks too at some transaction isolation levels.
It’s hard to see how you could be arguing in good faith given that your point of “fearless concurrency” being an overstated “sales pitch” has now been answered with multiple substantive answers.
Please stop moving the goal posts from “program concurrency” to “distributed or multi-process transactions.” It subtracts from the conversation.
I am not moving goal posts; rather, I'm talking about issues that many apparently lack the knowledge to understand: how many variants of data races exist in an application.
Quite understandable, given that the majority of developers keep writing single-threaded applications.
The fearless concurrency sales pitch doesn't overpromise. You're trying to stretch it, but it's pretty clear on what kind of concurrency issues it covers (eg. data races).
Yeah, it would be a grave misinterpretation of "fearless concurrency" to think Rust somehow validates that access to some shared resource with its own semantics is also safe. I'm not educated on the subject, but that problem seems pretty intractable for a language to solve in a general sense.
Any blog post that gives examples accessing shared variables and never goes beyond that leaves the reader with the impression that it works the same way regardless of what resources are being accessed.
Rust has the expressive power to design an API that explicitly makes it impossible if you so desire though. For instance by using an API similar to mutexes where you have to lock to access the contents.
I never claimed otherwise. If you want to enforce invariants in your DB API you'll have to implement them yourself; Rust won't do it for you, because Rust doesn't know what a database is. Still, it's perfectly possible to design a DB API in such a way that it would prevent some issues, such as making changes without an active transaction.
But you're right to point out that Rust won't magically solve all your possible sources of crashes and data corruption; it just makes it a lot easier to enforce arbitrary constraints. For instance, if you decide that in order to avoid these issues you want all your DB stuff to run in a single thread, you could design your API in a way that prevents DB handles from being shared between threads, all enforced by the language. You couldn't accidentally leak a DB handle to a different thread.
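A sketch of what such an API could look like; every name here is hypothetical, this is not any real DB crate:

    use std::marker::PhantomData;

    // Hypothetical DB handle. The raw-pointer marker makes it !Send + !Sync,
    // so the compiler rejects moving or sharing it across threads.
    struct Db {
        _not_send: PhantomData<*mut ()>,
    }

    // A transaction exclusively borrows the handle: at most one at a time,
    // and it can't outlive the Db.
    struct Transaction<'db> {
        _db: &'db mut Db,
    }

    impl Db {
        fn new() -> Db {
            Db { _not_send: PhantomData }
        }

        fn begin(&mut self) -> Transaction<'_> {
            Transaction { _db: self }
        }
    }

    impl<'db> Transaction<'db> {
        // Writes are only defined on Transaction, so "change the DB without
        // an active transaction" simply doesn't typecheck.
        fn execute(&mut self, _sql: &str) { /* ... */ }

        fn commit(self) { /* consuming self ends the transaction */ }
    }

    fn main() {
        let mut db = Db::new();
        let mut tx = db.begin();
        tx.execute("UPDATE ...");
        tx.commit();
    }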
> Code converted from C to Rust seems much more voluminous.
This is a fairly odd claim.
If you have to deal with strings (ASCII and UTF-8) properly, C is stupidly verbose.
If you need a data structure more complex than an array of something, C is stupidly verbose.
If you want to deal with pattern matching/regexen, C is ridiculously verbose.
Do I agree that Rust is far more verbose for an embedded "blinky" (the embedded equivalent of "Hello, World!")? Yes.
But once I start doing things like processing messages in a communications stack (BLE, CANOpen, Ethernet, etc.), Rust starts looking better and better.
    // Main function
    #[arduino_uno::entry]
    fn main() -> ! {
        let peripherals = arduino_uno::Peripherals::take().unwrap();
        let mut pins = arduino_uno::Pins::new(
            peripherals.PORTB,
            peripherals.PORTC,
            peripherals.PORTD,
        );
        // Pin D13 is connected to L led
        let mut led = pins.d13.into_output(&mut pins.ddr);
        loop {
            led.toggle().void_unwrap();
            arduino_uno::delay_ms(500);
        }
    }
AFAIK it is a common experience that both C and C++ code tend to become fewer LOC when moved to Rust, a lot of it due to stuff like serde that greatly reduces boilerplate. It probably depends on your application and program size, though. I'm sure that if you're doing a lot of pointer wrangling or something it won't be smaller in Rust, while if your C program was just a bunch of SIMD intrinsics anyway, like a lot of high performance code is now, it'll probably be basically the same length.
Did you ever take a look to try and figure out where the extra verboseness came from? For instance, was it spread equally across all functions or concentrated in those doing numeric computations (or whatever)? It would be interesting to learn something from this example.
Did you convert it by hand and translate the code into idiomatic rust? Or just run some C code through c2rust? Because the latter will not produce anything like a typical rust program, it will be C types and logic with minimal changes to make it compile in rust (and those changes will make it look very verbose).
You have to allocate and deallocate--where and when does that occur? When the data structure needs to grow, all pointers need to be invalidated--where and when does that occur? How do you iterate across the data structure? Do those iterators hold a local pointer or do they hold a root pointer plus an accessor function pointer plus memoization? What happens when you want to copy a subset of the data structure? I can go on and on.
Every one of those things requires a fairly primitive function call that sometimes doesn't even need to exist in another language. For contrast, think about all the macrology that the Linux kernel does to iterate across every element of a linked list that is effectively "for element in list {}" in any higher language.
And, even worse, most of that stuff in C needs to be runtime-only, while some of this kind of thing gets elided by the compiler in other languages.
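For contrast, roughly what that plumbing collapses to in safe Rust (a made-up toy example):

    fn main() {
        let mut v = vec![1, 2, 3]; // allocation handled for you
        v.push(4); // growth may reallocate; no stale pointers to chase by hand

        // The kernel's list_for_each macrology, as a language feature:
        for element in &v {
            println!("{element}");
        }

        let subset: Vec<i32> = v[1..3].to_vec(); // copy a sub-range
        println!("{subset:?}");
    } // v and subset are deallocated here automatically (Drop)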
If you don't consider all that "stupidly verbose" I'm really curious what you would?
And this is long before we start talking about concurrency.
C allows you to write something that works "just enough" that you put it into production and then it bites you in the ass (similar to quadratic algorithms--functional enough to get to production and then bite you).
The author did say that there is nothing that Rust does that C cannot. The difference is that in Rust, those things are easier, or many times, the default way, while in C, you would have to take care of way too many things to make sure things work.
There are a lot of great libraries for sure, but they aren't in the stdlib and C doesn't make it as easy to use external libraries as languages with modern tooling. Everybody gets grumpy about dependencies and a lot of people probably figure it's easier to maintain their own container code in their application than to deal with that.
You are right in a library-demographical sense, but not in a fundamental sense. There is a 3rd way. Have a look at the CTL I linked to (downvoted..maybe I should have explained more?).
Once you give up the closed source/prebuilt binary library idea and embrace the C++-like header library idea and write implementations in terms of "assumed macro/inline function" definitions, the problem becomes straightforward, with no performance issue and different ergonomics issues than you probably think.
It's more "manual instantiation" than C++ templates or generics in other languages, where just referring to them works, but most of C is quite manual. So it fits the headspace, and the hard parts of data structures/meddlesome hands remain factored out. Since you parameterize your files/code with #define/#include, you have to name your parameters, which can make the instantiating client code more obvious than C++ templates with many arguments. OTOH, there is no/poor type checking of these parameters.
I had a look, and it feels like template programming but with even worse guarantees.
Having a type declaration depend on a #define P for whether it is plain old data or not, and needing to know what that means, is not the kind of ergonomics I'd want. That requires learning a whole new paradigm to ensure I am not doing wrong things.
In my mind it is so big an extension of the C language, that it leaves the C headspace and becomes its own headspace.
Yeah. It's not for everyone. I think "different ergonomic issues" may cover that and I did mention the type checking already. :-)
It is a smaller learning curve from pure C than "all of Rust" or even "all of C++/STL". You got the basic idea in short order (that may be for ill as well as good..I was never trying to make a normative claim).
As someone who's worked predominantly in high level languages (Scala, Ruby, etc...) I've found Rust to be relatively straightforward to use for simple CLI tools or gRPC servers (tonic library is pretty nice). I haven't tried building a CRUD app yet, but I don't see any real reason why it would be impossible to have an ergonomic web framework in Rust.
The things I find most difficult:
1. Wrangling with the borrow checker can be painful before you know what you're doing (and even afterwards), but if you understand the standard library/patterns well, that seems to minimize the cost. An example is trying to write your own `get_or_else_insert` style method for a HashMap: writing your own version is easy in other languages but hard in Rust. If you didn't know that methods like that already exist on HashMap, you will experience a lot of pain until you understand the "right" way to do something (see the sketch after this list).
2. Shared memory concurrency is definitely at the nexus of all the more difficult parts of Rust. Especially with async/await. There's no question in my mind that if you want to write a webserver that has async functions accessing shared memory, that you will for sure need to fill in any gaps in your knowledge as it will be difficult to get to a working program without understanding significantly more concepts than what it might take for a simple single-threaded CLI app
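(The sketch referenced in item 1: the existing "right way" is, I believe, the HashMap entry API, which does get-or-insert-then-mutate in a single borrow.)

    use std::collections::HashMap;

    fn main() {
        let mut counts: HashMap<&str, u32> = HashMap::new();

        // entry() resolves "look up, insert a default if absent, then mutate"
        // in one borrow, which is exactly the shape that's painful to write
        // yourself under the borrow checker.
        *counts.entry("apple").or_insert(0) += 1;
        *counts.entry("apple").or_insert(0) += 1;

        assert_eq!(counts["apple"], 2);
    }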
I'm pretty sure that it will be possible (if it isn't already) to get the Rust ecosystem to a state where writing a CRUD app is about as simple as in Go (and considerably easier/more ergonomic w.r.t certain things like JSON serialization).
Firefox's compiled code is mostly written in C++, not C. You conflate C with C++, Java, and C#. C++, while it has source compatibility with C, tends to end up much different from C. C will not give you the OOP hell-scape you can dig yourself into with those three languages. C code tends to be simpler and much closer to the assembly that will be generated than what you would get in those languages. Moreover, Java and C# are not even compiled. Anyway, C != C++. C++ has changed quite a bit since its earlier days and has diverged from plain C in a lot of ways. Some people even feel C++ keeps adding too many new features too fast.
If you mean GHC, the language version is whatever the pile of configuration flags at the top of each source file ends up meaning; some of them even contradict each other.
Great for language research, which is Haskell's main purpose in life, hardly a good idea for getting industry love.
I am talking about Haskell, specifically the removal of 'n+k patterns' and 'monad comprehensions'.
About GHC: I think their approach with pragmas is great and something for other languages to emulate. It's also great in production, with the caveat that you might want to restrict that mechanism to surface level changes only, and nothing that changes the intermediate format.
I guess people mostly take source compatibility to mean that you can write the headers for your C library so that it can be used from C++. That's not the same thing as C being a proper subset of C++ or whatever, but it's still a vast enough advantage of C++ over most competitors that it might as well be.
But if you want something between Rust and Ruby, check out Crystal (garbage collected, strongly typed with inference, compiles to a native binary via LLVM, Ruby-like syntax, fast like Go).
> Code converted from C to Rust seems much more voluminous.
I haven't seen that in practice. A good point of reference are implementations of things like ruby, python, or the erlang vm in rust compared to the C alternatives. This might be because Rust is also more expressive (probably by borrowing certain syntax/semantics from ocaml/haskell), though the borrow checker does add back some verbosity.
But Rust works badly with mmapped (memory-mapped) files, as the article notes. So in C you could load (and save!) stuff almost instantly, whereas in Rust you still have to de-serialize the input stream.
No you don't. I've written multiple programs that load things instantly off the file system via memory maps. See the fst crate[1], for example, which is designed to work with memory maps. imdb-rename[2] is a program I wrote that builds a simple IR index on your file system that can then instantly search it by virtue of memory maps.
Rust "works badly with memory mapped files" doesn't mean, "Rust can't use memory mapped files." It means, "it is difficult to reconcile Rust's safety story with memory maps." ripgrep for example uses memory maps because they are faster sometimes, and its safety contract[3] is a bit strained. But it works.
I didn't read your code but one problem I suspect you ran into is that you had to re-invent your container data structures to make them work in a mmapped context.
No, I didn't. An fst is a compressed data structure, which means you use it in its compressed form without decompressing it first. If you ported the fst crate to C, it would use the same technique.
And in C, you have to design your data structures to be mmap friendly anyway. Same deal in Rust.
But this is moving the goal posts. This thread started with "you can't do this." But you can. And I have. Multiple times. And I showed you how.
> So your code operates directly on a block of raw bytes? I can see how that can work with mmap without much problems.
Correct. It's a finite state machine. The docs of the crate give links to papers if you want to drill down.
> My argument was more about structured data (created using the type system), which is a level higher than raw bytes.
Yes. You should be able to do in Rust whatever you would do in C. You can tag your types with `repr(C)` to get a consistent memory layout equivalent to whatever C does. But when you memory map stuff like this, you need to take at least all the same precautions as you would in C. That is, you need to build your data structures to be mmap friendly. The most obvious thing that is problematic for mmap structures like this that is otherwise easy to do is pointer indirection.
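For illustration, roughly what that can look like, assuming the memmap2 crate and a file you wrote yourself with a matching layout (the Header type is made up):

    use std::fs::File;

    use memmap2::Mmap; // assumed dependency; any mmap wrapper works similarly

    // repr(C) fixes the layout so the on-disk bytes and the struct agree.
    // Note: plain integers only, no pointers, exactly as you'd do in C.
    #[repr(C)]
    struct Header {
        magic: u32,
        count: u32,
    }

    fn main() -> std::io::Result<()> {
        let file = File::open("index.bin")?; // assumed written with this layout

        // Safety: we are promising nobody truncates or rewrites the file
        // while it's mapped; Rust can't check that, same as C.
        let mmap = unsafe { Mmap::map(&file)? };
        assert!(mmap.len() >= std::mem::size_of::<Header>());

        // Reinterpret the first bytes as a Header. The mapping is
        // page-aligned, which satisfies Header's alignment here.
        let header = unsafe { &*(mmap.as_ptr() as *const Header) };
        println!("magic={:#x} count={}", header.magic, header.count);
        Ok(())
    }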
With that said, this technique is not common in Rust because it requires `unsafe` to do it. And when you use `unsafe`, you want to be sure that it's justified.
This is all really beside the point. You'd have the same problems if you read a file into heap memory. The main problem in Rust land with memory maps is that they don't fit into Rust's safety story in an obvious way. But this in and of itself doesn't make them inaccessible to you. It just makes it harder to reason about safety.
It's very tedious to debate with someone who explicitly makes assumptions about something (like code) without having read it, and puts the burden of refuting those assumptions on you...
It doesn’t say it “works badly” it says the borrow checker can’t protect against external modifications to the file while memory-mapped, which has a host of issues in C as well.
You can mmap files in Rust just fine, but it’s generally as dangerous as it is in C.
I don’t get this obsession with “dangerous.” Honestly, what does that even mean? I think a better word is “error-prone.” Danger is more like, “oh my god a crocodile!”
Unfortunately, as is almost always the case, it was negligence rather than some particular language feature:
“A commission attributed the primary cause to general poor software design and development practices rather than single-out specific coding errors. In particular, the software was designed so that it was realistically impossible to test it in a clean automated way.“
You sound like you're making a refutation, but you really aren't. This whole discussion is about giving developers tools that are systematically less error-prone, which your quote suggests would have been helpful to that specific development team.
The main problem here is that C has the capability to declare mmap regions correctly (`volatile char[]`) and Rust does not (`[Cell<u8>]` is close but not exactly right, and annoying).
Most Rust folks who use mmap don't mark the region as Cell'd, which means they risk UB in the form of incorrect behavior, because the compiler assumes the memory region is untouchable outside the single Rust program, and that's not true.
(It's also not true generally, because /dev/mem and /proc/pid/mem exist, but it's beyond Rust's scope that the OS allows intrusion like that.)
Errors are up to interpretation. It just means the thing didn't happen as requested. Errors are meant to be expected or not expected depending on the context.
Dangerous means dangerous. It's not up for interpretation.
Languages have multiple, very different words, for exactly this reason.
But that may be of little solace. If you snapshot your entire heap into an mmapped file for fast I/O, then basically the entire advantage of Rust is gone.
Is there literally no other code in the application?
Rust has plenty of situations where you do unsafe things but wrap them in safe APIs. If you're returning regions of that mmapped file, for example, a lifetime can be associated with those references to ensure that they are valid only for the duration of the file being mmapped in the program.
It can be used to ensure that if you need to write back to that mmapped file (inside the same program), there are no existing references to it, because those would be invalid after an update to the file. You need to do the same in C, but there are no guardrails you can build in C to make that same assurance.
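The shape of that guardrail, sketched with a Vec standing in for the real mapping so the example is self-contained:

    // Imagine this owns the real mmap; a Vec stands in so this compiles alone.
    struct MappedFile {
        bytes: Vec<u8>,
    }

    impl MappedFile {
        // The returned slice borrows &self, so it cannot outlive the mapping.
        fn region(&self, start: usize, len: usize) -> Option<&[u8]> {
            self.bytes.get(start..start + len)
        }

        // Writing takes &mut self: the compiler rejects this call while any
        // slice from region() is still alive, which is the assurance above.
        fn write_at(&mut self, offset: usize, data: &[u8]) {
            self.bytes[offset..offset + data.len()].copy_from_slice(data);
        }
    }

    fn main() {
        let mut f = MappedFile { bytes: vec![0; 16] };
        f.write_at(0, b"hi");
        assert_eq!(f.region(0, 2), Some(&b"hi"[..]));
    }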
I'd call mmaping data structures into memory an advanced systems programming trick which can result in a nice performance boost but which also has some severe drawbacks (portability across big/little endian architectures and internal pointers being two examples).
I know some very skilled C++ and Rust developers who can pull it off. If you're at that skill level, Rust is not going to get in your way because you're just going to use unsafe and throw some sanitizers and fuzzers at it. I wouldn't trust myself to implement it.
You have to combine it with other techniques, e.g. journaling to make it safe, but this is not always necessary (e.g. when using large read-only data-structures)
In C you can access pointers to memory mapped files effortlessly, in ways that are often extremely unsafe against the possible existence of other writers and against the mapping being unmapped and mapped elsewhere. It's also traditional to pretend that putting types like int in a mapped file is reasonable, whereas one ought to actually store bytes and convert as needed. Rust at least requires a degree of honesty.
It's more like, Rust wants to make guarantees that just aren't possible for a block of memory that represents a world-writable file that any part of your process, or any other process in the OS, might decide to change on a whim.
In other words, mmaped files are hard, and Rust points this out. C just provides you with the footgun.
The problem is that compilers are allowed to make some general assumption about how they're allowed to reorder code, always based on the assumption that no other process is modifying the memory. For example, the optimizer may remove redundant reads. That's a problem if the read isn't really redundant -- if the pointer isn't targeting process-owned memory, but a memory mapped file that's modified by someone else. Programs might crash in very "interesting" ways depending on optimization flags.
C has this issue as well, but Rust's compiler/borrow checker is particularly strong at this kind of analysis, so it's potentially bitten even harder.
You have made this claim multiple times. Why do you see this as a language issue and not an OS issue? It becomes an even bigger problem when we talk about distributed systems and distributed resources. Is there a language that handles this?
These issues about multiple processes and distributed systems are framework and OS level concerns. Rust helps you build fast concurrent solutions to those problems, but you’re correct that it can not solve problems exterior to the application runtime. How is that a deficiency with Rust?
Erlang has a great concurrency model with higher overhead than Rust, but similar cross thread safety, doesn’t do anything about exterior resources to the application.
I’ve not worked with Coyote, but if it is the system for .net, it describes itself as a framework, “Coyote provides developers a programming framework for confidently building reliable asynchronous software on the .NET platform”.
Orleans similarly describes itself as a framework, “Orleans is a cross-platform software framework for building scalable and robust distributed interactive applications based on the .NET Framework.”
Rust is a language; similar frameworks are being built with it. The point you're making does not appear to be about the language.
If I understand correctly, the Erlang point was that you can have a distributed system by using BEAM to scale to multiple machines and have them communicate via message passing, which is all possible and encouraged because of how you structure and write code in Erlang: as actors with mailboxes, isolating actors from each other except for the messages that are passed.
You say that the Erlang concurrency model has higher overhead than Rust. In Rust there are probably multiple projects going on right now (one of them is Bastion, but I guess there are others) which try to provide Erlang-like concurrency. What do you mean by the overhead of a concurrency model (that of Erlang) being higher than that of a programming language (Rust)? As far as I know, Erlang's lightweight processes are about as lightweight as you can get. Is there a Rust framework for Erlang-like concurrency which reduces the footprint of lightweight processes even more?
That wasn’t meant to be a snide comment about Erlang in any way. All I meant by the comment about higher-overhead was that the language itself generally has more costs to run, i.e. runtime, memory usage, garbage collector, interpreted, etc, than Rust.
The “process” model of Erlang is about as lightweight as you can get, agreed.
In terms of capabilities of beam across systems, point taken. Though we start stretching some of the understanding of where languages end and runtimes begin... Rust and C make those boundaries a little more clear.
It's not about number of available threads, the very act of scheduling tasks across multiple threads has scheduling and communication overheads, and in many situations actually ends up being slower than running it on the same thread.
That said, I think the original comment was rightly pointing out how easy it was to make the change and test it, which in this case did turn out to be noticeably faster.
Parallelization is the nuclear energy of comp science.
Loads of potential, high risk, high reward, and they would have gotten away with it if it were not for those meddling humans. It's non-trivial and can only be handled by accomplished engineers.
Thus it is not used, or is encapsulated, out of sight, out of reach of meddling hands. (CPU microcode shovelling non-connected work to pipelines comes to mind, NN training frameworks, etc.)
> [...] or is encapsulated, out of sight, out of reach of meddling hands.
That's the real issue here! Most language have poor abstractions for parallelism and concurrency. (Many languages don't even differentiate between the two.)
Encapsulating and abstracting is how we make things usable.
Eg letting people roll hash tables by themselves every time they want to use one, would lead to people shooting themselves in the foot more often than not. Compared to that, Python's dicts are dead simple to use. Exactly because they move all the fiddly bits out of the reach of meddling hands.
As a general thought about parallelizing all the things it's true though. When looking for speedups, parallelization granularity has to be tuned and iterated with benchmarking, else your speedups will be poor or negative.
I think the example case in this subthread was about making some long app operations asynchronous and overlapping, which is a more forgiving use case than trying to make a piece of code faster by utilizing multiple cores.
Also Rust is risky to parallelize: you can get deadlocks.
I don't get the obsession with parallel code in low-level languages, by the way. If you have an architecture where you can afford real parallelism, you can afford higher level languages anyway.
In embedded applications you don't usually have the possibility of parallel code, and even in low-level software (for example the classical UNIX utilities), using a single thread is really fine for simplicity and solidity.
Threads also are not really as portable as they seem; different operating systems have different ways to manage threads, or don't support threads at all.
This is a bad take. ripgrep, to my knowledge, cannot be written in a higher level language without becoming a lot slower.[1] And yet, if I removed its use of parallelism by default, there will be a significantly degraded user experience by virtue of it being a lot slower.
This isn't an "obsession." It's engineering.
[1] - I make this claim loosely. Absence of evidence isn't evidence of absence and all that. But if I saw ripgrep implemented in, say, Python and it matched speed in the majority of cases, I would learn something.
I am not trying to contradict anyone here, but any language mature enough to have an impl/way to not have arbitrary performance ceilings needs access to inline assembly/SIMD. Cython/Nim/SBCL can all do that..probably Haskell..Not so sure about Go or Swift. Anyway, many languages can respond well to optimization effort. I doubt anyone disagrees.
At the point of realizing the above no ceiling bit, the argument devolves to more one about (fairly subjective) high/low levelness of the code itself/the effort applied to optimizing, not about the language the code is written in. So, it's not very informative and tends to go nowhere (EDIT: especially when the focus is on a single, highly optimized tool like `rg` as opposed to "broad demographic traits" of pools of developers, and "levelness" is often somewhat subjective, too).
You're missing the context I think. Look at what I was responding to in my initial message in this thread:
> If you have an architecture where you can afford real parallelism you can afford higher level languages anyway.
My response is, "no you can't, and here's an example."
> but any language mature enough to have an impl/way to not have arbitrary performance ceilings needs access to inline assembly/SIMD
If you ported ripgrep to Python and the vast majority of it was in C or Assembly, then I would say, "that's consistent with my claim: your port isn't in Python."
My claim is likely more subtle than you might imagine. ripgrep has many performance sensitive areas. It isn't enough to, say, implement the regex engine in C and write some glue code around that. It won't be good enough. (Or at least, that's my claim. If I'm proven wrong, then as I said, I'd learn something.)
> At the point of realizing the above no ceiling bit, the argument devolves to more one about (fairly subjective) high/low levelness of the code itself/the effort applied to optimizing, not about the language the code is written in. So, it's not very informative and tends to go nowhere.
I agree that it's pretty subjective and wishy washy. But when someone goes around talking nonsense like "if parallelism is a benefit then you're fine with a higher level language," you kind of have to work with what you got. A good counter example to that nonsense is to show a program that is written in a "lower" level language that simultaneously benefits from parallelism and wouldn't be appropriate to do in a higher level language. I happen to have one of those in my back-pocket. :-) (xsv is another example. Compare it with csvkit: even though csvkit's CSV parser is written in C, it's still dog slow, because the code around the CSV parser matters.)
Ok. "Afford parallelism => afford high level" with the implication of HL=slow does sound pretty off base. So, fair enough.
FWIW, as per your subtle claim, it all seems pretty hot spot optimizable to me, at least if you include the memchr/utf8-regex engine in "hot spot". I do think the entire framing has much measurement vagueness ("hot", "vast majority", "levelness", and others) & is unlikely to be helpful, as explained. In terms of "evidence", I do not know of a competitor who has put the care into such a tool to even try to measure, though. { And I love rg. Many thanks and no offense at all was intended! }
ack might be an example. It's Perl, not Python, and its author is on record as saying that performance isn't his goal. So it's a bit of a strained one. But yes, it's true, I don't know any other serious grep clone in a language like Python. This is why I hedged everything initially by saying that I know that absence of evidence isn't evidence of absence. :-) And in particular, I framed this as, "I would learn something," rather than, "this is objective fact." So long as my standard is my own experience, the hand wavy aspect of this works a bit better IMO.
> I do not know of a competitor who has put the care into such a tool to even try to measure, though.
Right. Like for example, I am certain enough about my claim that I would never even attempt to do it in the first place. I would guess that others think the same. With that said, people have written grep's in Python and the like, and last time I checked, they were very slow. But yeah, the "development effort" angle of this likely makes such tools inappropriate for a serious comparison to support my claim. But then again, if I'm right, the development effort required to make a Python grep be as fast as ripgrep is insurmountable.
> it all seems pretty hot spot optimizable to me
As long as we're okay with being hand wavy, then I would say that it's unlikely. Many of the optimizations in ripgrep have to do with amortizing allocation, and that kind of optimization is just nearly completely absent in a language like Python unless you drop down into C. This amortization principle is pervasive and applies as deep as regex internals to the code the simply prints ripgrep's output (which is in and of itself a complex beast and quite performance sensitive in workloads with lots of matches), and oodles of stuff inbetween.
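The pattern in miniature (a hand-rolled example, not ripgrep's actual code): allocate once outside the loop and reuse the capacity, instead of allocating per item.

    // Naive version: a fresh String per line, an allocation in the hot loop.
    fn shout_naive(lines: &[&str]) -> usize {
        let mut total = 0;
        for line in lines {
            let upper = line.to_uppercase(); // allocates every iteration
            total += upper.len();
        }
        total
    }

    // Amortized version: one buffer, cleared and reused; capacity survives.
    fn shout_amortized(lines: &[&str]) -> usize {
        let mut buf = String::new();
        let mut total = 0;
        for line in lines {
            buf.clear(); // keeps the allocation, drops the contents
            buf.push_str(line);
            buf.make_ascii_uppercase();
            total += buf.len();
        }
        total
    }

    fn main() {
        let lines = ["alpha", "beta"];
        assert_eq!(shout_naive(&lines), shout_amortized(&lines));
    }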
> { And I love rg. Many thanks and no offense at all was intended! }
:-) No offense taken. This is by far the best convo I'm having in this HN post. Lol.
When I used to write in Cython + NumPy I would pre-allocate numpy arrays written into by Cython. It's C-like, but because of the gradual typing I think firmly in the higher level (for some value of "er"). One can certainly do that stuff in Nim/SBCL/etc. (and one sees it done).
While allocation is pretty pervasive, I'm skeptical that everywhere or even most places you do it is an important perf bottleneck. Without a count of these 20 times it matters and these 40 it doesn't, it's just kind of guesswork from an all too often frail human memory/attention that "ignores the noise" by its very nature. You might be right. Just trying to add some color. :-)
Another way to think of this is to imagine your own codebase "in reverse". "If I drop this optim, would I see it on that profile?" Or look at the biggest piles of code in your repo and ask "Is this in the critical path/really perf necessary?" and the like. Under the assumption that higher level things would be a lot shorter that kind of thought experiment can inform. Maybe an approach toward more objectivity, anyway. Little strewn about tidbits in every module don't really count { to me :-) } - that speaks more to abstraction problems.
But I don't think there is a lot of value in all the above gedankenizing. While I realize some bad "just throw money at it" kicked this off, one of my big objections to the entire framing is that I think people and their APIs really "alter the level" of a language. Indeed, their experience with the language has a big impact there. Everyone reading this knows C's `printf(fmt, arg1, arg2,..)`. Yet I'd bet under 1% have heard of/thought to do an allocating (or preallocated) string builder variant like `str(sub1, sub2, ..., NULL)` or using/acquiring something like glibc's `asprintf`. People will say "C is low level - It has no string concatenation!". Yet inside of an hour or two most medium-skill devs could write my above variadic string builder or learn about vasprintf. Or learn about Boehm-Demers-Weiser for garbage-collected C, or 100 other things like that CTL I mentioned elsewhere in this thread.
So what "level" is C, the language? Beats me. Does it have concatenation? Well, not spelled "a+b" but maybe spelled not much worse "str(a,b,NULL)". Level all just depends so much on how you use it. Performance is similar. Much C++ (and Rust for that matter) is terribly inefficient because of reputations for being "fast languages" leading to less care (or maybe just being done by junior devs..). These "depends" carry over to almost anything..not just Rust or C, but sometimes even English. I am usually told I write in much too detailed a way and a trimmer way might have higher persuasion/communication performance! { How's that for "meta"? ;-) }
> This is by far the best convo I'm having in this HN post. Lol.
Cool, cool. There can be a lot of "Rust Rage" out there (in both directions, probably). :)
Anyway, I don't think we'll resolve anything objective here, but don't take a lack of response as indicating anything other than that. You aren't making any strong objective claims to really rebut, and I'm glad that you personally undertook the challenge to do ripgrep in any language. I do think many might have done.. maybe Ada, too, and probably many more, but maybe all at the same "realized levelness". You just did not know them/feel confident about getting performance in them. Which is fine. A.Ok, even! I guess your other biggy is Go, and that might actually not have worked of all the alternatives bandied about by pjmlp and myself so far.
> While allocation is pretty pervasive, I'm skeptical that everywhere or even most places you do it is an important perf bottleneck. Without a count of these 20 times it matters and these 40 it doesn't, it's just kind guesswork from an all too often frail human memory/attention that "ignores the noise" by its very nature. You might be right. Just trying to add some color. :-)
In general I agree. But I'm saying what I'm saying because of all the times I've had to change my code to amortize allocation rather than not do it. It's just pervasive because there are all sorts of little buffers everywhere in different parts of the code. And those were put there because of experimenting that said the program benefited from them.
The key here is that the loops inside of ripgrep can grow quite large pretty quickly. There's the obvious "loop over all files," and then there's "loop over all lines" and then "loop over all matches." ripgrep has to do work in each of those loops and sometimes the work requires allocation. Even allocations at the outermost loop (looping over all files) can cause noticeable degradations in speed for some workloads.
This is why I'm so certain.
The numpy example is a good one where a substantial amount of code has been written to cater to one very specific domain. And in that domain, it's true, you can write programs that are very fast.
> So what "level" is C, the language?
Oh I see, I don't think I realized you wanted to go in this direction. I think I would just say that I absolutely agree that describing languages as "levels" is problematic. There's lots of good counter examples and what not. For example, one could say that Rust is both high level and low level and still be correct.
But like, for example, I would say that "Python is high level" is correct and "Python is low level" is probably not. But they are exceptionally general statements and I'm sure counter-examples exist. They are, after all, inherently relativistic statements, so your baseline matters.
That's kind of why I've stayed in "hand wavy" territory here. If we wanted to come down to Earth, we could, for example, replace "high level languages" in the poster's original statement with something more precise but also more verbose that this discussion still largely fits.
> I am usually told I write in much too detailed a way and a trimmer way might have higher persuasion/communication performance! { How's that for "meta"? ;-) }
Yeah, it's hard to be both pithy and precise. So usually when one is pithy, it's good to take the charitable interpretation of it. But we are technical folks, and chiming in with clarifications is to be expected.
> I don't think we'll resolve anything objective here
Most definitely. At the end of the day, I have a prior about what's possible in certain languages, and if that prior is proven wrong, then invariably, my mental model gets updated. Some priors are stronger than others. :-)
> You aren't making any strong objective claims to really rebut
Right. Or rather, my claims are rooted in my own experience. If we were going to test this, we'd probably want to build a smaller model of ripgrep in Rust, then try implementing that in various languages and see how far we can get. The problem with that is that the model has to be complex enough to model some reasonable real world usage. As you remove features from ripgrep, so to do you remove the need for different kinds of optimizations. For example, if ripgrep didn't have replacements or didn't do anything other than memory map files, then that's two sources of alloc amortization that aren't needed. So ultimately, doing this test would be expensive. And that's ignoring the quibbling folks will ultimately have about whether or not it's fair.
> I guess your other biggy is Go and that might actually not have worked of all the alternatives bandied about by pjmlp and myself so far.
I would guess Go would have a much better shot than Python. But even Go might be tricky. Someone tried to write a source code line counter in Go, put quite a bit of effort into it, and couldn't get over the GC hurdle: https://boyter.org/posts/sloc-cloc-code/ (subsequent blog posts on the topic discuss GC as well).
I feel we've talked past each other about what is/is not Python a few times. There is Cython and Pythran and Pypy and ShedSkin and Numba and others that are targeting, for lack of a more precise term, "extreme compatibility with" CPython, but also trying to provide an escape hatch for performance which includes in-language low levelness including allocation tricks that are not "mainstream CPython" (well, Pypy may not have those...).
My first reply was questioning "what counts" as "Python". Cython is its own language, not just "C", nor just "Python", but can do "low level things" such as using C's alloca. Maybe the only prior update here is on the diversity of "Python" impls. There are a lot. This is another reason why language levelness is hard to pin down which was always my main point, upon which we do not disagree. Maybe this is what you meant by "exceptionally general", but I kinda feel like "there isn't just one 'Python'" got lost. { There used to be a joke.."Linux isn't" related to the variety of distros/default configs/etc. :-) }
Advice-wise, I would say that your claim can be closer to easily true if you adjust it to say "ripgrep needs 'low level tricks' to be fast and a language that allows them, such as Rust". That phrasing side-steps worrying about levelnesses in the large of programming languages, re-assigns it to techniques which is more concrete and begs the question of technique enumeration. That is the right question to beg, though, if not in this conversation then in others. You might learn how each and every technique has representation in various other programming languages. It's late for me, though. So, good night!
Ah I see. You are right. I missed that you were going after that. I'm personally only really familiar with CPython, so that is indeed what I had in mind. To be honest, I don't really know what a ripgrep in Cython would look like. Is there a separate Cython standard library, for example? Or do you still use Python's main standard library?
We don't have to tumble down that rabbit hole though. If someone wrote a ripgrep in Cython and matched performance, then I would definitely learn something.
> "ripgrep needs 'low level tricks' to be fast and a language that allows them, such as Rust"
I might use that, sure. I think my point above was that I had to riff off of someone else's language. But I think we covered that. :-) In any case, yes, that phrasing sounds better.
> I do not know of a competitor who has put the care into such a tool to even try to measure, though.
As an aside, I'm the author of ack, and I would ask that folks not use the word "competitor" to describe different projects in the same space. Speaking for me/ack and burntsushi/ripgrep, there is absolutely no competition between us. We have different projects that do similar things, and neither of us is trying to best the other. We are each trying to make the best project we can for the needs we are looking to fill. ripgrep won't replace ack, ack won't replace ripgrep, and neither one will replace plain ol' grep. Each has its place.
Hah. I'm highly skeptical. But I suppose if anyone could do it, it'd be him. I would certainly learn something. :-)
I've tried optimizing Haskell code myself before. It did not go well. It was an implementation of the Viterbi algorithm actually. We ported it to Standard ML and C and measured performance. mlton did quite well at least.
I suspect you could make a very Haskell-like language that's also really fast, but you'd have to base it on linear types from the ground up, and make everything total by default. (Hide non-total parts behind some type 'tag' like we do with IO in current Haskell (and have something like unsafePerformPartial when you know your code is total, but can't convince the compiler).)
That way the compiler can be much more aggressive about making things strict.
Cython with all the appropriate cdef type declarations can match C and so might also do it. Not sure Cython exactly counts as "Python"..it's more a superset/dialect { and I also doubt such a port would hold many lessons for @burntsushi, but it bore noting. }
You would go to parallelism precisely on those platforms where simpler performance fixes (changing some data structures or implementing limited sections in a fast language) are insufficient. Efficient parallelization of an existing algorithm is a major undertaking.
> In embedded applications you don't usually have the possibility to have parallel code, and even in low level software (for example the classical UNIX utilities), for simplicity and solidity using a single thread is really fine.
Depends on which of the classic utilities you are talking about.
Many of them are typically IO bound. You might not get much out of throwing more CPU at them.
Indeed. Many of the optimizations ripgrep (and the underlying regex engine) does only show benefits if the data you're searching is already in memory.[1] The same is true of GNU grep. This is because searching data that's in your OS's file cache is an exceptionally common case.
[1] - I'm assuming commodity SSD in the range of a few hundred MB/s read speed. This will likely become less true as the prevalence of faster SSDs increases (low single digit GB/s).
No it's not. Its regex library is written in Rust, but was inspired by RE2. It shares no code with RE2. (And RE2 is a C++ library, not C.)
Off the top of my head, the only C code in ripgrep is optional integration with PCRE2. In addition to whatever libc is being used on POSIX platforms. Everything else is pure Rust.
I couldn't figure it out from looking through ripgrep's website: does ripgrep support intersection and complement of expressions? Like eg https://github.com/google/redgrep does.
Regular languages are closed under those operations after all.
No, it doesn't. It's only theoretically easy to implement. In practice, they explode the size of the underlying FSM. Moreover, in a command line tool, it's somewhat easy to work around that through the `-v` switch and shell pipelining.
ripgrep's regex syntax is the same as Rust's regex crate: https://docs.rs/regex/1.4.4/regex/#syntax (Which is in turn similar to RE2, although it supports a bit more niceties.)
> No, it doesn't. It's only theoretically easy to implement.
Oh, I didn't say anything about easy! I am on and off working on a Haskell re-implementation (but with GADTs and in Oleg's tagless final interpreter style etc, so it's more about exploring the type system).
> In practice, they explode the size of the underlying FSM.
You may be right, but that's still better than the gymnastics you'd have to do by hand to get the same features out of a 'normal' regex.
> Moreover, in a command line tool, it's somewhat easy to work around that through the `-v` switch and shell pipelining.
Alas, that only works if your intersection or complement happens at the top level. You can't do something like a nested intersection, eg an expression of the shape `a(b&c)d`, where the intersected part sits inside a larger match.
Perhaps I'll try and implement a basic version of redgrep in Rust as an exercise. (I just want something that supports basically all the operations regular languages are closed under, but don't care too much about speed, as long as the runtime complexity is linear.)
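Something like this minimal sketch is what I have in mind (my own type and function names, no simplification or smart constructors, so the intermediate expressions can grow):

```rust
// Brzozowski-derivative matching with intersection and complement.
#[derive(Clone)]
enum Re {
    Empty,                 // matches nothing
    Eps,                   // matches only the empty string
    Chr(char),
    Cat(Box<Re>, Box<Re>),
    Alt(Box<Re>, Box<Re>),
    And(Box<Re>, Box<Re>), // intersection
    Not(Box<Re>),          // complement
    Star(Box<Re>),
}
use Re::*;

// Does `r` match the empty string?
fn nullable(r: &Re) -> bool {
    match r {
        Empty | Chr(_) => false,
        Eps | Star(_) => true,
        Cat(a, b) | And(a, b) => nullable(a) && nullable(b),
        Alt(a, b) => nullable(a) || nullable(b),
        Not(a) => !nullable(a),
    }
}

// The derivative of `r` with respect to the character `c`.
fn deriv(r: &Re, c: char) -> Re {
    match r {
        Empty | Eps => Empty,
        Chr(x) => if *x == c { Eps } else { Empty },
        Cat(a, b) => {
            let step = Cat(Box::new(deriv(a, c)), b.clone());
            if nullable(a) {
                Alt(Box::new(step), Box::new(deriv(b, c)))
            } else {
                step
            }
        }
        Alt(a, b) => Alt(Box::new(deriv(a, c)), Box::new(deriv(b, c))),
        And(a, b) => And(Box::new(deriv(a, c)), Box::new(deriv(b, c))),
        Not(a) => Not(Box::new(deriv(a, c))),
        Star(a) => Cat(Box::new(deriv(a, c)), Box::new(Star(a.clone()))),
    }
}

// One derivative per input character, then check nullability.
fn matches(r: &Re, s: &str) -> bool {
    let mut cur = r.clone();
    for ch in s.chars() {
        cur = deriv(&cur, ch);
    }
    nullable(&cur)
}

fn main() {
    // (a|b) & not(b) should match "a" but not "b".
    let re = And(
        Box::new(Alt(Box::new(Chr('a')), Box::new(Chr('b')))),
        Box::new(Not(Box::new(Chr('b')))),
    );
    assert!(matches(&re, "a"));
    assert!(!matches(&re, "b"));
}
```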
Yeah sorry, I've gotten asked this question a lot. The issue is that building a production grade regex engine---even when it's restricted to regular languages---requires a lot more engineering than theory. And these particular features just don't really pull their weight IMO. They are performance footguns, and IMO, are also tricky to reason about inside of regex syntax.
If you get something working, I'd love to look at it though! Especially if you're building in a tagless final interpreter style. I find that approach extremely elegant.
For my current attempts, I bit off more than I could chew:
I tried to build a system that not only recognizes regular languages, but also serves as a parser for them (a la Parsec).
The latter approach pushes you to support something like fmap, but the whole derivatives-based approach needs more 'introspection', so supporting general mapping via fmap (ie a -> b) is out, and you can only support things that you have more control over than functions.
(And in general, I am doing bifunctors, because I want the complement of the complement to be the original thing.)
Sorry if that's a bit confused. If I was a better theoretician, I could probably work it out.
I haven't touched the code in a while. But recently I have thought about the theory some more. The Brzozowski derivative introduced the concept of multiplicative inverse of a string. I am working out the ramifications of extending that to the multiplicative inverse of arbitrary regular expressions. (The results might already be in the literature. I haven't looked much.)
I don't expect anything groundbreaking to come out of that, but I hope my understanding will improve.
> And these particular features just don't really pull their weight IMO. They are performance footguns, and IMO, are also tricky to reason about inside of regex syntax.
Well, in theory I could 'just' write a preprocessor that takes my regex with intersection and complement and translates it to a more traditional one. I wouldn't care too much if that's not very efficient.
I'm interested in those features because of the beauty of the theory, but it would also help make production regular expressions more modular.
Eg if you have a regular expression to decide on what's a valid username for someone to sign up to your system. You decide to use email addresses as your usernames, so the main qualification is that users can receive an email on it. But because they will be visible to other users, you have some additional requirements:
'.{0,100} & [^@]*@[^@]* & not (.*(root|admin|<some offensive term>).*@.*) & not (.*<sql injection>.*)'
That's a silly example. I think in production, I would be more likely to see something as complicated as this in eg some ad-hoc log parsing.
> The issue is that building a production grade regex engine---even when it's restricted to regular languages---requires a lot more engineering than theory.
The interesting thing here is that rust has good threading and fantastic crates
I played with making a regex library in Rust which, as per the RE2 design, involves constructing graphs and gluing them together as the regex is traversed.
This requires a cycle-catching GC, or just a preallocated arena... It was my first foray into Rust and I felt I would need to reach for unsafe, which I wasn't ready for. Array indexing might decompose into an arena, but syntactically it's just a bit messier (imho).
Would be interesting to see how RE2's approach plays out in Rust (didn't know that's what the regex crate does).
I like how the article shows both sides of the fence, it makes me realize:
I get a lot of optimizations from ptr stuffing in c. But sometimes we should lay down the good, for the better
You're overcomplicating it. When it comes to finite state machines at least, it's very easy to use an ID index instead of the raw pointer itself. That's exactly what the regex crate does.
For reference, I am also the author of the regex crate. The only unsafe it uses specific to finite automata is to do explicit elimination of bounds checks in the core hybrid NFA/DFA loop.
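To make that concrete, here's a minimal sketch of the idea (hypothetical types, not the regex crate's actual internals): states live in a single Vec and refer to each other by integer ID, so even cyclic graphs need no raw pointers and no unsafe:

```rust
// Hypothetical NFA representation using IDs instead of pointers.
type StateId = u32;

#[allow(dead_code)] // fields are for illustration
enum State {
    // Match a byte, then go to `next`.
    Byte { byte: u8, next: StateId },
    // Fork the machine into two alternatives.
    Split { alt1: StateId, alt2: StateId },
    // Accepting state.
    Match,
}

struct Nfa {
    states: Vec<State>,
}

impl Nfa {
    fn add(&mut self, state: State) -> StateId {
        let id = self.states.len() as StateId;
        self.states.push(state);
        id
    }
}

fn main() {
    let mut nfa = Nfa { states: Vec::new() };
    let done = nfa.add(State::Match);
    let a = nfa.add(State::Byte { byte: b'a', next: done });
    // A cycle is just an ID that happens to point "backwards".
    let _start = nfa.add(State::Split { alt1: a, alt2: done });
}
```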
> When it comes to finite state machines at least, it's very easy to use an ID index instead of the raw pointer itself.
As an old C programmer, the difference between an array index and a pointer caught me by surprise. In C a pointer is just an unchecked offset into memory. A real array index is just an unchecked offset into ... maybe a smaller chunk of raw memory.
But in rust, an array index is something that comes with additional bounds checking overheads with every use. And the memory it points to is also constrained - the entire array has to be initialised, so if the index passes the bounds check you are guaranteed rusts memory consistency invariants are preserved. Indexes also allow you to escape the borrow checker. If you own the slice, there is no need to prove you can access an element of the slice.
So yeah, you can use indexes instead of pointers, but for rust that's like saying you can use recursion instead of iteration. Indexing and pointers are two very different things in rust.
I guess so. But note that I didn't equate them. I just said that you can use an ID index instead. For the particular problem of FSMs, they work very well.
If bounds checks prove to be a problem, you can explicitly elide them. Indeed, Rust's regex does just that. :-)
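For illustration, a hedged sketch of the technique (not the regex crate's actual code): in a dense DFA transition table, state IDs are constructed so they're always in bounds, which justifies eliding the check in the hot loop:

```rust
// `trans` is a dense table of `num_states * 256` entries, and every entry
// stored in it is a valid state ID. That invariant is what justifies the
// unchecked indexing below.
fn run_dfa(trans: &[u32], num_states: usize, input: &[u8]) -> u32 {
    assert_eq!(trans.len(), num_states * 256);
    let mut state: u32 = 0;
    for &b in input {
        let idx = state as usize * 256 + b as usize;
        // SAFETY: states are produced only by this table, and the table is
        // built so every entry is < num_states, keeping idx in bounds.
        state = unsafe { *trans.get_unchecked(idx) };
    }
    state
}

fn main() {
    // A 1-state DFA that never leaves state 0.
    let trans = vec![0u32; 256];
    assert_eq!(run_dfa(&trans, 1, b"abc"), 0);
}
```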
> I played with making a regex library in Rust which, as per the RE2 design, involves constructing graphs and gluing them together as the regex is traversed.
Has anyone built a production grade regex engine using derivatives? I don't think I've seen one. I personally always get stuck at how to handle things like captures or the very large Unicode character classes. Or hacking in look-around. (It's been a while since I've given this thought though, so I'm not sure I'll be able to elaborate much.)
I've made some attempts, but nothing production grade.
About large character classes: how are those harder than in other approaches? If you build any FSM you have to deal with those, don't you?
One way to handle them that works well when the characters in your classes are mostly next to each other in Unicode is to express your state transition function as an 'interval map'.
What I mean is that eg a hash table or an array lets you build representations of mathematical functions that map points to values.
You want something that can model a step function.
You can either roll your own, or write something around a sorted-map data structure.
The keys in your sorted map are the 'edges' of your characters classes (eg where they start and end).
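As a rough sketch of what I mean (my own names, built on a sorted map rather than a hand-rolled structure):

```rust
use std::collections::BTreeMap;

// A transition function as an 'interval map': each key is the first
// character of a half-open range, and the value is the target state
// (None for characters outside every class).
struct IntervalMap {
    starts: BTreeMap<char, Option<u32>>,
}

impl IntervalMap {
    // The transition for `c` is the entry with the greatest key <= c.
    fn lookup(&self, c: char) -> Option<u32> {
        self.starts.range(..=c).next_back().and_then(|(_, v)| *v)
    }
}

fn main() {
    // Model a class like [a-z] transitioning to state 7.
    let mut starts = BTreeMap::new();
    starts.insert('\0', None);   // everything below 'a' is dead
    starts.insert('a', Some(7)); // 'a'..='z' goes to state 7
    starts.insert('{', None);    // '{' is the char right after 'z'
    let map = IntervalMap { starts };
    assert_eq!(map.lookup('m'), Some(7));
    assert_eq!(map.lookup('!'), None);
}
```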
Does that make sense? Or am I misunderstanding the problem?
> I personally always get stuck at how to handle things like captures [...]
Let me think about that one for a while. Some Googling suggests https://github.com/elfsternberg/barre but they don't seem to support intersection, complement or stripping prefixes.
What do you want your capture groups to do? Do you eg just want to return pointers to where you captured them (if any)?
But that's only for parsing the regex itself. I don't see any match APIs that utilize them. I wouldn't expect to either, because you can't implement capturing inside a DFA. (You need a tagged DFA, which is a strictly more powerful thing. But in that case, the DFA size explodes. See the re2c project and their associated papers.)
If I'm remembering correctly, I think the problem with derivatives is that they jump straight to a DFA. You can't do that in a production regex engine because a DFA's worst case size is exponential in the size of the regex.
> If I'm remembering correctly, I think the problem with derivatives is that they jump straight to a DFA. You can't do that in a production regex engine because a DFA's worst case size is exponential in the size of the regex.
Oh, that's interesting! Because I actually worked on some approaches that don't jump directly to the DFA.
The problem is the notion of (extended) NFA you need is quite a bit more complicated when you support intersection and complement.
Indeed. And in the regex crate and RE2, for example, captures are only implemented in the "NFA" engines (specifically, the PikeVM and the bounded backtracker). So if you support captures, then those engines have to be able to support everything.
I don’t think people are downvoting you because they disagree on a matter of opinion. You’ve literally got the author of ripgrep having replied to you to tell you that what you’ve said is categorically false.
I anticipated I could well be wrong. I ALSO anticipated it would be a hard statement for people to take
I think it was a reasonable statement -- I can't research everything I say, and I had read that RE2 and the regexes in Rust were the same.
Interesting to read about redgrep and the derivatives approach. Currently I'm programming a language that adds Turing completeness to PEG expressions -- as in functions, but extended so the lhs is like a PEG -- just as a function body can call sub-functions, so too can the lhs.
I'm hoping this will give a simple unified language
--
Philosophically:
We make mistakes. If we can't handle that, then either we don't speak or program; or we deny it, program in c, then have flamewars and real wars
Or thirdly, we accept it, program in rust, and let others correct us
We can say you are my rustc compiler. So in effect I used a Rust philosophy... while programming in C.
I guess the thing is that if we’re not sure whether or not what we’re saying is true, it can be considerate to phrase it that way, e.g. “I think ripgrep’s regex library is written in C” rather than stating it as a fact. While it is particularly likely that folks on this website will correct mistaken statements, stating them as fact seems more likely to potentially spread misinformation.
But anyways, cheers and good luck with your programming language!
> C libraries typically return opaque pointers to their data structures, to hide implementation details and ensure there's only one copy of each instance of the struct. This costs heap allocations and pointer indirections. Rust's built-in privacy, unique ownership rules, and coding conventions let libraries expose their objects by value
The primary reason c libraries do this is not for safety, but to maintain ABI compatibility. Rust eschews dynamic linking, which is why it doesn't bother. Common lisp, for instance, does the same thing as c, for similar reasons: the layout of structures may change, and existing code in the image has to be able to deal with it.
> Rust by default can inline functions from the standard library, dependencies, and other compilation units. In C I'm sometimes reluctant to split files or use libraries, because it affects inlining
This is again because c is conventionally dynamically linked, and rust statically linked. If you use LTO, cross-module inlining will happen.
Rust provides ABI compatibility through its C ABI, and if you want you can dynamically link against that. What Rust eschews is the insanely fragile ABI compatibility of C++, which is a huge pain to deal with as a user.
I don't think we'll ever see as comprehensive an ABI out of Rust as we get out of C++, because exposing that much incidental complexity is a bad idea. Maybe we'll get some incremental improvements over time. Or maybe C ABIs are the sweet spot.
Rust has yet to standardize an ABI. Yes, you can call or expose a function with C calling conventions. However, you can't pass all native Rust types like this, and you lose some semantics.
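A small sketch of that boundary (illustrative names only): a `#[repr(C)]` struct crosses the C ABI fine, but a payload-carrying Rust enum or a `String` has no defined C representation:

```rust
// This struct has a C-compatible layout, so it can be passed by value
// across the C ABI.
#[repr(C)]
pub struct Point {
    pub x: f64,
    pub y: f64,
}

#[no_mangle]
pub extern "C" fn point_norm(p: Point) -> f64 {
    (p.x * p.x + p.y * p.y).sqrt()
}

// By contrast, something like this has no defined C ABI, and the compiler
// warns about it (the improper_ctypes_definitions lint):
//
//     pub extern "C" fn take(v: Option<String>) { /* ... */ }

fn main() {
    println!("{}", point_norm(Point { x: 3.0, y: 4.0 })); // 5
}
```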
However, as the parent comment you responded to said, you can enable LTO when compiling C. And as Rust is almost always statically linked, it basically always gets those optimizations.
Even with static linking, Rust produces separate compilation units at least at the crate level (and depending on compiler settings, within crates). You won't get LTO between crates if you don't explicitly request it. It does allow inlining across compilation units without LTO, but only for functions explicitly marked as `#[inline]`.
Swift has a stable ABI. It makes different tradeoffs than rust, but I don't think complexity is the cliff. There is a good overview at https://gankra.github.io/blah/swift-abi/
Swift has a stable ABI at the cost of what amounts to runtime reflection, which is expensive. That doesn't really fit with the goals of Rust, I don't think.
This is misleading, especially since Swift binaries do typically ship with actual reflection metadata (unless it is stripped out). The Swift ABI does keep layout information behind a pointer in certain cases, but if you squint at it funny it's basically a vtable but for data. (Actually, even more so than non-fragile ivars are in Objective-C, because I believe actual offsets are not provided, rather you get getter/setter functions…)
I don't disagree that Rust probably would not go this way, but I think that's less "this is spooky reflection" and more "Rust likes static linking and cares less about stable ABIs, plus the general attitude of 'if you're going to make an indirect call the language should make you work for it'".
Do you have a source on this? I didn't think Swift requires runtime reflection to make calling across module boundaries work - I thought `.swiftmodule` files are essentially IR code to avoid this
Pretty sure the link the parent (to my comment) provided explains this.
It's not the same kind of runtime reflection people talk about when they (for example) use reflection in Java. It's hidden from the library-using programmer, but the calling code needs to "communicate" with the library to figure out data layouts and such, and that sounds a lot like reflection to me.
Yes, and if you use the C abi to dynamically link rust code, you will have exactly the same problem as c: you can't change the layout of your structures without breaking compatibility, unless you use indirecting wrappers.
That's ABI compatibility of the language, not of a particular API.
If you have an API that allows the caller to instantiate a structure on the stack and pass a reference to it to your function, then the caller must now be recompiled when the size of that structure changes. If that API now resides in a separate dynamic library, then changing the size of the structure is an ABI-breaking change, regardless of the language.
-rwxr-xr-x 1 root root 199K Nov 10 06:37 /usr/bin/grep
-rwxr-xr-x 1 root root 4.2M Jan 19 09:31 /usr/bin/rg
My very unscientific measurement of the startup time of grep vs ripgrep is 10ms when the cache is cold (ie, never run before) and 3ms when the cache is hot (ie, was run seconds prior). For grep even in the cold case libc will already be in memory, of course. The point I'm trying to make is even the worst case, 10ms, is irrelevant to a human using the thing.
However, speaking as a Debian Developer, it makes a huge difference to maintaining the two systems that use the two programs. If a security bug is found in libc, all Debian has to do is ship the fixed version of libc as a security update. If a bug is found in the Rust stdlib crate, Debian has to track down every ripgrep-like program that statically includes it and recompile it. There are currently 21,000 packages that link to libc6 in Debian right now. If it was statically linked, Debian would have to rebuild and distribute _all_ of them. (As a side note, Debian has a lot of hardware resources donated to it, but if libc wasn't dynamic I wonder if it could get security updates to a series of bugs in libc6 out in a timely fashion.)
I don't know Rust well, but I thought it could dynamically link. The Debian Rust packagers don't, for some reason. (As opposed to libc6's 21,000 dependents, libstd-rust has 1.) I guess there must be some kink in the Rust tool chain that makes it easier not to. I imagine that would have to change if Rust replaces C.
I am sympathetic to the point you make but to be accurate, one can consume and create C and C compatible dynamic libraries with rust. So, one is not “losing” something because what you (and me) want - dynamic linking and shared libraries with a stable and safe rust ABI - was not there to begin with.
Also to be pedantic, C doesn't spec anything about linkage. Shared objects and how linkers use them to compose programs is a system detail more than a language one.
The reason Common Lisp uses pointers is because it is dynamically typed. It’s not some principled position about ABI compatibility. If I define an RGB struct for colours, it isn’t going to change but it would still need to be passed by reference because the language can’t enforce that the variable which holds the RGBs will only ever hold 3 word values. Similarly, the reason floats are often passed by reference isn’t some principled stance about the float representation maybe changing, it’s that you can’t fit a float and the information that you have a float into a single word[1].
If instead you’re referring to the fact that all the fields of a struct aren’t explicitly obvious when you have such a value, well I don’t really agree that it’s always what you want. A great thing about pattern matching with exhaustiveness checks is that it forces you to acknowledge that you don’t care about new record fields (though the Common Lisp way of dealing with this probably involves CLOS instead).
[1] some implementations may use NaN-boxing to get around this
Lisp uses pointers because of the realization that the entities in a computerized implementation of symbolic processing can be adequately represented by tiny index tokens that fit into machine registers, whose properties are implemented elsewhere, and these tokens can be whipped around inside the program very quickly.
What you're describing are symbols, where the properties are much less important than the identity. Most CL implementations will use fixnums rather than pointers when possible because they don't have some kind of philosophical affinity to pointers. For data structures, pointers aren't so good with modern hardware. The reason Common Lisp tends to have to use pointers is that the type system cannot provide information about how big objects are. Compare this to arrays, which are often better at packing because they can know how big their elements are.
This is similar in typed languages with polymorphism like Haskell or ocaml where a function like concat (taking a list of lists to a single list) needs to work when the elements are floats (morally 8 bytes each) or bools (morally 1 bit each). The solution is to write the code once and have everything be in one word, either a fixnum or a pointer.
Dynamic linking is one thing I miss from Swift - I used dynamic linking for hot code reloading for several applications, which resulted in super fast and useful development loops. Given Rust's sometimes long compile times, this is something which would be welcome.
> This costs heap allocations and pointer indirections.
Heap allocations, yes; pointer indirections no.
A structure is referenced by pointer no matter what. Remember that the stack is accessed via a stack pointer.
The performance cost is that there are no inline functions for a truly opaque type; everything goes through a function call. Indirect access through functions is the cost, which is worse than a mere pointer indirection.
An API has to be well-designed in this regard; it has to anticipate the likely use cases that are going to be performance critical and avoid perpetrating a design in which the application has to make millions of API calls in an inner loop. Opaqueness is more abstract and so it puts designers on their toes to create good abstractions instead of "oh, the user has all the access to everything, so they have all the rope they need".
Opaque structures don't have to cost heap allocations either. An API can provide a way to ask "what is the size of this opaque type" and the client can then provide the memory, e.g. by using alloca on the stack. This is still future-proof against changes in the size, compared to a compile-time size taken from a "sizeof struct" in some header file. Another alternative is to have some worst-case size represented as a type. An example of this is the POSIX struct sockaddr_storage in the sockets API. Though the individual sockaddrs are not opaque, the concept of providing a non-opaque worst-case storage type for an opaque object would work fine.
There can be half-opaque types: part of the structure can be declared (e.g. via some struct type that is documented as "do not use in application code"). Inline functions use that for direct access to some common fields.
Escape analysis is tough in C, and data returned by pointer may be pessimistically assumed to have escaped, forcing exact memory accesses. OTOH on-stack struct is more likely to get fields optimized as if they were local variables. Plus x86 has special treatment for the stack, treating it almost like a register file.
Sure, there are libraries which have `init(&struct, sizeof(struct))`. This adds extra ABI fragility, and doesn't hide fields unless the lib maintains two versions of a struct. Some libraries that started with such ABI end up adding extra fields behind internal indirection instead of breaking the ABI. This is of course all solvable, and there's no hard limit for C there. But different concerns nudge users towards different solutions. Rust doesn't have a stable ABI, so the laziest good way is to return by value and hope the constructor gets inlined. In C the solution that is both accepted as a decent practice and also the laziest is to return malloced opaque struct.
I'd like to point out that this is not always the case. Some libraries, especially those with embedded systems in mind, allow you to provide your own memory buffer (which might live on the stack), where the object should be constructed. Others allow you to pass your own allocator.
> "Clever" memory use is frowned upon in Rust. In C, anything goes. For example, in C I'd be tempted to reuse a buffer allocated for one purpose for another purpose later (a technique known as HEARTBLEED).
That is nice, although I think Heartbleed was due to a missing bounds check enabling the reading of adjacent memory, not due to reusing the same buffer...
If my memory is correct: yes, the root cause was a missing bounds check, but the vulnerability was much worse than it could have been because OpenSSL tended to allocate small blocks of memory and aggressively reuse them — meaning the exploited buffer was very likely to be close in proximity to sensitive information.
I don’t have time right now to research the full details, but the Wikipedia article gives a clue:
> Theo de Raadt, founder and leader of the OpenBSD and OpenSSH projects, has criticized the OpenSSL developers for writing their own memory management routines and thereby, he claims, circumventing OpenBSD C standard library exploit countermeasures, saying "OpenSSL is not developed by a responsible team." Following Heartbleed's disclosure, members of the OpenBSD project forked OpenSSL into LibreSSL.
Until very recently, memory allocators were more than happy to return you the thing you just deallocated if you asked for another allocation of the same size. It makes sense, too: if you're calling malloc/free in a loop, which is pretty common, this is pretty much the best thing you can do for performance. Countless heap exploits later (mostly attacking heap metadata rather than stale data, to be honest) allocators have begun to realize that predictable allocation patterns might not be the best idea, so they're starting to move away from this.
True of the more common ones, but it should be acknowledged that OpenBSD was doing this kind of thing (and many other hardening techniques) before heartbleed, which was the main reason Theo de Raadt was so upset that they decided to circumvent this, because OpenBSD's allocator could have mitigated the impact otherwise.
Even higher-performance mallocs like jemalloc had heap debugging features (poisoning freed memory) before Heartbleed, which -- if enabled -- would catch use-after-frees, so long as libraries and applications didn't circumvent malloc like OpenSSL did (and Python still does AFAIK).
Don't you sort of have to do that if you're writing your own garbage collector, though? I guess for a simple collector you could maintain lists of allocated objects separately, but precisely controlling where the memory is allocated is important for any kind of performant implementation.
Python does refcount-based memory management. It's not a GC design. You don't have to retain objects in an internal linked list when the refcount drops to zero, but CPython does, purely as a performance optimization.
Type-specific free lists (just a few examples; there are more):
And also just wrapping malloc in general. There's no refcounting reason for this, they just assume system malloc is slow (which might be true, for glibc) and wrap it in the default build configuration:
So many layers of wrapping malloc, just because system allocators were slow in 2000. Defeats free() poisoning and ASAN. obmalloc can be disabled by turning off PYMALLOC, but that doesn't disable the per-type freelists IIRC. And PYMALLOC is enabled by default.
Thanks for the links! I wasn't aware of the PyMem_ layer above, the justification for that does sound bad.
But Python runs a generational GC in addition to refcounting to catch cycles (https://docs.python.org/3/library/gc.html): isn't fine control over allocation necessary for that? E.g. to efficiently clear the nursery?
Ah, good point; at the very least things like zeroing out buffers upon deallocation would have helped. Yes, I was a fan of the commits showing up at opensslrampage.org. One of the highlights was when they found it would use private keys as an entropy source: https://opensslrampage.org/post/83007010531/well-even-if-tim...
That's what happens by using normal malloc/free anyway, no? Implementations of malloc have a strong performance incentive to allocate from the cache hot most recently freed blocks.
Yes, all allocators (except perhaps OpenBSD's, from what I see in this thread) do this. It is also why `calloc` exists - because zero-initializing every single allocation is really, really expensive.
Heartbleed wasn't caused by reusing buffers; it was caused by not properly sanitizing the length of the buffer from untrusted input, and reading over its allocated size, thus allowing the attacker to read into memory that wasn't meant for them.
OpenSSL had its own memory-recycling allocator, which made the bug guarantee leaking OpenSSL's own data. Of course leaking random process memory wouldn't be safe either, but the custom allocator added that extra touch.
OpenSSL, like many other software indeed used a custom allocator, but this hasn't much to do with this to anything at all, as the system allocator also strongly favors giving back memory that once belonged to the same process, as it has to zero memory that belonged to other processes first.
This is of course a kernel feature when the lower level primitives are used that ask for blocks of memory from the kernel, which zeroes them if they had belonged to another process prior, and does not when they had not, and thus strongly favors giving back own memory. — allocators, such as the standard library's one or any custom ones, are built on top of these primitives.
Rust's safety rules also forbid access to uninitialized memory, even if it's just a basic array of bytes. This is an extra protection against accidentally disclosing data from a previous "recycled" allocation.
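As a small illustration of what that looks like in practice (a sketch, not any particular library's code): safe Rust forces a buffer to be initialized before anything can read it, so stale allocator contents can't leak out:

```rust
use std::io::Read;

// Safe Rust: the buffer must be initialized before use. `Vec::with_capacity`
// alone would reserve memory, but safe code couldn't read it until written.
fn read_chunk(mut src: impl Read) -> std::io::Result<Vec<u8>> {
    let mut buf = vec![0u8; 4096]; // zero-initialized up front
    let n = src.read(&mut buf)?;
    buf.truncate(n); // expose only the bytes actually written
    Ok(buf)
}

fn main() -> std::io::Result<()> {
    let chunk = read_chunk(&b"hello"[..])?;
    assert_eq!(chunk, b"hello");
    Ok(())
}
```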
I did a deep dive into this topic lately when exploring whether to add a language feature to Zig for this purpose. I found that, although finicky, LLVM is able to generate the desired machine code if you give it a simple enough while loop continue expression[1]. So I think it's reasonable to not have a computed goto language feature.
Somewhat off-topic: I just looked into zig, because you mentioned it.
> C++, D, and Go have throw/catch exceptions, so foo() might throw an exception, and prevent bar() from being called. (Of course, even in Zig foo() could deadlock and prevent bar() from being called, but that can happen in any Turing-complete language.)
Well, you could bite the bullet and carefully make Zig non-Turing complete. (Or at least put Turing-completeness behind an escape hatch marked 'unsafe'.)
With respect to deadlocks, there’s little practical difference between an infinite loop and a loop that holds the lock for a very long time.
Languages like Idris and Agda are different because sometimes code isn’t executed at all. A proof may depend on knowing that some code will terminate without running it.
> Languages like Idris and Agda are different because sometimes code isn’t executed at all. A proof may depend on knowing that some code will terminate without running it.
Yes. They are rather different in other respects as well. Though you can produce executable code from Idris and Agda, of course.
> With respect to deadlocks, there’s little practical difference between an infinite loop and a loop that holds the lock for a very long time.
Yes, that's true. Though as a practical matter, I have heard that it's much harder to produce the latter by accident, even though only the former is forbidden.
For perhaps a more practical example, have a look at https://dhall-lang.org/ which also terminates, but doesn't have nearly as much involved proving.
Rust's output[0] is basically the same as Zig in this case. The unsafe is needed here because it's calling extern functions.
However, in this specific instance at least, this isn't as optimal as it could be. What this is basically doing is creating a jump table to find out which branch it should go down. But, because all the functions have the same signature, and each branch does the same thing, what it could have done instead is create a jump table for the function to call. At that point, all it would need to do is use the Inst's discriminant to index into the jump table.
I'm not sure what it would look like in Zig, but it's not that hard to get that from Rust[1]. The drawback of doing it this way is that it now comes with the maintenance overhead of ensuring the order and length of the jump table exactly matches the enum, otherwise you get the wrong function being called, or an out-of-bounds panic. You also need to explicitly handle the End variant anyway because the called function can't return for its parent.
I don't know Zig, but from what I understand it has some pretty nice code generation, so maybe that could help with keeping the array and enum in step here?
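For concreteness, this is the shape I mean (hypothetical `Vm`/`Inst` types, not the code behind the links above):

```rust
// A function-pointer jump table indexed by the enum discriminant. The
// table's order and length must be kept in step with the enum by hand.
#[derive(Clone, Copy)]
enum Inst {
    Add = 0,
    Sub = 1,
    End = 2,
}

struct Vm {
    acc: i64,
}

fn op_add(vm: &mut Vm) { vm.acc += 1; }
fn op_sub(vm: &mut Vm) { vm.acc -= 1; }

const TABLE: [fn(&mut Vm); 2] = [op_add, op_sub];

fn run(vm: &mut Vm, program: &[Inst]) {
    for &inst in program {
        match inst {
            // End must be handled here: a called function can't return
            // on behalf of its caller.
            Inst::End => return,
            _ => TABLE[inst as usize](vm),
        }
    }
}

fn main() {
    let mut vm = Vm { acc: 0 };
    run(&mut vm, &[Inst::Add, Inst::Add, Inst::Sub, Inst::End]);
    assert_eq!(vm.acc, 1);
}
```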
As an observation, performance optimized code is almost always effectively single-threaded these days, even when using all the cores on a CPU to very efficiently process workloads. Given this, it is not clear to me that Rust actually buys much when it comes to parallel programming for the purposes of performance. Is there another reason to focus on parallelism aside from performance?
This reminds me of when I used to write supercomputing codes. Lots of programming language nerds would wonder why we didn't use functional models to simplify concurrency and parallelism. Our code was typically old school C++ (FORTRAN was already falling out of use). The truth was that 1) the software architecture was explicitly single-threaded — some of the first modern thread-per-core designs — to maximize performance, obviating any concerns about mutability and concurrency and 2) the primary performance bottlenecks tended to be memory bandwidth, of which functional programming paradigms tend to be relatively wasteful compared to something like C++. Consequently, C++ was actually simpler and higher performance for massively parallel computation, counterintuitively.
My impression is that what kind of parallelism patterns you need is pretty consistent within entire fields of programming. So you can go an entire career of performance optimization within HPC, game dev, film rendering or trading systems and never use the patterns the others say they use all the time.
My experience with process-based parallelism is that yes on Linux it's basically isomorphic to thread-based parallelism. It's just so much more code to do the same thing.
In Rust adding a new special-purpose background thread with some standard-library channels is 30 lines of code and I can probably even access the same logging system from the other thread.
If I wanted to do that with processes I need to:
- Coordinate a shared memory file over command line arguments or make sure everything is fork-safe
- Find a library for shared-memory queues
- Deal with making sure that if either process crashes the other process goes down with it in a reasonable way.
- Make sure all my monitoring/logging is also hooked up to the other process.
If I want to use a shared memory data-structure with atomics I need to either not use pointers or live dangerously and try and memory-map it at the exact same offsets in each process and ensure I use a special allocator for things in the shared file.
Yes you can do all the same things with both approaches, I just find threads take way less code. It's not too bad if all your processes are doing the same thing, and you also need to scale to many servers anyhow. It's more annoying if you want to have a bunch of different types of special background processes.
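For reference, a minimal sketch of that thread-plus-channel pattern (illustrative message type, nothing project-specific):

```rust
use std::sync::mpsc;
use std::thread;

// Messages for the special-purpose background worker.
enum Job {
    Work(String),
    Shutdown,
}

fn main() {
    let (tx, rx) = mpsc::channel::<Job>();

    // The background thread drains the channel until told to stop. It can
    // use the same stdout/logging as the rest of the program.
    let worker = thread::spawn(move || {
        for job in rx {
            match job {
                Job::Work(item) => println!("processing {}", item),
                Job::Shutdown => break,
            }
        }
    });

    for i in 0..3 {
        tx.send(Job::Work(format!("item {}", i))).unwrap();
    }
    tx.send(Job::Shutdown).unwrap();
    worker.join().unwrap();
}
```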
When you have built in support for threads in a language, it definitely makes sense that it would be easier to use than operating system mechanisms. For a lot of the non-embedded code that I end up writing, though, there's usually an inherent benefit to using processes over threads. It usually comes down to the benefits of having separate memory spaces. You can safely use code that was never written to be thread-safe, saving time otherwise spent refactoring gnarly old code. Also, it makes it a lot easier to mix and match different languages. For python in particular, it avoids having to battle for the global interpreter lock.
I think what's nice about rust is that, because it makes it difficult to write thread-unsafe code, it's naturally easier to add threading at some point in the future without too much pain. As a result, more applications can benefit from having access to multiple CPU cores. I don't think that's quite the same thing as pure performance per watt, though. That really comes down to how the code was written, and how well the compiler can optimize it. Rust may have some advantages there over C, since it constrains what you can do so much that the compiler has a smaller state space to optimize over. Someone who knows what they're doing in C, though, could likely write very efficient code that effectively uses parallelism, and may gain an edge over rust simply by cleverly leveraging the relative lack of training wheels. For high performance compute, rust vs. C may be a wash. For consumer facing applications, though, the more programs that can use multiple cores to run faster (even if less efficiently), the better.
The bigger issue is coordinating these threads ("workers") with threads from other processes; there is nothing on Windows and Linux to do so. Then again, I haven't had much experience with Grand Central Dispatch (macOS) to know if it's worth it. Windows has a new thread pool API, but even TBB or ConCRT do not use it (though the new parallel support in the STL (msvc) does).
Windows, Linux and macOS all have inter-process mutexes and condition-variable-ish constructions. Windows has named mutex, semaphores, and events that can be opened by multiple processes, and the pthread API supports mutex and condition variables in shared memory. Linux additionally supports its futex primitive in shared memory regions (which is how the pthread API is implemented on that OS).
Thanks! Wasn't aware of this possible use, and haven't used named pipes/mailslots much except for direct communication. Wondering how it would all work - I mean, something ought to be the "task manager" in all that, coordinating.
I realise your post is an argument in favour of Rust over C for these things, but regardless, you might be interested in a WIP library I've started to solve most of the issues you outlined: https://github.com/amboar/shmapper#libshmap
> In Rust adding a new special-purpose background thread with some standard-library channels is 30 lines of code and I can probably even access the same logging system from the other thread.
Do you happen to have a link to code that does this? This sounds similar to a problem I have right now and I’d love to see what solution you’ve arrived at.
> As an observation, performance optimized code is almost always effectively single-threaded these days, even when using all the cores on a CPU to very efficiently process workloads.
Not my experience at all. One big problem is that most languages in 2021 have very, very poor support for thread-based parallelism. It’s crazy how many languages make it hard to do basic data parallel tasks. That steers people toward writing single threaded code and/or trying to rely on process-based parallelism which is basically strictly worse.
Parallelism in 2021 should not be tightly coupled across threads if performance matters; the limitations of that model are well-understood. There is no way to make that comparatively efficient; the CPU cache waste alone ensures that. Nothing you can do with thread support in a programming language will be competitive with e.g. a purpose-built scheduler + native coroutines. That's right up against the theoretical limit of what is possible in terms of throughput and it doesn't have any thread overhead. It does introduce the problem of load shedding across cores, but that's solved for all practical purposes.
I’ve been writing parallel code at the largest scales most of my career. The state-of-the-art architectures are all, effectively, single-threaded with latency-hiding. This model has a lot of mechanical sympathy with real silicon which is why it is used. It is also pleasantly simple in practice.
I don't understand -- isn't what you are suggesting single threaded async code? That might be useful for servers, where you are mostly waiting for other things (like databases and networks), but in other places the point of parallelism is to get all your CPUs doing useful work, and then (in my experience, happy to be shown counterexamples) coroutines aren't very useful. You just want to blast a bunch of threads (or tightly coupled processes).
Yes, roughly single-threaded async, with each core running a disjoint subset of the workload on data private to that core. You can’t beat the operation throughput. The software architecture challenge is shedding load between cores, since this will hotspot under real workloads with a naive design. Fortunately, smoothly and dynamically shedding load across cores with minimal overhead and latency is a solved design problem.
It works pretty well for ordinary heavy crunch code too. I originally designed code like this on supercomputers. You do need a practical mechanism for efficiently decomposing loads at a fine granularity but you rarely see applications that don’t have this property that are also compute bound.
Ah, I understand now. I misinterpreted "no thread overhead" as meaning "I'm not running things in multiple threads", like the current node.js/javascript obsession, where we just run code in one thread and use a bunch of async to "parallelise". Sorry!
I've (badly) written code like you describe -- usually by abusing fork to do my initial data structure setup, then using C pipes to pass jobs around. I suspect there are much better ways of doing it, but that parallelised well enough for the stuff I was doing. I'd be interested to know if there are good libraries (or best practices) for doing this kind of parallelism.
I was also confused. I agree that touching raw threads is usually not the right thing to do, and chains of parallel coroutines are one of the good abstractions. It’s crazy how few languages have easy access to that very basic abstraction.
There is not a meaningful semantic difference between what you're describing and what tools like rayon provide (and BTW, threads do just fine when pinned to a core and appropriately managed as they should be in large data processing workloads). Whether threads are used on the backend is largely a distraction, you still have to write things roughly the same way to create correct code (for example, you cannot share memory between tasks on different cores, or on different nodes, without synchronizing somehow).
There are many operations on data that are relatively slow from a CPU’s perspective — filling a cache line, page faulting, cache coherency, acquiring a lock, waiting on I/O to complete, etc. All of these add latency by stalling execution. In conventional software, when these events occur you simply stall execution, possibly triggering a context switch (which is very expensive). In many types of modern systems, these events are extremely frequent.
Latency hiding is a technique where 1) most workloads are trivially decomposed into independent components that can be executed separately and 2) you can infer or directly determine when any particular operation will stall. There are many ways to execute these high latency operations in an asynchronous and non-blocking way such that you can immediately work on some other part of the workload. The "latency-hiding" part is that the CPU is rarely stalled, always switching to a part of the workload that is immediately runnable if possible, so that the CPU is always doing real, constructive work. Latency-hiding optimizes for throughput, maximizing utilization of the CPU, but potentially increasing the latency of specific sub-operations by virtue of reordering the execution schedule to "hide" the latency of operations that would stall the processor. For many workloads, the latency of the sub-operations doesn't matter, only the throughput of the total operation. The real advantage of latency-hiding architectures is that you can approach the theoretical IPC of the silicon in real software.
There are exotic CPU architectures explicitly designed for latency hiding, mostly used in supercomputing. Cray/Tera MTA architecture is probably the canonical example as well as the original Xeon Phi. As a class, latency-hiding CPU architectures are sometimes referred to as "barrel processors". In the case of Cray MTA, the CPU can track 128 separate threads of execution in hardware and automatically switch to a thread that is immediately runnable at each clock cycle. Thread coordination is effectively "free". In software, switching between logical threads of execution is driven much more by inference, but often sees huge gains in throughput. The only caveat is that you can't ignore tail latencies in the design — a theoretically optimal latency-hiding architecture may defer execution of an operation indefinitely.
> There are exotic CPU architectures explicitly designed for latency hiding, mostly used in supercomputing.
I don’t know much about supercomputers, but what you described is precisely how all modern GPUs deal with VRAM latency. Each core runs multiple threads, the count is bound by resources used: the more registers and group shared memory a shader uses, the less threads of the shader can be scheduled on the same core. The GPU then switches threads instead of waiting for that latency.
That’s how GPUs can saturate their RAM bandwidth, which exceeds 500 GB/second in modern high-end GPUs.
Latency hiding is a way to substantially increase throughput by queuing massive numbers of requests or tasks while waiting on expensive resources (e.g. main memory access can have latency in the hundreds of cycles for GPUs). By scheduling enough tasks or enough requests that can be dispatched in parallel (or very soon after one another), after an initial delay you may be able to process the queued requests very quickly (possibly once per clock cycle or two), providing similar overall performance to running the same set of tasks sequentially with very low latency. However, such long pipelines are very prone to stalling, especially if there are data dependencies that prevent loads from being issued early, so getting maximum performance out of code on an architecture that heavily exploits latency hiding techniques can require a lot of very specific domain knowledge.
"Threads" are nothing but kernel-mode coroutines with purpose-built schedulers in the kernel.
Redoing the same machinery except in usermode is not the way to get performance.
The problem is that scripting languages don't let you access the kernel APIs cleanly due to various braindead design decisions - global locks in the garbage collector, etc.
But the solution isn't to rewrite the kernel in every scripting language, the solution is to learn to make scripting languages that aren't braindead.
There are a few tasks where process-based parallelism works fine, like a forking server for handling network requests. This is quite efficient and works even if the handling function is not thread-safe. Obviously, you get memory safety as well.
Unfortunately, that covers only a certain subsection of problems, and usually you want to be able to use parallel computations at the function call level. There the support for parallel computations in Rust or Go shines: at each point in the program flow you can decide to go parallel.
Doesn't this just move the hard part, the maintenance, configuration and understanding, to a different (and equally complex) abstraction layer?
Personally, I'd prefer to have a single binary that I start with some arguments, then need to also have a launch script, probably in a different language, which needs to coordinate all the starting, stopping, shared state etc.
But, most of my work has been at the workstation level. Maybe it's different once you start needing clusters? The issue is that workstations have grown extremely powerful over the last few years with 50+ cores, 1000s of GPU 'cores', and hundreds of GB of memory. Clusters now bring the same headaches, in hardware, of multiprocess design.
> Doesn't this just move the hard part, the maintenance, configuration and understanding, to a different (and equally complex) abstraction layer?
That's true indeed, but I find it more manageable that way.
To me, managing independent processes instead of threads is especially powerful when the lifecycle of the concurrent work can vary.
A typical example that happens quite often is when you have some kind of producer/consumer application. Say, you need to receive some data from a socket, and then process it.
If implemented with threads, it becomes quite messy very fast. You end up with a big blob binary that spawns whatever the hell it wants, and you need to engineer some complex configuration file to tell it how many workers you want, etc. You also need some way to notify this process to tell it to increase or decrease its number of workers, etc.
With independent process, it can be much more manageable. You can have a "collector" process that reads the socket and places messages in a shared memory, and if you want more/less workers, you just spawn worker processes to read from the shared memory.
You can make exactly this argument the other way around:
If implemented with processes it becomes quite messy very fast. You end up with a bunch of processes that do whatever the hell they want, and need some complex orchestrator script to tell it how many processes you want, and need some notifications to the orchestrator to increase or decrease the number of workers
With threads, you can have a main thread that reads the socket and places messages in a shared queue, and if you want more/less workers you just spawn worker threads to read from the queue.
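A hedged sketch of that threaded version (using std's channel behind a mutex; a multi-consumer channel crate such as crossbeam would be more idiomatic):

```rust
use std::sync::mpsc;
use std::sync::{Arc, Mutex};
use std::thread;

fn main() {
    let (tx, rx) = mpsc::channel::<u32>();
    // std's Receiver is single-consumer, so share it behind a Mutex.
    let rx = Arc::new(Mutex::new(rx));

    // Spawning more or fewer workers is just changing this range.
    let workers: Vec<_> = (0..4)
        .map(|id| {
            let rx = Arc::clone(&rx);
            thread::spawn(move || loop {
                // Hold the lock only long enough to pull one message.
                let msg = rx.lock().unwrap().recv();
                match msg {
                    Ok(n) => println!("worker {} got {}", id, n),
                    Err(_) => break, // channel closed: producer is done
                }
            })
        })
        .collect();

    // The "main thread reads the socket" part, stubbed as a counter.
    for n in 0..16 {
        tx.send(n).unwrap();
    }
    drop(tx); // close the channel so workers exit

    for w in workers {
        w.join().unwrap();
    }
}
```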
> if you want more/less workers you just spawn worker threads to read from the queue.
My point is precisely that's its not trivial to do that.
You now need some signal mechanism to tell your main process to spawn or kill workers. That's an additional layer of complexity.
Things get even worse if you want to add new types of workers that were unplanned.
Say I want a new type of worker that is just a forwarder, or a new type of worker that stores the queue on disk for replaying it.
With a monolithic thread based design, you now need to stop your whole main and worker threads and start a newly built process that supports these workers.
With a process based design, you just spawn a new worker process and point it to the existing shared memory without any interruption.
Some languages make it very simple to do parallel operations. Java and Scala parallel streams, for example, are like one line of code. Obviously that won't work for all use cases, but when it does, it is simple.
Heavily multithreaded code is difficult to write correctly. Do it wrong and you wind up with race conditions, data corruption, deadlocks because a thread pool or other resource is exhausted, thread leaks because you didn't shut something down correctly. The problems go on and on.
Use a map-reduce style pattern for your work allocation and you lose some small amount of efficiency, but the design becomes much much easier. You can even mix and match different types of work in your reduce stage to keep the different types of hardware busy.
In what way is process based parallelism strictly worse? On Linux a process and a thread in the kernel are both the same thing. The main difference between the two is that threads share memory by default, whereas a process would need to explicitly mmap a chunk of memory to share with another process.

This means that with threads you get to save RAM because application code doesn't have to take up more space (except memory is effectively deduplicated already because those are read-only blocks), and with threads you have to explicitly guard against trampling over each other's memory, whereas with processes it's safety by default and you have to make an explicit choice to share memory, limiting the number of places you can forget to add or check a lock.

I will grant you that languages whose standard libraries by default shoehorn you into using sockets to communicate between processes do introduce more overhead than just using threads. But any serious language will have at least one queue implementation that's based on very fast primitives over shared virtual memory.

I say this as someone who's done projects using a number of different types of parallelism, including process and thread, and can find strengths for all of these. I just think the dismissive treatment of process based parallelism is undeserved: it can be very efficient if done well, and no worse than a lot of the other methods for a large number of use cases.
I do not think your model of threads and processes is correct. Processes have different address spaces whereas threads share an address space. Context switching between threads is much cheaper than context switching between processes because you do not have to swap page tables and do a tlb flush. tlb flushes are extremely expensive. I also think you are misunderstanding how mmap works. mmap is not related to thread spawning.
If you stick to one process per core, the number of TLB flushes doesn't change. You can set processor affinity to make sure of that. If you create more threads/processes than cores, you might be able to get measurable impact.
I don't understand your comment about mmap. It is often used to share memory between related processes.
Not the parent, but... mmap is related to thread spawning (well, process forking) in that using the MAP_SHARED flag will result in a pointer that is valid shared memory for both the parent process and any forked processes.
As pointed out in a sibling, processes have their own address spaces and aren’t as cheap to spawn as threads. I write code involving shared memory. It’s usable, but it also a pretty big pain to get right. It also significantly complicates things like managing memory ownership.
I was really struck by a comment Jonathan Blow made on stream recently: he said he's never written a parallel for loop in his whole career. I seem to recall the implication being that they're often not really necessary for performant code. There's also been some discussion lately about issues with asynchronous code both in Rust and Python. Point being that parallelism still has a ways to go before it's proven its usefulness. However, I agree with you that it would be nice to see more language tooling to make it simpler, since I work on some bits of code that I think could benefit from parallelization, but the amount of work I'd have to put in means it's a very low priority given the savings.
I never wrote a parallel for-loop in 15 years working on Firefox, because it's hard in C++, it's risky and difficult to maintain the thread-safety invariants, and it's not all that useful in most parts of the browser.
I write them quite often in Rust, because Rayon makes it super easy, there is almost no risk because the compiler checks the relevant thread-safety invariants, and I'm working on different problems where data parallelism is much more useful.
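For anyone who hasn't seen it, the change is usually a one-word adapter swap; a minimal sketch (not code from any real project):

    use rayon::prelude::*;

    fn main() {
        let mut pixels: Vec<u64> = (0..1_000_000).collect();
        // A parallel for loop: same body as the serial `iter_mut()`
        // version, but spread across Rayon's thread pool. The compiler
        // verifies the closure is safe to run from multiple threads.
        pixels.par_iter_mut().for_each(|p| *p = p.wrapping_mul(*p));
        println!("{}", pixels[10]);
    }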
I've used them extensively in C++. Doing it manually by managing your own threads is a pain, but simple OpenMP-based parallel loops work really well, and also support tasks like building vectors and simple reductions.
When your loop body uses complex library APIs over complex data it's still hard to be confident in C++ that everything's threadsafe and you're avoiding data races.
Maybe it's not so hard if you're in a domain like HPC where the libraries you use are designed specifically to be used with data parallelism. But when you're pulling together code from different sources that may or may not have been used in an aggressively parallel application before...
I think it's less about libraries and more about the general approach to programming.
In the HPC world, software is usually doing one thing at a time. Most of the time it's either single-threaded, or there are multiple threads doing the same thing for independent chunks of data. There may be shared immutable data and private mutable data but very little shared mutable data. You avoid situations where the behavior of a thread depends on what the other threads are doing. Ideally, there is a single critical section doing simple things in a single place, which should make thread-safety immediately obvious.
You try to avoid being clever. You avoid complex control flows. You avoid the weird middle ground where things are not obviously thread-safe and not obviously unsafe. If you are unsure about an external library, you spend more time familiarizing yourself with it or you only use it in single-threaded contexts. Or you throw it away and reinvent the wheel.
If the APIs that you're interacting with are side-effect free then it's easy. If they are full of side effects, then they aren't written with multithreading in mind and you wouldn't be able to even compile it in Rust. C++ just takes off the training wheels.
It's a bit more complicated than that, because code can be thread-safe but not side-effect-free, but basically you're just restating what I said. C++ makes it hard to be sure code is really safe to use across threads, which means in practice developers should be more reluctant to do so.
The world is full of highly parallel programs getting useful work done. Most graphics, AI and compression libraries (picking 3 easy examples I've worked on) parallelize well, and can usually make use of all the cores you can throw at them.
Jonathan Blow makes good games, but chooses not to make particularly CPU intensive ones. That's fine, but that's also his choice.
He's also currently building one of the fastest compilers around. It's hard to believe he has never encountered use cases where parallelism makes sense.
Indeed, he wasn’t saying parallelism is not useful, just that the specific construct of a parallel for loop was not in his wheelhouse for certain reasons.
My impression of Jon's work is that he requires low enough level access to his hardware so he's the one that makes decisions about where and what runs. Language level parallel for is definitely not that. :D
Parallelism and asynchronous code are not the same, and in the case of Rust they are very much not the same. Parallel for provides massive advantages for many things including game programming (from experience) so with all due respect I think this says more about Jonathan Blow than it does anything about "parallelism still needing to prove itself."
I wrote a parallel iteration (map-reduce) last week in some CPU-heavy code, took 5 minutes with Rayon. Sped my code up by around 10x on a 12-core machine, example benchmark going from 7 seconds to 700 milliseconds. It's serious business.
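The shape of it is roughly this (a sketch of the pattern, not the actual code; `score` is a made-up stand-in for the heavy work):

    use rayon::prelude::*;

    // Hypothetical stand-in for the CPU-heavy per-item computation.
    fn score(x: &u64) -> u64 {
        (0..1_000).fold(*x, |acc, i| acc.wrapping_mul(31).wrapping_add(i))
    }

    fn main() {
        let items: Vec<u64> = (0..1_000_000).collect();
        // Map-reduce: swap `iter()` for `par_iter()` and both the map
        // and the sum (the reduce) run across all cores.
        let total: u64 = items.par_iter().map(score).sum();
        println!("{}", total);
    }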
It really depends on your definition of terms. What do you call "performance optimized"?
For example, I consider glyph drawing "performance optimized". It requires massive parallelism just to display text smoothly on a high-definition screen.
But most people will never see that, because the library they call does all the work for them, and they never need to care about it.
The difference is tremendous. We are talking 100x more efficiency just from using GPUs alone. You can get 1000x or 10,000x with hardware-acceleration parallelism (electronic chip design), at the price of increased cost and rigidity, and longer time to market.
It is so big that it is a different level. It is not about performance alone; some things are so inefficient that they are simply not practical (like spending a million dollars on your energy bill in order to solve one problem).
The same happens, of course, with 3D, audio or video recognition, sensor I/O, and artificial intelligence.
Rust lets you prototype lots of code in a parallel way on the CPU, even for things that will run on an FPGA or ASIC in the future. It lets you transition in smaller steps: CPU -> GPU -> FPGA -> ASIC.
> As an observation, performance optimized code is almost always effectively single-threaded these days, even when using all the cores on a CPU to very efficiently process workloads.
Why?
Edit: Thanks for all the replies. It seems this applies to data-parallel workloads only. I'd use a GPU for those. An RTX 3090 has around 10,000 CUDA cores (10,000 simultaneous operations) vs. just ~10 cores for a CPU.
Data locality is everything for computational throughput. Having all data private to a single core is extraordinarily efficient compared to sharing data, and particularly mutable data, across cores.
This creates a new problem: how do you balance load across cores? What if the workload is not evenly distributed across the data held by each core? Real workloads are like this! Fortunately, over the last decade, architectures and techniques for dynamic cross-core load shedding have become smooth and efficient while introducing negligible additional inter-core coordination. At this point, it is a mature way of designing extremely high throughput software.
Can you point to any references/resources summarizing the latest cross-core dynamic load shedding techniques? Are they old techniques just now being applied in practice, or has something new been proposed?
Sometimes your job has few or no inter-task dependencies and so there's no need to share between threads, but there's a heck of a lot of work that needs to be completed.
There's a practical question of whether this is true for realistically encountered problems, with a sufficient threshold on both the size and utility of a "significant task" and with realistic numbers of cores.
Without a requirement of utility it's easy to come up with counterexamples from math, e.g. "does the Collatz sequence starting from Graham's number reach one?" Once you've exhausted the cores that can be used for the actual arithmetic, you are still gated by the decision-making at each step, so cores cannot work too far "ahead" of each other. There may well be much smarter things we can do than brute force, but that's not "just coding work" at that point.
Theoretically, it doesn't hold - at some point you have split apart everything that can be split, and you are left with some essential chains of data dependency that cannot be further parallelized.
I think the issue is that memory is the bottleneck in many applications (i.e. 2 loads per cycle, despite many more functional units), and those workloads tend to be very non-embarrassingly parallel.
Functional programming (especially, say, actor systems) is better for organizing mental models of concurrency when your concurrency is coupled with communication between the components. For HPC, you're typically optimizing for Gustafson scaling (versus Amdahl scaling), where you are running multiple copies of the same computationally costly, linearly organized code with no coupling between instances except statistical aggregation of results, so there is no particular benefit to functional-style concurrency.
(And some FPLs, like Julia, are perfectly good at HPC anyway.)
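(For reference: Amdahl scaling fixes the problem size, so with parallel fraction p the speedup on N cores is bounded by 1 / ((1 - p) + p/N); Gustafson scaling grows the problem with the machine, giving speedup (1 - p) + pN. As above, HPC workloads typically sit in the second regime.)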
FWIW, most supercomputing looks nothing like map-reduce; only the most trivial problems look like that. In a data model sense, a lot of supercomputing is join-operation intensive, hence why they spend big bucks on high-bandwidth, low-latency interconnects. STREAM benchmarks were more predictive of real-world supercomputing code performance than LAPACK in the majority of cases 15+ years ago, and it became even more skewed toward the former with time.
The codes I worked on were complex graph analysis, spatiotemporal behavioral analysis, a bit of geospatial environmental modeling, and in prehistoric times thermal and mass transport modeling. These codes (pretty much anything involving reality) are intrinsically tightly coupled across compute nodes. Low-latency interconnects eventually gave way to latency-hiding software architectures but at no point did we use map-reduce as that would have been insanely inefficient given the sparsity and unpredictability of the interactions between nodes.
These were the prototype software architectures for later high-performance databases. Every core is handling thousands or millions of independent shards of the larger computational model, which makes latency-hiding particularly efficient.
This is similar to my experience too. If people can write out a single Python function and apply it to all of a large amount of data then great, but that isn’t the majority of supercomputing programming.
I've done this across 10,000 cores, and helped people do similar tasks as well, so I guess my experience is not normal. But IIRC, even for things that use, say, MPI to coordinate cores in the small and are repeated across many cores, you want to limit the spread across cores and limit your blocking coordination, lest you wind up with the old joke that "HPC is the art of turning a CPU-bound job into an I/O-bound job".
I have nothing to add, just wanted to say that reading all your comments this morning has been fascinating and educational. I've really enjoyed it, so thanks for sharing. This kind of expert insight is one of the reasons I visit HN.
When we talk about support for threading/concurrent programming in programming languages, it is less about how best to reach the theoretical limits of your system, especially if you are free to architect the whole software stack toward that goal. In that case, your statements might apply.
It is about how easily a programmer who deals with a certain subtask in a system can utilize more cores for that task. I'm not talking about supercomputing, but about a smartphone or a typical PC. There, most cores usually sit idle, but when the user triggers an action, you want to be able to use as many cores as will speed up the computation. Language support for parallelism makes a huge difference there. In Go I can write a function to do a certain computation, and quite often it is trivial to spread several calls across goroutines.
You are not factoring in the cost of context switches, and that many user applications today are memory-bound and not CPU-bound.
It's one of the secrets exploited by the M1 chip, seen in how many more cache lines the CPU's LFB can fill concurrently compared to Intel chips, and in the fact that these are now 128-byte cache lines instead of 64-byte cache lines.
Which context switches? With the Go model, I have exactly one thread per CPU, no context switches. And if you are memory-bound, why have more CPUs?
But sure, there is a reason the M1 has such stellar performance: it has some of the fastest single-thread performance around, and many applications do not manage to load more than 4 cores for common tasks - which is partially a consequence of that being difficult in many programming languages and easy only in a few, which are slowly gaining traction.
> Which context switches? With the Go model, I have exactly one thread per CPU, no context switches.
Not in the user application model you were describing. Those threads would need to coordinate and communicate (for example, back to the user interface), and that implies context switches.
However, for independent processes, each additional CPU adds memory bandwidth (according to the NUMA model) because there's a concurrency limit to each CPU's LFB that puts an upper bound of 6 GB/s on filling cache lines for cache misses (even if the bandwidth of your memory system is actually much higher): https://www.eidos.ic.i.u-tokyo.ac.jp/~tau/lecture/parallel_d...
Looks like you're compiling C code with -O2. Does Rust build set -O3 on clang? Did you try -O3 with C? I know it's not guaranteed to be faster, just curious.
I completely agree with the points made here; it matches my experience as a C coder who went all-in on Rust.
>"Clever" memory use is frowned upon in Rust. In C, anything goes. For example, in C I'd be tempted to reuse a buffer allocated for one purpose for another purpose later (a technique known as HEARTBLEED).
Ha!
>It's convenient to have fixed-size buffers for variable-size data (e.g. PATH_MAX) to avoid (re)allocation of growing buffers. Idiomatic Rust still gives a lot of control over memory allocation, and can do basics like memory pools, combining multiple allocations into one, preallocating space, etc., but in general it steers users towards "boring" use of memory.
Since I write a lot of memory-constrained embedded code this actually annoyed me a bit with Rust, but then I discovered the smallvec crate: https://docs.rs/smallvec/1.5.0/smallvec/
Basically, with it you can give your vectors a static size (not on the heap), and it will automatically reallocate on the heap if it grows beyond that bound. It's the best of both worlds in my opinion: it lets you remove a whole lot of small useless allocs, but you still have all the convenience and API of a normal Vec. It might also help slightly with performance by removing useless indirections.
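For illustration, a minimal sketch (the inline size is arbitrary):

    use smallvec::{smallvec, SmallVec};

    fn main() {
        // Up to 8 elements live inline in the struct itself; pushing
        // past that spills to a normal heap allocation automatically.
        let mut buf: SmallVec<[u32; 8]> = smallvec![1, 2, 3];
        buf.push(4);
        assert!(!buf.spilled());
        buf.extend(5..20);
        assert!(buf.spilled());
    }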
Unfortunately this doesn't help with Strings since they're a distinct type. There is a smallstring crate which uses the same optimization technique but it hasn't been updated in 4 years so I haven't dared use it.
In case somebody stumbles upon this conversation in the future: I just migrated a project to use smartstring, and it works a bit differently from smallvec. Smallvec lets you decide how big you want the static buffer to be before it allocates, whereas smartstring's static buffer size is always `size_of::<String>() - 1`, that is, 23 bytes on 64-bit architectures and 11 on 32-bit. If I want, say, a static 128-byte string, smartstring won't do any better than std::string::String.
It's still a very nice lib, and a smart optimization, but it doesn't cover all of my use cases for small string buffers.
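A quick sketch of that size behavior (relying on smartstring's is_inline helper):

    use std::mem::size_of;
    use smartstring::{LazyCompact, SmartString};

    fn main() {
        // Same footprint as std String; 23 inline bytes on 64-bit.
        assert_eq!(size_of::<SmartString<LazyCompact>>(), size_of::<String>());
        let short: SmartString<LazyCompact> = "23 bytes or fewer".into();
        let long: SmartString<LazyCompact> =
            "definitely more than twenty-three bytes of text".into();
        assert!(short.is_inline());
        assert!(!long.is_inline());
    }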
This entire article is nonsense. To a first approximation, the speed of your program in 2021 is determined by locality of memory access and overhead with regard to allocation and deallocation. C allows you to do bulk memory operations, Rust does not (unless you turn off the things about Rust that everyone says are good). Thus C is tremendously faster.
There is this habit in both academia and industry where people say "as fast as C" and justify this by comparing to a tremendously slow C program, but don't even know they are doing it. It's the blind leading the blind.
The question you should be asking yourself is, "If all these claims I keep seeing about X being as fast as Y are true, then why does software keep getting slower over time?"
(If you don't get what I am saying here, it might help to know that performance programmers consider malloc to be tremendously slow and don't use it except at startup or in cases when it is amortized by a factor of 1000 or more).
> To a first approximation, the speed of your program in 2021 is determined by locality of memory access and overhead with regard to allocation and deallocation.
I wouldn't call that a first approximation. Take ripgrep as an example. In a checkout of the Linux kernel with everything in my page cache:
    $ time rg zqzqzqzq -j1
    real 0.609
    user 0.315
    sys 0.286
    maxmem 7 MB
    faults 0

    $ time rg zqzqzqzq -j8
    real 0.116
    user 0.381
    sys 0.464
    maxmem 9 MB
    faults 0
This alone, to me, says "to a first approximation, the speed of your program in 2021 is determined by the number of cores it uses" would be better than your statement. But I wouldn't even say that. Because performance is complicated and it's difficult to generalize.
Using Rust made it a lot easier to parallelize ripgrep.
> C allows you to do bulk memory operations, Rust does not (unless you turn off the things about Rust that everyone says are good). Thus C is tremendously faster.
Talk about nonsense. I do bulk memory operations in Rust all the time. Amortizing allocation is exceptionally common in Rust. And it doesn't turn off anything. It's used in ripgrep in several places.
> There is this habit in both academia and industry where people say "as fast as C" and justify this by comparing to a tremendously slow C program, but don't even know they are doing it. It's the blind leading the blind.
I've never heard anyone refer to GNU grep as a "tremendously slow C program."
> The question you should be asking yourself is, "If all these claims I keep seeing about X being as fast as Y are true, then why does software keep getting slower over time?"
There are many possible answers to this. The question itself is so general that I don't know how to glean much, if anything, useful from it.
> This alone, to me, says "to a first approximation, the speed of your program in 2021 is determined by the number of cores it uses" would be better than your statement. But I wouldn't even say that.
You chose an embarrassingly parallel problem, which most programs are not. So you cannot generalize this example across most software. When you try to parallelize a structurally complicated algorithm, the biggest issue is contention. I was leaving this out because it really is a 2nd-order problem -- most software today would get faster if you just cleaned up its memory usage than if you just tried to parallelize it. (Of course it'd get even faster if you did both, but memory is the first-order term.)
> There are many possible answers to this.
How come so few people are concerned with the answers to that question and which are true, but so many people are concerned with making performance claims?
Well, I mean, you chose an embarrassingly general statement to make? Play stupid games, win stupid prizes.
> which most programs are not
Programs? Or problems? Who says? It's not at all obvious to me that it's true. And even if it were true, "embarrassingly parallel" problems are nowhere close to uncommon.
> When you try to parallelize a structurally complicated algorithm, the biggest issue is contention.
With respect to performance, I agree.
> How come so few people are concerned with the answers to that question and which are true, but so many people are concerned with making performance claims?
The question is itself flawed. Technology isn't fixed. We "advance" and try to do more stuff. This is not me saying, "this explains everything." Or even that "more stuff" is a good thing. This is me saying, "there's more to it than your over-simplifications."
If you do not understand that "embarrassingly parallel" is a technical term and that it's generally understood that most programs are not easily parallelizable, there is not a discussion we can have here.
Really feels like you're just digging yourself deeper into a hole here:
Burntsushi began with "here, parallelization is beating out memory locality and optimization in its impact," but explicitly declined to generalize this the way you generalized your claim about memory.
He further pointed out that ripgrep is fast not just because of parallelization, but also because of how it handles memory.
Then you come back with "you can't always parallelize this well" (which burntsushi agreed with from the beginning) and "you also need to deal with memory" (which ripgrep does)? How is this burntsushi's problem with understanding "embarrassingly parallel" and not your problem with understanding Rust?
I agree that a discussion is difficult. Your comments are so vague and generalized that it's not clear what you're talking about at all. Bring something more specific to the table like the OP did instead of pontificating on generalities.
Thanks for making The Witness, Jonathan. It's one of my favorite games of all time and an exemplar of what it means to work through the consequences of logical axioms.
Makes me all the more sad that you're consistently unable to work through the consequences of Rust's axioms.
I don't disagree that memory access is nowadays critical for speed, but I haven't found Rust standing in the way of optimizing it.
As I've pointed out in the article, Rust does give you precise control over memory layout. Heap allocations are explicit and optional. In safe code. You don't even need to avoid any nice features (e.g. closures and iterators can be entirely on stack, no allocations needed).
Move semantics enables `memcpy`ing objects anywhere, so they don't have a permanent address, and don't need to be allocated individually.
In this regard Rust is different from e.g. Swift and Go, which claim to have C-like speed, but will autobox objects for you.
Bulk operations are not really about layout, they are about whether you mentally consider each little data structure to be an individual entity with its own lifetime, or not, because this determines what the code looks like, which determines how fast it is. (Though layout does help with regard to cache hits and so forth).
I don't know what you're trying to imply that Rust does, but I'll reiterate that Rust lifetimes don't exist at code generation time. They're not a runtime construct, they have zero influence over what code does at run time (e.g. mrustc compiler doesn't implement lifetimes, but bootstraps the whole Rust compiler just fine).
If you create `Vec<Object>` in Rust, then all objects will be allocated and laid out together as one contiguous chunk of memory, same as `malloc(sizeof(struct object) * n)` in C. You can also use `[Object; N]` or ArrayVec, which is identical to `struct object arr[N]`. It's also possible to use memory pools/arenas.
And where possible, LLVM will autovectorize operations on these too. Even if you use an iterator that in source code looks like it's operating on individual elements.
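A sketch of what that looks like:

    struct Object {
        x: f32,
        y: f32,
    }

    // One contiguous allocation; the iterator looks element-by-element
    // in source, but LLVM is free to autovectorize the loop.
    fn total_x(objects: &[Object]) -> f32 {
        objects.iter().map(|o| o.x).sum()
    }

    fn main() {
        let objects: Vec<Object> =
            (0..1024).map(|i| Object { x: i as f32, y: 0.0 }).collect();
        println!("{}", total_x(&objects));
    }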
Knowing your other work I guess you mean SoA vs AoS? Rust doesn't have built-in syntax for these, but neither does C that we're talking about here.
> They're not a runtime construct, they have zero influence over what code does at run time (e.g. mrustc compiler doesn't implement lifetimes, but bootstraps the whole Rust compiler just fine).
This kind of reasoning seems like it makes sense, but actually it is false. ("Modern C++" people make the same arguments when arguing that you should use "zero-cost abstractions" all over the place). Abstractions determine how people write code, and the way they write the code determines the performance of the code.
When you conceptualize a bunch of stuff as different objects with different lifetimes, you are going to write code treating stuff as different objects with different lifetimes. That is slow.
> If you create `Vec<Object>` in Rust, then all objects will be allocated and laid out together as one contiguous chunk of memory
Sure, and that covers a small percentage of the use cases I am talking about, but not most of them.
> When you conceptualize a bunch of stuff as different objects with different lifetimes, you are going to write code treating stuff as different objects with different lifetimes. That is slow.
This is not how lifetimes work at all. In fact this sounds like the sort of thing someone who has never read or written anything using lifetimes would say: even the most basic applications of lifetimes go beyond this.
Fundamentally, any particular lifetime variable (the 'a syntax) erases the distinctions between individual objects. Rust doesn't even have syntax for the lifetime of any individual object. Research in this area tends to use the term "region" rather than "lifetime" for this reason.
Lifetimes actually fit in quite nicely with the sorts of things programs do to optimize memory locality and allocations.
> Sure, and that covers a small percentage of the use cases I am talking about, but not most of them.
Fortunately the other stuff you are talking about works just fine in Rust as well.
Rust's flavor of RAII is different from C++'s, because Rust doesn't have constructors, operator new, implicit copy constructors, and doesn't expose moved-out-of state.
Rust also has "Copy" types which by definition can be trivially created and can't have destructors. Collections take advantage of that (e.g. dropping an array doesn't run any code).
So I don't really get what you mean. Rust's RAII can be compiled to plain C code (in fact, mrustc does exactly that). It's just `struct Foo foo = {}` followed by optional user-defined `bye_bye(&foo)` after its last use (note: it's not free/delete, memory allocator doesn't have to be involved at all).
I suspect you're talking about some wider programming patterns and best practices, but I don't see how that relates to C. If you don't need per-object init()/deinit(), then for the same you wouldn't use RAII in Rust either. RAII is an opt-in pattern.
RAII is completely orthogonal to lifetimes, for one thing. You can have either without the other.
But, I am familiar with the kind of thing you're complaining about here, and frankly the mere existence of RAII is not its cause. Working with a large dataset, managing allocation/layout/traversal in a holistic way, you just... don't write destructors for every tiny piece. It works fine, I do it all the time (in both Rust and C++).
You haven't really explained in any detail what is slow about "treating stuff as objects with different lifetimes", and specifically how Rust differs there from C. Can you give an example?
Maybe you'd be interested to hear that Rust's borrow checker is very friendly to the ECS pattern, and works with ECS much better than with the classic OOP "Player extends Entity" approach.
> (If you don't get what I am saying here, it might help to know that performance programmers consider malloc to be tremendously slow and don't use it except at startup or in cases when it is amortized by a factor of 1000 or more).
Rust is now getting support for custom local allocators a la C++, including in default core types like Box<>, Vec<>, and HashMap<>. It's an unstable feature, hence not yet part of stable Rust, but it's absolutely being worked on.
I guess I am confused by the question. The job of the borrow checker is to constrain what you are allowed to do, and it's well-understood that it constrains you to a subset of correct programs, so that you stay in a realm that is analyzable.
Sure, but the borrow checker only operates on references. Rust gives you the tools to work with raw everything, if you dip into unsafe. Memory allocators, doing this kind of low-level thing, don't work with references. Let's say you want to implement a global allocator (this is the only current API in stable Rust, non-global allocators are on the way). The trait you use, which gives you the equivalent of malloc/free, has this signature:
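    // std::alloc::GlobalAlloc (Layout describes size + alignment)
    pub unsafe trait GlobalAlloc {
        unsafe fn alloc(&self, layout: Layout) -> *mut u8;
        unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout);
        // ...plus provided methods like alloc_zeroed and realloc.
    }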
Note the *mut u8 rather than say, &mut u8. Most people would not be using this interface directly, they'd be using a data structure that uses it internally.
Now, there's a good argument to be had about safe and unsafe, how much you need, in what proportion, and in what kinds of programs... but when you say things like "C allows you to do bulk memory operations, Rust does not" and ask about the borrow checker when talking about allocators, to someone who is familiar with Rust's details, it seems like you are misinformed somehow, which makes it really hard to engage constructively with what you're saying.
I'll try to further bridge some of the understanding gap.
People in this thread keep talking about "arena allocators" as if they are special things that you would use a few times (Grep Guy said this above, for example), or, here you imply they would be used internally to data structures, in a way that doesn't reach out to user-level.
That makes them not nearly as useful as they can be!
The game we are working on is currently 100k lines (150k lines if you count comments etc), and almost all allocations in the entire program are of this bulk type, in one way or another. Like if I want to copy a string to modify it a little bit to then look up a bitmap or make a filename, those are temporary allocations at user level and are not wrapped by anything. The string type is not some weird heavyweight C++ std::string kind of thing that wraps a bunch of functionality, it is just a length and a data pointer.
So the proposal to use unsafe in this kind of context doesn't make sense, since then you are putting unsafe everywhere in the program, which, then, why pretend you are checking things?
You can say, "well you as the end-user shouldn't be doing this stuff, everything should be wrapped in structures that were written by someone smarter than you I guess," but that is just not the model of programming that I am doing.
I understand how you can think the statement "Rust does not (allow you to do bulk memory operations)" is false, but when I say this, part of what I am including in "bulk memory operations" is the ability (as the end user) to pretend like you are in a garbage-collected language and not worry about the lifetime of your data, without having to take the performance penalty of using a garbage-collected language. So if you add back in worrying about lifetimes, it's not the same thing.
If you think "bulk memory allocation" is like, I have this big data structure that manages some API and it has some linked lists, and instead of allocating those nodes on the heap I get them from an arena or pool managed by the bigger structure ... that's fine, it is better than not doing it, but it doesn't help the end user write simpler code, and in practical terms it means that most of the allocations in the program are going to be non-bulk, because there's just too much friction on doing them broadly.
If it helps, I can revise my statement to "Rust enables you to do certain kinds of internal bulk memory allocation, but using bulk allocation broadly and freely across your program goes against the core spirit of the language" ... that sounds pretty uncontroversial? Then to bring it back to the original post, I would say, "This kind of broad use of bulk allocation is important for high performance and simplicity of the resulting code."
One last note, I am pretty tired of the "you don't understand Rust, therefore you are beneath us" line that everyone in the Rust community seems to deploy with even the slightest provocation -- not just when responding to me, but to anyone who doesn't just love Rust from top to bottom. Really it makes me feel that the only useful thing to do is just ignore Rust folks and go do useful things instead. I know I am not the only person who feels this way.
> People in this thread keep talking about "arena allocators" as if they are special things that you would use a few times (Grep Guy said this above, for example)
I didn't say anything about arena allocators. What I said was that amortizing allocation was routine and commonplace in ripgrep's code. I definitely wouldn't say that amortizing allocation is "special" or something I use a "few" times. As one example of amortizing allocs, it's very common to ask the caller for some memory instead of allocating memory yourself.
> I am pretty tired of the "you don't understand Rust, therefore you are beneath us" line that everyone in the Rust community seems to deploy with even the slightest provocation
Kind of like opening a comment with "This entire article is nonsense." Right? Snubbing your nose and then getting miffed by the perception of others snubbing their nose at you is a bunch of shenanigans. And then you snub your nose at pretty much everyone: "It's the blind leading the blind." I mean, c'mon dude.
The problem with your comments is that they lack specifics. Even after this comment where you've tried to explain, it's pretty hard for me to understand what you're getting at. I suspect part of the problem is your use of this term "bulk allocation." Is it jargon that refers to the specific pattern you have in mind? Because if it is, I can't find it documented anywhere after a quick search. If it's not jargon, then "bulk allocation" could mean a lot of things, but you clearly have a very specific variant of it in mind.
It's clear to me that your argument is a very subtle one that requires nuance and probably lots of code examples to get the point across. Going about this at the other end---with lots of generalities and presumptions---just seems like a fool's errand.
This style of memory management can be as pervasive as you like! You are reading way more detail out of people's comments than they put there, and then getting upset about your misinterpretation.
If every throwaway string in your program comes from an arena that you clear later, great! Rust won't stop you, or even force you to use unsafe every time you build one. The unsafe code goes in a "give me a fresh chunk of temporary memory" function, and that function is safe to call all over the place: unsafe-in-a-safe-function is a common pattern for extending the set of analyzable programs.
(It's also worth pointing out that Rust's primitive string type is "just a length and a data pointer," so once you've allocated one out of an arena like this, you can do all the nice built-in string-y things with it, with no std::string-like interference.)
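Concretely, a sketch (using bumpalo's safe string API as the stand-in for that function; texture_path is made up):

    use bumpalo::Bump; // needs bumpalo's "collections" feature

    // Safe to call everywhere; the unsafe lives inside the arena
    // implementation, written and reviewed once.
    fn texture_path<'a>(arena: &'a Bump, name: &str) -> &'a str {
        bumpalo::format!(in arena, "textures/{}.bmp", name).into_bump_str()
    }

    fn main() {
        let mut arena = Bump::new();
        for frame in 0..3 {
            let path = texture_path(&arena, "grass");
            println!("frame {}: {}", frame, path);
            arena.reset(); // borrows like `path` can't survive this line
        }
    }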
The Rust compiler itself uses this sort of bulk memory all the time. It's not limited to the internals of data structures there- it's spread across larger phases and queries of its operation, with all kinds of stuff allocated the same way.
Now, to be fair, this is not the default- e.g. Rust's standard library of collections don't participate. But this is why everyone keeps mentioning custom allocators to you- there is ongoing work to extend these collections with the ability to control how they perform their allocation!
> One last note, I am pretty tired of the "you don't understand Rust, therefore you are beneath us" line that everyone in the Rust community seems to deploy with even the slightest provocation -- not just when responding to me, but to anyone who doesn't just love Rust from top to bottom.
You would get this kind of reaction a lot less often if you didn't make vague or nonsense claims about it so often.
Okay, but if I do this everywhere, then I de facto don't have memory safety. Why, then should I use Rust and pretend like I am getting memory safety? Why wouldn't I use a lower-friction language with a faster compiler?
It looks to me like the Rust community has this weird way of wanting to have its cake, and eat it too, about memory. Y'all want to advertise how important memory safety is, how great it is to have, and so forth. Then in cases like this, it's always "oh but you just use unsafe, it's fine". These stories are mutually inconsistent. Either you have memory safety or you don't. Paying the cost that Rust makes programmers pay for memory safety, and then not actually getting memory safety, is the worst of both worlds.
Then when you guys say I am making nonsense claims because of course you can have your cake and also eat it as long as you use the Rust programming language, well, it's just pretty weird at that point.
Memory safety is not some sort of binary thing which you either have or you don't. All memory safe environments are built on a foundation of unsafe code. For example, Java being memory safe assumes the JVM or JNI code doesn't have any memory safety bugs.
What Rust does is reduce the amount of code that's memory unsafe, that needs to be triply reviewed and audited. Reduction of the scope of high-scrutiny code is the single most leveraged thing that can be done to improve code quality in a large, long-running project. Why? Because it lets you do careful, time-consuming analysis on a small part of your codebase (the bits that are marked unsafe), then scale the analysis up to the rest of your code.
> These stories are mutually inconsistent. Either you have memory safety or you don't.
This is... what can I say. This is simply incorrect. It pains me to say this as a fan of your games but you really don't seem to have any idea what you're talking about.
> Okay, but if I do this everywhere, then I de facto don't have memory safety.
No, that's not how this works. You write the unsafe code in one place and make sure it's correct (just like you'd do in C or Jai), and then you wrap it in a function signature that lets the compiler apply its memory safety checks to all the places that call it (this is what Rust gives you over C).
This is still a meaningful improvement to memory safety over C. The compiler checks the majority of your program; if you still see memory bugs now you only have a small subset to think about when debugging them.
This is also not very different from a hypothetical language with your "actual" memory safety- in that case, you still have to consider the correctness of the compiler checks themselves. Rust just takes a more balanced and flexible approach here and moves some of that stuff out of the compiler. (In fact this simplifies the compiler, which increases confidence in its correctness...)
Rust has been very clear about all this from the beginning. If you are still reading people's claims about Rust memory safety a different way, that's on you.
> Why wouldn't I use a lower-friction language with a faster compiler?
That's totally up to you! I don't have a problem with people using other languages for these kinds of reasons. My goal here is not to convert you, but to cover more accurately what Rust is and isn't capable of. (At the root of this thread, that's things like "writing fast software with effective memory management styles.")
They are the opposite of meaningless. This is just straight-up incorrect, both in theory and in practice.
Please take some time and think about this a bit more. Please think about how code review processes work, how audits work, how human attention spans work. Please think about how people endlessly nitpick small PRs but accept large ones with few comments. What unsafe does is make it easy to spot the small bits of critical code to nitpick while not having to worry about safety for the rest.
They're conditionally meaningful: if a small amount of your program is correct, the entire program satisfies some useful properties.
This may or may not be something you care about, but it is certainly a meaningful tool that is quite useful to me, including when I use your type (3) style described in the sibling thread.
> here you imply they would be used internally to data structures, in a way that doesn't reach out to user-level.
Ah! I think I am understanding you a bit better. The thing is, ultimately, Rust is as flexible as you want it to be, and so there are a variety of options. This can make it tricky, when folks are talking about slightly different things, in slightly different contexts.
When you say "doesn't reach out to user level," what I mean by what I said was that users don't generally call alloc and dealloc directly. Here, let's move to an actual concrete example so that it's more clear. Code is better than words, often:
    use bumpalo::{Bump, boxed::Box};

    struct Point {
        x: i32,
        y: i32,
    }

    fn main() {
        let bump = Bump::with_capacity(256);
        let c = Box::new_in(Point { x: 5, y: 6 }, &bump);
    }
This is using "bumpalo", a very straightforward bump allocator. As a user, I say "hey, I want an arena backed by 256 bytes. Please allocate this Point into it, and give me a pointer to it." "c" here is now a pointer into this little heap it's managing. Because my points are eight bytes in size, I could fit 32 points here. Nothing will be deallocated until bump goes out of scope.
But notably, I am not using any unsafe here. Yes, I am saying "give me an allocation of this total size", and yes I am saying "please allocate stuff into it and give me pointers to it," but generally, I as a user don't need to mess with unsafe unless I'm the person implementing bumpalo. And sometimes you are! Personally, I work in embedded, with no global heap at all; I end up using more unsafe than most. Still, there's no unsafe code in what I've written above, and it's gonna give you something like what you said you're doing in your current game. Of course, you probably want something more like an arena than a pure bump allocator. Those exist too. You write 'em up like you would anything else. Rust will still make sure that c doesn't outlive bump, but it'll do that entirely at compile time, no runtime checks here.
Oh, and this is sorta random but I didn't know where to put it: Rust's &str type is a "pointer + length" as well. Using this kind of thing is extremely common in Rust, we call them "slices" and they're not just for strings.
> You can say, "well you as the end-user shouldn't be doing this stuff, everything should be wrapped in structures that were written by someone smarter than you I guess," but that is just not the model of programming that I am doing.
While that's convenient (and in this case, I am showing that), the point is that it's about encapsulation. I don't have to use this existing allocator if I want to write something different. But because I can encapsulate the unsafe bit, no matter who is writing it, I need to pay attention to a smaller part of my program. Maybe I am that person, maybe someone else is, but the benefit is roughly the same either way.
> So if you add back in worrying about lifetimes, it's not the same thing.
To be super clear about it, Rust has raw pointers, that are the same as C. No lifetimes. If you want to use them, you can. The vast, vast, vast majority of the time, you do not need the flexibility, and so it's worth giving it up for the compile time checks.
> If you think "bulk memory allocation" is like...
It's not clear to me if the API I'm talking about above is what you mean here, or something else. It's not clear to me how you'd get simpler than "please give me a handle to this part of the heap," but I haven't seen your latest Jai streams. I am excited to give it a try once I am able to.
> but using bulk allocation broadly and freely across your program goes against the core spirit of the language
I don't know why you'd think these techniques are against the core spirit of the language. Rust's primitive array type is literally "give me N of these bits of data laid out next to each other in memory." We had a keynote at Rustconf about how useful generational arenas are as a technique in Rust. As a systems language, Rust needs to give you the freedom to do literally anything and everything possible.
> One last note, I am pretty tired of the "you don't understand Rust, therefore you are beneath us"
To be clear, I don't think that you or anyone else is "beneath us," here. What I want is informed criticism, rather than sweeping, incorrect statements that lead people to believe things that aren't true. Rust is not perfect. There are tons of things we could do better. But that doesn't mean that it's not right to point out when facts are different than the things that are said. You of all people seem to appreciate a forward communication style.
> rather than sweeping, incorrect statements that lead people to believe things that aren't true
I agree, and if I say things that are incorrect, then I definitely want to fix them, because I value being correct.
But what I am meeting in this thread is people wanting to do some language-lawyer version of trying to prove I am incorrect, without addressing the substance of what I am actually saying. I think your replies have been the only exception to this (and only just).
I realize my original posting was pretty brusque, but, the article was very bad and I am very concerned with the ongoing deterioration of software quality, and the hivemind responses to articles like this on HN, I think, are part of the problem.
I know that Rust people are also concerned with software quality, and that's good. I just think most of Rust's theories about what will help, and most of the ways these are implemented semantically, are just wrong.
So if something I am saying doesn't seem to make sense, or seems "incorrect", well, maybe it's that I am just coming from a very different place in terms of what good programming looks like. The code that I write just looks way different from the code you guys write, the things I think about are way different, etc. So that probably makes communication much harder than it otherwise would be, and makes it much easier to misunderstand things.
On the technical topic being discussed here...
Using a bump allocator in the way you just did, on the stack for local code that uses the bump allocator right there, is semantically correct, but not a very useful usage pattern. In a long-running interactive application, that is being programmed according to a bulk allocation paradigm that maybe is "data oriented" or whatever the kids call it these days, there are pretty much 4 memory usage patterns that you ever care about:
(1) Data baked into the program, or that is initialized so early at startup that you don't have to worry about it. [This is 'static in Rust].
(2) Data that probably lives a long time, but not the whole lifetime of the program, and that will be deallocated capriciously at some point. (For example, an entry in a global state table).
(3) Data that lasts long enough that local code doesn't have to care about its lifetime, but that does not need to survive long-term. For example, a per-frame temporary arena, or a per-job arena that lasts the lifetime of an asynchronous task.
(4) Data that lives on the stack, thus that can't ever be used upward on the stack.
Now, the thing is that category (3) was not really acknowledged in a public way for a long time, and a lot of people still don't really think of it as its own thing. (I certainly didn't learn to think about this possibility in CS school, for example). But in cases of dynamic allocation, category (3) is strictly superior to (4) -- because it's approximately as fast, and you don't have to worry about your alloca trying to survive too long. You can whip up a temporary string and just return it from your function and nobody owns it and it's fine. So having your program really lean on (3) in a big way is very useful. This is what I was saying before about pretending to have a garbage collector, but you don't pay for it.
So if you are doing a fast data-oriented program (I don't really use the term "data-oriented" but I will use it here just for shorthand), dynamic allocations are going to be categories 1-3, and (4) is just for like your plain vanilla local variables on the stack, but these are so simple you just don't need to think about them much.
Insofar as I can tell, all this lifetime analysis stuff in Rust is geared toward (4). Rust wants you to be a good RAII citizen and have "resources" owned by authoritative things that drop at very specific times. (The weird thing about "resources" is that in reality this almost always means memory, and dealing with memory is very very different from dealing with something like a file descriptor, but this is genericized into "resources", which I think is generally a big mistake that many modern programming language people make).
With (1), you don't need any lifetime checking, because there is no problem.
With (2), well, you can leak and whatever, but this is sort of just your problem to make sure it doesn't happen, because it is not amenable to static analysis.
With (3), you could formalize a lifetime for it, but it is just one quasi-global lifetime that you are using for lots of different data, so by definition this cannot do very much work for you. You could use it to avoid setting a global to something in category (3), and that's useful to a degree, but in reality this problem is not hard to catch without that, and it doesn't seem worth it to me in terms of the amount of friction required to address it.
Then there is (4), which, if you are not programming in RAII style, you arguably don't need to check very much, because everything there is simple. And anyway, the vast majority of common stack violations are statically detectable even in C (the fact that C compilers did not historically do this is really dumb, and has been a source of much woe, but it is very easy to detect when you return a pointer to a local from a function, for example). Yes, this is not thorough in the way Rust's lifetime checking is, and this class of analysis will not catch everything Rust does, but honestly it will catch most of it, at no cost to the programmer.
So when I said "Rust does not allow you to do bulk memory allocation" what I am saying is, the way the language is intended to be used, you have most of your resources being of type (4), and it prevents you from assigning them incorrectly to other resources of type (4) but that have shorter lifetimes, or to (2) or (1).
But if almost everything in (4) is so simple you don't take pointers to it and whatnot, and if most of your resources are (3), they have the same lifetime as each other, all over the place, so there is no use checking them against each other. So now the only benefit you are getting is ensuring that you don't assign (3) to (2) or (1). But the nature of (3) is such that it is reset from a centralized place, so that it is easy, for example, to wipe the memory each frame with a known pattern to generate a crash if something is wrong, or, if you want something more like an analytical framework, to do a Boehm-style garbage collector thing on your heap (in Debug/checked builds only!) to ensure that nothing points into this space, which is a well-defined and easy thing to do because there is a specific place and time during which that space is supposed to be empty.
So to me "programming in Rust" involves living in (4) and heavily using constructors and destructors, whereas I tend to live in (3) and don't use constructors or destructors. (I do use initializers, which are the simple version of constructors where things can be assigned to constant values that do not require code execution and do not involve "resources" -- basically, could you memcpy the initial value of this struct from somewhere fixed in memory). Now the thing that is weird is that maybe "programming in Rust" has changed since last time I argued with Rust people. It seems that it used to be the sentiment that one should minimize use of unsafe, that it should just be for stuff like lockfree data structure implementation or using weird SIMD intrinsics or whatever, but people in this thread are saying, no man, you just use unsafe all over the place, you just totally go for it. And with regard to that, I can just say again what I said above, that if your main way of using memory is some unsafe stuff wrapped in a pretend safe function, then the program does not really have the memory safety that it is claiming it does, so why then be pretending to use Rust's checking facilities? And if not really using those, why use the language?
So that's what I don't get here. Rust is supposed to be all about memory safety ... isn't it? So the "spirit of Rust" is something about knowing your program is safe because the borrow checker checked it. If I am intentionally programming in a style that prevents the borrow checker from doing its job, is this not against the spirit of the language?
I'll just close this by saying that one of the main reasons to live in (3) and not do RAII is that code is a lot faster, and a lot simpler. The reason is because RAII encourages you conceptualize things as separately managed when they do not need to be. This seems to have been misunderstood in many of the replies above, as people thinking I am talking about particular features of Rust lifetimes or something. No, it is RAII at the conceptual level that is slow.
> We had a keynote at Rustconf about how useful generational arenas are as a technique in Rust.
If that's the one I am thinking of, I replied to it at length on YouTube back in 2018.
Thanks for expanding on that. Now I think I get what you mean.
Rust does case (3) with arenas. In your frame loop you'd create or reset an arena and then borrow it. That would limit the lifetime of its items to a single loop iteration.
The cost is only some noisy syntax with `'arena` in several places; other than that, it compiles to plain pointers with no magic. Lifetimes are merely compile-time assertions for a static analyzer. Note that in Rust, borrowing is unrelated to RAII and theoretically separate from single ownership.
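A sketch of what that looks like (bumpalo as the arena; the names are made up):

    use bumpalo::Bump;

    struct Particle {
        pos: [f32; 3],
    }

    // Everything allocated during the frame shares the one `'arena`
    // lifetime; that is the "noisy syntax" cost mentioned above.
    fn spawn_burst<'arena>(arena: &'arena Bump, n: usize) -> &'arena mut [Particle] {
        arena.alloc_slice_fill_with(n, |i| Particle { pos: [i as f32, 0.0, 0.0] })
    }

    fn main() {
        let mut arena = Bump::new();
        for _frame in 0..60 {
            let burst = spawn_burst(&arena, 1024);
            burst[0].pos[1] += 1.0; // frame-local use only
            arena.reset(); // any `'arena` borrow still alive here won't compile
        }
    }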
Rust's `async fn` is one case where a task can hold all of the memory it will need, as one tagged union.
As for unsafe, the point is in encapsulating it in safe abstractions.
Imagine you're implementing a high-level sandboxed language. Your language is meant to be bulletproof-safe, but your compiler for that language may be written in C. The fact that the compiler might do unsafe things doesn't make the safe language pointless.
Rust does that, but the safe-language-vs-unsafe-compiler barrier is shifted a bit, so that users can add "compiler internals" themselves on the safe language side.
e.g. the `String` implementation is full of unsafe code, but once that one implementation has been verified manually to be sound, and hidden behind a safe API, nobody using it can screw it up when, e.g., concatenating strings.
I know it seems pointless if you can access unsafe at all, so why bother? But in practice it really helps, for reasons that are mostly social.
* There are clear universal rules about what is a safe API. That helps review the code, because the API contract can't be arbitrary or "just be careful not to…". It either is or isn't, and you mark it as such. Not everything can be expressed in terms of safe APIs, but enough things can.
* Unsafe parts can be left to be written by more experienced devs, or flagged for more thorough review, or fuzzed, etc. Rust's unsafe requires as much diligence as equivalent C code. The difference is that thanks to encapsulation you don't need to write the whole program with maximum care, and you know where to focus your efforts to ensure safety. You focus on designing safe abstraction once, and then can rely on the compiler upholding it everywhere else.
> So if something I am saying doesn't seem to make sense, or seems "incorrect", well, maybe it's that I am just coming from a very different place in terms of what good programming looks like.
I do think this is probably true, and I know you do care about this! The thing is...
> The code that I write just looks way different from the code you guys write, the things I think about are way different, etc.
This is also probably true! The issue comes when you start describing how Rust code must be or work. There's nothing bad about having different ways of doing things! It's just that you say things like "since then you are putting unsafe everywhere in the program" when that's empirically not what happens in Rust code.
> Using a bump allocator in the way you just did, on the stack for local code that uses the bump allocator right there, is semantically correct, but not a very useful usage pattern.
Yes. I thought going to the simplest possible thing would be best to illustrate the concept, but you're absolutely right that there is a rich wealth of options here.
Rust handles all four of these cases, in fairly straightforward ways. I also agree that 3 isn't often talked about as much as it should be in the broader programming world. I also have had this hunch that 3 and 4 are connected, given that the stack sometimes feels like an arena for just the function and its children in the call graph, and that it has some connection to the young generation in garbage collectors as well, but this is pretty offtopic so I'll leave it at that :)
Rust doesn't care just about 4 though! Lifetimes handle 3 as well; they ensure that the pointers don't last longer than the arena lives. That's it.
I don't have time to dig into this more, but I do appreciate you elaborating a bit here. It is very helpful to get closer to understanding what it is you're talking about, exactly. I think I see this differently than you, but I don't have a good quick handle on explaining exactly why. Some food for thought though. Thanks.
(Oh, and it is the one you're thinking of; I forgot that you had commented on that. My point was not to argue that the specifics were good, or that your response was good or bad, just that different strategies for handling memory isn't unusual in Rust world.)
> they ensure that the pointers don't last longer than the arena lives. That's it.
Sure, but my point is, when most things have lifetimes tied to the same arena, this becomes almost a no-op. Both in the sense that you are not really checking much (as Ayn Rand said, 'a is 'a), and in the sense that you're paying a lot, in typing things into the program and in waiting around for a compiler that isn't usefully checking all these things that are the same. Refactoring a program so that most things' lifetimes are the same does not feel to me like it's in the spirit of Rust, because then why have all these complicated systems, but maybe you feel that it is.
There is a bit of a different story when you are heavily using threads, because you want those threads to have allocators that are totally decoupled from each other (because otherwise waiting on the allocator becomes a huge source of inefficiency). So then there are more lifetimes. But here I am not convinced about the Rust story either, because here too I think there are simpler things to do that give you 95% of the benefit and are much lower-friction.
(And I will admit here that "Rust doesn't allow you to X", as I said originally, is not an accurate statement objectively. Attempting to rephrase that objection to be better, I would say, by the time you are doing all this stuff, you are outside the gamut that Rust was designed for, so by doing that program in Rust you are taking a lot of friction, but not getting the benefit given to someone who stays inside that gamut, so, it seems like a bad idea.)
> Refactoring a program so that most things' lifetimes are the same does not feel to me like it's in the spirit of Rust, because then why have all these complicated systems, but maybe you feel that it is.
I think a common sentiment among Rust programmers would instead phrase this as, "the complicated system we designed to find bugs keeps yelling at us when we have lots of crazy lifetimes flying around, so presumably designs that avoid them might be better."
In this sense, even for someone who doesn't feel the borrow checker is worth it in their compiler, this can just be taken as a general justification for designs in any language that have simpler lifetime patterns. If they're easier to prove correct with static analysis, maybe they're easier to keep correct by hand.
A comparison between Rust and modern C++ would be more interesting in my opinion. It seems that those languages are closer in the design goal space than either is to C.
Agreed, came here to say the same thing. Would be interesting to see how they stack up against each other. Both are highly evolved modern languages that make pretty much the same claims.
> Rust can't count on OSes having Rust's standard library built-in, so Rust executables bundle bits of the Rust's standard library (300KB or more). Fortunately, it's a one-time overhead.
I remember making an argument on a mailing list against using alloca on the grounds that there's usually a stack-blowing bug hiding behind it. As I revisited the few examples I remembered of it being used correctly, I strengthened my argument by finding more stack-blowing bugs hiding behind uses of alloca.
A few years ago I hand-ported a skip list implementation that used inlined dynamic arrays from C to Rust. (Like, the last field of the struct was a dynamically sized Foo[].) I needed a scattering of unsafe{} blocks and a bunch of tricks to make the resulting Rust code equivalent to the C, in order to prevent extra allocations and memory fragmentation on the Rust side.
When I ran my simple fuzz test in Rust, it segfaulted, crashing in 'safe' code. I thought for a moment there might be something wrong with the compiler (hahaha, no). Sure enough, there was a bug in one of my far-too-clever unsafe blocks that was corrupting memory, and that in turn caused a crash later in the program's execution.
That was one of my first big "aha" moments for Rust: because segfaults should be impossible in safe code, I only needed to study my ~30 lines of unsafe code to find the bug (compared to 150+ lines of regular code). I had some similar bugs when I wrote the C version earlier, and they took all day to track down, because in C memory corruption can come from anywhere.
I don't tend to think of Rust as "portable assembly", and this is indeed one of the points where I think it differs the most from C. I think of "portable assembly" as applicable to C because C is some version of a "minimal" level of abstraction for a high-level language. Rust is very much a tool for abstraction, and one of the USPs of Rust is that the compiler abstracts away the low-level details of memory management in a way that is not as costly as other automatic memory management strategies.
Maybe it's due to lack of experience, but with C code it's fairly easy to look at a block of code and imagine approximately which assembly would be generated. With highly abstract Rust code, like with template-heavy C++ code, I don't feel like that at all.
With a bit of experience you get the same in Rust.
Rust does not abstract away memory management. For example, it never heap allocates anything implicitly. It inserts destructors, but does so predictably at end of scopes, in a specified order.
Rust heavily uses iterators with closures, but these get aggressively inlined, and you can rely on them optimizing down to a basic loop. For code generation they're not too different from a fancy C macro.
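For example (a sketch of mine, not from the article), the iterator chain and the hand-written loop below typically compile to essentially the same code:

```rust
// Iterator style: the closures get inlined away.
fn sum_even_squares(xs: &[i64]) -> i64 {
    xs.iter().filter(|&&x| x % 2 == 0).map(|&x| x * x).sum()
}

// Roughly what it optimizes down to: a plain loop, no allocations or calls.
fn sum_even_squares_loop(xs: &[i64]) -> i64 {
    let mut total = 0;
    for &x in xs {
        if x % 2 == 0 {
            total += x * x;
        }
    }
    total
}
```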
Code 'bloat' is a bizarre metric to use for anything unless you're on a platform with incredibly constrained executable memory like an embedded device.
The fact that Rust specialises its generic code according to the type it's used with is not some inherent disadvantage of generics. That's what they're supposed to do. By choosing not to specialise, you're actively making the decision to make your code slower. Rust has mechanisms for avoiding generic specialisation. They're called trait objects and they work brilliantly.
When you use void* in your data structures in C, you're not winning anything when compared to Rust. You're just producing slower code that mimics the behaviour of Rust's trait objects, but more dangerously.
Code 'bloat' (otherwise known as 'specialising your code correctly to make it run faster') is not a reason to not use Rust in 2021, so please stop pretending that it is.
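To spell out the distinction being argued here (my sketch, not from the thread): the generic version is monomorphized, one specialized copy per concrete type, while the `dyn` version is a single copy dispatching through a vtable.

```rust
use std::fmt::Display;

// Monomorphized: the compiler emits one specialized copy per concrete T.
fn print_all_generic<T: Display>(items: &[T]) {
    for item in items {
        println!("{item}");
    }
}

// Trait object: one shared copy; each call dispatches through a vtable.
fn print_all_dyn(items: &[&dyn Display]) {
    for item in items {
        println!("{item}");
    }
}
```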
It's not that simple. While fully specializing everything wins microbenchmarks, as C++ has shown time and time again, it can easily lose performance in large applications. If fully specializing code saves a few branches in the hot loop, but also blows through all the L1i, it can easily be a huge net negative.
> Rust has mechanisms for avoiding generic specialisation. They're called trait objects and they work brilliantly.
As someone who uses a lot of Rust, they are sort of the red-headed stepchild. As a minimum to make them properly usable, we need a way of passing one object with multiple different traits.
> As someone who uses a lot of Rust, they are sort of the red-headed stepchild. As a minimum to make them properly usable, we need a way of passing one object with multiple different traits.
Writing `dyn TraitA + TraitB` isn't currently valid unless TraitB is an auto trait, though?
From the reference:
> Trait objects are written as the optional keyword dyn followed by a set of trait bounds, but with the following restrictions on the trait bounds. All traits except the first trait must be auto traits, there may not be more than one lifetime, and opt-out bounds (e.g. ?Sized) are not allowed.
The only one of those restrictions that is acceptable is the single-lifetime one. All the others are seriously restricting. The devs seem to agree, but work on this aspect of Rust is very slow, and people are still arguing about how to implement it. (I, for one, feel very strongly that dyn TraitA + TraitB should have a size of 3 pointers; that is, no magic combining of vtables, just every added trait adding another vtable pointer.)
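In the meantime, the usual workaround is a combined supertrait with a blanket impl (a sketch, names mine), since `dyn Display + Debug` is rejected by the compiler:

```rust
use std::fmt::{Debug, Display};

// `dyn Display + Debug` won't compile, so combine them in a supertrait.
trait DisplayDebug: Display + Debug {}
impl<T: Display + Debug + ?Sized> DisplayDebug for T {}

fn show(value: &dyn DisplayDebug) {
    println!("{value} / {value:?}");
}

fn main() {
    show(&42);
    show(&"hello");
}
```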
> For example, in C I'd be tempted to reuse a buffer allocated for one purpose for another purpose later (a technique known as HEARTBLEED).
You can do that in Java (with byte arrays) or in Common Lisp, so what is the point here? It is not common practice in Java or Lisp, nor in C and C++.
> It's convenient to have fixed-size buffers for variable-size data (e.g. PATH_MAX) to avoid (re)allocation of growing buffers
This is because the OS/kernel/filesystem guarantees a max path size.
> Idiomatic Rust still gives a lot of control over memory allocation, and can do basics like memory pools, ... but in general it steers users towards "boring" use of memory.
The same is done by sane C libraries (e.g. glib).
> Every operating system ships some built-in standard C library that is ~30MB of code that C executables get for "free", e.g. a "Hello World" C executable can't actually print anything, it only calls the printf shipped with the OS.
printf is not shipped with the OS, but with the libc runtime. It doesn't have to be a runtime dependency (the author should look into why libc is usually a shared library rather than a statically linked one), and you can use minimal implementations (musl) if you want static binaries with minimal size.
So you are saying Rust doesn't call (g)libc at all and directly invokes kernel interrupts? Sure, you can avoid this print "overhead" in C with 3-4 lines of inline assembly, but why?
> Rust by default can inline functions from the standard library, dependencies, and other compilation units.
So does a C compiler.
> In C I'm sometimes reluctant to split files or use libraries, because it affects inlining and requires micromanagement of headers and symbol visibility.
Functions don't have to be in headers to be inlined.
> C libraries typically return opaque pointers to their data structures, to hide implementation details and ensure there's only one copy of each instance of the struct. This costs heap allocations and pointer indirections. Rust's built-in privacy, unique ownership rules, and coding conventions let libraries expose their objects by value, so that library users decide whether to put them on the heap or on the stack. Objects on the stack can be optimized very aggressively, and even optimized out entirely.
WTF? Stopped reading after this.
I find this post random nonsense and I'd urge the author to read a serious C book.
And I find your comment to be a super-annoying combination of pedantic and mostly wrong. I'm not going to go through every example but just pick a few:
> > For example, in C I'd be tempted to reuse a buffer allocated for one purpose for another purpose later (a technique known as HEARTBLEED).
> You can do that in Java (with byte arrays) or in Common Lisp, so what is the point here? It is not common practice in Java or Lisp, nor in C and C++.
C is a really old language with ancient libraries that are still widely used even though they are simply bad by modern standards. For that reason, I roll my eyes when people say something is not practice in C or talk about "sane" C libraries. A big part of working with C is dealing with ancient insanity.
You can make much stronger statements about what is idiomatic in Rust (and to some extent Java) simply because it's newer and more cohesive.
> > It's convenient to have fixed-size buffers for variable-size data (e.g. PATH_MAX) to avoid (re)allocation of growing buffers
> This is because the OS/kernel/filesystem guarantees a max path size.
I think you've got that backwards. There's an advertised max path size because people wanted to stick paths in fixed-size buffers rather than deal with dynamic allocation. PATH_MAX is fairly arbitrary considering that there are certainly ways of creating and opening files which have paths exceeding that limit. I found this doc talking about this: https://eklitzke.org/path-max-is-tricky
> printf is not shipped with the OS, but with the libc runtime.
"The OS" doesn't mean "the kernel". Read...anything...even the lackluster wikipedia article about operating systems...and you'll see stuff like GUIs described as part of the OS. They (generally) don't mean those are in the kernel. You can also see this for example in the GNU GPL; they call out "system libraries", which certainly includes libc.
> So you are saying Rust doesn't call (g)libc at all and directly invokes kernel interrupts? Sure, you can avoid this print "overhead" in C with 3-4 lines of inline assembly, but why?
Rust's own standard library uses libc's system call wrappers but not stdio. It has its own libraries for buffer management and formatting which provide the safety one would expect of Rust, know how to integrate with Rust's Display trait for formatting arbitrary Rust data structures, etc. You could call libc::printf yourself if you wanted to, but that's not idiomatic. I wrote some Rust code calling libc::vsnprintf just the other day, because I got a format string + va_list from C in a log callback.
Modern C, by Jens Gustedt, is one of the best books on C that I have read. That said, I don't think it scratches the surface of backing up the parent's claims, though if anyone knows of such a text, please let me know.
You'll be relieved to know I've been writing C code of varying quality for like fifteen years on and off. These days I get asked to write Go most of the time and we've gotten rid of the last C codebase we were maintaining a while ago, but I'm always down for discussion of strict aliasing rules or stupid preprocessor tricks.
"The C Programming Language" from K&R is something everyone should read, even if they are not fond of C.
"Expert C Programming" [1]. Not up to date, but written from a C compiler writer standpoint. A lot of references to why C (and libs) are the way they are.
Human-friendliness and bug prevention are very important. Of course, everything in Rust can be created in C or assembler or in machine code, but the question is how feasible it is for a typical human to do it. Rust has a lot of potential, I think.
To practise Rust, I rewrote my small C99 library in it [1]. Performance is more or less the same; I only had to use unchecked array access in one small hot loop (details in README.md). I haven't ported the multithreading yet, but I expect Rust's Rayon parallel iterators will likewise be comparable to OpenMP.
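The usual shape of such a hot loop, as a hypothetical sketch rather than the actual code from that library: check the bounds once outside the loop, then use `get_unchecked` inside.

```rust
// Hypothetical sketch of "unchecked array access in one small hot loop".
fn dot(xs: &[f32], ys: &[f32]) -> f32 {
    assert_eq!(xs.len(), ys.len()); // check once, outside the loop
    let mut total = 0.0;
    for i in 0..xs.len() {
        // SAFETY: i < xs.len() == ys.len(), guaranteed by the assert above.
        total += unsafe { xs.get_unchecked(i) * ys.get_unchecked(i) };
    }
    total
}
```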
> There are other kinds of concurrency bugs, such as poor use of locking primitives causing higher-level logical race conditions or deadlocks, and Rust can't eliminate them, but they're usually easier to diagnose and fix.
Which is why so many people are creating formal verification languages and spending years in research to fix those... That just isn't true. It's a very complex problem, and an issue everywhere from hardware (cache-coherency protocols) to the OS (atomics, locks) to higher-level constructs (commit-rollback in databases).
Consequently
> But the biggest potential is in ability to fearlessly parallelize majority of Rust code, even when the equivalent C code would be too risky to parallelize. In this aspect Rust is a much more mature language than C.
This couldn't be more wrong either. Rust doesn't help you write synchronization primitives safely because it doesn't handle synchronization like locks, condition variables or atomics. You need formal verification to be fearless.
Rust may or may not help you write synchronization primitives safely, but it for sure helps you use synchronization primitives without having to worry about memory safety. If you aren't parallelizing particularly subtle shenanigans, that's plenty for fearlessness.
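The classic illustration of what that buys you (standard-library types only, my example): the shared data is only reachable through the lock, so a data race is a compile error rather than a runtime bug, and the remaining failure modes are logical ones like deadlocks.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

fn main() {
    let counter = Arc::new(Mutex::new(0u64));
    let handles: Vec<_> = (0..8)
        .map(|_| {
            let counter = Arc::clone(&counter);
            // Forgetting to lock wouldn't race; it simply wouldn't compile.
            thread::spawn(move || *counter.lock().unwrap() += 1)
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    assert_eq!(*counter.lock().unwrap(), 8);
}
```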
You've just taken the word 'fearless', a word that's clearly subjective, and said that the definition the author gives of it "couldn't be more wrong". That's... a choice.
The word misrepresents the problem of synchronization, reducing it to only memory safety.
If it were that simple, Tokio wouldn't need to formally verify their implementation with an external tool, and that tool wouldn't have found dozens of well-hidden bugs.
Shouldn't this be Rust vs C++? C++ has a lot more parallels to Rust. Both are big, complex, and safe languages that can be tuned for high performance. In fact, I would like to see more comparisons of Rust and C++ in the future.
Author here: I'm a C programmer, who's replacing C with Rust. I've never liked C++ and never felt I fully get it. I've managed to fully grasp Rust though. I don't see that much similarity between Rust and C++ other than both use angle brackets for generic code and aspire to have zero-cost abstractions.
C programming patterns have more-or-less equivalents in Rust. OTOH non-trivial C++ OOP or template usage is alien and hard to adapt to Rust.
Rust has 1 (one) way to initialize an object. No constructors, initializer lists, or rules-of-<insert number>. Move semantics are built in, without move/copy constructors, NRVO, or a moved-out-of state. No inheritance. No object slicing. Methods are regular function pointers. No SFINAE (generics are equivalent to concepts, and dumber, e.g. no variadics). Iterators require only implementing a single method. Operator overloading is all done in the style of the spaceship operator.
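A tiny sketch of the first two points (my own example, not from the article): `new` is an ordinary function wrapping the one struct-literal syntax, and passing by value is a plain destructive move.

```rust
struct Mesh {
    vertices: Vec<[f32; 3]>,
}

impl Mesh {
    // Not a constructor in the C++ sense: just a function returning a value.
    fn new(vertices: Vec<[f32; 3]>) -> Mesh {
        Mesh { vertices } // the one way to initialize: a struct literal
    }
}

fn upload(_mesh: Mesh) { /* takes ownership */ }

fn main() {
    let mesh = Mesh::new(vec![[0.0, 0.0, 0.0]]);
    upload(mesh); // a move: at most a memcpy, never a move constructor
    // upload(mesh); // error: use of moved value `mesh`
}
```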
No? I mean, if you're asking whether a Rust vs C++ comparison is useful, then sure, the answer is trivially true. If you're asking whether a Rust vs C++ comparison is more useful than a Rust vs C comparison, then the answer is "maybe yes, depending." But certainly a Rust vs C comparison is useful on its own.
Rust replaces uses of C in many ways that C++ never could so I think the comparison is apt. There isn't extensive use of C++ in the embedded world nor is it used much for writing kernel drivers, but Rust is making big inroads into both of those arenas.
There's already The Benchmarks Game and ixy-languages if you want hard numbers.
Maximum speeds are already explored. I wanted to discuss an aspect that's not typically covered by pure benchmarks: what can you expect from normal day-to-day use of these languages. Not fine-tuned hot loops, but a "median" you can expect when you just need to get shit done.
If I tried to write a benchmark code to represent average, practical, idiomatic, but less-than-maximally optimized code, I don't think anyone would believe me that's a fair comparison. So I describe problems and patterns instead, and leave it to readers to judge how much applies to their problems and programming style.
Benchmarks wouldn't tell the whole story. This detailed writeup is far better in that it gives information about how and where the two languages differ.
Here's my completely unbiased benchmark, which uses a different data structure, an outside library in one language, and a non-recursive implementation. I hope you don't need the link.
I'd prefer to have the great ideas in Rust ported over to C instead of rewriting everything in Rust. This approach would benefit all the existing software written in C, which I think dwarfs Rust in terms of both impact and code size.
I don't know if you are in the minority, but Rust is available right now and C-but-with-Rust's-great-ideas isn't. As far as I know no one is working on C-but-with-Rust's-great-ideas, so I don't think it's a good strategy to wait around for it instead of using the tools that exist and are already used with great impact.
This is a popular sentiment. However, there are Checked C and Cyclone, and they have very little traction.
To make static analysis robust in C you need to start reliably tracking ownership and forbid type-erasing constructs. This typically means adding smart pointers, some kind of borrow checking or garbage collection, generics to replace void*, maybe tagged unions, and a new standard library that embraces these features.
It's going to bring most of Rust's complexity and require major code changes anyway, but you won't even get benefits of a newer language.
I think it would be basically impossible to perform this task without making the language fundamentally not C. Zig is an interesting take in that direction (learn from the last 30 years but still try to be "C") that I think gets a lot closer to the ideal than most other alternatives.
C++, OTOH, you could probably port most of Rust's concepts into (with some extra language changes for various reasons I don't want to get into). However, since almost no existing C++ code would typecheck in the "safe" subset without modifications, it would effectively be a different language anyway. And to be clear, this isn't necessarily because people are routinely doing dangerous stuff in C++ -- the whole Rust ecosystem has grown up around the borrow checker, which means some very basic things people use in most other languages aren't done. Here are some examples of things typical Rust code does differently from typical C++ code, because the usual C++ patterns would make safety checking much harder -- beyond the obvious aspects of lifetime annotations and genuinely unsafe patterns like accessing mutable globals (sorry, it just is):
* far less use of accessors, especially mutable ones (because Rust can't track split field ownership)
* Rust tends to split up big "shared context" structures according to function use, rather than logical relationships, for much the same reason (Rust conservatively assumes that all fields are borrowed when a pointer to the whole context object gets passed to a function, even if the function only touches a few of them).
* Rust almost never uses internal or cyclic pointers. It's safe to do it with boxed data or data that doesn't move, and there are safe type mechanisms around that, but it's cumbersome since it has to be visible to the typechecker, so people usually don't bother.
* far less single-threaded mutation through multiple pointers into the same data structure, which may even be aliased. Again, often safe (though not always), and in the safe cases there are generally safe types to enable it in Rust, but since it's not the default and requires pre-planning for all but the simplest cases, people usually don't bother.
* Rust types are always annotated with thread safety information. This is usually done by default, but if it weren't it would be a huge amount of boilerplate. The reason this works is that in the cases where people are doing unsafe stuff, the type system automatically opts out and requires them to opt in. Libraries have been built around this assumption. Even if we were to port such a mechanism over to C++, the lack of these explicit annotations would mean that in practice it just wouldn't work that well--you would have to do a very detailed thread safety analysis of basically any existing library to try to assign types.
Often, complying with these kinds of rules is what people coming to Rust struggle with--not so much local lifetime issues which the compiler can usually figure out, but how to structure the entire program to make life easy for the borrow checker. However, complying comes with a big benefit--it allows safety analysis to proceed purely locally in almost all cases. The reason that static analyzers don't just "do what Rust does" is that they're dealing with programs that aren't structured that way and need to perform far more global analysis to catch most of the interesting memory safety bugs that pop up in mature C++ codebases, especially the ones that evade code review.
So--do I think it would be great to port this stuff over to C++ (or C, hypothetically)? Absolutely--I still prefer Rust as a language, but at the end of the day, memory safety you could layer on top of existing C code would be a huge win for everyone. But I don't see it happening, because Rust's solution requires serious code restructuring. If people are going to have to rewrite their old programs anyway to work with a tractable static analysis, and won't be able to use almost any existing libraries, it's not clear how much more benefit they'd get from using this subset than from just switching to Rust.
I do agree with most of your points; porting may not be possible.
However, I was just wondering whether the future of C/C++ can be much safer than it is right now. For example, GCC's GUARDED_BY macro is a big help with thread safety in C/C++. I'm not sure how much further we can go, but it's just a thought.
It's just amusing that in this thread everyone who is critical or skeptical is downvoted, even when they express themselves moderately. It shows what zealots Rust fanboys have become.
The problem is that most people criticising Rust don't make their case very well. If you want to read a good critique, I'd recommend this: https://matklad.github.io/2020/09/20/why-not-rust.html. This post is up-to-date, succinct, and objective.
And most pertinently, this critique was written by someone who genuinely loves programming in Rust. Shows you that Rust users aren't blinded to the faults of the language. You shouldn't think that Rust users are fanboys just because you see push back to low effort, low knowledge critiques.
> You shouldn't think that Rust users are fanboys just because you see push back to low effort, low knowledge critiques.
That's assuming too much. BTW, I read in this thread a comment from a well-known Nim dev who works on multithreading (with much knowledge of the subject), and it was downvoted to oblivion.
> And most pertinently, this critique was written by someone who genuinely loves programming in Rust.
That is setting the bar impossibly high. I would expect most of the criticism to come from people who hate programming in Rust, which is fine as long as the criticism is well argued.
You've got the contrapositive there. The claim was that folks who love Rust do not accept criticism of the language. Therefore, a criticism by someone who loves the language was presented, to show that claim was false. Your parent isn't saying that only folks who love Rust can criticize Rust.
Point A: The claim was that folks who love Rust do not accept criticism of the language.
Point B: a criticism by someone who loves the language was presented, to show that claim was false.
Point B does not contradict point A at all. That is like saying "it's false that I can't accept criticism; hey, look at the weak points I gladly mention" ("I am too much of a perfectionist", "I work too hard", "I put the wellness of the company ahead of myself").
I'm not putting the bar high. I'm giving an example of people who love Rust criticising Rust. Person I replied to claimed that Rust fanboys didn't do this because they were zealots. That's not true, clearly.
I've read a lot of criticism of Rust, and most of it is from people who tried it for a weekend, couldn't understand the borrow checker, and wrote some low-quality criticism of it. If someone points out problems in such a post, they are accused of zealotry and fanboyism.
Read the post I linked. It covers all the issues and makes the strongest possible case against the language. Then tell me if you've ever seen one that is as negative, accurate and succinct as that one.
> I'm not putting the bar high. I'm giving an example of people who love Rust criticising Rust. Person I replied to claimed that Rust fanboys didn't do this because they were zealots. That's not true, clearly
But that was the OP's point. If your best example of Rust lovers accepting criticism of the language is the existence of a critical article written by a Rust lover, that does not say anything. The bar is set naively high if the best example you have of tolerance for criticism is a critical article written by a member of the "tribe". The implicit point is that those kinds of articles will be the ones playing softball with the language, so any perceived tolerance of criticism is almost meaningless.
Bothsidesism is unhelpful in technical discussions just as much as in politics. If you have specific critiques please share them.
I have a number of specific critiques of Rust, chief being that APIs and implementations are bound too tightly. &[String] and &[&str] are logically similar but changing from one to the other in your implementation might mean a breaking API change.
If you need API flexibility, you use generics; that is the way to be generic over types that refer to strs. I'm pretty sure this is in the book, and it's common enough that even someone who doesn't use Rust full time (myself) knows it off the top of their head.
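Concretely, the usual shape is a bound like `AsRef<str>` (my sketch): the same function then accepts `&[String]`, `&[&str]`, and friends, so switching the internal storage isn't a breaking API change.

```rust
// Generic over "anything that can be viewed as a str".
fn join_upper<S: AsRef<str>>(items: &[S]) -> String {
    items
        .iter()
        .map(|s| s.as_ref().to_uppercase())
        .collect::<Vec<_>>()
        .join(" ")
}

fn main() {
    let owned: Vec<String> = vec!["a".into(), "b".into()];
    let borrowed: Vec<&str> = vec!["a", "b"];
    assert_eq!(join_upper(&owned), join_upper(&borrowed)); // both work
}
```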
You can use impl Trait in returns, this is actually one of the reasons why that feature exists.
Ownership, mutability, and thread-safety are not easy to abstract over and hide as implementation details in Rust.
It's a side effect of the fact that Rust actually cares about these and checks them strictly, but for users coming from eg Java that's a bit of a shock.
Hm, is there some specific criticism of Rust you'd like to see discussed more? It's easy to get side-tracked in these "actually my language is better than your language" with everybody launching whole broadsides of arguments, so I wouldn't be surprised if some more subtle points get lost.
eh, I don't partake in the usual "actually my language is better than your language" that comes up in all posts. I just don't like people going overboard with their claims, when they try to promote any PL really, and would appreciate more fact checking.
Can you point to some specific comments like this? None of the top threads seem to show this, as of this writing it's mostly about thread versus process parallelism and which kinds of conditions require unsafe.
I think the reasons for this would make a very interesting psychological study. Something similar happens with some other languages, but never at the level of Rust.
And now one of mine, https://news.ycombinator.com/item?id=26448822, in a subthread where the other main commenter says the subthread is his favorite of this whole thread. There just might be something to this downvoting claim...
For parallelism, modern tooling like TSAN can close the gap somewhat. If you are planning to introduce threads, not testing them with TSAN is silly at best.
If you're writing safe, parallel Rust code, you don't really need to use TSAN. You may hit a deadlock sometimes, but those tend to be easy to figure out in my experience.
The people implementing the libraries you use (e.g. Rayon) may have to use TSAN, of course.
I think it’s a reasonable comparison. C is still a language that is widely used. In some niches, it is the only acceptable language. Comparing C with Rust is useful for people in those niches. An example of this is the Linux kernel.
Actually, it's even older. I know it's not an official standard, but most if not all of the points about C in the article would also apply to K&R C. The book was published in 1978, more than 40 years ago.
@steveklabnik, RCU is different from RwLock in that the single writer and all readers never block each other.
Given that RCU is a complex wait-free data structure (though I don't fully understand it), I suspect it may not necessarily be possible to implement it without unsafe blocks, purely in terms of the standard library concurrency types (atomics and Arc can be used without unsafe, but themselves contain unsafe blocks). The general goal is to create an abstraction which encapsulates unsafe blocks such that it's impossible for outside users calling safe functions to violate memory safety. Of course, libraries sometimes have bugs that need to be fixed.
The article talks at way too high a level and reads like it was written by marketing people, even though the title sounds technical. For example:
"Rust enforces thread-safety of all code and data, even in 3rd party libraries, even if authors of that code didn't pay attention to thread safety. Everything either upholds specific thread-safety guarantees, or won't be allowed to be used across threads."
But this is true. I mean specifically about Send and Sync traits that have to be implemented on types for the compiler to allow them in multi-threaded constructs, like `thread::spawn` or Rayon's parallel iterators.
If you write a library, and use e.g. thread-unsafe `Rc` or not-sure-if-safe raw pointers anywhere in your structs, the compiler will stop me from using your library in my threaded code.
This is based on a real experience. I've written a single threaded batch-processing code, and then tried to make it parallel. The compiler told me that I used a GitHub client, which used an HTTP client, which used an I/O runtime, which in this configuration stored shared state in an object without a Mutex. Rust pointed out exactly the field in 3rd party code that would cause a data race. At compile time.
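A toy version of that compile-time catch (error text paraphrased from memory): `Rc`'s reference count isn't atomic, so it is `!Send`, and `thread::spawn` refuses it.

```rust
use std::rc::Rc;
use std::thread;

fn main() {
    let shared = Rc::new(42); // non-atomic refcount, so Rc is !Send

    // Uncommenting this fails to compile with (roughly):
    // error[E0277]: `Rc<i32>` cannot be sent between threads safely
    // thread::spawn(move || println!("{shared}"));

    drop(shared);
}
```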
That doesn't sound too high level to me. Maybe a small quibble is the definition of "thread safety," but a reasonable one would be, "no undefined behavior in the presence of simultaneous access." In other words, no data races. And that's absolutely true and consistent with Rust's definition of safety. Another small quibble might be that, "even if the authors of that code didn't pay attention to thread safety and didn't use 'unsafe'" would be more precise.
There is simply no way you can enforce "thread safety of all data" unless you pay unreasonable synchronization costs, and in that case it's a trivial thing to accomplish.
This is the same as someone telling you that you will never lose any money by investing in a certain asset.
Rust is a constructive proof that your assertion is simply false. It comes at the cost of some complexity—every Rust type carries thread-safety information with it—but the benefit is that writing correct parallel Rust code becomes very easy.
What you cannot easily do in Rust is dynamically switch thread safety on or off.
My experience is that languages survive not because of a particular feature, but because they are USEFUL in practice for producing software.
The fact that C is used in so many places speaks for itself about its usefulness. And the majority of C programmers achieved this by writing software, instead of jumping on every forum to attack other languages and writing extended blog posts just to convince people that they "should" switch to the language they like.
Also, if you believe bounds checks are the most difficult thing in software development, it just means that you haven't dealt with a sufficiently complex system yet, or you just pretend to have.
Similarly, if you think naively putting pthread_mutex_lock and unlock around a data structure is hard, it just means you haven't touched the scenarios where C programmers resort to non-trivial locking mechanisms.
Nothing in this article seems to be saying that C "isn't useful". It also doesn't state that bounds checks are the "most difficult thing in software development."
As the article mentions, C is 50 years old. The fact that it's still used is evidence of its usefulness, sure. It has outlasted almost all of its peers.
Rust has been stable for under 6 years. In that time, it's been adopted by a slew of major companies, and people have used their free time to write some extremely good software in it. So by that metric, Rust's usefulness speaks for itself, too.
The article uses one or two features, in a quick marketing style, to promote Rust.
Regardless of whether that's true or not, this seldom works in the long term. I'm simply pointing this observation out.
In fact, a language as a tool is never about having more features; it is about having the minimum features that maximize utility, and Rust is already in the domain of "feature-rich" languages.
I appreciate the article, but it would be really nice if the author could add a timestamp to his blog posts. Without timestamps, it's impossible to know whether any issue described in the article still exists.
I didn't read it, because it might present outdated knowledge.
dig1 is wrong. He uses the age-old C defence of "it's not a problem with the language, it's just bad programmers programming badly." Apparently buffer reuse isn't a problem because "sane" libraries don't do it. Well, I'll believe it when we stop seeing security issues in C codebases.
The fact that my perfectly valid comment was down voted like this shows that HN has a pretty dysfunctional community. I think that is my last comment here ;)
> "Clever" memory use is frowned upon in Rust. In C, anything goes.
No, it does not. If Rust programmers don't have discipline in C, other people do.
And don't drag out some random CVE numbers again. Those cover a fraction of existing C projects, many of which were started between 1980 and 2000.
It is an entirely different story if a project is started with sanitizers, Valgrind and best practices.
I'm not against Rust, except that they managed to take OCaml syntax and make it significantly worse. It's just ugly and looks like design by committee.
But the evangelism is exhausting. I also wonder why corporations are pushing Rust. Is it another method to take over C projects that they haven't assimilated yet?
> It's just ugly and looks like design by committee.
I don't think it's ugly because it's design-by-committee, I think they intentionally made it ugly so that it's familiar to C++ people.
> I also wonder why corporations are pushing Rust.
You said it yourself: undisciplined people can't write C without introducing memory-related bugs, and it's much easier to hire undisciplined people than disciplined people.
> It is an entirely different story if a project is started with sanitizers, Valgrind and best practices.
Do you have an example of a project that is (a) built in such a way, (b) large, and (c) has a good track record on memory safety?
We get that you don't like Rust. But it seems like a lot of people currently using C or C++ would like to use Rust at work, and might disagree about the benefits of the language and tooling. I personally know a few friends in distinct domains who work on established C++ codebases and are in this situation.
There are also a lot of people who do not use C or C++, but use a bit of Rust because it's so much easier to write fast little tools with it. I'm in this category. I even use threads sometimes, and it's reasonably easy. A crop of new unixy tools in Rust seems to indicate other people think alike.
Quite the contrary: Rust is the ideal language to replace C and C++ where automatic memory management is a no-go, like MISRA C, kernels, and device drivers.
Liking a programming language doesn't make me blind to the use cases where it actually makes sense; I don't see nails everywhere.
I think you'll find that most rational advocates for any language agree that their favorite language is only strong in its subdomain.
Any compiled language is more painful than a quick scripting one for quick projects where the project complexity is low and the language overhead doesn't matter.
Rust is substantially more painful to get compiling (due to the borrow checker) and harder to debug (due to tool maturity) than C# or C++. It's much harder to use than Python. Every language has its place.
But when you are investing the time to make an efficient, high performance program... or you have limited requirements like you said -- Rust becomes a great choice.
Every language has its place. I'm just dreadfully excited that we now have a new choice: trading a bit more time interacting with compiler errors for high performance and stability, when that makes sense.
What I once did in C, C++, and Tcl, I nowadays do in Java and .NET languages.
If we really need something low-level that Java or .NET cannot offer, a native library for that specific component will do; no need to throw the whole thing away and write one of those rewrite blog posts.
Tail latency due to memory pressure tends to be inherent to garbage-collected languages with mutable state. This is not an issue if you have more RAM than the system needs, but often RAM is extremely scarce.
Garbage-collected languages also offer ways to do C-like memory allocation; it's a matter of using the language's features and FFI capabilities. But many people just learn their stacks superficially and follow trendy "Rewrite it in X" blog posts instead.
There is a very big difference between what particular environments offer in theory (yes, you can write object pools in Java and many high-performance projects use them) and the situation in practice (there are people who spend a large chunk of their professional careers doing JVM tuning).
Idiomatic Rust avoids the situations which require JVM tuning experts. You can write a Rust service, put it into production and tail latency is very likely not going to be a problem at all.
Now you may decide that needing the occasional services of a JVM tuning expert is better overall than ownership concerns being pervasive throughout a codebase, depending on the specifics. But do accept that the trade-off exists.
C evangelism is exhausting too. Maybe we can stick to discussing the merits of each language instead of complaining about how people with differing opinions make us feel.
Yeah? Check out any very public discussion of Rust, and to a first approximation there's always going to be someone talking about how we should all just be using C instead. It's also not hard to find instances in open source projects of people ascribing ulterior motives or brain damage or ineptitude or whatever to anyone using another programming language.
Can you give an example, instead of just describing something that rarely happens?
For Rust this is certainly the case, as demonstrated by this thread and almost any other thread about Rust. TBH it is a pattern to see titles like "fastest xxx, written in Rust".
C programmers do not have the tradition of ASKING other people to write something in C; they WRITE it in C. That's the real difference here.
A graph would be good. Any graph. Preferably multiple. Otherwise, this is all anecdotal. Show me why Rust wins, and how. Telling me "doubly-linked lists are slow" is not useful to a developer considering one of these two languages.
This isn't that type of post. Sometimes what's useful is a brain-dump of heuristics and tidbits and general impressions formed over years and years of experience. Sometimes that's more useful, or even more accurate, than hard benchmark data.
All benchmarks should be delivered in the form of a graph and a histogram. I had to close a PR recently where the "optimization" was 1% of a standard deviation away from the mean, without either implementation even having been run!
Most things in life are subjective and cannot be reduced to graphs and other "empirical data". I learned this later in life than I should have, and since then I've spent time and effort building some of the mental circuits required to evaluate subjective experiences and arguments. Perhaps doing so may be useful to you as well.