Sure, but the borrow checker only operates on references. Rust gives you the tools to work with raw everything, if you dip into unsafe. Memory allocators, doing this kind of low-level thing, don't work with references. Let's say you want to implement a global allocator (this is the only current API in stable Rust, non-global allocators are on the way). The trait you use, which gives you the equivalent of malloc/free, has this signature:
Note the *mut u8 rather than say, &mut u8. Most people would not be using this interface directly, they'd be using a data structure that uses it internally.
Now, there's a good argument to be had about safe and unsafe, how much you need, in what proportion, and in what kinds of programs... but when you say things like "C allows you to do bulk memory operations, Rust does not" and ask about the borrow checker when talking about allocators, to someone who is familiar with Rust's details, it seems like you are misinformed somehow, which makes it really hard to engage constructively with what you're saying.
I'll try to further bridge some of the understanding gap.
People in this thread keep talking about "arena allocators" as if they are special things that you would use a few times (Grep Guy said this above, for example), or, here you imply they would be used internally to data structures, in a way that doesn't reach out to user-level.
That makes them not nearly as useful as they can be!
The game we are working on is currently 100k lines (150k lines if you count comments etc), and almost all allocations in the entire program are of this bulk type, in one way or another. Like if I want to copy a string to modify it a little bit to then look up a bitmap or make a filename, those are temporary allocations at user level and are not wrapped by anything. The string type is not some weird heavyweight C++ std::string kind of thing that wraps a bunch of functionality, it is just a length and a data pointer.
So the proposal to use unsafe in this kind of context doesn't make sense, since then you are putting unsafe everywhere in the program, which, then, why pretend you are checking things?
You can say, "well you as the end-user shouldn't be doing this stuff, everything should be wrapped in structures that were written by someone smarter than you I guess," but that is just not the model of programming that I am doing.
I understand how you can think the statement "Rust does not (allow you to do bulk memory operations)" is false, but when I say this, part of what I am including in "bulk memory operations" is the ability (as the end user) to pretend like you are in a garbage-collected language and not worry about the lifetime of your data, without having to take the performance penalty of using a garbage-collected language. So if you add back in worrying about lifetimes, it's not the same thing.
If you think "bulk memory allocation" is like, I have this big data structure that manages some API and it has some linked lists, and instead of allocating those nodes on the heap I get them from an arena or pool managed by the bigger structure ... that's fine, it is better than not doing it, but it doesn't help the end user write simpler code, and in practical terms it means that most of the allocations in the program are going to be non-bulk, because there's just too much friction on doing them broadly.
If it helps, I can revise my statement to "Rust enables you to do certain kinds of internal bulk memory allocation, but using bulk allocation broadly and freely across your program goes against the core spirit of the language" ... that sounds pretty uncontroversial? Then to bring it back to the original post, I would say, "This kind of broad use of bulk allocation is important for high performance and simplicity of the resulting code."
One last note, I am pretty tired of the "you don't understand Rust, therefore you are beneath us" line that everyone in the Rust community seems to deploy with even the slightest provocation -- not just when responding to me, but to anyone who doesn't just love Rust from top to bottom. Really it makes me feel that the only useful thing to do is just ignore Rust folks and go do useful things instead. I know I am not the only person who feels this way.
> People in this thread keep talking about "arena allocators" as if they are special things that you would use a few times (Grep Guy said this above, for example)
I didn't say anything about arena allocators. What I said was that amortizing allocation was routine and commonplace in ripgrep's code. I definitely wouldn't say that amortizing allocation is "special" or something I use a "few" times. As one example of amortizing allocs, it's very common to ask the caller for some memory instead of allocating memory yourself.
> I am pretty tired of the "you don't understand Rust, therefore you are beneath us" line that everyone in the Rust community seems to deploy with even the slightest provocation
Kind of like opening a comment with "This entire article is nonsense." Right? Snubbing your nose and then getting miffed by the perception of others snubbing their nose at you is a bunch of shenanigans. And then you snub your nose at pretty much everyone: "It's the blind leading the blind." I mean, c'mon dude.
The problem with your comments is that they lack specifics. Even after this comment where you've tried to explain, it's pretty hard for me to understand what you're getting at. I suspect part of the problem is your use of this term "bulk allocation." Is it jargon that refers to the specific pattern you have in mind? Because if it is, I can't find it documented anywhere after a quick search. If it's not jargon, then "bulk allocation" could mean a lot of things, but you clearly have a very specific variant of it in mind.
It's clear to me that your argument is a very subtle one that requires nuance and probably lots of code examples to get the point across. Going about this at the other end---with lots of generalities and presumptions---just seems like a fool's errand.
This style of memory management can be as pervasive as you like! You are reading way more detail out of people's comments than they put there, and then getting upset about your misinterpretation.
If every throwaway string in your program comes from an arena that you clear later, great! Rust won't stop you, or even force you to use unsafe every time you build one. The unsafe code goes in a "give me a fresh chunk of temporary memory" function, and that function is safe to call all over the place: unsafe-in-a-safe-function is a common pattern for extending the set of analyzable programs.
(It's also worth pointing out that Rust's primitive string type is "just a length and a data pointer," so once you've allocated one out of an arena like this, you can do all the nice built-in string-y things with it, with no std::string-like interference.)
The Rust compiler itself uses this sort of bulk memory all the time. It's not limited to the internals of data structures there- it's spread across larger phases and queries of its operation, with all kinds of stuff allocated the same way.
Now, to be fair, this is not the default- e.g. Rust's standard library of collections don't participate. But this is why everyone keeps mentioning custom allocators to you- there is ongoing work to extend these collections with the ability to control how they perform their allocation!
> One last note, I am pretty tired of the "you don't understand Rust, therefore you are beneath us" line that everyone in the Rust community seems to deploy with even the slightest provocation -- not just when responding to me, but to anyone who doesn't just love Rust from top to bottom.
You would get this kind of reaction a lot less often if you didn't make vague or nonsense claims about it so often.
Okay, but if I do this everywhere, then I de facto don't have memory safety. Why, then should I use Rust and pretend like I am getting memory safety? Why wouldn't I use a lower-friction language with a faster compiler?
It looks to me like the Rust community has this weird way of wanting to have its cake, and eat it too, about memory. Y'all want to advertise how important memory safety is, how great it is to have, and so forth. Then in cases like this, it's always "oh but you just use unsafe, it's fine". These stories are mutually inconsistent. Either you have memory safety or you don't. Paying the cost that Rust makes programmers pay for memory safety, and then not actually getting memory safety, is the worst of both worlds.
Then when you guys say I am making nonsense claims because of course you can have your cake and also eat it as long as you use the Rust programming language, well, it's just pretty weird at that point.
Memory safety is not some sort of binary thing which you either have or you don't. All memory safe environments are built on a foundation of unsafe code. For example, Java being memory safe assumes the JVM or JNI code doesn't have any memory safety bugs.
What Rust does is reduce the amount of code that's memory unsafe, that needs to be triply reviewed and audited. Reduction of the scope of high-scrutiny code is the single most leveraged thing that can be done to improve code quality in a large, long-running project. Why? Because it lets you do careful, time-consuming analysis on a small part of your codebase (the bits that are marked unsafe), then scale the analysis up to the rest of your code.
> These stories are mutually inconsistent. Either you have memory safety or you don't.
This is... what can I say. This is simply incorrect. It pains me to say this as a fan of your games but you really don't seem to have any idea what you're talking about.
> Okay, but if I do this everywhere, then I de facto don't have memory safety.
No, that's not how this works. You write the unsafe code in one place and make sure it's correct (just like you'd do in C or Jai), and then you wrap it in a function signature that lets the compiler apply its memory safety checks to all the places that call it (this is what Rust gives you over C).
This is still a meaningful improvement to memory safety over C. The compiler checks the majority of your program; if you still see memory bugs now you only have a small subset to think about when debugging them.
This is also not very different from a hypothetical language with your "actual" memory safety- in that case, you still have to consider the correctness of the compiler checks themselves. Rust just takes a more balanced and flexible approach here and moves some of that stuff out of the compiler. (In fact this simplifies the compiler, which increases confidence in its correctness...)
Rust has been very clear about all this from the beginning. If you are still reading people's claims about Rust memory safety a different way, that's on you.
> Why wouldn't I use a lower-friction language with a faster compiler?
That's totally up to you! I don't have a problem with people using other languages for these kinds of reasons. My goal here is not to convert you, but to cover more accurately what Rust is and isn't capable of. (At the root of this thread, that's things like "writing fast software with effective memory management styles.")
They are the opposite of meaningless. This is just straight-up incorrect, both in theory and in practice.
Please take some time and think about this a bit more. Please think about how code review processes work, how audits work, how human attention spans work. Please think about how people endlessly nitpick small PRs but accept large ones with few comments. What unsafe does is make it easy to spot the small bits of critical code to nitpick while not having to worry about safety for the rest.
They're conditionally meaningful: if a small amount of your program is correct, the entire program satisfies some useful properties.
This may or may not be something you care about, but it is certainly a meaningful tool that is quite useful to me, including when I use your type (3) style described in the sibling thread.
> here you imply they would be used internally to data structures, in a way that doesn't reach out to user-level.
Ah! I think I am understanding you a bit better. The thing is, ultimately, Rust is as flexible as you want it to be, and so there are a variety of options. This can make it tricky, when folks are talking about slightly different things, in slightly different contexts.
When you say "doesn't reach out to user level," what I mean by what I said was that users don't generally call alloc and dealloc directly. Here, let's move to an actual concrete example so that it's more clear. Code is better than words, often:
use bumpalo::{Bump, boxed::Box};
struct Point {
x: i32,
y: i32,
}
fn main() {
let bump = Bump::with_capacity(256);
let c = Box::new_in(Point { x: 5, y: 6 }, &bump);
}
This is using "bumpalo", a very straightforward bump allocator. As a user, I say "hey, I want an arena backed by 256 bytes. Please allocate this Point into it, and give me a pointer to it." "c" here is now a pointer into this little heap it's managing. Because my points are eight bytes in size, I could fit 32 points here. Nothing will be deallocated until bump goes out of scope.
But notably, I am not using any unsafe here. Yes, I am saying "give me an allocation of this total size", and yes I am saying "please allocate stuff into it and give me pointers to it," but generally, I as a user don't need to mess with unsafe unless I'm the person implementing bumpalo. And sometimes you are! Personally, I work in embedded, with no global heap at all. I end up using more unsafe than most. But there's no unsafe code in what I've written above, but it's still gonna give you something like what you said you're doing in your current game. Of course, you probably want something more like an arena, than a pure bump allocator. Those exist too. You write 'em up like you would anything else. Rust will still make sure that c doesn't outlive bump, but it'll do that entirely at compile time, no runtime checks here.
Oh, and this is sorta random but I didn't know where to put it: Rust's &str type is a "pointer + length" as well. Using this kind of thing is extremely common in Rust, we call them "slices" and they're not just for strings.
> You can say, "well you as the end-user shouldn't be doing this stuff, everything should be wrapped in structures that were written by someone smarter than you I guess," but that is just not the model of programming that I am doing.
While that's convenient, and in this case, I am showing that, the point is that it's about encapsulation. I don't have to use this existing allocator if I wanted to write something different. But because I can encapsulate the unsafe bit, no matter who is writing it, I need to pay attention in a smaller part of my program. Maybe I am that person, maybe someone else is, but the benefit is roughly the same either way.
> So if you add back in worrying about lifetimes, it's not the same thing.
To be super clear about it, Rust has raw pointers, that are the same as C. No lifetimes. If you want to use them, you can. The vast, vast, vast majority of the time, you do not need the flexibility, and so it's worth giving it up for the compile time checks.
> If you think "bulk memory allocation" is like...
It's not clear to me above if the API I'm talking about above is what you mean here, or something else. It's not clear to me how you'd get simpler than "please give me a handle to this part of the heap," but I haven't seen your latest Jai streams. I am excited to give it a try once I am able to.
> but using bulk allocation broadly and freely across your program goes against the core spirit of the language
I don't know why you'd think these techniques are against the core spirit of the language. Rust's primitive array type is literally "give me N of these bits of data laid out next to each other in memory." We had a keynote at Rustconf about how useful generational arenas are as a technique in Rust. As a systems language, Rust needs to give you the freedom to do literally anything and everything possible.
> One last note, I am pretty tired of the "you don't understand Rust, therefore you are beneath us"
To be clear, I don't think that you or anyone else is "beneath us," here. What I want is informed criticism, rather than sweeping, incorrect statements that lead people to believe things that aren't true. Rust is not perfect. There are tons of things we could do better. But that doesn't mean that it's not right to point out when facts are different than the things that are said. You of all people seem to appreciate a forward communication style.
> rather than sweeping, incorrect statements that lead people to believe things that aren't true
I agree, and if I say things that are incorrect, then I definitely want to fix them, because I value being correct.
But what I am meeting in this thread is people wanting to do some language-lawyer version of trying to prove I am incorrect, without addressing the substance of what I am actually saying. I think your replies have been the only exception to this (and only just).
I realize my original posting was pretty brusque, but, the article was very bad and I am very concerned with the ongoing deterioration of software quality, and the hivemind responses to articles like this on HN, I think, are part of the problem.
I know that Rust people are also concerned with software quality, and that's good. I just think most of Rust's theories about what will help, and most of the ways these are implemented semantically, are just wrong.
So if something I am saying doesn't seem to make sense, or seems "incorrect", well, maybe it's that I am just coming from a very different place in terms of what good programming looks like. The code that I write just looks way different from the code you guys write, the things I think about are way different, etc. So that probably makes communication much harder than it otherwise would be, and makes it much easier to misunderstand things.
On the technical topic being discussed here...
Using a bump allocator in the way you just did, on the stack for local code that uses the bump allocator right there, is semantically correct, but not a very useful usage pattern. In a long-running interactive application, that is being programmed according to a bulk allocation paradigm that maybe is "data oriented" or whatever the kids call it these days, there are pretty much 4 memory usage patterns that you ever care about:
(1) Data baked into the program, or that is initialized so early at startup that you don't have to worry about it. [This is 'static in Rust].
(2) Data that probably lives a long time, but not the whole lifetime of the program, and that will be deallocated capriciously at some point. (For example, an entry in a global state table).
(3) Data that lasts long enough that local code doesn't have to care about its lifetime, but that does not need to survive long-term. For example, a per-frame temporary arena, or a per-job arena that lasts the lifetime of an asynchronous task.
(4) Data that lives on the stack, thus that can't ever be used upward on the stack.
Now, the thing is that category (3) was not really acknowledged in a public way for a long time, and a lot of people still don't really think of it as its own thing. (I certainly didn't learn to think about this possibility in CS school, for example). But in cases of dynamic allocation, category (3) is strictly superior to (4) -- because it's approximately as fast, and you don't have to worry about your alloca trying to survive too long. You can whip up a temporary string and just return it from your function and nobody owns it and it's fine. So having your program really lean on (3) in a big way is very useful. This is what I was saying before about pretending to have a garbage collector, but you don't pay for it.
So if you are doing a fast data-oriented program (I don't really use the term "data-oriented" but I will use it here just for shorthand), dynamic allocations are going to be categories 1-3, and (4) is just for like your plain vanilla local variables on the stack, but these are so simple you just don't need to think about them much.
Insofar as I can tell, all this lifetime analysis stuff in Rust is geared toward (4). Rust wants you to be a good RAII citizen and have "resources" owned by authoritative things that drop at very specific times. (The weird thing about "resources" is that in reality this almost always means memory, and dealing with memory is very very different from dealing with something like a file descriptor, but this is genericized into "resources", which I think is generally a big mistake that many modern programming language people make).
With (1), you don't need any lifetime checking, because there is no problem. With (2), well, you can leak and whatever, but this is sort of just your problem to make sure it doesn't happen, because it is not amenable to static analysis. With (3), you could formalize a lifetime for it, but it is just one quasi-global lifetime that you are using for lots of different data, so this by definition cannot do very much work for you. You could use it to avoid setting a global to something in category (3), and that's useful to a degree, but in reality this problem is not hard to catch without that, and it doesn't seem worth it to me in terms of the amount of friction required to address this problem. Then there is (4), which, if you are not programming in RAII style, you don't really need checking very much??, because everything there is simple, and anyway, the vast majority of common stack violations are statically detectable even in C (the fact that C compilers did not historically do this is really dumb, and has been a source of much woe, but like, it is very easy to detect if you return a pointer to a local from a function, for example. Yes, this is not thorough in the way Rust's lifetime checking is, and this class of analysis will not catch everything Rust does, but honestly it will catch most of it, at no cost to the programmer).
So when I said "Rust does not allow you to do bulk memory allocation" what I am saying is, the way the language is intended to be used, you have most of your resources being of type (4), and it prevents you from assigning them incorrectly to other resources of type (4) but that have shorter lifetimes, or to (2) or (1).
But if almost everything in (4) is so simple you don't take pointers to it and whatnot, and if most of your resources are (3), they have the same lifetime as each other, all over the place, so there is no use checking them against each other. So now the only benefit you are getting is ensuring that you don't assign (3) to (2) or (1). But the nature of (3) is such that it is reset from a centralized place, so that it is easy, for example, to wipe the memory each frame with a known pattern to generate a crash if something is wrong, or, if you want something more like an analytical framework, to do a Boehm-style garbage collector thing on your heap (in Debug/checked builds only!) to ensure that nothing points into this space, which is a well-defined and easy thing to do because there is a specific place and time during which that space is supposed to be empty.
So to me "programming in Rust" involves living in (4) and heavily using constructors and destructors, whereas I tend to live in (3) and don't use constructors or destructors. (I do use initializers, which are the simple version of constructors where things can be assigned to constant values that do not require code execution and do not involve "resources" -- basically, could you memcpy the initial value of this struct from somewhere fixed in memory). Now the thing that is weird is that maybe "programming in Rust" has changed since last time I argued with Rust people. It seems that it used to be the sentiment that one should minimize use of unsafe, that it should just be for stuff like lockfree data structure implementation or using weird SIMD intrinsics or whatever, but people in this thread are saying, no man, you just use unsafe all over the place, you just totally go for it. And with regard to that, I can just say again what I said above, that if your main way of using memory is some unsafe stuff wrapped in a pretend safe function, then the program does not really have the memory safety that it is claiming it does, so why then be pretending to use Rust's checking facilities? And if not really using those, why use the language?
So that's what I don't get here. Rust is supposed to be all about memory safety ... isn't it? So the "spirit of Rust" is something about knowing your program is safe because the borrow checker checked it. If I am intentionally programming in a style that prevents the borrow checker from doing its job, is this not against the spirit of the language?
I'll just close this by saying that one of the main reasons to live in (3) and not do RAII is that code is a lot faster, and a lot simpler. The reason is because RAII encourages you conceptualize things as separately managed when they do not need to be. This seems to have been misunderstood in many of the replies above, as people thinking I am talking about particular features of Rust lifetimes or something. No, it is RAII at the conceptual level that is slow.
> We had a keynote at Rustconf about how useful generational arenas are as a technique in Rust.
If that's the one I am thinking of, I replied to it at length on YouTube back in 2018.
Thanks for expanding on that. Now I think I get what you mean.
Rust does case (3) with arenas. In your frame loop you'd create or reset an arena and then borrow it. That would limit lifetime of its items to a single loop iteration.
The cost would be only in a noisy syntax with `'arena` in several places, but other than that it compiles to plain pointers with no magic. Lifetimes are merely compile-time assertions for a static analyzer. Note that in Rust borrowing is unrelated to RAII and theoretically separate from single ownership.
Rust's `async fn` is one case where a task can hold all of the memory it will need, as one tagged union.
As for unsafe, the point is in encapsulating it in safe abstractions.
Imagine you're implementing a high level sandbox language. Your language is meant to be bulletproof safe, but your compiler for that language may is in C. The fact that the compiler might do unsafe things doesn't make the safe language pointless.
Rust does that, but the safe language vs unsafe compiler barrier is shifted a but, so that users can add "compiler internals" themselves for the safe language side.
eg. `String` implementation is full of unsafe code, but once that one implementation has been verified manually to be sound, and hidden behind a safe API, nobody else using it can screw it up when eg concatenating strings.
I know it seems pointless if you can access unsafe at all, so why bother? But in practice it really helps, for reasons that are mostly social.
* There are clear universal rules about what is a safe API. That helps review the code, because the API contract can't be arbitrary or "just be careful not to…". It either is or isn't, and you mark it as such. Not everything can be expressed in terms of safe APIs, but enough things can.
* Unsafe parts can be left to be written by more experienced devs, or flagged for more thorough review, or fuzzed, etc. Rust's unsafe requires as much diligence as equivalent C code. The difference is that thanks to encapsulation you don't need to write the whole program with maximum care, and you know where to focus your efforts to ensure safety. You focus on designing safe abstraction once, and then can rely on the compiler upholding it everywhere else.
> So if something I am saying doesn't seem to make sense, or seems "incorrect", well, maybe it's that I am just coming from a very different place in terms of what good programming looks like.
I do think this is probably true, and I know you do care about this! The thing is...
> The code that I write just looks way different from the code you guys write, the things I think about are way different, etc.
This is also probably true! The issue comes when you start describing how Rust code must be or work. There's nothing bad about having different ways of doing things! It's just that when you say things like "since then you are putting unsafe everywhere in the program," when that's empirically not what happens in Rust code.
> Using a bump allocator in the way you just did, on the stack for local code that uses the bump allocator right there, is semantically correct, but not a very useful usage pattern.
Yes. I thought going to the simplest possible thing would be best to illustrate the concept, but you're absolutely right that there is a rich wealth of options here.
Rust handles all four of these cases, in fairly straightforward ways. I also agree that 3 isn't often talked about as much as it should be in the broader programming world. I also have had this hunch that 3 and 4 are connected, given that the stack sometimes feels like an arena for just the function and its children in the call graph, and that it has some connection to the young generation in garbage collectors as well, but this is pretty offtopic so I'll leave it at that :)
Rust doesn't care just about 4 though! Lifetimes handle 3 as well; they ensure that the pointers don't last longer than the arena lives. That's it.
I don't have time to dig into this more, but I do appreciate you elaborating a bit here. It is very helpful to get closer to understanding what it is you're talking about, exactly. I think I see this differently than you, but I don't have a good quick handle on explaining exactly why. Some food for thought though. Thanks.
(Oh, and it is the one you're thinking of; I forgot that you had commented on that. My point was not to argue that the specifics were good, or that your response was good or bad, just that different strategies for handling memory isn't unusual in Rust world.)
> they ensure that the pointers don't last longer than the arena lives. That's it.
Sure, but my point is, when most things have lifetimes tied to the same arena, this becomes a almost a no-op. Both in the sense of, you are not really checking much (as Ayn Rand said, 'a is 'a), and you're paying a lot in terms of typing stuff into the program, and waiting around for the compiler to be not usefully checking all these things that are the same. Refactoring a program so that most things' lifetimes are the same does not feel to me like it's in the spirit of Rust, because then why have all these complicated systems, but maybe you feel that it is.
There is a bit of a different story when you are heavily using threads, because you want those threads to have allocators that are totally decoupled from each other (because otherwise waiting on the allocator becomes a huge source of inefficiency). So then there are more lifetimes. But here I am not convinced about the Rust story either, because here too I think there are simpler things to do that give you 95% of the benefit and are much lower-friction.
(And I will admit here that "Rust doesn't allow you to X", as I said originally, is not an accurate statement objectively. Attempting to rephrase that objection to be better, I would say, by the time you are doing all this stuff, you are outside the gamut that Rust was designed for, so by doing that program in Rust you are taking a lot of friction, but not getting the benefit given to someone who stays inside that gamut, so, it seems like a bad idea.)
> Refactoring a program so that most things' lifetimes are the same does not feel to me like it's in the spirit of Rust, because then why have all these complicated systems, but maybe you feel that it is.
I think a common sentiment among Rust programmers would instead phrase this as, "the complicated system we designed to find bugs keeps yelling at us when we have lots of crazy lifetimes flying around, so presumably designs that avoid them might be better."
In this sense, even for someone who doesn't feel the borrow checker is worth it in their compiler, this can just be taken as a general justification for designs in any language that have simpler lifetime patterns. If they're easier to prove correct with static analysis, maybe they're easier to keep correct by hand.
Now, there's a good argument to be had about safe and unsafe, how much you need, in what proportion, and in what kinds of programs... but when you say things like "C allows you to do bulk memory operations, Rust does not" and ask about the borrow checker when talking about allocators, to someone who is familiar with Rust's details, it seems like you are misinformed somehow, which makes it really hard to engage constructively with what you're saying.