Thoughts on Rust, a few thousand lines in (rcoh.me)
228 points by airstrike on Jan 9, 2019 | 174 comments



I'm a network engineer, and I've done C++ for over a decade now. One of the nice things about Rust is that they decided to go with async over fibers, which is in line with how most high performance C++ is written. The Rust team also isn't rushing Future out the door, so it's coming along much nicer than the C++ Future, which is usually replaced because it's not monadic.

Rust is great, and I highly recommend learning Rust if you want to become a better C++ developer. C++ introduces a lot of unnecessary headaches that Rust completely solves. Rust is worth the switch if only because reading error messages from expanded templates is horrible compared to a type system that actually understands type parameters.

I'm really bullish on Rust becoming even more mainstream, to the point anyone can pick it up and use it in their business without a second thought by 2020.


I am a big fan of Rust, but I don’t think async using the ”await” keyword or using Futures/promises is the way to go.

I wrote a large web app in Scala using Futures only and it turned into a monster because of it. Even if you don’t use callbacks you still need to manually unwrap them, and they pollute your method signatures.

Then more recently I wrote a pub/sub server in C# using the “await” keyword. While much better, it still pollutes your method signatures and makes the debugger and stack traces useless. Also, you still have to think about not calling anything that could block or spinlock, or calling APIs that are not async, e.g. CreateFile.

Compare this to Golang, where projects like Groupcache (1) have super clean code that replaced a thousand-line C++ system at Google.

Why do you think Golang fibers/goroutines are worse?

(1) https://github.com/golang/groupcache


The answer to this question is quite involved. As you may already know, Rust at one time had a fiber model, but it was removed in favor of Futures.

One big advantage of Futures is they are decoupled from their execution environment.

Different types of executors can be used depending on the type of Future. For example, you may have an IOThreadPoolExecutor to handle accepting and responding to connections that should spend very little time blocking, and a CPUThreadPoolExecutor for offloading heavy processing. In Go, you have no say in how goroutines are scheduled; you sacrifice control to reduce complexity.

Another big one in network programming is the polling mechanism. In Go, you have no control over your polling mechanism, whereas a Future is decoupled from the event loop, and you can write your own event loop on any experimental kernel APIs you'd like.

Rust made the right move because in Rust, just as in C++, we want maximum control. Futures are more primitive, and it's possible to build coroutine models on top of them, but not vice versa.
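To make "Futures are decoupled from their execution environment" concrete, here is a minimal sketch of a hand-rolled future driven by a hand-rolled executor, using today's std APIs (which post-date this thread). The names `YieldOnce`, `NoopWaker`, and `block_on` are all invented for the sketch; this is not how production executors like tokio work, just an illustration that the `Future` trait itself imposes no runtime.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

// A future that yields (returns Pending) once before completing.
struct YieldOnce {
    yielded: bool,
}

impl Future for YieldOnce {
    type Output = u32;
    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<u32> {
        if self.yielded {
            Poll::Ready(42)
        } else {
            self.yielded = true;
            cx.waker().wake_by_ref(); // request to be polled again
            Poll::Pending
        }
    }
}

// A trivial waker: a real executor would use this to reschedule the task.
struct NoopWaker;
impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

// A busy-polling "executor". Any other executor could drive the same future.
fn block_on<F: Future>(fut: F) -> F::Output {
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    let mut fut = Box::pin(fut);
    loop {
        if let Poll::Ready(v) = fut.as_mut().poll(&mut cx) {
            return v;
        }
    }
}

fn main() {
    assert_eq!(block_on(YieldOnce { yielded: false }), 42);
    println!("ok");
}
```

The point of the sketch: the future and the executor only meet through `poll` and the `Waker`, which is exactly the decoupling being discussed.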


I agree with you. I guess the best solution would be something in between “await” and goroutine.

My understanding is the main issue with the “await” model is that you only abstract the task model.

Running tasks don’t have a real ID and stack; in Go and Erlang you can list all running fibers and see how much memory and CPU they use ...

I believe you could write your own event loop easily in Go if they exposed the internal API used to pause and resume goroutines. You can already pause a goroutine by having it wait on a mutex, and resume it from your custom event loop by releasing that mutex.


In Go you can't control the polling mechanism if you are using the built-in types.

You can drop to raw fd's and do whatever you like... of course this negates a lot of the niceties that go affords you.


> Rust at one time had a fiber model, but it was removed in favor of Futures.

No, fibers were removed with nothing to replace them. The futures as we know them today did not exist in the far, far past when fibers and the whole runtime thing got dropped.


Futures and fibers are not mutually exclusive, as shown in the latest (2018) Oracle Code One keynote speech, where they demoed project Loom (basically you can wrap a fiber in a CompletableFuture, in the same fashion you can wrap a thread/execution context and expose it in a CompletableFuture). That being said, the JVM team seem to have come to a similar conclusion that using fibers is the least invasive way in terms of syntax as compared to async.


Goroutines are just threads, with a particularly idiosyncratic implementation. You can use threads and blocking I/O in Rust too.

You wouldn't want a Go-like M:N solution in Rust. It would be slower for no reason.


> You wouldn't want a Go-like M:N solution in Rust. It would be slower for no reason.

Depends on how many threads you have and what OS you’re running on.


Goroutines are very different from threads in practice.

You don’t have to pay the memory usage and context switch cost you would have if using large numbers of threads.

It’s also faster to do IO processing from a single thread in batches instead of having one OS thread per request.


Context switches on I/O are the same for 1:1 and M:N because you're eating the cost of a kernel to user context switch either way.

Memory usage per thread is a property of the GC, not M:N threading. You can have very small stacks in a 1:1 implementation too.


There are fewer context switches because you can write data to several distinct file descriptors using a single system call.

You can also read data from several file descriptors in a single system call.

This way you significantly reduce the number of system calls instead of doing one blocking read() per connection. I believe a context switch round trip (from user space to kernel and back) is much more expensive than simply switching between goroutines of the same process.


> and make the debugger and stacktrace useless

Chrome manages to do async stack traces for their implementation of the similar JavaScript feature. I wonder if this would be possible for C# and Rust.


[flagged]


Could you site this?

I've looked at ilovecaching's comment history (at least up to ~60 days ago) and they seem consistent on being a networking programmer who uses C++. Was there a particular comment you found which would indicate this isn't the truth?


nb: you mean "cite".

why would you make me do this? you can doxx people on your own dime... still, for posterity:

https://news.ycombinator.com/item?id=18328964 "...now I am working full time as a Rust developer"

https://news.ycombinator.com/item?id=18310666 "As a long time Erlang developer..."

https://news.ycombinator.com/item?id=18265199 "As someone who has worked two jobs now writing, deploying, and operating Erlang clusters, I recommend switching to Rust."

https://news.ycombinator.com/item?id=18251478 "Becoming a professional Haskell and Erlang developer really shifted my view on OOP"

https://news.ycombinator.com/item?id=18185795 "I've been using Rust in production for a little over half a year, and my team and I have run into very few issues. [...] We're all extremely glad we chucked C++ and Go and switched to Rust. "

https://news.ycombinator.com/item?id=18171917 "I am exclusively using Go and Rust at home and at work, and I find myself equally productive in both."

you get the point.


It really isn't that ridiculous that someone who has worked as a network engineer for a decade would use Erlang, C++, and Go. Even Haskell isn't too much of a stretch, considering the user's company has functional programming expertise in Erlang. Rust is a neat language that, depending on their workload, could replace all of those languages. Why the presumption of bad faith?


Not seeing any major inconsistencies. Three months ago they exclusively used Go and Rust, at some point before that they used Erlang and Haskell and they used C++ for a total of over 10 years.


Talk about cherry-picking. It's pretty clear from those comments that they've worked with Erlang before, as well as Haskell and C++, and that they've transitioned to use Rust.

For example, in the same comment that begins with "As a long time Erlang developer", they later continue: "I’m more optimistic about Rust ... We’ve been able to scale Rust to handle the same load as our Erlang cluster". So there's no contradiction here.


A couple languages doesn't seem beyond the realm of possibility for one engineer to be familiar with, though?


You have a point, but he never said he programmed exclusively in C++, so I don’t really see the issue here...


I guess I'm glad that someone sits around looking through comment histories skeptically, but I don't see what about these comments seem suspicious to you.


I think what you're posting definitely sounds suspicious, and a lot of Rust users are doing it for the hype, but:

1) Rust genuinely is a great language, and the opinion in his comment is one I share as a professional C++ programmer.

2) his comment history is only mildly suspicious; there's nothing outright contradictory. I would give him the benefit of the doubt.


One thing I don't like about Rust is how taking a slice of a string can cause a runtime panic if the start or end of the slice ends up intersecting a multi-byte UTF-8 char.

I would prefer it if this feature didn't exist at all rather than cause runtime panics.

https://play.rust-lang.org/?gist=e02ce5e9aacfee3a2b4917d5624...


It's not a problem in practice, because you'd use something like the `.char_indices()` iterator, or the result from a substring search, etc., to get correct offsets in the first place.

It's not useful to blindly read at random offsets in UTF-8 strings. If it didn't panic, you'd get garbage. If offsets were automatically moved to skip over garbage, you wouldn't know what you're getting, and your overall algorithm would likely end up with nonsense (duplicated or skipped chars).

For algorithms that don't care about characters or UTF-8 validity, there's zero-cost `.as_bytes()`.
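A quick sketch of what deriving offsets from the string itself looks like (toy string chosen so the multi-byte character is visible; indices come from `find`/`char_indices`, never from guessing):

```rust
fn main() {
    let s = "ab早c";
    // Derive the offset from the string itself instead of guessing a byte position.
    if let Some(pos) = s.find('早') {
        // `pos` is guaranteed to be a char boundary, so slicing cannot panic.
        assert_eq!(&s[..pos], "ab");
        assert_eq!(&s[pos..], "早c");
    }
    // char_indices pairs each char with its starting byte offset ('早' is 3 bytes).
    let offsets: Vec<(usize, char)> = s.char_indices().collect();
    assert_eq!(offsets, vec![(0, 'a'), (1, 'b'), (2, '早'), (5, 'c')]);
    println!("ok");
}
```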


Couldn't syntax like `a_string[..3]` be made to result in compilation errors in Rust? Since that'd almost always be a bug? (right?)

And in the rare cases, when it's not a bug, then one can just use `as_bytes` which would be good to do in any case, to indicate to other humans that this is not a bug.

B.t.w. I love the error message `[..3]` generates: "thread 'main' panicked at 'byte index 3 is not a char boundary; it is inside '早' (bytes 2..5) of `ab早`'" — I've never seen such easy to understand error messages in any language (except for in a few cases in Scala).


We could have never implemented Index for String, sure. We have though, so removing it would be a breaking change.


OK. (Maybe a compile-time warning that doesn't break the build?)


That could be done, if it was agreed that this is a mis-feature. I don't think there's agreement on that, though.


What does zero-cost mean in this context? It must cost something to run, no? Or is it basically a compiler hint instructing the next function to treat the data as pure bytes?


In this particular context, you can think of going from a `&str` to a `&[u8]` via `string.as_bytes()` as a safe cast. The in-memory representation remains the same, and the function call will almost certainly be inlined because its implementation is trivial.


It is a common pattern in Rust to use [] for things that cannot fail and will panic otherwise and a method for things that can fail and return Option or Result.

e.g. my_hashmap["foo"] will panic at runtime if the key "foo" is not present, or return the associated value if it is. But my_hashmap.get("foo") will return None if "foo" is not present and Some(value) if it is.
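The two access patterns side by side, as a runnable sketch:

```rust
use std::collections::HashMap;

fn main() {
    let mut m = HashMap::new();
    m.insert("foo", 1);

    // [] (the Index trait): panics at runtime if the key is absent.
    assert_eq!(m["foo"], 1);

    // .get(): returns an Option so the missing-key case is handled explicitly.
    assert_eq!(m.get("foo"), Some(&1));
    assert_eq!(m.get("bar"), None);
    println!("ok");
}
```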


What's the point of the [] version then? It seems inherently more dangerous, and Rust emphasizes safety. I know it wants to be pragmatic as well as safe, but this seems like a strange default.


There's a few things that come into play here:

First of all, panics are perfectly safe. None of this has to do with safety guarantees.

Second, the [] syntax is controlled by the Index trait, which returns an &T, not an Option<&T>. It does this due to Rust's error handling philosophy. There's two kinds of errors: recoverable and unrecoverable errors. When something shouldn't fail, unless there's a bug, you shouldn't be using Option/Result, you should panic. When something may normally fail, and you want to be able to handle that explicitly, you should use Option/Result.

If [] always returned an Option, you'd be seeing tons and tons and tons of unwraps. It's not the right default here. However, that's why the .get method also exists: If you do think that this may fail, but not due to a bug, then you should use .get instead, which does give you an option.

TL;DR: everything is tradeoffs, and we picked a specific set of them, and that's how they all play out together.

Personal commentary: this is the kind of thing that's largely concerning until you actually use the language more, IMHO. Dealing with Options all the time here would feel really bad. Consider the other sub-thread about floats; it often feels like boilerplate for no good reason. That would introduce this for every single time you want to index something, which is a very common operation.


Does Rust support a monadic coding style (like Haskell "do" blocks or F# computation expressions)? That would allow you to work with Options without having to explicitly unwrap them.


Yes, there are a bunch of methods that let you do this, though with a bit more syntax than do notation; for example, and_then is pretty much bind.


Not generic monads, but it does have the `?` operator for Option (similar to Haskell Maybe) and Result (similar to Haskell Either) which would support a similar syntax to using `do` with the Maybe monad
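A small sketch of `?` on Option, short-circuiting like `do` with Maybe. `parse_pair` is a hypothetical helper invented for the example:

```rust
// Hypothetical helper: parse "x,y" into a pair of ints.
// Each `?` propagates None out of the function, like Maybe's bind.
fn parse_pair(s: &str) -> Option<(i32, i32)> {
    let (a, b) = s.split_once(',')?;
    Some((a.trim().parse().ok()?, b.trim().parse().ok()?))
}

fn main() {
    assert_eq!(parse_pair("1, 2"), Some((1, 2)));
    assert_eq!(parse_pair("1"), None);    // no comma
    assert_eq!(parse_pair("1, x"), None); // second field doesn't parse
    println!("ok");
}
```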


Scala programmers would recognize this as the difference between () and .get(). I wish Rust had copied Scala's syntax, which is much cleaner, rather than trying to be nice to the established systems languages (C/C++).

This would also free up [] to be used for generics and avoid syntactical warts like ::<> parsing.


We did have [] for generics, but we changed it back.

It doesn't remove those warts, it moves them.


Scala and C++ syntax are rather similar, no?


Python does something similar with [] vs .get()


Taken from C++ I guess, which does the same thing.


TIL! I'm still learning Rust so it's good to learn this now! Thanks!



This seems specious to me. The only way to get an invalid index in a string in any language is that you either have an array index arithmetic error or you are blindly operating on a string you haven't validated.

If you want all the data after a : character, you slice on the index of the :. The character after it is going to be the beginning of a UTF-8 character.

You do not under any circumstances guess that the colon is at position 6 in the string. That's not safe. Why are you going cowboy in a language that is so obsessed with safety?


I just realized that I have a bug in my GPS driver. It operates on ASCII data, so the [] operator is safe, BUT the data can be corrupted (low chance, but non-zero), so it can form a valid multibyte character, and my code will panic on it while trying to parse and validate an NMEA message.


Panicking on parsing corrupted data seems like a feature to me...

It's like the default rule in a lexer, if it ever gets to it then it's an unrecognized character and lexing stops so error handling can proceed.

--edit--

Which I now realize was probably your point.


Truncating a string to fit in a fixed-size storage field is probably the most common reason to split at a particular byte position. If you’re throwing data away anyway, you probably don’t care too much about the little bit of corruption.

Granted, this is certainly incorrect but has little to do with safety, especially if the downstream code has to revalidate everything anyway.


String slicing using byte indices has to exist in some form, since it is the only thing that is efficient (O(1)). But, I guess it could have used syntax other than somestring[...].


It could slice on bytes and return a slice of bytes since the String type is a wrapper over Vec<u8>.


That means one loses all the conveniences and guarantees of the string types and, in many cases, forces an immediate revalidation of the byte slice as UTF-8 to get back to &str, which is O(n). Furthermore, this is also rather clunky.

I suppose one could have it return StrWithInvalidSurrounds, where just the first (at most) 3 and last (at most) 3 bytes might be invalid, which would then allow for O(1) revalidation to a &str, and even other operations like continuing to slice... But this is even more clunky for actual use!

I think a moderately less clunky API might have been to not use integers for byte indexing, but instead some ByteIndex wrapper type that string operations return, meaning one can't just write `s[..5]` in an attempt to get the first 5 characters of the string.

(Also, there's str::get that returns an Option: https://doc.rust-lang.org/std/primitive.str.html#method.get )
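For instance, `str::get` turns the would-be panic into an `Option`:

```rust
fn main() {
    let s = "ab早"; // 'a', 'b', then a 3-byte character
    assert_eq!(s.get(..2), Some("ab")); // valid boundary
    assert_eq!(s.get(..3), None);       // inside '早': no panic, just None
    println!("ok");
}
```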


If you want to just slice on bytes without any String semantics, why not use Vec<u8> then? String implies that it is, well, a string.


Does this bug exist because it would be too expensive to check every string before slicing? (Being Rust-ignorant), can you not type a binary as UTF-8? Are there 2 versions of string functions, fast ones that assume ASCII and slow ones that assume UTF-8?


Every string is checked. But UTF-8 is a multi-byte encoding, and slicing works per byte, so if you slice in the middle of a multi-byte character, you may get nonsense. The error happens because of this checking, not in spite of it.

String always assumes full UTF-8. You could make an AsciiString type if you wanted, but it's not provided by the standard library.


The obvious follow up question would be: so why is slicing a string a byte-wise operation and not a character-wise operation? If a string is an array of characters, why does it let me refer to individual bytes without explicitly casting it to a byte array? How often comparatively do you want the nth byte compared to the nth character? I would suspect that's pretty rare.


> How often comparatively do you want the nth byte compared to the nth character? I would suspect that's pretty rare.

It's exactly the opposite of what you expect. Getting the nth codepoint is often (not always) semantically incorrect since a codepoint isn't necessarily one character. Multiple codepoints might combine to form one character. (In Unicode, these are called grapheme clusters.)

Byte offsets are used a ton because you might often have the index to a position in the string from some routine, like, say, a search[1].

I've been working on text related things in both Rust and Go for several years. Both languages got this part of their strings exactly right given that their representation in memory is always a sequence of bytes.

[1] - https://doc.rust-lang.org/std/primitive.str.html#method.find


I still think that using the common [] operator for this is a mistake. Strings shouldn't offer [] at all, and instead should provide methods like codepoints(), bytes(), grapheme_clusters() etc for indexing, slicing, and iterating.

The reason being that the behavior of [] for string varies widely in different languages, and so this is something that's best made explicit, both to force the author of the code to consider whether their assumptions are valid and reasonable for what they're trying to do, and to give additional context to anyone else reading the code.

As it is, I suspect a common class of bugs for Rust will be with people assuming that [] slices codepoints, because it seems to work that way for ASCII.


I'm quite thankful that Rust has succinct notation for slicing strings. Do note that `string[n]` is not supported, so you'll stumble over an inconsistency in your mental model quite quickly if you think slicing is by codepoint.


The lack of direct indexing is a good point. But strings aren't sliced on byte boundaries all that often either - it's far more common to use higher-level APIs like split(), that deal with offsets under the hood, so that sugar mostly ends up being used in the implementation of such APIs. And, really, would something like s.slice_u8(x, y) be that unwieldy over s[x..y]?


How often do you actually want the nth character as opposed to the nth grapheme?

There is pretty much no case where indexing by character actually makes sense because it is almost always incorrect and it is always inefficient.

Indexing by byte is rarely useful, but it does have some usefulness since it can be used correctly and efficiently: you can easily find the previous character boundary by searching a maximum of four bytes for a byte that isn't a continuation byte (i.e. whose top two bits are not 10). If you want to do something like get a &str that would fit in an n-byte buffer, then byte indices will let you do that efficiently and correctly.
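The fit-in-an-n-byte-buffer case can be sketched with std's `is_char_boundary`; `truncate_to_fit` is a hypothetical helper invented for the example:

```rust
// Hypothetical helper: fit a &str into a byte budget without splitting a char.
fn truncate_to_fit(s: &str, max_bytes: usize) -> &str {
    let mut end = max_bytes.min(s.len());
    // A UTF-8 char is at most 4 bytes, so this loop runs at most 3 times: O(1).
    while !s.is_char_boundary(end) {
        end -= 1;
    }
    &s[..end] // cannot panic: `end` is now a char boundary
}

fn main() {
    assert_eq!(truncate_to_fit("ab早", 3), "ab");   // byte 3 is mid-'早', back up
    assert_eq!(truncate_to_fit("ab早", 5), "ab早"); // whole string fits
    assert_eq!(truncate_to_fit("abc", 10), "abc");
    println!("ok");
}
```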


As stated below, indexing is an O(1) operation, and that would be an O(n) operation.

> If a string is an array of characters

It is not, it is an array (technically vector) of bytes.


Who cares if it's O(1) if it causes a panic? What good is high performance if it doesn't complete or isn't safe?

At the very least, shouldn't there be an O(n) method to do character-wise slicing?


Panics are safe. You expect the “I don’t have a bug” case to be fast.

You can, but it depends on what you mean by “character”, as that’s not a concept in Unicode. Every kind of thing you could mean has a method, specific to it, since they’re different things.

(char in Rust is a Unicode scalar value, and you can collect into a Vec<char> and then slice it, as an example of one of those things. And that’s still O(1) at the cost of using up to four times the memory.)
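The Vec<char> option from the parenthetical, as a sketch; note the memory trade-off mentioned above (each `char` is 4 bytes):

```rust
fn main() {
    let s = "ab早c";
    // O(n) up front to collect; O(1) indexing/slicing afterwards.
    let chars: Vec<char> = s.chars().collect();
    assert_eq!(chars[2], '早');
    assert_eq!(&chars[..3], &['a', 'b', '早'][..]);
    println!("ok");
}
```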


Why not this?

  fn main() {
      // Drop to bytes explicitly; slicing a &[u8] never panics on char boundaries.
      let a = "ab早".as_bytes();
      let a = &a[..3];
      println!("{:?}", a); // raw bytes, including a partial UTF-8 sequence
  }


It's not clear to me what you're suggesting; is it that String shouldn't have supported indexing in the first place? That code does work, but you have a &[u8] not a &str.


But neither is AsciiString. It has an as_str method, but it's still a kludge.

This example was basically a suggestion to throw0u1t: if they want to cut in the middle of a UTF-8 sequence for whatever reason, they can do it without extra crates.

What I don't understand is why slices are indexed in bytes and not in objects. If String has an ability to check that we're cutting in the middle of a character sequence, why doesn't it provide an ability to take 3 fully formed characters?


I think the rust designers want to keep the implicit contract that indexing into a string is fast and O(1).

If you want to find the one millionth codepoint of a UTF8-encoded string, you have to more or less (1) visit every byte of the string.

If, on the other hand, you want to find the codepoint that covers the millionth byte, you have to read at most four bytes. Read the millionth byte; there are three cases:

- it’s a full codepoint. If so, you’re done.

- it is the first byte of a multi-byte codepoint. If so, read forwards in the string for up to 3 continuation bytes.

- it is a continuation byte. If so, search backwards in the string for the first byte, then, if necessary, read forwards to find more continuation bytes.

So, that is O(1).

(1) you can skip continuation bytes, but these are typically rare.


> What I don't understand is why slices are indexed in bytes and not in objects.

Slicing is an O(1) operation, and that would be an O(n) operation.


It does: `s.chars().take(3)`. It just does it with iterators rather than with indexes because that better communicates the performance characteristics.
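Spelled out (the iterator makes the O(n) walk over the UTF-8 bytes explicit):

```rust
fn main() {
    let s = "ab早c";
    // Take the first 3 characters; the iterator walks bytes under the hood.
    let first3: String = s.chars().take(3).collect();
    assert_eq!(first3, "ab早");
    println!("ok");
}
```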


I think he's suggesting that slicing on strings should be by character, and if you want to slice on bytes, you should explicitly ask to treat the string as a byte array. It makes more sense semantically, and it's safe.


Slicing on characters is a linear time operation and indexing is meant to be cheap.


That seems like taking it too far. It's like using pointer arithmetic to index a linked list on the assumption that the nodes happen to be allocated contiguously in memory. I mean, I guess the thinking is, indexing a Unicode string isn't cheap, but indexing strings used to be cheap once upon a time, when strings were encoded in fixed one-byte-per-character representations, so let's pretend that's still the case and panic if it doesn't work out.... That's weirdly antithetical to Rust's purported focus on safety.

Also, you can get the same performance from an operation that returns a byte array instead of a string. If that kind of performance is what you want, then a Unicode string is simply not the right type to use.


Indexing a Unicode string is cheap... if you have a byte index. If you want to count out some fixed number of codepoints, then of course you've just moved the cost to calculating the corresponding byte index. But counting codepoints is almost always the wrong thing to do anyway [1]. In practice, it's more common to obtain indices by inspecting the string itself, e.g. searching for a substring or regex match. In that case, it's faster for the search to just return a byte index; there's no benefit to having it return a codepoint index, and then having to do an O(n) lookup when you try to use the index. And byte indices obtained that way will always be valid character boundaries, so you can use [] without worrying about panics.

You suggest just using a byte array instead, but then you'd lose the guarantee that what you're working with is valid Unicode. Contrary to your assertion, it is useful to have a type that provides that guarantee, yet which can still be operated on efficiently.

[1] https://manishearth.github.io/blog/2017/01/14/stop-ascribing...


Safety is about memory safety. Immediately exiting your program is about as memory safe as it gets.


Panics are not unsafe. Panic exists in Rust because they are safe. If you don't want a panic on index, just don't index.

Indexing into a UTF-8 string doesn't serve any reasonable consistent purpose anyway, because it is an abstraction of text that doesn't provide support to the notion that a "character" is more fundamental than a word or paragraph, etc. Rust's string slicing exists solely to make ASCII text easy to handle. If your text is not ASCII, then you shouldn't be slicing it at all. Thus the panic.


Indexing into a UTF-8 string doesn't serve any reasonable consistent purpose anyway

If that's true, isn't it the job of a type system to help avoid such nonsensical operations? If "slice" only makes sense for byte arrays and ASCII strings, it could be provided on those types without being defined on UTF-8 strings.

Panics are not unsafe. Panic exists in Rust because they are safe.

That's "safe" by a very limited definition of safety. It's one step up from undefined behavior, granted, but it's not a very high standard. In practice, in most programs, you'd want to ensure that such a panic would never happen, and personally I think the language's unhelpfulness in that regard is a wart.


> If that's true, isn't it the job of a type system to help avoid such nonsensical operations?

It's not strictly true, because there are situations where you want to slice UTF-8. For instance, if you already know where the code point boundaries are for newlines. But if you know that, then you've run something like a regex with >O(1) behavior and you certainly wouldn't want string slicing to do redundant work.

> That's "safe" by a very limited definition of safety

That's the definition of safe that is used. Safety in the context of Rust means memory safety. (Division can panic, btw.) If you don't see why undefined behavior is so much worse than a panic, then do some research on it. If you want programs that never fail, you need a comprehensive plan that takes into account things like hardware failure. A programming language can't do that.


I think that's too extreme. There are many legitimate reasons to slice non-ASCII text - for example, to split it on newlines.


That's not trivial, and languages even vary in how they handle newline characters. https://stackoverflow.com/questions/44995851/how-do-i-check-...


You can still split non-ASCII text on ASCII newlines, and quite often that's exactly what needs to be done.


And usually, you don't want it to cost O(n) on top of whatever parser you ran to find those newlines.


Go indexes bytes on strings, even though there's a builtin type called rune, which represents a Unicode code point. This is yet another footgun. Is there a language that doesn't handle this poorly?

https://play.golang.org/p/CkBp0w8T621


In Rust, you're supposed to use `unicode-segmentation`[1] if you need to split on logical characters (grapheme clusters in the Unicode standard). Otherwise, the iterator `.bytes` emits raw bytes, and `.chars` emits Unicode code points.

Basically, string indexing is a lot harder than it seems at first glance, depending on what you want.


One nitpick: `.chars`[1] gives you an iterator[2] of `char`s[3], each of which is a 4-byte value holding a valid Unicode scalar value. This means that `"asdf".chars().collect()` will have a different size to `"asdf"` and `"asdf".chars().as_str()`. `.chars()` will never give you an incomplete codepoint, but it will give you incomplete characters, as you could have many c̶̼̟̏ó̷̘̉n̴̖̞̏̇t̸̡̃ĭ̸̻̬n̴̯͉̂͑ṵ̴̑a̷̛̫̳ẗ̸͕́i̷̱̫̓̋ǫ̸̑ǹ̶̼̅s̸̩̾̌ to represent what visually is a single char.

[1]: https://doc.rust-lang.org/std/string/struct.String.html#meth...

[2]: https://doc.rust-lang.org/std/str/struct.Chars.html

[3]: https://doc.rust-lang.org/std/primitive.char.html
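The combining-mark point can be shown with std alone; here "é" is built from 'e' plus U+0301 (combining acute accent), so one visual character is two `char`s:

```rust
fn main() {
    // 'e' (1 byte) + U+0301 combining acute (2 bytes): one visual character.
    let s = "e\u{301}";
    assert_eq!(s.chars().count(), 2); // two Unicode scalar values
    assert_eq!(s.bytes().count(), 3); // three UTF-8 bytes
    println!("ok");
}
```

Counting grapheme clusters (which would give 1 here) needs the `unicode-segmentation` crate mentioned upthread.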


> visually are a single char

IIRC that's what grapheme clusters are for.


UTF-8 is at odds with efficient array indexing. I like Python's approach, where bytes and strings are distinct types, though I have no idea what it is doing under the hood.


Modern Python uses whatever representation is sufficient to ensure 1-unit-per-codepoint for a given string (which it can do on creation, since strings are immutable). So you get Latin-1, UCS-2 without surrogate pairs, or UCS-4 (PEP 393).

This is great for high-level code, but painful to work with from native code, because it usually needs some specific encoding to call into other libraries, and it's usually UTF-8 - so you need to re-encode all the time.


I actually had to work with Python strings at the C level recently, and their approach is pretty clever. IIRC, the runtime can take any common form of Unicode, and will store it. When you access that string, the accessor requests a specific encoding, and the runtime will convert if need be, and then store it in the string object.

So it handles the (very) common case of needing the same encoding multiple times (e.g. for all file paths on Windows), while not introducing too much overhead in memory or speed.

I could be mistaken on exact details though, especially since I recall there being multiple implementations even within py3.x.


Any idea how it handles indexing? Does it convert everything to 32 bit chars and ignore graphemes?


Go allows slicing UTF8 strings just fine: https://play.golang.org/p/eUQ5L58KwZy


> A lot of great, well loved, crates don’t have a lot of Github stars.

This was a really important lesson for me. When I'm looking for crates to solve a problem and there are only a handful of them, I almost always go through every single one, even if some have thousands of downloads and others only tens.

It's an incredible feeling to find those unknown diamonds.


Make sure to thank the maintainers of those repos. It really helps!


> the escape hatch suggested on the internet, `partial_cmp(...).unwrap_or(Ordering::Less)`

This is often a Bad Idea, as you get unstable sorts and you are right back to the same problem.

- Explanation: How do I get the minimum or maximum value of an iterator containing floating point numbers? — https://stackoverflow.com/a/50308360/155423

- Example: https://play.integer32.com/?version=stable&mode=debug&editio...

See also:

- How to do a binary search on a Vec of floats? — https://stackoverflow.com/q/28247990/155423

Instead, use a wrapper type or raise a panic.


OP here -- this post is pretty dated. I currently use the OrderedFloat [1] crate to solve this problem which works quite well and plays nicely with NaN

https://crates.io/crates/ordered-float


To me this feels quite pedantic. Being able to do less-than on NaNs and just having it do something vaguely sensible (as in your linked solution) is a far more common requirement than needing to handle NaNs specially.

You could even say the reason NaN exists is so that you don't have to check for NaN constantly. Rust is being technically correct but practically really annoying, for basically no benefit.


For what it's worth, many in the Rust community agree with you. Many people consider this a mistake, in hindsight, though not everyone.


Isn't there a default total order for floats? E.g. -Inf < -1.0 < -0.0 < 0.0 < 1.0 < Inf < NaN?


Yes, there is: https://en.wikipedia.org/wiki/IEEE_754#Total-ordering_predic...

It is not something to use by default though. It is implemented in software and thus a lot slower than the hardware comparison.


I'm curious as to why it isn't implemented in hardware. Is it really so rare to need to sort floats, or so common to need a different ordering when you do?


Of course sorting floats happens a lot. In practice one rarely encounters NaN's and ±Inf's, so fast comparison for concrete values is the default. I don't know why the 'slow' total order is not implemented in hardware though.

But fortunately in comparison sort algorithms that run in O(n lg n) you can get away with doing an O(n) partitioning of the array into [-, +, NaN] and then applying a fast integer comparison operator to the negative values (-) and positive values (+).

In fact the above idea ties in neatly with QuickSort, which is already based on partitioning & sorting recursively.


> Of course sorting floats happens a lot.

Is this true?

I am actually struggling to remember the last time I did a sort with a float/double as the key, especially in a performance-bound context

... <thinking> ...

Aha. Graphics engine. Octree with coordinates.

I really had to think about that.

So, I'm a bit skeptical of float sorting happening a "lot".

Is this perhaps an ML primitive somewhere?


Languages that only have floats, such as JavaScript and Lua, certainly sort floats quite often.


No, because x < NaN and x > NaN are false for any value of x.


Yes, the literal < in the language induces a partial order by convention. What I'm getting at in my comment is that you can define a sensible total ordering.


The challenge becomes what code to emit when you see those operators: native target comparisons, or the software implementation of your total ordering? The latter is safe and slow and the former is fast and IMO idiomatic.

So since no one needs this often enough to emit the soft-float comparison code, we should emit the fast code. If folks need different behavior, they should use different types. This is similar to the behavior with integer overflow, which you can opt into by using checked types or checked operations. Though in Rust we have the convenience that overflow-detecting code is emitted for debug targets.
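The integer analogy in concrete terms: the default operators use the fast hardware behavior, and detection is opt-in per operation (or per type, via `Wrapping`/checked wrappers):

```rust
fn main() {
    let x: u8 = 250;
    assert_eq!(x.checked_add(10), None); // overflow detected, opt-in
    assert_eq!(x.wrapping_add(10), 4);   // explicit wrap-around
    // Plain `x + 10` panics in debug builds and wraps in release builds.
}
```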


It is possible to redefine NaN comparison as something different from what IEEE 754 specifies, but it will be surprising to some users, and it will come at a performance cost, because you can no longer let the hardware handle all float comparisons directly.


Rust is one of the few times where the language/ecosystem does precisely what I wished it would do. For instance, being able to partially destructure JSON into a struct in a typesafe manner is just awesome.


This sounds really useful, you don't happen to have an example of this do you?

Thanks in advance!


I’m basically talking about Serde. Read the section titled “Parsing JSON as strongly typed data structures”. What’s nice is that you don’t have to include all the fields in the JSON in the struct. Serde will only give you the ones defined by the struct.

https://github.com/serde-rs/json/blob/master/README.md


I'm not sure I'd consider that lib any safer than the average serialization library when things like this happen: https://github.com/serde-rs/json/issues/464


Panics are memory safe, so that’s still more safe than many parsing bugs.


The issue does not mention memory safety and neither did I. Honestly, knee-jerk reactions like "BUT MUH MEMORY SAFETY" don't inspire confidence, especially when it couldn't save that dev from the troubles and bugs documented in the issue. To quote a few:

> it was a hassle to track down because Rust itself didn't complain and the panic message during serialization wouldn't tell me which file of the hundreds of thousands was causing it to die. For lack of a purpose-built tool, I had to manually bisect it until I narrowed it down.

> That said, definitely a footgun in the standard library to be remedied.

> My main concern here is getting rid of the footgun if at all possible. I really don't want to have to maintain a special "Never allow these types to creep into structs I'm deriving Serialize/Deserialize on, because the compiler certainly won't warn you" audit list.

If that's considered safe in Rust's standards then I rest my case.


Safe in Rust usually has a very specific meaning: https://doc.rust-lang.org/nomicon/what-unsafe-does.html


Memory safety issues plague parsers, and often have dire consequences. Rust claims to be memory safe. This bug does not involve memory unsafety.

Yes, things can still be improved, but this is nowhere near as bad as many parsing bugs.



https://gitlab.com/zanny/oidc-reqwest/blob/master/src/token....

You can declare structs that derive from Serde and convert to / from Json in a typesafe way.


One thing Rust doesn't seem to be doing very well yet is guard clauses, specifically when handling Option<T>.

I've seen and appreciated the use of guard clauses in many languages, as a good way to quickly check for a few conditions at the top of a function, and return early if those conditions aren't met.

Since Option<T> is recommended in Rust, there are a lot of times you want to quickly return if `Some(x)` isn't there (i.e. it's `None`), and if it is, continue through the function without the unnecessary indentation from an extra set of brackets.

There seems to be a good amount of smart discussion about handling those [1][2]. Some threads are more than a year old, but it seems to be making progress.

[1] https://github.com/rust-lang/rust/issues/45978 [2] https://internals.rust-lang.org/t/pre-rfc-allow-pattern-matc...


You can use ? on an Option if your type returns an Option. If it returns a Result, you can use ok_or()?, and at some point in the nearish future, you can just use ?.


I see, I'm not very familiar with either of those idioms, `?` and `ok_or()?`.

My current understanding is that those would return an Error only? I was more describing cases where you do want to return early, but not necessarily return an `Error`.

For instance, in a simplified example function that returns a boolean, you could decide to return `false`. Is that possible there?

    // Function that returns a boolean value
    fn is_equal_to_ten(n: Option<u32>) -> bool {
        // One-liner that checks for None and otherwise binds `x` from
        // `Some(x)` (this later became real syntax: let-else, Rust 1.65):
        let Some(x) = n else { return false };

        // `x` is available here:
        x == 10
    }

Would this be considered bad practice in Rust?


The question mark operator works via a trait, Try. Both Option and Result implement Try. If you use ? on a None value, it will return None, just like using ? on an Err returns an Err.

ok_or is a method on Option that would let you manually convert it to a Result. You could then combine it with ?, turning a None into a specific Err.

It won’t help for stuff that returns bool, it’s true.


Thanks for the detailed answer :)


Often such functions can be rewritten to perform some operations "inside" the option. For example:

    fn is_equal_to_ten(n: Option<u32>) -> bool {
        n.map(|n| n == 10).unwrap_or_default()
    }


I'm using `.ok_or(SomeError)?`, which converts the Option into a Result and unwraps the Ok value for the rest of the scope, short-circuiting on Err(). I know about Swift's `guard` statement, but I haven't yet had a case where I didn't want to return Result in this kind of code, so ok_or has worked well for me.
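A sketch of that ok_or + ? pattern (`MissingName` is a made-up error type):

```rust
#[derive(Debug, PartialEq)]
struct MissingName;

fn greet(name: Option<String>) -> Result<String, MissingName> {
    // None becomes Err(MissingName) and returns early; Some(x) unwraps.
    let name = name.ok_or(MissingName)?;
    Ok(format!("Hello, {}", name))
}

fn main() {
    assert_eq!(greet(None), Err(MissingName));
    assert_eq!(greet(Some("Ada".into())).unwrap(), "Hello, Ada");
}
```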


Related to the discussion in the second link, it sounds like Mr Pearce got to coin the phrase 'flow typing' to describe this situation but this is something people have been talking about for a long time.

Pseudocode:

    if (foo is a String) {
       foo.someStringMethod();
    }
Flip that around a little bit:

    if (foo is not a String) {
        return "error";
    }
    foo.someStringMethod();
And you've got a guard clause that's fundamentally the same kind you're asking for. I've wanted this structure in a language for a very long time. I was happy to see it pop up in Kotlin and would love to see it in Rust as well.


> Unlike nearly every language I’ve ever used, Rust actually encourages variable shadowing.

Haskell does it too, for the same reason/purpose / with the same effect.


Swift does, but typically for Optional promotion. You see stuff like `guard let delegate = delegate else { return }` inside functions all the time, shadowing a property with a local & promoting the type from Optional<T> to T.

It's not the same as the Rust example because you're shadowing an ivar to a local, but since `self.` is implicit you're still shadowing.


> for Optional promotion

You'll see the same in Rust

    fn example(name: Option<String>) -> Option<usize> {
        let name = name?;
        Some(name.len())
    }
Or

    fn example(name: Option<String>) {
        if let Some(name) = name {
            println!("{}", name.len());
        }
    }
A main difference is the requirement to use `Some`, which allows for the flexibility to apply to any enum.

> but since `self.` is implicit

To make sure I'm following, do you mean that Rust's `self.` is implicit in Swift?


That all makes sense. I've only done a couple days of Rust, so my recollection is spotty. The shadowing makes sense because if a local variable was moved out, then you're not really shadowing it anymore. Is my understanding of that correct?

> do you mean that Rust's `self.` is implicit in Swift?

Swift's `self.` is implicit in Swift – in most contexts, to access a property the `self.` is not required. `self.name = "John"` and `name = "John"` are equivalent (assuming self is an object with a name property).

There are places where explicit `self.` is required though – when you want to differentiate between a shadowed local and a property (obviously), or when you're inside a closure (to make it clear that the closure is capturing self, not just capturing a reference to the property).


> The shadowing makes sense because if a local variable was moved out, then you're not really shadowing it anymore. Is my understanding of that correct?

This is kind of a philosophical corner: can you shadow something that isn't there anymore? Once you've moved out of something, if you attempt to use the old name, then you'll get a compiler error different from "no such variable", so it's still there in some sense.

Pragmatically, I think you are on the money.


Yeah `self.` is implicit. In Swift:

  struct S {
    var string: String
    func doSomething() {
      // These two lines are equivalent.
      print(string)
      print(self.string)
    }
  }


I actually really try to avoid this pattern. For one, it pollutes the function scope with a shadowed binding, and it's pretty unnecessary boilerplate. Almost always you can map across the optional and use the unwrapped version as the block-scoped argument identifier, e.g. `$0`. When you can't, or you want to handle both cases, I find a simple `if let delegate = delegate {} else {}` is more explicit and scopes the binding to the block instead of polluting the current scope. I'm not saying `guard` is bad; I use it all the time when there are good names to bind things to. But I dislike how much it proliferates, and how often it doesn't actually provide value and just lets the program fail silently instead of loudly, which half the time is worse than a NullPointerException anyway...


F# supports it, too, although it doesn't seem very popular. Personally, I'm still on the fence and usually avoid it in practice.


It's idiomatic in all ML-family languages, and generally in languages that are functional first and foremost.


Single worst (bad?) "feature"? Shadowing should need an explicit declaration.


    let foo = "...";
    let foo = parse(foo);
    let foo = escaped(foo);
    ...
    doSomethingWith(foo);
I don't see how this is helpful for avoiding the bug described. The most common bug with this type of code is mistaking which form "foo" represents at a given line of code, or that form changing as the code evolves. For example, if one programmer writes

    let foo = "...";
    let foo = parse(foo);
    ...
    doBarWithFoo(foo);
and another programmer comes along, doesn't notice the call to doBarWithFoo, and needs an escaped version of foo:

    let foo = "...";
    let foo = parse(foo);
    ...
    let foo = escaped(foo);
    doBazWithFoo(foo);
    ...
    doBarWithFoo(foo); // This still looks correct in isolation
This is a classic problem with mutable variables that frequently causes hard-to-spot bugs. Whereas if each form has a distinct meaningful name, this change wouldn't introduce a bug, and if somehow a bug were introduced, good names will make it possible to spot even considering the line in isolation:

    doBarWithFoo(fooEscaped);  // Hey!  The input to doBarWithFoo shouldn't be escaped!
If the only useful form of foo is the final one, then you can avoid having accessible names for the invalid intermediate forms like this:

    let foo = escaped(parsed("..."));
Or like this for more complicated logic (I'm not a Rust programmer (yet) so this may not be the right syntax):

    let foo = {
        // Complicated logic
        finalForm
    };
I feel like if I ever write a lot of code in Rust I'll find a linter rule that warns about shadowing variables and use it religiously. But maybe I'm missing a use case where it's crucial.


I believe the way this is solved is by having the type of escaped(foo) differ from that of parse(foo), and only accepting an EscapedString in doBaz and a ParsedString in doBar.

Your type structure at no extra runtime cost is String : ParsedString : EscapedString.

This ensures you don't escape strings before parsing them too. Nice!
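A sketch of that newtype layering (all names are hypothetical, and the parse/escape bodies are trivial stand-ins):

```rust
struct ParsedString(String);
struct EscapedString(String);

fn parse(s: &str) -> ParsedString {
    ParsedString(s.trim().to_string()) // stand-in for real parsing
}

fn escaped(p: ParsedString) -> EscapedString {
    EscapedString(p.0.replace('<', "&lt;")) // stand-in for real escaping
}

fn do_bar_with_foo(p: &ParsedString) -> usize { p.0.len() }
fn do_baz_with_foo(e: &EscapedString) -> usize { e.0.len() }

fn main() {
    let foo = parse("  a<b  ");
    assert_eq!(do_bar_with_foo(&foo), 3);
    let foo = escaped(foo); // shadowing: same name, stronger type
    assert_eq!(do_baz_with_foo(&foo), 6);
    // do_bar_with_foo(&foo); // now a compile error: wrong type
}
```

The wrappers cost nothing at runtime, but passing an escaped string where a parsed one is expected (or vice versa) becomes a type error.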


This is common in Haskell as well using `newtype`. You can have type aliases like

  type A = B
but you can use the following function with a B:

  f :: A -> _
whereas

  newtype EscapedString = EscapedString String
  f' :: EscapedString -> _
would prevent from using f' with the wrong data. Newtype is a zero-cost abstraction.


I think the example only really works if `parse` and `escaped` return something other than a string. If that was the case, the type system will save you.


In your particular example, the ownership system would help here, since `foo` is consumed by `doBazWithFoo`. (Only exception is when `foo` is `Copy`.)
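A quick illustration of that ownership point (`consume` is a hypothetical function taking its argument by value):

```rust
fn consume(s: String) -> usize { s.len() }

fn main() {
    let foo = String::from("hi");
    let n = consume(foo); // `foo` is moved here
    // println!("{}", foo); // error[E0382]: borrow of moved value
    assert_eq!(n, 2);
}
```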


I've not done any rust dev before but the whole variable shadowing thing really scares me, in that it makes it not obvious where the initial declaration comes from. This seems difficult to reason about.


Two things: 1) the compiler doesn't really allow you to screw it up in really scary ways (though I suppose you can), and 2) it allows you to rebind a variable name to a new type or a new state during execution without coming up with a hokey name for the thing that's really just the original, modified in some way.


IMO, it should scare you less than mutating the variable, which is the equivalent pattern in most other languages.


I've done a fair bit of rust development and never found that to be an issue.

It probably helps that the only ways to introduce new variables are all very distinctive:

   let [new variables] = [expr];

   if/while let [new variables] = [expr] { [new variables scoped here] }

   for [new variables] in expr { [new variables scoped here] }

   fn name([new_variables]) { [new variables scoped here] }


You forgot `match` and closures. But in general I think new variables can be created in all contexts where a pattern can be used.


Yes, in fact, it's always a pattern. Even let x = 5; is a pattern, just a very simple one!


The good news is that if you want to disallow shadowing in your codebase, there are optional clippy lints that can be applied at compile-time.


  let foo = "...";
  let foo = parse(foo);
  let foo = escaped(foo);
Is it really shadowing or mutation of foo?

I would consider this shadowing (something you can do in OCaml):

  let foo = "..." in
    let foo = parse(foo) in
     let foo = escaped(foo) in
        dosomething(foo);;


It is shadowing (though it could be converted into mutation by a sufficiently smart compiler...). ReasonML does something similar with their take on Ocaml.

https://reasonml.github.io/docs/en/let-binding


The foos can all have different types which is not possible with mutation.


Look at it closely - it's exactly the same as your OCaml example, except it uses ";" instead of "in", and indentation is adjusted accordingly.

Although in practice I found that it's common to not indent adjacent nested let..in blocks in OCaml, either, so you'd often see something like:

  let foo = "..." in
  let foo = parse(foo) in
  let foo = escaped(foo) in
  ...
And the rules are really simple - "let" always introduces a new binding.


You can't mutate variables in Rust without declaring them `mut`.


But how do you know just by reading that code? It would not be obvious to me at all.


I think it's quite fine to assume the reader knows enough rust to know the difference between mutable and immutable variables.


let foo = bar; let foo = bat;

is shadowing

let foo = bar; foo = bat;

is a compilation-time error because foo isn't mutable.

let mut foo = bar; foo = bat;

is reassignment.

In working code, either foo is declared as mutable or it's not, and it's pretty obvious from the code what's happening.


I guess. I would just never write code like that.


You mean if you don't know the language?


I guess I just need to learn Rust. I wouldn't write code in a style with three `let`s one after another using the same variable name.


I guess if you don't know rust you might not? But one of the first things you learn about rust is the `mut` keyword.


`let` rebinds. If there isn't a `let`, it's mutation.


It's really shadowing. The type can change and it isn't declared mutable.


> Like Go, Rust can compile statically linked linux binaries.

The GNU C library (needed not only by C programs, and not only for C things) doesn't properly support static linking (NSS and dlopen break), so the only way this is possible is to use another libc entirely, or raw inlined syscalls (where applicable).


Yes, we have full support for musl. It's a

  $ rustup target add x86_64-unknown-linux-musl
  $ cargo build --target x86_64-unknown-linux-musl
away.


Yeah, Rust's musl support is great. We use it heavily at work for all sorts of CLI tools. Many thanks to everybody who worked on this.

(And if you need to link against common C libraries like OpenSSL or PostgreSQL, I maintain a Docker image with the necessary C toolchains, and instructions on how to use it: https://github.com/emk/rust-musl-builder. There are a couple of similar images out there, too, I think.)


You can also download the cross-compilation tarball for musl at https://static.rust-lang.org/dist/rust-std-1.31.0-x86_64-unk... if you installed rust that way. You'd then build normally:

    $ cargo build --target x86_64-unknown-linux-musl


By default Rust links dynamically to glibc, while the Rust code itself is statically linked. Which I think is a reasonably good default: most Linux systems provide glibc, and you can deploy binaries to them without those systems needing a Rust installation.

Of course, there's the usual issue with glibc symbol versioning, so you probably want to build your binaries on the oldest supported system (say, a centos 6 container).


Rust's main toolchain has first class support for musl: https://rust-lang-nursery.github.io/edition-guide/rust-2018/...


Great job. Just a nit/request for that gif illustrating agrind's use -- I keep watching it to see what args you've passed to see how it's being used but the output finishes and loops before I understand what I'm seeing. Not sure if it's a property of the image or my browser but if you can turn off the looping in the image that might work better. Or add static frames at the end for the slow folks like me ;)


This makes me want to add `p90`, `p50` and so on to jq...


You could always just pipe JQ output to angle grinder...I should make a output-json mode so you can pipe it back to JQ :-P


for the record, pXX are super useful in splunk.


I think Rust's concept of ownership should be divided into two parts: "ownership" and "right of use".

This is more in line with the "bank lending model" it advocates.


Wow, the Rust hype is really an echo chamber right now. Seems very academic, and nobody really seems to be using it in production.


Quite a few people are using it in production: Dropbox, npm, and I believe Amazon and Microsoft.

I've used it in production, and subjectively it was much easier to work with than C++.


Amazon and Microsoft both are, yes.


Except for Dropbox, Amazon, Microsoft, Google, Mozilla...



