Hacker News
SeaHash: A fast, portable hash function in Rust (docs.rs)
186 points by pcwalton on Nov 28, 2016 | 114 comments



Looks like a pretty straightforward 64-bit block hash unrolled 4 times. I'd prefer a bit more asymmetry in the diffuse() method, but since it passes SMHasher it's probably OK.

I wonder how the Rust version compares with plain-jane C.

-Austin (murmurhash guy).


Actually, just noticed a minor issue - since there's no intermixing between the four lanes and the diffuse() function is the same for all of them, if any of the IVs match then I can swap all the blocks in those lanes and get the same hash out.

For example, if IV1 and IV2 match and the block pattern is ABCDABCDABCD, then BACDBACDBACD will produce the same hash value.
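A toy model makes the symmetry concrete. The `diffuse` below is a stand-in mixer (not SeaHash's actual rounds), but the structure is the same: four independent lanes with a symmetric final combine, so equal IVs let you swap those lanes' blocks without changing the result.

```rust
// Stand-in diffusion function (any bijective mixer works for the demo;
// these constants are NOT SeaHash's).
fn diffuse(mut x: u64) -> u64 {
    x = x.wrapping_mul(0x6eed0e9da4d94a4f);
    x ^= x >> 32;
    x.wrapping_mul(0x2f72b4215a3d8caf)
}

// Four lanes, no inter-lane mixing, symmetric XOR combine at the end.
fn toy_hash(iv: [u64; 4], blocks: &[u64]) -> u64 {
    let mut lanes = iv;
    for (i, &b) in blocks.iter().enumerate() {
        let l = i % 4;
        lanes[l] = diffuse(lanes[l] ^ b);
    }
    lanes[0] ^ lanes[1] ^ lanes[2] ^ lanes[3]
}

fn main() {
    // IV1 == IV2, so swapping the blocks of lanes 1 and 2 collides.
    let iv = [7, 7, 3, 4];
    let abcd = [10, 20, 30, 40, 11, 21, 31, 41];
    let bacd = [20, 10, 30, 40, 21, 11, 31, 41];
    assert_eq!(toy_hash(iv, &abcd), toy_hash(iv, &bacd));
}
```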

A minor finalizer change would fix it for any IV (pseudocode as I don't actually know Rust) -

    vec[0] ^= diffuse(vec[1]);
    vec[1] ^= diffuse(vec[0]);
    vec[2] ^= diffuse(vec[3]);
    vec[3] ^= diffuse(vec[2]);

    u64 result = diffuse(vec[1] ^ diffuse(vec[3]));

that's probably overkill but should work.
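Rendered as compilable Rust (with a stand-in `diffuse`, since SeaHash's real constants aren't reproduced here), the suggested finalizer does break the lane symmetry:

```rust
// Stand-in mixer; bijective, but NOT SeaHash's actual diffuse().
fn diffuse(mut x: u64) -> u64 {
    x = x.wrapping_mul(0x6eed0e9da4d94a4f);
    x ^= x >> 32;
    x.wrapping_mul(0x2f72b4215a3d8caf)
}

// The proposed finalizer: cross-feed the lanes before combining.
fn finalize(mut vec: [u64; 4]) -> u64 {
    vec[0] ^= diffuse(vec[1]);
    vec[1] ^= diffuse(vec[0]);
    vec[2] ^= diffuse(vec[3]);
    vec[3] ^= diffuse(vec[2]);
    diffuse(vec[1] ^ diffuse(vec[3]))
}

fn main() {
    // Swapping two lanes that held equal values now changes the output,
    // unlike a plain XOR combine.
    assert_ne!(finalize([5, 9, 5, 9]), finalize([9, 5, 9, 5]));
}
```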


That won't help much; you can zero the entire final state easily, e.g., with the message IV0 IV1 IV2 IV3 (or by xoring the latest diffuse() output back into the state), in which case you get diffuse(0) = 0 and with your finalization function you still get easy collisions.

The operating mode of this hash is broken by default. The author calls it Merkle-Damgard, but that is not what it is. Merkle-Damgard uses a compression function, whereas what we have here is more like a sponge with full state absorption (you can pretend F(IV ^ m) is a compression function, but it is trivially susceptible to collisions F(IV ^ m) = F(m ^ IV)). Meaning, without a secret IV and some sort of truncation there's no way this mode can be secure, even with an ideal diffuse() function.
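A one-lane sketch of why full-state absorption is fragile: feeding the IV back in as the first block forces the state to diffuse(0), independent of the IV, after which all IVs collide on equal suffixes. (`diffuse` is again a stand-in mixer, not SeaHash's.)

```rust
// Stand-in mixer (not SeaHash's real diffuse()).
fn diffuse(mut x: u64) -> u64 {
    x = x.wrapping_mul(0x6eed0e9da4d94a4f);
    x ^= x >> 32;
    x.wrapping_mul(0x2f72b4215a3d8caf)
}

// Full-state absorption: state = diffuse(state ^ block) for each block.
fn absorb(iv: u64, msg: &[u64]) -> u64 {
    msg.iter().fold(iv, |state, &m| diffuse(state ^ m))
}

fn main() {
    // First block == IV zeroes the state: diffuse(iv ^ iv) = diffuse(0),
    // so every IV produces the same hash for messages [iv, suffix...].
    for iv in [1u64, 42, 0xdead].iter() {
        assert_eq!(absorb(*iv, &[*iv, 99]), absorb(7, &[7, 99]));
    }
}
```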


Not sure what you mean by "secure". The linked page flat-out says:

> Warning! This is not a cryptographic function, and it certainly should not be used as one.

So the focus here is speed, not resistance to malicious actors attempting to generate collisions.


I'm not a "hash guy" by any means - what impact would that have on its performance?


None for large strings; for small strings you'd have to benchmark.


Negligible, a few tens of cycles of extra overhead.


I love working with murmurhash, but where the hell do I even start learning about hashing and the current theories around it?


Take a look at the tests in SMHasher, they indirectly spell out a lot of what makes a hash "good" (though not all the tests are particularly readable).
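For a flavor of what SMHasher checks, here is a minimal avalanche test: flipping one input bit should flip each output bit about half the time. The mixer is splitmix64's public-domain finalizer, standing in for a real hash function:

```rust
// splitmix64 finalizer (public domain), used here as an example hash core.
fn mix(mut x: u64) -> u64 {
    x ^= x >> 30;
    x = x.wrapping_mul(0xbf58476d1ce4e5b9);
    x ^= x >> 27;
    x = x.wrapping_mul(0x94d049bb133111eb);
    x ^ (x >> 31)
}

fn main() {
    let trials = 1000u32;
    let mut flips = [0u32; 64]; // how often each output bit flipped
    for t in 0..trials {
        let x = mix(t as u64); // cheap pseudo-random inputs
        let d = mix(x) ^ mix(x ^ 1); // difference from flipping input bit 0
        for b in 0..64 {
            flips[b] += ((d >> b) & 1) as u32;
        }
    }
    // Each output bit should flip with probability near 1/2.
    for b in 0..64 {
        let p = flips[b] as f64 / trials as f64;
        assert!((p - 0.5).abs() < 0.15, "output bit {} biased: {}", b, p);
    }
}
```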


Since you clearly put a lot of thought into these kinds of hash functions, I wonder what your thoughts are on the kinds of hash functions used in theory? That is, theoretically proven "k-independent" functions, such as polynomial hashing, multiply-shift, or tabulation hashing?
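For reference, multiply-shift is tiny: hash a 64-bit key x to l bits as (a*x mod 2^64) >> (64-l), with a a random odd 64-bit number; Dietzfelbinger et al. showed the family is almost-universal. A sketch:

```rust
// Multiply-shift hashing: almost-universal family for 64-bit keys to l bits.
// `a` must be a random odd 64-bit number chosen once per table.
fn multiply_shift(a: u64, l: u32, x: u64) -> u64 {
    debug_assert!(a % 2 == 1);
    a.wrapping_mul(x) >> (64 - l)
}

fn main() {
    // Fixed "random" odd constant for the demo; in practice draw it from an RNG.
    let a = 0x9e3779b97f4a7c15;
    let h = |x| multiply_shift(a, 16, x);
    assert!(h(12345u64) < (1u64 << 16)); // outputs fit in l = 16 bits
}
```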


Making a fast, good hash function isn't too hard now, so some sort of "provable key-independent collision resistance" is definitely the next thing that needs to be worked on.

That said, I haven't really looked into the theory much.


Have you looked at SipHash? I remember reading recently that Murmur has been found to have some predictable hashes independent of seeds, but a lot of big names still use it since it's fast and has good properties.


When SipHash was presented at CCC, it was alongside the proof of concept attacks against MurmurHash and CityHash. See the "Attacks" section of [0].

Murmur was notable at the time for its use in Java and Ruby. Ruby has since moved to SipHash-2-4, while Java (OpenJDK) has thrown up its hands in disgust at the problem and created a binary tree fallback mode for its HashMaps[1]. Which at this point I'm pretty confident is the only rigorously correct solution.

[0]: https://131002.net/siphash/

[1]: http://openjdk.java.net/jeps/180


It's a bit disingenuous to say Java has given up and created a binary tree fallback mode. It doesn't actually switch the whole hash table into a binary tree; rather, it switches the linked list inside each bucket into a binary tree, and only when a certain threshold is passed.


If you trigger this fallback (e.g. someone is colliding all your hashes), your entire collection is probably stuffed into one bucket.

Degrading only individual buckets is of course a nice middle ground that avoids harsh discontinuities in performance for slightly bad inputs.


See also http://perl11.org/blog/seed.html "The dangerous SipHash myth"

It's technically impossible to make a hash function used in a hash table secure as a countermeasure against DoS attacks. djb made a bad mistake here. You always get seed exposure somehow; it is independent of the hash function, and you can always brute-force it.

So Java is right. The only countermeasure against collision attacks is a fix in the collision resolution. Adding stronger hash functions only makes the table slower, not more secure. And lots of prominent hash tables are insecure, since they drank djb's Kool-Aid.


This is not a knock against SeaHash, but I was looking at buffer.rs [0] and noticed pretty much all the code is wrapped in unsafe {} blocks. How much advantage is there to a Rust implementation vs C++ if unsafe is used so liberally? I ask this in earnest.

[0] https://docs.rs/crate/seahash/2.0.0/source/src/buffer.rs


Because the unsafe code is isolated into an easily-auditable portion that basically just exists to perform word-aligned reads of a byte buffer.

It might be nice to factor this out into a separate library, but it's fairly harmless.
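The pattern looks roughly like this (a sketch, not the crate's actual code): a safe function whose bounds check at the boundary is what justifies the unsafe interior.

```rust
/// Safe wrapper: reads 8 bytes from `buf` as a little-endian u64.
/// Callers never need `unsafe`; the precondition is checked here.
fn read_u64_le(buf: &[u8]) -> u64 {
    assert!(buf.len() >= 8, "need at least 8 bytes");
    // SAFETY: the length check above guarantees 8 readable bytes, and
    // read_unaligned tolerates any alignment.
    let v = unsafe { (buf.as_ptr() as *const u64).read_unaligned() };
    u64::from_le(v)
}

fn main() {
    let bytes = [0x01, 0, 0, 0, 0, 0, 0, 0, 0xff];
    // Works from any offset; the trailing 0xff is simply ignored.
    assert_eq!(read_u64_le(&bytes), 1);
}
```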


Addendum: There is a separate library for this, of course: byteorder. Can't believe I forgot about it :)


> How much advantage is there to rust implementation vs c++ if unsafe is used so liberally?

I think this code could potentially be refactored with smaller unsafe blocks, if that were a goal.

The benefit in general holds for many reasons, among them that you still have to opt in to unsafe{}, and SeaHash's consumers wouldn't need to in order to leverage these features.

The benefit for SeaHash specifically is that Rust isn't merely a safer language, it's also one with arguably newer/better language features than C++. And it's one that has support for several targets today.


> How much advantage is there to rust implementation vs c++ if unsafe is used so liberally?

I'm no expert on the topic. IIRC, Rust does not have the same kind/amount of unspecified behavior as C/C++ do.

FYI: This is a "highly optimized version of SeaHash" (see the comment in the first line), with "optimized" meaning fiddling with raw pointers. You can find a version without any unsafe code in `reference.rs`. I have no idea how fast the reference implementation is.


As others have mentioned, Rust allows you to write a safe wrapper around unsafe code. In this case, all of the functions implemented in buffer.rs are safe, even if their contents are not, so they can be used without having to worry about unsafety.

Another advantage over C++ is that it's much more clear what code is safe and what isn't, as unsafe code is wrapped in `unsafe` blocks, as you mentioned.


That code looks like it could (and should, IMO) be refactored to put the unsafe code in more contained locations, e.g. the &[u8] could be manipulated into a (&[u64], &[u8]) pair (with the main sequence of values, and the trailing ones). Rust unfortunately can't stop people writing code that makes life hard for themselves.
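One safe way to do that split (a sketch; today's std also offers `chunks_exact` and `align_to` for this), copying whole 8-byte words out and returning the tail separately:

```rust
// Split a byte slice into its whole little-endian u64 words plus the
// remaining tail bytes. No unsafe needed.
fn split_words(buf: &[u8]) -> (Vec<u64>, &[u8]) {
    let tail_at = buf.len() & !7; // round down to a multiple of 8
    let mut words = Vec::with_capacity(tail_at / 8);
    for chunk in buf[..tail_at].chunks(8) {
        let mut b = [0u8; 8];
        b.copy_from_slice(chunk);
        words.push(u64::from_le_bytes(b));
    }
    (words, &buf[tail_at..])
}

fn main() {
    let (words, tail) = split_words(&[1, 0, 0, 0, 0, 0, 0, 0, 2, 3]);
    assert_eq!(words, vec![1]);
    assert_eq!(tail, &[2u8, 3][..]);
}
```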


It seems like a small kernel of unsafe code, while the rest of the library is written in safe Rust. Perhaps more of this could be written safely?


This code is C code written as unsafe Rust:

    let mut ptr = buf.as_ptr();        
    let end_ptr = buf.as_ptr().offset(buf.len() as isize & !0x1F) as usize;
    while end_ptr > ptr as usize {
        a = a ^ read_u64(ptr);
        ptr = ptr.offset(8);
        b = b ^ read_u64(ptr);
        ptr = ptr.offset(8);
        c = c ^ read_u64(ptr);
        ptr = ptr.offset(8);
        d = d ^ read_u64(ptr);
        ptr = ptr.offset(8);

        ....
        match excessive {
            0 => {},
            1...7 => {                
                a = a ^ read_int(slice::from_raw_parts(ptr as *const u8, excessive));
                a = diffuse(a);
            },
            8 => {
                a = a ^ read_u64(ptr);
                a = diffuse(a);
            },
            9...15 => {               
                a = a ^ read_u64(ptr);
                ptr = ptr.offset(8);
                excessive = excessive - 8;
        ....
This bothers me about Rust. There's too much "unsafe" code in libraries. The language is unable to express some essential concepts. Known areas of trouble include partial initialization of an array, needed to implement growable collections, and single-ownership doubly linked lists. Neither of those is expressible within Rust, which leads to unsafe code to implement them. Here, though, it's purely a performance issue. That's disturbing. If you can't do fast bit-banging in safe Rust, there's a problem somewhere.

If Rust let you access a slice of bytes as a slice of ints, alignment and length permitting, the code above could be much more straightforward. That's what I mean about expressive power. The hack used here to do that:

    let end_ptr = buf.as_ptr().offset(buf.len() as isize & !0x1F) as usize;
is iffy. Why is there an "isize" (a signed quantity) in there? They want to round down to a 32-byte boundary (one unrolled block), yes, but why the signed quantity? The documentation for Rust's "std::ops::BitAnd" doesn't say what the semantics are for signed numbers. What would happen on a 32-bit machine if someone allocated a buffer bigger than 2GB? Exploitable?


Why would hard-coding the ability to access a slice of bytes as ints into the compiler be safer than a well-encapsulated unsafe code abstraction?

We used to implement things like vectors directly in the compiler, but it was a big headache for no gain. Writing actual code is way easier than writing code to generate LLVM IR.

Anyway, there is a commonly-used crate for this: byteorder. Had I written the library, I would have just used that crate. But it's not a big deal either way.

> Why is there an "isize" (a signed quantity) in there?

Because pointer arithmetic is signed.

> The documentation for Rust's "std::ops::BitAnd" doesn't say what the semantics are for signed numbers

Bitwise operations on signed integers are applied to their two's-complement representations.
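Concretely, the mask trick from the code in question behaves the same on signed types:

```rust
fn main() {
    // !0x1F on isize is -32, i.e. all ones except the low five bits,
    // so the AND rounds down to a multiple of 32 just as with usize.
    let len: isize = 100;
    assert_eq!(len & !0x1F, 96);

    // -1 is all ones in two's complement, so masking extracts low bits.
    assert_eq!(-1i64 & 0xFF, 0xFF);
}
```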

> What would happen on a 32-bit machine if someone allocated a buffer bigger than 2GB?

It would work.

> Exploitable?

No.


I actually talked to Lattner about the isize thing a few months back -- according to him it's fine to overflow while doing GEP because LLVM shouldn't care if you pass in negative offsets to represent really big positive ones.


My understanding is that overflow is pretty much fine and dandy in LLVM unless your frontend opts into the nsw flag.


Yeah, it just wasn't clear to me (or anyone else I asked) that GEP wasn't "allowed" to strictly interpret signedness and inboundness. LLVM docs, amirite?


   This bothers me about Rust.
   There's too much "unsafe" code in libraries.
I like how with Rust you use one or two unsafe blocks and everyone loses their mind, but in C/C++ you spatter your code with undefined behavior and nobody bats an eye. I get that Rust is _safe_, so violating that contract is in a way self-defeating. But even with a handful of unsafe blocks you are miles ahead of the guarantees C/C++ give you. Lastly, unlike C/C++, Rust makes you call out "I'm doing dangerous stuff here!"

    The language is unable to express some essential concepts.
Not really. The same code the parent poster highlighted [1] is undefined behavior in C/C++ (with standard types). So really no language has the ability to express those concepts. You're doing pointer casts and possibly unaligned dereferences at the same time. This has zero consistency between CPU vendors.

[1] https://docs.rs/crate/seahash/2.0.0/source/src/buffer.rs

    Known areas of trouble include partial initialization of an array, needed to implement growable collections
You mean Vec? C doesn't have growable arrays; it has realloc, but Rust's Vec covers that.

https://doc.rust-lang.org/std/vec/struct.Vec.html

    and single ownership doubly linked lists.
They already did, and it is in the standard library.

https://doc.rust-lang.org/std/collections/struct.LinkedList....

    If Rust let you access a slice of bytes as an slice of ints, alignment and length permitting,
    the code above could be much more straightforward. 
I mean C/C++ do, but without stdint.h you do this at your own peril. Even then you'll likely use unions, which C++ treats as undefined behavior for type-punning.

    What would happen on a 32-bit machine if someone allocated a buffer bigger than 2GB? Exploitable?
You can state this about C/C++ also.


  > The same code the parent poster highlighted [1] is
  > undefined behavior in C/C++ (with standard types). So
  > really no language has the ability to express those
  > concepts. You're doing pointer casts and possibly unaligned
  > dereferences at the same time. This has zero consistency
  > between CPU vendors.
It's undefined behavior in Rust, too. Rust code that type-puns an unaligned pointer into an integer would crash on MIPS or SPARC just like the C code would. And that crash could potentially be triggered by malicious input. "unsafe" doesn't make behavior well-defined in Rust any more than an explicit cast makes it well-defined in C.

Moreover, type-punning is generally a warning of bad code in C. Both C and Rust will generally[1] diagnose a type-pun. And you can silence the diagnostic in both languages by using special syntax--unsafe in Rust, a cast in C. But in both cases the way that you silence the warning is over-broad; you often need to silence it for good reason, X, while accidentally silencing the diagnostic for usage Y.

The correct way to read in a little-endian integer in C is the same way you'd do it anywhere else:

  unsigned char *p;
  size_t plen;
  uint64_t n;
  // initialize p and plen; MIN(a, b) is the usual smaller-of-two macro
  _Static_assert(CHAR_BIT == 8, "CHAR_BIT != 8"); // [2]
  n = 0;
  for (size_t i = 0; i < MIN(plen, sizeof n); i++) {
    n |= (uint64_t)p[i] << (8 * i); // cast before shifting past int's width
  }
It's correct regardless of endianness and regardless of whether the address is aligned. And the above loop can be unrolled, too, so that it pipelines well. If performance is so important that you can't be bothered to care about alignment constraints, you may as well drop down into assembly and use SIMD instructions directly. Type-punning with a C-style cast or Rust-style unsafe block is just a bad idea, IMO.

I've never seen a situation in C where type-punning of this sort was a good idea. The performance aspect is negligible. My parsers generally run rings around code that uses type-punning. The gains are nothing compared to what you can get by better restructuring of higher-level code. For example, with hashing functions you're generally hashing small strings; thus the alignment checks you would need to add would typically cost more than the benefit because they couldn't be amortized well.

If you really had to, then C11 provides _Alignof that can be used to type-pun in a safe manner just like you could in Rust. (If a builtin type has padding issues, so would the same Rust code. It just so happens that Rust affirmatively has selected at the outset to never support such architectures. Thus, running the same code on the same architecture would work correctly in both languages.) It's not even type-punning if in addition to correct alignment you can prove that all bits are value bits. That would be the case for both the fixed-sized integer types, as well as for unsigned types where you can prove there are no padding bits (which can be accomplished using well-defined code as well).

So for general usage, if you really want to you could implement two versions of the code--one that type-punned correctly for long strings, and a simpler, more concise, more obviously correct one for typical strings. So it can be done correctly while still reaping the same performance; it's just more hassle than writing incorrect code. But even the incorrect code is more obtuse than the trivially correct version, which is why I've never had a good reason to type-pun.

[1] The exception in C is implicit conversions through a void pointer. But in C++, which lacks implicit void pointer conversions, engineers will often instinctively add an explicit cast where you would normally use a void pointer in C, substantially blunting the benefit of removing the implicit conversion from the language. And that habit to cast can easily lead to more bugs, just like in this case, where unsafe was used to permit the use of a broken idiom that is poor code even in C.

Good C rarely uses casts. Avoiding casts is a good habit to get into in C. And I've personally never seen good reason to type-pun anything, period, in C.

[2] The code could be trivially made correct on platforms where CHAR_BIT > 8 if the convention was that input strings only filled the bottom 8 bits of char. It would just be a distraction here, though.


Apropos of the mention of SHA-3 elsewhere in this thread:

I recently implemented my own version of SHA-3 in C. The original reference code from the authors (Google "Keccak-readable-and-compact.c") used type-punning in the inner loops (inside the round function) on little-endian systems. On non-little-endian systems a macro was used to read and convert the 64-bit value.

This is typical premature optimization that is habitual among some developers, and another example of needless use of type-punning.

My code uses the simple code above to read bytes into the uint64_t buffer in the outer loop (outside the round function). Not only is my code simpler and easier to understand, it's no slower than the type-punning code even on x86-64.

Seriously, people, just don't type-pun. It's bad practice--in C, in Rust, in any language when using a remotely modern compiler. The only time it _might_ make sense is in very peculiar circumstances with peculiar access patterns, you've exhausted other easy gains, _and_ you've benchmarked and confirmed that type-punning is an improvement worth the cost in code complexity.

And even then, at least implement it correctly and safely. If you've already met the prerequisites above, the additional effort is negligible in the grander scheme of things. And committing to always writing correct and safe code keeps you honest when assessing whether performance hacks are truly necessary.


> Rust code that type-puns an unaligned pointer into an integer would crash on MIPS or SPARC just like the C code would.

Actually you can enforce that on Intel too. I do that for some hash functions in debugging mode, to avoid valgrind slowdown.

e.g. https://github.com/perl11/cperl/blob/master/cpan/Digest-MD5/...

    #if defined(U32_ALIGNMENT_REQUIRED) && defined(__GNUC__) && (defined(__x86_64__) || defined(__i386))
        /* Generate SIGBUS on unaligned access even on x86:
           Set AC in EFLAGS. See http://orchistro.tistory.com/206
           Also see https://sourceforge.net/p/predef/wiki/Architectures/
           for possible other compilers. Here only GNU C: gcc, clang, icc.
           MSVC would be nice also. */
    #ifdef __x86_64__
        __asm__("pushf\n"
                "orl $0x40000, (%rsp)\n"
                "popf");
    #else
        __asm__("pushf\n"
                "orl $0x40000, (%esp)\n"
                "popf");
    #endif
    #endif
This way you won't get the SPARC/MIPS surprises Debian maintainers are struggling with. If it doesn't align, copy it to a temporary. It's still faster.


Type punning is a mess in C because you have to jump through hoops to make it legal. The language subsequently failed to provide ergonomic solutions to the very real problems type punning solves in a systems language.

Type punning is perfectly allowed in Rust, I'm not aware of any lints against it. Although you need to use an annotation to specify the struct layout algorithm to do it "correctly" for custom types.

We use it in BTreeMap to implement a kind of inheritance between nodes. Internal nodes have the same layout as Leaf nodes, except internal nodes have an extra field for their array of edges (which you don't want to allocate for leaf nodes). So everything stores pointers to leaf nodes, and mostly manipulates all nodes as leaf nodes, but sometimes you "down cast" them to internal nodes to manipulate the edges.

In this particular case we use the standard C++ pattern of making a LeafNode the first field of an InternalNode.

https://doc.rust-lang.org/nightly/src/collections/up/src/lib...
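A minimal sketch of that layout pattern with hypothetical node types (not the real BTreeMap definitions):

```rust
// Hypothetical nodes illustrating the "leaf as first field" pattern.
// #[repr(C)] guarantees field order, so a pointer to an InternalNode is
// also a valid pointer to its leading LeafNode.
#[repr(C)]
struct LeafNode {
    keys: [u32; 4],
    len: u8,
}

#[repr(C)]
struct InternalNode {
    data: LeafNode,               // must be the first field
    edges: [*const LeafNode; 5],  // only internal nodes pay for this
}

fn main() {
    let internal = InternalNode {
        data: LeafNode { keys: [1, 2, 3, 4], len: 4 },
        edges: [std::ptr::null(); 5],
    };
    // Store/manipulate everything as a leaf pointer...
    let as_leaf = &internal as *const InternalNode as *const LeafNode;
    assert_eq!(unsafe { (*as_leaf).len }, 4);
    // ...and "down cast" back when the edges are actually needed.
    let back = as_leaf as *const InternalNode;
    assert!(unsafe { (*back).edges[0].is_null() });
}
```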

The other common usage of punning in Rust is to gain access to some raw representation of a type. For instance, last time I worked on Rust, this was how fat pointers (&[T], &Trait) were constructed and decomposed at the lowest level.

I can't speak to whether the usage of punning in this code is particularly good though.


Type-punning is not the same thing as deriving an object pointer through casting, or relying on the rule concerning the equivalence of structures containing the same initial sequence of sub-members.

Specifically, the following common macro in C is _not_ type-punning, at least not the kind I had in mind.

  #define container_of(ptr, type, member) \
     (type *)((char *)(ptr) - offsetof(type, member))
Neither is the idiomatic BSD <sys/queue.h> library. It's all valid, well-defined code, as long as they're not being abused to hide undefined shenanigans.

In C, as long as the last access to an object had the same type as the type you're accessing that object from (and provided it's the same object), that's perfectly well-defined. People think that this is an aliasing violation in C, but it's not. Aliasing issues only come into play when there are side-effects (including order of evaluation) that you implicitly depend on but that the compiler cannot see. That issue is too complex to bother discussing in fine detail here, but suffice it to say that Rust either has similar undefinedness issues, or it assumes any pointer however derived can alias even inside an unsafe block and therefore cannot perform the same optimizations that a C compiler can. I doubt the latter is the case given that rustc relies on the LLVM backend.

Note that the general rule in C is that all pointers of the same type can alias, so if you derive two object pointers to the same type through explicit conversion (casting) or implicit conversion, and as long as they're actually referring to the same object, then as long as the last access is through the same type as the last store all is well-defined.[1] This is not type-punning. If this wasn't allowed by the language there wouldn't even be any use for casting at all. The cast is a way to stop the optimizer in its tracks and ensure that it doesn't fubar otherwise correct code.

The aliasing issue typically comes into play when you access at least one object through a structure or union, and there's no union definition in scope that hints that the layout is such that the sub-members of the union or structure might alias. Though the dereferenced expressions might have the same type, the dereference isn't occurring through pointers with the proper compatible type. This is one of the few cases where the compiler isn't required to assume that accesses might alias. And this is why the C standard requires those types of evaluations to occur through pointers to a union.

Type-punning, at least the kind I had in mind, is violating the core rule that access can only happen through an object with the same type as the last store. This is type-punning:

  unsigned long l = 1;
  unsigned *i;
  i = (unsigned *)&l;
  printf("i:%u\n", *i);
The access through i has a different type than the store to l.

An example that isn't type-punning per se, but raises the aliasing issue,

  struct foo {
    int i;
  };

  int add(struct foo *fp, int *ip) {
    int i;
    fp->i = 0;
    i = *ip;
    return fp->i + i;
  }

  int main(void) {
    struct foo f;
    return add(&f, &f.i);
  }
In add(), the C compiler isn't required to assume that &fp->i might alias ip and so might reorder the statements. If the hidden dependency on ordering didn't exist, the code could otherwise be okay (that's why I don't call it type-punning).

(Note: I'm having trouble using the asterisk without bolding everything. Please keep that in mind.)

The above can be made correct simply by casting:

  int add(struct foo *fp, int *ip) {
    int i;
    *(int *)&fp->i = 0;
    i = *ip;
    return *(int *)&fp->i + i;
  }
because now the initial store occurs through type pointer-to-integer, same as the type of ip. IOW, the compiler must assume that the store and loads might alias and cannot reorder things.

As you can see, this is a much more contrived scenario. It's not as common as you'd think. It's most common when type-punning--storing through one type and accessing through another. Don't type-pun and you won't run into this issue very often, if ever. It's not even that common when using the typical C OOP-like inheritance tricks. It can happen in a silent and deadly way, but that requires some serious hackery. Don't pretend that C is Java and you're unlikely to write such code. One of the common places this occurs in practice is type-punning struct sockaddr, struct sockaddr_storage, etc. That's a very unique situation for many reasons. But as optimizations in compilers improve it is admittedly an increasing problem; it's a loaded gun, for sure.

FWIW, C11 defines type-punning in a footnote 95 of section 6.5.2.3p3.

  If the member used to read the contents of a union object is
  not the same as the member last used to store a value in the
  object, the appropriate part of the object representation of
  the value is reinterpreted as an object representation in
  the new type as described in 6.2.6 (a process sometimes
  called ‘‘type punning’’). This might be a trap
  representation.
This definition is even narrower than mine, but it still comports with the core rule about loads occurring through the same type as stores.

[1] Storing a value through char is also okay as long as you ensure the representation is valid. Which is trivial when dealing with the standard fixed-width unsigned types. And access is always valid through char. That's why sometimes you'll see a seemingly superfluous cast through (char *). It's not necessarily type-punning; it might be used because a pointer-to-char can alias _anything_, and that can be useful as a barrier to prevent an optimizer from deciding two expressions might not alias.


> There's too much "unsafe" code in libraries.

Which libraries? I see very few that do this, and all of them are safe abstractions containing some unsafe code.

You keep repeating this claim but I haven't seen any evidence to back it up.

> If Rust let you access a slice of bytes as an slice of ints, alignment and length permitting, the code above could be much more straightforward. That's what I mean about expressive power. The hack to do that used here:

The byteorder crate lets you do this. Of course, it uses unsafe code, but that's safely encapsulated away (and easy to verify). This crate doesn't use it, but it could. Not every operation needs to be baked into the language semantics.


While I mostly agree with you, I'd like to play devil's advocate:

The Rust core team is relatively relaxed about the community's usage of `unsafe`. I say this because they do not seem to be interested in actively discouraging its usage; i.e., `unsafe` is discouraged in documentation, not via tools.

"Hey, please don't use `unsafe` unless you know what you're doing" is like writing a comment in JavaScript: `function(x /* int */) {`.

Ideally, the usage of `unsafe` should be discouraged by the compiler via friction. The compiler should, at the least, spit out some metrics after a compile cycle about the percentage of `unsafe` lines/instructions. Even more ideal, when the compiler detects an `unsafe` it would pause and ask, "Is Crate Z trusted [y/N]?". Cargo can then make this easy in Cargo.toml.

All of a sudden Crate writers need to think twice about their usage of `unsafe`, will users be willing to trust my code? is using `unsafe` here really worth the risk of adopting fewer users? is there already a library that solves this which is generally trusted?


Nobody is putting effort into propaganda about unsafe because the community is already very strongly averse to this, and is careful about writing unsafe code. It's not a problem. If it becomes a problem (I doubt it) we can put effort into it. People learn about the language through discussion or documentation, and both of these venues actively discourage unsafe. The one resource out there that teaches unsafe code in depth (the Rustonomicon) is very heavy on warning the reader about unsafe code pitfalls and in general discouraging the reader from writing unsafe code.

A tool for tracking unsafe dependencies has been talked about before, though. Sounds like a good idea to me. Like I said I don't think there's a particular need for it, but it would be nice to have.


Have you ever been bitten by unsafe code?

IMHO any metric would be gamed and prioritized above actual quality. As actual quality isn't suffering, why add an unsafe metric at all? Seems like paranoia not borne of experience.


> Have you ever been bitten by unsafe code?

Yeah. Most often in FFI code (when the invariants are much harder to uphold). Rarely when writing unsafe abstractions. The few times I remember this happening with abstractions is due to really old code that broke in a compiler upgrade (pre-1.0).

> IMHO any metrics would be gamed and prioritized above actually quality.

Yeah, there have been discussions in the past about a "safe code" badge for crates and stuff like this, and the conclusion is that it might discourage people from using unsafe code where they actually should be.


I have been bitten by the lack of a type checker.

I have been bitten by inexplicit casts.

I have been bitten by segfaults.

I have been bitten by poor/no tests.

I have been bitten by mutable containers.

In every case, more constraints and more explicitness have liberated portions of my cognitive load. I've ended up with fewer bugs, more flexibility to play, reduced ramp-up time for new devs, increased speed of iteration. Why would this trend, to add constraints and explicitness, not continue to be beneficial?

P.S. You haven't been bitten by unsafe code yet because the people who write Rust are still builders of the language. It is still in the early-adopter stage. Think about Java, which is now predominantly written by users of the language, and imagine that for Rust. The teams currently working with Rust chose to do so as an experiment. They don't have to get a feature out tomorrow, but that will come. Similarly, there isn't "legacy Rust" yet (except in the compiler, which is fine because it is literally maintained by the experts). Is it really worth waiting for crappy code to get written, then rewritten, then forgotten and finally broken, reliving our mistakes, before we fix a predictable problem?


> Ideally, the usage of `unsafe` should be discouraged by the compiler via friction.

This is enforced via tools: you have to say 'unsafe' to use something unsafe. That's the speed bump.

In addition, there's a lint that you can turn on to fail the build if you use `unsafe`. This won't apply to your dependencies, but you can make it apply to your code.


Explain to me why ticki, who is a relatively experienced Rust dev, chose to retype `unsafe` over reusing a crate (byteorder) which is supposedly equivalent?

1. Crate discoverability could be improved.

2. He wanted full control of the abstraction.

3. This particular library is just an experiment.

4. ...

Regardless of the reason, the reality is the same, `unsafe` could have been ignored but wasn't. Instead, the path of least resistance was to re-write `unsafe` blocks. IMO, it is not truly "friction"/discouraged if it is also the path of least resistance.


> is a relatively experienced Rust dev choose to retype

Probably because they were an experienced dev and are experienced enough with unsafe code to write good unsafe code without worrying about it. It was a minor unsafe operation of which the validity is easily proven. Doesn't really prove that there isn't an aversion to unsafe code in the community.

I would generally put the folks who work on Redox in the bucket of "experienced people who know how to write unsafe code" and am pretty sure that they're more comfortable with writing unsafe code than most. Byteorder is a crate that handles this neatly and well (including endianness and stuff), but for ticki's purposes, really, a simple type pun on integers is not that bad. It's like the left-pad of unsafe code.


To paraphrase - "If you're experienced, the Rust community doesn't discourage it. They likely know what they're doing." :)

This sounds very similar to C or C++ developers who criticize Rust. "Experienced C / C++ developers know what they are doing. Why do we need the borrow checker. In my code, these issues Rust claims to solve haven't been an issue for a decade."

The argument seems a little hypocritical. Anyway, it's not a big deal; it just seems like an easy problem to solve today, but a tedious problem to live with tomorrow. I'll stop playing devil's advocate now.


> "If you're experienced, the Rust community doesn't discourage it. They likely know what they're doing."

That's not what I'm saying. I'm saying that I can understand why ticki just reimplemented it here; because they are experienced enough to know what they're doing, AND because this isn't a dangerous unsafe operation. The AND is important here. There are a couple of unsafe operations which are basically no big deal, and this is one of them (another is the use of unchecked indexing in some cases).

An inexperienced Rust programmer would generally still avoid unsafe like the plague and ask around before doing this. We get this often enough -- folks asking how to do lower level operations well (and often getting pointed to crates like byteorder).

An experienced Rust programmer knows that this specific operation is trivial enough to not worry that much. I know about byteorder, so I would use it in this case, but if I hadn't I'm pretty sure I wouldn't be too averse to just doing the type pun.

I'm not talking about general unsafe code here. I've seen very experienced Rust programmers ask for help in getting rid of or double-checking unsafe code. I'm talking about this specific instance, and saying that it's not in the category of unsafe code that needs to be worried about.


I disagree: this sort of operation that seems safe is some of the most deceptive code to write... there's always the risk of reading a little bit more than intended or incrementing a pointer a little too far and hitting undefined behaviour. (There is in fact a bug (or two) in the type punning `read_u64` code, fortunately just a correctness one, but the small amount of code in that function is enough to obscure it, let alone the long sequence of copy-pasted `unsafe` code that makes up the main algorithm.)

It is definitely a red flag for a function that just reads some bytes to be entirely wrapped in an unsafe block, and I don't think it makes sense to defend it while also saying that the Rust community discourages gratuitous use of `unsafe`. The whole point of Rust is the "experts know what they're doing" argument doesn't work in practice, and people shouldn't get a free pass for large amounts of seemingly undocumented and unjustified `unsafe` just because they have some prominent projects.


Fair; I guess I'm wrong here. Perhaps we should be doing something to prevent gratuitous use of unsafe code. I'm quite wary that we may overly stigmatize unsafe code this way, though (leading to people avoiding even the use of libraries like byteorder that contain unsafe code). It's a careful balance to maintain.


I can't read ticki's mind, so I can't really say why they made this decision. But I don't think you're right to suggest that this is inherently about using the path of least resistance; #2 on your list is not about that, for example.


Disagree.

Unless improving the flexibility of the existing abstraction has been explored and deemed infeasible, rebuilding an abstraction instead of working with the existing abstraction's owner to improve it is taking the path of least resistance.


Tooling could encourage this by requiring that if a module or crate contains unsafe code, its name has to contain "unsafe".


That sounds horrifying (to me).

I'm not a Rust user. I've never written a line of Rust code in my life. So my opinion is not worth much. And I'm exaggerating a bit. But still.

I don't think I can explain my real reason for objecting to such a thing; it's more emotional than rational, but at least let me give one rational-sounding example of where I think even you would agree such a policy would backfire.

Some library comes out, becomes fairly popular. Eventually a version 2 is released, but the new version has a few lines of unsafe code, whereas the old one did not. Maybe it's to enable a new feature, or maybe it's for performance. Maybe the author was properly paranoid and did a full-fledged correctness proof that his code had no bugs compared to the documentation, and then verified the proof with several different interactive proof assistants. (This is getting a bit unrealistic, but bear with me for a few more sentences.)

But he now has to rename his crate, instantly making the upgrade process for existing users way more annoying. And to add insult to injury, the new name is wrong! There's nothing unsafe about his code, as far as users are concerned. The only trouble is that the Rust compiler could not prove it safe. As you pointed out yourself, this is not surprising: it can't even cope with a linked list! Sophomores in college implement these things, but the borrow checker can't cope with them. So of course unsafe is needed.

And if version 3 finds a workaround, so that unsafe is no longer needed, do you rename it yet again? People think Java's checked exceptions are annoying, but surely this is far worse?


How does one verify unsafe blocks in a library? Wondering what are the best practices.


Better integration within the Rust library ecosystem. Instead of having to install a C/C++ library through my package manager, I can just add the seahash dependency to my Cargo.toml file and cargo will handle the rest.


None. But once you're out of the buffer code, it's safe.


It's safe as long as the unsafe code didn't mess with pointers in unexpected ways. Your unsafe code still needs to behave in a sane way for any safety guarantees to hold.


Sure. And if there's a problem, the search for the bug should be limited to the unsafe blocks. Which is the promised benefit over C++.


No. As soon as there's unsafe code, there's the possibility of safe code misusing the unsafe code to create unsafe behavior. One would like to have un-abusable unsafe code, but the language does nothing to guarantee that.


The parent post is correct. Unsafe code must uphold the invariants of safe Rust. If unsafe code is safe only if the caller upholds some invariants not enforced by the compiler, then by definition it's the unsafe code that is wrong.


Unsafe code must uphold the invariants of safe Rust.

Ideally, yes. In practice, maybe. We're probably going to see "unsafe" code that assumes good behavior on the part of the caller. That's a classic problem with APIs.


There's no way to solve that problem without just forbidding unsafe code entirely. Unsafe code can have bugs; that's why you should keep it to the minimum and keep it well-known and audited.

In this case, the byteorder crate would have been more appropriate than handrolling unsafe code.


I think the issue, if there is one, is that arguably the more appropriate thing to do would have been to implement the code without unsafe, using an intrinsically correct algorithm. The type-punning is premature optimization. _That_ was the misstep.

That the byteorder crate exists is irrelevant in as much as this was an example of the urge for premature optimization leading the developer down the wrong path. The same amount of time pondering whether to even bother using a pre-existing library might have been better spent second-guessing the urge to type-pun at all.

Also, looking at the byteorder crate, I wouldn't be surprised if it's even slower than the simpler and correct loop I posted elsethread. read_num_bytes in that crate uses copy_nonoverlapping, which I assume is analogous to memcpy in C. That's a very roundabout and inefficient way to accomplish the task, and likely patterned after similarly bad C code.

To even make it worthwhile, any byteorder library should provide some kind of iterator interface so that it can maintain alignment state while permitting the loop to be unrolled by the compiler. (And it might require a closure or some way of expanding a block of code inline.) That's probably the only way it could outperform the simple, hand-rolled, endianness- and alignment-neutral solution. But it doesn't provide that kind of interface AFAICT.

It's all sort of ironic, which I suppose was the point upthread--this is an example of the irrational urge for premature optimization and of bad programming idioms being hauled into Rust land completely unhindered by Rust's type safety features. And the better, correct, and likely more performant way of accomplishing this task could have been done just as safely from C as it could from Rust.


> Also, looking at the byteorder crate, I wouldn't be surprised if it's even slower than the simpler and correct loop I posted elsethread. read_num_bytes in that crate uses copy_nonoverlapping, which I assume is analogous to memcpy in C. That's a very roundabout and inefficient way to accomplish the task, and likely patterned after similarly bad C code.

It wasn't patterned after any C code. ptr::copy_nonoverlapping doesn't necessarily compile down to memcpy. Namely, concrete sizes are given, so the compiler backend can optimize this down to simple loads and stores on x86, which is probably going to do better than the bit-shifting approach. Namely, loading a little-endian encoded integer on a little-endian architecture should be as simple as a single word-sized load (because the byte swap is unnecessary). It would be interesting to consider whether the safer and more readable bit-shifting approach could be compiled down to the same code, but when I wrote the byteorder crate, this wasn't the case.

This isn't the only place that ptr::copy_nonoverlapping is useful. I used it in my snappy[1] implementation as well, specifically to avoid the overhead of memcpy. To be clear, this wasn't my idea. This is what the C++ Snappy reference implementation does as well. Avoiding memcpys in favor of unaligned loads/stores is a dramatic win. I know this because I tried to write my Snappy implementation without specific unaligned loads/stores, and it performed quite a bit worse. The performance of the Rust implementation is now on par with the C++ implementation. Of course, this is always dealing with raw bytes---there's no type punning here.

ptr::copy_nonoverlapping is a bit generic for this use case. That's why we recently accepted an RFC to add read_unaligned/write_unaligned to the standard library[2]. (Which are implemented via straight-forward calls to ptr::copy_nonoverlapping.)

[1] - https://github.com/BurntSushi/rust-snappy/blob/master/src/de...

[2] - https://github.com/rust-lang/rfcs/blob/master/text/1725-unal...


  Namely, concrete sizes are given, so the compiler backend can optimize this down to simple loads and stores, which is going to do better than the bit-shifting approach
It can't optimize it down to simple loads and stores unless it can prove that it's aligned. If it can't optimize it to a simple load, it has to check for alignment. If it has to check for alignment, it's unlikely to be faster than the byte-loading function. The bit-shifting approach can be parallelized by superscalar CPUs if you unroll the loop. Whereas the alignment check cannot be parallelized on CPUs where alignment matters, whether or not it's been unrolled to load in chunks.

FWIW, memcpy can be similarly optimized in C. memcpy -> scalar assignment is an optimization that GCC (and probably clang) performs. But if it can't prove alignment it can't optimize it to a scalar load/store, and alignment typically can't be proven except for small functions where the optimizer can see the definition of the array _and_ can prove any pointer derived from the array is properly aligned. That's generally not the case when juggling user-provided strings because there are too many conditionals between where memcpy is invoked and the origin of the pointer.

Also, as a general rule unaligned loads are slower even on x86, so it often makes sense to check for alignment regardless, especially to optimize the case of loading a long series of integers. And when performance matters, that's precisely what you want to do if you can. You want to batch load the series of integers because doing operations in batches is the key to performance on any modern processor. Indeed, it's the key to SeaHash as well. And that's what I meant by saying effort and code complexity is better spent refactoring the algorithm at a higher level than trying to micro-optimize such a small operation. In addition to often reaping much better gains, you marginalize if not erase any benefit the micro-optimization might have provided. It's beyond dispute that the gains from SeaHash primarily come from how it refactored its inner loop to operate on a 64-bit word instead of 8 8-bit words.


> It can't optimize it down to simple loads and stores unless it can prove that it's aligned. If it can't optimize it to a simple load, it has to check for alignment. If it has to check for alignment, it's unlikely to be faster than the byte-loading function.

I had edited my comment after-the-fact to include the "on x86" qualification.

> And that's what I meant by saying effort and code complexity is better spent refactoring the algorithm at a higher level than trying to micro-optimize such a small operation.

Your advice is overspecified. If you want to make something faster, then build a benchmark that measures the time you care about and iterate on it. If "micro optimizations" make it faster, then there's nothing wrong with that. I once doubled the throughput of a regex implementation by eliminating a single pointer indirection in the inner loop. It doesn't get any more micro than that, but consumers are no doubt happier with the increased throughput. In general, I find most of your hand-waving about performance curious. You seem keen on making a strong assertion about performance, but the standard currency for this sort of thing is benchmarks.

I did all of this with byteorder when I built it years ago. I'll do it again for you.

    $ curl -sOL https://gist.github.com/anonymous/042d05e1e480b89434a673b30534efd8/raw/d2c9a4516a57c26da23c8beaffd5ad583da0a889/Cargo.toml
    $ curl -sOL https://gist.github.com/anonymous/042d05e1e480b89434a673b30534efd8/raw/d2c9a4516a57c26da23c8beaffd5ad583da0a889/lib.rs
    $ RUSTFLAGS="--emit asm" cargo bench
    test bit_shifting ... bench:   1,999,496 ns/iter (+/- 53,427)
    test type_punning ... bench:     476,105 ns/iter (+/- 11,920)
(The `RUSTFLAGS="--emit asm"` dumps the generated asm to target/release/deps.)

The benchmark reads 1,000,000 64 bit integers from a buffer in memory and sums them.

Analyzing the hotspots of each benchmark using `perf` is instructive. For type_punning:

    $ perf record target/release/deps/benchbytes-a1cc37a72d289957 --bench type_punning
    $ perf report
The corresponding asm is:

    cmpq	$7, %rsi
    jbe	.LBB4_10
    movq	(%rbx), %rcx
    addq	(%rcx,%rax), %rdi
    addq	$8, %rax
    addq	$-8, %rsi
    cmpq	%rax, %rdx
    ja	.LBB4_6
Notice how tight this loop is. In particular, we're dealing with a single simple load to read our u64. Now let's repeat the process for bit shifting:

    $ perf record target/release/deps/benchbytes-a1cc37a72d289957 --bench bit_shifting
    $ perf report
The hotspot's corresponding asm is:

    .LBB5_6:
    	cmpq	$7, %rsi
    	jbe	.LBB5_10
    	movzbl	(%rdx,%rbx), %ecx
    	movzbl	1(%rdx,%rbx), %eax
    	shlq	$8, %rax
    	orq	%rcx, %rax
    	movzbl	2(%rdx,%rbx), %ecx
    	shlq	$16, %rcx
    	orq	%rax, %rcx
    	movzbl	3(%rdx,%rbx), %eax
    	shlq	$24, %rax
    	orq	%rcx, %rax
    	movzbl	4(%rdx,%rbx), %ecx
    	shlq	$32, %rcx
    	orq	%rax, %rcx
    	movzbl	5(%rdx,%rbx), %eax
    	shlq	$40, %rax
    	orq	%rcx, %rax
    	movzbl	6(%rdx,%rbx), %ecx
    	shlq	$48, %rcx
    	movzbl	7(%rdx,%rbx), %edi
    	shlq	$56, %rdi
    	orq	%rcx, %rdi
    	orq	%rax, %rdi
    	addq	%rdi, %r12
    	addq	$8, %rbx
    	addq	$-8, %rsi
    	cmpq	%rbx, %r11
    	ja	.LBB5_6
It's no surprise that the type punning approach is faster here. (N.B. Compiling with `RUSTFLAGS="-C target-cpu=native"` seems to permit some auto-vectorization to happen, but I don't observe any noticeable improvement to the benchmark times for bit_shifting. In fact, it seems to get a touch slower.)

I could be reasonably accused of micro-optimizing here, but I do feel like reading 1,000,000 integers from a buffer is a pretty generalizable use case, and the performance difference here in particular is especially dramatic. Finding a real world problem that this helps is left as an exercise to the reader. (I've exceeded my time budget for a single HN comment.)

> It's beyond dispute that the gains from SeaHash primarily come from how it refactored its inner loop to operate on a 64-bit word instead of 8 8-bit words.

Do you feel anyone has contested this point? I note your use of the word "primarily." If type punning gives a 10% boost to something that is already fast, do you care? If not, do you think other people might care? If they do, then what exactly is your point again?

Note that I am responding to your criticism of byteorder in particular. I don't really know whether the OP's optimization of reading little-endian integers is actually worth while or not. I would hazard a guess, but would suspend certainty until I saw a benchmark. (And even then, it is so incredibly easy to misunderstand a benchmark.)


  Notice how tight this loop is. In particular, we're dealing with a single simple load to read our u64. 
Notice that you're reading the data into a statically allocated buffer, and doing it in such a way that it's trivial for the compiler to prove alignment. This is a classic case where the benchmark is irrelevant for a general purpose implementation.

Try running the code so that the buffer is dynamically allocated, and so that the first access is unaligned.

Now, I'm not saying that type-punning can't be faster, but to do it properly from a general-purpose library it should be done correctly so that every case is as fast as possible.

Assuming I'm correct and that the modified benchmark sees substantially different results, reimplement byteorder such that it produces the same tight loop even when the data isn't aligned.

I don't think it can be done without modifying the byteorder interface to expose something more iterator-like, because it needs to maintain state across invocations for doing the initial unaligned parse followed by the aligned parse.

If you can get it done in a reasonable amount of time[1], look at the difference between type-punning and byte-loading. I'll bet that relative difference will be much smaller than the difference between the unaligned performance before you refactored the interface, and the unaligned performance after refactoring the interface. In that case my point would stand--the most important part is refactoring code at a higher-level; gains quickly diminish thereafter.

If my argument is over-specified, that's because it's meant as a rule of thumb. Specifying a rule of thumb but then qualifying it with "unless" is counter-productive. For inexperienced engineers "unless" is an excuse to avoid the rule; for experienced engineers "unless" is implied.

Note that I'm no stranger to optimizing regular expressions. I wrote a library to transform PCREs (specifically, a union of thousands of them, many of which used zero-width assertions that required non-trivial transformations and pre- and post-processing of input) into Ragel+C code and got a >10x improvement over PCRE. After that improvement micro-optimizations were the last thing on our minds. (RE2 couldn't even come close to competing; and unlike re2c, the Ragel-based solution would compile on the order of minutes, not lifetimes.)

We eventually got to >50x improvement by doubling-down on the strategy and paying someone to modify Ragel internally to improve the quality of the transformations.

[1] Doubtful as I bet it's non-trivial and you have much better things to do with your time. But I would very much like to see just benchmarks numbers after making the initial changes--dynamic allocation and unaligned access. I don't have a Rust dev environment. I'll try to do this myself later this week if I can. However, given that I've never written any Rust code whatsoever it'd be helpful if somebody copy+pasted the code to dynamically allocate the buffer. I can probably figure the rest out from there.


Hi, author of Hyperscan (https://github.com/01org/hyperscan) here.

I strongly suspect we don't support enough of this:

> many of which used zero-width assertions that required non-trivial transformations and pre- and post-processing of input

... to really support your use case. But we're interested in the workload, especially as we're looking at extensions to handle more of the zero-width assertion cases. We'll never be able to handle some of them in streaming mode (they break our semantics and the assumption that stream state is a fixed size for a given set of regular expressions).

Can you share anything about what you're doing with zero-width assertions?


> Notice that you're reading the data into a statically allocated buffer

It is not statically allocated. The data is on the heap. The give-away is that the data is in a `Vec`, which is always on the heap.

> and so that the first access is unaligned

I modified both benchmarks in this fashion:

    let mut sum: u64 = 0;
    let mut i = 1;
    while i + 8 <= data.len() {
        sum += LE::read_u64(&data[i..]);
        i += size_of::<u64>();
    }
    sum
The results indicate that both benchmarks slow down. The gap is narrowed somewhat, but the absolute difference is still around 4x (as it was before):

    test bit_shifting ... bench:   2,293,921 ns/iter (+/- 65,243)                                                                                                                                                        
    test type_punning ... bench:     659,350 ns/iter (+/- 15,550)
The loop is not so tight any more:

    .LBB4_6:
    	leaq	-8(%rcx), %rdi
    	cmpq	%rdi, %rsi
    	jb	.LBB4_11
    	cmpq	$7, %rax
    	jbe	.LBB4_12
    	movq	(%rbx), %rdi
    	addq	-8(%rdi,%rcx), %rdx
    	addq	$8, %rcx
    	addq	$-8, %rax
    	cmpq	%rsi, %rcx
    	jbe	.LBB4_6

> Now, I'm not saying that type-punning can't be faster, but to do it properly from a general-purpose library it should be done correctly so that every case is as fast as possible.

You haven't actually told me what is improper with byteorder. I think that I've demonstrated that type punning is faster than bit-shifts on x86.

You have mentioned other workloads where the bit-shifts may parallelize better. I don't have any data to support or contradict that claim, but if it were true, then I'd expect to see a benchmark. In that case, perhaps there would be good justification for either modifying byteorder or jettisoning it for that particular use case. With that said, the data seems to indicate that the current implementation of byteorder is better than using bit-shifts, at least on x86. If I switched byteorder to bit-shifts and things got slower, I have no doubt that I'd hear from folks whose performance at a higher level was impacted negatively.

> Note that I'm no stranger to optimizing regular expressions. I wrote a library to transform PCREs (specifically, a union of thousands of them, many of which used zero-width assertions that required non-trivial transformations and pre- and post-processing of input) into Ragel+C code and got a >10x improvement over PCRE. After that improvement micro-optimizations were the last thing on our minds. We eventually got to >50x improvement by doubling-down on that strategy and modifying Ragel internally. Much like micro-optimizations RE2 couldn't even come close to competing; and unlike re2c, the Ragel-based solution would compile on the order of minutes, not lifetimes.

My regex example doesn't have anything to do with regexes really. I'm simply pointing out that a micro-optimization can have a large impact, and is therefore probably worth doing. This is in stark contrast to some of your previous comments, which I found particularly strongly worded ("irrational" "premature" "bad" "incorrect"). For example:

> It's all sort of ironic, which I suppose was the point upthread--this is an example of the irrational urge for premature optimization and of bad programming idioms being hauled into Rust land completely unhindered by Rust's type safety features. And the better, correct, and likely more performant way of accomplishing this task could have been done just as safely from C as it could from Rust.

Note that I am not making the argument that one shouldn't do problem-driven optimizations. But if I'm going to maintain general purpose libraries for regexes or integer conversion, then I must work within a limited set of constraints.

(OT: Neither PCRE nor RE2 (nor Rust's regex engine) are built to handle thousands of patterns. You might consider investigating the Hyperscan project, which specializes in that particular use case (but uses finite automata, so you may miss some things from PCRE): https://github.com/01org/hyperscan)


Compilers understand memcpy, especially in the context of type punning (historically being the recommended standards-compliant way to do it) where one has small constant sizes. The copy_nonoverlapping "function" is actually a compiler intrinsic, but even if it wasn't, compilers like LLVM recognises calls to "memcpy" and even loops that reimplement memcpy and canonicalise them all to the same internal representation.


There's no way to solve that problem without just forbidding unsafe code entirely.

That's not at all clear. It's worth looking at unsafe code and asking "why was this necessary"? What couldn't you do within the language? As patterns reoccur, it may become clear what new safe primitives are needed.


> As patterns reoccur, it may become clear what new safe primitives are needed.

In these cases you have a choice between inventing a safe language primitive or inventing a safe library primitive. This exists in most cases for seemingly-safe operations, like the byteorder crate in this case.

If it were a language primitive it would be just as trustworthy as the corresponding verified library primitive.


Why is it better to add safe primitives directly to the compiler rather than implementing them in libraries?


The compiler can look at more data to decide if something is valid, or can be optimized.

C++ is trying to add move semantics via templates, but can't get all the way to Rust's borrow checker that way.


OK, but we're not talking about the borrow checker, we're talking about byteorder. There's nothing about the byteorder crate that would benefit from being added to the compiler.

In fact it would make it less safe, since we'd be debugging code that emits LLVM IR instead of writing in an actual language.


Right, but in this case it doesn't need to. I have yet to see an example of an operation that:

- should be safe in Rust but isn't

- needs /compiler/ support to work well (can't be done cleanly as a library)

- isn't already on the track for implementation (non-lexical lifetimes, SEME regions)

You did mention uninitialized arrays but uninitialized data is inherently unsafe. It's not an operation that can be made safe. Instead, you make it safe by encoding the invariants specific to your use case in your code and creating a safe wrapper -- these invariants differ by use case, so it can't be made a generic operation.


You did mention uninitialized arrays but uninitialized data is inherently unsafe. It's not an operation that can be made safe.

Sure it can. You just need primitives which can be used in asserts such as

    is_initialized(tab,i,j)
indicating that an array is initialized within those limits. Then you can write asserts such as

    assert(is_initialized(tab,i,j-1));
    ... initialize tab[j]
    assert(is_initialized(tab,i,j));
Standard program verification technology. Verification of unsafe sections is a useful goal, and deserves language support. Hand-waving about "encoding the invariants specific to your use case in your code" is insufficient. You need to write them down and prove them. Then you can eliminate them from the run-time code.


You want us to add dependent types to Rust (which is what you just proposed)? Half the time I see you complaining about Rust you're complaining that it's too complicated!


> You just need primitives which can be used in asserts such as

Sounds like you're going along the path of a dependent type system (in this specific case)? Yes, that could be done, and would perhaps let you reduce a couple of unsafe blocks in the implementation of Vec and other buffer-based abstractions (but not get rid of all of them).

FWIW there is active work going on for formal verification of Rust (both safe and unsafe code), in the RustBelt project.

In general making unsafe blocks run formal verification would be an interesting thing to do (and would solve this problem completely). I don't think it deserves language support, however (nice-to-have, not must-have). This is a very different goal from your original point of adding a few language features that ease writing lower level abstractions.

--------

Ultimately, you're right. While pcwalton did mention "There's no way to solve that problem without just forbidding unsafe code entirely."; this is a possible alternative -- have language support for scoped formal verification that allows you to use "unsafe" operations safely. I think this is an extreme solution to what I consider to be a mostly nonexistent problem.

For really security sensitive code this would indeed be very useful (and is probably a big motivator behind the RustBelt project). Or just use SPARK or something.

But for most Rust users I think the current system is pretty robust and provides enough primitives to write clean, easy-to-verify abstractions with (verifiable) safe API boundaries. (When I say "verify" here I mean it in the informal sense.) I haven't come across unsafe code doing contortions, and I have had the (mis?)fortune of going through a lot of unsafe code. The only rough edges are with FFI, and these are mostly due to a lot of things about unsafe code being underspecified (which don't crop up as often in pure Rust unsafe code, but do crop up when you throw some C/++ FFI in the mix). There is active work on specifying the exact semantics of unsafe code, however, so that should be fixable once it happens.


Don't forget there's a middle ground between not having them and manual, formal verification. It started with Eiffel with basic contracts that checked properties during testing and/or runtime. That did well in commercial deployments. SPARK took it formal with a basic, boolean encoding for programmer understanding. It uses a subset of Ada to prove absence of all kinds of error conditions without runtime checks or manual proof; it can optionally do more with a theorem prover. Eschersoft did Perfect Developer to do it in a high-level language with C, Java, or Ada generation. So, at least three products are doing this with industry deployments, two of them highly concerned about performance in low-level applications.

Although I'm not getting into this dispute, I will add in general that Rust might benefit from such contracts or push-button verification of key properties as deployed successfully in Eiffel, SPARK, Ada 2012, and Perfect Developer. A language subset might be used like in SPARK to allow automated verification of those sections against common types of errors. Three follow-up benefits will be easier changes/integrations in the maintenance phase, automated test generation from specs, and aiding dynamic analysis by giving it invariants to look at. There could be optimization benefits too, but I'm not qualified to say; intuitively it seems possible, e.g. using a minimum-sized data structure for a number range given in the spec or type.

These techniques are really under-utilized despite being proven out many times over in high-reliability products.


Yeah, this exists, and would be interesting. I again think that it's a bit too extreme a solution to be baked into Rust itself, but I'd love a SPARKish Rust variant.


I'd like both. SPARK's stuff was ported to Ada 2012. It can be done for Rust as well. The trick is to make it optional so people don't have to pay attention to it. Maybe even have editors filter it out for people not paying attention to it. At the least, it being used in standard library and OS API's would let it enforce correct usage of those in debug/testing mode. 80/20 rule says that should have some impact given how much damage we've seen C and Java developers do misusing the foundational stuff.
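For a feel of the "optional, checked in debug/testing mode" idea, Rust already has a lightweight stand-in for Eiffel-style contracts in debug_assert!. A hedged sketch (hypothetical function, not from any library):

```rust
// Pre/postconditions checked only in debug/test builds; compiled out in
// release builds, so there is no runtime cost in production.
fn checked_div(a: i32, b: i32) -> i32 {
    debug_assert!(b != 0, "precondition: divisor must be non-zero");
    let q = a / b;
    // Postcondition: q is the quotient truncated toward zero.
    debug_assert!((q * b - a).abs() < b.abs(), "postcondition violated");
    q
}
```

This is far weaker than SPARK-style static proof, of course -- it only checks the executions you actually run -- but it shows how the annotations could be filtered out by people not paying attention to them.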


Yeah, me too. However, I think we should wait for the formal verification of Rust to be completed before trying this. While it is possible to make something SPARKish without complete formal verification, it's probably better to build it using concepts learned during the formal verification.


> We're probably going to see "unsafe" code that assumes good behavior on the part of the caller.

I have yet to see any of this.

I have noticed that it's harder to write correct unsafe code when it comes to parallelism and FFI, but parallelism has always been a hard problem and the FFI problems generally come from the fact that you need to know the invariants being upheld on the other end, which is trickier.

But for this kind of unsafe code -- designing (non-parallel) abstractions -- upholding invariants is pretty straightforward.


> I have yet to see any of this.

mem::forget-pocalypse was this. (Rc/Arc, Vec::drain, thread::scoped)

Any UB bug that results from an overflow is kind of implicitly this.

BTreeMap::range still has an UB bug from trusting the caller! I literally asked you to fix it! https://github.com/rust-lang/rust/issues/33197

Bugs happen man.


> mem::forget-pocalypse was this.

I would say that this is from a time when the invariants were not understood. In particular, the fact that leaking is safe to do in safe code was not known.

(The invariants are still not completely understood, but there's work to specify that, and IMO they're understood enough to be able to avoid unsafe bugs)

> BTreeMap::range still has an UB bug from trusting the caller!

Fair :) I'd completely forgotten about that one.


> I would say that this is from a time when the invariants were not understood.

Yeah, but it's not like "oh this is an obvious thing to consider trusting the caller about". It's an exceptionally niche problem that you'd only know about if someone told you about it. Especially since a Rust programmer shouldn't be expected to write unsafe code often, if ever!

Similarly: not trusting traits to be implemented correctly. Not trusting closures to not-unwind.


Fair. I'm not saying that your average Rust programmer will be able to deal with unsafe code immediately. But I do think that at this stage the list of things you can and cannot rely on (and the invariants you must uphold) is clear enough that in theory you could make a checklist to deal with this. The nomicon provides much of the background for folks wanting to figure this out and write unsafe code.

These days I've been writing a lot of unsafe code (for FFI) and I do want to get around to penning a concise guide (or just expanding the nomicon). But I'm mostly waiting for the unsafe code subteam to figure out a couple things before doing this (specifically, the exact boundaries of rust's noalias UB becomes important in FFI and this is not specified yet).

But yeah, it's not necessarily obvious. I'd like to make it easier to get this understanding of unsafe code though.


One of the barriers I've erected in my own head is whether my unsafe code is generic or not. As soon as your unsafe code starts depending on an arbitrary T (or some trait that T satisfies), then the scope of what you need to consider seems to widen quite a bit. I tend to either avoid this type of unsafe or find a way to constrain my problem. Using `unsafe` for pointer tricks or efficient movement of memory on concrete data types feels more self-contained to me, and therefore easier to verify.

(I don't have any particular point to make btw. Just sharing thoughts.)


Yeah, this is super important. I thought about it and I too tend to think very hard about generics here.

As with all such things, it's harder to enumerate what's in your head :)


Hmm, a quick checklist-style thing is a pretty good idea!


> there's the possibility of safe code misusing the unsafe code to create unsafe behavior.

This is a misconception.

You can write un-abusable unsafe code pretty easily. There are a couple of invariants that the language requires you to uphold. As long as you uphold them you are fine. If your unsafe code can be broken by external safe code that is a bug in your unsafe code, on par with doing `let p = ptr::null(); print(*p);` in your unsafe code.
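As a hedged illustration of the point (hypothetical function, not from any library): the unsafe block below is sound no matter what safe callers do, because the invariant it relies on is established locally, right before it runs.

```rust
// The public API cannot be abused from safe code: the only invariant the
// unsafe block needs (index 0 is in bounds) is checked first, inside this
// function, so no caller can violate it.
fn first_or<T: Copy>(slice: &[T], default: T) -> T {
    if slice.is_empty() {
        default
    } else {
        // SAFETY: we just checked the slice is non-empty.
        unsafe { *slice.get_unchecked(0) }
    }
}
```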


The unsafety still originates in the unsafe code, thus you can audit the use of the unsafe code or the unsafe code itself just the same.


I was trying to figure out how this is so much faster than FNV (https://en.wikipedia.org/wiki/Fowler%E2%80%93Noll%E2%80%93Vo...). Is it only because of the parallelism? Or are the operations really that much cheaper somehow?


Both.

The pseudocode for FNV looks like this:

   hash = FNV_offset_value
   for each byte_of_data to be hashed
   {
        hash = hash XOR byte_of_data
        hash = hash × FNV_prime
   }
   return hash
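The loop above is small enough to write out as real Rust. A minimal sketch of the 64-bit variant with that operation order (xor, then multiply -- i.e. FNV-1a; the offset basis and prime are the standard 64-bit FNV constants):

```rust
// 64-bit FNV-1a: xor each byte into the state, then multiply by the FNV
// prime. wrapping_mul is needed because the multiply intentionally wraps.
fn fnv1a(data: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf29ce484222325; // FNV offset basis (64-bit)
    for &byte in data {
        hash ^= byte as u64;
        hash = hash.wrapping_mul(0x100000001b3); // FNV prime (64-bit)
    }
    hash
}
```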
The pseudocode for seahash looks like this (with '×' as the wrapping multiply operator, and some simplification for padding if the data length in bytes is not a multiple of 8 bytes per word × 4 words in the hash state):

    hash = {offset_1, offset_2, offset_3, offset_4}

    for (int data_index = 0; 
             data_index < data.length_in_64_bit_words; 
             data_index = data_index + 4_words_in_hash)
    {        
        for (int hash_index = 0; hash_index < 4; hash_index++)
        {
            // Mix in data
            hash[hash_index] = hash[hash_index] XOR data[data_index + hash_index]
            // Diffuse
            hash[hash_index] = hash[hash_index] XOR (hash[hash_index] RSHIFT 32)
            hash[hash_index] = hash[hash_index] × seahash_prime
            hash[hash_index] = hash[hash_index] XOR (hash[hash_index] RSHIFT 32)
            hash[hash_index] = hash[hash_index] × seahash_prime
            hash[hash_index] = hash[hash_index] XOR (hash[hash_index] RSHIFT 32)
        }
    }

    result = hash[0] XOR hash[1] XOR hash[2] XOR hash[3] XOR data.length_in_bytes

    result = result XOR (result RSHIFT 32)
    result = result × seahash_prime
    result = result XOR (result RSHIFT 32)
    result = result × seahash_prime
    result = result XOR (result RSHIFT 32)

    return result
FNV is operating on bytes of data, while seahash is operating on 64-bit words. A modern processor will be able to handle 64 bits at once. True, it can probably handle 8 bits independently in one instruction without having to create a temporary value, but it still needs to do more operations.

FNV is completely sequential. Until the first byte is hashed, no work can be done on the second byte. In seahash, as you observed, parallelism can be exploited. The second, third, and fourth words are all completely independent of the first word, as words 6, 7, and 8 are independent of word 5, and so on. You can have four independent threads each do a quarter of the work, and then put the result back together at the end.
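The four-lane structure described above can be sketched in Rust. This is not a reference SeaHash implementation -- the prime and IV constants are copied from memory of the SeaHash docs and should be treated as illustrative, and padding is omitted:

```rust
// Illustrative prime multiplier (believed to be SeaHash's, but treat as a
// placeholder).
const P: u64 = 0x6eed0e9da4d94a4f;

// The per-word diffuse step from the pseudocode above.
fn diffuse(mut x: u64) -> u64 {
    x ^= x >> 32;
    x = x.wrapping_mul(P);
    x ^= x >> 32;
    x = x.wrapping_mul(P);
    x ^= x >> 32;
    x
}

// Four-lane absorb loop: the lanes have no data dependencies on each
// other, so a superscalar CPU can overlap their work (ILP).
fn four_lane_hash(words: &[u64]) -> u64 {
    assert!(words.len() % 4 == 0, "padding omitted in this sketch");
    // Illustrative IVs, one per lane.
    let mut lanes: [u64; 4] = [
        0x16f11fe89b0d677c,
        0xb480a793d8e6c86c,
        0x6fe2e5aaf078ebc9,
        0x14f994a4c5259381,
    ];
    for chunk in words.chunks_exact(4) {
        for i in 0..4 {
            lanes[i] = diffuse(lanes[i] ^ chunk[i]);
        }
    }
    // Finalize: fold the lanes and the length together, then diffuse.
    diffuse(lanes[0] ^ lanes[1] ^ lanes[2] ^ lanes[3] ^ (words.len() as u64 * 8))
}
```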


So what you're saying is that this could be heavily optimized using OpenCL?

I might have to bookmark this as a project to take on. A heavily parallel hashing algorithm that takes advantage of OpenCL would immensely increase the efficiency, wouldn't it?

(I'm dipping my toes in parallel processing as of late and am genuinely curious in this topic and question)


Is the FNV algorithm limited in the same way as a ripple-carry adder, then, where every operation relies on the previous result and can't be (or maybe just isn't, but could be) done in parallel due to this implementation?


Hardware threads, just to clarify.

> SeaHash achieves the performance by heavily exploiting Instruction-Level Parallelism.

> This means that almost always the CPU will be able to run the instructions in parallel.


Execution units, to be precise. Separate threads of execution are not involved, hardware or software.


Presumably because SeaHash operates on 8-byte blocks while FNV acts on single bytes.


How does it perform on short strings (e.g. <= 16 bytes)? We've seen several new hash functions lately with great throughput numbers, but unfortunately they often end up being slower than FNV when used e.g. on keys in hash maps, which are often short strings.


I was unsure how to read the diffusion function's description, so I went to the source.

The first line, "x ← x ≫ 32", might have a typo? It's actually assigning (x XOR (x >> 32)).

With "x ← px", p is a fixed prime number being multiplied by x.
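Written out in Rust, the step being described is an xor-shift, not a plain shift:

```rust
// "x ← x ⊕ (x ≫ 32)": fold the high 32 bits into the low 32 bits.
// A plain "x = x >> 32" would instead discard the low half entirely.
fn xor_shift_32(x: u64) -> u64 {
    x ^ (x >> 32)
}
```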


Great to see another piece of code written in Rust. That said, how do you make claims of something being blazingly fast without any comparisons to implementations in other languages such as C or C++?


The claim is that it is a blazingly fast hash function compared with other hash functions, and it is also written in Rust. Rust is an enabling technology, but not able to be dramatically faster than a comparable C/C++ implementation, as a general rule.


The title is confusing. If it tries to compete with other hash functions, why does the title have to bear "in Rust"?


Because it's probably the most notable/unusual part of this hash function. Virtually all other hash functions for at least a decade have had their reference implementations written in C or C++.


The Rust ecosystem is growing rapidly, and I personally think it's awesome that we're telling the world about it. Rust is production-ready and presents several advantages over C and C++. The more people that jump on board, the faster we get more libraries, and the more companies will take a look at using Rust in their own production environments.

I predict Rust will also begin to find use as a backend server language, and begin to eat into Java, Go, Python, and Ruby mindshare. Though there's a learning curve with Rust, it doesn't take too long to become productive. I'm also excited to see how Rust makes inroads in game development.

I think we're all going to start seeing more Rust in our headlines. It's a great language, and the people I know that use it are in love with it (myself included).


Because it's benchmarking a hash function, not a language?


Very cool!

BTW, the hyperlink for "reference" in the "Specification" section is broken.


Great. But keep in mind that this is not a keyed hash function.


You can use it as a keyed hash function by adjusting the IV. See: https://docs.rs/seahash/2.0.0/src/seahash/.cargo/registry/sr...


It seems to already be implemented in https://docs.rs/seahash/2.1.1/src/seahash/.cargo/registry/sr...

But it only seeds one of the lanes, so you can still make collisions trivially in one of the other lanes. I guess it could still be useful for namespacing.


This is mentioned in the documentation:

"Warning!

This is not a cryptographic function, and it certainly should not be used as one. If you want a good cryptographic hash function, you should use SHA-3 (Keccak) or BLAKE2."


That's not the same thing. You don't usually key for crypto reasons, and a good cryptographic hash does not imply keying.


What's the matter with this stupid word "blazingly" in the Rust community?



