Hacker News new | past | comments | ask | show | jobs | submit login

Hash maps are such a fundamentally important data structure that it comes as a surprise that the Zig implementation is so broken. Good to see it’s getting fixed, but surprising that this wasn’t detected before.



My impression from the article is that Zig provides several different hashtables and not all of them are broken in this way.

This reminds me of Aria's comment in her Rust tutorial https://rust-unofficial.github.io/too-many-lists/ about failing to kill LinkedList. One philosophy (and the one Rust chose) for a stdlib is that this is only where things should live when they're so commonly needed that essentially everybody needs them either directly or to talk about. So, HashTable is needed by so much otherwise unrelated software that qualifies, BloomFilter, while it's real useful for some people, not so much. Aria cleaned out Rust's set of standard library containers before Rust 1.0, trying to keep only those most people would need. LinkedList isn't a good general purpose data structure, but, it was too popular and Aria was not able to remove it.

Having multiple hash tables feels like a win (they're optimized for different purposes) but may cost too much in terms of the necessary testing to ensure they all hit the quality you want.


Linked lists are necessary when you have structures that can't be moved. Very important in a highly concurrent or lockfree environment.


How could you use the Rust standard library's linked list for that?


A deque will usually work just as well in that environment and it will always have better general performance characteristics. There are very few legitimate uses of linked lists.


A deque is a doubly-linked list under the hood.


Array based deques are very common (and usually much faster). Doubly linked lists are absolutely awful for memory locality.


Can't you have a list of pointers instead?


Resizing/deleting items in list in most implementations are either not lock free or not threadsafe


Yeah, but what if you just use smaller lists and point to the next/previous list?

Hold on...


Linked lists aren't either so....?


I personally have two habits that made me not notice:

1. I generally prefer ArrayHashMap

2. I tend to not delete things from hash maps


Not that many hashmaps see hundreds of millions of entries in their lifetime. Given that Zig isn’t widely used, it’s probable that noone has really stumbled upon this behaviour in a non-benchmark setting.


Actually it seems according to the issue that TigerBeetle (one of the bigger zig projects out there) noticed this issue [1]. It's also on their issue tracker [2].

[1] https://github.com/ziglang/zig/issues/17851

[2] https://github.com/tigerbeetle/tigerbeetle/issues/1191


When you use a language that's in alpha- (maybe beta- now?) stage, this kind of thing should be expected. Even with the latest version of Zig, perfectly correct programs can segfault due to miscompilation, so performance issues are not even the biggest worry you should have.


One thing I really don't unserstand is how bun already reached stability with it's 1.0 release (https://bun.sh/blog/bun-v1.0) while being written in Zig, which still hasn't reached it's 1.0 release.


If you test your software built on Zig close to 100% then you can be pretty confident you didn't run into any of the Zig compiler bugs. Bun has certainly run into them but they can almost always be worked around... if you work around all problems and manage to catch everything then yeah, you can have an application that's production worthy while being built on a compiler that's not quite there yet.

The problem is, of course, you almost certainly didn't test anything near 100% and probably not even everything users may do with your software, so it's a risky bet.


Versioning isn’t an exact science, and people use whatever they see fit, both for technical but also marketing purposes.


Bun is compiled to a binary, so not sure it matters how stable the underlying language is if Bun itself has a stable API?


That’s the point OP is making, even if the program is bug-free, given the compiler’s current state there’s a chance the compiled binary has bugs (due to miscompilation).

Now that can happen with every compiler but using one in still in alpha significantly increases the risk.


Does it help at all that Zig is LLVM-based?


Not particularly. And zig is moving away from llvm anyway.


If we're talking SemVer then 1.0 only means a commitment to a public interface. It doesn't mean the code has no more bugs and ready for use in a spaceship.

Conversely, taking on a dependency prior to its 1.0 just means you're on the hook for dealing with API changes as they come up. But as long as the API you expose doesn't change, it's fine to be on 1.0 yourself.


Are there any links / examples for the miscompilations?



Thanks, seems like most are from LLVM or packed structs.


It's surprisingly common for fledgling programming languages… cough https://accidentallyquadratic.tumblr.com/post/153545455987/r...

(I know https://news.ycombinator.com/item?id=38138223, but at least I'm not the first one this time.)


Zig is rather new, so it doesn't surprise me.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: