Hash maps are such a fundamentally important data structure that it comes as a s...

tialaramex · on Nov 11, 2023

My impression from the article is that Zig provides several different hashtables and not all of them are broken in this way.

This reminds me of Aria's comment in her Rust tutorial https://rust-unofficial.github.io/too-many-lists/ about failing to kill LinkedList. One philosophy (and the one Rust chose) for a stdlib is that this is only where things should live when they're so commonly needed that essentially everybody needs them either directly or to talk about. So, HashTable is needed by so much otherwise unrelated software that qualifies, BloomFilter, while it's real useful for some people, not so much. Aria cleaned out Rust's set of standard library containers before Rust 1.0, trying to keep only those most people would need. LinkedList isn't a good general purpose data structure, but, it was too popular and Aria was not able to remove it.

Having multiple hash tables feels like a win (they're optimized for different purposes) but may cost too much in terms of the necessary testing to ensure they all hit the quality you want.

otabdeveloper4 · on Nov 11, 2023

Linked lists are necessary when you have structures that can't be moved. Very important in a highly concurrent or lockfree environment.

iudqnolq · on Nov 11, 2023

How could you use the Rust standard library's linked list for that?

KerrAvon · on Nov 11, 2023

A deque will usually work just as well in that environment and it will always have better general performance characteristics. There are very few legitimate uses of linked lists.

otabdeveloper4 · on Nov 11, 2023

A deque is a doubly-linked list under the hood.

adgjlsfhk1 · on Nov 12, 2023

Array based deques are very common (and usually much faster). Doubly linked lists are absolutely awful for memory locality.

boxed · on Nov 11, 2023

Can't you have a list of pointers instead?

xgstation · on Nov 11, 2023

Resizing/deleting items in list in most implementations are either not lock free or not threadsafe

chmod775 · on Nov 11, 2023

Yeah, but what if you just use smaller lists and point to the next/previous list?

Hold on...

boxed · on Nov 13, 2023

Linked lists aren't either so....?

AndyKelley · on Nov 11, 2023

I personally have two habits that made me not notice:

1. I generally prefer ArrayHashMap

2. I tend to not delete things from hash maps

murkt · on Nov 11, 2023

Not that many hashmaps see hundreds of millions of entries in their lifetime. Given that Zig isn’t widely used, it’s probable that noone has really stumbled upon this behaviour in a non-benchmark setting.

shemii · on Nov 11, 2023

Actually it seems according to the issue that TigerBeetle (one of the bigger zig projects out there) noticed this issue [1]. It's also on their issue tracker [2].

[1] https://github.com/ziglang/zig/issues/17851

[2] https://github.com/tigerbeetle/tigerbeetle/issues/1191

brabel · on Nov 11, 2023

When you use a language that's in alpha- (maybe beta- now?) stage, this kind of thing should be expected. Even with the latest version of Zig, perfectly correct programs can segfault due to miscompilation, so performance issues are not even the biggest worry you should have.

shemii · on Nov 11, 2023

One thing I really don't unserstand is how bun already reached stability with it's 1.0 release (https://bun.sh/blog/bun-v1.0) while being written in Zig, which still hasn't reached it's 1.0 release.

brabel · on Nov 11, 2023

If you test your software built on Zig close to 100% then you can be pretty confident you didn't run into any of the Zig compiler bugs. Bun has certainly run into them but they can almost always be worked around... if you work around all problems and manage to catch everything then yeah, you can have an application that's production worthy while being built on a compiler that's not quite there yet.

The problem is, of course, you almost certainly didn't test anything near 100% and probably not even everything users may do with your software, so it's a risky bet.

kreetx · on Nov 11, 2023

Versioning isn’t an exact science, and people use whatever they see fit, both for technical but also marketing purposes.

fastball · on Nov 11, 2023

Bun is compiled to a binary, so not sure it matters how stable the underlying language is if Bun itself has a stable API?

KolmogorovComp · on Nov 11, 2023

That’s the point OP is making, even if the program is bug-free, given the compiler’s current state there’s a chance the compiled binary has bugs (due to miscompilation).

Now that can happen with every compiler but using one in still in alpha significantly increases the risk.

fastball · on Nov 11, 2023

Does it help at all that Zig is LLVM-based?

wredue · on Nov 11, 2023

Not particularly. And zig is moving away from llvm anyway.

losvedir · on Nov 11, 2023

If we're talking SemVer then 1.0 only means a commitment to a public interface. It doesn't mean the code has no more bugs and ready for use in a spaceship.

Conversely, taking on a dependency prior to its 1.0 just means you're on the hook for dealing with API changes as they come up. But as long as the API you expose doesn't change, it's fine to be on 1.0 yourself.

kprotty · on Nov 11, 2023

Are there any links / examples for the miscompilations?

KolmogorovComp · on Nov 11, 2023

Zig’s issue tracker list them [0].

[0]: https://github.com/ziglang/zig/issues?q=is%3Aopen+is%3Aissue...

kprotty · on Nov 12, 2023

Thanks, seems like most are from LLVM or packed structs.

liftm · on Nov 11, 2023

It's surprisingly common for fledgling programming languages… cough https://accidentallyquadratic.tumblr.com/post/153545455987/r...

(I know https://news.ycombinator.com/item?id=38138223, but at least I'm not the first one this time.)

29athrowaway · on Nov 11, 2023

Zig is rather new, so it doesn't surprise me.