
There are probably a bunch of reasons, which is why I want an easy "run benchmarks" command that I can use. I'd even be fine using infra so long as I had pulumi/terraform to set it all up for me.

I just don't want to spin up EC2 instances manually, get the connections all working, make sure I can reset state, etc.

I already have a fork of Scylla where I removed a lot of unnecessary cloning of `String` but no way I'm gonna PR it without a benchmark.

I also opened a PR to replace the hash algorithm used in their PreparedStatement cache, which gets hit for every query, but they wanted benchmarks before accepting (completely fair) and I have none. `ahash` (https://github.com/tkaitchuck/ahash) is extremely fast compared to Rust's default hasher, and with its `compile-time-rng` feature (more than sufficient randomness for the Scylla use case) you can avoid a system call when creating the HashMap.
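For illustration, a minimal sketch of the cache change (the key/value types here are invented stand-ins, not the driver's actual types):

```rust
// Cargo.toml (assumed):
// ahash = { version = "0.8", default-features = false, features = ["std", "compile-time-rng"] }
use std::collections::HashMap;

use ahash::RandomState;

// Hypothetical stand-ins for the driver's actual cache types.
type StatementId = String;
struct PreparedStatement;

fn main() {
    // With `compile-time-rng`, ahash seeds its hashers from randomness
    // baked in at build time, so constructing the map never has to ask
    // the OS for entropy.
    let mut cache: HashMap<StatementId, PreparedStatement, RandomState> =
        HashMap::with_hasher(RandomState::new());
    cache.insert("SELECT * FROM t WHERE id = ?".to_owned(), PreparedStatement);
    assert_eq!(cache.len(), 1);
}
```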

There are also some performance improvements I have in mind for the response parsing, among other things.




So, I got sniped hard by this.

1. I've re-opened my hashing PR and I'm going to suggest that they adopt ahash as the default hasher in the future.

2. I've re-written my "reduce allocations" work as a POC. Another dev has done similar work to reduce allocations; we took different approaches to the same area of code. I'm going to try to push the conversation forward until we have a PR'able plan.

3. I'm going to push for a change that will remove multiple large allocations (of PreparedStatement) from the query path.

4. Another two devs have started work on the response deserialization optimizations, which is awesome and means I don't have to even think about it.

I think we'll see really significant performance gains if all of these changes come in.


> I just don't want to spin up EC2 instances manually, get the connections all working, make sure I can reset state, etc.

I've been thinking about this lately.

I wonder if we could standardize a benchmark format so that you could automatically do the steps of downloading the code, setting up a container (on your computer or in the cloud), running the benchmarks, producing an output file, and making a PR with the output.

So developers would go "here's my benchmark suite, but I've only tested it on my machine", and users would call "cargo bench --submit-results-in-pr" or whatever, and thus the benchmark would quickly get more samples.

(With graphs being auto-generated as more samples come in, based on some config files plus the bench samples)
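Roughly — with every field name here invented, since nothing like this is standardized — a submitted sample might be as small as:

```rust
// A hypothetical shape for one submitted benchmark sample, sketched as
// serde-serializable Rust. Nothing here is a real standard.
// Cargo.toml (assumed): serde = { version = "1", features = ["derive"] },
// serde_json = "1"
use std::collections::BTreeMap;

use serde::{Deserialize, Serialize};

#[derive(Serialize, Deserialize)]
struct BenchSample {
    /// Which benchmark suite and revision produced this sample.
    suite: String,
    git_commit: String,
    /// Enough machine context to group samples when auto-generating graphs.
    cpu: String,
    os: String,
    /// Benchmark name -> mean nanoseconds per iteration.
    results: BTreeMap<String, f64>,
}

fn main() -> Result<(), serde_json::Error> {
    let sample = BenchSample {
        suite: "scylla-rust-driver/bench".into(),
        git_commit: "abc1234".into(),
        cpu: "Ryzen 7 5800X".into(),
        os: "linux".into(),
        results: BTreeMap::from([("parse_response/4KiB".to_owned(), 812.5)]),
    };
    // This JSON blob is what the hypothetical submit step would attach to the PR.
    println!("{}", serde_json::to_string_pretty(&sample)?);
    Ok(())
}
```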


Interesting idea. I could imagine something like that but it's a bit tough.


So would the ideal solution be for ScyllaDB to have a GitHub Action that runs benchmarks against PRs?

Not sure how decent a benchmark would be without spinning up servers in the cloud, so I guess provisioning infra would be a requirement?

So perhaps this could be run manually. But it's certainly possible:

- Pulumi up infra
- Run benchmarks
- Collect results
- Attach results to the PR


I'd be happy with a few things:

1. Benchmarks of "pure" code like the response parser, which I could run with `cargo bench` (see the sketch after this list). I may actually work on contributing this.

2. Some way to run benchmarks against a deployed server. I wouldn't necessarily recommend a GitHub Action; a nightly or manually triggered job would probably be a better use of money/resources. If I could plug in some AWS creds and have it do the deployment and spit out a bunch of metrics for me, that'd be wonderful.
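For 1, even something this small would help (a sketch; `parse_response` is a made-up stand-in for the driver's real parser):

```rust
// benches/parse.rs — run with `cargo bench`.
// Assumes Cargo.toml has criterion = "0.5" and a [[bench]] entry with
// `harness = false`.
use criterion::{black_box, criterion_group, criterion_main, Criterion};

// Made-up stand-in for the driver's response parser.
fn parse_response(frame: &[u8]) -> usize {
    frame.iter().filter(|b| **b != 0).count()
}

fn bench_parse(c: &mut Criterion) {
    let frame = vec![0xABu8; 4096]; // fake 4 KiB response frame
    c.bench_function("parse_response/4KiB", |b| {
        b.iter(|| parse_response(black_box(&frame)))
    });
}

criterion_group!(benches, bench_parse);
criterion_main!(benches);
```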


I just did a comparison of almost every hashing algorithm I could find on crates.io. On my machine t1ha2 (from the t1ha crate) beat the pants off of every other algorithm, by something like an order of magnitude. Others in the lead were blake3 (from the blake3 crate) and metrohash. Worth taking a look at those if you're going for hash speed.

I don’t have the exact numbers on me right now but I can share them tomorrow (along with the benchmark code) if you’re interested.


The PR I have lets the caller provide the algorithm, although I did benchmark against fxhash and I think it would be a good idea to suggest `ahash` as the default. I'm certainly interested.
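Roughly the shape of it (names invented here, not the actual PR's API): the cache is generic over `BuildHasher`, so the default can be swapped without touching call sites.

```rust
use std::collections::HashMap;
use std::hash::BuildHasher;

// Invented names; the real PR's types differ.
struct PreparedCache<S: BuildHasher = ahash::RandomState> {
    map: HashMap<String, Vec<u8>, S>,
}

impl<S: BuildHasher + Default> PreparedCache<S> {
    fn new() -> Self {
        Self {
            map: HashMap::with_hasher(S::default()),
        }
    }
}

fn main() {
    // Caller takes the default (ahash here):
    let _default: PreparedCache = PreparedCache::new();
    // ...or supplies their own algorithm, e.g. fxhash:
    let _fx: PreparedCache<fxhash::FxBuildHasher> = PreparedCache::new();
}
```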

`ahash` has some good benchmarks here: https://github.com/tkaitchuck/aHash/blob/master/FAQ.md


aHash claims it is faster than t1ha[1].

The t1ha crate also hasn't been updated in over three years, so the benchmark at this link should still be current.

[1] https://github.com/tkaitchuck/aHash/blob/master/compare/read...

Edit: if you really think t1ha is faster, I would open an issue on the aHash repo asking them to update their benchmark.


FYI: small, fast hashes beat better-quality hashes for hash table purposes.


Hi, do you know if there's a recent hash benchmark I can look into? I've been using `FnvHash` as my go-to non-crypto-secure hash for performance reasons; I didn't realise there could be faster contenders.

Thanks!


This is the best, most comprehensive hash test suite I know of: https://github.com/rurban/smhasher/

You might want to look particularly into murmur, spooky, and metrohash. I'm not exactly sure what the tradeoffs involved are, or what your need is, but that site should serve as a good starting point at least.




