
There are probably a bunch of reasons, which is why I want an easy "run benchmarks" command that I can use. I'd even be fine using infra so long as I had pulumi/terraform to set it all up for me.

I just don't want to spin up EC2 instances manually, get the connections all working, make sure I can reset state, etc.

I already have a fork of Scylla where I removed a lot of unnecessary cloning of `String` but no way I'm gonna PR it without a benchmark.

I also opened a PR to replace the hash algorithm used in their PreparedStatement cache, which gets hit for every query, but they wanted benchmarks before accepting (completely fair) and I have none. `ahash` (https://github.com/tkaitchuck/ahash) is extremely fast compared to Rust's default hasher, and with its `compile-time-rng` feature (more than sufficient randomness for the Scylla use case) you can avoid a system call when creating the HashMap.
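For illustration, a minimal sketch of the cache change (the key/value types here are invented stand-ins, not the driver's actual types):

```rust
// Cargo.toml (assumed):
// ahash = { version = "0.8", default-features = false, features = ["std", "compile-time-rng"] }
use std::collections::HashMap;

use ahash::RandomState;

// Hypothetical stand-ins for the driver's actual cache types.
type StatementId = String;
struct PreparedStatement;

fn main() {
    // With `compile-time-rng`, ahash seeds its hashers from randomness
    // baked in at build time, so constructing the map never has to ask
    // the OS for entropy.
    let mut cache: HashMap<StatementId, PreparedStatement, RandomState> =
        HashMap::with_hasher(RandomState::new());
    cache.insert("SELECT * FROM t WHERE id = ?".to_owned(), PreparedStatement);
    assert_eq!(cache.len(), 1);
}
```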

There are also some performance improvements I have in mind for the response parsing, among other things.




So, I got sniped hard by this.

1. I've re-opened my hashing PR and I'm going to suggest that they adopt ahash as the default hasher in the future.

2. I've re-written my "reduce allocations" work as a POC. Another dev has done similar work to reduce allocations; we took different approaches to the same area of code. I'm going to try to push the conversation forward until we have a PR'able plan.

3. I'm going to push for a change that will remove multiple large allocations (of PreparedStatement) from the query path.

4. Another two devs have started work on the response deserialization optimizations, which is awesome and means I don't have to even think about it.

I think we'll see really significant performance gains if all of these changes come in.


> I just don't want to spin up EC2 instances manually, get the connections all working, make sure I can reset state, etc.

I've been thinking about this lately.

I wonder if we could standardize a benchmark format so that you could automatically do the steps of downloading the code, setting up a container (on your computer or in the cloud), running the benchmarks, producing an output file, and making a PR with the output.

So developers would go "here's my benchmark suite, but I've only tested it on my machine", and users would call "cargo bench --submit-results-in-pr" or whatever, and thus the benchmark would quickly get more samples.

(With graphs being auto-generated as more samples come in, based on some config files plus the bench samples)
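Roughly — with every field name here invented, since nothing like this is standardized — a submitted sample might be as small as:

```rust
// A hypothetical shape for one submitted benchmark sample, sketched as
// serde-serializable Rust. Nothing here is a real standard.
// Cargo.toml (assumed): serde = { version = "1", features = ["derive"] },
// serde_json = "1"
use std::collections::BTreeMap;

use serde::{Deserialize, Serialize};

#[derive(Serialize, Deserialize)]
struct BenchSample {
    /// Which benchmark suite and revision produced this sample.
    suite: String,
    git_commit: String,
    /// Enough machine context to group samples when auto-generating graphs.
    cpu: String,
    os: String,
    /// Benchmark name -> mean nanoseconds per iteration.
    results: BTreeMap<String, f64>,
}

fn main() -> Result<(), serde_json::Error> {
    let sample = BenchSample {
        suite: "scylla-rust-driver/bench".into(),
        git_commit: "abc1234".into(),
        cpu: "Ryzen 7 5800X".into(),
        os: "linux".into(),
        results: BTreeMap::from([("parse_response/4KiB".to_owned(), 812.5)]),
    };
    // This JSON blob is what the hypothetical submit step would attach to the PR.
    println!("{}", serde_json::to_string_pretty(&sample)?);
    Ok(())
}
```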


Interesting idea. I could imagine something like that but it's a bit tough.


So would the ideal solution be for ScyllaDB to have a GitHub Action that runs benchmarks against PRs?

Not sure how decent a benchmark would be without spinning up servers in the cloud, so I guess provisioning infra would be a requirement?

So perhaps this could be run manually. But it's certainly possible:

- Pulumi up infra
- Run benchmarks
- Collect results
- Attach results to the PR


I'd be happy with a few things:

1. Benchmarks of "pure" code like the response parser, which I could run with `cargo bench` (see the sketch after this list). I may actually work on contributing this.

2. Some way to run benchmarks against a deployed server. I wouldn't necessarily recommend a GitHub Action; a nightly or manually triggered job would probably be a better use of money/resources. If I could plug in some AWS creds and have it do the deployment and spit out a bunch of metrics for me, that'd be wonderful.
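For 1, even something this small would help (a sketch; `parse_response` is a made-up stand-in for the driver's real parser):

```rust
// benches/parse.rs — run with `cargo bench`.
// Assumes Cargo.toml has criterion = "0.5" and a [[bench]] entry with
// `harness = false`.
use criterion::{black_box, criterion_group, criterion_main, Criterion};

// Made-up stand-in for the driver's response parser.
fn parse_response(frame: &[u8]) -> usize {
    frame.iter().filter(|b| **b != 0).count()
}

fn bench_parse(c: &mut Criterion) {
    let frame = vec![0xABu8; 4096]; // fake 4 KiB response frame
    c.bench_function("parse_response/4KiB", |b| {
        b.iter(|| parse_response(black_box(&frame)))
    });
}

criterion_group!(benches, bench_parse);
criterion_main!(benches);
```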


I just did a comparison of almost every hashing algorithm I could find on crates.io. On my machine t1ha2 (from the t1ha crate) beat the pants off of every other algorithm, by something like an order of magnitude. Others in the lead were blake3 (from the blake3 crate) and metrohash. Worth taking a look at those if you're going for hash speed.

I don’t have the exact numbers on me right now but I can share them tomorrow (along with the benchmark code) if you’re interested.


The PR I have lets the caller provide the algorithm, although I did benchmark against fxhash and I think it would be a good idea to suggest `ahash` as the default. I'm certainly interested.
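Roughly the shape of it (names invented here, not the actual PR's API): the cache is generic over `BuildHasher`, so the default can be swapped without touching call sites.

```rust
use std::collections::HashMap;
use std::hash::BuildHasher;

// Invented names; the real PR's types differ.
struct PreparedCache<S: BuildHasher = ahash::RandomState> {
    map: HashMap<String, Vec<u8>, S>,
}

impl<S: BuildHasher + Default> PreparedCache<S> {
    fn new() -> Self {
        Self {
            map: HashMap::with_hasher(S::default()),
        }
    }
}

fn main() {
    // Caller takes the default (ahash here):
    let _default: PreparedCache = PreparedCache::new();
    // ...or supplies their own algorithm, e.g. fxhash:
    let _fx: PreparedCache<fxhash::FxBuildHasher> = PreparedCache::new();
}
```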

`ahash` has some good benchmarks here: https://github.com/tkaitchuck/aHash/blob/master/FAQ.md


aHash claims it is faster than t1ha[1].

The t1ha crate also hasn't been updated in over three years, so the benchmark at this link should still be current.

[1] https://github.com/tkaitchuck/aHash/blob/master/compare/read...

Edit: if you really think t1ha is faster, I would open an issue on the aHash repo asking them to update their benchmark.


FYI: small, fast hashes beat better-quality hashes for hash table purposes.


Hi, do you know if there's a recent hash benchmark I can look into? I've been using `FnvHash` as my go-to non-crypto-secure hash for performance reasons; I didn't realise there could be faster contenders.

Thanks!


This is the best, most comprehensive hash test suite I know of: https://github.com/rurban/smhasher/

You might want to look particularly into murmur, spooky, and metrohash. I'm not exactly sure what the tradeoffs involved are, or what your need is, but that site should serve as a good starting point at least.




