Hacker News new | past | comments | ask | show | jobs | submit login

Did the challenge limit submissions to only using SSE2? Seems odd, given the prevalence of SSE4.2 support.

PMULUDQ is in SSE2, though I haven't checked if that's usable for the problem here. There's also PMULLD in SSE4.1 if you only need a 32-bit result. But for summing digits, perhaps SSE2's PMADDWD could be sufficient?




The official 1BRC was Java-only, so no using any architecture-specific SIMD at all; the test system did have AVX2 though, and that's what most non-competing native solutions (including mine) targeted.

Completely forgot about pmuludq, that works too for SSE2. But a 32-bit result is insufficient for the magic number method, needs to be at least 36-bit. I originally used vpmaddubsw+vpmaddwd, but switched to vpmuldq for the reduced register pressure, and I was already only parsing 4 numbers in ymm registers so the 64-bit result didn't affect me (after parsing 4 temperatures I immediately did the respective hashmap stuff for each).




Join us for AI Startup School this June 16-17 in San Francisco!

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: