Hacker News | rdevulap's comments


Great work. But what's their goal? Are they trying to make that GeLU approximation go faster? Things would probably go a lot faster if they went back to erff().
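For anyone following along, here is a minimal sketch of the two forms being compared. It assumes the approximation in question is the common tanh-based one, which the comment does not confirm; the function names `gelu_erf` and `gelu_tanh` are made up for illustration. It only shows that the two agree closely and says nothing about which is faster on a given target.

```cpp
#include <cmath>

// Exact GELU: 0.5 * x * (1 + erf(x / sqrt(2))).
// std::erf on a float argument is the C++ counterpart of C's erff().
inline float gelu_erf(float x) {
    const float inv_sqrt2 = 0.70710678f;  // 1 / sqrt(2)
    return 0.5f * x * (1.0f + std::erf(x * inv_sqrt2));
}

// Widely used tanh-based approximation of GELU (assumed here, not taken
// from the linked article): 0.5 * x * (1 + tanh(sqrt(2/pi) * (x + 0.044715 x^3))).
inline float gelu_tanh(float x) {
    const float k = 0.7978845608f;  // sqrt(2 / pi)
    return 0.5f * x * (1.0f + std::tanh(k * (x + 0.044715f * x * x * x)));
}
```

Both functions are scalar; whether the erf form actually wins depends on how the math library or vectorized kernel implements erf and tanh, so it is worth measuring before switching.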


Intel's x86-simd-sort 4.0 Delivers A 2x Boost For AVX-512 Performance, Adds AVX2 Code


Thank you for such a detailed analysis! Just curious which version of x86-simd-sort you benchmarked: release v1.0 or the current top of the main branch? (I'm the author of x86-simd-sort.)


Hi, hope you found this analysis helpful. I vendored the code Feb 16 2023 from the main branch.


Of course! Appreciate all the time you put in. I added a few more optimizations to qsort after that (see https://github.com/intel/x86-simd-sort/pull/33), just wanted to know if your analysis took that into account.


Still using the simple way of getting the pivot though.


No matter how sophisticated the pivot selection is, you always risk hitting some degenerate worst case. I recommend having something like a heapsort fallback once a certain recursion limit is reached, as pdqsort, ipnsort and vqsort do (I'm a little fuzzy on what their fallback is, but they have one).


Yes, vqsort does indeed resort to Heapsort after too many recursions. I'd be surprised if that happens on real data, though, because we apply a lot more effort to the pivot selection. Would be curious to see any input distribution that triggers a fallback.
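To make the fallback idea concrete, here is a minimal scalar sketch of a quicksort that switches to heapsort once a recursion budget runs out (the classic introsort pattern). It is not the code of x86-simd-sort, vqsort, pdqsort or ipnsort; the names `sort_with_fallback` and `sort_with_fallback_impl`, the middle-element pivot, the small-range cutoff of 16 and the budget of roughly 2*log2(n) are all assumptions picked for illustration.

```cpp
#include <algorithm>
#include <iterator>
#include <vector>

// Hypothetical helper: quicksort on [first, last) with a depth budget.
// Requires random-access iterators.
template <typename It>
void sort_with_fallback_impl(It first, It last, int depth_budget) {
    using T = typename std::iterator_traits<It>::value_type;
    while (last - first > 16) {
        if (depth_budget-- == 0) {
            // Budget exhausted: finish this range with heapsort, O(n log n) worst case.
            std::make_heap(first, last);
            std::sort_heap(first, last);
            return;
        }
        // Deliberately simple pivot: the middle element (copied, since partition moves it).
        const T pivot = *(first + (last - first) / 2);
        // Three-way partition: [first, lt) < pivot, [lt, gt) == pivot, [gt, last) > pivot.
        It lt = std::partition(first, last, [&](const T& v) { return v < pivot; });
        It gt = std::partition(lt, last, [&](const T& v) { return !(pivot < v); });
        // Recurse on the left part, loop on the right part.
        sort_with_fallback_impl(first, lt, depth_budget);
        first = gt;
    }
    // Small ranges: insertion sort built from upper_bound + rotate.
    for (It i = first; i != last; ++i) {
        std::rotate(std::upper_bound(first, i, *i), i, i + 1);
    }
}

// Entry point: budget of about 2 * floor(log2(n)) levels of partitioning.
template <typename It>
void sort_with_fallback(It first, It last) {
    int depth_budget = 0;
    for (auto n = last - first; n > 1; n >>= 1) depth_budget += 2;
    sort_with_fallback_impl(first, last, depth_budget);
}
```

Calling `sort_with_fallback(v.begin(), v.end())` on a `std::vector<int>` sorts it in place. The budget only matters on adversarial or unlucky inputs: with good pivots the quicksort path finishes well before it is spent, which matches the point above that a fallback on real data would be surprising.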

