rdevulap's comments

rdevulap · 2024-05-16T04:55:21

Along similar lines, a faster tanh: https://github.com/microsoft/onnxruntime/pull/20612

jart · 2024-05-16T05:58:27

Great work. But what's their goal? Are they trying to make the GeLU approximation go faster? Things would probably go a lot faster by going back to erff().
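For context, the two GeLU formulations being contrasted can be sketched as below. This is a minimal illustration, not code from the linked PR: the function names gelu_erf and gelu_tanh are hypothetical, and the constants are the usual ones from the published tanh approximation. Which one is actually faster depends on the erf and tanh implementations available, which this sketch does not measure.

```cpp
#include <cmath>

// Exact GeLU: x * Phi(x), where Phi is the standard normal CDF,
// expressed through the error function (the float overload of std::erf).
inline float gelu_erf(float x) {
    const float inv_sqrt2 = 0.70710678f;  // 1 / sqrt(2)
    return 0.5f * x * (1.0f + std::erf(x * inv_sqrt2));
}

// tanh-based approximation of GeLU; this is the variant whose cost
// is dominated by the tanh call.
inline float gelu_tanh(float x) {
    const float k = 0.7978845608f;  // sqrt(2 / pi)
    return 0.5f * x * (1.0f + std::tanh(k * (x + 0.044715f * x * x * x)));
}
```

Both functions agree closely over the usual activation range, so swapping one for the other is mostly a question of which math routine is cheaper on the target hardware.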

rdevulap · on Oct 31, 2023

Intel's x86-simd-sort 4.0 Delivers A 2x Boost For AVX-512 Performance, Adds AVX2 Code

rdevulap · on June 10, 2023

Thank you for such a detailed analysis! Just curious which version of x86-simd-sort you benchmarked: release v1.0 or the current top of the main branch? (I'm the author of x86-simd-sort.)

Voultapher · on June 10, 2023

Hi, hope you found this analysis helpful. I vendored the code Feb 16 2023 from the main branch.

rdevulap · on June 10, 2023

Of course! Appreciate all the time you put in. I added a few more optimizations to qsort after that (see https://github.com/intel/x86-simd-sort/pull/33); I just wanted to know if your analysis took that into account.

rdevulap · on June 10, 2023

Still using the simple way of getting the pivot though.

Voultapher · on June 11, 2023

No matter how sophisticated the pivot selection is, you always risk hitting some degenerate worst case. I recommend having something like a heapsort fallback once a certain recursion limit is reached, as pdqsort, ipnsort and vqsort do (I'm a little fuzzy on what their fallback is, but they have one).
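A minimal scalar sketch of that idea follows. It is not how x86-simd-sort, pdqsort, ipnsort or vqsort implement it; the names quicksort_with_fallback and sort_with_fallback are hypothetical, the pivot is deliberately the naive middle element, and the budget of roughly 2*log2(n) levels is an assumption (a common choice in introsort-style implementations).

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Sorts [lo, hi) with quicksort, switching to heapsort once the
// depth budget is used up, so the worst case stays O(n log n).
template <typename T>
void quicksort_with_fallback(T* lo, T* hi, int depth_budget) {
    while (hi - lo > 1) {
        if (depth_budget == 0) {
            // Fallback: heapsort built from the standard heap primitives.
            std::make_heap(lo, hi);
            std::sort_heap(lo, hi);
            return;
        }
        --depth_budget;
        const T pivot = lo[(hi - lo) / 2];  // naive pivot: middle element
        // Three-way split: [lo, mid1) < pivot, [mid1, mid2) == pivot,
        // [mid2, hi) > pivot. The middle part is never empty, so every
        // pass makes progress.
        T* mid1 = std::partition(lo, hi, [&](const T& v) { return v < pivot; });
        T* mid2 = std::partition(mid1, hi, [&](const T& v) { return !(pivot < v); });
        // Recurse into the left part, keep looping on the right part.
        quicksort_with_fallback(lo, mid1, depth_budget);
        lo = mid2;
    }
}

template <typename T>
void sort_with_fallback(std::vector<T>& v) {
    // Assumed budget: about 2 * log2(n) partitioning levels.
    int budget = 0;
    for (std::size_t n = v.size(); n > 1; n >>= 1) budget += 2;
    quicksort_with_fallback(v.data(), v.data() + v.size(), budget);
}
```

Calling sort_with_fallback(v) on a std::vector<int> sorts it in place. Passing a budget of 0 directly to quicksort_with_fallback forces the heapsort path, which is a convenient way to exercise the fallback without constructing an adversarial input.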

janwas · on June 11, 2023

Yes, vqsort does indeed resort to Heapsort after too many recursions. I'd be surprised if that happens on real data, though, because we apply a lot more effort to the pivot selection. Would be curious to see any input distribution that triggers a fallback.