
Mostly agree, but there is actually a mismatch between madd_epi16 and Arm. Implementing either platform's semantics on the other requires ~5 instructions, but if we generalize the definition to allow reordering (e.g. Highway's ReorderWidenMulAccumulate [1]), it's only 2 instructions.

1: https://github.com/google/highway/blob/master/g3doc/quick_re...
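
To make the mismatch concrete, here is a rough scalar model in Python. It is only a sketch: the function names are made up for illustration, and the Arm side assumes a widening multiply-accumulate applied to the low half and then the high half of the inputs, which is my reading of the comment rather than something it states. The x86 side models the adjacent-pair sums of madd_epi16 on eight int16 lanes.

```python
def wrap_i32(x):
    """Wrap a Python int to signed 32-bit two's complement."""
    return (x + 2**31) % 2**32 - 2**31


def madd_adjacent(a, b):
    """x86-style pairing: output lane i sums the products of elements 2i and 2i+1."""
    assert len(a) == len(b) == 8
    return [wrap_i32(a[2 * i] * b[2 * i] + a[2 * i + 1] * b[2 * i + 1])
            for i in range(4)]


def madd_halves(a, b):
    """Assumed Arm-style pairing: output lane i sums the products of elements i and i+4."""
    assert len(a) == len(b) == 8
    return [wrap_i32(a[i] * b[i] + a[i + 4] * b[i + 4]) for i in range(4)]


a = [1, 2, 3, 4, 5, 6, 7, 8]
b = [10, 20, 30, 40, 50, 60, 70, 80]

adjacent = madd_adjacent(a, b)  # [50, 250, 610, 1130]
halves = madd_halves(a, b)      # [260, 400, 580, 800]

# Individual lanes differ, but the total over all lanes is the same,
# which is all a dot-product style reduction needs.
print(adjacent, halves, sum(adjacent) == sum(halves))
```

The per-lane results disagree, so matching one pairing exactly on the other platform needs extra shuffles. If the caller only needs the final sum, the pairing is irrelevant, which is why a definition that allows reordering can be cheaper.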




Indeed, and your comment led me to find additional issues with my port of _mm_madd_epi16.

I agree it would perhaps be possible to find better semantics for SIMD that kinda gloss over all the differences. That would be cleaner but would require a lot of names. Well, I suppose that's what Highway does, isn't it?


:) Yes indeed! Always happy to discuss suggestions for new intrinsics via GitHub issues.



