Hacker News
Emulating AMD Approximate Arithmetic Instructions on Intel (ocallahan.org)
84 points by zdw on Sept 12, 2021 | 27 comments



Since the post doesn't explain it, I'll note that while RSQRTSS is approximate reciprocal square root, RCPSS is just approximate reciprocal.
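For concreteness, a minimal sketch of the two via their scalar intrinsics (function names are mine):

    #include <xmmintrin.h>

    static float approx_rsqrt(float x) {   /* RSQRTSS: ~1/sqrt(x) */
        return _mm_cvtss_f32(_mm_rsqrt_ss(_mm_set_ss(x)));
    }
    static float approx_rcp(float x) {     /* RCPSS: ~1/x */
        return _mm_cvtss_f32(_mm_rcp_ss(_mm_set_ss(x)));
    }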


I wrote something up when I ran into these instructions last year: https://github.com/akkartik/mu/blob/main/linux/x86_approx.md


What's interesting (to me at least) is "oh right, the spec for this only bounds the relative error at 1.5 × 2^-12 or something", and that AMD and Intel parts may differ by a lot (for the example value in the post, the AMD part is more accurate).


Yeah, for sure. I should just take it out of my project. I'm only supporting a subset of the instruction set anyway.


You're in good company though with the WASM folks. It does feel like you've strayed outside your non-goal of Speed. Also, as a nit, there are double-precision forms (e.g., VRSQRT14PD in AVX-512), so:

> When it upgraded SSE to SSE2 (in 2000), most of its single-precision floating-point instructions got upgraded to double-precision — but not these two.

isn't correct.

It's still not likely that you needed to care about these for mu.


Thank you for the correction! I had my blinkers on there and wasn't thinking of the vector instruction set.


Yikes.

A lot of code uses _mm_rsqrt_ps (sometimes) followed by a Newton-Raphson update to compute a "precise" 1/sqrt(x). Here's a good example of NEON's rsqrt being sufficiently different from Intel's that more iterations were necessary for Embree on ARM [1].
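For illustration, the usual shape of that pattern looks something like this (a sketch, not Embree's actual code):

    #include <xmmintrin.h>

    /* ~12-bit estimate from RSQRTPS, refined by one Newton-Raphson step
       y' = y * (1.5 - 0.5 * x * y * y) toward full single precision. */
    static __m128 rsqrt_nr(__m128 x) {
        __m128 y   = _mm_rsqrt_ps(x);
        __m128 xyy = _mm_mul_ps(_mm_mul_ps(x, y), y);
        return _mm_mul_ps(y, _mm_sub_ps(_mm_set1_ps(1.5f),
                                        _mm_mul_ps(_mm_set1_ps(0.5f), xyy)));
    }

Whether one such step suffices depends on how good the hardware estimate is, which is exactly where the vendors differ.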

Because I only cared about vectorization a long time ago, when AMD was so uncompetitive, I'd bet a lot of code assumes that the SSE rsqrtps values match across vendors.

[1] https://github.com/lighttransport/embree-aarch64/issues/20


(Too late for edit)

Looks like Eigen also defaults to EIGEN_FAST_MATH, which makes Eigen's psqrt ("packet sqrt") use _mm256_rsqrt_ps instead of _mm256_sqrt_ps [1].

Interestingly, the thing they're trying to avoid (the long latency of sqrt vs. rsqrt) hasn't been true for a long time on Intel processors, but apparently is still true for AMD parts according to Agner Fog's tables [2] (though maybe I'm reading them wrong; there is no vsqrtps entry for Zen 2/3).

Hopefully, Eigen will split the single global "fast math" config into finer-grained options [3].

[1] https://gitlab.com/libeigen/eigen/-/blob/a75122584594fb98db0...

[2] https://agner.org/optimize/instruction_tables.pdf

[3] https://gitlab.com/libeigen/eigen/-/issues/1687


Neat, though I'm curious: when do instructions like this typically get used?


Normalizing a vector needs this, and it's super common: whenever you just want a direction, you're likely going to need it. Your GPU also has a fast instruction for this, because it's a very common operation in graphics.
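A sketch of that use case (assuming the ~12-bit estimate is accurate enough for your purposes; the function name is mine):

    #include <xmmintrin.h>

    /* Normalize a 3D vector using the approximate reciprocal square root
       instead of 1.0f / sqrtf(len2). */
    static void normalize3(float v[3]) {
        float len2 = v[0]*v[0] + v[1]*v[1] + v[2]*v[2];
        float inv_len = _mm_cvtss_f32(_mm_rsqrt_ss(_mm_set_ss(len2)));
        v[0] *= inv_len; v[1] *= inv_len; v[2] *= inv_len;
    }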


To add to what others have already pointed out, inverse square root is an important operation in finance (for example, in pricing options: calculating implied volatility in the Black-Scholes model).

However, if speed is that important to you (e.g. if you are an HFT), you don't even want to be calculating the inverse square root in your hot path. Basically, the implied volatility is "seeded" once and then you update it using greeks (finance term for a derivative, don't ask me why).
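To make the "seed once, then update" idea concrete, here's a hypothetical sketch (all names are mine, and real systems are far more careful about edge cases): one Newton step using vega re-fits a cached implied vol to a new market price, with no inverse square root in the hot path.

    #include <math.h>

    /* Standard normal pdf and cdf. */
    static double npdf(double x) { return 0.3989422804014327 * exp(-0.5 * x * x); }
    static double ncdf(double x) { return 0.5 * erfc(-x / sqrt(2.0)); }

    /* Black-Scholes call price and vega (d price / d sigma) for spot S,
       strike K, rate r, time to expiry T, volatility sigma. */
    static double bs_call(double S, double K, double r, double T,
                          double sigma, double *vega) {
        double srt = sigma * sqrt(T);
        double d1 = (log(S / K) + (r + 0.5 * sigma * sigma) * T) / srt;
        double d2 = d1 - srt;
        *vega = S * npdf(d1) * sqrt(T);
        return S * ncdf(d1) - K * exp(-r * T) * ncdf(d2);
    }

    /* One Newton update of the cached implied vol toward a new market price. */
    static double update_iv(double sigma, double mkt_price,
                            double S, double K, double r, double T) {
        double vega;
        double model = bs_call(S, K, r, T, sigma, &vega);
        return sigma - (model - mkt_price) / vega;
    }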


Greeks are a family of derivatives that apply to options, not just one.

Delta is the amount an option goes up or down in price for every $ the underlying moves.

Gamma is the second derivative. The change in delta as a function of change in price of the underlying.

Theta is the amount an option goes down in price for each day you hold it (“time decay”).


You know you're in a forum of programmers when you implicitly switch a noun from plural to singular in a parenthetical remark and instead of inferring the meaning based on context, someone feels the need to correct you.


I didn’t read it as a correction to number but rather as a response to your “I don’t know why”


Indeed, I wasn't trying to correct the parent; I actually found the explanation pretty cool. Rather, I wanted to provide some color for the folks here without a financial background: I figured an explanation of what the common greeks are would help them understand how it's possible to work backwards from them to an updated price.

Sorry if it didn't come across that way @alex_smart!


It definitely educated me!


Sometimes they aren't even real Greek!


(Yes, vega - we are looking at you)


Huh. I'm amused at the Wikipedia section on this [1].

> The use of Greek letter names is presumably by extension from the common finance terms alpha and beta, and the use of sigma (the standard deviation of logarithmic returns) and tau (time to expiry) in the Black–Scholes option pricing model. Several names such as 'vega' and 'zomma' are invented, but sound similar to Greek letters. The names 'color' and 'charm' presumably derive from the use of these terms for exotic properties of quarks in particle physics.

What's with all the "presumably"?

[1] https://en.wikipedia.org/wiki/Greeks_(finance)#Names


Who would know? Most of this stuff started off in industry rather than in academia; probably somebody started writing a nu on the whiteboard sometime to mean sensitivity to implied vol, and someone else gave it a Greek-sounding name that stuck.


It's before my time, but I bet it's the finance equivalent of philosophers (i.e. fashionable nonsense) science-washing their work, even if unknowingly.


Inverse square root is often used to normalize the length of a vector to 1.


The Newton iteration is nicer for the inverse square root than for the square root, because the update step needs no division. You can refine an initial approximation for `1 / sqrt(x)`, then multiply the result by `x` to compute an approximation of `sqrt(x)`. This less direct approach only needs FP multiplications and additions (and the initial approximation).
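A scalar sketch of that trick (using the hardware estimate as the starting point, though any crude initial guess works):

    #include <xmmintrin.h>

    /* sqrt(x) via the reciprocal square root: refine y ~ 1/sqrt(x) with
       the Newton step y = y * (1.5 - 0.5 * x * y * y), which needs only
       multiplies and adds, then multiply by x at the end. */
    static float sqrt_via_rsqrt(float x) {
        float y = _mm_cvtss_f32(_mm_rsqrt_ss(_mm_set_ss(x)));
        y = y * (1.5f - 0.5f * x * y * y);
        return x * y;  /* sqrt(x) = x * (1/sqrt(x)) */
    }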


Not a real answer because I've never deployed such a thing, but I could imagine using these functions to bootstrap an approximation procedure (e.g. Newton-Raphson, or something of the sort). Since you're going to do a refinement step anyway, one that buys you more accuracy and guarantees convergence to within <1 ulp, you might as well use a fast instruction over an accurate one for the first step.


Presumably when speed of execution is more desirable than maximal precision. RSQRTSS, for example, could be useful for graphics applications, I believe.


They compute reciprocal, or inverse, square roots. You may be familiar with the “fast inverse square root” code snippet: this is a hardware instruction for that operation, and is thus useful for games.
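For anyone who hasn't seen it, that snippet is Quake III Arena's Q_rsqrt, shown here lightly modernized (int32_t and memcpy instead of the original's pointer-cast type punning); RSQRTSS gives you the same kind of estimate in a single instruction:

    #include <stdint.h>
    #include <string.h>

    static float Q_rsqrt(float number) {
        const float threehalfs = 1.5F;
        float x2 = number * 0.5F;
        float y  = number;
        int32_t i;
        memcpy(&i, &y, sizeof i);      // evil floating point bit level hacking
        i = 0x5f3759df - (i >> 1);     // what the fuck?
        memcpy(&y, &i, sizeof y);
        y = y * (threehalfs - (x2 * y * y));   // 1st iteration
        return y;
    }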


// what the fuck?



