Hacker News
Emulating AMD Approximate Arithmetic Instructions on Intel (ocallahan.org)
84 points by zdw on Sept 12, 2021 | 27 comments



Since the post doesn't explain it, I'll note that while RSQRTSS is approximate reciprocal square root, RCPSS is just approximate reciprocal.
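For concreteness, a minimal sketch of the two via their scalar intrinsics (function names are mine):

    #include <xmmintrin.h>

    static float approx_rsqrt(float x) {   /* RSQRTSS: ~1/sqrt(x) */
        return _mm_cvtss_f32(_mm_rsqrt_ss(_mm_set_ss(x)));
    }
    static float approx_rcp(float x) {     /* RCPSS: ~1/x */
        return _mm_cvtss_f32(_mm_rcp_ss(_mm_set_ss(x)));
    }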


I wrote something up when I ran into these instructions last year: https://github.com/akkartik/mu/blob/main/linux/x86_approx.md


What's interesting (to me at least) is "oh right, the spec for this only bounds the relative error at 1.5 × 2^-12 or something", and that AMD and Intel parts may differ by a lot (for the example value in the post, the AMD part is more accurate).


Yeah, for sure. I should just take it out of my project. I'm only supporting a subset of the instruction set anyway.


You're in good company though with the WASM folks. It does feel like you've strayed outside your non-goal of Speed. Also, as a nit, there are double-precision forms (e.g., VRSQRT14PD in AVX-512), so:

> When it upgraded SSE to SSE2 (in 2000), most of its single-precision floating-point instructions got upgraded to double-precision — but not these two.

isn't correct.

It's still not likely that you needed to care about these for mu.


Thank you for the correction! I had my blinkers on there and wasn't thinking of the vector instruction set.


Yikes.

A lot of code uses _mm_rsqrt_ps (sometimes) followed by a Newton-Raphson update to compute a "precise" 1/sqrt(x). Here's a good example of NEON's rsqrt being sufficiently different from Intel's that more iterations were necessary for Embree on ARM [1].
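For illustration, the usual shape of that pattern looks something like this (a sketch, not Embree's actual code):

    #include <xmmintrin.h>

    /* ~12-bit estimate from RSQRTPS, refined by one Newton-Raphson step
       y' = y * (1.5 - 0.5 * x * y * y) toward full single precision. */
    static __m128 rsqrt_nr(__m128 x) {
        __m128 y   = _mm_rsqrt_ps(x);
        __m128 xyy = _mm_mul_ps(_mm_mul_ps(x, y), y);
        return _mm_mul_ps(y, _mm_sub_ps(_mm_set1_ps(1.5f),
                                        _mm_mul_ps(_mm_set1_ps(0.5f), xyy)));
    }

Whether one such step suffices depends on how good the hardware estimate is, which is exactly where the vendors differ.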

Because I only cared about vectorization a long time ago, when AMD was so uncompetitive, I'd bet a lot of code assumes that the SSE rsqrtps values match across vendors.

[1] https://github.com/lighttransport/embree-aarch64/issues/20


(Too late for edit)

Looks like Eigen also defaults to EIGEN_FAST_MATH, which makes Eigen's psqrt ("packet sqrt") use _mm256_rsqrt_ps instead of _mm256_sqrt_ps [1].

Interestingly, the thing they're trying to avoid (the long latency of sqrt vs. rsqrt) hasn't been true for a long time on Intel processors, but apparently is still true for AMD parts according to Agner Fog's tables [2] (though maybe I'm reading them wrong; there is no vsqrtps entry for Zen 2/3).

Hopefully, Eigen will split the single global "fast math" config into finer-grained options [3].

[1] https://gitlab.com/libeigen/eigen/-/blob/a75122584594fb98db0...

[2] https://agner.org/optimize/instruction_tables.pdf

[3] https://gitlab.com/libeigen/eigen/-/issues/1687


Neat, though I'm curious: when do instructions like this typically get used?


Normalizing a vector needs this, and it's super common: whenever you just want a direction, you're likely going to need it. Your GPU also has a fast instruction for this, because it's a very common operation in graphics.
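A sketch of that use case (assuming the ~12-bit estimate is accurate enough for your purposes; the function name is mine):

    #include <xmmintrin.h>

    /* Normalize a 3D vector using the approximate reciprocal square root
       instead of 1.0f / sqrtf(len2). */
    static void normalize3(float v[3]) {
        float len2 = v[0]*v[0] + v[1]*v[1] + v[2]*v[2];
        float inv_len = _mm_cvtss_f32(_mm_rsqrt_ss(_mm_set_ss(len2)));
        v[0] *= inv_len; v[1] *= inv_len; v[2] *= inv_len;
    }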


To add to what others have already pointed out, inverse square root is an important operation in finance (for example, in pricing options: calculating implied volatility in the Black-Scholes model).

However, if speed is that important to you (e.g. if you are an HFT), you don't even want to be calculating the inverse square root in your hot path. Basically, the implied volatility is "seeded" once and then you update it using greeks (finance term for a derivative, don't ask me why).
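To make the "seed once, then update" idea concrete, here's a hypothetical sketch (all names are mine, and real systems are far more careful about edge cases): one Newton step using vega re-fits a cached implied vol to a new market price, with no inverse square root in the hot path.

    #include <math.h>

    /* Standard normal pdf and cdf. */
    static double npdf(double x) { return 0.3989422804014327 * exp(-0.5 * x * x); }
    static double ncdf(double x) { return 0.5 * erfc(-x / sqrt(2.0)); }

    /* Black-Scholes call price and vega (d price / d sigma) for spot S,
       strike K, rate r, time to expiry T, volatility sigma. */
    static double bs_call(double S, double K, double r, double T,
                          double sigma, double *vega) {
        double srt = sigma * sqrt(T);
        double d1 = (log(S / K) + (r + 0.5 * sigma * sigma) * T) / srt;
        double d2 = d1 - srt;
        *vega = S * npdf(d1) * sqrt(T);
        return S * ncdf(d1) - K * exp(-r * T) * ncdf(d2);
    }

    /* One Newton update of the cached implied vol toward a new market price. */
    static double update_iv(double sigma, double mkt_price,
                            double S, double K, double r, double T) {
        double vega;
        double model = bs_call(S, K, r, T, sigma, &vega);
        return sigma - (model - mkt_price) / vega;
    }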


Greeks are a family of derivatives that apply to options, not just one.

Delta is the amount an option goes up or down in price for every $ the underlying moves.

Gamma is the second derivative. The change in delta as a function of change in price of the underlying.

Theta is the amount an option goes down in price for each day you hold it (“time decay”).


You know you're in a forum of programmers when you implicitly switch a noun from plural to singular in a parenthetical remark and instead of inferring the meaning based on context, someone feels the need to correct you.


I didn’t read it as a correction to number but rather as a response to your “I don’t know why”


Indeed, I wasn't trying to correct the parent; I actually found the explanation pretty cool. Rather, I wanted to provide some color for the folks here without a financial background: I figured an explanation of what the common greeks are would help them understand how it's possible to work backwards from them to an updated price.

Sorry if it didn't come across that way @alex_smart!


It definitely educated me!


Sometimes they aren't even real Greek!


(Yes, vega - we are looking at you)


Huh. I'm amused at the Wikipedia section on this [1].

> The use of Greek letter names is presumably by extension from the common finance terms alpha and beta, and the use of sigma (the standard deviation of logarithmic returns) and tau (time to expiry) in the Black–Scholes option pricing model. Several names such as 'vega' and 'zomma' are invented, but sound similar to Greek letters. The names 'color' and 'charm' presumably derive from the use of these terms for exotic properties of quarks in particle physics.

What's with all the "presumably"?

[1] https://en.wikipedia.org/wiki/Greeks_(finance)#Names


Who would know? Most of this stuff started off in industry rather than in academia; probably somebody started writing a nu on the whiteboard sometime to mean sensitivity to implied vol, and someone else gave it a Greek-sounding name that stuck.


It's before my time, but I bet it's the finance equivalent of philosophers (i.e. fashionable nonsense) science-washing their work, even if unknowingly.


Inverse square root is often used to normalize the length of a vector to 1.


The Newton iteration is nicer for the inverse square root than for the square root, because the update step needs no division. You can refine an initial approximation for `1 / sqrt(x)`, then multiply the result by `x` to compute an approximation of `sqrt(x)`. This less direct approach only needs FP multiplications and additions (and the initial approximation).
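A scalar sketch of that trick (using the hardware estimate as the starting point, though any crude initial guess works):

    #include <xmmintrin.h>

    /* sqrt(x) via the reciprocal square root: refine y ~ 1/sqrt(x) with
       the Newton step y = y * (1.5 - 0.5 * x * y * y), which needs only
       multiplies and adds, then multiply by x at the end. */
    static float sqrt_via_rsqrt(float x) {
        float y = _mm_cvtss_f32(_mm_rsqrt_ss(_mm_set_ss(x)));
        y = y * (1.5f - 0.5f * x * y * y);
        return x * y;  /* sqrt(x) = x * (1/sqrt(x)) */
    }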


Not a real answer because I've never deployed such a thing, but I could imagine using these functions to bootstrap an approximation procedure (e.g. Newton-Raphson, or something of the sort). Since you're going to do a refinement step anyway, one that buys you more accuracy and guarantees convergence to within <1 ulp, you might as well use a fast instruction over an accurate one for the first step.


Presumably when speed of execution is more desirable than maximal precision. RSQRTSS, for example, could be useful for graphics applications, I believe.


They compute reciprocal, or inverse, square roots. You may be familiar with the “fast inverse square root” code snippet: this is a hardware instruction for that operation, and is thus useful for games.
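For anyone who hasn't seen it, that snippet is Quake III Arena's Q_rsqrt, shown here lightly modernized (int32_t and memcpy instead of the original's pointer-cast type punning); RSQRTSS gives you the same kind of estimate in a single instruction:

    #include <stdint.h>
    #include <string.h>

    static float Q_rsqrt(float number) {
        const float threehalfs = 1.5F;
        float x2 = number * 0.5F;
        float y  = number;
        int32_t i;
        memcpy(&i, &y, sizeof i);      // evil floating point bit level hacking
        i = 0x5f3759df - (i >> 1);     // what the fuck?
        memcpy(&y, &i, sizeof y);
        y = y * (threehalfs - (x2 * y * y));   // 1st iteration
        return y;
    }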


// what the fuck?



