Bit Patterns of Float

dvt · on Sept 7, 2021

I'm not sure why the IEEE's original floating point specs didn't manage to make the NaN "quiet" (position 22) bit be the the leading "sign" bit -- although I'm sure there's a good reason. The NaN vs -NaN distinction is purely representative and has no semantic implications. As opposed to negative infinity or negative zero, for example.

Also worth adding is that NaN comparisons will break catastrophically when using gcc's -ffast-math`.

dbaupp · on Sept 8, 2021

This would mean hardware/software tricks like those below wouldn’t work directly and would need additional gates/operations, because they’d noticeably change the meaning of NaNs.

   -x     => x xor 0b100…00
   abs(x) => x and 0b011…11

dvt · on Sept 8, 2021

Yeah, I'm sure plenty of things would break, though I don't really think the IEEE was thinking of bit-twiddling trickery when they were writing the spec.

dbaupp · on Sept 8, 2021

I get the sense that implementation efficiency was considered as part of the spec: if it wasn’t efficient, vendors with existing FP hardware would have less incentive to switch. Allowing operations to happen via simple bit twiddling (in software or hardware) would be part of that.

mike_hock · on Sept 8, 2021

I mean, what else were they considering if not implementation efficiency? It's not hard to come up with some representation for floating point numbers if you don't have to be concerned with efficiency.

gumby · on Sept 8, 2021

Also correctness. There were a lot of implementations already that had poor behavior at different points in the number space (especially with respect to very small numbers).

Kahan also had experience because of his work for Intel on FP coprocessing. He couldn’t talk about the then not-yet-released parts, but he pushed hard for implementation decisions that other committee members said were too expensive but that he knew had been solved at Intel.

volta83 · on Sept 8, 2021

We had quite a bit of existing FP hardware before the IEEE spec.

Code using that hardware was not portable, because each FP unit did things subtly (and sometimes not that subtly) different.

All those already existing hardware units were already doing this type of bit trickery. That's one of the main things that differentiated units from each other.

The people that developed these units all had a stake and were involved in the IEEE process.

That's what "standardization" used to mean. Once you have enough implementations of something that compatibility is an issue, then, with the hindsight of all those implementations, you create a standard.

Today, the meaning of "standardization" is lost. I've seen so many standards that are based on zero actual experience, that had no implementations before the process began, and that 10 years later after many people invested thousands of man hours into a standard document, still have no implementation, some of which, like OpenCL 2 and OpenCL 3 have ended up being "retracted" and we are back to OpenCL 1.

This is particularly true for Khronos standards, Vulkan being the most recent exception. Khronos failures include OpenCL 2, OpenCL 3, OpenMP, SyCL (getting first implementations now, 5-6 years before the first standard was written....). Even for the relatively widely used OpenMP standard, now at version 5, there are still so many features of OpenMP3 that have never been implemented and are arguably useless, that it isn't even funny. Same for OpenMP4 and 5 which mainly added "useless" GPU support (GPU support is important, but the way it was added makes it useless).

The reason Khronos standard suck is because "designers" that can't program hello world write some idea on toilet paper, and sign it off as a standard. Then they ask "the community" to implement it for them, and either nobody cares, or the problem the standard solves are problems nobody has, or the ideas are just wishful thinking that aren't implementable, etc.

The C++ standard is going the same route with signing off on tons of stuff that has little implementation experience, zero proven actual practical value delivered to users.

The best C++ standard spectacular failure example is Concepts. 15 years and thousands of man years invested in designing Concepts, a feature whose main goal is to improve template error messages. Turns out, concepts actually make template error messages worse. They aren't completely useless, but still... who would have thought, right? Well, turns out that every standard that was standardizing standard practice would have... known this... you know... from actual experience...

steerablesafe · on Sept 8, 2021

C++ concepts were prototyped before getting into the standard (many major compilers had experimental implementations) and vastly improve the replaced enable_if mess. clang concept error messages prove that it can be done right too (although clang also improved enable_if error messages in the meantime).

volta83 · on Sept 8, 2021

> clang concept error messages prove that it can be done right too

Clang concepts error messages are longer that the same error message without concepts.

99% of the error in both cases is just garbage.

Kranar · on Sept 8, 2021

>The best C++ standard spectacular failure example is Concepts.

Concepts are more of a disappointment but they still technically work... I experimented with porting our codebase over to use concepts and all it does is substitute one mess for another mess. That said at the very least with concepts there is now a "standardized" mess that one can use as opposed to the ad-hoc situation that precedes it.

I'd say modules are a bigger example of an actual failure. There were two proposals for modules, clang's and Microsoft's. Clang's built off of Objective-C modules which is widely used on iOS and has a working implementation. Microsoft's modules were built off of... nothing. You can guess which one got standardized.

This along with all kinds of political bickering over the ABI is why clang has stepped down from its various roles in the standardization process and it's a really unfortunate outcome not only for C++ standardization but also for the broader C++ community.

volta83 · on Sept 9, 2021

Modules is arguably another failure.

The main goal of modules is to improve compilation times, but if you already have a modularized code base, and are using PCH, then modules don't improve compile-time, at all.

Migrating from PCH to modules is quite a lot of work, for minimal added value, and a lot of disappointment.

djmips · on Sept 8, 2021

I enjoyed how you responded to a specific hypothesis but then took it into a polemic about standards and then a diatribe about C++. A little off the rails but I couldn't agree more.

jcelerier · on Sept 8, 2021

> The best C++ standard spectacular failure example is Concepts. 15 years and thousands of man years invested in designing Concepts, a feature whose main goal is to improve template error messages. Turns out, concepts actually make template error messages worse. They aren't completely useless, but still... who would have thought, right? Well, turns out that every standard that was standardizing standard practice would have... known this... you know... from actual experience...

I generally agree with your posts but concepts have definitely improved my code, I would never in a lifetime go back to SFINAE trickeries.

conistonwater · on Sept 8, 2021

I agree with the other comment about there being a lot of predecessors to IEEE floating point numbers. What happened was that when people designed other, non-IEEE floating point arithmetic, they sometimes got things wrong. Not wrong in that the hardware produced incorrect results, but that it was hard for people to write code that did what they wanted. The dominance of IEEE floating-point numbers (in part due to the fairly logical way it's designed, and at the time they took into account what they knew, about 40 years ago now) means that we have all forgotten how annoying things can be if you don't have IEEE numbers.

One of the well-known people in this research area is Bill Kahan, and on his webpage (http://people.eecs.berkeley.edu/~wkahan/) he has a bunch of, uhm, rants about floating-point arithmetic. If you read some of them, you can find clear explicit examples of arithmetic that's not IEEE-compliant, and does weird hard-to-understand confusing things that don't really happen any more. (For example, p.4 of http://people.eecs.berkeley.edu/~wkahan/Mind1ess.pdf, and you can dig around his page for more examples; none of these were done out of malice by programmers, they just didn't think things through, which is what Kahan is writing about)

kaoD · on Sept 8, 2021

2021's bit-twiddling trickery was 1985's engineering.

pklausler · on Sept 8, 2021

There's no standard bit position for that flag, unfortunately, or even a standard interpretation (flag set -> quiet/signaling). A NaN just have to have at least one bit set in its significand to distinguish the NaN from an infinity.

(So the fact that a binary value is a NaN is portable, but not the distinction between qNaN and sNaN.)

dbaupp · on Sept 8, 2021

I think that’s not the full story. The standard says:

A quiet NaN bit string should be encoded with the first bit (d1) of the trailing significand field T being 1. A signaling NaN bit string should be encoded with the first bit of the trailing significand field being 0

That does use “should” (not “shall”) and is described as a “preferred encoding”, so it’s standards-compliant to do something else, but it’s not recommended.

pklausler · on Sept 8, 2021

Sure, that's in the 2008 update. But OP was lamenting that the original IEEE-754 didn't encode qNaN/sNaN with the sign bit, no?

dbaupp · on Sept 8, 2021

Yeah, you’re right that was the 2008 version. Fair enough.

lazulicurio · on Sept 8, 2021

Probably so if you bitcast a floating point value into a signed int it maintains the proper sign for comparisons.

dvt · on Sept 8, 2021

I think casting a NaN to an int will always just return gibberish, signed or not. In fact, this is explicitly undefined behavior in C.

saagarjha · on Sept 8, 2021

Yeah, but why have a signed NaN at all? It’s not like they have a defined ordering anyways…

lazulicurio · on Sept 8, 2021

A "fun" thing to do if you really want to get a hang of IEEE floats is manually write the algorithms for a bunch of rounding operations (ceil, floor, truncate, half to even, etc).

infogulch · on Sept 8, 2021

Apparently the bit pattern of a float interpreted as an int is sorta-kinda it's own logarithm, which is why the famous fast inverse square root from quake 3 works: log(x^-½) = -½log(x) =~ -log(x) >> 1. This video is approachable and explains it pretty well: https://www.youtube.com/watch?v=p8u_k2LIZyo

thegeomaster · on Sept 8, 2021

I can't recommend https://float.exposed enough. A great tool for exploring floating-point representations.

nickcw · on Sept 8, 2021

For anyone that read the code at the end of this article and was wondering what the spaceship / flying saucer operator <=> was doing in a C++ program, I found a write up on this on stack overflow page:

https://stackoverflow.com/a/47466459/164234

According to that, this was added in C++20 and means

> There’s a new three-way comparison operator, <=>. The expression a <=> b returns an object that compares <0 if a < b, compares >0 if a > b, and compares ==0 if a and b are equal/equivalent.

I first saw this in Perl a long time ago and according to Wikipedia it is in Perl, Ruby, Apache Groovy, PHP, Eclipse Ceylon as well as C++ now.

https://en.wikipedia.org/wiki/Three-way_comparison

arthur2e5 · on Sept 8, 2021

I believe the IEEE 754 total ordering is identical to a classical recipe for fast FP comparison.[1] Of course, the handling of NaNs is only identical when the qNaN bit is at the MSB of the significand as recommended.

[1]: http://stereopsis.com/radix.html

Const-me · on Sept 8, 2021

I've learned most of these things while writing SIMD code in C++. PC processors have very fast bitwise instructions processing fp32 and fp64 vectors, like andnps, xorpd, and the rest of them.

ddtaylor · on Sept 8, 2021

> Zero is all-bits-zero. Its biased exponent is 00000000, or 127−127. ... 2^-127

Isn't 127-127 = 0, so it should be 2^0

ajnin · on Sept 8, 2021

The "biased" exponent has a constant 127 added to it, so for example if the bits are 00000000 the exponent is actually 0-127=-127, if the bits are 10000111 the exponent is 135-127=8, and so forth.

guitarbill · on Sept 8, 2021

If you want to describe how to get the bit pattern, that's the calculation based on the known exponent. So because 127 - 127 = 0, the bit pattern is zeros. Not the other way around ;)