Beautiful idea: a number representation where the distribution of accuracy aligns with the distribution of typical use.
>Real numbers can’t be perfectly represented in hardware simply because there are infinitely many of them. To fit into a designated number of bits, many real numbers have to be rounded. The advantage of posits comes from the way the numbers they represent exactly are distributed along the number line. In the middle of the number line, around 1 and -1, there are more posit representations than floating point. And at the wings, going out to large negative and positive numbers, posit accuracy falls off more gracefully than floating point.
>“It’s a better match for the natural distribution of numbers in a calculation,” says Gustafson. “It’s the right dynamic range, and it’s the right accuracy where you need more accuracy. There’s an awful lot of bit patterns in floating-point arithmetic no one ever uses. And that’s waste.”
Ah. So it's because the machine learning people mostly work in the -1 .. 1 range, but it's possible for their numbers to go outside that range. So they need a compact representation which uses most of the possible values for the range of interest. If you're down to 8 bit numbers, something like this makes sense for custom machine learning hardware.
I wonder if it has graphics applications for high dynamic range images. Probably not.
The machine learning space probably has the easiest time adopting a new number format. The players are big enough to define their own silicon, and training and inference don't even need to use the same number format. As long as it works on their new specialized training hardware, it's good to go.
The real challenge is probably power efficiency. As far as I know, power consumption is currently the biggest cost factor when training new models. Everything else seems like a minor trade-off, but once you account for power consumption there is real money involved.
Hardware is not cheap either at this scale. Really rough math follows, please anyone correct me if I err. A100 cards are >$10,000 each. A node typically has 8, and a cluster might have 64 A100s. So about a million in hardware for the medium cluster of 64 A100s. (Cross-checking: a machine from Lambda Labs is apparently ~$175k for 8 GPUs, so ~$1.4m for the cluster above.)
Those A100s use 300 W each, plus let's say 500 W for the rest. (8 GPUs × 300 W + 500 W) × 8 servers ≈ 25 kW; 25 kW × 8,760 hours ≈ 219k kWh/year. So if your costs are $0.15/kWh, that's only on the order of $32k/year.
I see your numbers with the A100. However, consider the 3090, which is down to $1,000 and uses 350 W. Double that if you consider cooling and overhead in a data center. In Europe the electricity prices are through the roof and easily exceed $0.50/kWh. If you use those numbers, a single card would cost you 0.35 kW × 2 × $0.50/kWh × 24 h × 365 ≈ $3,066 per year. So I guess it highly depends on your specific location and circumstances.
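A quick sanity check of both calculations, as a sketch under the assumptions stated above (300 W per A100 plus 500 W of overhead per 8-GPU server at $0.15/kWh, and 350 W per 3090 doubled for cooling/overhead at $0.50/kWh):

```python
HOURS_PER_YEAR = 8760

# 64 A100s: (8 GPUs x 300 W + 500 W overhead) per server, 8 servers
a100_cluster_kw = (8 * 300 + 500) * 8 / 1000          # ~23 kW
print(a100_cluster_kw * HOURS_PER_YEAR * 0.15)        # ~$30k/year at $0.15/kWh

# Single 3090 in Europe: 350 W, doubled for cooling/overhead
rtx3090_kw = 0.350 * 2
print(rtx3090_kw * HOURS_PER_YEAR * 0.50)             # ~$3.1k/year at $0.50/kWh
```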
It’s still more complicated because you have to amortize costs over several years, and then also keep in mind that big players will design their own hardware to avoid the A100 cost and energy. Disks and other devices will also consume power. And your datacenter cooling is now more expensive, etc.
Yes, it’s complicated, for sure. But if we amortize over 3 years, and triple costs for the power: it’s ~500k/yr for hardware and ~100k/yr for power.
In terms of TPUs or other custom accelerators, sure, they exist. However, most players definitely aren't building their own hardware.
ETA: I’m not saying power is irrelevant, it clearly matters. But saying it’s the dominant financial constraint is clearly wrong, at least below Google/Amazon/Apple scale. Never mind the cost of the people running these trainings!
It might. The whole reason behind gamma correction is that human brightness perception is logarithmic. By having more resolution at lower values, posits could correct for the reason we need gamma in the first place, though maybe not as well as gamma itself.
But if it mitigates error accumulation in perceptually-friendly ways, it might still have value in calculations (once there’s native support).
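To make the gamma point concrete, here's a minimal sketch (plain 8-bit quantization with an assumed gamma of 2.2, not any particular standard's transfer curve) showing how a gamma encoding spends far more code values on dark tones than a linear encoding does:

```python
def linear_code(v: float) -> int:
    """Quantize a linear-light value in [0, 1] straight to 8 bits."""
    return round(v * 255)

def gamma_code(v: float, g: float = 2.2) -> int:
    """Quantize after a simple power-law gamma encode."""
    return round((v ** (1 / g)) * 255)

for v in (0.001, 0.01, 0.1, 0.5, 1.0):
    print(v, linear_code(v), gamma_code(v))
# 0.001 ->   0 vs  11
# 0.01  ->   3 vs  31
# 0.1   ->  26 vs  90
# 0.5   -> 128 vs 186
# 1.0   -> 255 vs 255
```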
I've been studying color theory (it's harder than you would think) in my spare time, and I did my undergraduate thesis on posits a few years ago. I think using posits may be a good idea for compressing luminosity, but I'm not sure how good they would be for the ab channels of a Lab-like format.
A format close to the cone fundamentals, like XYZ, could benefit from being encoded in some kind of non-standard posit.
I'll add this to the stack of things I'll eventually look into.
1. IEEE 754 floats are already nonlinear. The precision is highest around zero.
2. Bitmap images are not using floating-point values, except in some super niche use cases like GIS data, so “posits” are irrelevant for the use case you’ve posited. (ba-dum-ts)
3. Non-linear gamma is completely unnecessary for bit depths >= 16.
Floating point bitmaps are a lot more common than you think. A lot of consumers don't see them, but their software will internally use floats. Floating point bitmaps are standard practice in visual effects and animation. The OpenEXR format exists for the exchange of images with floating point bit depths up to 32.
That result happens for the same reason that 1 / 3 * 3 != 1, if you use decimal: 1 / 3 = .333, .333 * 3 = .999, which is different from 1.00.
0.1 is the same as 1 / 10, which does not have a finite representation in binary notation, just as 1 / 3 does not have a finite representation in binary or decimal notation.
This is a problem for all number systems. The true issue here is not the precision of the underlying bit representation of IEEE-754; it's that 0.1 and 0.2 aren't actually valid IEEE-754 numbers, and so they get rounded to their best approximations.
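A minimal sketch of that point in Python, printing the exact binary64 values that the literals 0.1 and 0.2 actually become:

```python
from decimal import Decimal

# The literals are rounded to the nearest binary64 value before any
# arithmetic happens; seeing the exact values makes the result unsurprising.
print(Decimal(0.1))       # 0.1000000000000000055511151231257827021181583404541015625
print(Decimal(0.2))       # 0.200000000000000011102230246251565404236316680908203125
print(0.1 + 0.2)          # 0.30000000000000004
print(0.1 + 0.2 == 0.3)   # False
```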
>Real numbers can’t be perfectly represented in hardware simply because there are infinitely many of them.
There are also infinitely many integers, but we can represent them just fine inside a finite bound. The problem with reals (and rationals to a lesser extent) is that, within any range we are interested in, a single real number can be infinitely 'long'.
I like this idea of choosing a representation for numbers based on the use case. Like, "ints" and "floats" but even more specialized.
For example, what would be the most efficient binary representation of probability values between 0 and 1? In the future, I can imagine hardware + software that is specialized for examples like this.
I feel like probabilities are best stored via the inverse logistic function, because the closer the probability is to 0 or 1, the less you care about significant digits; the events are going (not) to happen anyway. (That's why I am a proponent of LNS.)
I feel that it's actually the opposite. There is a world of difference between 0.0...01 and 0.0...001, where ... represents the same number of 0s in the two cases. The same applies at the other end.
edit: Do I really want more resolution between 0.41 and 0.42 than between 0.01 and 0.02?
You're dead on. Metals are sold by how fine they are: 99.9%, 99.95% (which should end in a 7, not a 5; the whole industry is fucking up right there, but at any rate), 99.99%; more nines, more purity, better experimental results. When they talk about 4 nines or 5 nines, the actual measurement is of f(x) = -log10(1 - x). That is the purity: the negative log of the complement. And that's why 4.5 nines should be 99.997%, not fucking 99.995%, so stupid (it's almost exactly 99.997%). Mining engineers, come on! Such impressive degrees! PhDs all over the place! Professors even! I'm sorry, don't mind me with my math.
Some early experiments, like the ones with liquid crystals, required purities so high that the chemical company (what was it, Merck I think, around the year 1900) complained as if they felt insulted. They told the inventor of liquid crystals as much, because it was an absurd amount of purity to require just to get them actually working. But look at them go! Right in front of your very eyes!
The logistic curve becomes denser close to 0 and 1. Which makes sense: you will want to tell apart 1 defect per million from 0.01 dpm, and a 5-sigma process (99.977%) from a 6-sigma one (99.99966%), much more than you want to tell apart 30% from 30.001%.
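A minimal sketch of the log-odds idea under discussion (plain Python floats, not any particular storage format), showing how it spreads resolution toward 0 and 1:

```python
import math

def to_logit(p: float) -> float:
    """Log-odds of a probability: log(p / (1 - p))."""
    return math.log(p / (1.0 - p))

def from_logit(x: float) -> float:
    """Inverse: the logistic function."""
    return 1.0 / (1.0 + math.exp(-x))

# Purities that differ only far out in the decimals are far apart in log-odds...
print(to_logit(0.99977))     # ~8.4   (the "5-sigma" figure above)
print(to_logit(0.9999966))   # ~12.6  ("6-sigma")
# ...while mid-range probabilities that are practically identical stay close.
print(to_logit(0.30), to_logit(0.30001))   # both ~-0.847
```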
Real numbers can't be represented in hardware because most of them are incomputable. It's not that there are too many of them; it's that any one of them is unlikely to be representable as the output of any program.
I don't know about regime notation, but it is nice to see a new format with the most annoying ideas from IEEE754 removed:
- Two's complement instead of one's complement
- No infinities, no signed zeroes
- Only one exceptional value with the same encoding as the smallest signed integer: 10...0
The above means that comparisons work exactly like for signed integers (with the unique exceptional value behaving like -∞). Also, one can test whether x is invalid using `x == -x && x != 0` rather than the ugly `x != x`, i.e. no need to break reflexivity. Even if posits do nothing but remove the one's complement dust off IEEE754, that would be a positive change in my view.
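To illustrate the parent's point, a small sketch (assuming a 16-bit posit and treating the bit patterns as plain integers; the helper names are made up for the example):

```python
N = 16
MASK = (1 << N) - 1
NAR = 1 << (N - 1)            # 0b1000_0000_0000_0000: the single exceptional value

def to_signed(bits: int) -> int:
    """Reinterpret an N-bit pattern as a two's-complement signed integer."""
    return bits - (1 << N) if bits & NAR else bits

def posit_lt(a: int, b: int) -> bool:
    """Posit comparison is just signed-integer comparison on the raw bits,
    with NaR (10...0) sorting below everything, like -inf."""
    return to_signed(a) < to_signed(b)

def is_nar(bits: int) -> bool:
    """The `x == -x && x != 0` test at the bit level: only 0 and 10...0
    are their own two's-complement negation."""
    return bits == (-bits & MASK) and bits != 0

print(is_nar(NAR), is_nar(0), is_nar(0x4000))   # True False False
print(posit_lt(NAR, 0))                          # True: NaR behaves like -inf
```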
As an aside, this is why I love HN (no /s). I have absolutely no idea what a posit is, and here is a community of experts nitpicking it, discussing it at length and in detail on this thread. No matter what the subject, SMEs always pop up out of nowhere and give the rest of us a crash-course education :)
IEEE754 uses sign-magnitude, not 1's-complement. It doesn't use any sort of complementation.
Edit: Also, it looks like posits also use sign-magnitude and not 2's-complement? I am confused as to where you are getting this from. [Edit again: I was wrong about this, see below.]
Yes, sign-magnitude is the right terminology, my mistake, but too late to edit. As for two's complement, look at slide 18-19 from the link somebody produced below:
Hm, they do indeed use 2's-complement... sort of. In case of a negative posit, they don't apply 2's-complement to the mantissa or anything like that, but rather to the entire posit without regard to the meaning of the bits. That's... kind of surprising, huh! Although it's not the first number format to do that. But it is a sort of 2's-complement, yes...
The big problem with posits is that their relative error depends on their value. This is terrible for a lot of engineering work and scientific simulations where you need to present an error estimate that includes computational error (for ML it's probably fine).
The relative error is bounded by 2^-24 only in the interval [1.0e-6, 1.0e6] for the Posit32 format, whereas the corresponding interval is [1.17e-38, 3.4e38] for the IEEE binary32 format.
IMO, this isn't a big deal. If you want rigorous error estimates, you need to use some form of interval arithmetic (or ball arithmetic). Also, these types of engineering and scientific work are pretty much all 64-bit, while posits are mainly useful for <=32 bit. My ideal processor would have 64-bit floating point (with Inf/NaN behavior more like posits) and posits for 16 and 32 bit.
I think the problem is that you can't really restrict IEEE's values. I would like to have an error for infinities unless explicitly constructed (so number/number = infinity is an error, but number * infinity = infinity is not).
So I understand the use case, but posits can't replace floats with those infinities removed, IMHO; they complement them. You can work around those things by wrapping posits, but sometimes you just need infinities (for example: I needed one yesterday). I don't always use them, but still somewhat frequently. They are not error codes; I work with the extended real number line [1], so infinity is a valid return value. What's the alternative... working with PositOrInfinity?
I would probably use an exceptional value and later treat that exceptional value as if I had encountered an infinity (usually they are later involved in calculations where infinities turn into 0). But that's just hard to understand for anyone but myself.
Maybe I don't work at a low enough level, but I've never used `x != x` and wouldn't use `x == -x && x != 0`, because that's just not readable. I want a function (isinf, isnumber etc) and I am happy.
> I want a function (isinf, isnumber etc) and I am happy.
Sure, isnan(x) is what one should use. The fact that it is `x != x` is an implementation detail. The problem is that it is also a hack that breaks the usual mathematical axioms for equality and for order relations. For instance, if you want to sort floating-point values, you have to write your own comparison predicate in case there is a NaN, because a NaN is neither smaller than, greater than, nor equal to itself.
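A small illustration of that sorting problem (standard Python, nothing posit-specific):

```python
import math

nan = float("nan")
print(nan == nan, nan < 1.0, nan > 1.0)   # False False False: no total order

# A plain sort gives an order that depends on where the NaNs happen to sit,
# so you need an explicit key (or comparator) that handles them.
xs = [3.0, nan, 1.0, 2.0]
print(sorted(xs, key=lambda x: (math.isnan(x), x)))   # NaNs pushed to the end
```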
As for infinities, they somewhat work for real numbers, but it gets more complicated for complex numbers. For instance, Annex G of the C standard stipulates that an infinite complex number multiplied by a nonzero finite complex number should yield an infinite complex number. Sounds reasonable, but consider:
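The comment's original example isn't preserved here, but a representative case (a sketch in Python, which multiplies complex numbers componentwise without Annex G's fix-ups) is multiplying the infinite value ∞ + 0i by the finite, nonzero value i:

```python
inf = float("inf")
z = complex(inf, 0.0)   # an infinite complex number
w = complex(0.0, 1.0)   # a finite, nonzero complex number

# Componentwise (a+bi)(c+di) = (ac - bd) + (ad + bc)i gives inf*0 = NaN
# in the real part, even though the "true" product should be infinite.
print(z * w)   # (nan+infj)
```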
So Annex G recommends some complicated functions to be executed at each complex multiplication and complex division, which makes little sense for most applications, and I suspect few people do that. As an aside, Annex G breaks the whole point of NaNs, because it stipulates that numbers like (∞ + iNaN) should be considered infinities rather than NaNs, which means that NaNs are no longer necessarily viral.
All in all, what I find frustrating about these aspects of IEEE754 is that they complicate things under the hood, while the benefits seem to me limited to some specialized applications.
There are a whole bunch of people who insist that lots of the "baggage" of IEEE 754 is really important and must not be abandoned. For most applications it is just extra garbage, and for the few that need it there are other ways to do what they want.
That may be true, but you don't need a fundamentally different number representation in order to throw away most of it. Also, the die size cost is quite small.
The biggest saving you could make is probably forgoing exactly rounded results and only stipulating that the result has to be within ±1 lsb of the true value. That would save multipliers from computing a bunch of bits that don't end up in the result anyway, except for the rare case where they decide the rounding. That would probably be a good trade-off for most AI chips. For general-purpose CPUs I don't think it is worth the breakage.
We literally had a post about "old not being broken" on the front page yesterday, with a significant discussion on developer churn, and now people are already clamoring on other pages to deprecate floating point. Let's have a little memory to shake us out of our habits, shall we?
Also in March Gustafson hosted the Conference for Next Generation Arithmetic (CoNGA) 2022 sorta 'inside' the Supercomputing Asia (SCA2022) conference. They covered a bunch of different areas not exclusively related to posits. Lots of cool topics including hardware implementations and optimizations, comparisons between different number formats, and a talk and panel from the posit committee about writing the standard and what's next for posits. The session videos are somewhat difficult to find so I made a playlist: https://youtube.com/playlist?list=PLBH9oUUfaYoQNl6-zr_ScsMp4... (Note the descriptions have talk titles and authors.)
Probably my favorite CoNGA2022 talk is the first one, A Case for Correctly Rounded Elementary Functions (starts at about 10m), which presents a methodology and library, already implemented in LLVM for floats, of elementary functions (lg, sin, cos, etc.) that are 1. correctly rounded to the last bit, 2. faster than the previous algorithms, and 3. can be downcast to any smaller format while preserving correct rounding.
The posit standard requires correctly rounded elementary functions in order for an implementation to be considered compliant. This means that every standards-compliant posit implementation is now also deterministic.
I’d love to know what Kahan (the father of IEEE 754 floats) has to say about them. He thought that the "merits of schemes proposed in" earlier versions were "greatly exaggerate[d]".
I did some reading on unums a while back, including Kahan's criticisms. There's another side.
There's a video on YouTube with Kahan and Gustafson talking it over. Kahan goes in like a wolf onto a lamb, much to Gustafson's shock and hurt. Gustafson points out some of Kahan's criticisms are plain wrong, even citing the page in the book, then says something like "we should be working together on this, why aren't we working together?" Kahan doesn't respond.
Kahan may be right or wrong but his attitude is weirdly hostile, and Gustafson is no n00b about floats.
Gustafson is a noob. Both his Unum I and II proposals are completely impractical, giving absolutely no consideration to how hardware would implement his ideas. But by writing only about the imagined merits he managed to convince a lot of people that his ideas were great.
Unum III/posits can work, but mainly it is just a number compression format. Working with, say, 64-bit posits pretty much requires implementing arithmetic equivalent to that of 80-bit IEEE floats, and then throwing away a larger or smaller portion of the mantissa depending on the exponent. And the extra accuracy around the sweet spot of 1 still comes at the cost of lost accuracy for large and small numbers, so some workloads will suffer.
Having watched that video, Gustafson seems to me much more vengeful and hostile than Kahan. There was one point--where an audience member was asking after the impact of variably-sized arithmetic types on implementing algorithms--where Gustafson basically says "you guys don't know how to code for modern computers anymore", which is when the moderator has to step in to keep things from escalating.
The answer to your question--why isn't Kahan working with Gustafson--is that, in Kahan's view, interval arithmetic (this is, AIUI, the main thrust of unums) isn't actually an effective solution to the "problem" of needing numerical analysis. While Kahan isn't the best at explaining this in detail, he does point out two valid issues: interval arithmetic can give excessively pessimistic ranges (because it doesn't account for correlated error), and it can give just plain incorrect answers when you have singularities in ranges.
I would note that, as Gustafson is no longer (as far as I know) pushing for the interval arithmetic approach, this is basically a concession that Kahan was right and Gustafson was wrong.
The(?) posit website has a PDF [1] which goes into more detail under "3.3 Posit format encoding". It also mentions a "quire" (and the footnote for the quire value is self-referential?). Can anyone explain that concept?
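For anyone who wants to poke at the encoding without reading the PDF, here is a rough Python sketch of a posit decoder (assuming the 2022 standard's layout with es = 2 exponent bits; simplified, and not a reference implementation). As far as I understand, the quire is a separate thing: a very wide fixed-point accumulator used to sum products exactly, so long dot products round only once at the end.

```python
def decode_posit(bits: int, n: int = 16, es: int = 2) -> float:
    """Rough sketch of decoding an n-bit posit (two's-complement encoding,
    es exponent bits). Returns an ordinary Python float."""
    mask = (1 << n) - 1
    bits &= mask
    if bits == 0:
        return 0.0
    if bits == 1 << (n - 1):              # 10...0 is NaR (Not a Real)
        return float("nan")

    sign = -1.0 if bits >> (n - 1) else 1.0
    if sign < 0:                          # negative posits: two's complement of the whole word
        bits = -bits & mask

    body = bits & ((1 << (n - 1)) - 1)    # the n-1 bits after the sign bit
    # Regime: a run of identical bits, terminated by the opposite bit (or the word's end).
    first = (body >> (n - 2)) & 1
    run, i = 0, n - 2
    while i >= 0 and ((body >> i) & 1) == first:
        run, i = run + 1, i - 1
    regime = run - 1 if first == 1 else -run
    i -= 1                                # skip the terminating regime bit

    # Exponent: up to es bits; pad with zeros if truncated by the word boundary.
    exp, taken = 0, 0
    while i >= 0 and taken < es:
        exp = (exp << 1) | ((body >> i) & 1)
        taken, i = taken + 1, i - 1
    exp <<= es - taken

    # Whatever is left is the fraction, with an implicit leading 1.
    frac_bits = max(i + 1, 0)
    frac = body & ((1 << frac_bits) - 1) if frac_bits else 0
    scale = (regime << es) + exp
    significand = 1.0 + (frac / (1 << frac_bits) if frac_bits else 0.0)
    return sign * significand * 2.0 ** scale

print(decode_posit(0x4000))   # 1.0
print(decode_posit(0xC000))   # -1.0
print(decode_posit(0x7FFF))   # maxpos for 16 bits: 2**56
print(decode_posit(0x0001))   # minpos: 2**-56
```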
Wasn't that a gigantic issue with x87 floating point implementation? It would internally use 80 bit registers and the result of any computation was completely at the mercy of how the compiler used those 80 bit registers as every move to system memory would drop precision back to the 32 bit or 64 bit specified by the program.
The footnote for point 4.3 in the linked document above explicitly calls out optimization modes and possible non-compliance. Yeah, it's the x87 80-bit registers all over again.
Except the compilers will not be dumb enough to use it implicitly when you don't ask for it, just like they don't automatically use FMA now (except maybe with -ffast-math).
Nope. C/C++ think it's OK to use fma when you didn't ask for it with default compiler settings. Also, with default settings, GCC is willing to replace single precision math with double precision math if it feels like it.
I'm not aware of any time GCC does that except due to C++'s promotion rules (e.g. float + double -> double), which is a problem with C++, not the compiler. I write deterministic software using floating point in C++.
Yes, it's not replacing 64-bit math with 80-bit math; x87 just doesn't have proper 64-bit floats with a correctly sized exponent field, and the flags here interpret float literals as "long double", if I remember correctly. It's just the 80-bit x87 problem referred to earlier in the thread. The workarounds mentioned in the github aren't actually enough. Compilers cannot do IEEE-compliant computation on x87 without relatively large performance penalties (I made a library that did it).
> Nope. C/C++ think it's OK to use fma when you didn't ask for it with default compiler settings.
No, it doesn't. You have to request #pragma STDC FP_CONTRACT ON explicitly, or use -ffp-contract flag-equivalent (implied by -ffast-math or -Ofast) on most compilers. icc is the only major compiler that actually defaults to fast-math flags.
> Also, with default settings, GCC is willing to replace single precision math with double precision math if it feels like it.
I'm less familiar with gcc than I am with LLVM, but I strongly doubt that this is the case. There is a provision in C/C++ for FLT_EVAL_METHOD, which indicates what the internal precision of arithmetic expressions (which excludes assignments and casts) is, and this is set to 2 on 32-bit x86, because x87 internally operates on all numbers as long double precision, only explicitly rounding to float/double when you tell it to in an extension. But on 64-bit x86, FLT_EVAL_METHOD is 0 (everybody executes according to their own type), because SSE can operate on single- or double-precision numbers directly.
Novel number representations are cool. I came across Paul Tarau's "hereditarily binary natural numbers" a few years ago, and via them, Knuth's earlier TCALC representation (one of the many small side projects Knuth has done over the years, which for some reason hasn't gotten much attention).
They go pretty much the opposite way from this representation: they allow huge numbers, much larger than can be represented in regular notation (including floating point), to be represented and calculated with efficiently and exactly. Tarau's system improves slightly on Knuth's in that representations are unique. He also proves that they require at worst twice as many bits as the regular representation.
Knuth's and Tarau's systems are variable-length and limited to naturals, but it seems like it would be easy enough to extend them to rationals and fix the representation length.
> A posit processing unit takes less circuitry than an IEEE float FPU. With lower power use and smaller silicon footprint, the posit operations per second (POPS) supported by a chip can be significantly higher than the FLOPS using similar hardware resources.
The argument seems to be predicated on the cost associated with NaN-handling. The exposition comes across as somewhat arrogant, IMO:
> If a programmer finds the need for NaN values, it indicates the program is not yet finished, and the use of valids should be invoked as a sort of numerical debugging environment to find and eliminate possible sources of such outputs.
Meanwhile, from the article where people actually tried implementing this thing on an FPGA:
> They also found that the improved accuracy didn’t come at the cost of computation time, only a somewhat increased chip area and power consumption.
Posits have a variable-size mantissa, and the largest mantissa for a given bit size (e.g., 32 bits) is larger than the mantissa in the corresponding IEEE float. For example, in a 32-bit float the IEEE mantissa is 23 bits, while the maximum-size posit mantissa is 27 bits. So a posit FPU requires more mantissa bits than the IEEE FPU of the same bit size, and this is why the posit FPU will have a larger silicon footprint. IEEE needs a bit more silicon to manage all the special cases of IEEE logic, but apparently this is less than the extra transistors required for the posit mantissa. (The extra cost for the mantissa is related to the better accuracy reported for posits.)
Gustafson's paper is old, and doesn't reflect the language of the recent Posit standard. He may have had a different silicon implementation in mind than what has been implemented. Gustafson says there is no NaN. But in the Posit standard, there is a single NaN-like value called NaR (Not a Real). In IEEE, 0/0 is NaN, while in Posit, 0/0 is NaR. The rules for NaN and NaR are different, so they have different names.
It's possible Gustafson's paper doesn't consider the scaling as you get to wider types, or that the presence of NaR removes enough of the savings from killing off NaN that it's a wash.
I think Gustafson may be the kind of liar that doesn't know he is lying. He knows saying that his design improves upon power and die size sounds good, so he says it.
Nope, I have seen it tested on an FPGA many years ago. The reference was Berkeley hardfloat. This might not be the best implementation ever, but it is the most accessible open-source one.
Where do you save transistors? With posits you have to be able to deal with a larger mantissa, and multiplier size scales with the square of the mantissa bits, so even a small increase makes quite a mark.
Maybe you save a bit by not having denormals, but then parsing the packed float is a bit more complicated in that the bits do not have a fixed division between exponent and mantissa.
It is possible that the Posit circuit was smaller by leaving out some feature, like exact rounding, which is quite expensive, but then it is not an apples-to-apples comparison.
You also save transistors by not dealing with NaNs and Infs. You save some more because your comparison operations just use signed-integer comparison, so you don't need hardware to do those. My guess is that when you combine these effects it could be a net win for 16 bit. Also, bigger multipliers might be slightly sub-quadratic by using Karatsuba or similar.
That's a really interesting idea. On the other hand, it does add quite a lot of complexity to something that is quite simple. And it raises some awkward questions, like how you represent NaN for unsigned integers; and if you don't, then you have the awkward fact that there's one more unsigned number than signed.
I can imagine an alternative timeline where it worked like that though.
+1. I do that sometimes: have INT_NAN map to INT_MIN. It is of limited practical use without hardware support such that, as with FLOAT_NAN, any operation involving it results in NAN. Still, I find that it improves my code readability, so a small positive gain IMHO.
Posits still have NaR, which is a single NaN-like value. The problem with floats is that they have -0, -inf, +inf, and a bunch of NaN values. float16, for example, wastes over 1,000 of the 65,536 numbers it can represent on non-finite numbers.
On signed integers, negating 0x80000000 returns 0x80000000 itself, since there's no positive value opposite to this negative one, so 0x80000000 as NaN would make sense; but alternatively, 0x80000000 as a projective infinity would also make sense.
Of course nothing for free, but the point is to trade things you don't need for things you do.
I also expect that chip area and power consumption for IEEE floating point has been optimized quite a bit more than for this novel number format, so there may be some efficiency left on the table on those particular metrics.
A posit is a compressed floating-point encoding with a flexible layout that trades off precision (the number of bits after the point) against magnitude (the absolute value).
It uses more hardware because the posit is decoded into a floating-point value larger than an equivalent IEEE 754 one with the same number of bits. So the logic units need to be larger.
>"With their new hardware implementation, which was synthesized in a field-programmable gate array (FPGA), the Complutense team was able to compare computations done using 32-bit floats and 32-bit posits side by side. They assessed their accuracy by comparing them to results using the much more accurate but computationally costly 64-bit floating-point format.
Posits showed an astounding four-order-of-magnitude improvement in the accuracy of matrix multiplication"
This is nothing short of amazing!
I can just imagine future GPUs on FPGA's using Posits rather than (now old!) IEEE floating point formats...
(Also, on a probably unrelated note -- might Posits be where we get the future equivalent of Star Trek TNG's character Mr. Data's Positronic (Posit-ronic!) -- AI "brain" from? ???)
One particularly stark 'issue' I've noticed with floating point values in AI (which posits would seem to help with) is the difference between 0 and 1. In binary classification problems we generally assign 1 to one subset of our data, and 0 to its complement - there is not necessarily an inherent asymmetry here and the problem will often not meaningfully change if you switch labels 0<->1.
But... if we take for example float16, the smallest model prediction you can represent greater than 0 is 2^-24 ≈ 5.96×10^-8, but the largest value you can represent _less_ than 1 is (binary) 0.11111111111 = 1 - 2^-11 ≈ 1 - 4.88×10^-4. So values around 1 are about 4 orders of magnitude more quantized than those around 0. I don't know if that's necessarily a 'problem', but I have noticed this fact when looking at model predictions.
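For anyone who wants to check those two numbers, a quick sketch with NumPy's float16 (nothing posit-specific, just the IEEE half-precision format):

```python
import numpy as np

tiny = np.float16(2.0 ** -24)               # smallest positive (subnormal) float16
below_one = np.float16(1.0 - 2.0 ** -11)    # largest float16 below 1.0
print(tiny)                                  # ~5.96e-08
print(1.0 - float(below_one))                # ~0.000488: the gap just below 1.0
print(np.float16(1.0 - 2.0 ** -12) == np.float16(1.0))   # True: rounds up to 1.0
```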
That is fundamentally the same for Posits. If you want constant quantization you should use fixed point, or you could switch your inputs to be +1/-1 for symmetry in this specific case, but I doubt that this is much of a practical issue.
I remember going to an AI meetup where the guy presenting was proposing a totally new computing architecture.
I quickly suspected he wasn't very competent about how computers worked, and tried to probe a bit. I suspected he was mentally ill but could "talk the talk" enough to convince people his ideas had merit.
But this: A more efficient way of handling floating-point numbers, is probably something worthwhile. I wonder how hard it will be to recompile existing software to take advantage of posits?
Well, art is art. It doesn't matter who's making it, what their background is, or if they follow preconceived rules.
That doesn't apply to engineering because it either works or it doesn't. The presenter misunderstood basic concepts of information. If you don't understand those, you can't design a working computer.
Today, both integer addition and multiplication are IMHO done at similar speeds in silicon, although multiplication circuits are way larger.
In the logarithmic representation, multiplication becomes addition. I don't think that just by doing everything in logarithms you really increase the overall complexity; rather, you transfer it from multiplication onto addition, so I don't think adding logarithms should be more expensive than integer multiplication.
Should it then matter for neural networks, which IIRC require a similar number of additions and multiplications?
Even floating point is actually trading off the simplicity of addition in order to make multiplication easier, because it is a quasi-logarithmic representation.
However, I wonder if there is a numeric representation where the circuits for addition and multiplication are of similar complexity. Something like a half-logarithm which, if applied twice, gives the logarithm.
Addition in logarithmic space is expensive; we don't have any neat ways of doing it. Options include:
* Converting back and forth to linear space using exp and log functions; this is much slower than a regular multiplication.
* Evaluating a polynomial that approximates the logarithmic-space add; this takes several multiplications, so it is also much slower.
In general we tend to use more additions than multiplications, so trading addition speed for multiplication speed is rarely a good idea, even 1 for 1. If we need a lot of exponentiation keeping some values in logarithmic space may be beneficial, but it has to be those rare cases only.
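A toy sketch of the trade-off (assuming base-2 logs and ordinary Python floats; real LNS hardware would use fixed-point plus a lookup table or polynomial for the correction term):

```python
import math

# In a logarithmic number system a positive value x is stored as lx = log2(x).
def lns_mul(lx: float, ly: float) -> float:
    """Multiplication collapses to an addition of the stored logs."""
    return lx + ly

def lns_add(lx: float, ly: float) -> float:
    """Addition needs the correction term log2(1 + 2^(lo - hi)),
    which is where the hardware cost goes."""
    hi, lo = max(lx, ly), min(lx, ly)
    return hi + math.log2(1.0 + 2.0 ** (lo - hi))

la, lb = math.log2(3.0), math.log2(5.0)
print(2.0 ** lns_mul(la, lb))   # 15.0
print(2.0 ** lns_add(la, lb))   # 8.0
```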
I've done work with big numbers and this format screams problems.
I can see how it would be great depending on the domain, and wish there was more diversity out there in the wild in this area but this seems a little hyped? Maybe I'm missing something. I'm not saying this doesn't seem useful, just that the article seems to be presenting it as a replacement for floats, when it might be better thought of as another option?