The Eisel-Lemire ParseNumberF64 Algorithm

lifthrasiir · on Feb 7, 2021

So decimal-to-float algorithms are following the track of float-to-decimal algorithms:

- Algorithm M (Clinger, 1990): A bignum-based baseline algorithm

- Bellerophon + Algorithm R (Clinger, 1990): An integer-based approximation algorithm; Bellerophon has ~1 ULP error which gets fixed by Algorithm R

- strtod (Gay, 1990): A hybrid integer/float/bignum-based algorithm using modified Bellerophon

- double-conversion (Loitsch et al., 2010): A complete algorithm which verifies an error-bound approximation with correct float-to-decimal algorithms; this "round-tripping" approach was, I believe, first described by Jaffer (2013)

- Eisel-Lemire (2020): An integer-based approximation algorithm with error tracking

By comparison float-to-decimal algorithms went like this:

- Dragon4 (Steele Jr. and White, 1990; improved by Burger and Dybvig, 1996): A bignum-based baseline algorithm, still used when the arbitrary precision is required

- dtoa (Gay, 1990): A hybrid integer/float/bignum-based algorithm using modified Dragon4

- Grisu3 (Loitsch, 2010): An integer-based approximation algorithm with error tracking, can be combined with Dragon4 to produce a complete algorithm

- double-conversion (Loitsch et al., 2010): A hybrid integer/bignum-based algorithm using Grisu3

- Errol3 (Andrysco et al., 2016): The first complete (no fallback) algorithm without bignum; relies on FP with a small number of precomputed exceptions and turned out to be slower than Grisu3

- SwiftDtoa (Swift authors, 2018): Improves upon Errol3 to eliminate any exceptions; later adapts Ryu as well

- Ryu (Adams, 2018): A complete integer-based algorithm with no error tracking required

- (I noticed that there were some more developments around 2018--2020, which I haven't yet fully reviewed.)

I find it surprising that the d-to-f algorithm comparable to Grisu3 only appeared 10 years later. Maybe the round-tripping approach was fast enough that the needs weren't great so far.

aldanor · on Feb 7, 2021

Here's a more recent blog post: [1] and a published paper on arxiv: [2], describing the latest version of the algorithm which is implemented in fast_float C++ library [3].

Coincidentally, I've participated in this as well recently by implementing the algorithm in Rust (fast-float crate, see [4] for the source and benchmarks). It is insanely fast (easily over 1GB/s on an m1 machine) and beats all of the existing float parsers I'm aware of by a large margin, at least on the datasets we've tested it on.

[1] https://lemire.me/blog/2021/01/29/number-parsing-at-a-gigaby...

[2] https://arxiv.org/pdf/2101.11408.pdf

[3] https://github.com/fastfloat/fast_float

[4] https://github.com/aldanor/fast-float-rust/