In implementing Rust's serde_json library, I have dealt with both string-to-double and double-t...

dtolnay • last Thursday at 7:34 AM • 3 replies • view on HN

In implementing Rust's serde_json library, I have dealt with both string-to-double and double-to-string. Of the two, I found string-to-double was more complex.

Unlike formatting, correct parsing involves high precision arithmetic.

Example: the IEEE 754 double closest to the exact value "0.1" is 7205759403792794*2^-56, which has an exact value of A (see below). The next higher IEEE 754 double has an exact value of C (see below). Exactly halfway between these values is B=(A+C)/2.

  A=0.1000000000000000055511151231257827021181583404541015625
  B=0.100000000000000012490009027033011079765856266021728515625
  C=0.10000000000000001942890293094023945741355419158935546875

So for correctness the algorithm needs the ability to distinguish the following extremely close values, because the first is closer to A (must parse to A) whereas the second is closer to C:

  0.1000000000000000124900090270330110797658562660217285156249
  0.1000000000000000124900090270330110797658562660217285156251

The problem of "string-to-double for the special case of strings produced by a good double-to-string algorithm" might be relatively easy compared to double-to-string, but correct string-to-double for arbitrarily big inputs is harder.

Replies

nly • last Thursday at 8:27 AM

I guess one aspect of it is that in really high performance fields where you're taking in lots of stringy real inputs (FIX messages coming from trading venues for example, containing prices and quantities) you would simply parse directly to a fixed point decimal format, and only accept fixed (not scientific) notation inputs. Except for trailing or leading zeros there is no normalisation to be done.

Parsing a decimal ASCII string to a decimal value already optimizes well, because you can scale each digit by it's power of 10 in parallel and just add up the result.

vitaut • last Thursday at 2:25 PM

> Unlike formatting, correct parsing involves high precision arithmetic.

Formatting also requires high precision arithmetic unless you disallow user-specified precision. That's why {fmt} still has an implementation of Dragon4 as a fallback for such silly cases.

andrepd • last Thursday at 11:28 AM

For those wishing to read up on this subject, an excellent starting point is this comprehensive post by one of the main contributors of the fast algorithm currently used in core:

https://old.reddit.com/r/rust/comments/omelz4/making_rust_fl...

alt Hacker News

Replies