Perhaps more pervasive and despairingly problematic is number representation. JSON, per the specification, supports arbitrary-precision numeric representation. However, Javascript is -- also per specification (as I understand it) -- entirely floating point. That makes working with money in Javascript somewhat hazardous. While there are various arbitrary precision libraries out there for Javascript to assuage this, the problem is that most JSON-parsing routines will force one through a floating point conversion anyway, so a loss of precision is more or less inevitable.
While swiftly coercing raw off-the-wire number representations to one's arbitrary precision library of choice can avoid most cases of noticeable accumulated error, it is irksome that the only way everyone seems to get by is by crossing their fingers that any loss of precision caused by "JSON.parse" is meaningless to their application.
Or, the problem can be soundly avoided by using strings in the JSON payload, which is lame but effective and probably one's best practical choice. It is clearly an example of corruption spreading from Javascript to an otherwise reasonable feature of the data representation format JSON.
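A minimal sketch of that string-based approach (the field names are just illustrative):

    // The amount travels as a string, so JSON.parse never pushes it through a double.
    var payload = '{"currency": "USD", "amount": "19.99"}';
    var order = JSON.parse(payload);
    typeof order.amount;  // "string" - hand this to a decimal library, not to arithmetic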
JavaScript uses IEEE 754 doubles for numbers, so as long as you can count on all of your parsers to do that, you can rely on any integer up to 2^53 being represented exactly. For money, work in cents.
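A quick console sketch of both points (2^53 is where exact integers stop, and cents keep the arithmetic integral):

    Math.pow(2, 53);      // 9007199254740992 - every integer up to here is exact
    Math.pow(2, 53) + 1;  // 9007199254740992 - the first integer a double cannot represent
    var priceCents = 1999;            // $19.99 stored as integer cents
    var totalCents = priceCents * 3;  // 5997, exact
    (totalCents / 100).toFixed(2);    // "59.97" - convert to dollars only for display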
The "work in cents" thing usually works for same currency, but not when you involve foreign exchange:
Up until 4-5 years ago, the EUR/USD or GBP/USD or GBP/EUR were quoted with 4 decimals; Nowadays, there are quite a few places where you have to deal with the 5th decimal to make the books balance. It's going to be a roundoff error either way, but it might be a round-off error that keeps been counters, auditors and therefore eventually IT people awake at night.
"Works in cents" can be generalized for an integer with implied decimals. 4 or 5 decimal places gives you plenty of room to work with if you are just using a double to exactly represent an integer value.
That's almost never true using 32 bit math, is increasingly less and less true for 64 bit math (e.g., if you need 5 decimal places, you're left with just 13 digits - that's only trillions!) - but most importantly, a support nightmare - a lot of software written until 2002 or so assumed 4 digits is enough - Microsoft stuff (e.g. Visual Basic, I think other stuff too) had a "money" 4-decimal-digit fixed point type.
And then the market adopts a 5th digit - and you have to essentially rewrite all the math stuff.
If you're using language support (e.g., in VB until 2004 "dim cost as Money" was a declaration that was good for every purpose), then you're out of luck, and you do need a rewrite.
But even if you don't, this is far from trivial: Is the money type you implemented yourself a fixed point type? If so, where is your fixed point? (e.g. the Yen only needs 3 digits after the dot, but trillions in front is not enough; GBP/USD needs 5 after the dot, but trillions usually IS enough).
Is it fixed per currency? Is it a floating decimal point? The abstraction here is far from trivial if you care about performance.
Of course, if you've used a language or a library with a usable decimal type (I'm sure there are others, but I've only been happy with Python's), the abstraction has been taken care of for you.
But generally, abstracting money (value+currency) is not as simple as one would assume, and it is very rare that a production system gets it both right and future proof.
I don't think any of the common business programming languages let you add new types to the numeric tower. C++ does, through operator overloading. Haskell has very nice support for this; to get started all you need is "newtype Money = Money Rational deriving (Show, Eq, Ord, Num, Fractional, Real, RealFrac)" (with GeneralizedNewtypeDeriving). If you later want to make the type more restrictive than Rational, you can do that without changing any calling code (which will work on normal Num instances, most likely).
Rather than arbitrary multiplication, you should probably have an OrderUnits type, then define a function of type Money -> OrderUnits -> Money that calculates the price of multiple units.
(Money 3 * 2 will return Money 6, because the literal 2 becomes Money 2 via fromInteger. But that is not really the operation you want to perform, so you might consider not implementing Num for Money.)
That's faster, but I think it can be a poor trade-off because:
1. The speed gain is negligible for most programs
2. The addition of any pricing that requires fractional cents will require careful work to handle unit conversion to maintain integral representation.
#2 becomes ugly when one has an API many people use, and one must bother them to update their code paths to use the new, higher precision that can handle all money as integers. This is not even counting the case where one tries to cut a corner and someone ends up not doing the conversion they ought to.
Also, programs are often more lucid when operating in the customary units of the domain, such as dollars or fractional dollars. Few domains price everything in cents by preference, because in aggregate it is often dollars -- sometimes many -- that are exchanged. The problem gets worse if one needs weird units like 0.1c to regain integral numbers.
There are ways around these, but falling back on strings seems to me the lowest-maintenance option.
On the other hand, integral representations have few dependencies (just a compliant Javascript interpreter), which is also a pretty big plus.
The latter. Such as big.js (not an endorsement, my understanding is too weak for that).
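For what it's worth, a minimal sketch with big.js (based on its documented API, with Big in scope; treat the details as illustrative):

    0.1 + 0.2;                              // 0.30000000000000004 with plain doubles
    new Big('0.1').plus('0.2').toString();  // "0.3" - exact decimal arithmetic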
Just about every practically used programming environment from every walk of life has such a thing in the standard library, and I suppose they are there for good reasons. However, the fact is that Javascript doesn't have one, so one will have to weigh dependencies into their decision.
Go - math/big
Java - BigDecimal
Python - decimal
Ruby - BigDecimal
PostgreSQL - numeric
Interesting mention: decimal in C#, deemed sufficient for monetary calculations at 128 bits of precision.
Another interesting mention: Oracle, which in my understanding handles just about all its numbers this way, by default. This might tell you something about their early customer base.
CS history lesson -- don't use rationals for money. You have no business messing with the denominator, so the extra freedom of using arbitrary rationals is just more rope to hang yourself with.
Lua does not provide integers either. From their mailing list I have learned that the IEEE rules guarantee that doubles have absolutely no problem representing 32-bit integers exactly. You write the same here. It may not seem intuitive, but IEEE took care of it.
But I wonder why you write that for values in hundredths one should multiply by 100! Wouldn't an IEEE double have no problem with values with two decimal places, too?
Or are there many problems with that, e.g. that $0.01 + $0.01 might already cause rounding errors which do not even out under the clever IEEE rules?
Of course 2^53 wouldn't then be the highest value one could use without worry - it would be something smaller.
1/5 (0.2 in decimal) is 0.00110011... repeating in binary, therefore not exactly representable, therefore it could lead to errors. Not big errors, but enough that smart people would prefer not to use it for money.
To give a concrete example, 19.99 can't be exactly represented in floating-point, and ends up being 19.989something. If you have a price of $19.99 that's represented in floating-point, and if you truncate rather than round as you pass it off for display, you'll end up displaying a price of $19.98. With enough manipulation, the error can accumulate to the extent that rounding the result won't save you either.
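You can reproduce that in any console:

    19.99 === 19.989999999999998436;  // true - they are the same double
    Math.floor(19.99 * 100) / 100;    // 19.98 - truncation shows the wrong price
    Math.round(19.99 * 100) / 100;    // 19.99 - rounding recovers it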
The name cent shouldn't throw you off here as applying only to US currency. Many other currencies have a notional 1/100th unit of the basic one, that usually also exists as a cash value.
Yes, he clearly should have taken into account any possible subunit denomination, including exotic native tribe money units like sea shells, because international programmers reading HN are not expected to think of that on their own.
Such vitriol! Can your currency code handle sea shells? It should! :)
I only mention it because the yen thing surprised me once as a newbie programmer. (A tip: "%0.2f" is no substitute for actual i18n.) It never hurts to be aware of the edge cases.
The yen has no fractional units (anymore), so when I added support for that currency but continued to use the same format string as all the others, prices like ¥1000.00 looked unnatural.
The historical subunit of the yen is the "sen" (銭), which you often see mentioned in prewar books. Postwar inflation made sen-denominated currency obsolete, but...
[A bit of googling also turns up the "rin" (厘), which is a thousandth of a yen...]
If sub-penny pricing is needed, usually myridollars (10^-4) suffice. Your example requires millidollars. I'm curious if there's a natural use case requiring sub-myri resolution.
The idea is that you have income from the loans, which then pays into bonds based on whatever rules were set. These rules often have terms like a fixed interest on the outstanding principal for the senior bond issues, and then various divisions for the junior ones, with a weird "IO" piece that gets the leftovers that don't divide neatly. The rules can be anything they were structured to be. The result, surprisingly frequently, is something where the allocation of the final penny in billions of dollars can be affected by floating point ambiguity. (And the prospectus seldom clarifies this - the ultimate control lies in whatever the servicer's computer program does.)
Do not confuse "unambiguous" with "correct". If you use myridollars and the servicer used floating point with roundoff errors, the servicer's implementation is by definition correct.
All of the interchange formats that I can think of at 1 AM (including FIX and exchange-proprietary formats) are very specific regarding the data. For example, FIX's floating-point fields specifically limit the number of significant figures to 15.
Minor electronic components like surface mount capacitors have tiny prices. I just looked up a model that costs $0.00745.
EDIT: Millage taxes on property are denoted in mills (thousandths). The taxes are often fractional mills, like 2.225. So you need 10^-8 resolution at the least.
The only real solution for high precision numbers in JavaScript are string-based "big num" classes, unfortunately. Sending important numbers (banking, gas prices, etc) as the built-in number type is just setting yourself up for pain.
If I remember correctly, this was a problem with Facebook user IDs. I believe they ended up making the value a string in the API to deal with this (and padding new user IDs by some large power of ten so that developers would hit this code-path early enough to catch it).
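The failure mode is easy to demonstrate with a made-up 17-digit ID:

    JSON.parse('{"id": 10000000000000001}').id;    // 10000000000000000 - last digit silently lost
    JSON.parse('{"id": "10000000000000001"}').id;  // "10000000000000001" - safe as a string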
> However, Javascript is -- also per specification (as I understand it) -- entirely floating point. [...] While there are various arbitrary precision libraries out there for Javascript to assuage this
Interesting! I hadn't seen those before.
That strongly suggests that browsers ought to add some built-in extension types for arbitrary-precision arithmetic, using fast native libraries like GMP. That would then allow such JavaScript libraries to use the native types when available, and their existing pure-JavaScript support when not.
Eventually, sure; I'd love to see some standardized signed/unsigned integers, for that matter. However, in the meantime, a browser could add an extension that existing JavaScript libraries could transparently use when available.
That just seems to be what happens when so many people have to collaborate.
Plus it's still just mega young. Look at legal systems, they've been around forever and it can be your life's work just to understand them at a competency to participate.
The miracle of software isn't that it works, it's that it does anything at all.
That's ridiculous. JSON is replacing XML which is a well-established interchange technology that doesn't have all the problems of JSON (like no good way to represent 64 bit numbers, weak typing, barely any types in the first place, etc).
Yep, it only has strings either way (also, "weak typing" does not mean jack shit, and talking about typing strength for a serialization format makes zero sense)
Schemas are metadata, annotations to tell processors "treat this string as [some other datatype]" (note how it's not going to work if you're not using the schema and a schema-aware processor?)
And guess what? Nothing stops you from doing exactly the same thing in JSON. In fact you don't have much of a choice for the datatypes JSON doesn't natively support (dates being the most common one, but not the only one by any means). And good JSON interfaces provide for embedding transcodings directly in the parsing or dumping (that's what the `reviver` and `replacer` arguments do in JSON.parse and JSON.stringify) for exactly that purpose.
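A minimal sketch of the reviver side (the key names are made up):

    var parsed = JSON.parse(
      '{"createdAt": "2012-05-01T12:00:00Z", "amount": "19.99"}',
      function (key, value) {
        if (key === 'createdAt') { return new Date(value); }
        return value;  // leave "amount" as a string for a decimal library
      }
    );
    parsed.createdAt instanceof Date;  // true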
> PS: I still don't like XML, but your comment is technically incorrect.
Nope. Specific XML dialects may have non-string datatypes (XML-RPC certainly does), but XML only has strings. In the same way CSV only has strings, but specific CSV uses may have more. That's the plain facts of the matter.
That's quite a conclusion to jump to. Maybe you haven't noticed but the Web is running great, especially when developed by thoughtful, competent programmers.
It's sad how much of non-web developers' complaining about the Web is just sour grapes.
It's not that the people working on it haven't been great, it's that there's so much path dependence, it's frustrating. Which is not at all to say that there isn't path dependence in plenty of other domains.
The "uniname" command in the uniutils package (apt-get install uniutils) is great for checking "invisible" or weird Unicode characters. `echo '{"str": "own ed"}' | uniname` will show the LINE SEPARATOR (\u002028) hidden in the string.
> Some libraries implement an unsafe, but fast, JSON parse using “eval” for older browsers
eval is not fast! In fact it is the opposite of fast. Most JIT optimizations go away in the face of eval()! Do not use it even if you know it's safe. Use JSON.parse instead.
This is a good catch, but the presentation is awfully passive aggressive. Yes, if JSON wants to strictly be a subset of JS, then those two characters need to be treated specially - either excluded from JSON, or specially escaped in JSON libraries. The simplest solution, clearly, is to exclude them from JSON. (You can still use those characters, but you have to escape them). There's no compelling reason not to do this - and I doubt that this will cause a problem anyway.
If you can paste these invisible characters into a chat box to break a website that uses eval(), then someone will inevitably do it, simply because it is the internet.
The other reason: if you are using a Turing-complete language to eval your nominally context-free grammar as if it were code, you are an idiot who is asking to be hacked, whether through quirks like this or otherwise. (This also goes for PHP and ERB.) http://www.cs.dartmouth.edu/~sergey/langsec/occupy/
I don't know enough Javascript to give the analogous exploit myself, but that approach looks dangerously similar to some attempts I have seen at making Python's "eval" safe - attempts which, I might add, all fail to capture corner cases that any sufficiently determined or motivated attacker would eventually discover.
First of all, if native JSON parsing exists, jQuery will use that.
The validation code they use in case there is no native JSON implementation available is borrowed from Douglas Crockford's json2.js ( https://github.com/douglascrockford/JSON-js/blob/master/json... ), which was the inspiration for the native JSON implementations and should really be solid by now, both in terms of correctness and in working around regexp weaknesses in some engines.
It's still the Wrong Way™. You want to parse JSON, you write a lexer that recognizes its symbols and a parser that consumes them and spits out the JS equivalent. Otherwise you are playing an eternal game of whack-a-mole with the JS eval parser. No. Just no.
If you can find a case in which the validator fails, it's wrong. Otherwise, if it looks like a JSON parser and works like a JSON parser, i.e. is indistinguishable from a "proper" parser, it makes no sense (beyond satisfying OCD) to rework it.
Note that the code is now only a work-around for older browsers. Every modern browser supports native JSON parsing anyway.
However this has always seemed unclean to me. Does anyone else have a better, alternative method of inlining data? I'd rather not use inline scripts for the exact reason they mention.
Can you give it directly to the JS working on that data by setting a var (var x = <%= @data.to_json %>;) or passing it in as a parameter to an initializing function call (APP.init(<%= @data.to_json %>);)? Edit: 1. I hope it's not user generated data. 2. The output in this case wouldn't need to be JSON syntax, it could be object literal notation, saving space on all those double quotes but I doubt you have a helper method available for that.
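If you do go the inline-script route, here's a minimal sketch of the escaping you'd want before dropping serialized JSON into the page (the function name is made up):

    // Make serialized JSON safe to embed in an inline <script>:
    // escape U+2028/U+2029 (legal in JSON, but line terminators to the JS parser)
    // and "</" so a "</script>" inside the data can't close the tag early.
    function toInlineScriptJson(data) {
      return JSON.stringify(data)
        .replace(/\u2028/g, '\\u2028')
        .replace(/\u2029/g, '\\u2029')
        .replace(/<\//g, '<\\/');
    }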
I feel really dense, but I don't understand why the example line is throwing an error. The article mentions line terminators, but it doesn't seem to contain any, and I also don't understand why "owned" would be escaped the way the author says... it looks as though the interpreter is just rejecting the use of quotes around the key. But I'm sure I'm just missing something, so I'd be much obliged if someone could enlighten me.
There's an invisible-to-the-eye unicode character in the string "owned," if you copy and paste the text from the website.
JSON is fine with these characters, but JavaScript is not.
For plain-jane JSON this is usually fine, since you're not just evaluating the JSON as JavaScript, but are running the returned data through a JSON parser. A properly-designed JSON parser will correctly handle any JSON-valid-but-JavaScript-invalid characters.
JSONP, however, works differently and will use the JavaScript parser. Womp womp.
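A quick console demonstration of the difference (this is the pre-ES2019 behavior, which is what bites here):

    var json = '"own\u2028ed"';  // a raw U+2028 LINE SEPARATOR between "own" and "ed"
    JSON.parse(json);            // fine - unescaped U+2028 is legal inside a JSON string
    eval('(' + json + ')');      // SyntaxError - U+2028 ends the line inside the string literal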
The blog post also lists two other cases, although the first case -- parsing JSON using eval -- is both insecure and incorrect. I haven't seen people do that in ages and ages.
That makes sense. I was confused because I was also getting the error when I typed the line into the interpreter by hand. The issue being that I forgot to assign it to a variable (facepalm). Thanks for the explanation.
The line terminator is there, it just doesn't render as anything visible. (It's inserted directly as a unicode character, rather than its HTML-escaped equivalent, so you won't even see it if you view source!)
However, if you take a hex dump of the page, it becomes quite apparent:
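(Equivalently, from a JS console - U+2028 is the three bytes E2 80 A8 in UTF-8:)

    encodeURIComponent('own\u2028ed');  // "own%E2%80%A8ed" - the LINE SEPARATOR's UTF-8 bytes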
One more thing that really bugs me about JSON is that it doesn't support Infinity, -Infinity and NaN. Python's JSON library does, which leads to some interesting breakages.
Sure, for NaN you can use null, but for Infinity, you have to use really large/small numbers, which can also lead to other problems.
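Concretely, JSON.stringify silently degrades these values, and JSON.parse refuses what Python's json.dumps will happily emit by default:

    JSON.stringify({ price: NaN, cap: Infinity });  // '{"price":null,"cap":null}'
    JSON.parse('{"price": NaN}');                   // SyntaxError - NaN is not valid JSON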
Another problem is unicode characters that are not in the basic multilingual plane. JSON supports those, but many Javascript environments (like basically all browsers) do not.
The other thing is that JSON scoffs at trailing commas (both in object and array literals), while JavaScript engines are perfectly happy to accept them.
IE8 (maybe IE7 as well, not sure anymore) will accept trailing commas in objects. All IE versions will also "accept" trailing commas in arrays, but they will create a final `undefined` element.
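A quick illustration of both halves of that:

    JSON.parse('[1, 2, ]');   // SyntaxError in a conforming JSON parser
    eval('[1, 2, ]').length;  // 2 in modern engines; old IE counted 3, with a trailing undefined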