It's fascinating... the Y2K problem never materialized, arguably because of the immense effort put in behind the scenes by people who understood what might have happened if they hadn't. The end result is that the entire class of problems is now overlooked, because people see it as having been a fuss over nothing.
I sometimes think it would've been better if a few things had visibly failed in January 2000.
If you were watching closely and knew what to look for in the first couple of months of 2000, the failures were there. But they were generally minor and easy to overlook as Y2K problems.
I spotted something like half a dozen failures in various systems I interact with which I strongly suspected, based on the timing, were Y2K problems that slipped through testing. For example, I received duplicate bills for one of my credit cards for the January 2000 billing period, and then a subsequent apology for the duplicates. They never said Y2K, but the timing was very suggestive.
It's pretty much exactly what I expected from most companies... the big stuff had largely been dealt with, but a few things slipped through which they could dismiss with some hand-waving. The thing that surprised me was that there didn't really seem to be any high-profile disasters at all (like a company that couldn't ship products, an airline that couldn't issue tickets, or whatever)... I figured there'd be at least a couple.
Having spent a year hardening financial systems against Y2K, I was very unimpressed to discover my credit card not working on 1st of January.
The call centre staff told me that this wasn't a Y2K bug, but a year-end bug. As if that was meant to make me feel better about an obvious, grim failure.
> in the first couple of months of 2000, the failures were there.
They were there before then too. Things that could go wrong at midnight on NYE were only one of several classes of problem associated with roll-overs. There were a lot of bugs in things like scheduling applications (and similar system tools) in the run-up to 2000 that the man on the street didn't associate with the Y2K issue, because they didn't happen at that exact moment.
getYear() returns the year offset from 1900; if you use getFullYear() you'll get 2017. Though, given all the additions to ES201*, it would be nice if all the timezone data were in the browsers, so one could get moment-timezone without the relatively large data files.
.getYear() and .setYear() are deprecated in all recent standards [IIRC they were before 2000, I certainly remember them being recommended against as far back as that].
Similar to how much effort goes into dealing with things like dangerous strains of bird flu, only to have people complain about how much money was spent on "nothing" when an outbreak doesn't occur.
This is a whole class of problem - I wonder if there's a name for it. More examples include: talking about welfare being unnecessary because no one is starving. Or people on medications stopping because they feel better (while still on them).
This makes me think of a hypothetical scenario proposed in one of Nassim Nicholas Taleb's books, wherein some senator gets a bill passed in the weeks before 9/11 requiring reinforced doors on all commercial airplane cockpits.
No parade would be thrown for this senator for having prevented 9/11 and likely he'd be castigated for having given airlines an excuse to raise prices due to restrictive government regulations.
That example (while I was reading the book) kind of stuck with me. I think of it almost as preventative maintenance or good software dev practices: you'll get crap from others for 'wasting' time on it, but they don't realise it might just save your ass when you need it most.
That's a very pervasive bias we always hold: praising a player who falls behind and then achieves something, while not paying much attention to one who does it normally. Chaos gets attention!
This reminds me of an essential but overlooked truth examined in this paper: "Nobody Ever Gets Credit for Fixing Problems that Never Happened". It builds a simple system dynamics model and shows the long-term effects of working smarter versus working harder: http://web.mit.edu/nelsonr/www/Repenning=Sterman_CMR_su01_.p...
The availability heuristic is a mental shortcut that relies on immediate examples that come to a given person's mind when evaluating a specific topic, concept, method or decision. The availability heuristic operates on the notion that if something can be recalled, it must be important, or at least more important than alternative solutions which are not as readily recalled. Subsequently, under the availability heuristic, people tend to heavily weigh their judgments toward more recent information, making new opinions biased toward that latest news.
Dandruff is generally caused by fungi living on the scalp. The fungi eat up your skin and make it dry. The dry skin flakes off.
Dandruff shampoos work by killing the fungi. There are three main types of anti-fungals: zinc pyrithione is used in Head and Shoulders, selenium disulfide in Selsun Blue, and there is a third one, not commonly used, called ketoconazole that you may want to try.
There is also coal tar shampoo, which is not an anti-fungal, instead it slows skin cell growth and sloughs off the dead skin.
Selsun Blue works well for me; I use it daily. If I go off it for a few days the dandruff kicks into high gear.
The thing about dandruff shampoos is that you have to use them with a certain regularity, because even if you kill the fungi, the condition takes a few days to clear up; the damage to your scalp is already done. You need to create a hostile environment for the microorganisms, and that takes time.
Your dandruff is probably caused by only one type of organism that thrives in particular climates. Next time you're there, you might get ahold of two or three shampoos and try them for a week each and see if it clears up. Then you know which is 'your' anti-fungal.
Not merely an insightful repository of analysis and explanations around the more profound aspects of our world, John Ralston Saul's exquisite Doubter's Companion includes some answers - acerbic accusations bundled free - to the more pedestrian issues:
"DANDRUFF: The ANSWER is usually vinegar. To some problems there are solutions.
"What we call dandruff is often the result of a PH imbalance on the skin, which shampoo exacerbates. Wash your hair with a simple non-detergent shampoo, soap, olive oil, beer, almost anything. Rinse. Then close your eyes and pour on some vinegar. The extremely cheap but natural sort—apple cider, for example—is probably best. The smell will stimulate interesting conversations in changing-room showers and your explanation will win you friends. Wait thirty to sixty seconds. Rinse it off. The smell will go away. So will your dandruff.
"All dermatologists, pharmacists and pharmaceutical companies know this simple secret. They don’t tell you because they make money by converting dandruff into a complex medical and social problem. By most professional standards this would amount to legally defined incompetence or misrepresentation.
"Dandruff shampoos that promise to keep your shoulders and even your head clean are harsh detergents and may promote baldness, which ought to constitute malpractice."
Actually, I think I'm mostly annoyed by your use of three pejoratives to try to make one point, and the fact you spelled pseudo wrong just irritates some other part of my brain. :)
Sure, dandruff and 'dry skin flakes' are separate things entirely.
Dandruff is generally regarded to be caused by the fungus Malassezia. This fungus does not enjoy an acidic environment.
(Healthy) human skin tends towards pH 4-5 (there seems to be some debate). Malassezia likes 5-8 (again, some debate). Reducing the pH / increasing the acidity would appear to have some non-pseudo, non-alternative-medicine, non-falsehood foundation.
Wikipedia says one cause of dandruff is a yeast-like fungus. Perhaps pH plays a role in that. Also, vinegar has cleansing properties and may be less drying than detergent, and dry skin is another cause of dandruff.
The word "vinegar" is notably absent from that link. That solely shows that harsh soaps are harsh (which as you may have noticed is axiomatic), not that vinegar isn't.
When I typed the word "may" I meant it. If it's medically established that properly diluted vinegar dries the skin I'd be interested in knowing that. Otherwise, it seems like reasonable speculation to me rather than pseudo-scientific BS.
I suffer from dandruff and tried a bunch of shampoos, but they didn't work well. I saw this tip about vinegar, which I tried for a single day before giving up, thinking it might just be a popular saying. I'll try again and report back.
When you say 'this could be popular saying', do you mean that it may be wrong because it's merely a popular saying?
I'd be curious to know your results, in any case. I've had mixed outcomes from conventional (commercial shampoos) and have also been trying to identify causal factors (worse in winter, especially after a few days of wearing a beanie, worse when I'm staying near a high-pollution area, etc). It's all anecdata, but OTOH not beyond our abilities to thoughtfully analyse.
This unconscious bias went unnoticed by me (as expected, by definition). Thanks for pointing it out :)
So far, my experiments with other shampoos were conducted a bit ad hoc. They're all anecdata. When I try with vinegar, I should try to rule out other causal factors first; I probably didn't do that in my first attempt. Otherwise, we can never be sure.
To be honest, I only went to see a single dermatologist, and my experience was bad. The dermatologist kind of glossed over it and shrugged. I received a shampoo recommendation, but the dermatologist didn't say we should monitor the treatment or anything. From what I saw, she would keep recommending shampoos and I would keep trying until I found one that worked. So I decided to try by myself.
Not my wisest moment. I should probably have seen another dermatologist.
Great book and great advice. Must be about 20 years since I tried this approach. Count me as a data point strongly supporting the use of vinegar to deal with dandruff.
I think this is actually a version of confirmation bias since
> Confirmation bias, also known as Observational selection or The enumeration of
> favorable circumstances is the tendency for people to (consciously or
> unconsciously) seek out information that conforms to their pre-existing view
> points, and subsequently ignore information that goes against them, both
> positive and negative [0]
I think it's selection bias because you sample from the set of perceived problems which are the ones that actually occurred relative to the ones that could have happened.
The opposite of that is frustrating too.. I see it a lot in SF. Something like "We spend $100M/year on homelessness and we have 3,500 homeless. Why don't we just give them the money and do away with the social services!" As if the $100M isn't keeping many thousands more from living on the streets..
Yes, Daniel Kahneman describes this kind of cognitive bias as "what you see is all there is". A simple example given in his book* is to ask the members of any couple (or roommates) what percentage of the household duties each performs. The sum is always above 100%.
*"Thinking, Fast and Slow", the best recommendation I got from the HN crowd
It's a form of (inverse) Survivorship Bias. "None survived, therefore none started out" is an (inverse) extension of "The ones that survived were the ones that started out".
As a former Y2K project manager, I had one thing go BOOM. But it didn't go boom on 1/1/2000; rather on 2/29/2000. Our Y2K program had, IIRC, some 9 likely fail dates including 1/1/2000. The component in question was provided globally and untestable locally, so the best I could do was acquire a copy of the remote site's complete test script and call it good.
I was thrilled that we had something to point to as a "see, this is why we put in so much work". Prior to that we had received lots of criticism about the amounts spent, people hired, blah blah blah.
"Wait, we spent two million dollars servicing your fleet of 100 cars in the last year, and you're complaining that none of them broke down? Why spend all that money on maintenance when they just keep running?"
Are you sure those numbers are right? That's 20k per car per year. At such a cost, running the cars with no maintenance and buying a new car after every breakdown would be cheaper.
They were intentionally inflated - however I probably had Australian pricing in mind which is probably about 50% more expensive than the US ones. Considering the audience I guess I should have said 150 cars instead. :)
A lot of money was spent fixing the Y2K issue. Can't exactly recall how much time I spent myself, but it was a dominant factor as far as IT projects went back then.
Other lawyer: Jury, they saved millions of dollars per year not checking the brakes on their cars, you should award those millions of dollars, and some more millions of dollars to the families whose children died that day.
Maybe it's because of how young I was at the time, but I remember media outlets reporting on the Y2K problem, and showing video of hundreds of older computing devices that were being retired (thrown in landfills) because they wouldn't work properly after the date change happened.
Maybe this isn't a "real" failure, but rather a symptom of some IT departments working diligently to solve the problem before it happened. In any case, I'm curious how inaccurate the televised reality from my youth actually was.
In short and very simple terms, some software stored the date as DD/MM/YY.
It would assume that the year was always prefixed with 19. The problem came when you reached /00: any calculations or software decisions based on that date would be off. Way off.
Some solutions were expanding the date to YYYY, or adding a prefix to new dates: dates without a prefix would be 19xx, dates with prefix X were 19xx, and dates with prefix Y were 20xx.
Another solution was to set the prefix for 2-digit years on a sliding scale, so they are interpreted as a date within a specific 100-year period (date windowing). For example, see the 2029 rule: https://support.microsoft.com/en-us/help/214391/how-excel-wo...
This turned out to be one of the most cost-effective methods of fixing the problem, and was probably one of the most likely to be implemented. This was especially the case in situations involving software which ran closer to the hardware (for example, BIOS or firmware) or on systems where RAM or storage couldn't be increased and/or the change might increase the software's requirements beyond the system's capabilities.
There were a couple of computers in my shop that had a Y2K BIOS issue. Although the BIOS is a sort of software in the form of firmware, many people considered this presentation of a Y2K error a hardware problem.
So, while a Y2K compliance program dealt mostly with software, a complete program went through and tested all hardware in inventory.
The issue was that this was a closed-source problem; otherwise you could easily add an option to switch between 19 and 20 for calculations, or run a pre-2000 instance of the app alongside a post-2000 one.
The problem was that media reports at the time seemed to assume that anything with a computer would completely blow up, instead of rationally thinking through much more benign consequences of a date or calendar being wrong.
The image of a bunch of perfectly capable computers being discarded for no reason describes this media frenzy wonderfully.
> the Y2K problem never came to fruition because - arguably - of the immense effort put in behind the scenes by people who understood what might have happened if they hadn't.
Except many countries that spent no money on upgrading their systems had very few problems too.
My dad was a programmer in the early days. The machines he started on in the 1960s had 8 KB of RAM. Saving a byte then is the equivalent today of saving 1 MB on an 8 GB machine.
Multiply that times, say, the thousands of customer orders you're trying to process and the goofy thing would be burning a lot of additional RAM because it might help somebody 35 years later. Who among us is writing code today worried about how it will be used in 2052?
>Who among us is writing code today worried about how it will be used in 2052?
This decade I knowingly wrote code that will break in 2036 [1]. My supervisor was against investing the time to do it future-proof (he will be retired by 2036), and I have good reason to believe the code will still be around by that time. I don't think I'm the only programmer in that position.
- This decade I knowingly wrote code that will break in 2038
Sure, but how bad was it really? Something you could fix relatively quickly with a little time and money, or an instance of Lovecraftian horror unleashed upon the world like so much COBOL code?
Even then, mix in people switching jobs, losing the knowledge of where or what all these landmines are, and add similar but unrelated issues across your entire codebase. This stuff adds up. I like to at least add stern log messages when we are at 10%-50% of the limitation. It's saved my ass before, especially when your base assumption can be faulty.
In one of those scenarios, where we expected the growth of an integer to last at least 100 years, due to certain unaccounted for pathological behaviors, a user burned through 20% of that in a single day. But we had heavy warnings around this, so we were able to address the problem before it escalated.
Last time this came up, I ran the numbers and the cost of the RAM saved per date stored was hundreds of dollars. Not per computer, or per program, but per date. Comparing total memory sizes doesn't tell the whole story, because RAM for a whole machine is so much cheaper now.
Spending that much money on storing "19" just so your code keeps working in the unlikely event that it's still in use 3+ decades into the future isn't a good tradeoff. Obviously things are different now.
Excellent point. Yeah, the machines my dad started on had magnetic core memory [1]. Each bit was a little metal donut with hand-wound wiring.
And in some ways, even "hundreds of dollars per date" doesn't quite convey it. These machines were rare and fiendishly expensive. In 2017 dollars, they started at $1M and went up rapidly from there. Getting more memory wasn't a quick trip to Fry's; even if you could afford it and it was technically possible, it was a long negotiation and industrial installation.
Another constraint that we forget about is the physicality of storage. Every 80 columns was a whole new punch card. That's a really big incentive to keep your records under 80 characters. Each one of those characters took time to punch. Each new card required physical motion to write and read, and space to store.
It really was a different world. I think a lot of programmers don't understand just how different it was (I barely do), and don't realize that modern principles like programmer time being more expensive than computer time are not universal truths about computing, but are just observations of how things are in recent decades.
The interesting thing about this from an engineering point of view is, you quietly pass a threshold where the clever hack which was worthwhile becomes literally more trouble than it is worth. When that happens is a multivariate problem that we couldn't truly predict at the time of the code's creation. (and when it happens, there might not even be anyone on the payroll thinking about it)
You're calculating what it would cost to store a string representation of a date. Which is silly. You should always convert to a timestamp for storage. You can cram way more info into a single integer than you can with a base 10 string. And the bonus is you verify the date's correctness before storing.
Even a 32-bit int could hold 11 million years worth of dates. And if your software is used for longer than that, you can just change it to a 64-bit long and have software that will outlast the sun.
Silly or not, that's the reality of punch-card based technology (BCDIC, later extended to EBCDIC). Punch cards pre-date electronic computers, and making a relay tabulator set-up work with binary formats is impractical.
As computer hardware grew out of that, it maintained much of the legacy, down to hardware data paths and specialized processor instructions. It was more than a programming convention.
That was the right choice for the era. As mikeash points out, your approach takes more bits and more CPU cycles. But it also takes a computer to decode. Any programmer can look at a punch card, a hex dump, or even blinkenlights and read BCD. Decoding a 32-bit int for the date takes special code. Which you have to make sure to manually include in your program, the size of which you are already struggling to keep under machine limits.
Systems from this era were probably using BCD rather than base-10 strings. A BCD date would take up 24 bits.
Running a complicated date routine to convert to/from 32-bit timestamps would also have cost a huge amount. These machines had speeds measured in thousands of operations per second, and the division operations needed to do that sort of date work would take ages, relatively speaking. All on a machine that cost dozens of times the average yearly wage at the time, and accordingly needed to get as much work done as possible in order to earn its keep.
Sometimes this worry is thrust upon you by the problem domain. I do remember tackling the Y2K38 problem in 2008 - the business logic dictated that the expiration date should be tracked, and some of them were set to 30 years.
But a 2 digit date should take less than 7 bits. Were they using systems that didn't use 8 bit bytes? Why wouldn't the dates work from, say 1900 to 2155?
Back in the day it was probably 7 bits, but the word size is not that important. The problem still exists today with a modern 64-bit computer:
Even if a system internally can store a timestamp with nanosecond precision since the beginning of the universe, all that precision is lost when communicating with another system if it must send the timestamp as a six-character string formatted as "yymmdd" in ASCII.
My understanding is that the actual number of bits used would generally have been 4 bits per digit, as they were using Binary Coded Decimal [1]. So dropping the 19 would save you a byte per date.
Sure, they could have used a custom encoding. But that increases maintenance cost and extra development work. All to solve a problem that nobody cared about at the time.
You are assuming 8 bits per byte, but a byte can be any number of bits.
With two bytes of 7 bits each, the range is only about 40 years.
It is also impractical when the storage medium is punch cards and the system's adder unit only counts in binary coded decimal.
But then you need special code to decode that. Code that you have to write yourself or borrow and include in your program. Remember, no shared libraries. And it means extra CPU time you have to display a date. Whereas BCD has special hardware support.
It means that data interchange is now much more complicated too. How do you get everybody to agree on the same 2-byte representation for dates? This is the 1960s, so you can't just email them. You have to have somebody type up a letter and mail it. Or if you want to get on the phone, a 3-minute international call will cost $12, which is about $100 in 2017 dollars.
Plus then you can't look at a hex dump or a punch card or front panel lights and see the date, so now you've made debugging much harder.
For example, some systems stored the year in a byte, and when printing out a report it printed "19" and that byte - so year 1999 would be followed by year 19100.
Some systems, where storing numbers in columns of characters was common practice (COBOL idiomatic style?), stored the date as two digits (possibly BCD), so the possible range is 00-99 no matter how many bits are used.
But it's worse than that. In the 90s a lot of code used 16-bit values: character strings. That is, it stored a char(2), parsed it as a 2-digit number and then converted it to a date by adding 1900.
So it was only really "saving space" when compared with storing a char(4).
But if they wanted to save space, why not store an 8-bit number? I imagine it must have something to do with punch-card compat or some binary-coded-decimal nonsense. Still seems inefficient.
If a system gives you two options for storing a date (using 2-digit or 4-digit years), how many dates do you need to store and use in calculations before you end up saving space by creating a new data type and all of the supporting operations to make the storage of the date itself more efficient? In recent years, it's more common to make this type of decision because something else is causing an issue, otherwise we rarely consider the space required for a date (and many languages no longer have a separate type for dates).
I doubt that's true, unless you mean it tautologically.
There are plenty of good programmers working on software that matters that should absolutely not be trading off hazy possible benefits in 2052 for significant costs now.
It's occasionally necessary. When I wrote the code for Long Bets [1], I took a number of prudent steps to make sure things would have a good shot at surviving for decades. But I only took the cheap ones; the important thing was to ship on time.
And I think that's the right choice for most people. Technological change has slowed down some, but 35 years is still an incredible amount of volatility. Betting a lot of money on your theories of what will be beneficial then is very risky.
> There are plenty of good programmers working on software that matters that should absolutely not be trading off hazy possible benefits in 2052 for significant costs now.
I guess it's not obvious, but I think there's really a continuum here. You don't necessarily need to write software that will run perfectly in 2052, but it'd be good if you wrote software that can be comprehended, adapted and altered later on. Maintainability is never a "hazy benefit." (If the problem isn't a total throwaway.)
Sure. Maintainability pays off relatively soon, and often makes systems simpler and cheaper to operate. But the topic in this sub-thread is the Y2K bug, where the proposed solution would have been expensive and provided no benefit for 35 years. And at the time, those benefits would have been very hazy.
I don't know, I think the attitudes that make you a good programmer mean you won't be satisfied leaving broken code in your product, no matter how far out the consequences are.
It's definitely true. Technological change in the 60s was enormous. During that period there was a lot of fundamental architectural change and experimentation; that's when they settled on 8-bit bytes as the standard, along with many other things. Moore's Law became a thing. The 70s is when we started seeing operating systems that look familiar to us, and even into the 80s it was plausible to introduce a new OS from scratch (see, e.g., NeXT, or Be).
The iPhone is a decade old; every phone now looks like it, and it's highly plausible that they'll look basically the same a decade from now, possibly much longer. Laptops are 30 years old; they've gotten cheaper, faster, and better, but are recognizably the same. HTML is coming up on 30, and it will be in use until long after I'm dead. TCP is nearly 40; Ethernet is over 40; even Wifi is 20.
So it's just easier now to guess what programmers will be doing in 35 years compared to 1965.
As someone who just wrote a quick hack for a temporary problem, I agree.
It's not just the shitty programmers who do this. Sometimes we have shitty product managers who won't push back against this kind of thing. And you're forced into creating something evil, because most of the job is very good but this one time you have to suck it up.
My response to that, though I agree with you, is that when a supervisor or PM or whoever gets on you about something you know is bad, you negotiate.
"Yes, I'll do this for now because the company needs it now. But only if you guarantee me the time (and possibly people) to do it right later."
You get agreement in an email, create the ticket and assign it to yourself as mustfix two months from now. And you shove it down their throats.
That's not an ideal place to work if you have to do that, but I have worked at those places and this is how you deal with that situation.
"Yeah, I'll give you a shit solution in 1 day right now. But only if you give me a couple of weeks for a good solution later."
In reality, I've mostly only had to deal with this situation in startups. Mid-level and mature companies are usually open to pushing back and getting things right. But there are exceptions. Today was an exception. But that's also one of the reasons I don't really want to work at startups anymore.
Shitty solutions are usually the right answer. At least in the areas I work in (mostly startups). I would estimate 99% of the code I write gets thrown away. Most of it is trying something out. Even for code that was intended to hit production, the company/project often gets cancelled before it ever hits production.
I'm not saying this is true in your case. But there are so many different classes and types of programmers and projects that it's hard to generalize.
99% of your shit code isn't getting thrown away. It's sticking around making life hell for people like me.
Stop writing shit code because it's going to get thrown away. If you work for startups, you are always operating in protoduction mode. Everything you write ends up in prod.
Write code that doesn't suck. It doesn't have to be perfect or optimal, but make it not suck before you push.
Hmm no. That's what's happening in your world, but you're imposing that world view on me.
Probably about 80% of the code I write doesn't even get looked at or used by another developer. If the technique/analysis proves useful, it gets rewritten/refactored. That has the added advantage that I then understand the model better.
For me there's a giant difference between code that lasts, which needs to be sustainable, and disposable code, which doesn't. I'm also very big on YAGNI; my code gets so much cleaner and more maintainable when I'm only solving problems that are at hand or reasonably close. Speculative building for the future can get insanely expensive: there are many possible futures, but we only end up living in one.
Indeed, I think a "do it right" tendency can prevent people from really doing it right. If we invest in the wrong sorts of rightness up front, we can create code bases that are too heavy or rigid to meet the inevitable changes. So then people are forced into different sorts of wrongness, working around the old architecture rather than cleaning it up.
Good for you. That's my approach, too. And to rig the system such that technical debt gets cleaned up continuously and gradually without the product managers knowing the details.
When there are real business reasons to rush something, I'm glad to support that by splitting the work like you suggest. But the flipside is them recognizing that not every thing is an emergency, and that most of the time we have to do it right if they don't want to get bogged down.
Well, yeah, I absolutely agree. Replaceability and maintainability go hand in hand in a system. It's a cruel irony that the code that sticks around, often sticks around because it's crap.
(that doesn't stop me from sometimes having a weird admiration for incomprehensible software kept going forever with weird hacks. It's like with movies, sometimes they're so uniquely awful that you have to admire the art of them)
In the 80s I questioned the practice of using two bytes for the date. I was laughed at by the experienced programmers. They said the software would be rewritten by then. It should have been, but it wasn't...
But there is a trade off between how much time you spend today vs future compatibility.
My dad's first programming job was initially to mechanically change how variables were stored, saving exactly one byte of disk space each time. Someone ran the numbers, and having someone do that for a few months was a net savings.
A few years after that he was talking about some relatively minor optimizations that saved a full million dollars worth of hardware costs by delaying a single new computer purchase.
Regarding the point I was speaking to: it's undoubtedly true that some of those hacks were initially worthwhile. Keeping them going until the year 2000 was, by that same standard of cost effectiveness, pretty definitely a visible failure.
> Self-confidence as a programmer is when starting a new project, storing the transaction ID as a long rather than an int...
uint64_t even
Or a UUID as others have suggested.
Technically, the C spec doesn't say exactly how many bits int, long, and long long should be. If you want specific sizes and somewhat portable code, use the explicit-width types (int32_t, uint64_t, and friends) to make that clear. There are also types for size-like things (size_t) and for pointer- and offset-like things (intptr_t, ptrdiff_t).
There's a usecase for lower-bounded types such as int_least32_t, where the compiler may choose a larger type if it offers better performance. However, if you're using that, the test suite should run all relevant tests for multiple actual sizes of that particular type (through strategic use of #define, for example).
> There's a usecase for lower-bounded types such as int_least32_t, where the compiler may choose a larger type if it offers better performance.
If you're looking for the best performances you shouldn't use leastX types, you should use fastX types (e.g. int_fast32_t for the "fastest integer type available in the implementation, that has at least 32 bits").
The difference between "leastX" and "fastX" is that "leastX" is the smallest type in the implementation which has at least X bits. So if the implementation has 16, 32 and 64b ints and is a 32b architecture, least8 would give you a 16b int but fast8 might give you a 32b one.
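The width guarantees under discussion can be poked at from Python, whose ctypes module mirrors the platform's C types (a quick illustration, not a substitute for reading the C standard; there is no ctypes equivalent of the least/fast families):

```python
import ctypes

# Exact-width types have guaranteed sizes on every platform.
assert ctypes.sizeof(ctypes.c_int32) == 4    # always 32 bits
assert ctypes.sizeof(ctypes.c_uint64) == 8   # always 64 bits

# Plain C types only carry minimum guarantees; their real width varies
# with the platform's data model (ILP32, LP64, LLP64, ...).
long_bits = ctypes.sizeof(ctypes.c_long) * 8
int_bits = ctypes.sizeof(ctypes.c_int) * 8
```

On an LP64 Linux box `long_bits` is 64; on Windows (LLP64) it is 32, which is exactly why code that assumes "long is 64 bits" breaks when ported.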
The reason is that anything else mixes with the default types, and you lose the safety battle on every line like:
int32_t x = call_returning_int();
Otherwise you have to assert/recover at each boundary between your types and theirs. C is a language where you have absolutely no guarantee that int, or constants defined as int, will fit into anything beyond int, long, or long long, and then there is UB patiently waiting for your mistake. The way to handle that is to never change or fix types unless you have to, and then be careful when you do.
What he means by that is the old definition of auto, which C++11 deprecated in favour of making it do type deduction instead.
auto foo = func_returning_int(); worked in C, to my knowledge, because 'auto' was a storage-class specifier - like 'register' - and the default type in C was 'int'.
That's why, when you omit a type in a C++ declaration, the compiler warns you that there's no default int.
Your code is actually less portable if you use types like uint64_t - if your system doesn't have exactly that width implemented, the typedef won't exist. If all you need is a really big number, 'unsigned long long' is required to exist and to be able to store 0..2^64-1.
The naive, 9 years in the past me once was like "int will last us forever! and it'll save us some space!", only to have to change it to bigint a few months later
Remember, these fancy computing devices were built for the rich and the government, not for the average Joe. No one thought computing would become this easily accessible.
NAT is a colossal pain in the ass. I shudder to think of the number of man-years that have been spent on NAT traversal. NAT breaks one of the fundamental functions of an IP address - a unique identifier for a network device. It turns a simple identifier into a weird, amorphous abstraction.
IPv6 isn't perfect, but we could have avoided a lot of hassle if we'd started off with it.
Sharding, or a second key, or detecting the iPad and using a lower number with an offset internally? Still requires special handling at one end or another.
It's the smart thing to do - 4.3 billion is not that much.
I had some students who asked me if even a long would be enough to handle exponential growth; after all, it's only twice as big. As a thought experiment I asked them to come up with a time to fill a 32-bit int completely. They came up with roughly a year. Then, as a margin of error, I said: let's assume you have 4.3 billion transactions every second instead. A 64-bit counter can sustain that volume for over 100 years, and we're still not in the danger zone yet.
One is 32 bits, the other is 64 bits, so it occupies exactly twice as much space. But it can represent vastly more values - 2^32 times as many - while being only twice as big on the hard drive.
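A quick sanity check of the arithmetic in both comments (the per-second rate is the hypothetical from the thought experiment, not a real workload):

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60

# Doubling the width from 32 to 64 bits squares the number of values:
assert 2**64 == (2**32) ** 2   # i.e. 2**32 (~4.3 billion) times as many ids

# Even at 2**32 (~4.3 billion) transactions per second, a 64-bit
# counter takes 2**32 seconds to exhaust, well over a century:
years_to_fill = 2**64 // 2**32 // SECONDS_PER_YEAR
assert years_to_fill == 136
```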
I pick UUIDs because experience has taught me that even for the smallest workloads, I'll inevitably have to shard my DB (to partition a shared public cloud from N isolated "enterprise" deployments), and then will inevitably want to do statistical-analysis things that involve ingesting rows from the shards (or log entries referencing those rows), and deduplicating them by ID, without generating false-positive collisions in the process.
The simplest way to do that is to just throw UUIDs at every problem from the start. (ULIDs - https://github.com/alizain/ulid - are better, but there aren't libraries to generate them in literally every language + RDBMS.)
I don't agree with all their reasoning, but ulid still seems like a good idea. (Though the main difference you care about in programming is how they are generated - via timestamp plus randomness - not that they have a different serialization format.)
For some applications you don't want to leak the time. Choose wisely.
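A sketch of the shard-merging point above using Python's stdlib uuid module; random (version 4) UUIDs carry 122 bits of entropy, so independently minted IDs can be merged without realistic collision risk:

```python
import uuid

# Two "shards" mint ids independently, with no coordination.
shard_a = [uuid.uuid4() for _ in range(1000)]
shard_b = [uuid.uuid4() for _ in range(1000)]

# Ingest both and deduplicate by id: no false-positive collisions,
# unlike two auto-increment sequences that both start at 1.
merged = set(shard_a) | set(shard_b)
assert len(merged) == 2000
```

Two auto-increment columns would instead produce 1000 colliding pairs here, which is exactly the deduplication hazard described above.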
You know what, you're right. I'm going to change some SERIALs to BIGSERIAL in the database of my side project. Someone has to start believing in it - it may as well be me.
Hey all. Thanks for noticing :P Obviously this is embarrassing and I'm sorry about it. As a non-developer I can't really explain how or why this happened, but I can say that we do our best and are sorry when that falls short.
Computers set limits internally on how big numbers can be when they're keeping track of stuff.
Your developers had given each game a number to identify it. So your first game was #1, the 40th game was #40, and so on.
The limit for how big the number could be was a bit over 2 billion, and your players have just now played a bit over 2 billion games, so the id number suddenly exceeded the computer's internal limit. Specifically, the limit was 2,147,483,647, so it crashed on game #2,147,483,648 - the first id past the last acceptable one.
I'm simplifying slightly but that's the idea. It'll be fixable by essentially using a different format for the id number so that the limit is higher, much like telling the computer "use a higher limit for this particular number, it's special."
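The failure mode can be simulated by reinterpreting the counter as a C-style signed 32-bit value (a sketch of the arithmetic only; the client's actual code is unknown):

```python
def to_int32(n):
    """Interpret an arbitrary integer as a signed 32-bit value."""
    n &= 0xFFFFFFFF
    return n - 2**32 if n >= 2**31 else n

INT32_MAX = 2**31 - 1                             # 2,147,483,647

assert to_int32(INT32_MAX) == 2_147_483_647       # last acceptable game id
assert to_int32(INT32_MAX + 1) == -2_147_483_648  # wraps negative: breakage
```

The "different format with a higher limit" fix amounts to widening the counter to 64 bits, which pushes the wrap point out to 9,223,372,036,854,775,807.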
Yes - I understand HOW it happened, just not sure WHY. Meaning, I'm not sure what the developer was thinking, and at this point, I'm not going to track down exactly who it was and point fingers. I think everyone has learned enough through this highly interesting bug. It certainly was interesting to see the slack room exploding with theories and debugging. A new iOS client has been submitted to Apple (hurry plz!!!), and a server fix is also in QA now. Fun problems to have......
It's most likely for efficiency and performance reason. 64-bit doubles the storage requirement of 32-bit and would have impact on database's utilization of memory, querying window size, cache, and storage.
Edit: 32 bits worth of games played means about 4 billion games. 4 billion X 4 bytes for 32-bit = 16GB just for the 32-bit ID's. 64-bit ID's would need 32GB for the 4 billion games. I guess memory and storage weren't that cheap back then.
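The back-of-the-envelope numbers check out:

```python
ids = 4 * 10**9                  # ~4 billion game ids

bytes_32 = ids * 4               # 4 bytes per 32-bit id
bytes_64 = ids * 8               # 8 bytes per 64-bit id

assert bytes_32 == 16 * 10**9    # 16 GB just for the 32-bit ids
assert bytes_64 == 32 * 10**9    # 32 GB for 64-bit ids
```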
It sounds like it was client side, not server side. Most likely the iOS client was using Objective-C's NSInteger or Swift's Int, just because that's the default choice for most programmers working in that language, and they didn't think it through.
On a 32-bit system, a "long" is usually also 32 bits. On a non-Microsoft 64-bit system, a "long" is usually 64 bits. On both 32-bit and 64-bit systems (Microsoft or not), an "int" is usually 32 bits.
If the issue happened only on 32-bit iPads, but not on 64-bit iPads, the programmer probably picked a "long", not an "int". Had the programmer picked an "int", the problem would also happen on 64-bit iPads.
Our iOS app with a Java backend was using long for database IDs on both ends. I was going through the ILP32->LP64 conversion process when I realized we had a pretty serious discrepancy.
I think it's a really easy mistake for the first developer to make (especially because they weren't a C/Obj-C programmer), and then the sort of thing that no one audits after that.
> Meaning, I'm not sure what the developer was thinking
A 32bit integer is pretty much the default numeric type for the majority of programming tasks over the last 20 years. Even with 64bit CPUs, 32bit is still a common practice. Probably 99% of all programmers would make the same choice unless given specific requirements to support more than 2 billion values.
It's often not even an explicit choice, it's just default behavior.
Up until recently, Rails defaulted to 32 bit IDs, so there are a ton of apps out there that could have these issues, especially since Rails has always prided itself on providing sane defaults: https://github.com/rails/rails/pull/26266
Others, like JS and Lua, just use doubles, meaning they'll never overflow - instead, past 2^53 every 2 consecutive integers start to be considered equal, then a while after that every 4, etc. Not exactly optimal behavior for incrementing IDs.
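Python floats are IEEE 754 doubles too, so the effect is easy to demonstrate:

```python
# Below 2**53 every integer is exactly representable as a double.
assert float(2**53 - 1) == 2**53 - 1

# At 2**53 the spacing between doubles becomes 2: adjacent integer
# ids start mapping to the same value instead of overflowing.
assert float(2**53) == float(2**53 + 1)

# Further out the spacing doubles again, to 4, and so on.
assert float(2**54) == float(2**54 + 1)
```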
I don't think you do understand, you sound like you're upset that a developer "set" this limit. When in reality it's tied to fundamental programming principles. It wasn't really a conscious decision to say, "I'm only going to account for 2bil games"...
Probably when this was initially developed, nobody thought you'd ever go over 2 billion games. This error is brought to light by your success and popularity.
Computer history is riddled with assumptions like that. The Y2K problem, 32-bit Unix time running out in 2038, and 32-bit computers unable to address more than 4 GB of memory are just the big ones. It's everywhere. Smaller software projects are generally built for what you need right now, and less for what might happen in the distant future.
Ideally you want to retain some awareness that this is an issue so you can start working on it once you go over a billion games, but in a small company there are probably always more urgent things to worry about, and nobody ever gets around to fixing this kind of technical debt.
2 billion is a very large number that was probably not envisioned as reachable in the near future - as a programmer I'd argue this is a pretty easy mistake to make, and that while (slightly) embarrassing, it's a good learning moment.
It's also really awesome that you're here, and that you guys were so honest about the nature of the bug - this is really something that should be encouraged.
Maybe we should start a blog about all of the interesting bugs and challenges we encounter. It certainly is white-knuckle pretty often when running at scale. The number of devices, connections, features... I'm aging prematurely :P
Agree with Aloha. I wouldn't be too hard on the programmer (also, if I understand correctly it's not a database issue, but only with the 32-bit iOS client). I'd pat him on his back and say “you didn't think we'd get this big, eh?” ;-)
> 2 billion is a very large number that was probably not envisioned as reachable in the near future
I disagree. Simple napkin calculation: 100 million players playing 40 games each per year (about 1 per week) over 5 years = 20 billion player-games, i.e. 10 billion unique games (two players per game).
As others pointed out it was likely not a miscalculation, just a lack of calculation. The bug occurred only in the client and the decision to use a smaller data type was likely not a conscious one.
In any case, I wouldn't hold it against an individual programmer. But arguably this sort of bug indicates your development process has flaws (not enough testing, code reviews, etc).
Thanks. I'm a pretty understanding "boss", especially on the heels of reaching the 2 billion games milestone :D Our team is awesome and we love what we do. Unfortunately we're still a bunch of humans sitting at kitchen counters and on couches around the world, so things do sometimes fall between the cushions...
Indeed. I'm not sure that anyone here at Chess.com at the beginning thought we would hit a billion games played in our lifetime. But I guess after 10 years....
To put things in perspective, 2 billion games in 10 years is half a million games per day on average over the 10 years. Considering you didn't start at that rate and that it's an average, it means you have way more than half a million games per day now. (that's also more than 6 per second!)
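Spelling out that average:

```python
games = 2 * 10**9
days = 10 * 365

per_day = games / days           # ~548,000 games per day on average
per_second = per_day / 86_400    # ~6.3 games per second

assert per_day > 500_000
assert per_second > 6
```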
Think of a mechanical odometer, and how it only has a certain number of digits. Eventually you'll hit 999,999 miles, and on the next mile, everything will roll over to 000,000.
Same deal here. 32-bit numbers are stored as 32 switches, starting from
0000 0000 0000 0000 0000 0000 0000 0000
which is 0, to
1111 1111 1111 1111 1111 1111 1111 1111
which is 4,294,967,295. Since the 32-bit iOS version of Chess.com apparently uses 32-bit numbers to store each game's unique ID, that means you can have at most 4,294,967,295 games - or half that, 2,147,483,647, if the number is signed and one bit is spent on the sign, which matches where things actually broke.
So what happens on game 4,294,967,296? Just like the odometer, everything rolls back to 0, and things start breaking because the program gets confused.
Pretty common problem, really. The fix would be to use a 64-bit number, which doubles the number of binary digits.
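The odometer analogy in code, masking the counter to 32 unsigned bits:

```python
UINT32_MAX = 2**32 - 1            # 4,294,967,295: the "999,999" reading

def next_id(n):
    """Advance a counter stored in 32 unsigned bits."""
    return (n + 1) & 0xFFFFFFFF

assert next_id(41) == 42          # normal ticks
assert next_id(UINT32_MAX) == 0   # the whole odometer rolls to zero
```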
Yours is the first comment in this chain that I can say pretty confidently isn't sarcasm. So it kind of breaks the chain, making the sarcasm in this chain non-recursive. Which means maybe you were being sarcastic after all? Actually I don't even know if my own comment is sarcasm or not.
Your comment comes across as sincere. Mine was sincere, but an overstatement. I should have said that sarcasm tends to be recursive, until broken by sincerity. Anyway, here's my take on the chain:
Yes because what we have always known about sarcasm and what this thread is a perfect example of is how you can define something as sarcastic/not sarcastic just by how it subjectively "comes across".
Yes, you can guess. But assessment depends strongly on context. And Poe's Law still applies. People can write messages that seem sincere, and then later claim sarcasm, as in "I was only joking". Or people can write sincerely but come across as sarcastic, or vice versa, and yet be ambiguous enough that readers can't tell. That's where the /s flag helps. Done intentionally, such ambiguous messages can probe the reader's state of mind. Or set traps.
I recently experienced a nasty bug with BLOB in MySQL. The software vendor was storing a giant json which contained the entire config in a single cell. It ran fine for months, and then when it was restarted it totally broke. Reason was: the json had been truncated the entire time in the database, so it was gone forever. It was only working because it used the config stored in memory on the local system. Nasty!
MySQL's silent data truncation is such a nuisance. It's off by default in 5.7, and can be disabled in earlier versions by adding STRICT_ALL_TABLES/STRICT_TRANS_TABLES to sql_mode [1].
I inherited a system where, among other things, the entire response body from a payment gateway callback is saved into a text field using utf8 character set, despite the fact that most of the supported payment gateways send data in iso-8859-x (and indicate the used charset inside the body itself, how's that for a chicken-and-egg problem). Of course when the data gets truncated due to not actually being utf8, nobody notices. Fun times.
> MySQL's silent data truncation is such a nuisance.
Yes, yes it is - it burned me so badly (catastrophic, unrecoverable production data loss) in the early days of my career (~15 years ago as a junior level dev in a senior level role) that it has forever colored my opinion of MySQL - I will really never trust it again.
Early web had these issues as well: server sends response with Content-Type: text/html; charset=win-1251 (or no charset), but body contains meta charset=utf-8. MSIE worked around that in IE4 by comparing the letter frequencies to a hardcoded table and guessing the charset from there. It sort of worked, 80/20.
> The vast majority of 64-bit hosts use 64-bit time_t. This includes GNU/Linux, the BSDs, HP-UX, Microsoft Windows, Solaris, and probably others. There are one or two holdouts (Tru64 comes to mind) but they're a clear minority.
This is little help for older, already deployed systems, of course.
This problem is more related to a programming underestimation than to any actual limitation of a 32-bit CPU (which can happily process numbers or IDs that are arbitrarily big, if you have the memory for it and program it correctly).
That said, this is definitely indicative of what's going to happen in just 20 years, 6 months and 20 days from now. I mean, we're still cranking out 32bit CPUs in the billions, running more and more devices, and devs still aren't thinking beyond a few years out. I know of code that I wrote 12 years ago still happily cranking away in production, and there may be some I wrote even longer than that out there... and I guarantee I hadn't given two thoughts about the year 2038 problem back then, and I doubt many devs are giving it much thought today.
The sad part is people are going to look at the lack of a year 2000 event and assume 2038 is going to be a "dud", when they fail to see all the damn work that went into making sure Y2K was a dud including a significant portion of IT hours and probably a lot of extra support laid in.
I expect 2038 to be a rare hell because of the nature of the devices. Y2K was an IT problem, but 2038 will be an embedded system problem and that's going to be a much more painful thing to audit. Moving from the server room to inside equipment and walls is going to be fun.
Long long time ago, I created a poll on a small website I was maintaining. I didn't expect much traffic and, so, not thinking too much about it, I put the ID column to be a TINYINT (i.e. max value = 255)...
That was a valuable lesson.
(I actually generated most entries myself while testing stuff - live in prod of course - and while there were probably fewer than 255 votes, the AUTO_INCREMENT did its job and produced an overflow).
Reminds me of the havoc that was caused when Twitter tweet IDs rolled over, forcing every third-party developer to update their apps (and at the time there were a lot of those).
Twitter saw it coming and forced the issue, announcing that at a certain date and time they would manually jump the ID numbers rather than wait for the rollover to happen at some unpredictable moment.
They didn't roll over; they exceeded 2^53-1, the largest integer a JS Number can represent exactly. The solution was to treat the ID as a string.
(Or we're thinking of different events, I apologize if so)
Twitter must have been misleading when they communicated the reasons for this change since they did not exceed 2^53-1, nor do they expect to exceed this in the near future.
From a (former) Twitter dev:
> Given the current allocation rate, they'll probably never overflow Javascript's precision nor get anywhere near the 64-bit integer space.
Your link discusses 2^64, which applies to languages that have native integer types.
The 2^53 problem was for Javascript, which has no native integer type, and is thus limited by the mantissa size of Number (which is defined as an IEEE double-precision float).
Twitter ids are unsigned 64-bit, since they're generated using Snowflake. That link must pre-date the move to snowflake ids, and is speaking to the count of tweets instead.
Rollover was a confusing word for me to use. I did not mean it in terms of integer overflow. I meant it in accounting terms. As in to roll from one namespace to another.
I visited that video specifically because the view counter was jammed at UINT_MAX. There were comments confirming that everyone was now visitor number 4,294,967,295. In fact, it might have been an HN post that brought it to my attention; I totally didn't get sidetracked on YT and end up watching K-pop all afternoon.
You need to first establish that the type was chosen intentionally before asking why it was intentionally chosen. Otherwise the question is ill-formed.
It looks like they are using PHP/MySQL/Javascript/Flash, with only MySQL having any explicit types.
Even so, an error is often preferable to overflow, which is usually undefined behavior and could lead to a duplicate primary key anyways if it wraps to the first game.
A better question is "why 32-bit over 64-bit", but the site dates back to 2005 where that was the norm and the question has the same issues.
Assuming you're working in a language that defines signed integer overflow. Depending on the language, you can result in undefined behavior, instead. For that reason, I'd go with an unsigned counter, with the first million IDs being invalid or reserved for future use. That way, you get well-defined overflow into an invalid region.
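A sketch of that scheme (the reserved-region size is the commenter's example figure, and the masking stands in for well-defined unsigned arithmetic):

```python
RESERVED = 1_000_000              # ids 0..999,999 are never handed out

def next_id(n):
    # Unsigned wraparound is well-defined: mask to 32 bits.
    return (n + 1) & 0xFFFFFFFF

def is_valid(n):
    return n >= RESERVED

counter = 2**32 - 1               # counter is about to wrap
counter = next_id(counter)
assert counter == 0               # wrapped into the reserved region
assert not is_valid(counter)      # ...where the overflow is detected
```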
C/C++ is notoriously headache-inducing on this point. Yes, all the CPU archs you'd reasonably expect to encounter today behave this way. However, because the language standard says signed overflow is undefined, compilers are free to assume that it will never happen, and to make optimizations that you'd think would be unsafe but are technically permissible. [1]
Well that's interesting. I was not aware of any compilers doing this. I wonder if there's a switch in gcc/llvm/msvc/etc to turn this specific behavior off.
The -fstrict-overflow option is enabled at levels -O2, -O3, -Os.
In other words, basically every program that you're using is compiled with that option enabled. (Release builds typically use -O2, sometimes even -O3.)
I was answering the "not aware that any compilers were doing this" part, hoping that they would be able to answer their second question using the source I linked.
Which is a good thing, assuming you did reasonable testing with ubsan etc. Having to assume things that never happen is a big problem for optimization!
That's a good point. Seems like we could do a bit better than the current state of the art, though. If non-optimized builds trapped on overflow, that would at least give you a better chance of detecting these problems before the optimizer starts throwing out code you meant to keep.
Compilers can optimize better for signed integers. Overflow/underflow on signed integers is undefined behavior, which gives compilers room to optimize. Unsigned ints are defined for all cases, so you can get less optimal code.
Also, you have problems whenever you compare against signed ints.
Because signed is default for some reason in most languages, and most developers aren't taught to think critically about how decisions like simple datatypes might affect scalability.
The problem is momentum. I could use unsigned int everywhere, but then I have to constantly typecast to int and back anywhere I use a library expecting signed ints. If we all switched to unsigned int by default, then everything would make more sense but we'll all live in typecasting hell during the migration.
Unsigned by default doesn't make more sense than signed by default. The behavior near 0 is surprising; if you underflow you either get a huge value (anything not Swift) or you crash (Swift).
It was a mistake to use them for sizes in C++. Google code style requires using int64 to count sizes instead of uint32 for good reasons.
I read somewhere in the swift documentation that unless you have a specific need for a UInt, that Int is preferred even if you know that the value will always be nonnegative. I think compatibility is one reason they give.
World of Warcraft for quite some time, I think about 4 years, had a "mysterious" limit on the maximum amount of gold you could accumulate on a single character. It was 214,748 gold, 36 silver and 47 copper - i.e. 2,147,483,647 copper, the maximum signed 32-bit value...
At least they put an actual check in there - you didn't suddenly overrun and wake up with an enormous debt, so I'll give them some credit for that ;-)
And I think it isn't uncommon to have to add digits to the front of telephone numbers as regions grow (or telephones become more common) and the number space isn't large enough.
Twitter's was called the twitpocalypse and it was a pretty big deal at the time. I don't know how bad it was internally, but it hit a lot of third-party apps that stored tweet IDs in 32-bit integers.
One interesting aspect of it was that Twitter realized what was going to happen in advance, and artificially pushed their IDs over the edge at a preplanned time so they could have as many people available as possible to work on any problems that appeared.
Microsoft Lync has an as-yet-unfixed bug where, after 49.7103 days (2^32 = 4,294,967,296 milliseconds) of system uptime, it locks the Lync status indicator to "away". The status indicator is unusable thereafter until a full system reboot.
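That uptime is exactly a 32-bit millisecond counter wrapping:

```python
ms_until_wrap = 2**32                        # milliseconds held in 32 bits
days = ms_until_wrap / (1000 * 60 * 60 * 24)

assert round(days, 4) == 49.7103             # matches the observed ~49.71 days
```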
A counter overflow factored into the THERAC-25's race condition (one of the software's interlocks was an overflowable counter rather than a flag; if the operator started treatment right as the counter overflowed, the system would proceed rather than refuse).
They had meant to give him the lowest aggression rating possible, but accidentally input -1, which then wrapped around to the highest rating possible. Nukes soon followed.
Was it exactly that? I remember it being that you had to create a senate or something (that decreased character aggression by 2 points). Gandhi started with 1, so if you did it with Gandhi, then it'd underflow to the highest possible aggression.
Fun to read some of the other stories where this bit them too (Pac-Man, WoW, and eBay)! Anyway, the new app has been approved by Apple and should be rolling out soooooooooon....
Thanks for all the comments! Always lots to learn from.
So they probably just need to use longs instead of ints. But I'm curious, if you were really stuck with a 32-bit limit on data types, what's your preferred workaround? I'm thinking I'd add another field that represents a partition. Are there other "tricks"?
If you could only use 32-bit data types, you can get 64 bits by using 2 integers together, like the digits of one long number. The low integer hits its max, starts over at 0, and increments the high integer. Extending this idea, you can create a class of numbers with however many bits you want by using more ints.
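A minimal sketch of that carry scheme:

```python
MASK32 = 0xFFFFFFFF

def increment(high, low):
    """Treat two 32-bit words as one 64-bit counter (high, low)."""
    low = (low + 1) & MASK32
    if low == 0:                    # low word wrapped around...
        high = (high + 1) & MASK32  # ...so carry into the high word
    return high, low

assert increment(0, 5) == (0, 6)
assert increment(0, MASK32) == (1, 0)   # carry across the word boundary

h, l = increment(0, MASK32)
assert h * 2**32 + l == 2**32           # the combined value keeps counting
```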
eBay (almost) had this problem and I cannot find any articles about it online. They were rapidly approaching 2^31-1 auctions. So they switched to a larger integer, the switchover went badly, and they were mostly down for 4 days, if my memory serves. This would be like 10+ years ago I think.
A lot of comments, but no one has said what a great time we are living in for chess. So many games online, ready to be analysed and learned from. After Deep Blue people thought it was the end of chess, but it's only getting better: computers are helping players to improve.
Chess.com is a great site; lichess.org and chessable.com are also worth checking out if you like chess.
The other one to watch out for is the 53-bit javascript integer limit. That caused the twitpocalypse when Twitter tweet IDs hit it. They had to switch to strings in the JSON representation.
The 2009 Twitpocalypse concerned overflow of 31-bit precision. Twitter has not yet hit 53 bits for the raw number of tweets; in fact, they passed 32 bits in 2014 and might not have reached 33 bits yet.
Moving to strings for Javascript was really just safety planning for the future since:
> Given the current allocation rate, they'll probably never overflow Javascript's precision nor get anywhere near the 64-bit integer space.
Issues like this are not uncommon on Chess.com. I've been playing there since 2008 or 2009. If you read recent comments about issues as they pertain to the recent "v3" release ... as much is to be expected.
In my experience the chat on chess.com harbors a similar demographic to that of most video games. You'd think that chess would attract a more mature player base, but nope.
The only time I've observed people in real life acting like people in video games is at a chess tournament: constant trash-talking until they lose, then accusations of cheating. You certainly don't see this type at all (or even most) chess events; I think the lack of an entry fee drew them out into the open.
At my local library the loudest people aren't those on their phones or laptops, but the chess players. It really surprised me considering chess can be played completely non-verbally other than calling "check". Every time I'm there, they constantly argue (most of the time it's because someone wants to take back a move), trash talk to get on each others nerves, yell across the table to other players in games, and talk loudly as if they were in a park. On one hand I think it's great that the library provides a community space and lets people use their chess sets, but on the other hand as someone who goes there for quiet, it's very irritating. (I wish they had a game room or something where they could go wild) Once upon a time libraries had mythical status as a place of silence, to the point where people would shush each other for the smallest noises... I actually stopped going to that library because of noise issues and in general because of its size and limited seating.
My concern is that even if one plans for a sufficient capacity, there still needs to be testing done to verify that the code actually works if the capacity is nearing the theoretical limit. In this example the database id was transformed into a 32 bit integer somewhere in the application code.
Usually when I hit some sort of unexpected bug in production I try to think about what type of testing will prevent similar problems in the future.
It's a little related. The languages typically used for iOS programming encourage the use of data types whose size matches the CPU architecture's bitness. Thus, careless programmers will end up using 32-bit integer types on 32-bit devices, and 64-bit types on 64-bit devices.
I really doubt this is in any way linked to Apple's reasons for dropping 32-bit, though.
"The reason that some iOS devices are unable to connect to live chess games is because of a limit in 32bit devices which cannot handle gameIDs above 2,147,483,647."
This is simply because they're taking an integer from their database as an auto incrementing ID and converting it directly into a native integer on the iOS device thus breaking it. They could work around this any number of ways.
It's a pretty lame bug, to be honest, and certainly something easily foreseeable, as this wasn't an overnight occurrence.