Raymond Chen explains what the Y2K was like at Microsoft [video] (onmsft.com)
137 points by douche on Dec 21, 2016 | hide | past | favorite | 75 comments



One thing that I found interesting was the various media coverage of the time describing Y2K as a 'non-event'.

The media - and consequently the public - was generally unaware that it was a non-event because of the massive resources poured into making it so. Instead they wrote it up as a bunch of hype and paranoia over nothing.

IMO it's one of the greatest unsung engineering success stories of our time.


Yes, indeed! As someone who also sat in a room (with many other people) watching the clock tick until the stroke of midnight, I still get pissed off if people remember Y2K as the event that "did not happen". Nothing happened because most of 1998 and 1999 was spent running around customer sites checking and patching software.

I work with industrial control systems, and the oldest code comment that was found about the year 2000 problem was in code from the early 80s. The programmer had long since retired, but that code was still running oil refineries.

There were very few large chemical installations in the Pacific, so we waited for updates from sites in New Zealand. After those sites rolled into the new year without problems, everybody relaxed; we knew that our software installed at Middle East oil & gas sites would work OK, so people would have petrol and diesel in the new millennium. The rollovers in the EU and US regions were easy after that.


"I still get pissed off if people remember [it] as the event that "did not happen""

Welcome to the life of any operations engineer, my friend. If we're doing our job right, no one notices that the insanely complex and down-to-the-wire maintenance goes smoothly.


"Nobody gets credit for fixing problems that didn't happen." [0]

[0] http://www.agsm.edu.au/bobm/teaching/SimSS/Shayne/RepenningS...


It's the life of any job where you maintain something. Even janitors aren't noticed until they quit and the trash starts to stink.



I run OPS and Tech at a startup. A very thankless job I must say.


Ever tried cleaning a school? Students can even be heard saying things like "It's their job to pick up after me so it doesn't matter that I throw garbage on the floor".


I was on an island in Sydney harbour sitting behind someone with a handheld video camera. As the first firework exploded, his video camera turned off. Everyone looked around at the buildings, waiting for the lights to flicker.


> the oldest code comment that was found about the year 2000 problem was in code from the early 80s.

Do you remember what that comment was? I mean, was it something like

    /* TODO: this won’t work after 1999-12-31, fixme */
or was it more like

    /* Using four bytes to handle dates beyond 2000 */


It was a comment from someone who understood that two-digit year notation was bad, explaining why he used the full year notation. At that time memory was still measured in kilobytes, so those two extra bytes were probably an expensive choice.


When trouble is solved before it forms, who calls that clever? When there is a victory without battle, who talks about bravery?

Sun Tzu


    According to an old story, a lord of ancient China once asked 
    his physician, a member of a family of healers, which of them 
    was the most skilled in the art.

    The physician, whose reputation was such that his name became 
    synonymous with medical science in China, replied, 

    "My eldest brother sees the spirit of sickness and removes it 
    before it takes shape, so his name does not get out of the house.

    "My elder brother cures sickness when it is still extremely minute, 
    so his name does not get out of the neighborhood.

    "As for me, I puncture veins, prescribe potions, and massage skin, 
    so from time to time my name gets out and is heard among the lords."

    - Translator's Introduction, Taoism and The Art of War
      The Art of War, Sun Tzu, Thomas Cleary


Some things did break on Jan 1st 2000. We found that our 16-bit Windows client failed (blocked) SSL validation, and that some customers still used the 16-bit client!

(It wasn't the certificate having a date in the future; they all would have had that for around a year at that point. It was getting the local time and using that to decide whether the certificate was valid.)


Yeah... I wondered the same when nothing happened, it seemed like a big deal for nothing.

Then the next day when I opened ACDSee it said my license had expired. It was a pirated license that previously had shown valid until 2050 or something.

All of a sudden it put the whole thing in perspective, and I realized that those things could happen anywhere causing small bugs, and much bigger ones in systems where dates actually mattered.


I had a PDP-11/74 running RSTS that wouldn't boot after Y2K. I think I had to dig around inside it to reset the clock before it would turn on again. It was an old machine at the time, but I'm pretty sure there were still some like it in the field, so someone was pulling their hair out on January 1st.


But the counterpoint to your argument is that there were lots of countries and companies that were derided as not being Y2K-ready, and nothing much happened to them either. It's very hard to argue that all the Y2K preparedness work was unnecessary, but it's also not clear that a lot of it was justified.


I formerly worked as an escalation support engineer in Microsoft's Product Support Services (PSS) for Windows networking. I, and a lot of other people, were at our desks in Las Colinas, TX for the Y2K rollover. Nothing happened. We got a press call from a reporter asking if anything was going on. I said I wasn't allowed to talk to the press, but half the people on the floor were already drinking.

We knew early on that the Windows dev teams had done their job as Y2K hit Australia and nothing happened. Then Europe and nothing happened.

Even though I "missed out" on the big Y2K celebrations and instead had to celebrate with a bunch of nerd coworkers in a boring office building, it felt good to be a part of something where we all banded together and pulled off a major piece of work.


This was before MS had a formal support lifecycle, right?


I'm not sure if I follow your question. I worked in Windows NT networking escalation/debug support. We did NT4 and Windows 2000. You had to have a support contract with MS or work off some Pay Per Incident vouchers. Some people had Premier contracts and got to speak with me quickly (but already had first line dedicated engineers), but I mostly handled escalations from our outsourced support specialists (Keane in Arizona and one other I can't remember).

The Pro customers (non Premier) definitely drove the most support volume and often had the more difficult cases. Premier cases were often "hey what if" cases that we answered quickly or were straight bugs. Believe it or not, bug defects are easy because they're usually obvious and the path forward is also obvious (install checked builds to get more info and/or set up break points and remote debug for crashes).


So do you know, for example, exactly when they ended NT 3.51 updates?


Right before I started, at the end of 1999.


Last hotfix I can find is this: https://support.microsoft.com/en-us/kb/253518


> The media - and consequently the public - was generally unaware that it was a non-event

Really? I was only 10 during y2k and not techie at all (yet).

I remember it everywhere...news, school discussions, tshirts, TV, general conversation (similar to zombie apocalypse emergency plans some people talk about).

If anything, I'd say the hype was a little too high. Planes falling out of the sky, nuclear weapons detonating, etc.

Still...I agree. Zero issues. Very impressive.


That's sort of his point. The narrative after the Y2K rollover was that it had been a whole lot of hype and didn't end up being a big deal after all. Conveniently ignoring the fact that the reason it wasn't a big deal was the massive amount of effort expended to make the transition seamless.


If you believed the hype your toaster oven would have exploded on Y2K because it had a clock on the front.

A lot of work went into fixing the real problems but there also was a lot of fuss about nothing.


Y2K at Microsoft was boring as hell. What wasn't boring as hell was the long period of time leading up to it. But everyone here knows that. New Year's Eve 1999 and the hours following was probably the most Age of Empires I've ever played in one sitting.

EDIT: "for an estimated issue programmers were not taking into account when applying the Gregorian calendar rule to software."

First, what the hell does that even mean? Anyway, no, it was taken into account. What, you think programmers didn't know what would happen when 2000 rolled around? What wasn't taken into account was that the software would still be running ten, twenty years later.


And most of the programmers who assumed that their software wouldn't be around that long were probably right. It was just a few who had the misfortune of creating successful software that ran into the problem.


In ops, boring is good. :) Great work guys, we had no issues with our MS Software.

I recall spending significant hours during 98 and 99 on the various systems I was involved with, to check every tick box possible - applying updates where needed and verifying each and every piece of hardware and software.

Number of issues after we'd put in all the hard work: Zero. My reaction to all the people complaining about the lack of issues: Well, doh, _really_? What did you actually expect would happen after so much time and money spent?? Go home.


I worked Y2K by myself in the NOC at an ISP in New Zealand (aka UTC+13:00).

* Got paid $1000 bonus

* Called up by local reporter asking if any problems - Nope

* Called by several of our vendors/providers in the US just to say "Hello", and to casually ask how things were going.

* One ISP in Australia took themselves offline for the evening

* Rumors were going around Australia that all the power and the phones in NZ had died.

* The one little problem with a provider on the evening got a lot of people closely looking at it.

* There was fog and I didn't get to see the fireworks down town.


The weather was truly awful. I went up One Tree Hill and just saw clouds light up with the fireworks.


Raymond Chen is the tech writer who has influenced me the most. His blog has really helped me understand how large long-running software projects work (or at least, how they work at Microsoft), and how apparently strange features can come into existence in a sensible way.

One interesting evolution over time is how Windows moved from basically trusting all applications (which they had to do in Windows 3.X anyway, as any badly behaving app could crash the whole system) to treating applications as bad actors.


I haven't seen anything from him on the whole Windows 10 forced update fiasco. I'm curious how that played out internally.


I think he writes his blog posts a year or so in advance. https://blogs.msdn.microsoft.com/oldnewthing/20090227-00/?p=...


He mostly only discusses things where there's an interesting code angle; with the updates it's really just politics.


About Y2K: during that time my family kept hearing that really bad things could happen at midnight with the coming of the new millennium. One of the main things that could happen, I think, was the loss of electrical power. So, a couple of seconds before the new year I strategically positioned myself next to the light switch, and when the clock hit midnight and everybody started yelling "happy new year" I turned the light off and yelled "we lost power". I wish I could have recorded the looks of shock and fear on some of my family members' faces. It was funny as hell.


I'm sure that's not how the whole Y2K process ran; it's just how the actual New Year's Eve event was planned.

I was personally also on ready-to-go-to-the-office standby, though not at Microsoft but at a much, much smaller company. And my team did serious work making the code we were responsible for Y2K-proof; we spent a good part of 1998 on that.

In short, we worked for almost a year to make that "nothing serious happened" result possible, in our own domain of responsibility. The highest management understood the problem and the process was properly planned. That "nothing serious happened" for our code is a success story of the quality of the tests we wrote and used to find and fix the real problems. And those problems did exist. I'm sure there are other HN readers who can tell similar stories.

The computer-related risks are a serious subject.

I remember following the news on New Year's Eve to see what was happening in Japan and Australia. As "nothing serious" happened there either, I was ready to bet that everything would be fine, and that enough other companies had also taken the subject seriously and acted early enough.


I'll admit to a pause for reflection around 9pm UK time (midnight Moscow time). After that, no worries. Ada has robust time and date libraries.


I hadn't thought of that as a serious possibility, but a friend and I considered writing a novel on it.

The basic idea was that a bunch of US weapons systems wouldn't work due to Y2K - things like ballistic missile navigation. So the US was frantically trying to get all this patched so that they would have a credible defense after Y2K. The Chinese knew this, and launched on New Year's Day...

... and completely missed, because they had used borrowed Russian code for their ballistic missile navigation. The Russians had stolen US ballistic missile navigation code, which had the Y2K issue in it.


> I hadn't thought of that as a serious possibility

Not serious?

https://en.wikipedia.org/wiki/List_of_military_nuclear_accid...

http://arstechnica.com/tech-policy/2013/12/launch-code-for-u...

"Launch code for US nukes was 00000000 for 20 years"

http://foreignpolicy.com/2014/01/21/air-force-swears-our-nuk...

So it was apparently not eight zeros but six zeroes, plus the "key under the doormat" (in the safe, but not really something you needed the president to access; the opposite of what was claimed at the time).


That's all very different from "launch because of a Y2K bug", though...


No technology is perfect, it's always a process.

And effectively every computer-related technology has undiscovered bugs.

https://around.com/ariane.html

""The board wishes to point out," they added, with the magnificent blandness of many official accident reports, "that software is an expression of a highly detailed design and does not fail in the same sense as a mechanical system." (...)

(...) really important software has a reliability of 99.9999999 percent. At least, until it doesn't. "

The statistics are against that generous number of nines.


I wonder when the wind-up for the 2038 problem (32-bit signed int and the Unix epoch) starts.


If you're thinking it's already largely resolved, this is from 64-bit MySQL. I assume similar issues exist in other software.

  mysql> SELECT UNIX_TIMESTAMP('2037-11-13 10:20:19');
  +---------------------------------------+
  | UNIX_TIMESTAMP('2037-11-13 10:20:19') |
  +---------------------------------------+
  |                            2141742019 |
  +---------------------------------------+
  1 row in set (0.00 sec)

  mysql> SELECT UNIX_TIMESTAMP('2039-11-13 10:20:19');
  +---------------------------------------+
  | UNIX_TIMESTAMP('2039-11-13 10:20:19') |
  +---------------------------------------+
  |                                     0 |
  +---------------------------------------+


See also "System call conversion for year 2038" (https://lwn.net/Articles/643234/) for a brief discussion of the Linux case.


Works in PostgreSQL 9.6.1

  postgres> SELECT extract(epoch from to_timestamp('2037-11-13 10:20:19', 'YYYY-MM-DD hh24:mi:ss'));
   date_part  
  ------------
   2141720419
  (1 row)


  postgres> SELECT extract(epoch from to_timestamp('2039-11-13 10:20:19', 'YYYY-MM-DD hh24:mi:ss'));
   date_part  
  ------------
   2204792419
  (1 row)


Well, we still have 21 years to solve it


A bit pedantic, I suppose, but Y2K problems showed up well ahead of Y2K. Things like credit cards with expiration dates well in the future, and validation code making comparisons.
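That expiry failure mode is easy to reproduce. A minimal Python sketch (hypothetical validation logic, not any real issuer's code) of the naive two-digit comparison and the common "windowing" fix:

```python
def card_expired_2digit(expiry_yy, current_yy):
    """Naive two-digit-year comparison: the classic pre-Y2K bug."""
    return expiry_yy < current_yy

# In 1997, a card expiring in 2005 carries the two-digit year "05"...
print(card_expired_2digit(5, 97))   # True: wrongly rejected as expired

def card_expired_windowed(expiry_yy, current_yy, pivot=50):
    """A common fix: window two-digit years around a pivot,
    treating 00-49 as 20xx and 50-99 as 19xx."""
    def widen(yy):
        return 2000 + yy if yy < pivot else 1900 + yy
    return widen(expiry_yy) < widen(current_yy)

print(card_expired_windowed(5, 97))  # False: 2005 is after 1997
```

Windowing only defers the problem, of course, which is why four-digit migration was the preferred fix where storage allowed.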


Yes, our system doesn't work on some operating systems because for some reason we use dates after 2050 as a flag for something. And that breaks when we test-run on Windows but works fine on Linux.


Unless of course, you are trying to store the maturity date of a thirty year bond or mortgage.


You store the ISO date, you don't need a UNIX epoch to calculate that


You "store" it that way (as a string? no), but how are the relevant systems (database, OS, application) handling calculations with it?


You should never store or work with dates as unix epoch-based timestamps. Use proper date/time datatypes, and manipulate them with library functions. Naive integer arithmetic on unix timestamps will bite you in the ass a hundred different ways.
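One concrete way the naive integer arithmetic bites: calendar units aren't fixed numbers of seconds (or days). A small Python sketch contrasting the two approaches, with dates chosen purely for illustration:

```python
from datetime import date, timedelta

start = date(2024, 2, 28)

# Naive: treat "one year" as a fixed 365 days, epoch-arithmetic style.
naive = start + timedelta(days=365)
print(naive)  # 2025-02-27: drifted by a day, because 2024 is a leap year

# Calendar-aware: let the date type do the calendar math.
aware = start.replace(year=start.year + 1)
print(aware)  # 2025-02-28
```

The same class of drift shows up with DST transitions, leap seconds, and month lengths, which is the point of reaching for library functions.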


> Use proper date/time datatypes, and manipulate them with library functions.

The point is: What do they do? There are plenty that will use... Unix timestamps (or worse). Or use those as the lowest common denominator for interchange. There are more systems at play than the data layer, and all of them will take every opportunity to bite you in the arse.


Certainly a good idea, but sometimes it's difficult to tell if OS functions and 3rd party libraries you depend on are doing that.

The ext4 filesystem in Linux, for example, uses epoch time. I think they have it all fixed now, but there were bugs still out there as recently as this year: https://bugzilla.kernel.org/show_bug.cgi?id=23732


That many years ago it was probably stored on tape and processed on a mainframe by a COBOL program. No database; it was probably stored as a string, and the application would likely have been fixed to handle it before it became a problem.


That COBOL program almost certainly _is_ a database – not a SQL database but the COBOL language standard specifies structured records and allows you to specify whether a file is sequential or indexed (see e.g. https://en.wikipedia.org/wiki/COBOL#Data_division).

In the case of Y2K, the problem was often that people defined a date field as three values all declared as PIC 99 (i.e. a two-digit number). If they migrated to 4 digits, we're fine until Y10K. If they added a window (values less than n are 19xx, etc.) or switched to Unix timestamps or a SQL database – there are products which let a COBOL runtime transparently map the indexed file semantics to SQL statements – then it requires more information to say whether it's at risk.


I work with a COBOL database, and interestingly all of the dates are declared as four PIC 99 fields, like this:

    02  ITM-LAST-CHG-DATE.
      03  ITM-LC-DATE-CC    PIC 99.
      03  ITM-LC-DATE-YY    PIC 99.
      03  ITM-LC-DATE-MM    PIC 99.
      03  ITM-LC-DATE-DD    PIC 99.
I assume this split between 'CC' and 'YY' is a relic of the Y2K updates, so they could add in a whole new field with a default of '19'.
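For illustration, the CC/YY split composes back into a full year trivially. An equivalent sketch in Python (field names taken from the layout above; the composition logic is my assumption, not the actual application code):

```python
def full_year(cc, yy):
    """Combine a two-digit century (ITM-LC-DATE-CC) and a two-digit
    year (ITM-LC-DATE-YY) into a full four-digit year."""
    return cc * 100 + yy

print(full_year(19, 85))  # 1985
print(full_year(20, 16))  # 2016
```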


  mysql> SELECT TIMESTAMPDIFF(YEAR,CURDATE(), "2050-01-01");
  +---------------------------------------------+
  | TIMESTAMPDIFF(YEAR,CURDATE(), "2050-01-01") |
  +---------------------------------------------+
  |                                          33 |
  +---------------------------------------------+
Seems to work fine


That's just the database, and one that we happen to know works.


The original comment referred to exactly this database, so it is relevant


http://stackoverflow.com/questions/12067697/convert-current-...

One of the top results for "convert date to integer". People who have an integer on one hand and a datetime on the other are going to see this code and use it (hey, it works) without understanding the ramifications.

I suppose one can only hope that such code is brittle enough that other problems bring it down before 2038.



My first professional programming job was to fix a Y2K bug in 1995, in an embedded system that hadn't shipped yet.


I was born on January 1, 1960.

So January 1, 2000 was going to be either my 40th birthday, or else my -60th. :)


Where I worked after college, Y2K was the land of honey. It was a .gov that essentially was able to get unlimited overtime funds for Y2K. Everyone participated and my understanding is that people were "working" 18-20 hours a day in preparation for Christmas 1999.


y2k was a massive deal that was totally solved by great engineering. The full range of engineering approaches, the full range of talents, the full range of management interference brought to bear in every country, company, department etc. and all of it worked out equally well so there was no problem. Us engineers, we're amazing, we should take a bow.

Or maybe, just maybe it was just a teensy weensy bit overblown so consulting companies could charge really big fees especially to government departments and banks. Easiest way to sell is to scare the st out of people then show them "the solution." I bought an invasion like that once, regret it now...


We spent the two years prior to Y2K replacing our clients' old mission-critical servers and services with new 'Y2K certified' hardware and software.

I personally think this contributed significantly to the dot-com era growth.


Yep, and subsequent bust.

Many, many companies got significant cash inflows in the run-up to 2000, which made their balance sheets look great, until the end of the 2000 tax year in 2001, when they didn't look so great any more.


That's a great point that I had forgotten.

In terms of Y2K effort, it seemed to me to be a small amount of code change and a disproportionately large amount of testing. The ramp-up in hiring for those changes and certification, combined with the opportunity cost, did contribute to a later contraction.

Here's one thing that puts it in perspective though. The amount of code in the world that was Y2K-affected was considerably less than the total amount of code that exists today. And stepping outside the HN startup bubble (where code is relatively young and modernized) and into the types of companies affected by Y2K, there are many efforts that are as bad or worse occurring all the time. Stuff like regulatory changes (SarbOx or various banking reforms), technology uplift (e.g., 32bit --> 64bit, Win32-->.NET, platform ports, etc.) are as bad or worse based on the amount of legacy code it affects.

But what Y2K did is teach a lot of these companies how to solve sweeping, codebase-wide problems.


Now that I think about it, problems caused by Y2K were much fewer than problems caused by leap seconds or "unexpected" leap days.


Only because the former was widely known and discussed and measures were taken, while the latter was typically badly understood and almost nobody cared.

Quick, do you know, does your computer system ever display the 61st second? And do you think it should?


It most likely does, but blink and you'll miss it.


The answer is: it is complicated.

POSIX:2001 specifies:

"As represented in seconds since the Epoch, each and every day shall be accounted for by exactly 86400 seconds."

And that's typically no problem for normal use. The specialists know that there are different time standards and that for the "real" number of seconds one has to use TAI, not UTC.

The problem in the Unix world with NTP and the datetime algorithms was that some programmers believed they had to actually see the leap second on their own computers in the kernel timestamps, to the point of the kernel intentionally producing discontinuities in kernel time (behavior which never made sense for timestamping purposes but was implemented anyway). So now we have configuration variations like this:

https://access.redhat.com/articles/15145

and, to avoid Linux kernel discontinuities:

https://developers.google.com/time/smear

In fact, the smoothing of UTC and using TAI for those who need the "real number of seconds" since point x was known as the reasonable approach long ago:

https://www.cl.cam.ac.uk/~mgk25/time/utc-sls/draft-kuhn-leap...

Now it's clearer why it's complex: too many people locally "assumed" what was not to be assumed and didn't understand the effects of their local decisions on the global context.

Hopefully "smearing" will get standardized and accepted and nobody will have to care, except the specialists who really need TAI. The leap second corrections should be invisible for normal use, just as nobody notices the much bigger clock corrections applied routinely.
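As a toy illustration of the idea, a linear smear distributes the extra second over a fixed window so no clock ever has to display :60. Numbers and function shape are illustrative only; Google's actual smear parameters and implementation differ:

```python
def smeared_offset(seconds_until_leap, window=86400):
    """Linear leap smear: absorb one extra second evenly over the
    `window` seconds before the leap instant. Returns the fraction
    of the leap second already applied to the smeared clock."""
    if seconds_until_leap >= window:
        return 0.0  # smear not started yet
    if seconds_until_leap <= 0:
        return 1.0  # full second absorbed
    return 1.0 - seconds_until_leap / window

print(smeared_offset(86400))  # 0.0: at the start of the window
print(smeared_offset(43200))  # 0.5: halfway through
print(smeared_offset(0))      # 1.0: leap fully absorbed, no :60 shown
```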


This hacker-fiction y2k is fun (if quite dated now): https://www.amazon.com/Wyrm-Mark-Fabi/dp/0553578081



