"To make error is human. To propagate error to all server in automatic way is #devops."
Frankly, I'm surprised things like this don't happen more often. Kudos for the incident management. Also a big plus for having working backups, it seems.
@DEVOPS_BORAT is actually very insightful in about 1/5 tweets. Snide for sure, but there are quite a few good points in there if you read carefully:
"In devops we have best minds of generation are deal with flaky VPN client."
"Single point of failure in private cloud is of usually Unix guy with neckbeard."
These are gold.
Edit: based on the above advice I once grew out a neckbeard while going through a multi-month rollout of a large product. It itched like crazy, but it did make me work much faster so I could get rid of it.
So, I feel like I might be being stupid and not getting something, but what is turtle? I can't find a programming language that seems to be related to it.
>best minds of generation are deal with flaky VPN client
So true. I'm on the receiving side of this..."No, you can't work on that multi-million-dollar deadline project of yours...the only way to fix the VPN is to re-image the machine back at head office [an international flight away]". Me..."Could you repeat that?" And that's a Cisco Enterprise VPN...(turns out IT was right...re-imaging & avoiding the conflicting software is the only solution). So much for Cisco...
Professionally I deal with much of the fallout from problems such as yours, and with leading techs who do this kind of work. It really sucks, but for many problems like this the choice becomes spend-four-hours-reimaging-the-machine or spend-unknown-period-of-time-trying-to-fix-new-problem. The latter would be great if it was less than four hours, but it's often not, and until that time you / the user are without a machine.
After an hour or so of troubleshooting it's usually better to go with the reimaging, since all you / the user wants is to get back to work.
Ideally I try to get the entire broken machine captured and the user issued a new, fixed machine because then a fix can be developed and documented, but for those who end up in a new failure mode, it sucks. And with something like the Cisco VPN Agent? That's not uncommon at all...
>spend-four-hours-reimaging-the-machine or spend-unknown-period-of-time-trying-to-fix-new-problem
Definitely. In our case it's 8 hours minimum though for a re-image. Somehow the FDE makes pulling the old data off the machine slow.
You've got my sympathies though - I'd not like to be the one doing the IT in these cases. Can't be fun troubleshooting IT with that kind of time pressure.
Thank you. It really, honestly is hard on our tech because they feel the pressure from all sides. Eight hours sounds rough for a reimage. I think ours are... maybe two or three? We've done a lot of work to get the reimage time down, and Win7 (WIMs) have made this really nice.
If this is something that smells of a bigger problem (or has been seen elsewhere) then I push for them to get the user a wholly new machine, capturing the old one for analysis. If the user is given an upgraded machine, then there is usually little resistance, even with the downtime that'll be incurred.
On the upside, if the issue can be reproduced readily, from this we can almost always get root cause and put a systemic fix in place. If it's sporadic... Well... I'm sure you understand how it goes trying to fix something that you can't yet reproduce. ;)
(I'd love to troubleshoot your slow data backup issue... That's the stuff I rather enjoy.)
>I'd love to troubleshoot your slow data backup issue... That's the stuff I rather enjoy.
I'm not directly involved with the tech side so I don't know the details. I gather they pull the old data off the disk using some offline low-level tool though (like you would for hard drive damage recovery). Between that and the encryption it's somehow very slow. No idea why it's like that though.
>get the user a wholly new machine
I wish it were the same here. They just give out loaner machines :/
I guess it depends on where your line for 'best minds of the generation' lies. If it's the top 25%, I wouldn't be surprised that many software devs / devops people lie in that category.
Not dumb at all. This is a professional services firm, so there is no real head office per se, but rather your "home office" - I just simplified it a bit for HN purposes.
A couple of reasons. Each country rolls their own custom image. Plus I need an office that has the encryption keys for the full disk encryption. Plus only 3 offices globally carry copies of my data (used when they can't pull the data off the HDD).
If I'm flying anyway I might as well go to home office - I know they have all the required stuff for my laptop.
Same for TheCodelessCode. A lot of these are cryptic and weird, but some are pure gold. Especially since no one understands the koans until they fall flat on their face just like the student does and a huge floodlight turns on.
>Frankly, I'm surprised things like this don't happen more often.
They do. This happened to the largest bank in Australia in mid-2012[1]. Very similar circumstances. I've been told that SCCM's UI doesn't help here - something about the default action, when nothing is selected, being to apply the task to all devices managed by SCCM. Someone more familiar with SCCM may want to correct me here.
I think it does happen often but isn't as well reported. I certainly know of more than one place that's suffered from this kind of accident (thankfully not places where I personally work, so I've not had to deal with the fallout; they're places where friends or family work).
Snark and sarcasm aside, I am impressed with the level of detail that the IT department is sharing; it is refreshing to see such a disaster being discussed so openly and honestly, while at the same time treating customers like adults.
At one place I worked, in the days of XP, the 'index server'(?) had a problem and uninstalled all the application programs. The basic OS was there, but MS Office, MSIE, all the doodads just got removed as each machine logged in.
This was a small college, so the IT guys just went round explaining and told us to log out and log in again. Applications re-installed. No data loss and so no shouting.
Stuff happens. We did 'assignment action planning' that morning: mind maps, essay plans, and research ideas. Results were better than normal anyway.
It's not standard across the board, but I've found academic institutions to be more honest about their technological mistakes (outside of large-scale breaches) than the private sector.
I worked at a company for several years that provided software to the IT groups in the higher ed vertical. What I found is there are two types of people in higher-ed IT (and they often congregate at different campuses):
First are the people who believe in the mission. They are really good, and are willing to take a cut in pay for some combination of social good, great working environment, etc. These kinds of people tend to be forthright about problems.
The second group are people who would struggle with the demands of the normal corporate world. They are getting paid less in higher ed and are worth what they are getting paid.
When I was in college, I worked for the IT department. While there's politics and bullshit no matter where you go, the politics and bullshit in Ed IT was not that noticeable where I was.
Furthermore, the profs were always happy to see me coming because I fixed their broken stuff without pointing any fingers.
Yes, shit happens. It's how you solve it and make sure that it doesn't happen again that's important. With the way they're communicating, they seem to be on top of their game.
Snark and sarcasm somewhat aside, it's not like they could tell the users about the remediation progress through their intranet, so they probably didn't have many options besides posting it on the Internet for all to see.
Yup. There's reason for some initial amusement but much respect for openly taking care of the problem.
This will win them appreciation and confidence in the aftermath of this disaster.
This reminds me of my undergrad CPSC days. The CPSC department had their own *nix-based mainframe system that was separate from the rest of the University. The sysadmin was a pretty smart guy who was making less than a third of what he could get in industry. Eventually he got fed up and left. About a week or two later the servers had a whole cascade of failures that resulted in everyone losing every last bit of work they'd done over the weekend (This was a weekend near the end of the semester when everyone was in crunch mode).
Long story short, the sysadmin was hired back and paid more than most of the profs. Academia may tend to skimp on salaries for certain positions, but sysadmins probably shouldn't be one of them.
You know what? Fuck them. Fuck higher education completely. Undervalued and underpaid is the name of the game for any important IT roles in that shit hole of an industry.
To be fair, a lot of the people working in support roles in academia are pretty much unemployable in the real world. They show up at 10 am, take constant smoke breaks all day, and leave at 3 pm.

When I was in physics (which used the main university servers for most things) we had a sysadmin who was in charge of some printers and a couple of server boxes. He had inherited those boxes from a former student who set them up, but he was functionally illiterate in managing them. At one point I needed a package installed. Not only could he not figure out how to install a package on an Ubuntu server on his own, he couldn't do it with emailed instructions either. I had to go up and physically stand over him, telling him what to click on and what to type. To make matters worse, he was so hard to actually catch "in the office" that I had to have the department secretary (whose office he was next to) alert me when he showed up.

Not surprisingly, the functions of those servers were soon moved to desktop machines in various offices. As far as I know he's still working there, though. He's a union employee and it would be a ride through deepest, hottest hell to get rid of him.
Note: I am not saying all university support staff are like this. Some definitely are though, and they're probably the reason why good people sometimes find it hard to be properly remunerated in academia.
> Note: I am not saying all university support staff are like this. Some definitely are though, and they're probably the reason why good people sometimes find it hard to be properly remunerated in academia.
Certainly not everyone is like him - but I'd wager every university has at least a couple of people like him (we definitely had one, again, in physics)
Reminds me of some emails that went out at my old university during a cluster outage, and got progressively more informal as the night went on, detailing people leaving dinners with extended families, a growing sense of desperation, etc. The last email might as well have ended with "Tell my wife I love her."
It was both direct and funny enough that I was only mildly annoyed that the cluster was down.
> A Windows 7 deployment image was accidently sent to all Windows machines, including laptops, desktops, and even servers. This image started with a repartition / reformat set of tasks.
Wow. That is very unfortunate, to say the least...
> As soon as the accident was discovered, the SCCM server was powered off – however, by that time, the SCCM server itself had been repartitioned and reformatted.
I wouldn't feel bad. I guffawed at most of this story.
Not in a "haha, what a bunch of morons. Serves those jerks right!" kind of way, but more in a "oh dear, that's the worst thing that can possibly happen! Oh no it gets worse??". I've been through IT catastrophes (and caused a couple myself) and I could easily see this happening to me. Still, it's funny as anything.
Sometimes these projects get bogged down because of three important problems outside of the control of IT. First off, you need to get in touch with all the upstream vendors to get updates to any sort of custom software that has compatibility problems with newer versions of Windows; Vista/7 got a lot more strict about giving admin access, for example, which may cause problems with the updates. Second, you've got to keep in mind the training costs. There are a lot of users who may be brilliant financial minds that can make numbers dance and bow to their whims, but get terribly locked up if an icon changes. Doesn't make them horrible people by any means, but you've got to keep it in consideration when planning a rollout. Finally, you have to keep in mind the petty turf wars. If Joe in Accounting gets the upgrade to 7 before Bob in Legal, Bob in Legal may feel slighted and start raising a holy shitstorm, even if he's scheduled to be upgraded a week later. Upgrades are ugly, no matter when they happen. Sometimes that proactive upgrade project takes many years just because of all the moving parts involved.
There was a similar catastrophe at Jewel-Osco stores many years ago. Nightly, items added to the store POS were merged back with the main item file at each store location. The format of the merged data was exactly the same as loading a new file, except the first statement would be /EDIT instead of /LOAD.
One of the programmers decided to eliminate some code by combining the two functions, with a switch to control whether /LOAD or /EDIT was used for the first statement.
There was a bug in the program, and the edits were sent down as loads.
A guy I knew, Barry, was the main operator that night. He started getting calls from the stores after around 10 of them had been reloaded with 5 or 6 items.
Barry said that day was the first time he got to meet the president of the company.
Since a reformat was done to the affected machines, does this mean that researchers' datasets, drafts of papers, and other IP were lost? Or were researchers' machines not affected?
In my experience with campus networks, home directories are never stored locally on any remotely-administered machines. Any specially-configured researcher's machine that stored data locally would not have been subscribed to get the automatically deployed OS images.
>"As soon as the accident was discovered, the SCCM server was powered off – however, by that time, the SCCM server itself had been repartitioned and reformatted." //
If the SCCM server was pushed the "update" too, then there doesn't seem to be much hope for the other machines? Surely no rule should be able to format the server running the ruleset; that seems like a failsafe failure there, at least.
None of the storage servers should have been storing the user data on the same volume as the OS the way a client machine would. So the network-mounted home directories should be intact and ready to use once the server OS is reinstalled. And while I don't know how SCCM works, I'd be surprised if this image push was affecting anything other than the primary physical drive (a wipe-all, populate-one recipe would be too obviously wrong and dangerous, right?).
Deletion and formatting don't necessarily destroy data; they just destroy the pointers to the data. If they're lucky the data can be recovered via software utilities (undelete) with backfill from backups. If they're unlucky then important, un-backed-up data has been overwritten, and those people are going to be SOL.
Are there actual backup procedures out there that foresee and automate the restoration of wiped drives and partitions? I might be wrong, but I doubt it's something that should even be considered.
Yes, there are a lot of options, both commercial and non-commercial, for full drive backups (then you restore using those and the incrementals). Do that for your provisioning servers and you can redeploy a lot of the infrastructure based on that.
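To make that concrete, here's a minimal sketch (purely illustrative - made-up paths, file-level rather than block-level, and ignoring real-world details like deleted files and boot sectors) of the "full image plus incrementals" restore idea:

```python
import shutil
from pathlib import Path

# Hypothetical layout: one full snapshot plus dated incremental directories,
# each holding only the files changed since the previous snapshot.
BACKUP_ROOT = Path("/backups/provisioning-server")    # assumed path
RESTORE_TARGET = Path("/restore/provisioning-server") # assumed path

def restore(backup_root: Path, target: Path) -> None:
    full = backup_root / "full"
    incrementals = sorted(p for p in backup_root.glob("incr-*") if p.is_dir())

    # Start from the full image of the file tree...
    shutil.copytree(full, target, dirs_exist_ok=True)

    # ...then replay each incremental in order; later copies overwrite earlier files.
    for incr in incrementals:
        shutil.copytree(incr, target, dirs_exist_ok=True)

if __name__ == "__main__":
    restore(BACKUP_ROOT, RESTORE_TARGET)
```

Real imaging tools work at the block level and handle boot sectors, deletions, and open files; the point here is just the restore order.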
I see, thank you. Somehow I was under the impression that automating and scaling the tedious work of analyzing and restoring boot sectors and the like couldn't be done. I suppose it's easier to plan for, though, if you restore a whole drive from a backup image rather than restoring a random set of files by going on a sector hunt?
> Findlay University did not release the name of the company that made the mistake but is working with the business' insurance company to pay for it to be fixed.
Findlay apparently doesn't value transparency as much as Emory.
> The university says grass was killed on as many as 54 of the campus' 72 acres.
Makes you wonder what chemicals are going into lawns generally.
One of the lessons from my college days (informally acquired, take with appropriate quantities of salt) was that walking barefoot was as much a risk for chemical exposure as puncture wounds.
Not quite as disastrous, but when I was at university the resident administrators configured the entire site's tftp server (everything was netbooted Suns) to boot from the network. This was fine until there was a site-wide power blip and it was shut down. When it came back it couldn't tftp to itself to boot because it wasn't booted yet (feel the paradox!). Cue 300 angry workstation users descending on the computer centre with pitchforks and torches because their workstations couldn't boot either...
Bad stuff doesn't just happen to Windows networks.
* As soon as the accident was discovered, the SCCM server was powered off – however, by that time, the SCCM server itself had been repartitioned and reformatted.
I was just watching the "What’s New with OS Deployment in Configuration Manager and the Microsoft Deployment Toolkit" session from TechEd and hit the section on the "check readiness" option which MS have added to SCCM 2012 in R2. It sounds like having this as part of the task sequence at Emory would have, at the very least, stopped this OS push from hosing all the servers.
Reading that just made me feel sick to my stomach, and my heart goes out to the poor gal/guy who pushed "Go" on that one. Shit happens, but a screw-up that big can be devastating to one's psyche.
I _very_ nearly did this whilst working for a university back in the early noughties. Luckily I managed to get to the server before the "advert" activated and wiped out everything. It was so easy to do that I am surprised it is still possible. I feel for their pain, but it does sound like they are doing a good job of mopping up. I did allow myself a snort of laughter when I read the bit about the server being re-imaged as well. That is pretty darn impressive carpet-bombing of the entire campus.
As soon as the accident was discovered, the SCCM server was powered off – however, by that time, the SCCM server itself had been repartitioned and reformatted.
I guess that's what the robot apocalypse is gonna look like.
Isn't this more the fault of the system architect than the guy who accidentally fired the bad deploys?
It's similar to a database firehose: If you accidentally start deleting all data you should have a quick working backup ready to quickly bring the dead box up to production.
I don't know. This could very well be a case of not much more than a bad drag and drop in SCCM. It's not quite that simple, but I'm not sure this is some custom process they set up.
Any tool that allows you to easily perform the antithesis of its function without making it abundantly clear what clicking the OK button will do is fundamentally broken.
I've built a few systems for deploying Windows... and the last thing that every one of them did before writing a new partition table and laying down an image was to check for existing partitions and require manual intervention if any were found.
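Roughly this shape, as an illustration only (not the actual code, and assuming a raw device path and 512-byte sectors): look for an MBR or GPT signature and refuse to proceed automatically if one is found.

```python
import sys

MBR_SIGNATURE = b"\x55\xaa"   # bytes 510-511 of sector 0
GPT_SIGNATURE = b"EFI PART"   # start of LBA 1 (assumes 512-byte sectors)

def disk_looks_partitioned(device: str) -> bool:
    """True if the disk already carries an MBR or GPT signature."""
    with open(device, "rb") as disk:
        head = disk.read(1024)  # first two sectors
    return head[510:512] == MBR_SIGNATURE or head[512:520] == GPT_SIGNATURE

if __name__ == "__main__":
    device = sys.argv[1]  # e.g. /dev/sda, or \\.\PhysicalDrive0 on Windows
    if disk_looks_partitioned(device):
        # Stop dead; a human has to approve wiping a disk that isn't blank.
        sys.exit(f"{device} already has a partition table - manual intervention required")
    print(f"{device} looks blank; safe to partition and image")
```

A blank disk gets imaged unattended; anything with an existing partition table gets kicked back to a person.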
> As soon as the accident was discovered, the SCCM server was powered off – however, by that time, the SCCM server itself had been repartitioned and reformatted.
"As soon as the accident was discovered, the SCCM server was powered off – however, by that time, the SCCM server itself had been repartitioned and reformatted."
I asked my friend attending Emory right now, and he didn't even realize anything was going on. He says the Emory IT department has a notorious reputation on campus for being regularly terrible, mostly because of an unreliable internet connection.
However, it looks like they handled this accident the best they could! Perhaps this accident would not have happened at a more reliable IT department.
Disasters as well as mistakes are unavoidable, such is life. A hallmark of a competent organization is how they handle the situation and recover from disasters or mistakes.
So far all the signs indicate they are doing great at recovering. I just hope there won't be onerous processes and restrictions afterward, driven by a "make sure it won't happen again" stance.
My roommate works at the Emory library and has had a fun, slow week there, coming home early many days because no one could do any work. They were apparently also given laptops as an interim solution, but those somehow also wiped themselves eventually (?).
Poor IT people... just as they're starting to get a handle on the actual situation, it starts blowing up on the internet.
Funny how they mention iTunes as one of the "key components" that are restored first, whereas Visio, Project, and the Adobe applications are relegated to a second round.
Presumably iTunes is part of their base system image for all workstations, along with Office, Firefox, Adobe Reader, and the like. In other words, a basic set of software to handle standard officework tasks. iTunes is free and IT would probably rather distribute it everywhere than have people trying to install it themselves (or calling the helpdesk to get someone with administrator rights to do it). They then offer additional applications on an as-needed basis to individuals and departments with specific tasks. So the designers who do print publications and the faculty who teach digital art might get the Adobe suite, while people in Facilities who plan construction will get Project. This keeps licensing costs down and simplifies systems according to their uses.
Generally you're going to keep your base images in SCCM limited to software that's only infrequently updated. Otherwise somebody has to update the entire image every time an update gets pushed out. Instead, you package them up and deploy the apps on top of the base image at install time. It takes a little longer to deploy, but it takes less admin time to manage since the actual installs are automated anyway.
Not hard to imagine. The others likely have specialty licenses and so aren't as easily distributed to everyone. In addition, Adobe software itself wasn't working earlier ;-)
You know how in movies you need at least two people to bring their special secret keys, plug them in, and turn them at once to enable a self-destruct sequence?
That is a real principle in interface design - if something would be really, really bad to activate unintentionally, make it really, really hard to activate.
If you design a nuclear missile facility, you don't put the "launch nukes" button right next to "check email" and "open facebook".
Same way it shouldn't be easy for users to delete or corrupt their data by accident due to some omnipotent action innocently shoved right in between other trivial actions.
I wouldn't blame the person who triggered this re-imaging process. I'd blame those who designed the re-imaging interface, to allow it to happen so easily by accident.
In my experience, the key is that the UI makes it clear exactly what you're doing. What I mean is, instead of a button that says "Start Imaging", it should be "Start Imaging of the 12,600 computers this rule applies to". Of course, that's a lot more work for the programmer, so it's never done.
It also helps to have sensible conventions for naming hosts and groups. If you need to select a subset of machines then you are sometimes going to make a mistake as simple as getting a wildcard pattern wrong. Instead, have groups with explicit and obvious names that require no memory to understand.
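A toy illustration (made-up host names) of how an over-broad wildcard quietly drags servers into what was meant to be a clients-only selection:

```python
from fnmatch import fnmatch

hosts = ["lab-pc-01", "lab-pc-02", "lab-printsrv", "lab-fileserver"]

# Intended: just the lab PCs. The sloppy pattern also catches both servers.
sloppy  = [h for h in hosts if fnmatch(h, "lab-*")]
careful = [h for h in hosts if fnmatch(h, "lab-pc-*")]

print(sloppy)   # ['lab-pc-01', 'lab-pc-02', 'lab-printsrv', 'lab-fileserver']
print(careful)  # ['lab-pc-01', 'lab-pc-02']
```

With explicit group names ("lab-pcs", "lab-servers") there's no pattern to get wrong in the first place.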
But then that leads to people automatically clicking "Yes" to the "Are you sure..." dialog. Though even I would pause at "Are you sure you want to reformat 12,500 machines including this one?" ;-)
Even that is hardly enough. There should be a physical (well, virtually physical) obstacle to launching a high-stakes command.
The system should be able to assess the scope of a task, and ask you to confirm 10 times if it has to, in blinking red dialogs, to make sure you really want to do what you are doing.
Of course, it's crucial that "clicking 10 times" is not the default behavior for any trivial action. Or boredom and the subsequently formed mechanical 10-click habit of the operator will kill the effectiveness of this approach...
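As a rough sketch of that (hypothetical names and threshold, nothing to do with how SCCM actually works): state the blast radius up front, and once it's above some size make the operator type the machine count back rather than just clicking through.

```python
def confirm_deployment(targets, danger_threshold=50):
    """Spell out the scope of the task and scale the friction with it."""
    count = len(targets)
    print(f"This task sequence will repartition and reimage {count} machines.")

    if count <= danger_threshold:
        return input("Proceed? Type 'yes' to continue: ").strip() == "yes"

    # Above the threshold a reflexive 'yes' isn't enough: typing the count
    # back forces the operator to actually read the number.
    typed = input(f"Type the number of machines ({count}) to confirm: ").strip()
    return typed == str(count)

if __name__ == "__main__":
    machines = [f"host-{i:05d}" for i in range(12600)]  # hypothetical targets
    if confirm_deployment(machines):
        print("Deployment would start here.")
    else:
        print("Aborted - nothing was deployed.")
```

The threshold check keeps the heavy friction off everyday, small-scope tasks, which is exactly the concern about the 10-click habit above.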
"To make error is human. To propagate error to all server in automatic way is #devops."
Frankly, I'm surprised things like this don't happen more often. Kudos for the incident management. Also a big plus for having working backups, it seems.