There have been too many cases recently of government agencies losing data due to "crashes" or "inadvertent" deletions. For example, the IRS, the State Department, and now the Air Force.
A really easy way to fix this is to make a law so that if government data such as this gets lost, then the cabinet level person responsible for overseeing the agency immediately loses their cabinet level position and is barred from further work with the federal government.
This would do wonders to align incentives all through the federal agencies when it comes to data safeguards.
This idea is too simplistic, although I used to support it. I've changed my position from yours to a 'retrain and demote' position. Since people make mistakes, I think being 'barred from further work' is counter-productive, and the government would lose out on many able and competent workers. Heck, this idea could also be exploited by lower-level workers who don't like their higher-up bosses; i.e., a lower-level worker intentionally screwing up and costing the higher-up boss their job because the 'new law' states the higher-up is now 'barred' from further govt work. An easy way to take out the competition on the way up the ladder.
P.S. I think legislating accountability is an issue the public needs to debate, in all countries. The US has seen numerous cases of gross negligence/errors (heck, one is too many) at various levels of govt (fed, state, local). I hope we can come up with a better solution than 'fire them and bar them', and even better than my 'demote/retrain'. There has to be a better way to legislate govt accountability.
Indeed, such ideas would create an atmosphere of fear and increase cover-ups, rather than solve problems. There needs to be a solution that acknowledges the inevitability of mistakes and still deals with the issue.
Exactly this. How many of us would be up in arms if developers were fired for making mistakes? You just need to look at all the stories around rm -rf...
Maybe that's exactly what should happen?
I mean, look at the whole "security" antics, where companies lose tens of millions of logins/credit card numbers and it all ends with a "South Park BP apology".
People fuck up epically and get a slap on the wrist and a promotion, I mean, come on.
So what if tomorrow your bank says it has lost all records due to a programming/tech error? Who is responsible there? Why should such sensitive data (Air Force etc.) be treated differently from cash/finance data?
If you have backups, a crash won't lose months of data. I'm OK with losing a so-called "competent worker" if he/she hasn't been doing something as simple as off-site backups.
It's local data corruption here. At worst you lose the data since the last backup.
Now, in the army, they have redundancy procedures for everything, and you want us to believe that the one server used to keep them in check not only crashed beyond repair, but had no backup?
You'd be surprised how tenuous backup situations are at most companies and organizations. Even if your company is doing full disaster-recovery checks 24/7, one after the other, a "well-placed" failure/mistake or a small series of failures/mistakes can lead to data loss.
Many of the things that cause data loss are just simple mistakes caused by a failure to review changes carefully, even when you have multiple levels of review; I would dare to say especially when you have high confidence that someone else is reviewing your changes.
"Manual" data changes, e.g. executing SQL statements or scripts that execute SQL that aren't a part of your application, in my experience are the most common cause of data loss.
After manual data changes, the second most common cause in my experience is not understanding what you are doing. For example, you might take a chance on an upgrade that fails because you have to meet a deadline.
Changes to application code are next. Typically a little more thought goes into an application change than into a one-off data migration, but if you are under time pressure, don't know what you are doing, or are assuming someone else will catch your mistakes, you can easily screw everything up.
Following this: mistakes that cause hardware or software failure. I worked at one large organization where storage arrays with various power backups were simply "turned off" by a contractor who didn't understand the impact of what he or she was doing.
Finally, you might have configuration issues or the hardware might just fail.
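To make the "manual data changes" point above concrete, here is the kind of guard-rail I have in mind: a minimal sketch using Python's sqlite3 module (the database, table, and column names are made up for illustration) that wraps a one-off change in a transaction and refuses to commit unless it touched exactly the number of rows you expected.

    import sqlite3

    # Minimal guard-rail for a one-off "manual" data change.
    # Database, table, and column names here are hypothetical.
    EXPECTED_ROWS = 42  # how many rows you believe the change should touch

    conn = sqlite3.connect("cases.db")
    try:
        with conn:  # opens a transaction: commits on clean exit, rolls back on exception
            cur = conn.execute(
                "UPDATE investigations SET status = ? WHERE closed_at IS NULL",
                ("reopened",),
            )
            if cur.rowcount != EXPECTED_ROWS:
                # Raising aborts the transaction instead of committing a surprise.
                raise RuntimeError(
                    f"expected {EXPECTED_ROWS} rows, touched {cur.rowcount}; rolled back"
                )
    finally:
        conn.close()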
Really, there is no substitute for backing your data up frequently, in a way you know how to restore easily and have actually tested, by building another machine from the ground up to replace the original and documenting and practicing that well (a minimal drill is sketched below). Very few do this frequently. And even if you do: what if all of your hardware were destroyed? Could you go out, buy something off the shelf, and rebuild everything from instructions in your head or stored safely around the world?
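And when I say "tested", I mean actually restoring, not just checking that the backup job exited cleanly. A minimal sketch of the kind of drill I have in mind (sqlite3 again; the dump path, table name, and row threshold are hypothetical): replay the latest dump into a scratch database and sanity-check that it looks like real data.

    import sqlite3
    from pathlib import Path

    # Restore drill sketch: paths and table names are hypothetical.
    LATEST_DUMP = Path("/backups/investigations/latest.sql")  # output of a regular dump
    SCRATCH_DB = Path("/tmp/restore-drill.db")

    def restore_drill() -> None:
        SCRATCH_DB.unlink(missing_ok=True)                 # always start from an empty scratch DB
        conn = sqlite3.connect(SCRATCH_DB)
        try:
            conn.executescript(LATEST_DUMP.read_text())    # replay the dump from scratch
            (rows,) = conn.execute("SELECT COUNT(*) FROM investigations").fetchone()
            if rows < 100_000:                             # crude "does this look like years of data?" check
                raise RuntimeError(f"restored DB looks too small: {rows} rows")
            print(f"drill OK: {rows} rows restored from {LATEST_DUMP}")
        finally:
            conn.close()

    if __name__ == "__main__":
        restore_drill()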
Please note the severity of this. It's not "We had a crash and lost a week's worth of data" - yeah, that happens. And it's totally acceptable if you're unable to restore it, for any of the reasons you mention.
This is "We had a crash and lost over a decade's worth of data" - yeah, that shouldn't happen. Ever. If you don't have a working backup from sometime in the past 13 years, -you are doing something wrong-. That's not a small series of failures or mistakes. That's either staggering levels of incompetence, or maliciousness.
This is exactly right. A misconfigured backup script, or a pessimal chain of crashes, or a lot of other things could lose recent data. I wouldn't be shocked by data loss.
This is a decade of data. It should have been in cold storage, on tested media, in multiple locations. No single error, or small chain of errors, should have enabled this to happen.
We are not talking about "most companies". We are talking about the Air Force, an organisation that is built to be resilient and to handle crises. Backups, fallbacks, redundancy, checks, reviews, access management and risk evaluation are BAKED INTO its culture and mission.
And we are not talking about "any data". This was clearly very sensitive data they knew they needed to protect.
Either the Air Force is failing at being the very thing it was created to be (which I doubt) or something is fishy (Occam's razor).
Yeah, forgive me if I'm too cynical, but Hanlon's razor fails when this sort of thing involves criminal investigations of government employees.
Edit: it also inspired a variation on an old joke about how the same word means different things to different people. For example, suppose the order goes out to "secure the data center."
Marines report back, "We have destroyed the data center."
Army reports, "We have killed everyone in the data center and are holding the position."
Navy: "We locked the doors when we left for the day."
Air Force: "We signed a three-year contract with an outsourcing company, with an option to extend at the same price for ten more years."
Which of course means that just because I don't think Hanlon's razor applies doesn't mean I think they're competent.
It applies at all times. Given that it's very easy to disguise your behavior as incompetence, it doesn't make sense to give the benefit of the doubt to people who do something that gives them a huge payout. Trying to forgive the incompetent people while punishing the malicious ones just brings you a huge epidemic of mysteriously incompetent people.
Imagine what would happen in the industry if every time someone screwed up like that they got fired. Just like that, without any option to explain what happened, how they tried everything they could to stop it, and so on.
Ah, I know it's the way it is in some organisations, sure. But is it a good way to handle people screwing up? People always screw up. If you start firing them for screwing up you end up with a pool of inexperienced workers who keep screwing up in the same way. If you keep them around, they learn and never screw up in the same way again.
There is screwing up and there is screwing up. One: you do all you could, act proactively, educate yourself, but forces beyond your control prevent you from reaching an optimal state and a massive clusterfk happens. I would keep this one.
Then you have the ignorant ones, who in 2016 will not even think that a mission-critical system should have SOME backup, off-site; people who put an amazing amount of energy into little political games and backstabbing, or who are just plain old incompetent and lazy. Those should become exemplary cases.
An investigation should sort out which is which, but most cases will fall clearly on one side of the spectrum or the other. Usually it's really not that hard.
"Neither the Air Force nor Lockheed Martin, the defense firm that runs the database, could say why it became corrupted or whether they’ll be able to recover the information."
The incumbent trusts the local IT manager, and the local IT manager has a contract with an external company.
The question becomes: "Did the contract with Lockheed Martin cover any form of regular backup and additionally some form of long term open format archiving?"
Perhaps some checklists or standards that all long term record keeping systems need to comply with would be good.
It's always an attractive idea to eliminate incompetence through threats. It's been tried lots of times throughout history, and has never succeeded (in fact, it usually makes things worse).
I agree. These days I don't believe it when I hear something so opportune... oh, surprise surprise, we lost all our data on criminal investigations. How convenient for some people!
> A really easy way to fix this is to make a law so that if government data such as this gets lost, then the cabinet level person responsible for overseeing the agency immediately loses their cabinet level position and is barred from further work with the federal government.
I see a fantastic way to completely abuse this. Need to get someone who disagrees with you kicked out? Hire some corporate espionage people to take care of that work for you!
Losing their position is meaningless - they'll just go private and make even more money while being recklessly incompetent. They'd just hire a clean agent as their intermediary for government contracts.
No, I'd slap prison sentences on everyone involved, not just loss of government employment.
Making mistakes is an important part of learning and becoming experienced. I'm sure we've all done a bad "rm -f" from time to time, and each mistake makes you more cautious in the future.
All that would lead to is making any process change in the government impossible, the domination of defensive document workflows (a.k.a. CYA), and a thorough lack of leadership.
Sigh. When I was in university, I worked at a computer store one summer that provided IT services to local offices. One of the clients was the Mexican consulate. We were hired to do various things, including installing a tape backup system.
I was installing some service packs at about 5pm. Windows NT4. 5pm, everyone's gone home, safe, right? Server asks if I want to restart, I say yes. Then a guy walks into the server room, asks if I had restarted the server. I say yes. Whoops, OK, I guess someone was working.
Next day, the database was dead, seemingly because I had restarted the server while someone was using it. It was a custom database job, and the contractor who had made it was on vacation in some other country. Here's the kicker: usually tape backups have rolling tapes, right? But we hadn't started rolling tapes yet. So the good backup from 2 days prior had been overwritten the night before with the crashed database. Ugh. And I was due to go back to school in 2 weeks in another city. Ouch. I still feel bad about that to this day. No idea how it ended up for my boss.
Never ever let someone who doesn't know what they're doing mess around with your mission-critical systems where the consequences are not recoverable. Always guide them. Although in my case, it seems I may have known more than my boss at the time, so maybe that wasn't possible for me....
The joys of IT. Back in the day, to wipe years of records you had to burn or shred tons of paper in multiple locations. Now you just crash one little system... Despite 20 years of Microsoft-induced crashes, clear best practices and yadda yadda, most organizations will still have a single backup system for any given data repository -- if they have one at all.
And the answer when shit happens is "outsource! Go to the cloud!" -- so that local managers won't be responsible when the snafu happens and data is lost (if it happens to Google, it can and will happen to anyone).
The only data loss to Google that I was able to find was [0], where a single datacenter was hit by lightning 4 times in short succession, and 0.000001% of disk space in said datacenter was lost. And only because the lost data consisted of recent writes that had yet to be backed up. And only because the lightning affected the AUX power systems in a way that they have since committed to fixing (or perhaps have fixed).
IMO, cloud storage is the most reliable way to store data, and Google the most reliable. I never know if I'm going to drop my laptop and lose all my work, or if the hard drive in my server is going to give up and burst into flames. But the great thing about cloud storage is that it abstracts away hardware components and failures, so they're somebody else's problem. And they worry a lot more about data loss than I do. Cloud data storage is designed with the assumption that hardware is faulty: spread redundant copies of data across geographically isolated datacenters, and you're much safer than some admin of averagejoe.com keeping his data on a single hard drive in his closet.
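To put the "redundant copies in isolated locations" idea in concrete terms, here's a minimal sketch in plain Python. The directory paths stand in for independent sites (in reality they would be buckets in different regions, or different providers entirely); the point is just: never report success until every copy has been written and its checksum verified against the source.

    import hashlib
    import shutil
    from pathlib import Path

    # Hypothetical "sites" -- in reality these would be buckets in different regions
    # or different providers, not directories on one machine.
    SITES = [Path("/mnt/site-a"), Path("/mnt/site-b"), Path("/mnt/site-c")]

    def sha256(path: Path) -> str:
        h = hashlib.sha256()
        with path.open("rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                h.update(chunk)
        return h.hexdigest()

    def replicate(source: Path) -> None:
        """Copy source to every site; only report success if every copy verifies."""
        expected = sha256(source)
        for site in SITES:
            site.mkdir(parents=True, exist_ok=True)
            dest = site / source.name
            shutil.copy2(source, dest)
            if sha256(dest) != expected:
                raise RuntimeError(f"checksum mismatch at {dest}")
        print(f"{source.name}: {len(SITES)} verified copies")

    if __name__ == "__main__":
        replicate(Path("important.db"))  # hypothetical file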
The problem is not just DR, but anything that can result in data loss. There was the one where they'd delete stuff you didn't mean to [1], the one where they got stuff back by the scruff of the neck [2]... I can't be arsed to look further back, but I'm pretty sure shit happened earlier as well (and I'm just looking at GMail). And of course there are tons of horror stories about accounts disabled or deleted for commercial or other unspecified reasons, because once you put your trust in another entity, that's it, they can do as they please. Their track record might be as good as anyone's, but it still has a few holes, and people should prepare for that rather than considering "the cloud" a magic bucket.
(disclaimer: my life would probably be ruined by losing the gmail account I've held since private-beta days, because life is short and I'm shit at sysadmin; but there's a big difference between a lone old geek and large organizations...)
Thank goodness storage is cheap. I've had a revolving door of credit/debit card numbers over the last few years thanks to identity theft, and the SaaS providers I use for personal stuff are really good about keeping my data around even after invalid credit-card auto-pay cycles fail.
For more serious data, I have to imagine that SLAs generally define data retention periods even after exiting the agreement, right? (I legitimately don't know)
This is also a bit "above my pay grade", but I've heard bits and pieces about how our company has some kind of agreement set up such that if there is a billing dispute, there is a minimum amount of time that must pass before they can shut us down.
Obviously it doesn't help if the company goes under, but it's not as simple as the parent commenter's statement.
It depends on the vendor. I negotiated a pretty big agreement a few years ago and was able to get the vendor to agree to hold data for a year after termination. We also back up to another provider.
That said, over the years I've witnessed all manner of billing and payment fuckups. I'm not trying to say that cloud is bad -- but that billing snafu is a risk factor that you need to understand and account for.
> IMO, cloud storage is the most reliable way to store data, and Google the most reliable
Does Google publish any durability numbers? Amazon claims 99.999999999% (11 9's) for S3 and Glacier (and considerably worse for EBS volumes). How does Google compare?
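For scale, here's a quick back-of-envelope on what 11 nines means, assuming the figure is quoted as annual durability per object and that losses are independent (which is exactly what correlated failures break):

    # What 11 nines of annual durability means for expected object loss.
    annual_loss_prob = 1 - 0.99999999999   # ~1e-11 chance a given object is lost in a year

    for n_objects in (1_000_000, 1_000_000_000, 1_000_000_000_000):
        expected_losses = n_objects * annual_loss_prob
        print(f"{n_objects:>16,} objects -> ~{expected_losses:.2g} expected losses per year")

    # Roughly: a million objects -> ~1e-05 per year, a billion -> ~0.01, a trillion -> ~10.
    # The caveat is independence: a single bad deploy, billing mishap, or
    # datacenter-wide event isn't captured by that number.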
Don't have numbers here, but it's pretty good, and we regularly move the data around and check it. (Ie it's not just rotting on a hard disk somewhere.)
> The joys of IT. Back in the day, to wipe years of records you had to burn or shred tons of paper in multiple locations. Now you just crash one little system. [...] most organizations will still have a single backup system for any given data repository -- if they have one at all.
No, back in the day, you could still wipe years of records because most people only used a single filing cabinet with no duplication or off-site copies. "Destroyed in a fire" was the canonical way to destroy all records, easily. This story has nothing to do with the introduction of IT. Incompetence is still incompetence.
Bernard: Shall I file it?
Hacker: Shall you file it? Shred it!
Bernard: Shred it?
Hacker: No one must ever be able to find it again!
Bernard: In that case, Minister, I think it's best I file it.
Back in the day, in 2300 BC, records caught in fires were preserved forever, while records not so lucky eventually returned to the earth. (Barring the odd formal peace treaty inscribed in metal.)
We know a lot more about conquered Mesopotamian kingdoms than we do about successful ones, because step one after taking over a city was apparently burning it down, or at least burning down the palace.
I recall an episode of The X-Files where they talked about a 'basement fire'; this reminds me of that. When you hear about a high-level government department "losing" files, it's usually because they've had an office party around the paper shredder.
"frantically destroying documents" I suppose would be another way to frame it... But I seem to remember that was the wording on the show, or pretty close to it.
It made it sound like the staff were running around and being very energetic in their operation of the paper shredder.
This is a Lockheed snafu, so yes, it is amateur hour.
Much of the data is backed up, and more of it will be retrievable from original sources. Maybe not all of it; I'll be very interested to see what gaps remain when they declare the issue resolved.
I would withhold judgement on Lockheed. It wouldn't surprise me if someone there raised the lack of a backup strategy as a problem with the govt. ... only to be ignored or told there wasn't enough budget for it.
It's never phrased that boldly. People aren't "fired", their contracts are not renewed, or they are declared redundant because of the new kid with an advanced degree... usually just after they've finished training the new kid.
The most common move is to cut the highest paid section of the workforce after they acquire the service contract from a competitor. All major defense firms do this, which leads to problems maintaining institutional continuity.
That's a pretty stupid reason not to have backups. If anything, a lack of backups makes me feel like any safeguards for protecting PII would be sub par.
James Hacker: How am I going to explain the missing documents to "The Mail"?
Sir Humphrey Appleby: Well, this is what we normally do in circumstances like these.
James Hacker: [reads memo] This file contains the complete set of papers, except for a number of secret documents, a few others which are part of still active files, some correspondence lost in the floods of 1967...
James Hacker: Was 1967 a particularly bad winter?
Sir Humphrey Appleby: No, a marvellous winter. We lost no end of embarrassing files.
James Hacker: [reads] Some records which went astray in the move to London and others when the War Office was incorporated in the Ministry of Defence, and the normal withdrawal of papers whose publication could give grounds for an action for libel or breach of confidence or cause embarrassment to friendly governments.
James Hacker: That's pretty comprehensive. How many does that normally leave for them to look at?
James Hacker: How many does it actually leave? About a hundred?... Fifty?... Ten?... Five?... Four?... Three?... Two?... One?... Zero?
18F is severely outgunned. They need your help. I can't say that enough. The entire Western US is on fire, there's a drought, they have one fire hose and their next boss may turn it off.
>Data about current investigations has also been lost, which is delaying them.
[...]
>"We've opened an investigation to try to find out what’s going on, but right now, we just don’t know," Stefanek said.
I wonder where they're putting the files for that investigation.
I wonder if their backups were corrupted too, or if they just overwrite the old ones with newer ones as a cost-saving measure. Or whether no one ever tried restoring from them and, for whatever reason, they could never have been restored from anyway.
Since people rarely prioritize it until it's too late, maybe 'restore from backup' drills should be a common thing.
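One cheap guard against the "overwrite the old backup with the new one" failure mode: never discard an old generation until the new one has been verified, and keep plenty of generations around. A minimal sketch (paths are hypothetical; "verified" here just means non-trivial size and readable end to end, where a real setup would compare against a hash recorded at dump time):

    import hashlib
    import shutil
    from datetime import date
    from pathlib import Path

    BACKUP_DIR = Path("/backups/investigations")   # hypothetical location
    KEEP_GENERATIONS = 30                          # never be down to a single copy

    def looks_sane(path: Path) -> bool:
        """Cheap checks: exists, isn't suspiciously small, and reads end to end."""
        if not path.exists() or path.stat().st_size < 1024:
            return False
        h = hashlib.sha256()
        with path.open("rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                h.update(chunk)
        return True  # a real setup would compare h.hexdigest() to a hash saved at dump time

    def add_backup(fresh_dump: Path) -> None:
        BACKUP_DIR.mkdir(parents=True, exist_ok=True)
        dest = BACKUP_DIR / f"{date.today():%Y%m%d}-{fresh_dump.name}"
        shutil.copy2(fresh_dump, dest)
        if not looks_sane(dest):
            dest.unlink(missing_ok=True)
            raise RuntimeError("new backup failed verification; old generations left untouched")
        # Prune only after the new copy verifies, and even then keep many generations.
        generations = sorted(BACKUP_DIR.glob(f"*-{fresh_dump.name}"))
        for old in generations[:-KEEP_GENERATIONS]:
            old.unlink()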
Might the additional security requirements deter sensible backup procedures? Like, you can't keep copies of the backups somewhere? Or the backups have to be encrypted and the keys are on a USB drive somewhere, which has gone missing/bad, so the encrypted data can't be recovered?
Good grief. For a lot less than an F-35 program, they could install a TSM rig with off-site backup.
Ours (at a state university) deals with petabytes of data. We've had to escalate to IBM's Tier 3 support a couple of times over the past decade, but we have never lost a file.
Do please keep in mind that on September 10th, 2001, the secretary of defense reported to Congress that the department could not account for 2.3 trillion dollars of spending.
Yeah this is believable.
Where I worked, in the secured network there were maybe 15 workstations with a maximum of maybe 5 people working at any one time, unless there was a "TOUR" going on.
We had 2 contractor IT people and 1 government IT person supporting this network.
I asked one of the contractor IT people when the last backup was. Response: "Do you want me to do a backup of your workstation tonight?"
WTF!!! I am no IT slob, but the minimum is a Mon-Thu backup of that day's new and changed files, and a full network backup on Friday night.
Why wasn't this done?!!!! The tape backup makes too much noise. I guess the bitch couldn't read her endless Romance Novels because of the noise.
I found an intern, not a full employee, working on a design file in another engineer's account. WTF!!! We believe that since the intern was leaving soon, the intent was to make it appear the design was the engineer's own work.
Security violation!!!!! The contractor IT people swore they did not know this was going on...... BS!!! With so few people on the network at any one time, and with only the intern on the network in the mornings, these contractor IT people were supposedly unaware!!!!
The government IT person knew, but she did not want to report it because she was well aware of the contractor's history of retaliation.
BTW..... during the investigation, the government engineer asked where it was written that they could not share passwords!!!!! Can you hear Snowden laughing at this?
All swept under the lumpy rug....... the violations would make the organization look bad.
As for me..... I got the IR treatment.... that's Isolation and Retaliation... then after 2 and a half days, my supervisor sent me an email asking about my whereabouts.... my reply: Retired as of 4:00 PM PST, 3 days previous.