The system crashed while my coworker was running a code (aka doing CPR) in the ER last night. Healthcare IT is so bad at baseline that we are somewhat prepared for an outage while resuscitating a critical patient.
The second largest hospital group in Nashville experienced a ransomware attack about two months ago. Nurses told me they were using manual processes for three weeks.
It takes a certain type of criminal a55hole to attack hospitals and blackmail them. I would readily support life imprisonment or the death penalty for anyone attempting this cr@p.
Yes. And I was told by multiple nurses at St. Thomas Midtown that the hospital did not have manual procedures already in place. In their press release they refer to their hospitals as "ministries" [0], so apparently they practice faith-based cyber security (as in "we believe that we don't need backups") since it took over 3 weeks to recover.
As a paramedic, there is very little about running a code that requires IT. You have the crash cart, so not even stuck trying to get meds out of the Pyxis. The biggest challenge is charting / scribing the encounter.
I used to work in healthcare IT. Running a code is not always only CPR.
Different medications may be pushed (injected into the patient) to help stabilize them. These medications are recorded via a barcode and added to the patient's chart in Epic. Epic is the source of truth for the current state of the patient, so if that is suddenly unavailable, that is a big problem.
Okay, not having historical data available to make decisions about what to put into a patient is understandable (though maybe also print the critical stuff per patient once a day?), but not being able to log an action in real time should not be a critical problem.
It is a critical problem if your entire record of life-saving drugs you've given them in the past 24 hours suddenly goes down. You have to start relying on people's memories, and it's made worse by shift turnovers, so the relevant information may not even be reachable once the previous shift has gone home.
There are plenty of drugs that can only be given in certain quantities over a certain period of time, and if you go beyond that, it makes the patient worse not better. Similarly there are plenty of bad drug interactions where whether you take a given course of action now is directly dependent on which drugs that patient has already been given. And of course you need to monitor the patient's progress over time to know if the treatments have been working and how to adjust them, so if you suddenly lose the record of all dosages given and all records of their vital signs, you've lost all the information you need to treat them well. Imagine being dropped off in the middle of nowhere, randomly, without a GPS.
That's why there's a sharpie in the first aid kit. If you're out of stuff to write on you can just write on the patient.
More seriously, we need better purpose-built medical computing equipment that runs on its own OS and only has outbound network connectivity for updating other systems.
I also think of things like the old-school checklist boards that used to be literally built into the yoke of the airplane they were made for.
I’m afraid the profitability calculation shifted it in favor of off-the-shelf OS a long time ago. I agree with you, though, that a general purpose OS has way too much crap that isn’t needed in a situation like this.
> It is a critical problem if your entire record of life-saving drugs you've given them in the past 24 hours suddenly goes down.
Will outages like this motivate a backup paper process? The automated process should save enough information on paper that a switchover to the paper process is feasible at any time. Similar to elections.
Maybe if all the profit seeking entities were removed from healthcare that money could instead go to the development of useful offline systems.
Maybe a handheld device for scanning in drugs or entering procedure information that stores the data locally which can then be synced with a larger device with more storage somewhere that is also 100% local and immutable which then can sync to online systems if that is needed.
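A minimal sketch of that local-first idea, assuming a made-up events schema and upstream endpoint (neither comes from the thread):

```python
import json
import sqlite3
import urllib.request

# Hypothetical handheld store: every scan is committed to an on-device
# SQLite file first, then pushed upstream whenever the larger system is
# reachable. The endpoint URL and payload shape are assumptions.
db = sqlite3.connect("bedside_log.db")
db.execute("""CREATE TABLE IF NOT EXISTS events (
                  id INTEGER PRIMARY KEY,
                  recorded_at TEXT NOT NULL,
                  payload TEXT NOT NULL,
                  synced INTEGER NOT NULL DEFAULT 0)""")

def record(event: dict) -> None:
    """Store the event locally; never blocks on the network."""
    db.execute("INSERT INTO events (recorded_at, payload) "
               "VALUES (datetime('now'), ?)", (json.dumps(event),))
    db.commit()

def sync(endpoint: str) -> None:
    """Best-effort push of unsynced events; safe to retry any time."""
    rows = db.execute("SELECT id, payload FROM events WHERE synced = 0").fetchall()
    for row_id, payload in rows:
        req = urllib.request.Request(endpoint, data=payload.encode(),
                                     headers={"Content-Type": "application/json"})
        urllib.request.urlopen(req, timeout=5)  # raises if offline; retry later
        db.execute("UPDATE events SET synced = 1 WHERE id = ?", (row_id,))
        db.commit()

record({"drug": "epinephrine", "dose": "1 mg", "route": "IV"})
```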
A good system is resilient. A paper process could take over when the system is down. From my understanding, healthcare systems undergo recurrent outages for various reasons.
Many places did revert to paper processes. But it's a disaster model that has to be tested, to make sure everyone can still function when your EMR goes down. Situations like this just reinforce that you can't plan for if IT systems go down, only for when they go down.
My experience with internet outages affecting retail is that the ability to rapidly and accurately calculate bill totals and change is not practiced much anymore. Not helped by things like 9.075% tax rates, to be sure.
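For what it's worth, the fallback arithmetic itself is the easy part. A sketch of the math a register normally hides, using the 9.075% rate above and made-up amounts:

```python
from decimal import Decimal, ROUND_HALF_UP

# Made-up subtotal and cash tendered; only the 9.075% rate is from the comment.
subtotal = Decimal("18.40")
tax = (subtotal * Decimal("0.09075")).quantize(Decimal("0.01"), ROUND_HALF_UP)
total = subtotal + tax                  # 18.40 + 1.67 = 20.07
change = Decimal("25.00") - total       # 4.93
print(f"total ${total}, change ${change}")
```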
Real paper is probably as much about breaking from the "IT culture" as it is about the physical properties. An e-ink display would probably help with a power outage, but it would happily display a BSOD in an incident like this.
Honestly, if you were designing a system to be resilient to events like this one, the focus would be on distributed data and local communication: the exact sorts of things that have become basically dirty words in this SaaS future we are in. Every PC in the building, including the ones tethered to equipment, is presently basically a dumb terminal, dependent on remote servers like Epic's, meaning the WAN connection is a single point of failure (I assume a hospital hopefully has a credible backup ISP?), and the same goes for the Epic servers themselves.
If medical data were synced to the cloud but also stored on the endpoint devices and local servers, you'd have more redundancy. Obviously there's much more complexity to it, but that's what it would take. Epic as the single source of truth means everyone is screwed when it is down. This is the trade-off that's been made.
> synced to the cloud but also stored on the endpoint devices and local servers
That's a recipe for a different kind of disaster. I actually used Google Keep some years ago for medical data at home — counted pills nightly, so mom could either ask me or check on her phone if she forgot to take one. Most of the time it worked fine, but the failure modes were fascinating. When it suddenly showed data from half a year ago, I gave up and switched to paper.
I don't think historical data is required to make a decision; it is required to store the action for historical purposes in the future. This is ultimately to bill you, to verify that a doctor isn't stealing medication or improperly treating the patient, and to keep a record for legal purposes.
Some hospitals require you to input this in order to even get physical access to the medications.
A crash cart would normally have the common things necessary to save someone in an emergency, though, so I would think that if someone was truly dying, staff could get them what they needed. But of course there are going to be exceptions, and a system being down will only make the process harder.
Of course, the real backup plan should be designed around the actual needs; perhaps the whole system needs an "offline mode" switch. I assume they already run things locally, in case the big cable-seeking machine arrives in the neighborhood.
Most printers in these facilities run standalone on an embedded Linux variant. They can actually host whole folders of data for reproduction "offline". All scan/print/fax multi-function machines can generally do that these days. If the on-site IT is good, though, the USB ports and storage on those devices should be locked down.
Oh yes. This would be a contingency measure, just to keep the record in a human-readable form while requiring little manual labor. The printed record could be scanned back into Epic later, and if you need to transfer the patient, you tear off the paper and send it with them.
It is not necessarily crowdstrike's responsibility, but it should be someone's.
If I go to Home Depot to buy rope for belaying at my rock climbing center and someone falls, breaks the rope and dies, then I am on the hook for manslaughter.
Not the rope manufacturer, who clearly labeled the packaging with "do not use in situations where safety can be endangered". Not the retailer, who left it in the packaging with the warning, and made no claim that it was suitable for a climbing safety line. But me, who used a product in a situation where it was unsuitable.
If I instead go to Sterling Rope and the same thing happens, fault is much more complicated, but if someone there was sufficiently negligent they could be liable for manslaughter.
In practice, to convict of manslaughter, you would need to show an individual was negligent. However, our entire industry is bad at our job, so no individual involved failed to perform their duties to a "reasonable" standard.
Software engineering is going to follow the path that all the other disciplines of meatspace engineering did. We are going to kill a lot of people, and every so often enough people will die that we add some basic rules for safety-critical software, until eventually this type of failure occurring without gross negligence becomes nearly unthinkable.
It's on whoever runs the hospital's computer systems: allowing a ring-0 kernel driver to update ad hoc from the internet is just sheer negligence.
Then again, the management that put this in is probably also the same management that insists on a seven-day-lead-time CAB process to fix a typo on a brochureware website, "because risk".
This patient is dead. They would not have been if the computer system was up. It was down because of CrowdStrike. CrowdStrike had a duty of care to ensure they didn't fuck over their client's systems.
I'm not even beyond two degrees of separation here. I don't think a court will have trouble navigating it.
If that really were how it worked, I don’t think that software would really exist at all. Open Source would probably be the first to disappear too — who would contribute to, say, Linux, if you could go to jail for a pull request you made because it turns out they were using it in a life or death situation and your code had a bug in it. That checks all the same boxes that your scenario does: someone is dead, they wouldn’t be if you didn’t have a bug in your code.
Now, a tort is less of a stretch than a crime, but thank goodness I’m not a lawyer so I don’t have to figure out what circumstances apply and how much liability the TOS and EULAs are able to wash away.
When I read something like this, with such a confident tone while being incredibly incorrect, all I can do is shake my head and try to remember that I was young once and thought I knew it all as well.
I don't think you understand the scale of this problem. The computers weren't even up to print from. Our Epic cluster was down for placing and receiving orders. Our lab was down, unable to process bloodwork - should we bring out the mortar and pestle and start doing medicine the old-fashioned way? Should we be charged with "criminal negligence" for not having a jar of leeches on hand for when all else fails?
I was advocating for a paper fallback. That means that WHILE the computers are running, you must create a paper record, e.g., "medication x administered at time y", etc., hence the receipt printers, which are cheap and low-dependency.
The grandparent indicated that the problem was that when all the computers went down, they couldn't look up what had already been done for the patient. I suggested a simple solution for that - receipt printers.
After the computers fail, you tape the receipt to the wall and fall back to pen and paper until the computers come back up.
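A sketch of how small that hook could be; the printer device path and line format are assumptions, not anything from the thread:

```python
from datetime import datetime, timezone

# Hypothetical hook: whenever the EMR records a medication event, also
# append one human-readable line to a receipt printer. The Linux
# line-printer device path below is an assumption for illustration.
PRINTER = "/dev/usb/lp0"

def print_med_event(patient_id: str, drug: str, dose: str) -> None:
    stamp = datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M:%SZ")
    line = f"{stamp}  pt:{patient_id}  {drug}  {dose}\n"
    with open(PRINTER, "w") as printer:
        printer.write(line)

# e.g. print_med_event("A1234", "epinephrine", "1 mg IV")
```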
I completely understand the scale of the outage today. I am saying that it was a stupid decision and possibly criminally negligent to make a life critical process dependent on the availability of a distributed IT application not specifically designed for life critical availability. I strongly stand by that POV.
> I suggested a simple solution for that - receipt printers.
Just so I understand what you are saying: you are proposing that we constantly drown our hospital rooms in paper receipts, on the off chance that the computers go down very rarely?
Do you see any possible drawbacks with your proposed solution?
> possibly criminally negligent to make a life critical process dependent on the availability of a distributed IT application
What process is not “life critical” in a hospital? Do you suggest that we don’t use IT at all?
Modern medicine requires computers. You literally cannot provide medical care in a critical care setting with the sophistication and speed required for modern critical care without electronic medical records. Fall back to paper? Ok, but you fall back to 1960s medicine, too.
Why would you ever need to move a patient from one hospital room containing one set of airgapped computers into another, containing another set of airgapped computers?
Why would you ever need to get information about a patient (a chart, a prescription, a scan, a bill, an X-Ray) to a person who is not physically present in the same room (or in the same building) as the patient?
Local area networks air-gapped from the internet don't need to be air-gapped from each other. You could have nodes in each network responsible for transmitting specific data to the other networks... like all the healthcare data you need. All other traffic, including Windows updates? Blocked. Is using IP still a risk? Use something else. As long as you can get bytes across a wire, you can still share data over long distances.
In my eyes, there is a technical solution there that keeps friction low for hospital staff: network stuff, on an internet, but not The Internet...
Edit: I've since been reading the other many many comment threads on this HN post which show the reasons why so much stuff in healthcare is connected to each other via good old internet, and I can see there's way more nuance and technicality I am not privy to which makes "just connect LANs together!" less useful. I wasn't appreciating just how much of medicine is telemedicine.
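Still, for what a "gateway node" between two otherwise isolated LANs might look like, here is a minimal store-and-forward sketch; the addresses, the port, and the HL7-style message check are all assumptions:

```python
import socket

# Hypothetical gateway between two otherwise isolated LANs: it accepts
# connections on the clinical-network side and relays only messages that
# match the one sanctioned record format. Everything else is dropped.
LISTEN_ADDR = ("0.0.0.0", 2575)     # clinical-LAN side (port is illustrative)
FORWARD_ADDR = ("10.2.0.1", 2575)   # records-LAN side (address is made up)

def allowed(message: bytes) -> bool:
    # Placeholder policy: only relay HL7-style messages (they start with "MSH|").
    return message.startswith(b"MSH|")

server = socket.create_server(LISTEN_ADDR)
while True:
    conn, _ = server.accept()
    with conn:
        message = conn.recv(65536)
        if allowed(message):
            with socket.create_connection(FORWARD_ADDR, timeout=5) as out:
                out.sendall(message)
```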
I think wiring computers within the hospital over a LAN and adding a human to the loop for inter-hospital communication seems like a reasonable compromise.
Yes there will be some pain, but the alternative is what we have right now.
> nobody wants to do it.
Tough luck. There's lots of things I don't want to do.
A hospital my wife worked at over a decade ago didn't use EMRs; it was all on paper. Each patient had a binder. Per stay. And for many of them it rolled into another binder. (This was neuro-ICU, so generally lengthy patient stays with lots of activity, but not super-unusual or Dr House stuff; every major city in America will have 2-3 different hospitals with that level of care.)
But they switched over to EMRs because the advantages of Pyxis[1] in getting the right medications to the right patients at the right time, and documenting all of that, are so large that for patient-safety reasons alone it wins out over paper. You can fall back to paper; it's just a giant pain in the ass to do it, and then you have to do the data entry to get it all back into the EMRs. Like my wife, who was working last night when everyone else in her department got Crowdstrike'd: she created a document to track what she did so it could be transferred into the EMRs once everything comes back up. And the document was over 70 pages long! Just for one employee, for one shift.
1: Workflow: Doctor writes prescription in EMR. Pharmacist reviews charts in EMR, approves prescription. Nurse comes to Pyxis cabinet and scans patient barcode. Correct drawer opens in cabinet so the proper medication- and only the proper medication- is immediately available to nurse (technicians restock cabinet when necessary). Nurse takes medication to patient's room, scans patient barcode and medication barcode, administers drug. This system has dramatically lowered the rates of wrong-drug administration, because the computers are watching over things and catch humans getting confused on whether this medication is supposed to go to room 12 or room 21 in hour 11 of their shift. It is a great thing that has made hospitals safer. But it requires a huge amount of computers and networks to support.
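A toy version of the safety check at the heart of that workflow; the barcodes, orders, and function are illustrative, not Pyxis internals:

```python
# Toy version of the barcode check: the scanned medication must match an
# active, pharmacist-approved order for the scanned patient.
approved_orders = {
    "PT-0012": {"NDC-0409-4887": "heparin 5000u"},   # made-up barcodes/orders
    "PT-0021": {"NDC-0074-3799": "vancomycin 1g"},
}

def verify_administration(patient_barcode: str, med_barcode: str) -> str:
    orders = approved_orders.get(patient_barcode)
    if orders is None:
        return "STOP: no orders on file for this patient"
    if med_barcode not in orders:
        return "STOP: medication not ordered for this patient"
    return f"OK to administer: {orders[med_barcode]}"

print(verify_administration("PT-0012", "NDC-0074-3799"))  # STOP: not ordered
```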
Why would a Pyxis cabinet run Windows? I realize Windows isn't even necessarily at fault here, but why on earth would such a device run Windows? Is the '90s form of mass industry incompetence still a thing, where lots of stuff is written for Windows for no reason?
I don't know what Pyxis runs on; my wife is the pharmacist, and she doesn't recognize UI package differences with the same practiced eye that I do. And she didn't mention problems with the Pyxis, just problems with some of their servers and lots of end-user machines. So I don't know that they do.
For relying on Windows to run this kind of stuff, and not doing any kind of staged rollout, but just blindly applying an untested third-party kernel-driver patch fleet-wide? Yeah, honestly. We had safer rollouts for cat videos than y'all seem to have for life-critical systems. Maybe some criminal liability would make y'all care about reliability a bit more.
Staged rollout in the traditional sense wouldn't have helped here, because the skanky kernel driver worked under all test conditions. It just didn't work when it got fed bad data. This could have been mitigated by staging the data propagation, or by fully testing the driver with bad data (unlikely to ever have been done by any commercial organization). Perhaps some static analysis tool could have found the potential to crash (or the isomorphic "safe language" that doesn't yet exist for NT kernel drivers).
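Staging the data propagation could look roughly like this: a ring-based rollout of the content updates with a health gate between rings. Every name here is illustrative, not how CrowdStrike actually ships channel files:

```python
import random

# Sketch of ring-based propagation for content/definition updates: push to
# a small canary ring first, confirm those hosts stay healthy, then widen.
RINGS = [0.01, 0.10, 1.00]  # cumulative fraction of the fleet per stage

def push_update(update_id: str, host: str) -> None:
    print(f"pushing {update_id} to {host}")  # placeholder delivery channel

def healthy(host: str) -> bool:
    return True  # placeholder: really "host checked in and didn't crash-loop"

def rollout(update_id: str, fleet: list[str]) -> None:
    random.shuffle(fleet)
    done = 0
    for fraction in RINGS:
        target = int(len(fleet) * fraction)
        batch = fleet[done:target]
        for host in batch:
            push_update(update_id, host)
        if not all(healthy(h) for h in batch):
            raise RuntimeError(f"halting {update_id}: canary ring unhealthy")
        done = target

rollout("channel-291", [f"host-{i}" for i in range(1000)])
```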
A QR code can store about 3 KB of data. Every patient has a small QR-sticker printer on their bed. Whenever Epic updates, print a new small QR sticker. When the patient is being moved, tear off the sticker and stick it to their wrist tag.
That much of the patient's state will be carried on their wrist. Maybe for complex cases you need two stickers. You have to be judicious in encoding the data; maybe just the last 48 hours.
Handheld QR readers, offline, that read and display the QR data strings.
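Roughly like this, using the third-party qrcode package; the payload format is invented, and a version-40 QR code tops out near 3 KB of binary data, so the payload has to stay terse:

```python
import json
import qrcode  # third-party: pip install qrcode[pil]

# Invented, deliberately terse payload: last-48h meds for one patient.
payload = json.dumps({
    "pt": "A1234",
    "meds": [["2024-07-19T02:10Z", "epi", "1mg"],
             ["2024-07-19T02:14Z", "amio", "300mg"]],
}, separators=(",", ":"))

img = qrcode.make(payload)        # picks the smallest QR version that fits
img.save("wristband_sticker.png")
```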
You need to document everything during a code arrest. All interventions, vitals and other pertinent information must be logged for various reasons. Paper and pen work but they are very difficult to audit and/or keep track of. Electronic reporting is the standard and deviating from the standard is generally a recipe for a myriad of problems.
We chart all codes on paper first and then transfer to computer when it's done. There's a nurse whose entire job is to stay in one place and document times while the rest of us work. You don't make the documenter do anything else because it's a lot of work.
And that's in the OR, where vitals are automatically captured. There just aren't enough computers to do real-time electronic documentation, and even if there were there wouldn't be enough space.
I chart codes on my EPCR, in the patient's house, almost every day, with one hand. Not joking about the one hand, either.
It's easier, faster, and more accurate than writing, in my experience. We have a page solely dedicated to codes and the most common interventions. Got IO? I press a button and it's documented with a timestamp. Pushing epi? Button press with timestamp. Dropping an i-gel or intubating? Button press... you get the idea.
The details of the interventions can be documented later along with the narrative, but the bulk of the work was captured in real time. We can also sync with our monitors and show depth of compressions, rate of compressions, and the rhythms associated with the continuous-chest-compression-style CPR my agency does.
Going back to paper for codes would be ludicrous for my department. The data would be shit, for a start: handwriting is often shit, and made worse by the stress of screaming bystanders. And whether or not we achieved ROSC, the shuffle afterward would only increase the likelihood of losing the paper.
The idea is to have the current system create a backup paper trail, and to practice resuming from it, for when the computers go down. Nothing about your current process needs to change, only that you be familiar with falling back to the paper backups when the computers are down.
Which means that you have to be operating on paper before the system goes down. If you aren't, the system never gets to transition, because it just got CrowdStruck.
Correct. We use paper receipts for shopping and paper ballots for voting. Automation is fast and efficient, but there must be a manual fallback when power fails or automation is unreliable.
This wisdom is echoed in some religious practices that avoid complete reliance on modern technology.
You can do CPR without a computer system, but changing systems in the middle of resuscitation where a delay of seconds can mean the difference between survival and death is absolutely not ideal. CPR in the hospital is a coordinated team response and if one person can’t do their job without a computer then the whole thing breaks down.
If you're so close to death that you're depending on a few seconds give or take, you're in God's hands. I would not blame or credit anyone or any system for the outcome, either way.
Judgement is always part of the process, but yeah running a routine code is pretty easy to train for. It's one of the easiest procedures in medicine. There are a small number of things that can go wrong that cause quick death, and for each a small number of ways to fix them. You can learn all that in a 150 hour EMT class.
Hello, I'm a journalist looking to reach people impacted by the outage and wondering if you could kindly connect with your ER colleague. My email is sarah.needleman@wsj.com. Thanks!
I mean, if they're finding sources through the comments and then corroborating their stories via actual interviews, it's a completely fine practice. As long as what's printed is corroborated and cross-referenced, I don't see a problem.
If they go and publish "According to hackernews user davycro ..." _then_ there's a problem.