Hacker News
Retrieving 1TB of data from a faulty drive with the help of woodworking tools (jgc.org)
535 points by jgrahamc on Aug 17, 2023 | 186 comments



I did this many years ago when I was a bench tech at a PC shop. It was an old Connor 40 meg drive running DOS. If I squeezed it just right, it would work. Too much? Failure, Abort/retry/ignore. Not enough? Same. So I stood there, squeezing the drive, and hitting ‘r’ for retry, over and over. It worked!

Boss walked by. “What the heck are you doing?”

“I’m milking the data out of this drive!”


I had this happen once but the force that would cause the drive to work seemed to be gyroscopic rather than pressure on the casing. So I had to hold the laptop at an angle and turn myself around on the spot while I copied the data off. If I stopped spinning around the drive would grind/crash and the OS would hang.


Using a centrifuge to separate valuable data from low value redundant components is inspired!


The ones sink because they’re heavier.


You didn’t get sick ?


I only had to do it for several minutes, long enough to have it boot up successfully and copy off the most important data over wifi. I do remember getting a bit dizzy during the procedure. The laptop was a pretty light PowerBook G4 from 2003.

My theory (based on the sound the drive had been making, which prompted me to try imposing different kinds of sideways acceleration on the laptop until I could hear the drive spinning “freely”) is that it wasn’t a problem with the disks/heads but a messed up bearing in the spindle motor.


> old Connor 40 meg drive

I would like to call out to the younger readers that this is not a typo. Think about how minuscule the capacity of this hard drive is, and also that it's entirely possible it was a full height 5.25" drive, which is 3.25" tall!

Here is a fun tour of old drive technology that goes into just how amazingly simple some of it is: https://www.youtube.com/watch?v=8LbFKV_pPAE


Yep, I had a Seagate 40MB IDE hard drive, which I considered quite compact at 3.5” full height. It replaced 20MB full-height 5.25” MFM drives.

The MFM drives were… fun… in that if you connected one of the two data cables upside down, you’d crash the armature and destroy the hard drive.


I worked on this drive. It's a 3.5 drive. A long time ago. Time for my daily nap and Centrum shot.


Wait in line, and give me back those 8" floppies that you borrowed.

Seriously though: the amount of miniaturization in storage is something I'll never get used to. From punch cards and paper tape via cassette to floppies, hard drives (in various incarnations and densities) and now to solid state so compact that you could store all of the data that I've created in my whole life in something about the size of your smallest fingernail. Incredible.


>> old Connor 40 meg drive >I would like to call out to the younger readers that this is not a typo.

Actually there is a typo there, it's Conner. But yeah, 40MB was even a luxurious capacity, my first HDD was 20 (later reformatted to 30 with an RLL controller)


Cheater! I used pirated ROMs to make my USR HST a dual standard.


I ran a BBS with MFM & RLL drives. The table shook when I formatted a hard drive.


A:>debug

-g=c800:5


Ah, you young'uns with your 40MB drives. My first was a full-height 10MB drive on my Sirius 1. That thing was life-changing.

Someone gave me a 30GB drive the other day. I still haven't figured a use for it?


Yep I owned a 40 meg drive. I remember when very large applications appeared (50 megs) wondering how it was even at all possible, and whether it made sense.

It also meant lots of highly unreliable floppies!


I did the same sort of thing with a broken iPod hard drive. If it was squeezed, it would work; the fix was to put a folded business card in-between the exterior back plate and the drive. That would squeeze it enough that it'd work. Never heated up enough to start a fire, either, so it was safe!


Oh man I remember doing that to eke out a little more life from several mp3 players, one or two Creatives and a Sansa e200. There was also one with a sliding door over an SD card with 32MB that I had to pop open and try to fix. I think that was around 2003? I'm pretty sure I upgraded to a 64MB card at some point.

Shims were plastic cards, subway tickets, and occasionally toothpicks as an adjustable clamp while I was trying to figure out where the failure was.


I've had to do that to make batteries stay connected in various devices; I've also made shims out of folded aluminum foil when battery contacts are just too far apart; and used a penny (with the edges coated) to shim a light socket where the end of the fixture made it impossible to screw the proper bulb in all the way. Good times :D


Same. In 1990 I was a field service engineer at a PC shop. We used to use circuit cooler, or "cold spray" as we used to call it.

We had three levels of hard drive data recovery that we did in order. Usually didn't have to get past #2.

1. Put it into another computer

2. Cold spray the heck out of it and see if it works for an hour

3. Remove the circuit board of the drive and replace it with the circuit board from a matching known working drive

If it didn't work after that and they 100% needed the data we sent it out to Ontrack. They'd put it in their clean room, remove the platters and read the data directly.


That last trick no longer works. Too much trickery with NV storage on the board. The last time I did this successfully was about a decade ago.


Reminds me of one I had where it would start clicking and file access froze, so I would turn it sideways and bang it against the table; it seemed to unstick it.


Reminds me of when I fixed my computer with a chopstick back when I was a poor college student.

I didn't know anything about cold solder joints at that time, but I did discover that my flaky motherboard would start working when I flexed it a certain way. So I wedged a bamboo chopstick between the motherboard and the case to keep it in a little bit of flexion. Lasted the rest of the year.


Not quite the same thing, but there was a time when 30 pin SIMM sockets had terrible plastic clips on either end. Being something of a gorilla in my youth, I’d break them off from time to time. Fortunately the one in my daily driver had another module behind it, so in went a pencil eraser to force the SIMM in the broken socket to stay in contact.

Was so relieved as newer socket designs came out.


This doesn't involve woodworking tools but is the weirdest fix I've ever done. I had a Surface Pro 3 that was working fine and then suddenly died and would not power on. I couldn't figure out anything to fix it.

Eventually I read a random post on Reddit[0] about how a guy tried putting his SP3 in the freezer (in a sealed freezer bag) and it eventually came back to life.

I skeptically tried it. I thought freezing was more likely to destroy something else on it, but what did I have to lose? It was already dead.

I put it in a sealed freezer bag with as much air removed as possible, and then put it in the freezer for a couple of hours. I took it out, plugged power in, and I was able to turn it on!

The first time I did it, it only worked for a few days. I tried freezing it again, and that worked. It still works to this day, many years later. My theory is that it seemed like a power problem, and that freezing the battery put the chemistry through some kind of cycle that repaired it.

I did keep the Surface wrapped in layers of towel to warm it back up slowly. Mostly to help prevent any moisture from building up somewhere before it got to temperature.

[0] https://www.reddit.com/r/Surface/comments/5tficj/how_i_reviv...


I remember back in the day when I had one of the original Gameboys that needed 4 AA batteries, I would put the batteries in the freezer for a while when they died and I could get a few extra minutes out of the batteries. Maybe your battery theory is right.


> freezing the battery put the chemistry through some kind of cycle that repaired it

Do you think that replacing the battery would have fixed it?


Perhaps. But freezing it twice also seemed to fix it. I used it as my primary machine for years after this fix.


"But, of course, the SSD gets quite hot during operation so I used one of the heatsinks from the PC and another made from part of carpentry square and some thermal adhesive tape to keep things cool."

Instead of this I would suggest a fan. Even small ones create much more airflow than needed when placed next to a device.


A block of metal is a great heatsink, and a long ruler like that is both thermally conductive and has decent surface area to radiate heat away. More importantly, a fan is not a carpentry tool. Now, one could use a drill, shop vac, or circular saw to generate wind, but they'd be much less energy efficient than the passive radiator.


For long term applications I much prefer passive cooling (AKA a hunk of metal.) I have one Raspberry Pi 4B in a case with a small heat sink on the processor and a small fan driven from the GPIO pins. It keeps the processor within limits even when overclocked and heavily loaded. Some day the fan will stop working and the processor will overheat (under heavy load) and throttle to prevent self destruction.

I have two Compute Modules (CM4s) and found a passive heat sink that is finned aluminum that covers the entire module. There is no fan to fail. I can overclock the CM4 to 2 GHz and load it for stress testing and it remains well within limits. One of these is equipped with an NVME SSD and I was surprised to see it reach excessive temperatures under heavy loads despite the PCIe x1 connection on the Pi. I got a "hunk of copper" NVME cooler, cut it to fit a 2230 NVME and it keeps the NVME SSD within reasonable limits.


Doesn't that really depend on the metal? E.g. copper is a great thermal conductor while stainless steel a really poor one.


Combination square blades are normally plain carbon steel, not stainless.

Actually, almost nothing in a wood shop is stainless because it sharpens poorly[0] compared to O1, A2, or any of the other tool steels you'll find in a wood shop.

[0] Yes, I realize that stainless and carbon steel are two whole classes of alloys, all with different characteristics. Based on my experiences with kitchen knives, I'll stand by my claim that at least the common ones sharpen poorly compared to the common carbon steel alloys.


Of course it depends on the metal! But for a hackjob like this, I'm betting it was run without any thermal management first, and then with the ruler as the first thing at hand... that the effort succeeded at that stage suggests that the alloy is sufficient.


I didn't understand the blog post as a challenge to only use carpentry tools but think he used what was available in his shop. (Most carpenters wouldn't have thermal adhesive tape or a thermal camera either ;-)


Circular saws sure do produce a surprising amount of wind, but the thought of using one for cooling is horribly terrifying.


Thank you


[flagged]


I was going to repurpose some small old tower fans for some windows in "vent mode" where the window is open slightly but has a latch to lock it in place for security.


>Milk was used in administering nuke materials to orphans to measure how they affected thyroid/edocrin/blah -- and they kept this research....

Come again on that?

There was research in which nuclear material was introduced to orphans through milk who were monitored for radiologic exposure?

...I dare say, it wouldn't exactly surprise me, but if you've got a reference to the data/research project, inquisitive minds would like to know.


https://ahrp.org/1944-1956-radioactive-nutrition-experiments...

My grandfather, whilst from seattle was in Schenectady, New York at the time...

He didn't move to Saratoga, California until '58 or so - and bought the house in 1959


Welp... That was a thing.

<Adds another datapoint on why information asymmetry lies at the heart of all evil>

And people give me shit about being overly concerned about even the possibility of ethical shortfalls...


>“a question arose as to whether chemicals in breakfast cereals interfered with the uptake of iron or calcium in children. An answer was needed. One obvious way to do the study, he said, would be to use radioactively tagged trace amounts of iron and calcium and to follow the fates of these minerals in the body when children ate cereals.” (Read Gina Kolata, The New York Times, Jan. 1, 1994)

Maybe you understand why my grandfather ate cornflakes and a banana as his only breakfast for 50 years... he was the control group.

Why do you think there has been an uptake in product marketing to push IRON and CALCIUM rich 'breakfast' diets on kids for "strong bones" (and by 'strong bones' I mean bone marrow blood production...) - you know.. the thing destroyed by radiation?

-

And I have some super sinister comments that I don't think I should share here...


I AM LITERALLY IN TEARS

I finally know what my grandfather was doing.

He was repenting for his role. My other grandfather worked on the Enola Gay...

FUCK - I have a nuclear curse on my family. I am serious - this is a wow to me, as this curse eluded me for all my life


...You should really check on a few things before getting too carried away. After all, nuclear material doesn't just end up in milk. You're implying an extra step of adulterating milk that was otherwise clean.

...Also, you're assuming that this adulteration continued throughout the course of your grandfather's life, and not just the duration of the (inarguably unethical if conducted as described) research.

Regardless though, none of that is your fault. That was him, you're you. Don't take on your shoulders culpability that has no business being there, and just focus on being better.

If you do find papers describing the research, however, there are worse ways to handle it than going to the press/Congress to possibly see if you can track down surviving descendants through application of copious amounts of FOIA.

Unlike you, the Government absolutely does still hold culpability for any such research it did. That'd be right up there with the Tuskegee syphilis experiments.

Vaya con dios.


It's with a weird realization that, at 48 years old, I am finally able to understand some of the things that my grandfather was silently telling me.

It's not my fault - but I didn't know until I was able to see; I have been a "conspiracy theorist" for many years thinking all was golden.

My grandfather, now I know - through his gaze - through his eyes, "the window to the soul" - he wanted to tell me, but couldn't tell me until I could understand.

Now that my grandmother has passed at 100 - I now know.

I now know.

My father's father built Hanford.

My mother's father was a weapons person on the Enola Gay.

I now know the true extent of my nuclear curse.

I need some time to process.


Reflowing the controller has worked for me twice so far. Once with an Intel Optane drive that worked without issues for 3 years and started overheating one day. And another one from a fanless machine that used an mSATA drive that chose to die on a Sunday, with no spare mSATA disk lying around. In both cases I went short hunting under a microscope, looking for that one guilty shorted cap, yet finding none. It was the controllers with "tired" BGA solder balls, which could use some tender loving care of 225°C (soak) -> 400°C (peak) -> 325°C (hold) reflow.


That's quite aggressive. I typically reflow with peak 225C.

https://microchip.my.site.com/s/article/SAM-Cortex-M3-M4-M7-...


Dunno. This works for me without damaging the parts, pads, or traces. Your link is for PCBA reflow ovens. It talks about minutes-long exposure. I'm using an Atten ST-862D hot air station with questionable temperature and airflow accuracy. For an M.2 drive I'll probably soak 225 for 15 seconds 3 times the size of the chip areas in circles. Hit my memory-1 button to go to 400 and focus on the chip for 3 seconds. Hit memory-2 to go down to 300 and one up arrow for 325 for 5 more seconds while watching nearby caps' solder pads and ensuring they don't fly away. ¯\_(ツ)_/¯


Yes, those temps are absolutely sane for hot air. You didn't specify that.


The trick is staying below the chip's rated processing temps because many of these temps are far too high and could easily lead to semiconductor degradation, especially with the mechanical shock of thermal stresses from rapid or uneven heating and cooling. Hot air is far more uneven and uncontrolled than a BGA oven.


That sounds low, unless you are using one of the commercial brands notorious for displaying too-low temperatures (so that their iron/air/oven will be "the good one" that works when others don't).


According to that datasheet peak temp should be above 245C for at least 20s.


This is superb, something like the modern equivalent of the engineer's slap that one could use with old spinning drives that would no longer spin up.


I once had an old IBM-PC that the previous owner had upgraded with a 20MB hard drive. They said after a while it just stopped booting, which is why they gave it to me.

I found that if I manipulated the axle of the read-write-head arm where it came out through the bottom of the drive, it would "unstick" the head from the surface of the disk, and the thing would boot! I imagine there was some kind of lubricant in there that would congeal when the machine was off for a certain amount of time.

So I left the drive slot cover off, and "fingering" the drive would get it to start reliably for a number of years after that.


Huh this is really good to know, I have a drive where the head seems to "stick" to the drive (or something)... I wonder if that's a potential common solution, just jostle it around and/or lubricate it to free it up a bit??


It's called stiction. On old stepper-motor-based drives this would commonly happen if the heads weren't parked before powering off. I don't recommend directly manipulating the actuator as you can literally shear the heads off the actuator arm with too much force, but a bit of oblique shock to the side of the drive will usually unstick them with less risk of damage.


I got a cheap used original PS4 and it wouldn't power up unless you put pressure on the mainboard, right around the PSU.

I took pieces of wood, and a clamp and applied pressure there and it worked.

The same pressure is applied if you use washers on the heatsink clamp - I did that and it's been running fine for months.

The solder joints on the GPU get brittle under the stress of heat and compressing it restores the connection. Pressure or re-flowing is really just a band-aid, the real fix is to re-ball the GPU but that gets expensive.


Classic. This was a common failure mode for the 360 as well. Used to fix friends' 360s that had the red ring of death by tearing them down, tightening the braces underneath the heat sink, and adding an additional case fan.

Never actually had to escalate to reflowing the BGA mount underneath the GPU but recall tutorials of how to do that in a consumer oven.. thank goodness I never tried that one at home.


The real hack with the 360 with that issue was to wrap it in towels and turn it on so it'd cook and reflow itself. Absolutely bonkers that trick worked for anyone at all, let alone for enough people for it to be a "thing".


I'm kinda amazed soldering is still such a thing.

Tech is already pretty reliable in my experience (At least the cheap stuff that isn't as high power and doesn't thermal cycle as much I guess), but getting rid of solder as a failure mode and making chips swappable would be so cool.

If they could somehow make production grade Z axis tape we could just tape the parts on with a 3d printed frame for position, and anyone could do component level repair without much skill.

And all the chips from dead devices could be reusable, if there was a way to automatically sort them all.


Can you be more specific about the placement? I have a PS4 that sometimes fails to start, it would be neat if I had a consistent fix for it.


I had that with an old drive that was in a drawer for years. I tried it many times and it wouldn't start. Then I accidentally dropped it (it landed pretty hard, hitting a metal table leg)... After that it worked fine and I was able to copy off all the data.


Compressive maintenance rather than percussive.


I've stuck drives in the freezer twice and they worked long enough to recover the data after. Both were small spinning external drives (laptop hdds) that had stopped working completely.


> So, I left the Firecuda in the freezer at -18C for 30 minutes. Inserting it into an NVMe M.2 USB case I was able to see the drive on my Mac. Success!

Oh so the fridge trick now also works for NVMe drives? I saved files out of an old HDD (about 20 years ago) by putting the HDD in the fridge for about half an hour too: got the files out of the drive then trashed the drive.


I had an emergency one time on the road where the laptop I was using late at night randomly shut down in the middle of audio editing. Weird. I booted it up and it died again fairly quickly. Next time even sooner.

Eventually I realized it was overheating.

Cue getting ice from the hotel dispenser and using a fan and a metal tray to keep the laptop alive until my work was done.


Ah yes, good old 'cold storage'.


Is 42C even hot for an NVMe? I recently started collecting SMART stats for all the drives in my home fleet and that seems about typical for the drives I have when idle-ish. I've also seen it suggested that keeping NVMe drives too cool is bad for performance.

But more importantly, this is a reminder TO HAVE BACKUPS. Many of my friends love their Synology systems. I am thrifty so I use urbackup and it has saved my butt a few times.

https://www.worldbackupday.com/


Yeah. I have a Synology and backed up everything except... this PC. It was just the "gaming PC" and then I realized that I would rather not re-install everything and try to recover the drive. It's set up on the Synology now.


Ouch. I am grateful to have picked up the "When in doubt, back it up anyways" habit before getting burned.


I have this with movies. I don't back them up because of the storage requirements. Everything is ripped from BluRays that I own, but... can I be bothered re-doing them?


If you'd already built out the automated "Some stranger on the Internet ripped it for me" workflow you could say that usenet is your backup.

A buddy just gifted me a decomm'd Synology RS2416+ w/ expansion unit full of drives that could, just, back up what I have... and I am definitely not going to do that.


I don't think 42C is at all hot for an NVME SSD. But I don't recall the suggested limit offhand.

> TO HAVE BACKUPS

Agree. I admire the diagnostic skills and ingenuity to recover the data, but was thinking I would just restore from backup in the same situation. But I'm weird. I have a "home lab" file server with a true server H/W that my desktop and laptop back up to as well as a remote server that the local one backs up to.


Generally safe operating temperatures are between 0 and 70 degrees Celsius.

e.g. Samsung https://download.semiconductor.samsung.com/resources/data-sh...
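As a rough illustration of that range check, here is a minimal Python sketch. The 0 to 70 °C limits come from the comment above; the function name and the sample readings are only for illustration, and a specific drive's datasheet should be treated as the authority.

```python
# Assumed limits, taken from the 0-70 °C range quoted above; check your drive's datasheet.
MIN_OPERATING_C = 0.0
MAX_OPERATING_C = 70.0


def within_operating_range(temp_c: float) -> bool:
    """Return True if temp_c lies inside the assumed operating range (inclusive)."""
    return MIN_OPERATING_C <= temp_c <= MAX_OPERATING_C


if __name__ == "__main__":
    # 42 °C is the figure asked about in this thread; the other readings are illustrative.
    for reading in (42.0, 75.0, -18.0):
        status = "ok" if within_operating_range(reading) else "out of range"
        print(f"{reading:6.1f} °C: {status}")
```

By this check 42 °C is comfortably inside the range, while a drive straight out of a -18 °C freezer is below it.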


For some reason the Max field in the picture says 42.4°, but if you look at the center target you'll see that it says 88.7°. I missed it too at first and was equally confused.


That says 38.7C I think. I have a standalone FLIR which plugs into an iphone, and the way it works is: min/max/avg=min/max/avg of current frame, and the temperature written in the center is temperature at the crosshair. Hence the crosshair temperature cannot be > max.
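The reasoning can be sketched in a few lines of Python. This is only an illustration of the min/max/avg logic described above, not the camera's actual software; `frame_stats` and the 3x3 frame of readings are hypothetical.

```python
from statistics import mean


def frame_stats(frame: list[list[float]]) -> dict[str, float]:
    """Compute min/max/avg over a frame plus the reading at the center pixel."""
    pixels = [value for row in frame for value in row]
    center = frame[len(frame) // 2][len(frame[0]) // 2]
    return {"min": min(pixels), "max": max(pixels), "avg": mean(pixels), "center": center}


# Hypothetical 3x3 frame in °C: 38.7 sits under the crosshair, 42.4 is the hottest pixel.
frame = [
    [30.1, 31.0, 29.8],
    [33.5, 38.7, 42.4],
    [30.9, 35.2, 31.7],
]
stats = frame_stats(frame)

# The crosshair pixel is one of the pixels in the frame, so it can never exceed the max.
assert stats["min"] <= stats["center"] <= stats["max"]
```

Because the center reading is drawn from the same set of pixels as the max, a crosshair value of 88.7 alongside a max of 42.4 would be impossible.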


Ah yeah. I also misread it as 88,7, but looked after your comment and lo and behold, it says 38,7 :)


That would be bad but I read that as 38.7C.


>"Now, I couldn't possibly sit and hold the SSD squeezing the chip while I copied off the data so I came up with another solution. A metal G clamp and strips of a Silicon Valley Bank credit card under the SSD to support the PCB."

I know I shouldn't, but the folded up SVB card made me laugh.


Does anyone have any advice (other than sending to a professional data recovery service) for how to access data on an HDD whose controller card stopped working, and which refuses to accept a spare controller card taken from a brand new identical HDD?


For many years (15 or more), the PCB of a hard disk has contained a chip (essentially an EEPROM or flash device) holding so-called "adaptive data", a set of data written at the factory that is specific to the disk drive (heads/platters) the PCB is mounted on.

There is specialized hardware (and software) to extract the data and save it to another board's memory, but the poor man's way is to transfer the actual chip from the old board to the new (identical) one.

This, commonly referred to as a "ROM swap", is not particularly difficult[1], as the chip is usually a rather simple 8-pin one. If you are not into this kind of thing, a hardware repair shop (like a phone repair one) will normally do this work for you.

However, newer hard disks may not have this separate chip; it depends on which model yours is.

Here is a site with some more info:

https://hddpcb.eu/gb/content/how-to-swap-hdd-pcb

[1] meaning that it can be done DIY if you are familiar enough with soldering/desoldering components


Cool it down to freezer temperatures and see if it works. Heat it up to ~60C and see if it works. This applies to both hard drives and SSD's.

A good half of "failed" drives can be made to work for at least a few hours longer with that method.

Unlike OP, if you see signs of life, don't mess with clamps or reflowing. Just leave it in the freezer/oven while you take data off it, with longish power/SATA cables to a machine just outside the freezer/oven door.

I recommend GNU ddrescue for getting data off - when you only have a few hours of service life left till it is dead-dead, it maximizes data recovery in a given time. There are various ways to generate a mapfile to skip recovering free blocks, which are worth using if you suspect the drive is mostly empty.
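To illustrate why a mapfile matters, here is a simplified Python sketch of the general idea: copy the readable blocks first, record the unreadable ones, and leave retries for later. This is not ddrescue's actual algorithm or file format; `rescue_copy`, the block size, and the in-memory list of bad regions are hypothetical names chosen for this example.

```python
import io
from typing import BinaryIO, Callable

BLOCK_SIZE = 4096  # assumed block size for this sketch


def rescue_copy(
    read_block: Callable[[int, int], bytes],
    total_size: int,
    out: BinaryIO,
) -> list[tuple[int, int]]:
    """Copy total_size bytes block by block, returning (offset, length) of failed reads.

    read_block(offset, length) should return the data or raise OSError on a bad region.
    """
    bad_regions: list[tuple[int, int]] = []
    for offset in range(0, total_size, BLOCK_SIZE):
        length = min(BLOCK_SIZE, total_size - offset)
        try:
            data = read_block(offset, length)
        except OSError:
            # Skip quickly instead of hammering a failing area; note it for a later pass.
            bad_regions.append((offset, length))
            data = b"\x00" * length
        out.seek(offset)
        out.write(data)
    return bad_regions


# Demo with a fake "drive" whose second block is unreadable.
source = bytes(range(256)) * 48  # 12288 bytes = 3 blocks


def flaky_read(offset: int, length: int) -> bytes:
    if offset == BLOCK_SIZE:
        raise OSError("simulated read error")
    return source[offset:offset + length]


image = io.BytesIO()
bad = rescue_copy(flaky_read, len(source), image)
print(bad)  # [(4096, 4096)]
```

With the real tool, the mapfile plays the role of `bad_regions`: it lets a later run resume and retry only what is still missing, which matters when the drive may die at any moment.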


And for the freezer method, be careful with humidity when you take it out. It's worth putting it in a bag with some desiccant and letting just the cable come out.


Couldn’t this method risk making it completely irrecoverable if it fails?


Freezing / heating? Other than the risk of humidity as mentioned, not really. Unless the head crashes into the platter, it's unlikely to do anything that would permanently kill the drive.

That said, if you ever have a drive with absolutely critical, must-have data, don't bother with any of this and just ship it to professionals. You'll pay dearly, but they'll get your data out.


There are BIOS chips on the controller board that need to be transferred to the new one. In a reasonably modern drive that would be pretty challenging and require a hot air station.

https://www.hddzone.com/hard_drive_pcb_replacement.html


How do you know for sure it's the controller?

Note that many controllers store parts of their firmware/config on the disk platter, so without the disk platter the board may not show up via sata.


It's my suspicion, at least. In fact I have two of those drives, bought at the same time, and both died in the same sudden way with no more than two weeks between.


Wasn't there a firmware bug in some drives that made them fail after a certain number of seconds of uptime? Was a workaround/fix published?


Yes, there was a well-publicized Seagate issue [1] that I think was related to uptime, and you could fix it after the fact by grabbing a shell through the TTL serial on the jumper pins. (Edit: thanks jaclaz for providing a link with more details!) It was claimed Seagate would fix it for you under warranty as well, although shipping a drive always has risks.

HPE had two rounds of enterprise SSDs that failed because the uptime counter overflowed, but I never saw content about fixing those after the fact. And I think I had seen a different SSD uptime-based failure a year or so before.
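As an illustration of how an uptime counter can overflow, here is a minimal Python sketch. It assumes a signed 16-bit counter of power-on hours, which is a hypothetical choice for this example and not a confirmed detail of any vendor's firmware.

```python
BITS = 16  # assumed counter width; purely illustrative


def wrap_signed(value: int, bits: int = BITS) -> int:
    """Interpret value as a two's-complement signed integer of the given width."""
    half = 1 << (bits - 1)
    return (value + half) % (1 << bits) - half


last_good_hours = (1 << (BITS - 1)) - 1  # 32767, the largest value that fits
after_one_more_hour = wrap_signed(last_good_hours + 1)

print(after_one_more_hour)         # -32768: the counter has gone negative
print(round(32768 / 24 / 365, 2))  # 3.74: years of continuous uptime before the wrap
```

Firmware that never expects a negative uptime can misbehave at that point, and drives from the same batch that were powered on together would all hit the limit at about the same time.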

IMHO, it's best to avoid same-batch storage, and if that's not possible, stagger the online time to try to give enough time to notice a failure, obtain replacement storage, install replacement storage, and migrate data. Backups are important too, but it's nicer to have a path towards mostly online recovery. And some mostly replaceable data is hard to justify backups for (do I need three copies of format-shifted media? probably not, if my online storage fails, I can re-rip)

I don't recall hearing about this for Western Digital drives, but there's some Xbox 360 stuff that I thought involved the TTL serial on WD drives... It's certainly worth exploring. WD Green drives do also have a very short default timeout to park the drive, and as a result can experience a large number of parking cycles in some applications, and the parking ramp can wear out; I don't think this is really recoverable, the heads are likely to get damaged and debris may damage the platters.

[1] https://sites.google.com/site/seagatefix/


Only for the record, at the time the Seagate issue was due to a bug in the firmware, when the drive was powered and found a counter at certain values, it went into a sort of loop and failed to "boot" (the internal OS) further.

This happened only on some disk drives because it was initially triggered by defective test equipment on only some production lines; see "Root cause" here:

https://msfn.org/board/topic/128807-the-solution-for-seagate...


Doesn't ring a bell for me, but given the highly questionable tiering/marketing methods from HDD manufacturers these days and the fact that they have been reducing warranty durations by some 50%, that type of planned obsolescence wouldn't surprise me.

The drives are 3.5" 2TB WD Green purchased around 2012, and had been in use for about 1 year when they both died.


> 2012

If that's not a typo, then it seems like those drives have been powered off for about 10 years.

I think powered off hard drives are commonly said to retain data for um... maybe 3 years (from rough memory). So, your drives have probably lost their magnetism (and thus the data). :(


Naah, that may apply to SSD's, not to good ol' (rotating platters) hard disks, the magnetism does not evaporate.

The only issue that may affect a hard disk left unpowered for several years is so-called bearing seizure: the (fluid) bearing of the motor/platter may become stuck. It is relatively rare, though some particular makes/models are more prone to it. It is usually fixable: in some cases the bearing can be made to rotate freely again, but the disk needs to be opened, so you need a specialized service; in other cases the bearing can be replaced, which requires specialized tools:

https://hddsurgery.com/blog/hdd-motor-bearings


Thanks. :)

That reminds me of a work colleague a few years ago. He got an ancient drive working again by tapping it on the side with a screwdriver while powering it on, to get it "unstuck".


Not a typo, they've been powered off for around a decade. But HDDs (just like floppies) don't lose their magnetized property in just 3 years, or even 10 years, barring the very rare case of an extreme fluke or environmental exposure. Flash memory (memory cards, USB sticks, SSDs) loses electrical charge relatively fast, however. That might be what you're thinking of.


Thanks. Yeah, I was probably thinking of SSD's then. :)


There were SSD's from an unnamed OEM that would brick after overflowing an uptime counter. Cisco and HP were both bitten by it: https://news.ycombinator.com/item?id=32048148


Double check the numbers on the controller boards; HDD's are complicated little computers and the manufacturers change things during production of a particular model of drive fairly often.

If you've got the controller board your drive wants and still nothing, then it's time for professional help or considering the data lost, imo.


The PCBs are of the exact same model. My fear is a DRM type scheme connected to serial numbers etc. stored on the platters having to match those of the controller.


doesn't even have to be DRM, just "media mapping" where it's adjusting to the individual platters and such. like factory low-level format stuff.

I don't know how complex HDDs have got, but I recall giggling at someone installing linux on an HDD controller board several years ago. So I bet it's much worse now.


> other than sending to a professional data recovery service

Ah, so you don't really need that data...


Why the snark? I will survive without the data, but I'm curious about alternatives to throwing $4000 at the problem and forfeiting my privacy and my data's integrity.


There are many people out there who thought that they could recover the data by themselves, only to swallow a bitter pill later, that's why.

Because if you really need the data then you go to people who make a living by recovering data.

But if you are okay to lose the data if unsuccessful then it's okay to try, but you should know/tell that beforehand.

Reading through the other comments - you have a very low chance to succeed, because if you want to swap controller boards then you need to move adaptive data too, as others have said.

But I'm curious what exactly happened, WD Green from 2012 are not the worst drives out there. How exactly did they fail, what happens now when you power them on, with SATA connected, without? Did you try external USB2SATA converters/boxes?


>"Reading through the other comments - you have a very low chance to succeed, because if you want to swap controller boards then you need to move adaptive data too, as other had said."

I'm pretty decent with soldering iron and hot-air gun. Migrating the SMD flash/EEPROM chip shouldn't be too hard.

>"But I'm curios what exactly happened, WD Green from 2012 are not the worst drives out there. How exactly they failed, what happens now when you power them on, with SATA connected, without? Did you try external USB2SATA converters/boxes?"

The discs spin up but the controller no longer communicates with the host. The computer doesn't see that the drives are attached to the SATA bus. There were no signs of problems coming, they just suddenly stopped working from one power-up to the next. I tried with different motherboards and a couple of SATA-USB bridges, all same result.


> Migrating the SMD flash/EEPROM chip shouldn't be too hard.

Well, good luck then.

> The discs spin up but the controller no longer communicates with the host

Now this is strange: if the controller were dead, there would be no spin-up. If you hear the heads working then the controller is definitely not dead.

Have you tried searching forums dedicated to data recovery for your exact P/N?


No discernable sound of the heads moving. Drive just spins up and that's it. Last I searched was back in 2012-2013 and I didn't find any other advice than trying a PCB swap.


Just a double check... you said in thread you have two of these drives that died at 1 year old. Did you have a third, working, drive you're transplanting boards from?


I have a third drive, working and unused, with same PCB model number. Others in this thread have said that my chances are good if I bring the DIP-8 EEPROM from the broken drives over to the working PCB.


Is it not the case that the controller boards store track data specific to an individual drive? I don't think they can be swapped out.


Historically they did contain a map of physically unusable sectors particular to the physical platters in the drive, and I'd be surprised if they didn't now. So the consequence of a controller swap was making the assembly unstable because you were using a foreign sector map; usually it was still enough to recover as long as you weren't writing new data.

But nowadays a drive controller is far more complex, e.g. it might implement transparent hardware AES encryption, in which case swapping the board loses the key. And I've no doubt there are many modern manufacturing-related tricks for yield that go into them as well, any of which might make a different set of platters than what was shipped unreadable.
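
A toy model of the sector-map point above, assuming the simplest possible scheme: the controller keeps a per-drive list of bad physical sectors and slips past them when translating logical addresses. The function name and the defect lists are hypothetical; real drives use more elaborate, vendor-specific adaptive data.

```python
def logical_to_physical(lba: int, bad_sectors: list) -> int:
    """Map a logical block to a physical sector by skipping known-bad sectors."""
    physical = lba
    for bad in sorted(bad_sectors):
        if bad <= physical:
            physical += 1  # slip past the defect
        else:
            break
    return physical


own_map = [3, 7]   # defects on the original platters (made-up numbers)
foreign_map = [2]  # defects recorded by the donor board (made-up numbers)

for lba in range(8):
    right = logical_to_physical(lba, own_map)
    wrong = logical_to_physical(lba, foreign_map)
    note = "" if right == wrong else "  <- foreign map reads the wrong sector"
    print(lba, right, wrong, note)
```

In this simplified model a foreign map sends some logical addresses to the wrong physical sector, including sectors the original map had marked as bad.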


Yes, they (the PCB's) cannot be swapped unless you also swap the chip containing the adaptive data.


I fear this is the case...


Dodgy flash bodge involving a woodworking clamp & 'strips of a Silicon Valley Bank credit card' - this is absolutely the 'most HN' story I've ever read!


Ah, data compression


I just had to login to upvote this. I am a little ashamed to admit it took me a second to get it. Then, I chuckled :)


Found a good use for SVB debit card


beat me to it


Lexi-tangent: This is a good example of English refusing to follow rules. C-clamps and G-clamps are exactly the same clamp.


In a way, C and G are the same letter!

https://en.wikipedia.org/wiki/G#History


I came here to comment on G-clamp. I wonder where the author is from. I started building things as a kid, I've worked for quite a few contractors over the decades, I've sold some of my wooden creations and met many skilled people with beautiful shops in the US and Australia and have never ever once heard G-clamp.

G-clamp makes total sense, I'm just curious!!


He's probably not from England because there they call them cramps!


I am British but also French and lived in many different places. I'm just confused. I imagine someone called it a G clamp in front of me at some point and I just started calling it that.


I'm also British (SW) without the French or many different places - 'cramps' is new to me as far as I can remember. I'd call it a G clamp, and so does Screwfix, fwiw.

(Wiktionary does give the clamp meaning too, not with a UK qualifier though. https://en.wiktionary.org/wiki/cramp)


Ah Seagate, good memories!

In the 90's they had this policy that you can send them a broken HD and they would replace it with a new one. Since they wouldn't make the old version anymore, the new drive would have more capacity.

So at first you could just send them your hard drive and you would get an upgraded one back.

I was probably not the only one who heard about it, and so they started testing the incoming drives to see if they were really malfunctioning.

I once tried to break such a drive. Formatting the hard drive while bashing it on the table. The thing kept going, no problem.


My elementary school had a little school supplies store in the school itself. I bought a 4-function calculator for math. Next week, they had an upgraded calculator which showed the inputted formula and supported more functions. The old calculator was gone from the store, but my parents wouldn't give me money to buy the new one. So I took the old calculator apart, disconnected a power wire so it didn't work, and brought it back to exchange for the new one.

Sorry school!


The first PC I built myself in the 90s had a Seagate hard drive. It developed a "click of death" after a few years, a repetitive clicking noise which corresponded with my PC freezing/crashing.

I discovered at some point that a few swift kicks to the front of the case would get it going again for a couple of days. This lasted for a good year or two before no amount of kicking would revive it.


The image in my head of you bashing the harddrive gave me a good laugh!


I have a GPD MicroPC. The weird SATA SSD it came with stopped working after I dropped it once. Everything seemed fine, and I noticed that sometimes it would work if the mounting pressure was a bit higher than usual.

So, I did the right thing and stacked a bunch of electrical tape between the chassis and the SSD, and it has been working ever since. It's OK if the SSD dies though; it's running NixOS, so getting it back up and running with a new SSD would be a very short ordeal.


HP sold me a crappy high-end Pavilion laptop around 12 years ago with a GeForce chip in it that would unsolder itself due to poor thermals.

The fix was to disassemble the back of the laptop, mask around the GPU with tin foil, and hit it with a hair dryer on max setting for about 20 mins.

The fix would last like 3 months. Learnt it off some guy on YouTube who was such a hero.


In the particular case of Nvidia's "bumpgate", it was microbumps cracking under thermal stress, because the glue securing the die had a badly matched thermal expansion coefficient. The defect was between the GPU die and the carrier package, not in the BGA balls.


> So, I left the Firecuda in the freezer at -18C for 30 minutes.

Just scrolled through the Ontrack website:

    Data recovery myths: why you should avoid the temptation of a DIY repair
    Suggestions we’ve seen online that definitely _will not_ help you recover your data include:
    - Putting your hard drive in the freezer overnight
:')


This isn't a hard drive. It worked because cold causes components to shrink, which pulled the BGA RAM chip closer to the board. Cold spray is standard practice when diagnosing electronics defects.


I know those chips are all ball grid array or something, but how does one (or more, I suppose) of the contacts fail this way? Seems like if the package contracts and lifts, that the copper pad would come with it, rather than just a break in the solder. I'm not sure what I should be visualizing for a cold solder joint on surface mount...


Fatigue cracking from thermal expansion, and also perhaps residual stress and creep deformation. Definitely in the "mechanical engineering" realm.


It's possible that there was a flaw in the original manufacture of the board and one of the balls didn't flow properly, but it passed QC.


Wouldn't that be DOA though? This has to be some sort of flaw that creeps in slowly.


The flowed contacts could be holding the ball against the pad, making an electrical connection, but then the connection gets fritzy as everything warms up. Ideally it would show itself in the first run but I could also see it getting worse with time.

I assume this occurred after the manufacturer's warranty expired.


Using woodworking tools to retrieve 1TB of data from a faulty drive is not recommended and is highly unlikely to be successful. Data recovery requires specialized equipment and expertise to handle delicate electronic components. Using woodworking tools could further damage the drive and data beyond recovery. It's best to consult professional data recovery services or try DIY data recovery software for such tasks.


Too bad we didn't have such technology for the xbox "red ring of death". oh wait.


The use of more brittle lead-free solder has almost certainly contributed to higher failure rates of BGAs under thermal cycling, leading to more ewaste generation. But at least the ewaste is environmentally friendly...?


I had a new laptop that I swear would hardlock while watching YouTube, but only when the device was recently powered on (it was on for weeks / months at a time normally). Eventually someone suggested a solder contact may be less stable when cold, after a few weeks of testing I confirmed this was true (or enough to reproduce the issue) before RMAing the laptop, which has been working perfectly ever since!


Sounds like a born motorhead keeping his rod running!


The power button broke on a dell laptop i was working on one time. We had to order the part, but the customer didn't want to wait.

I offered a very cheap alternative, and took his spare wifi antenna cables and cut them off. Then I soldered them DIRECTLY to the motherboard and routed them to hang out the side.

It was like jumper cables: you tap them together and the laptop would turn on


Once I pulled all my data off my stricken phone in my hotel room using nothing but the veggie sausages I'd stashed in the mini fridge as a cold sink. Failed at just the wrong moment the evening before an important event, of course. No time to find enough bandwidth to restore my backups...


I just lost data on a firecuda drive that was a few months old. The lab could not recover the data and also did not tell me the root cause of the drive failure.

Compared with my samsung ssds which have (knock on wood) never died in a decade.

I know all brands fail but is there something of lesser quality with the firecuda line?


From working tangentially in consumer electronics, I can tell you that nearly all failures are design related if you look deep enough.

There are some products where some start dying after 1 year, and nearly all of them are dead after 2 years.

Other times it's a combination of a bad design and some specific use case. For example "these fail after 3 years if you turn them on and off every day, but don't fail if you leave them running 24x7".


Or sometimes they will die if you leave them on for 40000 hours.

https://news.ycombinator.com/item?id=32048148


Solder joints seem to be a major failure. That'd be mostly fatigue (thermal cycling) and the occasional case of tin whiskers.


Removing lead from solder should reduce lead poisoning and environmental lead in general, which is good. Unfortunately, lead-free solder is objectively worse than leaded solder in terms of longevity of joints as well as requiring a greater working temperature. Things have gotten a lot better since introduction though.


Yes, but in turn the failure is because the thermal expansion coefficient of the board and component were insufficiently closely matched (when taking into account the elasticity of the board and component)


Lead-free solder is just shittier than leaded solder. You're not wrong about design but better solder would reduce the impact of mismatched thermal expansion properties.


BGAs and hot chips are a bad combo.


In my view, I do think Samsung’s hardware is generally more reliable, however this gets negated by their dogshit firmware. I’ve had multiple different samsung enterprise and oem drives die because their firmware is full of bugs.

Basically one will suddenly decide it hates life and will then only show up as having only 1gb of space and the firmware version set as “ERRORMOD” (“error mode”). This is not a time counter rollover issue as far as I can tell (though samsung has plenty of those too [0]) as there wasn’t anything in my smart logging that would indicate a time value they approached and died. Samsung’s firmware is just super buggy and can get caught in a bad state. You can find business purchasers complaining about these issues as well. [1]

Someone once decompiled the firmware of the EVO 840 before they started encrypting it. Have a look at the “Bugs” section to have a laugh: http://www2.futureware.at/~philipp/ssd/TheMissingManual.pdf

So when you see a samsung enterprise/oem drive on ebay with 99% of its life left, what you are really buying is a drive with bugs but no way to obtain fixes, since samsung will give you the business version of go fuck yourself by telling you to “contact your vendor”.

Part of the problem is samsung cultivates a complete shitshow where vendors will “customize” firmware, so an identical drive from lenovo will need different firmware than one from HP. The other part of the problem is Samsung’s consumer drive branch is basically completely separate from their oem/enterprise branch despite what is mostly the same hardware and firmware. So while the consumer branch has historically had to eventually face the market consequences of gross negligence, the enterprise branch is shielded by misdirection to the vendors.

Fortunately mine were quite cheap so not much loss, but still, they lost any illusion of competence over other companies in my eyes.

Note there is a collection of samsung firmware here, though it is hardly complete: https://github.com/lolyinseo/samsung-nvme-firmware

P.S. Also note there is a poorly documented but common form of ssd “failure” that can happen. If you have a drive suddenly not show up on next boot (especially after power loss), using the power cycling technique can often recover a drive: https://dfarq.homeip.net/fix-dead-ssd/

[0]: https://www.tomshardware.com/news/samsung-990-pro-health-dro...

[1]: https://forums.servethehome.com/index.php?threads/pm9a3-firm...


The worst part is that it's so hard to find a vendor with SSD firmware that isn't terrible. So many buy the cheapest controller chip and then ghost you when the drive crashes and bricks itself.

Many many drives, even from big companies like HP, came with the damn Sandforce controller that tends to brick itself if your computer ever goes to sleep. I know organizations that ended up replacing entire swaths of drives because they were dying so often. It was enough of a problem that some people went to extreme (legally dubious) lengths to try to recover the drives. I mean just look at the procedure:

https://computerlounge.it/how-to-unbrick-sandforce-ssd/

Of course Sandforce was completely unhelpful in trying to fix the problem.


Can someone please explain the physics behind

1) A "faulty connection" and why it fails at a certain temp? Is it a partially broken circuit that doesn't work anymore if resistance is too high?

2) Why airblowing fixes it. Because it melts the crack together? And why doesn't it do any other harm to the SSD?


It sounds to me like a faulty solder joint (a crack between the SMD pin and the pad). Cooling probably causes a differential contraction which brings the pin into better contact, and of course clamping achieves the same.

It may have started with a hairline defect that got worsened by thermal cycles.

If it is a bad solder joint, reflowing bridges and fixes the joint. It won't harm the chips as long as the temperature is correct, since it needed to be soldered in a reflow oven in the first place. However (I believe) there is some risk of excess heat corrupting some data, and/or worsening the defect, so if you can back up first using the jury-rig, that's certainly preferable.


The coefficient of thermal expansion is a material property, so chips and circuit boards, being made of different materials, expand by different amounts over the same temperature change. Something will have to give way eventually, and usually it is the solder balls joining the chips to the board, which build up cracks.

It WILL do other harms to the SSD. Semiconductor parts have limited reflow counts allowed to meet specifications such as failure rates, longevity, maybe power consumption too. It will be a factor at scale or over time. But it's free extra half life for semi broken parts too.
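
A rough back-of-the-envelope for the expansion mismatch described above, using linear expansion dL = alpha * L * dT. The coefficients are approximate textbook values (silicon about 2.6 ppm/K, FR-4 laminate about 14 ppm/K in-plane). The 10 mm package size and the 60 K temperature swing are assumptions, not measurements from the SSD in the article, and treating the package as bare silicon is a simplification.

```python
def expansion_um(alpha_ppm_per_k: float, length_mm: float, delta_t_k: float) -> float:
    """Linear thermal expansion in micrometres: dL = alpha * L * dT."""
    length_um = length_mm * 1000.0
    return alpha_ppm_per_k * 1e-6 * length_um * delta_t_k


ALPHA_SILICON = 2.6  # ppm/K, approximate
ALPHA_FR4 = 14.0     # ppm/K, approximate, in-plane
LENGTH_MM = 10.0     # assumed package edge length
DELTA_T_K = 60.0     # assumed swing between idle and hot

chip = expansion_um(ALPHA_SILICON, LENGTH_MM, DELTA_T_K)
board = expansion_um(ALPHA_FR4, LENGTH_MM, DELTA_T_K)
print(f"chip: {chip:.2f} um, board: {board:.2f} um, mismatch: {board - chip:.2f} um")
```

A few micrometres per cycle sounds tiny, but the solder balls have to absorb that difference on every heat-up and cool-down, which is where the fatigue comes from.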


Reminds me of the time I had my spinning rust system drive sitting in the freezer with the wires leading out to my tower sitting on top. I was desperately trying to image my system drive with clonezilla. As a poor student it was a stressful time, thankfully it worked.


I think the use of a chopped up SVB credit card is clearly the cherry on top of this story.


This is both very clever and very cool.

It's not a matter of if your storage media will fail, just when.

Always keep backups.


> very cool

Yep, exactly -18C cool.


Ha, I have used a clothespin to get a few documents from an old failing usb stick before. Clamp bga rework ftw!

Was the drive getting too hot? Maybe under the gpu or in the bottom slot one of those piggyback boards?


Long time ago I saved data from a hard drive by putting it into the freezer, then copying data, putting it into the freezer, copying data, ... repeat a lot of times :)
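
The freeze-copy-repeat loop above is essentially a resumable image copy. Here is a minimal sketch of that idea, assuming the drive can be read like a plain file and that a failed read raises `OSError`. The function name, block size and return convention are hypothetical; dedicated tools such as GNU ddrescue do this far more carefully.

```python
import os


def copy_pass(src_path, dst_path, pending, block_size=1 << 20):
    """Try to copy the block at every offset in `pending`.

    Returns the offsets that still failed, to be retried on the next pass.
    """
    still_bad = []
    # Reuse the partial image from earlier passes instead of truncating it.
    mode = "r+b" if os.path.exists(dst_path) else "w+b"
    with open(src_path, "rb") as src, open(dst_path, mode) as dst:
        for offset in pending:
            try:
                src.seek(offset)
                data = src.read(block_size)
            except OSError:
                still_bad.append(offset)  # retry after the next trip to the freezer
                continue
            dst.seek(offset)
            dst.write(data)
    return still_bad
```

You would start with `pending` covering every block offset on the drive, then feed the returned list back in after each cooling cycle until it is empty or stops shrinking.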


>> Y U NO USE SSD ONLY??11

That's why. HDDs aren't failure free but at least most of the shenanigans are solved already. Except SMR.


Oh fuck, I have 4 FireCudas (2 530s and 2 520s). I'd probably try reflowing BGAs in my buddy's shop's BGA oven.


I see a lot of tutorials on running mechanical hard disks without covers on and such.

Years ago, the drives had safety systems to prevent this.


In the late 90's I was an instructor in a youth PC repair class. For a fun demonstration we took an old IDE drive, removed the lid and had it running defrag. We did this outside. Next to show how important it was to prevent liquid spills we poured some cola directly into the drive. It sprayed everywhere but the drive kept working fine. Kids got a good laugh and did not learn a lesson that day.


Awesome! Thanks for sharing all the technical details. Sure could help in real life situations


A ziplock baggy with ice cubes and a little water can absorb a lot of heat.


About 15 years ago I made quite a bit of money recovering data from damaged HDDs using various low-tech means.

It started when I threw a very expensive (at the time) 1.5TB HDD. It had various things going on with it but I recall it wouldn't spin up and once I was able to get it to spin up, it wouldn't function for more than a few minutes before it would error out and the drive would become unrecognized by the PC.

I remember I made it spin up again entirely by accident. Figuring the drive was toast, I'd removed it and while fumbling with the cables it slipped out of my hand and landed on the hardwood floor. I hadn't noticed when I was holding the drive, but it felt like something was "stuck"[0] and after picking it up off the floor, it felt like the components were moving again. On a whim I plugged it back in and it worked.

The second problem, I knew, was thermal. The drive got incredibly hot very quickly. Being that it was a 1.5TB drive at a time when "that was big", I'd need the drive to be functional for hours to complete copying the data.

I had a dorm fridge in my office and debated putting it in one of my external cases and running a cable from there but knew that the moisture wouldn't play well. Because I kept very little in this fridge, it was filled with a few cans of soda, coffee creamer and about ten of those blue gel bags that people freeze/put in coolers in lieu of ice. I did this mainly because the fridge was really loud and keeping it stocked with anything made it run a lot less.

The upshot was that the size of these "bags" was just a little larger than a 3.5" HDD, if left out they maintained their temperature for hours[1]. I grabbed two, placed the drive on top, put two more on top of the drive and managed to image the whole thing.

It was so simple, I popped an ad on Craigslist and $200 "I'll get your data back or you don't pay." Out of 30-40 drives, there were less than 5 that were beyond help. About half were software issues, many of which were simple to resolve. Very few required a tool like photorec, but it did the job when needed[2]. The rest were handled with an extremely stable power supply, four gel bags and patience. Except for one of my own drives, I never popped screws and generally didn't resort to "applying physics and gravity" in hopes that if I couldn't recover anything, I'd at least give them a drive that was "no more damaged than when I received it" (they still signed a document covering me for any liability).

The whole experience made me realize how critical keeping "components other than the CPU" cool is. Every drive I've experienced hardware failures on[3] has had something heat-related coinciding. I remember one time I couldn't figure out which of the 8 drives was indicating failure; I saw one of the three fans in the drive cage wasn't spinning. I shut the server down, pulled out the fan and its 3-drive set, figured it had to be the middle drive, swapped it and it started rebuilding the array on reboot. I guessed right. :)

[0] If you take a drive, lay it flat on a desk and spin it, you'll hear things move ... this one didn't.

[1] Provided it's just sitting on a wooden desk.

[2] There was one customer that I'll never forget and he was a reason I stopped doing this entirely. I told him I could recover photos/videos/specific file types from his drive but it would recover everything it sees including files that may have been intentionally deleted. He said "Oh, no! Don't do that. I'll pick up my drive." I still wonder, to this day, what kinds of horrors I would have been in for (I rarely did more than a spot check on the data, anyway).

[3] I still have a pretty massive custom built storage server with an older LSI MegaRAID controller and a mess of SAS HDDs. For what it's used for, the speed is more than adequate, the individual drive costs are substantially less (even taking into account SAS over SATA) and they outlive my SSDs.


IBM basically had a recall at one point on drive arrays.

They got the bright idea to make the tabs for the sleds out of plastic to make them easier to slide in and out, or save a couple pennies, or both. Problem was that metal drives in metal sleds on metal rails going into metal bays do a pretty reliable job of keeping the motherboard and the drives on a common earth ground. Plastic rails meant a floating ground, which electronics especially do not like. A little static electricity or inductance and things get ugly.

If memory serves they added some sort of little grounding cable they would send you, making it harder not easier to get the drives out. So dumb.


Bonus points for calling out the cut up SVB card. Classy.


JFYI: The author of the post is the CTO of Cloudflare.


lol when I read the title I was asking myself how they would have scalpeled out the right sectors... turns out it was a lot less exciting


It's the SVB card for me....lollll


A clamp is a wood working tool?


You can't get much done before you need at least one clamp. Most wood-workers have dozens on hand at all times.


Yeah but it is more of a general purpose tool than a woodworking tool... And this post kinda proves my point


> I'm tempted to try to permanently fix the SSD using my SMD hot air blower. But maybe I should just replace it at this point.

Ah, a man after my own heart <3

With the modern widespread trend in tech of treating the owner/user as a security threat, it's easy to feel like the hacker spirit is dead or dying, and then posts like this rekindle my hope for humanity


My concern with fixing a storage device is that storage is pretty much the worst thing to fail. Maybe as a cache drive or perhaps for some large applications - but I wouldn’t trust it with data I cared about…


A fix like this is likely to rebreak in the same way. The question becomes how many times do you think it's fixable before thermal stress destroys the board. Maybe future incidents will be prevented by increasing cooling of the device in the installed system?

I'd be ok with this in a machine with regular backups, where you're looking at worst case the data from a day or two is lost, plus some downtime while getting a replacement and restoring a backup, and best case another trip through the hot air station. Seems pretty ok to me in that use. Would probably be fine for a use case with easily replaceable data too: if you've got a fast network connection, it could be your steam drive.


Yeah Steam drive is what I was thinking with large applications, but it could as easily be something like Xilinx’s Vivado…


> strips of a Silicon Valley Bank credit card under the SSD to support the PCB.

Good to know it's still useful.


I'd make a collage out of it with AOL floppies, :CueCat scanners, and Juicero pouches.



