Backblaze Drive Stats for Q3 2023 (backblaze.com)
275 points by caution 7 months ago | 72 comments



The article discusses the impact of high absolute temperature on the longevity of drives; however, from my amateur knowledge, the range of temperature over a day is also an important factor.

I always assumed that having a stable 40°C is better than a drive constantly swinging between 20°C and 40°C, so I am surprised that the article only mentions alerts on reaching a high threshold.


Andy Klein from Backblaze here. Your point is a good one, in that temperature fluctuation can be an important factor. We actually sample SMART stats, which contain the temperature attribute, multiple times a day looking for such changes. The Drive Stats data is captured once a day, so it looks static, but behind the scenes the monitoring is more dynamic.
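For anyone curious what that kind of sampling looks like in practice, here is a minimal sketch using smartctl (an illustration only, not Backblaze's actual tooling; the device path is a placeholder and the attribute layout varies by drive):

    import subprocess, time

    def read_temp(device="/dev/sda"):
        # smartctl -A prints the SMART attribute table; attribute 194 is
        # Temperature_Celsius on most drives (some report 190 instead).
        out = subprocess.run(["smartctl", "-A", device],
                             capture_output=True, text=True).stdout
        for line in out.splitlines():
            fields = line.split()
            if len(fields) > 9 and fields[0] in ("194", "190"):
                return int(fields[9])  # RAW_VALUE column
        return None

    while True:
        print(time.strftime("%F %T"), read_temp(), "C")
        time.sleep(4 * 3600)  # several samples a day, as described above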


Do you have plans to look at whether higher fluctuations translate into higher failure rates? Not sure whether you have historical data on this, but I would be really interested in this aspect, even if you can only run the stats on a smaller number of drives or shorter time periods.

Maybe dividing the drives roughly into "higher than average variability" and "low variability" groups and then looking at the AFR for each subset could show some relation. Of course, as the AFR for many drives is already quite low, the effect might be too small to distinguish from noise.

On the topic of temperatures: Have you run an analysis of whether a drive increasing in temperature (or maybe even decreasing) compared to its baseline and "neighbors" results in a higher chance of failure?
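To make the variability split mentioned above concrete, something like this is what I have in mind (a sketch only; the columns drive_id, temp_c, failed, and drive_days are hypothetical, not the actual Drive Stats schema):

    import pandas as pd

    def afr_by_temp_variability(samples: pd.DataFrame, drives: pd.DataFrame):
        # samples: one row per (drive_id, temp_c) temperature reading
        # drives:  one row per drive_id with failed (0/1) and drive_days
        temp_std = samples.groupby("drive_id")["temp_c"].std().rename("temp_std")
        df = drives.join(temp_std, on="drive_id")
        df["high_swing"] = df["temp_std"] > df["temp_std"].median()
        out = df.groupby("high_swing").agg(failures=("failed", "sum"),
                                           days=("drive_days", "sum"))
        out["afr_pct"] = out["failures"] / out["days"] * 365 * 100
        return out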


Count me in as a second to the question from rft.

I'd be interested if you have data to compare similar drives with stable temperatures against diurnal temperature cycling.

I'd imagine you have fairly constant data-centre-type environments, though, which would confound analysis of such questions.


The best blog series going. Great technical writeups that I wish more companies would do - we've been doing it at our small business and customers really get a lot out of it. Also helps with marketing.


My simple rule still stands even after 20 years: Avoid Seagate.


They had a couple of bad runs a few years back. If you keep following your simple rule, you'll eventually get a bad WD and not be able to use spinning platters at all. A better simple rule would be to never cheap out by buying out-of-warranty drives, and to keep a proper backup regimen. I had bad Seagates back in ~2008, but my current NAS has 5x 16TB Exos drives and they work fine.


Conversely, I’ve had nothing but bad luck with WD the last few years, and my Seagates have been flawless.

My simple rule is that all drives suck, and always have good backups.


My WD drives failed pretty consistently, so I'm now giving Seagate a try.

Well, my main reason was that WD decided that just failing "naturally" after a few years wasn't enough, and that a drive having been powered on for 3 years should be treated the same as "failing" (communicated through WDDA), which led Synology to adopt that behavior for a while. Not sure what the current state of that is, but I intend to swap drives when they fail, not when they turn 3.


Well, for me, WD Reds seem to start to go bad at about 3 years, so...


You might want to read this: https://arstechnica.com/gadgets/2023/06/clearly-predatory-we...

I had 4 WD Red 4 TB HDDs (WDC WD40EFRX), and 2 of them have already failed SMART long tests and had uncorrectable errors reported by the kernel after about 25,000 hours powered on. I've messed around with the drives a lot and bought 3 other used drives, and it turned out that one of those had the same failure, just undetected.

I was able to "fix" the issue by running testdisk in read-write mode, forcing the disk to overwrite the bad sector. That's how I forcibly fix pending sectors on desktop drives. But it seems that WD Reds don't want to reallocate sectors while the data is still readable; the drive just needs a second or two.
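The same trick can be done by hand if you know the failing LBA from the kernel log or the SMART self-test output. A destructive sketch (device path and LBA are placeholders, and the 512-byte sector size is an assumption; this clobbers whatever is stored at that address):

    # DANGER: overwrites one sector in place so the drive is forced to
    # remap it. Only for a sector already confirmed bad and backed up around.
    DEVICE = "/dev/sdX"     # placeholder
    BAD_LBA = 123456789     # hypothetical, from dmesg / smartctl output
    SECTOR = 512            # assumed; 4Kn drives use 4096

    with open(DEVICE, "r+b", buffering=0) as disk:
        disk.seek(BAD_LBA * SECTOR)
        disk.write(b"\x00" * SECTOR)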

I'm not happy with that, but I'm also glad I could confirm it's not an issue caused by my setup. One could say I should replace the drive immediately, but I trust ZFS and my backups. I would put the drive on my shelf and maybe reuse it as temporary buffer storage, because why would someone buy such a used drive at a high price? In my eyes, it's still okay.


Yup. All drives kick the bucket at some point. That's why I use BB. I don't trust myself with a NAS.

The ones that never failed me were Quantum drives with a SCSI interface: 5 drives, zero failures over 10+ years. But those were slower and cooler-running units.


I’ve got a NAS backed up to Backblaze. It’s a nice setup. I can quickly recover from local data loss, or replace a RAID disk when needed, but if the NAS gets hit by a bus then I still haven’t lost everything.


That is a nice setup, since it minimizes recovery time. I'm guessing you're using their B2?

I've had 2 drive failures since using BB, so it's been worthwhile. But their download software is pretty dreadful if you have a large number of files. I may try their ship-a-drive option next time.

Going by their stock price, they are operating pretty lean, so I don't fault them for having crappy software.


I always found it hilarious when they were sponsoring the datahoarder subreddit. I’d like to meet that marketer and shake his or her hand.


I'm surprised that Seagate has consistently kept its 'slightly less reliable' crown after all these years.

It's like the student who copies the 'A' student's exam. They'll purposefully get a couple of the answers wrong to avoid suspicion.


My simple rule in the same period: avoid WD at all costs, prefer HGST (which later became part of WD) and use Seagate mostly. I’ve seen variations of these personal rules, and during a time when Backblaze didn’t exist yet (and came up with better measurements on a larger scale), it was like one of those holy wars between tabs and spaces (or vi and emacs).


> prefer HGST (which later became part of WD)

Toshiba got the 3.5" division. WD got the 2.5" division, an antitrust divestiture (just 3 companies left): https://upload.wikimedia.org/wikipedia/commons/8/87/Diagram_...


I used to swear by WD HDDs, but when I literally couldn't tell which of their NAS drives were CMR and SMR a few years ago, I wrote them off and went to Seagate, who clearly labeled their drives.

Combine that with their lackluster reputation in solid state as of late and I probably won't buy their HDDs until Seagate one day gives me the "WTF are you even selling?" rigmarole too.


Not quite as long as 20 years, but for the past 15 years: Just buy HGST.


HGST was bought by WD long ago. Has that had any impact on the types of drives sold or the quality?


>has that had any impact on the types of drives sold or the quality?

No. To the point that when WD tried to rebrand those drives away from HGST, the market demanded HGST and they brought the brand back a year later.


At what point do they rebrand crappy WD drives as HGST?

They already have a reputation for obfuscating information (SMR vs. CMR).


Close to 10 years ago, I bought a "new" hard drive on Amazon. When I got it, I happened to notice some suspicious signs of wear, like corrosion on the PCB from someone's greasy fingerprint. I dove into the SMART stats and they were completely zeroed. Not quite what I'd expect from a new drive.

So then I started scanning the drive. The partition table was deleted, but it was full of data, most of it encrypted. It was also badly fragmented, so it was likely part of some type of array. What I could recover in cleartext implied that it had been spun up for thousands of hours in a Backblaze datacenter. It wasn't conclusive enough to go to Backblaze about it, so I just returned it to Amazon. I probably did a zero pass on it first, can't remember.

Encrypted or not, if that was a Backblaze drive, they were disposing of drives with customer data still on them. I'm not surprised someone tried to pass it off as new on Amazon, that scam is old hat. I was shocked to see the data still intact, though.


>it was full of data, most of it encrypted. [...] Encrypted or no, if that was a backblaze drive, they were disposing of drives with customer data still on them

Who cares if there was "customer data still on them" if it was encrypted? One of the nice things about encryption is that you don't have to worry about wiping drives.


Except for the part that there was a lot of cleartext on it. It probably contained metadata; it did contain the name of some Backblaze employee, which is how I tracked it back to them.

Would you be comfortable publicly posting the encrypted database from your password manager? Or encrypted copies of your financial information? Go on, drop a Google Drive link if you're so confident.


Sure... until it gets decrypted.


Is that a realistic threat with AES-256 and randomly generated keys?


The local electronics recycling company where I live (US-FL) shreds hard drives by default, and many of their enterprise clients apparently ask for it when they "donate" their old PowerEdge servers, NASes, and whatnot. Now obviously the recycling company could try to discourage this in favor of a 3-pass (or 7, or 100, or whatever) zeroing of the drives, and then resell them as they do everything else that gets "donated" to them… many are really expensive, high-capacity SAS drives that are only a few years old. But I guess nobody wants to be the guy who compromises company data just so the local recycling company can make money off their old drives in addition to their old servers, UPSes, racks, et al.

Of course, if these companies were really smart, they'd wipe the drives before they go to the recycling company. I'm sure many do. Still, they don't risk it and want the drives shredded.

Eventually, AES-256 can probably be brute-forced in a reasonable amount of time. If you write all 1s and then all 0s (or vice versa) to the drive, on the other hand… there's no way to recover the data.

There's a lot of debate about that statement, but ultimately, if the drive is in fact overwritten twice, it's physically impossible to recover the data. The debate seems to be mostly around whether zeroing a drive really does zero every bit, and that's not straightforward to prove (many drive-erasure programs will offer a printable "certificate" once a drive has been "secure-wiped", which often mentions a "million dollar guarantee" or whatever… it's a sham, because how do you prove the program failed to erase the data on the drive? Especially days, weeks, or years later?).


> Eventually, AES-256 can probably be brute-forced in a reasonable amount of time.

No. See https://security.stackexchange.com/questions/6141/amount-of-...

Time is not the bottleneck, energy is.

They invoke Landauer's principle, which states that irreversible computation has an intrinsic energy cost per elementary operation, namely k·T·ln(2), where k is the Boltzmann constant and T is the temperature. A brute-force search would need on the order of 2^255 elementary operations on average, but that would require more energy than you could get by converting the Sun's entire mass into energy.
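The arithmetic is easy to sanity-check yourself (a rough sketch; room temperature and the Sun's rest mass are assumptions):

    from math import log

    k = 1.380649e-23          # Boltzmann constant, J/K
    T = 300                   # assume room temperature, K
    ops = 2 ** 255            # expected trials to hit a 256-bit key on average

    energy_needed = ops * k * T * log(2)     # Landauer limit, joules
    sun_energy = 1.989e30 * (2.998e8) ** 2   # E = mc^2 for the Sun's mass

    print(f"needed {energy_needed:.1e} J, Sun provides {sun_energy:.1e} J")
    # needed ~1.7e56 J, Sun provides ~1.8e47 J: short by nine orders of magnitude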


VERY interesting read, thank you for that.

It’s worth noting that several of the answers state something to the effect of “quantum computing might be able to do it”, and indeed I don’t expect an i9 or a ThreadRipper to ever defeat AES-256.


There's a reason that the industry standard for proper disposal of storage media, including HDDs, is nothing short of physical destruction.

It doesn't matter if the data is encrypted or not; the point is that the data is still there when presumably that data should not exist outside <X> premises. Encryption serves as a mitigation against theft or accidental leakage of data; its purpose is not to facilitate data disposal.

Put another way, you have to answer Yes to this question for liability purposes: "Is the data gone?" The only way to say Yes with reasonable certainty is physically destroying the storage medium the data resides on.


Well, if you write 1 to every bit of the drive, and then write 0 to every bit of the drive, the data is gone… but to be fair, I think the concern is proving that actually occurred before disposal. It’s easy to see the data destroyed when the drive is ground up before your eyes.
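For reference, that two-pass overwrite is only a few lines (a sketch; the device path is a placeholder, and running this irreversibly destroys everything on the disk):

    import os

    DEVICE = "/dev/sdX"   # placeholder
    CHUNK = 1 << 20       # write in 1 MiB chunks

    with open(DEVICE, "r+b", buffering=0) as disk:
        size = disk.seek(0, os.SEEK_END)     # block devices report their size here
        for pattern in (b"\xff", b"\x00"):   # pass 1: all ones, pass 2: all zeros
            disk.seek(0)
            remaining = size
            while remaining:
                n = min(CHUNK, remaining)
                disk.write(pattern * n)
                remaining -= n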


Surprised by the number of drives with 0 failures, though it seems not all of the drives were run for the required time to qualify for the rating.

> In Q3, six different drive models managed to have zero drive failures during the quarter. But only the 6TB Seagate, noted above, had over 50,000 drive days, our minimum standard for ensuring we have enough data to make the AFR plausible.
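As a rough check on why that threshold matters: with zero failures over 50,000 drive days, an exact Poisson 95% interval still allows up to about 3.7 failures, which works out to an AFR anywhere between 0% and roughly 3.7 / 50,000 × 365 ≈ 2.7%. Zero failures on fewer drive days than that pins the rate down even less.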


I have always wondered why they don't use techniques from survival analysis to be able to draw conclusions even from sets with lower failure rates. Or, for that matter, to avoid slight bias even for the drives they do report.


Andy Klein from Backblaze here. We have done some survival analysis (Kaplan-Meier curves). In our case, we need a reasonable number of failures over the observation period to get decent results. You can take a look at some of our work here: https://www.backblaze.com/blog/hard-drive-life-expectancy/ and see if that is what you were expecting.
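For anyone who wants to try the same thing on the public Drive Stats data, a minimal sketch with the lifelines package (the toy numbers and pre-aggregated columns here are made up, not the actual CSV schema):

    import pandas as pd
    from lifelines import KaplanMeierFitter

    # One row per drive: observed lifetime in days, and whether the
    # observation ended in a failure (1) or was censored (0), e.g. the
    # drive was still healthy or was decommissioned.
    df = pd.DataFrame({
        "lifetime_days": [900, 1500, 2200, 3100, 410],  # toy numbers
        "failed":        [1,   0,    1,    0,    1],
    })

    kmf = KaplanMeierFitter()
    kmf.fit(durations=df["lifetime_days"], event_observed=df["failed"])
    print(kmf.survival_function_)     # P(drive survives past each age)
    print(kmf.median_survival_time_)  # age by which half the drives fail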


Yes! Thank you. I've been thinking about scratching this itch myself but it turns out you did already.

In particular the shape of the survival curve interests me -- you hear so many things about exponential here, bathtub there, but very little data. I will read it once I have a spare moment.


I wonder if they have special hardware recycling arrangements with their vendors for decommissioned drives to reduce their footprint. How many magnets are lying around the office? :)


> The average age of the retired drives was just over eight years

Didn't realize they can last so long.


They can last much longer. I have operational IDE drives from the early aughts (not with anything important on them).


I have a WD Black 1TB drive that did service as my main desktop drive for about 6 or 7 years.

After that it went into my server at home. It was used for various things. At this point it’s spent the last 5 years as the disk my DVR records to. (Because 5 years ago I was expecting it to die any day and didn’t want anything more important on it…) So it’s being continuously written and rewritten 24 hours a day, 7 days a week.

It’s now about 17 years old and has spent almost the entirety of that time powered on. It’s been packed up and sent on two cross country moves. Still kicking.

I have a number of WD Green 1.5TB drives that are nearly as old and still in daily use in the same server.

Maybe I’ve just had great luck, but I’d be more surprised by a drive dying sooner than eight years.


Under my desk right now is a too-underutilized-to-upgrade NAS that has been spinning 1TB Western Digitals since 2010. Between RAID-Z2 and cloud backups, there's almost no reason to get rid of them except for performance, which doesn't matter here.


If all of your disks are about the same age, from the same vendor, it might be reasonable to replace them over time to mitigate the risk that a firmware or manufacturing issue results in them all failing around the same time. Many vendors have had firmware errors where counters rolled over and the drive becomes inaccessible (generally much sooner than 13 years though).


I have a 3TB Seagate from around 2009 that still works. It was shucked from an external USB enclosure and has moved desktops several times. Granted its use has been pretty intermittent. Right now it’s just sitting on my desk, as it often has.

I have some HP-branded ~140GB SAS drives that are probably of similar age based on the capacity, and they still work… but again, they haven’t exactly been active for the last 15 years straight.


I just retired some drives out of my home array. 3TB WD Reds with 10.5 years of power on time, no logged errors. Ran them through a full block check and had no errors.


As much as I appreciate these articles... Guys... Please don't skimp on the ink! Eyes are more expensive than black pixel paint these days.


Cool article! Looks like HGST (formerly Hitachi) performs very well overall. The Toshiba 4TB ran 101 months with no failures, holy moly!

I used to like Samsung hard drives personally; LaCie used them in their Rugged series and I found them to have pretty low failure rates. Seagate bought out Samsung's disk drive business in 2011, apparently. I guess Samsung saw that the future was SSDs?


Is there any backup software that continuously uploads your files to an AWS/GCP/Azure long-term storage account that you control and pay for? Something like CrashPlan, which from time to time performs automatic maintenance and deletes old versions of files.


Tons of choices: Restic, Borg, Duplicity, Kopia

Though Backblaze is pretty good at what it does and you can set your own encryption key, if that's the concern.


Unfortunately Backblaze requires[1] you to provide them with this private key to restore.

[1] https://help.backblaze.com/hc/en-us/articles/360038171794-Wh...


Yeah I mean you gotta trust them on some level. Backblaze could also push a client update that nerfs the encryption. If you've got really sensitive data I'd probably pick something else.


What does restore mean in this context? I’ve downloaded files from their online portal without providing a key. Perhaps restoring in this context means having them mail you a hard drive.


Really? I thought the file metadata is encrypted so it needs the key to even identify what files are available to restore.


You can always encrypt the data yourself and rclone it to B2. Likely cheaper if you are not a data hoarder.


Perhaps this can help https://rclone.org/


Arq Backup. I have used it in the past (2-3 years ago) but am not using it currently.


My experience with Arq has been terrible so far. The interface keeps glitching; for a while it failed repeatedly until I gave it a specific permission in Windows services; and the backup process is far from intuitive, with multiple backups showing up as restore options, each with different file sizes and different files saved. Overall, just too many ambiguities for it to be reliable.

Oddly, SpiderOak has been a background go-to for years and always worked smoothly for backing up everything I select in a wholesale way, keeping a fairly clear record of what was saved, removed or moved, and adjusting immediately to any file changes or deletes I do. The SO interface is shitty and often freezes, and lacks many basic features like being able to see file sizes or scrolling through long file lists easily, but at least overall, I can quickly and easily see when backups are happening, how they're being done and what's being saved. Also, for restoring files, it's surprisingly fast despite a reputation held by many that it's slow.


Arq is the best set-and-forget option I know of for macOS.


I'm using rclone to sync with Backblaze nightly, executed directly from a cronjob.


Same. Rclone is wonderful because it supports a ton of different backends, which makes it super easy to mirror. It also has some great features like crypt, where you can encrypt everything locally and thus send all the data as ciphertext.


Backblaze software is pretty reasonable. But I have a Linux machine, so restic and their B2 storage cost me a buck or two a month to back up a few computers in my house (around 185GB of photos, etc.).


>Backblaze software is pretty reasonable.

Hard disagree. The jankiness of their software is what made me cancel my sub after several years. Firstly, it doesn't follow the OS date and number formatting. It's a minor thing, but it's so annoying having to parse the dumb M/D/Y format and comma thousands separator etc. (being da-DK). It's not a deal breaker, but on the other hand, it's such a low-hanging thing that most other software gets right immediately.

But far more importantly, the BB client would sometimes just decide to re-upload several hundred gigabytes of data that I know for sure didn't change, which makes me wonder whether the client is just broken or the data got lost server-side. And it takes absolutely forever to detect USB hard drives being plugged in. And its log files will grow to absurd sizes, and you're not allowed to purge them or the client will stop working properly. And the one time I needed to do a restore, it took literal days for BB to prepare it and I had to get support involved. I feel I just can't trust the BB stack, the client being the weakest link by far, and a backup I can't trust is worthless.


You are probably correct. But there is a reason I put the Backblaze client on my mom's laptop, rather than a scheduled PowerShell task to run restic against cloud storage, plus another scheduled task to run the prune weekly.


Define "continuously". Does the data need to be mirrored immediately upon write?


> this chart is the confidence interval, which is the difference between the low and high AFR confidence levels calculated at 95%.

How do you calculate the low and high AFR?
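One way to get numbers that track the published tables is to treat failures as a Poisson count over drive days and use the exact (Garwood) interval; a sketch via scipy (my assumption about the method, not confirmed by the post):

    from scipy.stats import chi2

    def afr_with_ci(failures, drive_days, conf=0.95):
        # Exact Poisson (Garwood) interval on the failure count, then
        # annualized: AFR = failures / drive_days * 365, expressed in %.
        alpha = 1 - conf
        lo = chi2.ppf(alpha / 2, 2 * failures) / 2 if failures else 0.0
        hi = chi2.ppf(1 - alpha / 2, 2 * (failures + 1)) / 2
        to_afr = lambda n: n / drive_days * 365 * 100
        return to_afr(lo), to_afr(failures), to_afr(hi)

    # e.g. 100 failures over 1,000,000 drive days:
    print(afr_with_ci(100, 1_000_000))  # ~ (2.97, 3.65, 4.44) percent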


Backblaze is interesting, but it's not very easy to use. Its interface is rather basic, and it was difficult to select which drives to back up and which not to. It kept trying to back up directories on my computer that I specifically told it not to, and there was no way to efficiently update the program's behaviour from the "what to back up" list. It might be nice if you just want to back up your computer and all your drives, but the moment you want only parts of your computer backed up, it's frustrating.


I run a bunch of WUH721414ALE6L4 and WUH721414ALE604 in numerous RAID10 volumes. Haven't had a failure yet.

(The L suffix = without the power disable feature, 0 = with it.)


Kinda low sample sizes for the zero-failure models and the ones with the highest AFRs, though.


> ambient temperature within a data center often increases during the summer months

What? Isn't your data center supposed to be temperature controlled, with the A/C holding the entire environment to a setpoint within a degree?

Being able to tell what season it is from your HDD SMART temperature (armchair expert here) sounds bad.


Is moving the setpoint with the seasons, within tolerance of course, not a common energy-saving method?


That makes more sense when you are housing people rather than servers, no?


Does it? Everybody wants to save energy. If the hardware can handle it, why not?



