Hard Drive Stats for 2019 (backblaze.com)
355 points by sashk on Feb 11, 2020 | 163 comments



Interesting how the numbers carry over year-to-year in

https://www.backblaze.com/blog/wp-content/uploads/2020/02/Bl...

Some models are dwindling. Some are being tested. Others (like the Seagate and HGST 12 TB) are increasing. The only thing that's really perplexing is why they keep buying more and more of the high-failure-rate Seagate 12 TB drives. It must be more than 3% cheaper to buy (and service!) a Seagate with a 3% chance of failure than to buy an equivalent HGST with a 0.4% chance of failure. I guess when you have 120,000 drives, easy hot-swap enclosures, and software to handle it all, that makes good sense! But as an individual consumer, even with a Backblaze backup, it's definitely worth my time to spend a bit more on a drive that's far more reliable than to save a few dollars on a Seagate.


If I make a hard drive, and sales are crappy, in part because BackBlaze told the world how shitty they are, I'm going to have to drop the prices to move product.

I suppose there's a movie plot in there where BackBlaze negs their favorite drive so they can buy them cheaper.


> Only thing that's really perplexing is why they keep buying more and more of the high-failure-rate Seagate 12 TB drives.

I am guessing they RMA the drives and get replacements.


Your comment just sparked an interesting question in my mind: until now I always imagined that a failed drive was just trashed. But now that you mention they are probably RMA'ing them, do you think Backblaze sends the RMA drives through a magnetic tunnel of some sort before shipping them back to the manufacturer? Because otherwise, how do they ensure potentially unencrypted customer files are not accessed during the repair/refurbishment process?


I work at a large B2B SaaS that stores customer data, and we pay extra for the option not to return failed drives that can't be wiped for RMA. We still get a replacement, but the original is physically destroyed with a shredder.


I'd hope that their data is all encrypted at rest. Compared to the bandwidth of spinning disks, the cost of doing hardware-assisted AES isn't big.


Yeah, I would expect any data reaching the drives to be encrypted by Backblaze, with the key never reaching the disk.

You could even have keys per disk and wipe them when a disk fails.

Either way, you should be fine to RMA the drives, since to an external observer without the keys they just contain random noise.
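
A minimal sketch of that idea in Python (per-disk data keys held off-disk, "crypto-erase" by deleting the key; names and structure are purely illustrative, not Backblaze's actual design, and it uses the pyca/cryptography package):

    import os
    from cryptography.hazmat.primitives.ciphers.aead import AESGCM

    # One random data key per disk, stored somewhere other than the disk itself.
    disk_keys = {"disk-42": AESGCM.generate_key(bit_length=256)}

    def write_block(disk, block_id, plaintext):
        aes = AESGCM(disk_keys[disk])
        nonce = os.urandom(12)
        # Bind the ciphertext to its block id so blocks can't be swapped around.
        return nonce + aes.encrypt(nonce, plaintext, str(block_id).encode())

    def crypto_erase(disk):
        # Once the key is gone, the platters hold only random-looking noise,
        # so the physical drive could be RMA'd without a wipe.
        del disk_keys[disk]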


When I back things up with Backblaze, my files leave my computer encrypted, so they're encrypted at rest on their end.


They once wrote somewhere that they have contracts with Seagate that allow them to get the drives much cheaper if they buy certain quantities.


At least in Europe, HGST is much more expensive than Seagate. Almost double the price, usually.


Very anecdotal evidence, but 3 of the 3 Seagate drives I ever used (all external 2.5" USB 3 HDDs, in Seagate's own enclosures) failed within 2 years, under very modest workloads (just used to store video files for my TV to play).

Meanwhile, all my WDs have been rock solid.


FWIW, the consensus on /r/datahoarder seems to be that Seagates should be the absolute last choice for long-term storage.


Seagate do an 'archive' HDD, but with only a 3-year warranty I wouldn't expect it to work as a long term solution.


I had the same experience with 5x 3 TB Seagates. None lasted 2 years. Replaced those with Toshibas and HGSTs in my Synology and it's been 3+ years without bad sectors. Will never buy Seagate again.


I wonder if that's a characteristic of streaming a single file in consecutive blocks. That'd be really strange behaviour though. Perhaps a thermal issue if the TV keeps it powered and spinning all the time? Certainly the Xbox seems to keep the disk spinning - I had a WD external HDD attached and the light was always flickering even with the console off, for whatever reason.

Personally I also find Seagate the loudest and 'clickiest' of all the drive brands. I can hear the mechanicals, making me think they will fail, so I trust them less than other brands.


Why do people use Amazon S3 when Backblaze B2 is 1/4 the cost of S3 and also includes a CDN for free? You also get way faster access speeds with Backblaze vs. Amazon, since Amazon tiers their IO speeds.

https://www.backblaze.com/b2/cloud-storage.html


Unless you're a bootstrapped startup with just a couple of people, paying the AWS bill is probably not something the engineer thinks about too much. Setting up a new billing account with another company is just enough friction that people use whatever AWS offers and call it a day.

Also, most employees aren't really incentivized to reduce or minimize infrastructure expenses.


I think a big reason is that people are using the rest of the Amazon ecosystem. If your costs aren't primarily storage, you might be willing to pay a premium to use something that integrates nicely with the other services you're using. Here's an article[0] that does some other comparisons between providers and mentions things like upload speed and security features.

[0] https://www.cloudwards.net/azure-vs-amazon-s3-vs-google-vs-b...


Last I checked, Backblaze still stores most data in 1 location, no?

So, durability of data (which to be fair doesn't matter for most s3 use cases), and interop with literally everything else in AWS

Intelligent data tiering

Actual access control

Pre signed URLs


Disclaimer: I work at Backblaze.

> Last I checked, Backblaze still stores most data in 1 location, no?

Backblaze now has multiple regions! One in Europe (Netherlands) and one in the US called "US-West". Quietly, US-West is actually three separate data centers, but your data will only really land in 1 datacenter somewhere in US-West based on a few internal factors.

To be absolutely clear, if you only upload and store and pay for 1 copy of your Backblaze B2 data, it is living in one region. To get a copy in two locations you have to pay twice as much and take some extra steps. So if this kind of redundancy is important to you for mission-critical reasons, Backblaze B2 would only be half as expensive as one copy in Amazon S3, not 1/4 as expensive.

In the one copy in one region in Backblaze B2, any file is "sharded" across 20 different servers in 20 different racks in 20 different locations inside that datacenter. This helps insulate against failures like if one rack loses power (like if a power strip goes bad or a circuit breaker blows). But if a meteor hits that 1 datacenter and wipes out all of the equipment in a 1 mile blast radius, you won't be getting that data back unless you have a backup somewhere else.


I've combined Cloudflare Workers with Backblaze to implement ETags, signed URLs, etc. Backblaze is part of CF's Bandwidth Alliance, so your bandwidth fee is zero. This makes for a very low monthly cost.


Can you elaborate further about this setup? Is there an article or a FAQ topic about it?


Hi, I haven't had time to write up about this, however, I have dumped the majority of the related code here for you and others who are interested in this solution: https://gist.github.com/chocolatkey/a7ef0364e357629e9875521d.... That should help you get started. It includes HMACSHA256 shared secret URL signatures based on IP, expiry, and optional path scope restriction, caching, ETAGs, sentry error reports, access to non-B2 data from a server w/ basic auth, and more... URLs look like this: https://example.com/delivery/UNIQUE_ID/p-001.jpg?token=16fb4... . My B2 bucket is public, however the requested path is also hmac'd with a secret known only to the CF worker to derive the path of the resources in the bucket. It is optimized for my use case of serving EPUB data. I do not guarantee it to be free of flaws, but it's worked well so far.
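
The worker itself is JavaScript, but the signing scheme boils down to roughly this (Python for brevity; the exact fields in my worker differ a bit):

    import hashlib, hmac, time

    SECRET = b"known-only-to-the-CF-worker"  # illustrative value

    def sign(path, client_ip, ttl=3600):
        expiry = int(time.time()) + ttl
        msg = f"{path}|{client_ip}|{expiry}".encode()
        token = hmac.new(SECRET, msg, hashlib.sha256).hexdigest()
        return f"{path}?expires={expiry}&token={token}"

    def verify(path, client_ip, expiry, token):
        if time.time() > int(expiry):
            return False  # link has expired
        msg = f"{path}|{client_ip}|{int(expiry)}".encode()
        expected = hmac.new(SECRET, msg, hashlib.sha256).hexdigest()
        return hmac.compare_digest(expected, token)  # constant-time compare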


Thanks a lot!


Yev from Backblaze here -> the Nodecraft folks did a great job with that project -> https://jross.me/free-personal-image-hosting-with-backblaze-...


Hi, Yev! Thanks a lot.

I'm toying with the idea of moving our CDN from the AWS stack to B2 + CF, thanks to the Bandwidth Alliance. There's at least one thing stopping me: for the simple scheme of hosting static content out of a bucket, we would have to deploy Workers just for URL rewriting. The CF folks recommend that approach rather than URL rewrites via simple page rules[1]. But it puts us in the weak position of our costs rising twice: for increased edge traffic AND for an increased number of requests.

Can anything be done on the Backblaze side to address the problem, like custom domains for buckets? Like https://f001.backblazeb2.com/file/bucket-name/file.jpg => https://bucket-name.f001.backblazeb2.com/file.jpg ?

[1] https://community.cloudflare.com/t/page-rule-setting-request...


S3 is single zone unless you specifically request multi zone durability (most don't.)


I don’t think this is accurate. Within a single region, S3 stores in multiple availability zones by default.

https://aws.amazon.com/s3/faqs/


You're completely right. I guess I was working on old info.


Paying for outbound bandwidth is a big one.


In the same way, why do people use Backblaze when they can use Wasabi and not pay for bandwidth? https://wasabi.com/cloud-storage-pricing/


I looked at Wasabi some time ago, but their pricing is a LOT less simple than their headline says it is.

The major caveats are hidden away in their pricing FAQ: they charge a 1TB minimum if you use less, and there's a 90 days minimum retention period, meaning if you update a file a few times you will pay for the full 90 days of every intermediate version. Additionally, they reserve the right to make you pay for egress if it looks like you transfer more than you have stored.

So all in all, Wasabi might be the right fit for you if you store >1TB of files that are infrequently updated and get less than 1 download/month on average. If you fit that use case, I think their free egress pricing is awesome, but it's definitely not for everyone.
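
To make the trade-off concrete, here's a back-of-the-envelope comparison using the advertised rates as I remember them (Wasabi ~$5.99/TB/month with free egress inside their fair-use ratio; B2 at $0.005/GB/month storage plus $0.01/GB download; double-check current pricing before relying on this):

    def wasabi_monthly(stored_tb):
        return 5.99 * max(stored_tb, 1.0)  # 1 TB minimum charge, no egress fee

    def b2_monthly(stored_tb, egress_tb):
        return stored_tb * 1000 * 0.005 + egress_tb * 1000 * 0.01

    # 5 TB stored, rarely downloaded: B2 wins.
    print(wasabi_monthly(5), b2_monthly(5, 0.1))  # 29.95 vs 26.0
    # 5 TB stored, fully downloaded once a month: Wasabi wins.
    print(wasabi_monthly(5), b2_monthly(5, 5.0))  # 29.95 vs 75.0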


Wasabi does not allow you to use unlimited bandwidth. Your egress is supposed to stay close to your total ingress. So if you are uploading assets that will be accessed more than a few times in the first month, I think you will be out of spec for Wasabi.


If I understand right, if you put CloudFlare in front of Backblaze, you get free bandwidth thanks to Bandwidth Alliance: https://www.cloudflare.com/bandwidth-alliance/backblaze/


Not unlimited, and not for all use cases. Check out Cloudflare ToS.


Wasabi charges a minimum of 3 months of storage on anything uploaded.


Because of vendor lock-in. When you move a lot of data between S3 and EC2 it costs nothing (or very little). When you move data outside of AWS, there is extra cost, so it might not even be cheaper overall.


I am strongly considering B2 as an option for a dropbox-style system. Something where I run 8-16TB of hot tier on my local LAN, with B2 serving as the slower mass storage tier behind it. It seems that the average B2 access latencies would be ~100-200ms, which is very tolerable for a cache miss on such a massive tier of storage. With this amount of space available you could have pre-fetch rules that do things like pull down entire directories as files within them are accessed.
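
If I were to prototype it, the core would just be a read-through cache; a rough sketch (all names hypothetical, with a stub standing in for whichever B2 client library you'd use):

    from pathlib import Path

    CACHE = Path("/mnt/hot-tier")  # the local 8-16 TB pool

    def fetch_from_b2(key):
        # Stub: call your B2 client here (~100-200 ms on a miss).
        raise NotImplementedError

    def read(key):
        local = CACHE / key
        if local.exists():           # hot-tier hit: LAN latency
            return local.read_bytes()
        data = fetch_from_b2(key)    # miss: go out to B2
        local.parent.mkdir(parents=True, exist_ok=True)
        local.write_bytes(data)      # warm the hot tier
        # A prefetch rule could queue the rest of this key's directory here.
        return data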


There are usually many reasons:

1. Scale - S3 is big - really, really big! You don't need to care whether you store one KB or several petabytes.

2. Tiers: the default on S3 is multi-way replicated storage with 11 9s of durability and high availability. However, you can select cheaper options with trade-offs you are happy with.

3. Cost: S3 has reduced prices several times; you can be reasonably sure your costs will go down over time on a per-unit basis.


Here's one reason I would need something like S3. Sure, I could hack all that together into something barely functional myself, but it's not worth it. Pretty handy. https://aws.amazon.com/solutions/serverless-image-handler/


I too use the serverless image handler but it's not perfect. The documentation is really crappy and over the summer they transitioned the whole system from thumbor to sharp and didn't provide great backwards compatibility.


Where's the info about the free CDN?


Check Cloudflare bandwidth alliance


This bandwidth alliance? https://www.cloudflare.com/bandwidth-alliance/

Cloudflare is a featured integration, but that page only mentions that transfer fees are free, not that CDN hosting is free: https://www.backblaze.com/b2/solutions/content-delivery.html

Cloudflare does have a free CDN tier "For individuals with a personal website and anyone who wants to explore Cloudflare," but that's not the same as B2 including a CDN for free; even Azure is a part of the Bandwidth Alliance.


Right, but it means you can basically (ab?)use Cloudflare to get free egress from B2 storage. Cloudflare won't get too mad until you start hitting terabytes per month; even the free tier doesn't have restrictions.

You can also turn on an extremely aggressive caching policy with a page rule that will keep everything under a given subdomain for a month. This makes the "free CDN" part easy, though again, people who do this run the risk of getting their accounts terminated.


You just get a discount on data egress; it's not free.


It depends on the partner. For Backblaze specifically it is indeed free.


I use B2 as “cold storage” of large-ish files. It’s incredible how low the monthly bills are.


If you're with another cloud provider, you still have to pay that provider's egress fees to get your data over to Backblaze. That can cancel out the cost savings.


B2 does not implement the S3 API. Also the B2 API is much slower than S3.


Disclaimer: I work for Backblaze so I'm biased. :-)

> the B2 API is much slower than S3.

This is "generally true" for 1 upload thread. We aren't even sure what Amazon is doing differently, but they can be a little faster in general for 1 thread (some people only see 20% faster, some see as high as 50% faster, might be latency to the datacenter and where you are located).

As long as you use multiple threads, I make the radical claim that B2 can be faster than Amazon S3. The B2 API is slightly better in that we don't go through any load balancers like S3 does, so there is no choke point. What this means is that in B2 40 threads are actually uploading to 40 separate servers in 40 separate "vaults" and none of the threads could possibly know the other threads are uploading and it does not "choke" through a load balancer. This was all designed originally so that 1 million individual laptops could upload backups all at the same time with no issues and no load balancers. And it works great every day.

Practically speaking, for most people in most applications, this means both Amazon S3 and Backblaze B2 are essentially free of any limitations. If you aren't using enough of your bandwidth, spawn a few more threads (on either platform) and soak your upload capacity. But in full disclosure, if your application is only single threaded, yes, B2 tends to be 20% slower for that 1 thread.
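
Client-side, "just add threads" can be as simple as a thread pool. A sketch, with a hypothetical upload_file standing in for whichever B2 client library you use (each thread should request its own upload URL rather than sharing one):

    from concurrent.futures import ThreadPoolExecutor, as_completed

    def upload_file(path):
        # Placeholder: call your B2 client here.
        ...

    paths = [f"part-{i:04d}.bin" for i in range(200)]  # made-up file list

    with ThreadPoolExecutor(max_workers=40) as pool:
        futures = [pool.submit(upload_file, p) for p in paths]
        for fut in as_completed(futures):
            fut.result()  # surface any per-file upload errors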


IAM access control, ease of use, reliability, speed.


Because the rest of my infrastructure currently runs on AWS and aws egress charges are far more expensive than the b2 savings.


Genuinely curious ... do you not assign any value to having a backup outside of Amazon ?

AWS can certainly provide geographical diversity, but on the organizational abstraction layer, all eggs are in one basket, yes ?

Is having organizational redundancy something you assign zero value to, or something whose value conflicts with the egress costs so as to make it a difficult decision ?

Again, genuinely very interested ...


Not OP, but of all the things that could kill the startup I work at, AWS shutting down is at about spot 63864664 on the list.


I mean, we have like 2 million lines of Python code written for Lambda, S3, SQS, SNS, Kinesis, Redshift, etc. using boto3. So if AWS dies, it's not like a data backup will save my startup. We're dead.


That sounds troublesome, no?


Not the parent, but they mentioned that they are a startup. AWS "dying" has killed zero startups so far. Time to market has killed many more, same for "not-invented-here" syndrome, and prematurely building for the future.


Maybe? I'm not an influential enough engineer to change something that fundamental. Seniors say it's troubling but they're already married to AWS so it's very expensive to have a plan B. I don't think AWS dying is high on the list of why the startup can die. There are bigger dangers and they can only be solved by writing code that works.


I see the bigger danger as not AWS falling over, but AWS deciding to charge more money once you're locked-in.


We attempted to be cloud agnostic (using terraform instead of CloudFormation for example) and then later multi-cloud. The amount of complexity and cost around it was just too much.

If AWS goes down, more or less a good portion of the internet goes dark. It's an acceptable risk at this point unless you are truly massive and entirely self-contained - if you are using any 3rd-party services, e.g. for auth, payment, whatever - they may be using AWS as well and you are still exposed.


We back up data that's not on S3 outside of AWS (code, operational databases), but most of our S3 data is effectively stuck due to the insane export prices. It's not the end of the world if we were to lose everything in S3 anyway.

To anyone reading this: Don't store lots of small files on S3. It's a terrible idea.


Colocation providers now have options to put your physical environment on the same network as your Amazon account to avoid egress fees.


I live for these reports. Always insightful and professional. Thank-you SO MUCH for publishing this data.


Yev here -> You're welcome! The conversation's always fun :D


I barely had time to skim it, but I'm not sure I like how the ST12000NM0008 shows up in the table. I find it really hard to reason about what the real failure rate could end up being on those drives. For example, you've got about 45 days average on each drive, so the failure rate is multiplied by roughly 8 to extrapolate the annualized failure rate. Doesn't that overstate the estimated rate of failure, since drives tend to fail more often at the start of their life?

I only guesstimated from the table and didn't have time to look at the actual data, so it's possible I misread something.
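
For reference, the formula Backblaze has described in past posts is AFR = failures / (drive-days / 365), which is exactly what produces that ~8x multiplier at 45 days of average age. A quick illustration with made-up numbers:

    def annualized_failure_rate(failures, drive_days):
        return failures / (drive_days / 365) * 100  # percent

    # Hypothetical: 1,000 drives observed for 45 days each, 10 failures seen.
    print(annualized_failure_rate(10, 1000 * 45))  # ~8.1% AFR
    # The raw observed rate was only 10/1000 = 1%; 365/45 ~= 8.1 scales it up.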


Does anyone remember their definition of "drive failure"? Is it a SMART "failure imminent" report, a single uncorrectable read error, or complete data loss for a whole disk? I recall reading about it in one of their previous reports, but can't find it again.

EDIT: nevermind, found it.

"Backblaze counts a drive as failed when it is removed from a Storage Pod and replaced because it has 1) totally stopped working, or 2) because it has shown evidence of failing soon.

A drive is considered to have stopped working when the drive appears physically dead (e.g. won’t power up), doesn’t respond to console commands or the RAID system tells us that the drive can’t be read or written."

https://www.backblaze.com/blog/hard-drive-smart-stats/


Ah yes, the reliable Backblaze folks. That they've out-Googled Google in a niche using mostly commodity infrastructure and kept their business alive for so long is a testament to their ingenuity (I wonder how their operating costs compare with AWS Glacier, which has the theoretical advantage of unpowered disks). And releasing this proprietary operational business data is a testament to their coolness factor.

It's a timely article as I'm looking at HC530's (WUH721414ALE6L4 / WUH721414ALN6L4 (wiredzone carries it)) for a home FreeNAS box:

- any relatively-modern enterprise 4U 3.5" storage box with Xeon 4 cores or so

- quieter, high-volume fan mod

- RAM: 64-128 GiB, beyond that isn't useful unless deduping

- NIC: X710-T4L 4x 10GbE copper NIC

- ZIL: mirrored pair of high-endurance, write-intensive, reliable SSD like Optane 900p/905p 280-480GB

- L2ARC: striped pair of read-intensive/larger SSDs like the Gigabyte Aorus Gen4 1 TB

This will fit nicely as my home NAS for a water-cooled dual EPYC virtualized server/workstation build underway. I managed to get a single water block with (3) G1/4 connections that will cool both CPUs and the VRM chokes/converters.

If anyone has better suggestions, please chime in.


Why does someone need something like this? I ran a home Synology NAS at one point but it wasn't worth the trouble. Let others run and maintain those hot, loud, power-hungry disks.


Then you already made the mistakes of:

- conflating trouble for you with trouble for me, which it clearly isn't

- not owning your own data

- paying more to store it

- paying to access it

- the ability to keep things that aren't worth paying cloud storage rates for but cost next to nothing on cheap local drives

Furthermore, there are additional network costs such as AWS network charges AND home ISP data limits.

And there are other uses, such as:

- backing-up VMs

- backing-up computers

- caching package and source code repos

- backing-up CCTV footage

- and whatever else comes along


Speed. Streaming files (movies, TV shows, YouTube channels, Linux ISOs, large file collections) is a lot faster; I can reliably hit 100MB/s on my home network, while my DSL caps out at 20MB/s if I'm doing nothing else.


I’m a Backblaze customer of many years & respect their team a lot. But seriously, “out-Googling Google” because they have cheaper storage is a meme that needs to die.

GCP and AWS both store full copies of your data in multiple locations by default (Availability Zones in AWS-speak). So it’s not an apples to apples comparison. The reduced redundancy is priced in, for people who can tolerate it.


Woah, chillax the dramatic rhetoric, your majesty. ;-P

The original scrappy Google was founded on commodity hardware held together by LEGO. The point was not to do as enterprises did with redundant everything, which was wasteful for web-serving use cases that were better solved with high availability in software. These days, if you're a giant company like FAANG, you can easily afford to go to Quanta and say: give me 10k racks' worth of compute nodes to this specification. If you're starting out and broke, you've got to use what's on the shelf, cobble together a custom solution optimized for the purpose, and/or kit out a test lab with a mish-mash of used servers from eBay.


Slightly off topic: is anyone using B2 (which seems cheaper if you have more than one computer for a certain amount of data) for personal data backups with strong client side encryption across multiple platforms (Linux, Mac, Windows)? If so, how do you handle it?


I sync all my device files to a local FreeNAS server which runs duplicacy in a jail and syncs everything every night to Backblaze B2. I looked at duplicity, restic, attic, and borg, and in the end settled on duplicacy. Pay attention to the duplicacy license; it could be a problem for some.


I do this, though not from Windows, just Mac and Linux. I use restic, which has B2 support and handles all the encryption. It also does diffing for backups. There's a Windows build, so I assume it would work for you there as well.

You can view and download builds at https://github.com/restic/restic/releases/

I don't automate this though, I just use it for occasional backups. Not sure what the automation story around restic is.


Similar to the other two sibling comments, been using restic to sync to B2 over the past 6-7 months. Stored amount has been 450-475 GB, and total costs tend to be about $2.50-$2.75 per month.


Yes. I use restic same as the sibling comment.

Have >8TB of data from multiple machines with a lot of deduplication (source is somewhere around 10 to 12TB).


I use Arq on two macs and it works very well with B2.


I use and like Arq also, but the OP asked for something that covers Linux, which I believe Arq does not.


Looking at the data, it seems they will soon reach 1000 PB / 1 EB.

The top 5 annualized hard drive failure rates are all from Seagate. All drives from Hitachi and Toshiba have an AFR lower than 1%.

So basically, don't buy Seagate.


It's pretty much been this way for a few years, with only a few model lines of Seagate being the outlier. As always, thanks to the BackBlaze team for publishing these numbers.


My math says they're already over 1000 PB/1 EB:

1,089,318 = 4 * 2852 + 4 * 12746 + 8 * 1000 + 12 * 1560 + 12 * 10859 + 4 * 19211 + 6 * 886 + 8 * 9809 + 8 * 14447 + 10 * 1200 + 12 * 37004 + 12 * 7215 + 4 * 99 + 14 * 3619

Don't think I made a typo there, but please check my work. Even counting as 1024 TB = 1 PB and 1024 PB = 1 EB, that leaves 1,048,576 TB = 1 EB and they're over that threshold.

The February 5, 2018 "500 Petabytes and Counting" blog post should soon be eclipsed by a 1 EB post - though it appears they're counting actual data stored, not capacity. Nonetheless, with some redundancy, extra capacity, and overhead, we'll likely see that number soon.
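
For anyone who wants to re-run it, the same sum as a quick script ((capacity in TB, drive count) pairs copied from above):

    fleet = [(4, 2852), (4, 12746), (8, 1000), (12, 1560), (12, 10859),
             (4, 19211), (6, 886), (8, 9809), (8, 14447), (10, 1200),
             (12, 37004), (12, 7215), (4, 99), (14, 3619)]
    total_tb = sum(tb * count for tb, count in fleet)
    print(total_tb)              # 1089318 TB
    print(total_tb / 1024 ** 2)  # ~1.04, i.e. over 1 EB even in binary units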


> So basically dont buy Seagate.

Or do, because they're cheaper than the competition and modern systems can handle failures.


True, with enough redundancy it's fine, and if they have special terms with Seagate such as free replacements and heavy discounts, then it's little wonder they use so many.

Myself though, for SoHo use, I'm willing to pay more for less stress, because I don't have the sheer volume of devices, and time spent replacing drives is time I could spend doing something useful instead of shuffling HDDs and rebuilding RAID arrays. A 5% saving on a handful of drives is not worth it, but a 40% saving on thousands makes them competitive.


Every time I look at HDD prices it seems Seagate is always a little more expensive. But the difference is less than 5%, to the point where one may argue they are all priced similarly.

I don't want to save a few dollars for potentially 4x the chance of failure and hassle.

And even if we ignore the 2%+ outliers from two models, Seagate is still on average 2-3x more likely to fail.


Does anyone here have experiences with BackBlaze's B2 service for hosting files? I'm considering switching to it from S3 because it is much cheaper. (I need to transfer 2-3TB / month, usually in 2-3 bursts of worldwide distribution).


Yev from Backblaze here -> We're definitely more affordable and our integrations (https://www.backblaze.com/b2/integrations.html) make it easy to get your data to us. We even have partnerships with companies who can help transfer data from S3 into Backblaze B2!


How is Backblaze able to be so much cheaper than the other, larger competitors? I assume Amazon/Google/Microsoft have squeezed every last cent from suppliers and also have highly cost-optimized staffing.


Yev here -> great question! We are a bootstrapped company and we focus on inexpensive storage (https://www.backblaze.com/blog/vault-cloud-storage-architect...). Because we've built a robust system that doesn't use a ton of expensive components we can provide hot cloud storage (B2 Cloud Storage) and computer backup at an affordable rate while still making decent margins. To learn more about our business and decision making, we have a pretty cool series of entrepreneurship blog posts that might be interesting to some: https://www.backblaze.com/blog/category/entrepreneurship/


Reading about B2 pricing, it says you get "10GB of free storage, unlimited free uploads, and 1GB of downloads each day". Doesn't that amount to essentially free backups for (reasonable) personal use? Or am I missing something?


You aren't missing anything. I use B2 along with restic to back up my Linux machines, since their standard backup solution doesn't support Linux. It costs me around $1/month to back up my primary desktop and two laptops.

They had a blog post about doing this a while back, so they are definitely aware of the use case: https://www.backblaze.com/blog/backing-linux-backblaze-b2-du...

I still use their standard backup service for my family's Windows machines since it's more "batteries included".


I think even casual users tend to have more than 10GB of data these days.


I don't. Although I can easily fill up a terabyte drive, little of that is my own personal files that I need to keep if the drive blows up. Most of my stuff is source code, documents/notes and some photos (with photos being the only thing that takes up significant space). Almost everything else I can re-download or rebuild from the original source as and when I need it.


In total, sure, but at least for myself the really important stuff would fit in 10MB and I think I could fit all of the medium importance stuff in 1GB. The remaining terabytes are nice-to-have but I wouldn't be too upset if I lost it.


I'm over the 10GB free limit. It costs me about $1.50 a month to back up "irreplaceable" data from my NAS.


Is there any consensus among Backblaze employees (or even just your personal opinion if applicable) for what brand/series of drives to use for home NAS devices?

I ask because the online favorite appears to be WD Reds, which you have phased out since 2018.


Yev here - it's interesting, we don't really chat about that often - what I would do is get the least expensive drive that has the most capacity and make sure the NAS is backed up somewhere in case of failure or theft. Personally I think the Toshiba drives are pretty good, but Seagates are affordable and do a good job. Plus there's always HGST which are rock-solid, but tend to run a bit more expensive.


Thank you Yev. I'm wondering about the bandwidth, especially internationally. Do you have any numbers on that? Say split by Europe/US/Other.


We do have a datacenter in the EU (Amsterdam) - and if you set up your account there you'll be able to transfer data to it. That's a popular destination for folks living close to it, but even before that one went "live" we had lots of people using our West Coast data centers without much issue. If you have a ton of data you can take a look at the Fireball (https://www.backblaze.com/b2/solutions/datatransfer/fireball...) which allows you to rapidly ingest data to us.


What are you using as a TCP congestion controller? BBR should provide better utilization on long pipes (e.g. transoceanic transfers, if stuff isn't geo-replicated). Totally anecdotal, but it helped me when FTPing data from the US to Europe.


Yev here -> This question's beyond me, lemme see if I can get a dev on the line :D

*Edit - sounds like BBR is used in some of the environment!


(I need to quickly ship a 50MB file to 50,000 clients worldwide.)


Hey Michael, I host RAW photos I want to share in B2 (48 MB each), and then put Cloudflare in front of it using their tutorial [1]. It gets edge caching and achieves 200-500 Mbps. It's great, and I have absolutely no complaints.

1: https://help.backblaze.com/hc/en-us/articles/217666928-Using...


@mherrmann - Only about 10-20GB, so not the TB levels you are dealing with, but Backblaze isn't actually doing the transfer, it's Cloudflare.

@toomuchtodo - Yes, and on top of that, both B2 and Cloudflare are completely free since I'm under the 10GB storage limit (for now), and I'm a personal user of Cloudflare (for now).


Is your outbound data free because of the bandwidth alliance deal Cloudflare has with Backblaze?


Thank you; and how much data are you transferring each month?


I use them for both backup (B2 storage with restic on Linux servers) and for serving static content for my homepage, together with Cloudflare ( https://www.backblaze.com/blog/backblaze-and-cloudflare-part... ). Works like a charm.


I use it for personal backups with rclone. Works great.


I have made all my hard drive purchasing decisions based almost entirely on these reports for the last couple years and have not been disappointed with the results.


I use Backblaze's massive infrastructure to store pictures of my keyboard.


Is it a cool keyboard?


Are the keys very large?


I still can't believe BackBlaze gives this data away for free. Seems like something they should be selling to other cloud providers


Maybe they consider this report to be an ad for their services? The name recognition this report gives them is probably quite valuable.


It also pressures HDD companies to make better products and appear higher on these lists, which is good for Backblaze.


To quote them: "Transparency breeds trust. We're in the business of asking customers to trust us with their data. It seems reasonable to demonstrate why we're worthy of your trust."


I’m sure Amazon, Google, Facebook, etc collect their own stats on drive failure. It would be almost negligent if they were just guessing in the dark every time they buy drives.

Main difference is probably Backblaze is small enough to publish these stats without hurting their supplier relationships. (pure speculation)


I love Backblaze, but their log package in my Library folder has grown to something like 10 gigs. Wish there was a way around that.


Signed up for this a week ago. 45 days remaining to upload.

Hurray for Canadian internet.


This may not apply to you, but at least 2 of the underdogs in the Canadian ISP world (MNSi & TekSavvy) have been rolling out gigabit fiber.

I've got a 1Gb fiber pipe for 1/10th the cost that Cogeco was charging.


I have ~10mbps upload here in the US, and my backup was looking to take about a month for about 3ish TB of data. One thing that helped is that with default settings of only 1 backup thread, the Windows client was unable to saturate my upload bandwidth. Upping it to 4-6 threads allowed it to keep enough data moving to actually saturate my upload bandwidth and brought my backup down to like a week.
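
For anyone estimating their own backup window, the arithmetic is simple (nominal link rate, ignoring protocol overhead; real-world throughput varies, as my numbers show):

    def days_to_upload(data_tb, link_mbps, utilization=1.0):
        bits = data_tb * 1e12 * 8
        return bits / (link_mbps * 1e6 * utilization) / 86400

    print(days_to_upload(3, 10))        # ~27.8 days at full saturation
    print(days_to_upload(3, 10, 0.25))  # ~111 days if one thread only gets 25%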


Not really related to this handy study, but is anybody else still in awe of how spoiled we are with regard to the sizes and speeds of HDs nowadays? I mean, the smallest-capacity drive on their chart is 4 terabytes.


Not feeling spoiled at all, not at all. Especially not with a 2 to 3 percent failure rate. The failure rate I experienced in my workstation makes me worry about not having RAID 1 or 10. HDs for 9 TB in RAID 10 are not that cheap.

But the bigger issue is that warranty terms for HDs are down to 2 or 3 years nowadays, so the investment is short-lived. That also tells you something about the manufacturers' own reliability estimates for their products.


Can't say I agree with that sentiment. The fact that I can quite reasonably have a 30TB usable RAID5 NAS array makes me feel pretty spoiled. Then again, I'm old enough that my first HDD was 10MB.


Mine was 10MB as well, with a dedicated controller. Quantum if I'm not mistaken. And it lasted much much longer than the averages I get from 4TB disks. I believe I managed to take files out of it in 2000, about 13 years after it was installed.

Edit: nope, probably was a ST506 or 412.


I'd be wary of making a RAID5 array with drives that big; you could easily lose another drive from the I/O caused by a rebuild; though if you have backups (you should) then it's probably an acceptable risk for non-critical data.
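
The classic back-of-the-envelope version of that worry, for unrecoverable read errors specifically (assuming the 1-per-10^14-bits URE spec common on consumer drives; enterprise drives are usually rated 10x better, and how pessimistic this math is in practice is debated):

    import math

    per_bit_ure = 1e-14        # consumer spec: one bad bit per 1e14 bits read
    bits_read = 14e12 * 8 * 7  # a rebuild reads ~7 surviving 14 TB drives
    p_hit = 1 - math.exp(-per_bit_ure * bits_read)  # Poisson approximation
    print(f"{p_hit:.2%}")      # ~99.96%: expect at least one URE mid-rebuild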


I'd agree with that. Even 2-disk redundancy these days is a bit dangerous when you're talking about 14TB drives and 100+TB arrays. As is often stated: RAID is not backup.


Since HDDs have, for the most part, been relegated to being external drives on laptops, I'm still looking forward to SSDs becoming way cheaper and reaching current HDD prices per GB. Internal storage on laptops has reduced or stayed the same while our datasets have grown exponentially over the years (with photos and videos). Since SSDs also perform much better when there's always a good amount of free space (for wear leveling and maintenance), it's all the more painful to live with lower capacity SSDs on laptops.


Sizes, yes, speeds, no. 600 MB/s of data transferred, and only for linear accesses.


I have about 10 TB of video files. I use BackBlaze for Windows but I would like the files to be available on other computers and my phone in my local network.

What can I use to do this and still keep offsite backups?


I think their more premium plans offer sharing


Dropbox?


I have two ST12000VN0007 (VN) Seagate drives. The report shows the ST12000NM0007 (NM) has a 3.32% failure rate. I wonder how closely related the VN and NM models are.


If you look, that drive model is also the most highly used by far. I think it's just a matter of the larger sample size / use time.


Surely it doesn’t matter when you have 10,000s of drives? Aren’t you already at a large enough sample size? If it isn’t, what is the point of them publishing this every year? I don’t know the math of the matter though.


> Surely it doesn’t matter when you have 10,000s of drives? Aren’t you already at a large enough sample size? If it isn’t, what is the point of them publishing this every year? I don’t know the math of the matter though.

I think drive age matters? I'm not clear if they cycle drives out at a certain age or just run them until they fail.

Also, if a drive is low enough in cost, then the additional cost of replacing an incremental 1% may be lower than the cost of acquisition of a more reliable drive.
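
One way to put a number on "large enough" is to treat failures as roughly Poisson and attach a confidence interval to the observed AFR. A sketch using the normal approximation and made-up numbers:

    import math

    def afr_interval(failures, drive_days):
        drive_years = drive_days / 365
        rate = failures / drive_years * 100
        half = 1.96 * math.sqrt(failures) / drive_years * 100  # ~95% CI
        return rate - half, rate + half

    # Small, young fleet: 500 drives for 45 days, 1 failure seen.
    print(afr_interval(1, 500 * 45))        # ~(-1.6%, 4.8%): tells you little
    # Big, seasoned fleet: 10,000 drives for a year, 200 failures.
    print(afr_interval(200, 10_000 * 365))  # ~(1.7%, 2.3%): meaningful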


Yea I don't know. I'm not big on statistics either. I just noticed that the drives that did the worst were the ones that had the most usage overall.


That would probably be price ~ failure rate correlation.


Looks like the Seagate 0007s are 1 year old on average, whereas the 0008s are 44 days old on average.

The 12TB HGSTs are 220 days old on average. The Seagate 12TB failure rates seem high, which is quite unfortunate as I own 6 of them.


I'm a very happy customer, but please do something about your mobile app (Android); it's really horrible.


I agree the mobile app (on iOS in my case) is at best an afterthought, and most likely not even a high ranking afterthought.

However, out of curiosity...what would you imagine a better Backblaze mobile app would do?


For sure it is not, and should not be, a high priority, but releasing such an app in 2020 does not reflect the great skills of the Backblaze team. At least show me some basic stats, account settings, and invoices. You can only download files from your buckets and that's it... really?


Semi-bummed my school partnered with another backup company, cause I'd love to support BackBlaze.


Yev here -> Thanks! Out of curiosity, does your school provide backup to all the students?


To all grad students and faculty:

https://csguide.cs.princeton.edu/hardware/backup


Seagate always seems to have much higher failure rates compared to HGST/WD/Toshiba etc.

Does anyone here know the exact reason why? I assume there are enough people on this site who have worked for them or a competitor :)


30% profit margin, why change anything?


Does anyone have opinions on or experience with using Backblaze as personal-only cloud storage and offsite backup for smaller amounts of data (under 30 TB)?


I use Backblaze's B2 service for both backup (via restic) and archival storage (via git-annex). I only maintain a distinction between the two in case I ever want to move to another service, and also because git-annex and restic have different strengths that make them more or less suitable for unchanging archives and often changing backups respectively. Between the two I have about 1 TB stored with them.

I have not yet needed to do a full restore, but I do partial restores from time to time to double-check my backup procedures, and every time it's done what I wanted. My monthly costs are usually a bit under $5.

Note I essentially never use B2's API directly, and only use it as a backend through wrappers others have written, so I have no real experience with how good its API is. One of the few times I did try the API, I remember at one point I think I was getting Java exceptions back in the error messages, which was mildly concerning from a hygiene perspective and made for rather terrible error messages, but no sensitive data was being emitted. I also think that's been fixed.

The bottom line is that B2 has worked fine for me and at a good price point.


> for smaller amounts of data (under 30 TB)

Did you mean to say 30 GB or 30 TB? Calling 30 TB as "smaller amount" seems weird to me in 2020, especially for personal data. Perhaps it would be the norm in a couple of decades. :)

FWIW, I have way under 1 TB of personal data to backup to different locations, and I consider that to be relatively large.


I did mean 30 TB. I have approximately 12 TB of data currently between all of my storage for video, audio, books, and games. However, I have been avoiding doing a lot of conversions of my physical collections to digital media because I'm just unsure about running a full-blown archival server at home. I would estimate that converting my entire video library to 4K would add somewhere over 10 TB. My comic books/manga and graphic novels, upgraded to archival resolution, would probably run over 10 TB as well. Then there is the soon-to-be-required ripping of PS2/PS3/Wii U ROMs when those hardware units become less reliable for actual playing. So I think 30 TB of storage would do for the time being, but I will eventually need more than that.

TL;DR I am a digital hoarder, so I've convinced myself I do in fact need 30+ TB of storage.


Yeah I use B2 with rclone (https://rclone.org/) and it works great.


+1 I have the same question and would like to read replies.


I have wondered about system downtime, or time spent operating in a degraded state.

My understanding is that, other than mirrored setups, RAID configurations may take a long time to rebuild on the larger drives, and this is a contributing factor to why the highest sales volume of drives has been 'stuck' at 4TB (thus the lower $/GB price).


They don't use traditional RAID setups there. My understanding is they use a proprietary data encoding and distribution, which is more accepting of individual drive failures and reduces rebuild times. I believe I've heard they use something more like erasure coding rather than RAID-5.


https://www.backblaze.com/blog/reed-solomon/

There are many open source libraries.
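
For flavor, the simplest possible erasure code is a single XOR parity shard, far weaker than the 17+3 Reed-Solomon scheme in that post (it only survives one lost shard), but it shows how a missing shard gets rebuilt:

    from functools import reduce

    def split_with_parity(data, k):
        shard_len = -(-len(data) // k)           # ceiling division
        data = data.ljust(k * shard_len, b"\0")  # pad to a multiple of k
        shards = [data[i*shard_len:(i+1)*shard_len] for i in range(k)]
        parity = bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*shards))
        return shards + [parity]                 # k data shards + 1 parity

    def rebuild(shards):
        missing = shards.index(None)             # the one "failed drive"
        survivors = [s for s in shards if s is not None]
        shards[missing] = bytes(reduce(lambda a, b: a ^ b, col)
                                for col in zip(*survivors))
        return shards

    shards = split_with_parity(b"hello erasure coding", k=4)
    shards[2] = None  # lose one shard
    restored = b"".join(rebuild(shards)[:4]).rstrip(b"\0")  # real systems track length
    assert restored == b"hello erasure coding"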


Any particular reason they don't use Western Digital drives?


I will point out that HGST is owned by Western Digital and all their products are being rebranded to WD.


Other than the fact that HGST is owned by Western Digital, they also said:

There were no Western Digital branded drives in the data center in 2019, but as WDC rebrands the newer large-capacity HGST drives, we’ll adjust our numbers accordingly.


So what do these mean:

smart_177_raw

smart_177_normalized

smart_233_raw

smart_235_normalized

???



Thank you!



