Some models are dwindling. Some are being tested. Others (like the Seagate and HGST 12 TB) are increasing. Only thing that's really perplexing is why they keep buying more and more of the high-failure-rate Seagate 12 TB drives. It must be more than 3% cheaper to buy (and service!) a Seagate with a 3% chance of failure than to buy an equivalent HGST with a 0.4% chance of failure. I guess when you have 120,000 drives, easy hot-swap enclosures, and software to handle it all that makes good sense! But as an individual consumer, even with a Backblaze backup, it's definitely worth my time to spend a bit more on a drive that's far more reliable than to save a few dollars on a Seagate.
If I make a hard drive, and sales are crappy, in part because BackBlaze told the world how shitty they are, I'm going to have to drop the prices to move product.
I suppose there's a movie plot in there where BackBlaze negs their favorite drive so they can buy them cheaper.
Your comment just sparked an interesting question in my mind:
If a drive has failed, until now I always imagined the drive was just trashed. But now that you mention they are probably RMA'ing them, do you think that BackBlaze send the RMA drives through a magnetic tunnel of some sort before they ship the drives back to the manufacturer? Because otherwise, how do they ensure potentially unencrypted customer files are not accessed during the repair/refurbishment process?
I work at a large B2B SaaS that stores customer data, we pay extra for the option not to return failed drives that can't be wiped for RMA. We still get a replacement but the original is physically destroyed with a shredder.
Very anecdotal evidence, but 3 of the 3 Seagate drives I ever used (all external 2.5″ USB 3 HDDs, in Seagate's own enclosures) failed within 2 years, under very modest workloads (just used to store video files for my TV to play).
I have the same experience with 5x 3 TB Seagates. None lasted 2 years. Replaced those with Toshibas and HGSTs in my Synology and it's been 3+ years without bad sectors. Will never buy Seagate again.
I wonder if that's a characteristic of streaming a single file in consecutive blocks. That'd be really strange behaviour though. Perhaps a thermal issue if the TV keeps it powered and spinning all the time? Certainly the Xbox seems to keep the disk spinning - I had a WD external HDD attached and the light was always flickering even with the console off, for whatever reason.
Personally I also find Seagate the loudest and 'clickiest' of all the drive brands. I can hear the mechanicals, making me think they will fail, so I trust them less than other brands.
Why do people use Amazon S3 when Backblaze B2 is 1/4 the cost of S3 and also includes a CDN for free? You also get way faster access speeds with Backblaze vs. Amazon, since Amazon tiers its IO speeds.
Unless you're a bootstrapped startup with just a couple people, paying the AWS bill is not something the engineer probably thinks about too much. Setting up a new billing account with another company is just enough friction to just use whatever AWS offers and call it a day.
Also, most employees aren't really incentivized to reduce or minimize infrastructure expenses.
I think a big reason is that people are using the rest of the amazon ecosystem. If your costs aren't primarily storage, you might be willing to pay a premium to use something that integrates nicely with other services you're using. Here's an article[0] that does some other comparisons between providers and mentions things like upload speed and security features.
> Last I checked, Backblaze still stores most data in 1 location, no?
Backblaze now has multiple regions! One in Europe (Netherlands) and one is called "US-West". Quietly the US-West is actually three separate data centers, but your data will only really land in 1 datacenter somewhere in US-West based on a few internal factors.
To be absolutely clear, if you only upload and store and pay for 1 copy of your Backblaze B2 data, it is living in one region. To get a copy in two locations you have to pay twice as much and take some actions. So if this kind of redundancy is important to you for mission critical reasons Backblaze B2 would only be half as expensive as one copy in Amazon S3, not 1/4 as expensive.
In the one copy in one region in Backblaze B2, any file is "sharded" across 20 different servers in 20 different racks in 20 different locations inside that datacenter. This helps insulate against failures like if one rack loses power (like if a power strip goes bad or a circuit breaker blows). But if a meteor hits that 1 datacenter and wipes out all of the equipment in a 1 mile blast radius, you won't be getting that data back unless you have a backup somewhere else.
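The 20-way sharding described above matches what Backblaze has published about its Vault architecture (Reed-Solomon erasure coding with 17 data + 3 parity shards, if I recall their blog correctly). As a toy illustration of the idea - not their actual scheme - here is a sketch with k data shards plus one XOR parity shard, which survives the loss of any single shard:

```python
def split_with_parity(data, k):
    """Split data into k equal data shards plus one XOR parity shard."""
    data += b"\x00" * ((-len(data)) % k)        # pad to a multiple of k
    size = len(data) // k
    shards = [data[i * size:(i + 1) * size] for i in range(k)]
    parity = bytearray(size)
    for shard in shards:
        for i, b in enumerate(shard):
            parity[i] ^= b
    return shards + [bytes(parity)]

def recover(shards):
    """Rebuild the single shard marked None by XOR-ing all survivors."""
    missing = shards.index(None)
    size = len(next(s for s in shards if s is not None))
    rebuilt = bytearray(size)
    for s in shards:
        if s is not None:
            for i, b in enumerate(s):
                rebuilt[i] ^= b
    fixed = list(shards)
    fixed[missing] = bytes(rebuilt)
    return fixed
```

Real Reed-Solomon codes generalize this so that any m of n shards can be rebuilt, which is why one dead drive (or rack) is a non-event, while a meteor on the whole datacenter is not.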
I've combined cloudflare workers with backblaze to implement etags, signed URLs, etc. Backblaze is part of CF's bandwidth alliance so your bandwidth fee is zero. This makes for a very low monthly cost
Hi, I haven't had time to write up about this, however, I have dumped the majority of the related code here for you and others who are interested in this solution: https://gist.github.com/chocolatkey/a7ef0364e357629e9875521d.... That should help you get started. It includes HMACSHA256 shared secret URL signatures based on IP, expiry, and optional path scope restriction, caching, ETAGs, sentry error reports, access to non-B2 data from a server w/ basic auth, and more... URLs look like this: https://example.com/delivery/UNIQUE_ID/p-001.jpg?token=16fb4... . My B2 bucket is public, however the requested path is also hmac'd with a secret known only to the CF worker to derive the path of the resources in the bucket. It is optimized for my use case of serving EPUB data. I do not guarantee it to be free of flaws, but it's worked well so far.
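For anyone who wants the gist of the scheme without reading the worker source: the core of a signed URL like this is just an HMAC over the path, client IP, and expiry. A minimal Python sketch (the field layout and names here are hypothetical - the linked gist is the real reference):

```python
import hashlib
import hmac
import time
from urllib.parse import urlencode

SECRET = b"change-me"   # hypothetical shared secret, known only to the worker

def sign_url(path, client_ip, ttl=3600):
    """Build a delivery URL whose token binds the path, client IP, and expiry."""
    expires = int(time.time()) + ttl
    msg = f"{path}|{client_ip}|{expires}".encode()
    token = hmac.new(SECRET, msg, hashlib.sha256).hexdigest()
    return f"{path}?{urlencode({'expires': expires, 'token': token})}"

def verify(path, client_ip, expires, token):
    """Reject expired or tampered requests, using a constant-time compare."""
    if int(expires) < time.time():
        return False
    msg = f"{path}|{client_ip}|{int(expires)}".encode()
    expected = hmac.new(SECRET, msg, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, token)
```

The gist adds the pieces that matter in production - path-scope restriction, caching, and deriving the hidden bucket path from the public one - but the token math is essentially this.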
I am fancying the idea of moving our CDN from the AWS stack to B2 + CF, thanks to the Bandwidth Alliance. There's at least one thing stopping me: for the simple scheme of hosting static content out of a bucket, we would have to deploy Workers just for URL rewriting. The folks at CF recommend that approach rather than URL rewrites via simple rules[1]. But it puts us in the weak position of costs rising twice: for increased edge traffic AND for an increased number of requests.
I looked at Wasabi some time ago, but their pricing is a LOT less simple than their headline says it is.
The major caveats are hidden away in their pricing FAQ: they charge a 1TB minimum if you use less, and there's a 90 days minimum retention period, meaning if you update a file a few times you will pay for the full 90 days of every intermediate version. Additionally, they reserve the right to make you pay for egress if it looks like you transfer more than you have stored.
So all in all, Wasabi might be the right fit for you if you store >1TB of files that are infrequently updated and get less than 1 download/month on average. If you fit that use case, I think their free egress pricing is awesome, but it's definitely not for everyone.
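To make those caveats concrete, here is a rough cost model (illustrative numbers only - roughly Wasabi's ~$5.99/TB-month list price of the era; check current pricing). The 1 TB minimum shows up as a billing floor, and the 90-day minimum retention shows up as charges for churned data:

```python
PRICE_PER_TB_MONTH = 5.99   # assumed list price; verify before relying on it

def wasabi_monthly_cost(stored_tb, churned_tb=0.0):
    """stored_tb: live data. churned_tb: data deleted or overwritten before
    90 days, which is still billed as if stored (minimum retention)."""
    billable = max(stored_tb, 1.0)             # 1 TB billing minimum
    return (billable + churned_tb) * PRICE_PER_TB_MONTH
```

Note that a 200 GB user pays the same as a 1 TB user, and frequently rewritten files inflate the bill well past the headline rate.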
Wasabi does not allow you to use unlimited bandwidth. Your egress is supposed to stay close to your total ingress. So if you are uploading assets that will be accessed more than a few times in the first month, I think you will be out of spec for Wasabi.
Because of vendor lock in. When you move a lot of data between S3 and EC2 it costs nothing (or very cheap). When you move data outside of AWS, there is extra cost, so it might not even be cheaper overall.
I am strongly considering B2 as an option for a dropbox-style system. Something where I run 8-16TB of hot tier on my local LAN, with B2 serving as the slower mass storage tier behind it. It seems that the average B2 access latencies would be ~100-200ms, which is very tolerable for a cache miss on such a massive tier of storage. With this amount of space available you could have pre-fetch rules that do things like pull down entire directories as files within them are accessed.
1. Scale - S3 is big - really really big! You don’t need to care if you store one KB or several petabytes.
2. Tiers: the default on S3 is several way replicated storage with 11 9s of durability with high availability. However you can select from cheaper options with the trade off you are happy with.
3. Cost: S3 has reduced prices several times, you can be reasonably sure your costs will go down over time on per unit basis.
I too use the serverless image handler but it's not perfect. The documentation is really crappy and over the summer they transitioned the whole system from thumbor to sharp and didn't provide great backwards compatibility.
Cloudflare does have a free CDN tier "For individuals with a personal website and anyone who wants to explore Cloudflare," but it's not the same as B2 including a CDN for free; even Azure is a part of the bandwidth alliance.
Right, but it means you can basically (ab?)use Cloudflare to get free egress from B2 storage. Cloudflare won't get too mad until you start hitting terabytes per month; even the free tier doesn't have hard restrictions.
You can also turn on an extremely aggressive caching policy with a page rule that will keep everything under a given subdomain for a month. This makes the "free CDN" part easy, though again, people who do this run the risk of getting their accounts terminated.
Disclaimer: I work for Backblaze so I'm biased. :-)
> the B2 API is much slower than S3.
This is "generally true" for 1 upload thread. We aren't even sure what Amazon is doing differently, but they can be a little faster in general for 1 thread (some people only see 20% faster, some see as high as 50% faster, might be latency to the datacenter and where you are located).
As long as you use multiple threads, I make the radical claim that B2 can be faster than Amazon S3. The B2 API is slightly better in that we don't go through any load balancers like S3 does, so there is no choke point. What this means is that in B2 40 threads are actually uploading to 40 separate servers in 40 separate "vaults" and none of the threads could possibly know the other threads are uploading and it does not "choke" through a load balancer. This was all designed originally so that 1 million individual laptops could upload backups all at the same time with no issues and no load balancers. And it works great every day.
Practically speaking, for most people in most applications, this means both Amazon S3 and Backblaze B2 are essentially free of any limitations. If you aren't using enough of your bandwidth, spawn a few more threads (on either platform) and soak your upload capacity. But in full disclosure, if your application is only single threaded, yes, B2 tends to be 20% slower for that 1 thread.
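A sketch of what "spawn a few more threads" looks like client-side (the upload function here is a placeholder - substitute the B2 or S3 SDK call you actually use):

```python
from concurrent.futures import ThreadPoolExecutor

def upload_chunk(chunk_id, data):
    """Placeholder for a real per-thread upload call (e.g. an SDK's
    upload_file / put_object); returns bytes 'sent' for accounting."""
    return len(data)

def parallel_upload(chunks, threads=40):
    """Push independent chunks concurrently; with no shared choke point,
    aggregate throughput grows until the uplink saturates."""
    with ThreadPoolExecutor(max_workers=threads) as pool:
        return sum(pool.map(lambda c: upload_chunk(*c), chunks))
```

On B2, each thread gets its own upload URL to a separate server, which is why the per-thread penalty washes out once you parallelize.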
Genuinely curious ... do you not assign any value to having a backup outside of Amazon ?
AWS can certainly provide geographical diversity, but on the organizational abstraction layer, all eggs are in one basket, yes ?
Is having organizational redundancy something you assign zero value to, or something whose value conflicts with the egress costs so as to make it a difficult decision ?
I mean, we have like 2 million lines of Python code written for Lambda, S3, SQS, SNS, Kinesis, Redshift, etc., using boto3. So if AWS dies, it's not like a data backup will save my startup. We're dead.
Not the parent, but they mentioned that they are a startup. AWS "dying" has killed zero startups so far. Time to market has killed many more, same for "not-invented-here" syndrome, and prematurely building for the future.
Maybe? I'm not an influential enough engineer to change something that fundamental. Seniors say it's troubling but they're already married to AWS so it's very expensive to have a plan B. I don't think AWS dying is high on the list of why the startup can die. There are bigger dangers and they can only be solved by writing code that works.
We attempted to be cloud agnostic (using terraform instead of CloudFormation for example) and then later multi-cloud. The amount of complexity and cost around it was just too much.
If AWS goes down, more or less a good portion of the internet goes dark. It's an acceptable risk at this point unless you are truly massive and entirely self contained- if you are using any 3rd party services, IE for auth, payment, whatever- they may be using AWS as well and you are still exposed.
We backup data that's not on S3 outside of AWS (code, operational databases), but most of our S3 data is effectively stuck due to the insane export prices. It's not the end of the world if we were to lose everything in S3 anyway.
To anyone reading this: Don't store lots of small files on S3. It's a terrible idea.
I barely had time to skim it, but I'm not sure I like how the ST12000NM0008 shows up in the table. I find it really hard to reason about what the real failure rate could end up being on those drives. For example, you've got about 45 days average on each drive, so the failure rate is multiplied by roughly 8 to extrapolate the annualized failure rate. Doesn't that overstate the estimated rate of failure, since drives tend to fail more often at the start of their life?
I only guesstimated out of the table and didn't have time to look at the actual data, so it's possible I misread something.
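For reference, the annualization in question is straightforward (Backblaze's published formula is, I believe, failures per drive-year of observed operation; the 365/45 ≈ 8x factor below matches the parent's "roughly 8"):

```python
def annualized_failure_rate(failures, drive_days):
    """AFR as a percent: failures per drive-year of observed operation."""
    return 100.0 * failures / (drive_days / 365.0)

# 1,000 drives averaging 45 days each: every observed failure is scaled
# by 365/45 ≈ 8.1x, which is what makes young-drive AFRs so noisy -- and
# why early-life ("bathtub curve") failures get amplified.
```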
Does anyone remember what is their definition of "drive failure"? Is it SMART "failure imminent" report, single uncorrectable read error or complete data loss for a whole disk? I recall reading about it in one of their previous report, but can't find it again.
EDIT: nevermind, found it.
"Backblaze counts a drive as failed when it is removed from a Storage Pod and replaced because it has 1) totally stopped working, or 2) because it has shown evidence of failing soon.
A drive is considered to have stopped working when the drive appears physically dead (e.g. won’t power up), doesn’t respond to console commands or the RAID system tells us that the drive can’t be read or written."
Ah yes, the reliable BackBlaze folks. That they've out-Googled Google in a niche using mostly commodity infrastructure and kept their business alive for so long is a testament to their ingenuity (I wonder how their operating costs compare with AWS Glacier which has a theoretical advantage of unpowered disks.). And the releasing of this proprietary operational business data is a testament to their coolness factor.
It's a timely article as I'm looking at HC530's (WUH721414ALE6L4 / WUH721414ALN6L4 (wiredzone carries it)) for a home FreeNAS box:
- any relatively-modern enterprise 4U 3.5" storage box with Xeon 4 cores or so
- quieter, high-volume fan mod
- RAM: 64-128 GiB, beyond that isn't useful unless deduping
- NIC: X710-T4L 4x 10GbE copper NIC
- ZIL: mirrored pair of high-endurance, write-intensive, reliable SSD like Optane 900p/905p 280-480GB
- L2ARC: striped pair of read-intensive/larger SSDs like the Gigabyte Aorus Gen4 1 TB
This will fit nicely as my home NAS for a water-cooled dual EPYC virtualized server/workstation build underway. I managed to get a single water block with (3) G1/4 connections that will cool both CPUs and the VRM chokes/converters.
If anyone has better suggestions, please chime in.
Why does someone need something like this? I ran a home Synology NAS at one point but it wasn't worth the trouble. Let others run and maintain those hot, loud, power-hungry disks.
Speed. Streaming files (movies, tv shows, youtube channels, linux ISO's, large file collections) is a lot faster; I can reliably hit 100MB/s on my home connection, my DSL caps out at 20MB/s if I do nothing else.
I’m a Backblaze customer of many years & respect their team a lot. But seriously, “out-Googling Google” because they have cheaper storage is a meme that needs to die.
GCP and AWS both store full copies of your data in multiple locations by default (Availability Zones in AWS-speak). So it’s not an apples to apples comparison. The reduced redundancy is priced in, for people who can tolerate it.
Woah, chillax the dramatic rhetoric, your majesty. ;-P
The original scrappy Google was founded on commodity hardware held together by LEGO. The point was not to do it the enterprise way with redundant everything, which was wasteful for web-serving use-cases that were solved with better high-availability in software. These days, if you're a giant company like FAANG, you can easily afford to go to Quanta and say: give me 10k racks worth of compute nodes to this specification. If you're starting out and broke, you gotta use what's on the shelf, cobble together a custom solution optimized for the purpose and/or kit out a test lab with a mish-mash of used servers from eBay.
Slightly off topic: is anyone using B2 (which seems cheaper if you have more than one computer for a certain amount of data) for personal data backups with strong client side encryption across multiple platforms (Linux, Mac, Windows)? If so, how do you handle it?
I sync all my device files to a local FreeNAS server, which runs duplicacy in a jail and syncs everything every night to Backblaze B2. I looked at duplicity, restic, attic, and borg, and in the end settled on duplicacy. Pay attention to the duplicacy license; for some people it could be a problem.
I do this, though not from Windows, just Mac and Linux. I use restic, which has B2 support and handles all the encryption. It also does diffing for backups. There's a Windows build, so I assume it would work for you there as well.
Similar to the other two sibling comments, been using restic to sync to B2 over the past 6-7 months. Stored amount has been 450-475 GB, and total costs tend to be about $2.50-$2.75 per month.
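Those numbers line up with B2's list pricing at the time (~$0.005/GB-month storage, $0.01/GB download, first 10 GB of storage free - verify current rates before budgeting). A quick sanity check:

```python
STORAGE_PER_GB_MONTH = 0.005   # B2 list prices circa this thread
DOWNLOAD_PER_GB = 0.01
FREE_STORAGE_GB = 10

def b2_monthly_cost(stored_gb, downloaded_gb=0.0):
    """Storage cost past the free tier, plus download charges."""
    storage = max(stored_gb - FREE_STORAGE_GB, 0) * STORAGE_PER_GB_MONTH
    return storage + downloaded_gb * DOWNLOAD_PER_GB

# ~460 GB stored comes to ~$2.25/month before downloads and transaction
# fees, consistent with the $2.50-2.75 reported above.
```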
It's pretty much been this way for a few years, with only a few model lines of Seagate being the outlier. As always, thanks to the BackBlaze team for publishing these numbers.
Don't think I made a typo there, but please check my work. Even counting as 1024 TB = 1 PB and 1024 PB = 1 EB, that leaves 1,048,576 TB = 1 EB and they're over that threshold.
The February 5, 2018 "500 Petabytes and Counting" blog post should soon be eclipsed by a 1 EB post - though it appears they're counting actual data stored, not capacity. Nonetheless, with some redundancy, extra capacity, and overhead, we'll likely see that number soon.
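The arithmetic checks out; in binary units:

```python
# Binary (base-1024) storage units, as used in the parent comment.
TB = 1
PB = 1024 * TB
EB = 1024 * PB   # 1,048,576 TB per exabyte
```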
True with enough redundancy it's fine, and if they have special terms with SG such as free replacements and heavy discounts then it's little wonder they use so many.
Myself though, for SoHo use, I'm willing to pay more for less stress because I don't have the sheer volume of devices, and the time to replace is time spent doing something useful instead of shuffling HDDs and rebuilding RAID arrays. A 5% saving on a handful of drives is not worth it, but a 40% saving on thousands makes them competitive.
Every time I look at HDD price it seems Seagate is always a little more expensive. But the difference is less than 5% to the point one may argue they are all priced similarly.
I don't want to save a few dollars for potentially 4x the chance of failure and hassle.
And even if we ignore the outlier to 2%+ from two models, Seagate is still on average 2-3x more likely to fail.
Does anyone here have experiences with BackBlaze's B2 service for hosting files? I'm considering switching to it from S3 because it is much cheaper. (I need to transfer 2-3TB / month, usually in 2-3 bursts of worldwide distribution).
Yev from Backblaze here -> We're definitely more affordable and our integrations (https://www.backblaze.com/b2/integrations.html) make it easy to get your data to us. We even have partnerships with companies who can help transfer data from S3 into Backblaze B2!
How is Backblaze able to be so much cheaper than the other, larger competitors? I assume Amazon/Google/Microsoft has squeezed every last cent from suppliers and also has highly cost-optimized staffing costs.
Yev here -> great question! We are a bootstrapped company and we focus on inexpensive storage (https://www.backblaze.com/blog/vault-cloud-storage-architect...). Because we've built a robust system that doesn't use a ton of expensive components we can provide hot cloud storage (B2 Cloud Storage) and computer backup at an affordable rate while still making decent margins. To learn more about our business and decision making, we have a pretty cool series of entrepreneurship blog posts that might be interesting to some: https://www.backblaze.com/blog/category/entrepreneurship/
Reading about b2 pricing it says, you get "10GB of free storage, unlimited free uploads, and 1GB of downloads each day". Doesn't that amount to essentially free backups for (reasonable) personal use? Or am I missing something?
You aren't missing anything. I use B2 along with Restic to backup my Linux machines since their standard backup solution doesn't support Linux. It costs me around $1/month to backup my primary desktop and two laptops.
I don't. Although I can easily fill up a terabyte drive, little of that is my own personal files that I need to keep if the drive blows up. Most of my stuff is source code, documents/notes and some photos (with photos being the only thing that takes up significant space). Almost everything else I can re-download or rebuild from the original source as and when I need it.
In total, sure, but at least for myself the really important stuff would fit in 10MB and I think I could fit all of the medium importance stuff in 1GB. The remaining terabytes are nice-to-have but I wouldn't be too upset if I lost it.
Is there any consensus among Backblaze employees (or even just your personal opinion if applicable) for what brand/series of drives to use for home NAS devices?
I ask because the online favorite appears to be WD Reds, which you have phased out since 2018.
Yev here - it's interesting, we don't really chat about that often - what I would do is get the least expensive drive that has the most capacity and make sure the NAS is backed up somewhere in case of failure or theft. Personally I think the Toshiba drives are pretty good, but Seagates are affordable and do a good job. Plus there's always HGST which are rock-solid, but tend to run a bit more expensive.
We do have a datacenter in the EU (Amsterdam) - and if you set up your account there you'll be able to transfer data to it. That's a popular destination for folks living close to it, but even before that one went "live" we had lots of people using our West Coast data centers without much issue. If you have a ton of data you can take a look at the Fireball (https://www.backblaze.com/b2/solutions/datatransfer/fireball...) which allows you to rapidly ingest data to us.
What are you using as the TCP congestion controller? BBR should provide better utilization on long pipes (e.g. transoceanic transfers, if stuff isn't geo-replicated). Totally anecdotal, but it helped me FTPing data from the US to Europe.
Hey Michael, I host RAW photos I want to share in B2 (48 MB each), and then put Cloudflare in front of it using their tutorial [1]. It gets edge caching, and achieves 200-500 Mbps. It's great, and I have absolutely no complaints.
@mherrmann - Only about 10-20 GB, so not the TB levels you are dealing with, but Backblaze isn't actually doing the transfer; it is Cloudflare.
@toomuchtodo - Yes, and on top of that, both B2 and Cloudflare are completely free since I'm under the 10 GB storage limit (for now), and I'm a personal user of Cloudflare (for now).
I have made all my hard drive purchasing decisions based almost entirely on these reports for the last couple years and have not been disappointed with the results.
To quote them : Transparency breeds trust. We’re in the business of asking customers to trust us with their data. It seems reasonable to demonstrate why we’re worthy of your trust.
I’m sure Amazon, Google, Facebook, etc collect their own stats on drive failure. It would be almost negligent if they were just guessing in the dark every time they buy drives.
Main difference is probably Backblaze is small enough to publish these stats without hurting their supplier relationships. (pure speculation)
I have ~10 Mbps upload here in the US, and my backup was looking to take about a month for about 3ish TB of data. One thing that helped: with the default setting of only 1 backup thread, the Windows client was unable to saturate my upload bandwidth. Upping it to 4-6 threads allowed it to keep enough data moving to actually saturate my upload bandwidth and brought my backup down to about a week.
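The "about a month" estimate is easy to reproduce (idealized - no protocol overhead):

```python
def transfer_days(data_tb, uplink_mbps):
    """Days of pure wire time to push data_tb terabytes up an
    uplink_mbps (megabits/second) link at full utilization."""
    bits = data_tb * 1e12 * 8
    return bits / (uplink_mbps * 1e6) / 86_400

# 3 TB at a fully saturated 10 Mbps is ~27.8 days of wire time, so any
# client-side inefficiency (like a single starved upload thread)
# stretches a backup fast.
```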
Not really a related remark about this handy study, but is anybody else still in real awe about how spoiled we are with regard to the sizes and speeds of HDs nowadays? I mean, the smallest-capacity drive on their chart is 4 terabytes.
Not feeling spoiled at all, not at all. Especially not with a 2 to 3 percent failure rate. The failure rate I experienced in my workstation makes me worry about not having RAID 1 or 10, and HDs for 9 TB in RAID 10 are not that cheap.
But the bigger issue is that warranty terms for HDs nowadays are down to 2 or 3 years, so this investment is short-lived. That also tells you something about the manufacturers' own reliability estimates for their products.
Can't say I agree with that sentiment. The fact that I can quite reasonably have a 30TB usable RAID5 NAS array makes me feel pretty spoiled. Then again, I'm old enough that my first HDD was 10MB.
Mine was 10MB as well, with a dedicated controller. Quantum if I'm not mistaken. And it lasted much much longer than the averages I get from 4TB disks. I believe I managed to take files out of it in 2000, about 13 years after it was installed.
I'd be wary of making a RAID5 array with drives that big; you could easily lose another drive from the I/O caused by a rebuild; though if you have backups (you should) then it's probably an acceptable risk for non-critical data.
I'd agree with that. Even 2-disk redundancy these days is a bit dangerous when you're talking about 14TB drives and 100+TB arrays. As is often stated: RAID is not backup.
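The usual back-of-the-envelope for why big RAID5 rebuilds are scary: consumer drives often spec an unrecoverable read error (URE) rate of about 1 per 10^14 bits read, enterprise drives typically 10^15 - treat these as datasheet assumptions, not measurements. Under a crude independence model:

```python
import math

def p_rebuild_hits_ure(read_tb, ure_rate_bits=1e14):
    """Probability of at least one URE while reading read_tb terabytes
    during a rebuild, assuming independent errors at the datasheet rate."""
    bits_read = read_tb * 1e12 * 8
    return 1 - math.exp(-bits_read / ure_rate_bits)

# Rebuilding a wide array of 14 TB drives means re-reading ~100 TB: at
# 1e-14 that's a near-certain URE; even a single 14 TB re-read hits
# ~67%. Real drives beat the datasheet worst case, but the trend is why
# double parity and erasure coding took over for large arrays.
```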
Since HDDs have, for the most part, been relegated to being external drives on laptops, I'm still looking forward to SSDs becoming way cheaper and reaching current HDD prices per GB. Internal storage on laptops has reduced or stayed the same while our datasets have grown exponentially over the years (with photos and videos). Since SSDs also perform much better when there's always a good amount of free space (for wear leveling and maintenance), it's all the more painful to live with lower capacity SSDs on laptops.
I have about 10 TB of video files. I use BackBlaze for Windows but I would like the files to be available on other computers and my phone in my local network.
What can I use to do this and still keep offsite backups?
I have two ST12000VN0007 (VN) Seagate drives. The report shows the ST12000NM0007 (NM) has a 3.32% failure rate. I wonder how closely related the VN and NM models are.
Surely it doesn’t matter when you have 10,000s of drives? Aren’t you already at a large enough sample size? If it isn’t, what is the point of them publishing this every year? I don’t know the math of the matter though.
> Surely it doesn’t matter when you have 10,000s of drives? Aren’t you already at a large enough sample size? If it isn’t, what is the point of them publishing this every year? I don’t know the math of the matter though.
I think drive age matters? I'm not clear if they cycle drives out at a certain age or just run them until they fail.
Also, if a drive is low enough in cost, then the additional cost of replacing an incremental 1% may be lower than the cost of acquisition of a more reliable drive.
For sure it is not, and should not be, a high priority, but releasing such an app in 2020 does not reflect the otherwise great skills of the Backblaze team. At least show me some basic stats, account settings, and invoices. You can only download files from your buckets and that's it... really?
Does anyone have any opinions and experience using backblaze as a personal only cloud storage and offsite backup for smaller amounts of data (under 30 TB)
I use Backblaze's B2 service for both backup (via restic) and archival storage (via git-annex). I only maintain a distinction between the two in case I ever want to move to another service, and also because git-annex and restic have different strengths that make them more or less suitable for unchanging archives and often changing backups respectively. Between the two I have about 1 TB stored with them.
I have not yet needed to do a full restore, but I do partial restores from time to time to double-check my backup procedures, and every time it's done what I wanted. My monthly costs are usually a bit under $5.
Note I essentially never use B2's API directly, and only use it as a backend through wrappers others have written, so I have no real experience with how good its API is. One of the few times I did try the API, I remember at one point I think I was getting Java exceptions back in the error messages, which was mildly concerning from a hygiene perspective and made for rather terrible error messages, but no sensitive data was being emitted. I also think that's been fixed.
The bottom line is that B2 has worked fine for me and at a good price point.
Did you mean to say 30 GB or 30 TB? Calling 30 TB a "smaller amount" seems weird to me in 2020, especially for personal data. Perhaps it will be the norm in a couple of decades. :)
FWIW, I have way under 1 TB of personal data to backup to different locations, and I consider that to be relatively large.
I did mean 30 TB, I have approximately 12 TB of data currently between all of my storage for video, audio, books, and games. However, I have been avoiding doing a lot of conversions to digital media from my physical collections because I'm just unsure of running a full blown archival server at home. I would estimate if I converted my entire video library to 4k it would put me somewhere over 10 additional TB. My comic books/manga and graphic novels, upgraded to archival resolution would probably run over 10 TB as well. Then there is the soon to be required ripping of PS2/PS3/WIIU roms when those hardware units become less reliable for actual playing. So I think that 30 TB of storage would do for the time being for me, but I think I will eventually need more than that.
TL;DR I am a digital hoarder, so I've convinced myself I do in fact need 30+ TB of storage.
I have wondered about system downtime or time operating in a degraded state.
My understanding is other than mirrored, RAID configurations may take a long time to rebuild on the larger drives and this is a contributing factor to why the highest sales volume of drives has been 'stuck' at 4TB (thus the lower $/GB price).
They don't use traditional RAID setups there. My understanding is they use a proprietary data encoding and distribution, which is more accepting of individual drive failures and reduces rebuild times. I believe I've heard they use something more like erasure coding rather than RAID-5.
Other than HGST is owned by Western Digital, they also said:
There were no Western Digital branded drives in the data center in 2019, but as WDC rebrands the newer large-capacity HGST drives, we’ll adjust our numbers accordingly.
https://www.backblaze.com/blog/wp-content/uploads/2020/02/Bl...