Western Digital admits 2TB-6TB WD Red NAS drives use shingled magnetic recording (blocksandfiles.com)
719 points by guiambros on April 15, 2020 | 305 comments



> “In a typical small business/home NAS environment, workloads tend to be bursty in nature, leaving sufficient idle time for garbage collection and other maintenance operations.”

Only up until the array needs to be rebuilt (eg replacing a drive). At which point the workload is the literal opposite of "bursty with idle time". :(


Does SMR impose a penalty on reading?

My understanding is that it removes the ability to do random writes, since you'll disturb the nearby tracks. Like drawing pen-thickness rings with a Sharpie by overlapping the strokes.

Instead you read data in, modify it, then write it out, like an extreme version of updating a byte in a cache line.

Your ability to read is more precise, so you have no trouble reading data laid out in this way.

Large contiguous writes are also pretty quick. I've owned one, and if you stick to large files it's fine. Pathological workloads like creating millions of small files can take days.


I'm not sure if drives do this, but it's theoretically possible to linearize a random write workload, with the disadvantage that when you come to read the data, a sequential read becomes a random read.

Typically, the stuff you write randomly (small files, databases) also tends to be read randomly anyway, so it's usually worth linearizing the writes.

The only case where you are at a performance disadvantage is when you did random writes followed by more than one linear read.

Things like LogFS do this, as do most SSDs. I'm pretty surprised modern HDDs don't do it internally too.


> it's theoretically possible to linearize a random write workload

In fact OSes have been doing this for years now. Over a decade. There is a drawback however. To make this work you need to buffer your writes in a cache, typically in volatile memory, so they can be sorted before being written to the drive. The larger the cache the more improvement you see (although there are diminishing returns on this), but also the more data you lose in the event of power loss or kernel crash. So it is a tradeoff between speed and resilience.

Interesting thought about how this changes for SSDs. While you don't have seek times to consider anymore, if you have two writes that happen to fall on the same block in the SSD it is a big win to combine them, so SSDs still want to buffer data and have to make the speed/resilience tradeoff.
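
As a minimal sketch of the idea (the toy device and names here are made up, not any real kernel interface): buffer writes in memory, coalesce repeated writes to the same block, and flush them sorted by LBA so the head sweeps the platter roughly once instead of seeking randomly.

    # Toy write-back cache: buffer, coalesce, and sort writes before flushing.
    # 'device' is assumed to be any object with a write(lba, data) method.
    class WriteBackCache:
        def __init__(self, device, max_buffered=1024):
            self.device = device
            self.max_buffered = max_buffered
            self.pending = {}          # lba -> latest data (coalesces rewrites)

        def write(self, lba, data):
            self.pending[lba] = data   # rewriting the same LBA costs nothing extra
            if len(self.pending) >= self.max_buffered:
                self.flush()

        def flush(self):
            # One sorted pass approximates a single sweep of the disk head.
            for lba in sorted(self.pending):
                self.device.write(lba, self.pending[lba])
            self.pending.clear()       # anything still buffered here dies with the power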


You're talking about something else here. Linearized writes means write the data immediately, but rather than write the data to the 'correct' location on disk, just write the data immediately after the last bit of data you wrote.

Then keep some accounting information elsewhere to map the real location of data to the place it would have been written had linearization not been done.
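
A tiny sketch of that scheme (again with a made-up toy device): every logical write lands at the current head of the log, and a translation table records where each logical block actually lives. The trade-off mentioned in the sibling comments falls out naturally: a logically sequential read of data written at different times becomes a physically scattered read.

    # Toy linearized (log-structured) block layer: writes always append at the
    # head of the log; a mapping table translates logical -> physical on reads.
    class LogStructuredDevice:
        def __init__(self, device):
            self.device = device
            self.head = 0              # next free physical block in the log
            self.mapping = {}          # logical LBA -> physical block

        def write(self, lba, data):
            self.device.write(self.head, data)   # sequential, no seek back to 'lba'
            self.mapping[lba] = self.head        # remember where it really went
            self.head += 1                       # stale copies become garbage

        def read(self, lba):
            return self.device.read(self.mapping[lba])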


That sounds awful from a fragmentation standpoint, but I guess it would be ok for write-mostly data. That metadata map would get pretty gnarly if you have a lot of interleaved writes though.


> theoretically possible to linearize a random write workload

Are there filesystems or databases which are specially designed to optimize for this constraint? It would basically boil down to structuring your whole system as a set of interlinked mostly-append journals.


Bigtable is exactly this. An entire database is saved in a small number of files which are append-only (and therefore can be written fully linearly).

Reads of a single record usually only read a single disk block. In-memory indexes to locate the block are fairly small, even for 1B+ records.

Reads of a range of records do N range reads of blocks, where N is typically < 5, and the number of bytes read compared to the number of bytes returned to the user is typically <10% overhead.
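
A heavily simplified sketch of that shape (not Bigtable's real format, just the idea): records are written once in sorted key order, and a small in-memory index maps every Nth key to a file offset, so a point read touches roughly one block of the file.

    # Much-simplified SSTable-style layout: a sorted, append-only file plus a
    # sparse in-memory index (every Nth key -> file offset).
    import bisect

    class SimpleSSTable:
        BLOCK = 64                                  # index every 64th record

        def __init__(self, path, sorted_records):
            self.path = path
            self.index_keys, self.index_offsets = [], []
            with open(path, "w") as f:              # purely sequential writes
                for i, (key, value) in enumerate(sorted_records):
                    if i % self.BLOCK == 0:
                        self.index_keys.append(key)
                        self.index_offsets.append(f.tell())
                    f.write(f"{key}\t{value}\n")

        def get(self, key):
            # Find the block that could hold the key, then scan just that block.
            i = bisect.bisect_right(self.index_keys, key) - 1
            if i < 0:
                return None
            with open(self.path) as f:
                f.seek(self.index_offsets[i])
                for _ in range(self.BLOCK):
                    line = f.readline()
                    if not line:
                        break
                    k, _, v = line.rstrip("\n").partition("\t")
                    if k == key:
                        return v
            return None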


Yes, see this at least as one reference:

https://en.wikipedia.org/wiki/Log_structured_filesystem


Yes, they're called log-structured file systems. Been around for decades, lots of research.


SMR doesn’t impose a penalty on reading, but DM-SMR must impose a penalty, by its nature.


Rebuilding is not random I/O though; it goes sequentially, and that's what SMRs are good at.

Still, I did recently reshape from 4x6TB raid5 to 5x6TB raid6. The new drive was an SMR WD Red; it took 12 days and I was quite pissed when I found out about the SMR bit. It wasn't in the datasheet; if I had known upfront, I would never have bought it.


>> if I had known upfront, I would never have bought it.

SMR has a terrible rep for a reason, and no one with any knowledge of what SMR is would buy one, which is why they have to be unethical and remove this spec from the spec sheets

SMR is cheaper for them to manufacture, and is TERRIBLE technology that really has very little actual use case, except Archive storage. The manufacturers want to push SMR for more than that because it is more profitable for them


> SMR is cheaper for them to manufacture, and is TERRIBLE technology

It's not that terrible if the customer knows what they're getting. It's just that SMR competes with tape more so than with conventional (CMR) hard drives. Many workloads are good enough for SMR.


>>Many workloads are good enough for SMR.

One workload is, which is what you said: archiving, what one would normally do to tape.

Everything else is substandard and should not be used


But what if you need random reads?

SMR sounds like it's great (value) at providing random read access to large amounts of data if you only have sequential writes or infrequent writes.


That's true. There would be no issue as long as the fact is clearly displayed in data sheets. The problem is not really with SMR, but with hiding the fact.


Due to the "staging area", I would not expect this to be true. Contiguous data written to an SMR does not land on disk in a linear fashion.


Data is juggled while staging, but doesn't it end up in its "proper" place? Do you have any info on how data is managed?

I mean, it could pull off global block remapping, I just didn't expect it.


If it's in staging, it's not in its proper place. We know roughly how long it takes to fill staging. But I don't know how long it takes to clear.

During that time, data is not contiguous. Nor is there any guarantee that it will ever be.


Serious question: so basically wikipedia is the only use case?


A time-series historian, some SIEM logging, or similar journaling seems applicable to me. Maybe continuous-loop video recording.


Ah video makes sense. Thanks.


You don’t need to write the entire drive sequentially with SMR, it’s just that you need to write sequentially within a given “chunk” of data. So there are some workloads that can, at least in theory, work on SMR. Think of object storage or log-structured data.


With proper system design and tiered storage, it would fit a lot of archival uses if it's visible.

Things like data warehousing, where you would want to store on something else for the current time period, but could store a big blob as the data gets finalized (once all nodes have reported in, or maybe after N days if you drop some columns at that time). Or household archival where data basically doesn't get changed once it's put on the storage --- wait for a block sized write to accumulate and then do it (need a different tier for the intermediate writes... Some SMR drives have a PMR section for this too).

But, device managed SMR is not a very good solution, even if the use case would be good for SMR, because the filesystem is most likely not aware of the need to cluster writes.


wikipedia is not a good use case.


Once the software problems are fixed, then just about anything you might want to put on a NAS is suitable for an SMR drive. Big files work just fine. Put all your big media files on it. Little files are fine in limited amounts, or in occasional bursts. You could install a program to it, but you wouldn't want to put your user folder on it. Don't put cache or temp files on it, or a database. Don't put a virtual machine's disk on it, since that implies an internal filesystem that doesn't know about the layout.


With a filesystem that understands the layout, they're almost as good as normal hard drives. With a filesystem that understands the layout and $3 of flash, you can make something that solidly outperforms a normal hard drive, at a much lower cost.

Also tape drives are too expensive for normal consumers.


Host managed SMR is similar to SSD in terms of what needs to be done.


With a huge latency difference which makes them totally different in practice.


The latency difference means the two are suited for very different real-world workloads. But they're similar enough in the access patterns they prefer that the software work to make your IO fit those patterns can be shared between SMR and flash.


The latency difference means that the access patterns are extremely different in practice. Even random reads get really painful on SMR since they get structured as logs that need to be replayed.

SMR is way more like tape than an SSD.


>It's not that terrible if the customer knows what they're getting.

Between cigarettes, credit default swaps, and asbestos, I think this is a phrase we should become very wary of saying with a straight face.


I’m still confused how SMR is suitable even for archive storage. The drives slow down tremendously when doing long sequential writes. I don’t think tape even does this.


Tape is pretty constant. It has a (let's say) "optimum" throughput, and as long as you get the data to the drive controller fast enough, that's what it'll do.

If you don't, then the tape drive's write speed falls off pretty quickly as it does stuff to cope.


Dropbox has half their data on SMR drives. They did customize their storage software to work well with SMR, however.


They keep around old versions of your files for a time anyway, so the use case is almost a perfect fit.


Archival storage shouldn't use anything other than tape, in my opinion.


> SMR has a terrible rep for a reason, and no one with any knowlege of what SMR is would buy which is why they have to be unethical by removing this spec from the spec sheets

I have a couple of 8 TB Seagate SMR drives I use for backup, and for that purpose they've been great. They're cheaper than PMR drives, and I honestly don't care if the backup takes an hour extra to complete.

Unless you're in a business scenario, how much of the data on your x*8TB drives at home do you actually write to on a regular basis?

Half of mine is family photos, backups of clients, movie/music media, etc. For this purpose SMR also works well.

I'm far more worried about power consumption for NAS drives than write performance.


To be fair, your parent did say SMR is fine for your use case of backups (archive).


DM-SMRs aren't good at sequential I/O. They're bad, just not pathologically bad. SMR drives still need to stage sequential I/O to the cache area, but they at least get to write it out without a ton of write amplification.

With extra inefficiencies, yeah, SMR drives will only be a few times slower than regular drives for sustained sequential write workloads, as opposed to thousands of times slower for random writes.

Real host-managed SMR with SMR aware filesystems and storage engines might be OK, but that's not a thing in the consumer space.


With regular raid yes, not with something like ZFS.

(which is often argued as something positive but I would argue is a bad aspect of ZFS (regardless of whether SMRs are used or not))


There's been work done to make ZFS rebuilding (or resilvering as ZFS calls it) less random.

See for example "New prefetcher for sequential scrub" and "Improving resilver: results & operational impacts" from http://www.open-zfs.org/wiki/OpenZFS_Developer_Summit_2017


Yeah I just upgraded a ZFS mirror from 2 x 3TB to 2 x 8TB, using Seagate SMR drives (picked them up for $185 CAD which is super cheap).

This is for a media server so figured SMR would be fine, but the resilver took 28hours!

No checksum errors though, so now that it's done things seem to be humming along fine.


That has to hurt the aggregate write performance on the array. You hand it a big chunk of data and it goes ok, calculate the FEC and write to disks 1, 2, 3, 4. And disk 1 is done, disk 2 is done, disk 4 is done, ... , ... , ... , disk 3 is done, now to tell the OS the write is complete and to send me the next chunk.


That's a problem. I have been using and recommending WD to everyone who asks me what drive they should buy (based on backblaze data and 20 years of personal experience) for their private and everyday usage. Even though these are NAS drives it tarnishes the brand.


I have stopped caring about hard drive brands.

What Backblaze and experience show is how drives of a certain brand performed a couple of years ago, not how the drive you just bought will perform.

IBM used to be very good until the Deathstars, and HGST, who took over, is now among the most reliable. I used to be very satisfied with Seagate, then the ST3000DM001 happened, and now it looks like they are slowly getting back on track. Maybe it is WD's turn right now. I don't think there is a way to know until it is too late.

Personally, I like to make RAID mirrors with drives of different brands, or at least different models. Even if it is not ideal for performance, it helps make sure that not all drives fail at the same time.


IBM had one batch of bad drives and a second batch of somewhat iffy drives after many years with an excellent record and suddenly nobody ever trusts them again. Meanwhile Maxtor put out crap drives year after year and nobody seemed to mind. It was a very weird situation.

FWIW however I bought some HGSTs a couple of years ago that are by far the loudest drives I've ever used. They've been reliable thus far, but the noise level is so bad that I had to move them away from where the people are working.


Thinking of Maxtor gives me warm fuzzy feelings. That 20GB disk felt like it could store the world.


I had one of the first double-height 70 MB drives when it came out.

It literally could hold all of the popular software of the time, around 50 applications.

I also had one of the first CD-ROM drives at work (the Japanese director had connections at Sony.)

We were like, "How can anybody possibly ever fill 650 MB of data storage?"

We ended up mastering our own satellite data CD-ROMs to demonstrate to people what was possible.


Meanwhile NTSC is sitting over there needing around a megabyte per second to store.

Pictures and movies have always been the easy answer for how it's possible to fill up huge amounts of storage.


But user behavior data is a much bigger fraction of total storage use than video.


Source?


And nowadays it'll barely store the operating system (unless you make a conscious effort to strip things down).


I loved and owned quite a few 2.5" HGST Travelstars. They were capable of taking insane abuse and worked just fine. Interestingly enough, I never got to buy the 1.5TB ones; by the time I needed one it disappeared off the market, the website, everywhere. I never knew why.

    192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       2502492302
    193 Load_Cycle_Count        0x0012   086   086   000    Old_age   Always       -       142654
^^^ those are from a working, happy disk; all due to a "feature" that was on by default in Ubuntu that wanted to idle/spin down the disk every other second, and it took me a while to notice.

On the contrary, I wore out a 2TB 2.5" Seagate in 2 years. Bought it new, put it in the same machine the 1TB HGST had already been running in for 3 years; the Seagate died earlier - or at least that's what ZFS told me. It's spinning, but it throws weird errors from time to time.

I never had experience with 3.5" Hitachis.

Eons ago there were some interesting disks in the market though; if I remember correctly my father had some Fujitsu SCSI160 or 320 drives which had their normal working temperature around 50-60C.


Yeah, it would have been nice to know how loud the HGST would try to be before I bought one. But stuffing it in an isolating bracket like this basically fixed that: https://www.amazon.com/gp/product/B07LCD2RJY/


I guess they are the same design as the Toshiba DT01ACA series. My 10-year-old Barracuda was barely audible, drowned out by fan noise, but this new Toshiba's seeking is obnoxiously loud.


Not so sure that helps in this situation. FreeNAS users are reporting these drives failing to resilver, which _could mean_ that you end up with a failed array and data loss instead of a failed drive. I'll personally be staying away from WD for a while. I'd rather drives fail prematurely than be a ticking time bomb.


The news isn't that WD is doing this, it's that WD is also doing this. At this point both the top 2 are doing this.

I'm so sick of HDDs, I can't wait for SSD NAS to close the gap. An NVMe M.2-blade NAS would be incredible!


I don't trust SSDs for long term storage, and I'm mortified by the way they die from one second to the next.

At work, we had literally hundreds of machines fail in the same week because of sudden SSD death.

SSDs are fine for speed, but for storage... maybe one day.


Probably not. The permanent need for power to avoid data slowly disappearing from the device pretty much makes SSDs a non starter for long term storage.


Surely routine ZFS scrubs would mitigate this?


SSDs literally zero themselves out if they’re left unplugged for too long. No file system trickery will save you from that.


Are you sure? This article says that was a misunderstanding.

https://www.pcworld.com/article/2925173/debunked-your-ssd-wo...


That’s not what the article says. The article says that consumers are overreacting, but that SSDs can and do lose their data with loss of power and/or heat.


I feel like we are a long way off from that just based on price. Generously, I think we can say a 1TB SSD (SATA even, to go cheaper) is about $100. Compare that to a bare WD Red 8TB, which comes in at ~$27/TB. ~4x the cost and smaller drives is a hard sell (even with the speed). I have a pool of SSDs that I keep some stuff on, but the rest goes to spinning rust.


>and now, it looks like they are slowly getting back on track.

The smartmontools ticket linked in the article says that Seagate is doing the same thing. Best to avoid.


Indeed. According to a newer article [0] on the same web site:

> Some Seagate Barracuda Compute and Desktop disk drives use shingled magnetic recording (SMR) technology which can exhibit slow data write speeds. But Seagate documentation does not spell this out.

---

[0]: https://blocksandfiles.com/2020/04/15/seagate-2-4-and-8tb-ba...


> Personally, I like to make RAID mirrors with drives of different brands, or at least different models. Even if it is not ideal for performance, it helps make sure that not all drives fail at the same time.

This is exactly my approach lately, and it works great for RAID10 arrays (where you only need 2 or 3 different vendors - one per side of the mirror - to mitigate shared bathtub curves). I usually go with SSDs nowadays, though (half Crucial, half Kingston), both for performance reasons and because they rebuild faster (and also because the limited lifetime of flash memory makes some sort of RAID essential for longevity, though this has been less of a problem with modern flash tech); if I really needed the capacity of magnetic media I'd probably opt for a WD / Toshiba split.


What do you recommend for Crucial and Kingston SSDs?

I'm familiar with the Crucial BX500 and Kingston UV500. They are both SATA and have decent performance considering the limitations of SATA 3. If price isn't a factor or performance is important, I'd go with a Samsung 860 EVO or Pro on SATA. I've also used a smaller number of SanDisk-by-WD SATA SSDs and they work fine as well. Had a bad Samsung 840 series, but I think failure to implement proper overprovisioning was to blame. Also had a bad Intel SATA SSD, but it was an older generation product and I had a small sample size of only a few Intel SSDs, so I'm not writing Intel off; suffice it to say their products don't seem price competitive with Samsung's offerings.

As for NVMe SSDs, I have used and installed probably ten so far, but I think every single one has been a Samsung 9xx M.2, and I've been extremely satisfied with all of them, especially the 960 series EVO and Pro and more so the current-gen 970 EVO Plus and Pro.


Honestly I just go with whatever's cheapest for a given target capacity (which, judging by my Amazon and Newegg purchase history, would be the Crucial MX500 and Kingston V300). Any SSD is almost always "fast enough" for my purposes (even with SATA instead of NVMe), and Kingston and Crucial have both been consistently reliable enough that I haven't felt the need to splurge much (and if they do fail... well, that's what RAID's for, right?).

Both the desktop on which I'm typing this comment (my personal "gaming" rig) and my work laptop (a Thinkpad T470) are using Crucial MX500s (1TB, M.2); the former has a couple extra M.2 slots, so I'm considering migrating to a Crucial P1 + Kingston A2000 combo (thus biting the bullet and switching to NVMe, and getting RAID going on this machine). Most of my 2.5" purchases tend to be Kingston V300s unless the MX500 happens to be cheaper (e.g. most recently for an old XP computer I'm rebuilding for a family friend) or I'm specifically buying both for the aforementioned purpose of RAID diversification (which is a bit tricky, since Crucial and Kingston use different increments for capacity, but I'm also perfectly willing to pull a bit of a SSD "no-no" and use the leftover space on whichever drive for a swap partition, or else just let the space sit unused, or perhaps use it for temporary files where I don't care about any sort of data preservation).

In any case, no complaints with any of 'em. I haven't tried the P1 or A2000 yet (let alone Kingston's pricier KC2000), so I can't attest to those, but the MX500 and V300 have both proven to be reliable workhorses.

I'll definitely be sure to try Samsung again at some point; the first SSD I ever bought was a Samsung (at Fry's), and I had pretty awful experiences with it, but that was back when SSDs were a pretty new concept so I'm guessing things have generally improved.


> Personally, I like to make RAID mirrors with drives of different brands, or at least different models. Even if it is not ideal for performance, it helps make sure that not all drives fail at the same time.

I do that too, and effectively that means I had to get one of each brand available. The choices are really limited here.


There are really only two or three HDD makers out there, so you really can't take brand too seriously.


I think Seagate has always been low on reliability; the ST3000DM001 is an outlier but I've heard far more reports of data loss from them than any other brand. The number of threads in HDD recovery forums that refer to a particular brand makes for an interesting analysis... and even before that outlier they were below WD. From what I've seen, WD is somewhere in the middle.


My beard must be grayer than yours. In the 1990s and early 2000s, Seagate was the gold-standard for reliability, and they were priced accordingly. We literally bought nothing besides Cheetahs for our data center for more than a decade. So did everyone else.


Past performance does not indicate future return, but so far I'm seeing no reason not to distrust Toshiba hard drives at the very least given the Backblaze stats. I've bought 8 Toshiba drives with zero problems over several years on aggressively used drives but I've had several failures across drive families over the past 10 years with Western Digital drives of different batches and lines.

One of the dangers of blending drives is that there are sometimes drive geometry incompatibilities that can screw up RAID. Saw someone show they had drives that are a few cylinders off and can't replace drives now with certain replacements.


> Past performance does not indicate future return, but so far I'm seeing no reason not to distrust Toshiba hard drives at the very least given the Backblaze stats. I've bought 8 Toshiba drives with zero problems over several years on aggressively used drives but I've had several failures across drive families over the past 10 years with Western Digital drives of different batches and lines.

Are there too many negatives there? It seems that you first say that you distrust ("no reason not to distrust") Toshiba drives, but then you say that your experience with them has involved no problems.


Should probably drop the dis- , thanks. Too many conditions in a row too fast is how I get bugs in my code just like in my written sentences :P


Does WD still make new HGST products? How can you tell which drives are from which team, or have they been blended together?


Some HGST tech is Toshiba now, some of it is WD. Hard to know whether WD drive is made up to HGST snuff. If you want the best, buy helium WD drive labeled HGST. If you can't get it, then WD or Toshiba helium datacenter drives should be close.


That's what I'm wondering though. Nearly all the WD data center drives are marketed as Ultrastar. Is this marketing, or did the HGST entity supplant WD enterprise during "integration" in the last two years? https://www.westerndigital.com/products/data-center-drives#h...

It looks like the only surviving product names are Ultrastar and CinemaStar. Nothing is labeled HGST anymore. Most of the HGST Helium disks were SMR, if I'm not mistaken. As of now, given the two HelioSeal series, one of the two is SMR.

Side Note: Hitachi had such a great thing going with their Schoolhouse Rock meets Dino DNA commercial. I wish this would have continued. https://www.youtube.com/watch?v=xb_PyKuI7II


> based on backblaze data

I don't get that at all.

I've been in the WD camp since the 90s, but Backblaze's data has always ranked them around "better than the worst", and it has slowly eroded my confidence in them.

For me, Backblaze's stats have changed my perception of HGST, whom I'd held in low regard since they were IBM with the whole Deskstar issue.


The deathstars were amazing.

The 60GB ones that everybody screamed about worked perfectly if you only partitioned them out to 58GB.

Best value for money on the market at the time, so long as you remembered you had a 58GB hard drive and behaved accordingly.


Can you elaborate on that? I can't find anything else about this.


They were well known for failing.

The ones I got only failed on the outer edge.

Worked out nicely cost-effectiveness wise if you didn't trigger the failure mode.


The ridiculous thing is that nobody would have batted an eye if they did this with Green/Blue drives, especially if they just owned it upfront. I still don't particularly get the difference between Red/Green, or even Red/Blue for that matter, apart from opaque firmware tweaks about power saving, and some hand waving about chassis vibration. SMR/PMR would have been an obvious partition.

I still can't fathom how companies can be so oblivious about their actual customer market. People speccing individual hard drives are integrating their own systems and want to care about the details, not just some colorful indicator of "good/better/best".


You're talking about an industry that has no problem shipping known-unreliable drives for decades at a time.

They likely rebate / warranty their important hyperscale cloud, enterprise, & OEM customers. According to whatever custom contract they've negotiated.

And don't really care about the retail segment, given the margins. Most people probably won't notice. And those that do won't be able to do anything.


They apparently still care enough to create all those Red/Green/Black/Blue/Purple/Gold lines. Even if those are really just a few different #defines and different ink in the label printer, they're still additional SKUs. Also the margins seem plenty high given you can shuck 8TB Easystores for $130, with the retail version at ~$200.

And yeah, individuals can never really do anything in the short term. Eventually there will be a suite of best practices to test a new drive to make sure it's not SMR, and likely some tweaks to filesystems to rate-limit rebuilds based on knowing which drives are actually SMR. But they're arbitraging away brand loyalty to WD Red for a short-term gain. Given the goodwill they enjoy(ed) from that Backblaze study creating a refrain of "don't buy Seagate", this just seems foolish.


The product lines were driven by the fact that margins were low: attempting to decommodify via artificial use segmentation.

Shucking has always had weird economics. I think mostly because it's used by the manufacturers as a release valve for excess capacity, in an attempt to somewhat avoid the memory chip boom-bust cycle.


I thought the Red drives had more physical vibration dampening (rubber?) than the other ones.


No file system should be doing random writes, and when you have a cluster, there is always an open stream you can append to. I agree this is a bogus move, but tech with analogous limitations but also similar enhancements will make it into more things.

If risk and failure were spread more evenly, systems would behave more like gas and less like a ceramic.


> No file system should be doing random writes,

A typical size for a SMR zone is 256MB, and any writes that aren't appending sequentially to what's already been written to that zone count as "random writes" on a SMR drive. That's an absolutely massive block size to expect filesystems to work around, and I doubt there are any commonly used filesystems that can store their metadata in a SMR-friendly way.
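
To make the constraint concrete, here is a toy model of a zone (illustrative only, not any vendor's interface): each zone keeps a write pointer, the only legal write is an append at that pointer, and the only way to change old data is to reset and refill the whole zone.

    ZONE_SIZE = 256 * 1024 * 1024   # a typical zone size; varies by drive model

    class Zone:
        def __init__(self):
            self.write_pointer = 0      # next writable byte offset in the zone

        def write(self, offset, length):
            # Only an append at the write pointer is allowed; anything else
            # would clobber the shingled tracks laid down after that offset.
            if offset != self.write_pointer:
                raise IOError("non-sequential write within an SMR zone")
            if self.write_pointer + length > ZONE_SIZE:
                raise IOError("zone full; continue in a fresh zone")
            self.write_pointer += length

        def reset(self):
            # Rewriting anything in the zone means resetting and refilling it all.
            self.write_pointer = 0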


Metadata belongs in persistent storage (PMEM or flash); file contents can be striped onto a magnetic disk. Flash suffers from the same problem. Rewriting a file should be appending binary diffs to a log and reapplying the writes against the first version into a new location on the cluster.

SMR is extremely painful for anyone except those that can handle distributed storage systems. I think the only place where SMR makes sense for home users is in a DVR, or game assets where a virtual disk is streamed down from a cloud provider in a single pass, and no interior mutations.


Ok, so when you say "no file system should be doing random writes", you should make it clear that you're not trying to describe file systems that actually exist in any sort of production-ready state. I would generally agree with a statement that new filesystems should be designed to avoid random writes except to solid state media. But for users wanting to deploy a real filesystem now to commodity hardware, the recommendation has to be "avoid SMR", not "use a smarter FS and tiered storage hardware".


Correct. SMR should be sold to tiered storage providers, OR SMR drives should include a scratch flash or pmem device to hide the large write amplification of rewriting a block (a la back to tiered storage).

HN is fringe, I am not making statements about people with drobos.


For many copy-on-write or log-structured filesystems it's not particularly hard to adapt them to such a block size, but I don't think it has actually been done on a major scale yet.


You expect a networked file system accessed by multiple users to not have random writes?


For some value of expect, yes.


Looking at Backblaze I would recommend only Hitachi and Toshiba drives ...


Western Digital bought HGST in 2012, started folding them into their business in 2015, and phased out the HGST brand in 2018.


I worked for Hitachi during that time period and that is mostly true. Hitachi actually sold the division to WD in exchange for private ownership shares of WD with insane leverage because WD had most of their factories flooded in 2011.

WD Drives were actually HGST drives with a different label for several years.


I have stopped buying WDs and Seagates (their 3TB drives failed sequentially in my zraid within 4 months and I only just replaced them with new ones). And I am not buying even brands that were merged with them (here is a picture of how HDD brands "consolidated": https://www.mydatarecoverylab.com/wp-content/uploads/2018/07...). Those companies must be punished (by not buying their products) for all the mischief they have done to our data for the sake of maximising revenue; they need to be aware it will hurt their business.

Now I have 3x Toshiba drives and one Hitachi, and for now I am happy.


Does the revelation that they use SMR somehow change their reliability or performance?

I fail to see how it makes the drives any less useful only because it is revealed that it is SMR and not PMR.

It is models from 2TB to 6TB, meaning pretty much all NAS drives purchased in the past decade. I don't know about yours, but my drives perform just as well as they did yesterday.


>Shingled media recording (SMR) disk drives take advantage of disk write tracks being wider than read tracks to partially overlap write tracks and so enable more tracks to be written to a disk platter. This means more data can be stored on a shingled disk than an ordinary drive.

>However, SMR drives are not intended for random write IO use cases because the write performance is much slower than with a non-SMR drive. Therefore they are not recommended for NAS use cases featuring significant random write workloads.

[...]

>We brought all these points to Western Digital’s attention and a spokesperson told us:

[...]

>"In device-managed SMR HDDs, the drive does its internal data management during idle times. In a typical small business/home NAS environment, workloads tend to be bursty in nature, leaving sufficient idle time for garbage collection and other maintenance operations."

That seems like a pretty notable difference to me. You probably want to know that ahead of buying the drive.


Surely the use cases for NASs are pretty varied - they're general purpose storage.


> Therefore they are not recommended for NAS use cases featuring significant random write workloads.

Emphasis mine: a NAS featuring "significant random write"s, is not the usual NAS, and might not actually be best described as a NAS at all.

In WD's terms, they'd tell you that if you've got e.g. your /var partition mounted over SMB to your NAS, then your NAS is receiving a Desktop workload, and so you should use their Desktop drives in it, not NAS drives. (Consider: the drives backing Amazon EBS—which receive Server workloads—are certainly not "archival media" disks, despite being nominally "network-attached storage" disks.)

The use-case for WD Red (NAS) drives is online archival media storage. Like S3 without the high availability. In other words, the thing most people buy a NAS for: backups and home Plex hosting.

This is why, I assume, WD don't literally call these drives "WD NAS", but instead just call them "WD Red." Just because you put them in a NAS, doesn't mean they'll function well there. You have to be using them for an idiomatic NAS workload. (Which also means that WD Red drives fare just fine in a desktop or server, too, if you're using them there purely for an idiomatic NAS workload.)


>In WD's terms, they'd tell you that if you've got e.g. your /var partition mounted over SMB to your NAS, then your NAS is receiving a Desktop workload, and so you should use their Desktop drives in it, not NAS drives.

That's still obfuscation bordering on scam IMO, if the drives have technical limitations they should be spelled out, not hidden behind broad terms like "NAS drive", especially when they're branded the same way as older drives that didn't have these limitations.

Also, you'll have to explain to me how having these drives perform correctly while rebuilding a RAID array is not a "usual NAS" scenario. Because reading TFA it seems like getting these drives in a NAS to begin with is a challenge on its own.


Yeah, that seems pretty scummy to me. I was under the impression that "NAS" drives were objectively better than "Desktop" drives.

The clerk at the computer shop tried to upsell me on the NAS drives, chortled when I said "the 'I' in RAID stands for 'Inexpensive'", and told me that I would regret my purchase.

It seems to me that my cheapness inadvertently paid off?


That's all well and good, but, given a sufficiently large array, it's a matter of when, not if, you'll need to recover from a failed disk. Rebuilding a RAID array is precisely the expected sort of workload for a NAS — and a workload that reports indicate these drives fail at.


A rebuild is not a random write workload and is written in big chunks so they should hold up still.


Except they don't, because apparently SMR is slower to write than conventional recording, so the device buffers writes to conventional tracks, then uses idle-time to re-write the buffered data to SMR tracks.

But during a re-build there is no idle time...


How much of the NAS market is "media storage" and how much of it is "file server in a small/medium business"?


About 80:20 by units sold, I would say; leaning slightly more toward the latter by volume of drives sold, by the fact that businesses buy the larger NASes that take more drives in them, because they're actually somewhat serious about RAID recovery.

But you've got to further subdivide the use-case: a file server for a business is still not necessarily doing "random writes." You can create and replace as many small files on a filesystem as you like, and in a modern filesystem, you'll only be doing large batched writes to the disks themselves. It's only when you're writing within existing files—i.e. writing to a database of some kind—that you see a high number of random-write IOPS hitting the disk.

Mind you, there are a lot of things that turn out to have databases in them. A Chrome profile has a couple of SQLite databases in it. An iTunes library contains a database. Etc. These are the things that—if you stuck them on your NAS, and then pointed the relevant program at them from your PC—would constitute the NAS performing a "Desktop" workload. And, obviously, if you run Postgres on your NAS—or on a system that mounts a NAS disk over iSCSI—then that disk is doing a "Server" workload.

But the average business's use of e.g. shared Excel worksheets, does not "heavy random writes" make. Document/productivity software is almost always of the "on save, do a streaming overwrite of the whole file on the remote end" variety—which, again, is not a random-write workload, since the ftruncate(2) at the beginning allows the filesystem to grab a new free extent for the overwrite to write into.
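
A bare-bones sketch of that save pattern (illustrative only; real applications usually also write to a temp file and rename for crash safety):

    import os

    def save(path, blob):
        # Truncate first, then stream the whole document out sequentially; the
        # filesystem can grab a fresh extent instead of patching blocks in place.
        fd = os.open(path, os.O_WRONLY | os.O_CREAT)
        try:
            os.ftruncate(fd, 0)
            os.write(fd, blob)      # blob is bytes; one big sequential write
            os.fsync(fd)
        finally:
            os.close(fd)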

Most software that knows that it can be used in a shared-remote-disk setting is built to accommodate that shared-remote-disk setting by attempting to minimize random writes (since this kind of workload has always been slow on the kind of storage used in NASes and SANs, especially when your filesystem or software RAID does checksumming.) These days, it's only the software that absolutely can't help it—that needs random writes to be performant at all—that still does them. And it's pretty obvious, to anyone using such software, that they're not using "NAS friendly" software. Mostly because it's so dang slow!


> Does the revelation that they use SMR somehow change their reliability or performance ?

Yes, as plainly stated in the article

> pretty much all NAS drives purchased in the past decade

These are new models that have been introduced in the past year. You may not (probably don't) have any of them.


> It is models from 2TB to 6TB, meaning pretty much all NAS drives purchased in the past decade.

1. SMR didn’t really exist in the market a decade ago, definitely not for non-enterprise grade NAS drives

2. This is a fairly recent change to use DM-SMR and not explicitly market it as such

3. Your drive performance needs != the needs of everyone else buying NAS drives.


>Does the revelation that they use SMR somehow change their reliability or performance ?

Yes, they do not perform anywhere like PMR on writing.

It's so bad they simply should _never_ be used on a RAID or Zpool, where they have such pathological behaviour on resilvers that can easily become the trigger for data loss.

I would never buy one of these voluntarily, as they'd cause me headaches.

So, definitely, I need to know whether what I am buying is SMR or PMR. And wd's behaviour is unacceptable.


I wonder if this is why my unraid box has been consistently failing a parity check since I replaced some drives.


Yes, they perform worse, but there is a bigger issue. They lie. The drives report themselves as being PMR when they should be reporting themselves as SMR. They actually lie about their technology type in the firmware.


The article implies that's only recent revisions of the 2-6 TB products that use SMR, noting issues with "the latest iteration of WD REDs (WDx0EFAX replacing WDx0EFRX)"


> I don't know about yours, by my drives perform just as well as they did yesterday.

My WD Reds do too, but that's because mine are two years old. The switch to SMR apparently only happened in the most recent model (WDx0EFAX).


> Does the revelation that they use SMR somehow change their reliability or performance ?

As I clearly stated: my concern is about brand damage.


Seeing how brittle the SMR Seagate drives are (especially the rosewood family), I'm a little bit concerned.


Trust is a bitch - so hard to gain it, and very very easy to lose by doing shit like this. I was buying WD/HGST drives for work. Now, I have to test everything I buy before deploying them dammit.


We are in the same boat.


It seems pretty calculated to reduce costs while deceiving the buyer by omitting or having an incorrect spec sheet.

Would a class action suit help to prevent this kind of action in the future? Perhaps the punitive fine will make the cost calculations that led to this decision different since they will need to include the risk of judgement in the costs.


I would imagine not. The class action would settle, most consumers would get a trivial sum in the form of a check, and the litigating firm would get a large payout. The settlement would result in no one taking responsibility for anything.


I was thinking in terms of being punitive. Litigation firm gets a lot. Buyers get a $4 check, WD pays $35M or something.


So is the independent product review industry so broken that nobody spotted this in testing?


Aside from the other mentioned issues, a significant part is also the lack of retesting. WD Red drives aren't a fancy new thing and have been on the market for years now. As a result, a test of the same drives is unlikely to bring you a whole lot of revenue, since it's a known old product. People don't usually care a lot about yet another test with the same results. As a result, retesting isn't really worth your time _unless_ you find something significantly changed (like in this case). Also, running a useful test is non-trivial, as you'd need to source a significant number of platforms and comparison products and set up a testing environment for it. Testing a brand new (CPU|GPU|SSD) instead would be much more likely to attract new readers and increase revenue.

Manufacturers have slowly started realizing this circumstance and are obviously able to exploit it by making products cheaper to manufacture down the line.


On the SSD side of the storage market, I know it's becoming more common over the past ~2 years for manufacturers to change drive internals without changing the model name/number (for consumer drives only). In the past year, I've had one brand proactively bring it up as part of a product announcement presentation and two other brands have mentioned it in the course of an extended conversation about their entire product line. For the one company that mentioned the hardware update in their press release, I asked if they could get me a review sample of the new revision, but the drive that arrived a few weeks later was the old version.

Most of the updates like this in the SSD market are actually pretty harmless—switching a SATA drive from 64L to 96L TLC generally isn't going to change the performance or power characteristics enough to care about, and is more likely to be beneficial than harmful. But when they start introducing QLC NAND into a product line that originally was TLC-only, that's a problem.


> Manufacturers have slowly started realizing this circumstance and are obviously able to exploit it by making products cheaper to manufacture down the line.

I believe this is a known technique in mass-produced markets, called 'debasing' or something like that. Even new cars have more bells and whistles when first released, while as the model gains a steady consumer base the manufacturer might reduce accessories or swap in cheaper versions, etc.


There's an analogue here in the concept of shrinkflation. To my mind, debasing occurs in capital purchase items that last years or for life, to take advantage of the only opportunity to double dip on the single sale. Shrinkflation seems more applicable to commodity goods that are often consumable or single use, the hidden price increase of which causes larger impacts to ROI and profitability as well as cost to the buyer. These factors combined make shrinkflation perhaps more impactful overall to the economy, due to the frequency and necessity of such goods.

If you pay too much for too little on your car, oh well. Its probably too late to do anything about that now or you would have returned it to the seller already. For most people they settle for enough car for enough money. You can always add aftermarket parts or customize it, because there is a large secondary market waiting to make your outside investment even more worthwhile to your own judgement. If you pay too much for soap, it’s not even worth it to return it to the store in your time or your money so the market simply wins. It’s a rigged game akin to Las Vegas house games.

As buyers it’s just stressful being so wary all the time. I don’t know what the better way is but surely it must exist, we just haven’t made it profitable enough yet for the right people.

https://en.wikipedia.org/wiki/Shrinkflation


Yes, you will notice this, even if you do a simple write test with only big writes at full speed. After a while (about 2h in our tests) these drives become "tired". You need to leave them be for a few hours and then they are fine again. An added problem is that the usual candidate file systems (ext4, xfs,...) do not combine well with SMR.

Some people might be able to put these drives to good use, but only if they know up front that it's this kind of drive they are getting.


That's a good way of explaining it and makes sense. I have 3x4TB WD Reds configured as an 8TB RAID 5 mdadm array in my Linux box. It's a sort of dumping ground for backups/disk images, VM images, etc. I would see initial write speeds of 150-200MB/sec, but during large transfers moving 100s of GB the write speed would drop down to 50-60MB/sec. I thought it was a performance issue in the kernel or with mdadm and didn't bother to investigate further because I'm not a datacenter.


> After a while (about 2h in our tests) these drives become "tired". You need to leave them be for a few hours and then they are fine again.

Why does that happen?


> Why does that happen?

Because with DM-SMR drives there is usually a non-shingled area on the drive used as a buffer. Once the buffer is full, performance tanks dramatically as tracks are being rewritten. This is especially true in my experience when writing lots of smallish files vs several very large files.
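
A back-of-the-envelope model of that behaviour (every number below is made up, purely to show the shape of the curve): writes run at full speed while the staging area has room, then collapse to the rate at which the drive can destage into the shingled zones.

    # Illustrative-only model of a DM-SMR drive's CMR staging cache. Every
    # number is hypothetical; the point is the shape of the curve: full speed
    # until the staging area fills, then a steep drop.
    CACHE_GB = 20          # hypothetical CMR staging area size
    FAST_MBPS = 180        # hypothetical rate while writes fit in the cache
    DESTAGE_MBPS = 25      # hypothetical rate of rewriting shingled zones

    def hours_to_write(total_gb):
        cached = min(total_gb, CACHE_GB)
        rest = total_gb - cached
        # Once the cache is full, incoming writes can only proceed as fast
        # as data is destaged to make room (ignoring idle-time GC entirely).
        seconds = cached * 1024 / FAST_MBPS + rest * 1024 / DESTAGE_MBPS
        return seconds / 3600

    for gb in (10, 20, 100, 500):
        print(f"{gb:4d} GB -> {hours_to_write(gb):5.1f} h")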


I've observed exactly this behaviour on TB volumes on AWS instances, is it known if they use this sort of drive?


Check I/O Credits and Burst Performance at https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ebs-volu...


That probably relates more to IOPS credits, I would think.


Don't assume that if you hire a volume on AWS you get a device. The thing you get is a virtual block device, and it is actually the entry point of a sophisticated piece of software that will spread your data over multiple devices (might be erasure coded, might be replicated, probably hierarchical, so data comes in on NVMe, is then moved to SSD and eventually to HDD). So whatever you're seeing, it's not the behaviour of a single HDD.


Thanks for all the pointers!


Yes. I remember two years back when a small (really) independent french journal called CPC Hardware was able to publish a full review of the latest AMD CPU a month before everyone else in a bimonthly paper issue: they simply sourced the chip on eBay instead of signing NDAs like literally every other publication.

https://www.guru3d.com/news-story/cpc-hardware-amd-ryzen-7-2...


CPC rocks! I should renew my subscription, I let it lapse.


It is like those massive PC Magazine laser printer reviews decades back... nobody cares anymore; even the cheapest crud is good enough.

Reviews these days are instead all about GPUs and CPUs.


My guess is that a decent review of a hard drive takes a non-trivial amount of work: review unit acquisition, test setup, test execution, test repetition, analysis, writeup, publishing, ...

And that work needs to be paid for.

And nobody buys review magazines any more.

And advertising revenue on review sites can only pay for a small handful of SKUs to be reviewed. So only the sexiest, mass-marketest units get reviewed.

And the vendors fly under the review radar by flooding the market with so many variants that there isn’t enough funding to review more than a tiny fraction of them.

And by the time any headway is made into reviewing current models, new ones are released and suddenly nobody is interested in the old ones, further squeezing the window of opportunity for advertising revenue. And eliminating entirely the incentive for long-term testing.

Meanwhile, review sites realise they can get paid more in bribes/freebies/access than in advertising, and either outright compromise their reviews or only report the hits and never the misses.

And so consumers are flying blind, trying to extract signal from the noise of furious churn.


>And so consumers are flying blind, trying to extract signal from the noise of furious churn.

I think this is true for almost any product nowadays that isn't in a ridiculously small niche.

Most non-niche products on Amazon I've found get a poor score on "FakeSpot".

I believe someone even admitted on reddit that he/she was actively creating fake reviews and fake stories about certain boot brands on relevant subreddits.

Unfortunately you can't trust anything online anymore. You need to spend hours or days researching the technology or product category in order to make a judgement on the products available.


I feel like "spec sheet omitted serious detail" is a good signal to avoid a specific brand even if that specific SKU isn't available.


Yes, I've been buying WD for what, 15-20 years (I'm just a domestic user, fwiw) .. these sorts of things signal "we've decided to burn the brand to make short term gains", like when venture capitalists take over.


> Meanwhile, review sites realise they can get paid more in bribes/freebies/access than in advertising, and either outright compromise their reviews or only report the hits and never the misses.

Pretty much. Like the rest of news, accuracy is expensive, entertainment is cheaper, and anyway the audience prefer the latter.


Why would the test be difficult and not just automatic as you push a button?


Putting together a decent automated test suite is possible, but is a significant project for an individual drive reviewer. And any test suite you create is only good for a few years at most before you have to start worrying about new technologies that change performance or power characteristics in a way your tests can't capture. If you were putting together a hard drive test suite a few years ago to use for reviewing consumer/prosumer desktop and NAS drives, testing specifically for SMR behavior may not have been on your radar.

If you want to get really thorough, automation starts to be infeasible and unreliable: http://blog.stuffedcow.net/2019/09/hard-disk-geometry-microb...


The only valid disk reviews I know about are those from Backblaze and they take some time to do ;-)


But that's reliability, not performance.

For performance I think the problem is that not every single capacity is reviewed. Typically the top capacity in each range.


Well, the OP mentions that these drives are dropping out of RAID arrays during rebuilding operations. Some people might describe that as a reliability concern.


Right. Though I doubt backblaze is using RAID in the first place.


Backblaze does use RAID (erasure coding): data is striped across 20 drives, with 17 needed to complete a read. See:

https://www.backblaze.com/b2/storage-pod.html

"For Backblaze Vault Storage Pods each is one of 20 pods needed to create a Backblaze Vault. A Backblaze Vault divides up a file into 20 pieces (17 data and 3 parity) and places a piece of the file on each of the 20 Storage Pods in the Vault. We use our own implementation of Reed-Solomon to encode and distribute the files across the 20 pods, achieving 99.99999% data durability. We open-sourced our Reed-Solomon encoding implementation as well."


They do not use RAID. They use userspace erasure coding of data stored in files on plain ext4 IIRC.

Maybe they use RAID for the OS.


It's software RAID. RAID isn't a piece of hardware, it's a system design.


Yes it is a system design.

Backblaze's main reliability system is erasure coding[4] of shards of files, which is not RAID[1][2]. RAID mirrors disks or volumes on block or filesystem layers, not files in userspace.

That being said, I stand corrected in that they did use RAID at some point in time[3] (RAID6 on 13+2 drives), and may still be using some form of RAID. OTOH their reliability calculations don't seem to consider RAID - calculating with individual drive failures and shard rebuilds.

[1] https://en.wikipedia.org/wiki/RAID

[2] https://en.wikipedia.org/wiki/Non-RAID_drive_architectures

[3] https://www.backblaze.com/b2/storage-pod.html

[4] https://www.backblaze.com/blog/cloud-storage-durability/


As a lay person I don't understand why they wouldn't, could you give some more information as to why?


For these large scale infrastructures, they typically use JBOD, i.e. no RAID whatsoever and they achieve redundancy by maintaining multiple copies of the data over multiple disks spread around the datacentre. So it's not hardware RAID that requires a certain response time (I suspect the drives mentioned here drop because they timeout as it take a long time to random write on an SMR disk), more like a distributed software RAID. I think they likely also have their own file system so they are not dependent on a fixed block size, which allows them to save space if they have loads of small files.


I think that's about half right: for large storage infrastructure, you would indeed eschew local RAID, but instead of storing multiple copies, you'd use Reed Solomon encoding to stripe a single copy across many disks/servers/failure domains with a configurable number of added parity stripes. Full copies are really expensive!
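
To put rough numbers on "really expensive" (the 17+3 figures come from the Backblaze comment upthread; the per-shard loss probability below is a made-up placeholder, not a measured value):

    # Rough arithmetic comparing 3x replication with a 17-data + 3-parity
    # erasure-coded stripe. The loss probability is a placeholder for illustration.
    from math import comb

    DATA, PARITY = 17, 3
    TOTAL = DATA + PARITY

    print(f"3x replication overhead: {2:.0%}")                      # two extra full copies
    print(f"17+3 erasure coding overhead: {TOTAL / DATA - 1:.1%}")  # ~17.6%

    p = 0.02   # hypothetical chance of losing any one shard within some window
    # Chance that more than PARITY of the TOTAL shards are lost, assuming
    # independent failures and (unrealistically) no rebuilds at all.
    p_loss = sum(comb(TOTAL, k) * p**k * (1 - p)**(TOTAL - k)
                 for k in range(PARITY + 1, TOTAL + 1))
    print(f"stripe loss probability with no rebuilds: {p_loss:.2e}")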


I believe azure uses full copies [1]. I assume the other cloud providers must do the same.

[1] https://docs.microsoft.com/en-us/azure/storage/common/storag...


Google seems to use erasure coding for some (iirc. multi-AZ nearline, or so) storage classes, last I checked.



RAID only protects you from disk failures, not machine failures. If you want to protect against machine failures (whole server offline), which might be an availability concern, then you would naturally want to replicate the data at a higher level. Once you’re doing that, it makes sense to only replicate the data at a higher level, because for a given level of safety, it is more efficient to replicate the data once, in one layer, than to replicate the data multiple times, at multiple layers.

RAID is only effective for workloads small enough that you care about a single machine.


It's a similar idea but they distribute the copies between machines, not just across disks in the same machine. https://www.backblaze.com/blog/reed-solomon/ So the drives in each box are not in a RAID configuration.


Why is that happening?

Slow performance I could understand (though RAID rebuild is usually sequential, not random), but how does SMR cause these "dropouts"?


Hardware RAID typically requires a disk to respond within a certain number of ms, either reporting a successful write or a failure to write. SMR, because it requires a whole section of the drive to be re-written everytime you need to modify a single block of that section, may have very slow response times, and result in timeouts. On a timeout, the hardware RAID controller assumes the disks is offline and drops the disk from the array.

The same happened with WD Green drives, which are not graded for RAID. Their error correction logic typically allows for a lot more attempts to read the data than a datacentre drive, resulting in timeouts and the drive dropping out of the array (which is why it is a very bad idea to put a consumer drive in a hardware RAID array).

Now WD Red NAS are meant for RAID arrays but I suspect WD assumed they would be used for software RAID only, which typically doesn't have those timeouts. But if so, it should be clearly stated.


Isn't that behaviour of Greens only a problem when the drive goes bad? In which case you want it to get kicked out of the array. Or do you mean this will happen to a new Green drive with too high a probability?


The problem is that taking some time to read the data doesn't mean the drive has gone bad; at worst a sector has gone bad, and even then it might still be readable, just a bit slow. And apparently it is very common on large-capacity Green disks after a few months.

The typical bad scenario (and I have been burned by this) is that, let's say you use RAID5, and one sector goes bad. While the disk tries to read it, the RAID controller kicks the disk out of the array because of the timeout. Now you need to replace the disk and rebuild the array. In the rebuild, as you are doing a full read of all disks, you are pretty likely to find another sector that is slow to read on one of the other disks (particularly on multi-TB disks). And then the controller will kick another disk from the array. And now you lost everything.

Also why data scrubbing is pretty important in NAS.


Thanks for the explanation. That was surely a bad experience. Luckily there are ways to overcome this incompatibility, such as increasing the timeout value for a drive in the OS; see the Linux RAID wiki:

https://raid.wiki.kernel.org/index.php/Timeout_Mismatch


They have long (deep) error recoveries, and most of them don't have configurable SCT ERC. If used on Linux, where the default command (queue) timer is 30 seconds, a bad sector could result in the drive not reporting success or failure for well beyond 30 seconds, at which point the kernel does a link reset thinking the drive is unresponsive. On SATA drives, resetting the drive clears the whole queue, all commands are lost, so it's not clear what sector has the problem. This prevents whatever healing properties RAID has. e.g. on md RAID (includes mdadm and LVM managed RAIDs these days) an explicit read error by the drive comes with a sector LBA value, and md will know what stripe it's a member of (or its mirror) and will overwrite the bad sector with reconstructed data, fixing it. But if there are either write errors or a bunch of link resets, md will consider the drive faulty (kicking it).

https://raid.wiki.kernel.org/index.php/Timeout_Mismatch

It's important to know this kernel command timer and SCT ERC mismatch is (a) common and (b) affects mdadm, LVM, Btrfs and maybe ZFS RAID on Linux. I'm not really sure whether the kernel command timer applies to ZoL or if a vdev has its own policy.
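
For anyone wanting to act on this, a rough sketch of the two standard fixes from that wiki page (run as root; /dev/sda below is a placeholder, and not every consumer drive accepts the SCT ERC command):

    # Fix 1: cap the drive's error recovery at 7.0 seconds (the value is in
    # tenths of a second) so it gives up before the kernel's 30s timer fires.
    # Fix 2: if SCT ERC isn't supported, raise the kernel command timer
    # well above the drive's worst-case recovery time instead.
    import pathlib
    import subprocess

    dev = "sda"  # placeholder device name
    subprocess.run(["smartctl", "-l", "scterc,70,70", f"/dev/{dev}"], check=False)
    pathlib.Path(f"/sys/block/{dev}/device/timeout").write_text("180\n")

Both settings typically reset on reboot/power cycle, so they're usually applied from a udev rule or startup script.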


Thanks for explaining. The kernel wiki describes an easy solution to this.


That's definitely a reliability concern IMO: critical operations cannot be completed in expected time.


How many product reviews occur for these drives?

Were these new enough that anyone bothered?


I don't know of any independent review site besides Backblaze. Most review sites receive product samples from the manufacturer. The linked article shows that the disk manufacturers cannot be trusted. I bet they did not send out DM-SMR drives to review sites.


Are there even independent reviews any more?


"Which?" in the UK do pretty good consumer reviews for domestic goods IMO. Though in general they have a problem of always picking the most expensive products, usually they give enough info that you can get a substantially cheaper item with practically no loss.


There are, but it's a moving goalpost. Consumers are chasing reviewers who are not yet biased, while companies' marketing departments are busy bribing the next generation of reviewers.

At least in IT we have Stallman-esque zealots who refuse to be bribed. I don't think we have the equivalent in other markets unfortunately.


"The Synology forum poster said he called WD support to ask if the drive was an SMR or conventionally recorded drive..... the higher team contacted me back and informed me that the information I requested about whether or not the WD60EFAX was a SMR or PMR would not be provided to me. They said that information is not disclosed to consumers"


I don't know or care about shingled, thatch, whatever, but I am not going to entrust my irreplaceable data to a vendor that refuses to tell me or anyone else what they are selling.


That sounds rather deceptive


My understanding is that SMR drives are like SSDs to a degree, in that they need rather intelligent firmware to place the information on the disk.

At the moment everything is sitting behind the bogo SATA disk interface, which was never designed for this type of storage - rapid burst writes to the 20GB onboard buffer area and then very slow I/O as it is emptied out.

Surely a native OS interface would be better for these drives.


It's much worse than that. For SSDs, it's just that blocks need to be erased before being written to, which is not even an issue if there is a lot of free space.

But for SMR, the read track is narrower than the write track. For example, if the read track's width is 1 unit and the write track's width is 2 units, there is a 1-unit overlap between two adjacent tracks.

And here is how it works, assuming sequential writes:

1. Start with a blank disk and write track #1. This is the happy case: there is no data there yet, so we can simply write it.

2. Write track #2. We can still simply write it. This overwrites the lower half of track #1, but as long as reading only needs 1 unit of width, track #1 is still readable.

3. Write the following tracks one by one. All good; same story as step 2, each write overwriting half of the previous track.

4. Repeat until track #n.

5. Now, all of a sudden, we need to change something in track #1. We cannot simply overwrite track #1, because that would overwrite the top half of track #2, which is track #2's read track. So we have to read out track #2, then write track #1, then write back track #2 - which in turn requires reading out track #3 before writing track #2 and writing back track #3...

6. Repeat the previous operation until all impacted tracks have been written back.

In reality, the situation could be slightly better: if the disk is almost empty, the firmware can always find somewhere to write without reading out and writing back. But if the disk is relatively full, the case could be much worse: every write could technically trigger another read-out-and-write-back operation. So how long a write operation will take can hardly be predicted.

The worst case is that the disk is full and the write targets the whole of track #1; then almost the entire disk needs to be refreshed before the operation can complete...
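
A toy model of that cascade, just to put numbers on it (illustrative only; real drives limit the damage by grouping tracks into bands separated by guard gaps, as discussed below):

    # How many tracks must be read out and rewritten if track `t` of an
    # n-track shingled band is modified? Everything from t to the end of the band.
    def tracks_rewritten(t: int, band_tracks: int) -> int:
        return band_tracks - t + 1

    for t in (1, 50, 100):
        print(f"modify track {t:3d} of 100 -> {tracks_rewritten(t, 100)} tracks rewritten")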


Wow. That’s how I had visualized it in my mind when I read how they work, but I figured I was wrong because it seemed too dumb and I thought there’s no way it could work like that.

Thanks for explaining.

It would be amazing for someone to come up with a “worst case write” benchmark for SMR drives.


Don't they have some non-overlapping tracks every now and then so these "write avalanches" don't continue all the way?


Yes. This [0] is over my head, but section 4.7 seems to indicate about 30 MiB at a time would be the worst case. I’m not certain though.

[0] http://blog.schmorp.de/data/smr/fast15-paper-aghayev.pdf


Very interesting paper, thanks!

The band size of 17-30MiB is a lot less than the 256MiB mentioned here by others.


It's worse than that. SMR drives can't do random writes without rewriting large sections. So they are entirely unsuitable for general purpose workloads.


Sounds like the same problem SSDs have, only really slow and shitty.


> SMR drives can't do random writes without rewriting large sections

Sounds like a perfect use-case for log-based filesystems.


Weren't journalling filesystems supposed to help with bad random write performance?


File system journaling typically refers to a log that helps ensure filesystem consistency by recording attempted operations before they happen, but the system still has to do all the same random writes to the disk afterwards.

Some file systems (notably ZFS and Btrfs) use a copy-on-write approach that should in theory not require many random writes, but I don't think any of them are adapted for the large blocks you need to write to for SMR. (256MB?)

It should be possible to make a filesystem with drastically better performance on SMR drives given the right low-level access, probably based on something more like a log-structured merge-tree, like e.g. LevelDB.

One interesting example I found was Dropbox which seems to store data on SMR in big immutable chunks using raw Zoned Block Command access. https://dropbox.tech/infrastructure/smr-what-we-learned-in-o...
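
A minimal sketch of that log-structured idea (hypothetical names, not Dropbox's code or a real ZBC API): every write is appended at the open zone's write pointer, and a map remembers where each logical block actually landed, so the drive only ever sees sequential writes.

    class ZonedStore:
        """Toy host-managed placement: append-only zones plus a block map."""

        def __init__(self, zone_count: int, zone_blocks: int):
            self.zone_blocks = zone_blocks
            self.zones = [[] for _ in range(zone_count)]  # append-only zones
            self.block_map = {}   # logical block -> (zone, offset)
            self.open_zone = 0

        def write(self, lba: int, data: bytes) -> None:
            if len(self.zones[self.open_zone]) == self.zone_blocks:
                self.open_zone += 1  # a real system would GC/compact old zones here
            zone = self.zones[self.open_zone]
            self.block_map[lba] = (self.open_zone, len(zone))  # old copy becomes garbage
            zone.append(data)  # strictly sequential append, SMR-friendly

        def read(self, lba: int) -> bytes:
            z, off = self.block_map[lba]
            return self.zones[z][off]

The catch is exactly what the sibling comments point out: stale copies pile up, so you need background compaction, and a "full" device can still run out of writable zones.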


> It should be possible to make a filesystem with drastically better performance on SMR drives given the right low-level access, probably based on something more like a log-structured merge-tree, like e.g. LevelDB.

The filesystem would need to do periodic compaction. You would necessarily encounter situations where a write to “allocated” data would fail because the disk is out of space, which is a new, surprising error that would undoubtedly confuse some user-space programs.


You could get around that error by claiming a lower capacity.

All file systems take up a bit of space to store their own metadata etc., so users are already used to 'losing' a bit of space on formatting.


That wouldn’t fix it. As long as programs can write faster than the compaction algorithm frees space, you can run out of space.

The only solutions I see here are to block IO until the compaction algorithm frees space, or return ENOSPC. Both options violate assumptions that real-world programs make.


I think you actually mean log-structured filesystems.

They help with poor random write performance only at the cost of poor sequential read performance.

Both SMR drives and SSDs benefit greatly from sequential reads. A file that was randomly written will be read back in random order regardless of the reader's interest.


Yes, I meant log-structured filesystems.


It seems that a rotational disk that supports TRIM is SMR, but not all SMR disks support TRIM[0].

[0] https://github.com/openzfs/zfs/issues/10182


The Toshiba and Seagate NAS lines seem well organized and better alternatives these days. And I guess this is all the more reason to get one drive per manufacturer for the typical ~4 drives in a home NAS. Between Seagate/Toshiba/WD/HGST there should be enough variation to still protect against this stuff.


Edit: Seagate is doing the exact same thing [0], although not confirmed for NAS drives... yet. HGST is owned by WD (although the 3.5" side apparently went to Toshiba). So that leaves Toshiba.

It sucks to have to qualify drives (again?) in small businesses; I thought this was a solved problem. It's more offensive that they're doing it with the smaller capacities, too.

(I should note SSDs don't stand up to manufacturer's claims in error scenarios, too. But that's fairly well known I think?)

[0] https://blocksandfiles.com/2020/04/15/seagate-2-4-and-8tb-ba...


Seagate drives are notoriously poor quality. In addition to the well-published, large-sample, long-running studies by Backblaze, I ran a number of decent-sized storage clusters for many years and by far found Seagate drives the most unreliable (including Seagate rebranded as HPE etc...)

- https://www.backblaze.com/blog/hard-drive-stats-q2-2019/

- https://blocksandfiles.com/2020/02/13/seagate-12tb-drive-fai...


I've been using Seagate's Exos enterprise line for video processing and am very impressed with the performance. I get sustained 230MB/s write over terabytes, and while it's supposed to be slightly louder, I can't hear any difference with the case closed.

Compared to their high end consumer IronWolf Pro/NAS series they have better tech, no RAID drive limit, higher rated workload, and longer estimated MTBF. Looks even better now with the 14TB Ironwolf at $460 and the 14TB Exos at $340 on Amazon.


Agreed. I built a Synology with 5x Seagate drives and all failed within 18 months under a light workload. I will never buy another Seagate. I replaced them with a mix of HGST and Toshiba, which have been working for several years with no bad sectors. Now I will buy Toshiba only, until they go down the low-quality road.


Buying the same brand / model drives is not a wise idea, especially if all are bought at once. Any manufacturing defects will be amplified and could cause a RAID array failure due to losing multiple drives in proximity to each other.


You are not wrong, and I knew better, but I was building for the first time and bought them all at once. Now it's a mix of HGST, WD, Toshiba. Some are at 34,000 hours with no issues. The WD is slower when data scrubbing, but I haven't trusted WD for many years; that one was only bought because other vendors were out of that size. I don't really know who to trust in that space any more, so it's Toshiba for me on any new replacements.


They have a higher average failure rate but even backblaze still uses them. If you're defending against defects in product lines you want the most diverse set of drives possible to make the failures uncorrelated. Having to use drives with higher failure rates doesn't invalidate that benefit.


Anyone operating at the scale of Backblaze should also already have the right infrastructure to deal with a drive, machine, rack, PSU, etc. failure without causing service downtime. (And NB that part of that will include diversifying between drive brands to guard against e.g. a firmware issue which affects many drives of a certain brand in a limited time window.)

What matters in that context is something similar to $/GB x (% of drives which make it to end of useful life in their application context). Cheaper drives which are less reliable might well win on that score. They might also expect to retire drives rather sooner than an average NAS user as the power cost of running older, smaller drives becomes an issue.

In an average small NAS setup the cost of a failed drive is potentially much higher. You incur, at a minimum, a performance penalty while the array is degraded, a significant performance penalty while it's rebuilding, and lower guarantees against data loss during that period - and you can't average that out over thousands of drives.


If you've got 1,000 drives, a 99.5% or 98.5% success rate doesn't matter so much - you're going to be replacing drives anyway. If you have 2 or 4 drives, cutting the probability that you have to replace any drive at all to a third is a big deal.
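
To put illustrative numbers on that (assuming independent failures and, say, 0.5% vs 1.5% annual failure rates):

    # Chance of at least one drive failing within a year
    for n in (4, 1000):
        for afr in (0.005, 0.015):
            p_any = 1 - (1 - afr) ** n
            print(f"{n:4d} drives @ {afr:.1%} AFR -> P(any failure) = {p_any:.1%}")
    # 4 drives:    ~2.0% vs ~5.9%  - noticeably fewer trips to the NAS
    # 1000 drives: ~99.3% vs ~100% - you're swapping drives either way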


I'd say completely the opposite. If you're operating at Backblaze scale doubling your operations workload can be expensive. If you're at home doubling or tripling your chance of having to do some maintenance for an hour a year is trivial compared to the benefit of significantly reducing the chance that your whole array will fail from a common fault of all drives.


Thing about Seagate drives though is that they're usually less expensive than the competition. From my perspective, they do not have a high enough failure rate to justify spending extra on a brand like HGST. If you're doing things right, you have both redundancy and backups for your critical stuff, and in my experience Seagate is good about replacing failed drives within the warranty period.

Anecdotally, the only Seagate disks that have caused me significant headache are the Archive line that definitely uses SMR and is marketed appropriately because of it.


Oh my god. This isn't true. There was one line of Seagate drives that had a reliability issue, and Backblaze happened to have a bunch of them.

Do you realize how many different models there are, customized for different workloads? Do you realize how many different sets of firmware there are?

It’s like saying Honda has shit cars because one model year had a problem.


I've been selling and doing DR on HDDs for 20 years now, and frankly speaking, Seagate drives' reliability isn't the best.

Numerous new technologies led to severe issues: 5400.6/7200.4 platter dust issues killing heads the more they heat, the infamous LBA 0 firmware issue on desktop drives, motor bearing seizure on the 7200.10/12, lots of bad batches on the 7200.14 ST DM001/002/003 - more than WD or even Hitachi. On SAS drives, the Cheetah 15K.x are not very reliable (I replaced LOTS of ST3600057SS), though the Fujitsu and Hitachi equivalents are not great either.

Toshiba drives are not great either. The 2.5" MK series had a high return rate, more than its competitors. The MQ01ABD is better, but suffers from platter demagnetization that leads to bad sectors. The 3.5" ACA drives are a Hitachi production line that was "given" to Toshiba when WD acquired HGST; they are basically Hitachi drives with a Toshiba firmware.

Edit: and nowadays DR on Seagate drives like the Rosewood family (e.g. the ST1000LM035) is the absolute worst nightmare for any data recovery guy. Ultra-brittle head stack, easy platter damage when shocked, self-encrypted firmware and SMR. If those disks make any motor noise or head clicking, we don't even bother to open them anymore.


What would you buy for your own NAS? :-)


A mixed set of PMR drives rated for 24/7 usage. And f*cking reliable, frequent backups.

Customers here often mistake redundancy for backups. I had to save an Excel file holding 8 years of work from a failed Kingston USB key today.


Okay, thanks.


Thanks for lots of valuable information, finally someone who has some data not just a story. Could you please share more? Any source on Toshiba ACA being made using HGST tech? What do you think about reliability of helium drives? On paper they should fare much better than air drives.


For a brand: P(problem | brand A) is the question. If Brand A has one product that's garbage, then selecting Brand A is now more risky.


Of the last 4 Seagate drives I purchased[1] 2 failed in less than 1 year - one failing after a week(!) of use. This is pure anecdote, but I don't have a lot of confidence in Seagate's quality.

[1] These were bought from 3 different retailers at slightly different times, so it's not a "bad batch" either.


HGST = WD. And with seagate being kinda shitty, there seems to be only Toshiba left.


They're the same company but not (yet?) the same drives.


Disclaimer: I worked for HGST about 3 years ago (but in the Enterprise storage (flash)/Object store fields, not consumer/prosumer HDDs).

At the time, WD was assimilating HGST's drive tech where there were cost/performance/reliability benefits to the point where quite a few 'WD' products were just relabeled HGST ones, possibly with minor firmware/logic board tweaks. I seem to recall that at the time some WD Black, and maybe Gold, models at least were rebadged HGST units. The aim at the time was to fade out the HGST brand so that everything was WD and I pretty much think that has since been done.


Do we really know? I see no reason to trust them


In the past year I've had these failures: 2x WD RED 4TB (EFRX), 2x WD RED 6TB (EFRX), 1x Seagate 4TB. I have a fairly even mix that includes HGST (and a few HGST He). No HGST failures yet. I won't be buying WD RED's for a while, if ever.


To counter that, I've been running 11x 4TB WD Red (WD40EFRX) since 2013 and zero failures, with 10 drives in RAIDZ2 and 1 warm spare.

I have them in a 4U chassis in the attic, with 5 case fans blowing through; two on the rear and three across all the bays. It's a 24 bay box, but I've not yet had pressing need to fill the rest of the bays.


Lucky. Maybe I had a bad batch :(


I have a few Seagate Ironwolf 8TBs and I think (hope?) they are not SMR.


From WD comment: "Currently, Western Digital’s WD Red 2TB-6TB drives are device-managed SMR (DMSMR). WD Red 8TB-14TB drives are CMR-based."

This does not make sense - why are the smaller disks SMR while the larger ones are CMR? Perhaps they just switched the two in the comment?


Maybe they refreshed their 2-6TB line recently and not the 8-14TB and it'll come later?

Maybe the market for 8-14TB is different enough that it warrants CMR? For instance, maybe the 2-6TB is for hobbyists/small companies who might not realize the drives behave differently, while the larger and more expensive drives are more often used in large companies with dedicated IT who would realize something is off?

Given that TFA mainly mentions 2-6TB several times I doubt they got it wrong.


Maybe the testing showed that the 8-14TB would fail in rebuild, but the 2-6TB were small enough that they would pass.

Or maybe the 8-14TB are used in "serious business" where you could not fake your way around it, while the smaller ones go to consumers or smaller firms who would not notice, or where it could not cause bigger harm.


My guess is they want to further segment the markets so they can improve price discrimination, but they don't want to sell a "lite" or "value" drive in the NAS segment, so they change the base model(s) to be shittier and accept the bad PR.

That way they can come out with a more expensive PMR SKU in the NAS series. SSD manufacturers did a lot of the same, where features were stripped from consumer drives (PLP, DRAT/DZAT, etc.), segmented into a new market (notice the NAS SSDs?), and sold back to us as more expensive SKUs.


I was wondering about this too, and I think it’s for cost-cutting on the lowest-margin products in the range.

SMR is going to reduce platter size and quality requirements for a given size in GB (because the data is packed more densely). A given capacity can require fewer platters/sides, or a blemished platter can now be used where it couldn’t previously.


> Perhaps they just switched that in comment?

No, I think that's correct. It's definitely weird, but it's consistent with what users have observed.


For all the "can't trust HGST because its owned by WD" comments: I don't think this is the case, as HGST is still selling drives that are explicitly declared to be PMR is the datasheets, e.g. here's the HGST Ultrastar DC520 [0] which includes PMR as one of its selling points.

[0]: https://documents.westerndigital.com/content/dam/doc-library...


Eh I still wouldn't want to give money to a company that's segmented itself into a "lie to the customer" branch and a "don't lie to the customer" branch. The former will poison the latter eventually.


I would think that one would typically expect NAS drives in particular not to use SMR, especially since SMR is known to be problematic in RAID setups, etc. Not to mention the write performance issues.

It's so frustrating that all that was needed to avoid most of this was some transparency. At least list in the specs that they have SMR. It's essentially lying by omission.

But I'd probably go a step further and say that marketing an SMR drive as a NAS drive is user-hostile and destined to cause problems


Agreed. I will add that the general trend seems to be to assume the user is a functional idiot, and to keep the actual specs either hidden away or difficult to get to. It is hard to make informed decisions in such an environment.


The fact that WD is refusing to tell customers basic facts about their products even when called is a huge red flag. There is no legitimate reason to refuse to tell customers about whether or not you’re using SMR on a given drive.


A funny thing about device-managed SMR (DMSMR) is that such drives extensively rely on error correction codes to recover data from overwritten tracks, and lose nearly all of the SMR density gain to the extra coding.

Why did DMSMR become a thing in the first place? Windows! And non-upgradeable proprietary storage array hardware.

It seems that the business people in those storage companies put big money into SMR, but only realised after the drives shipped that an extremely big chunk of the market cannot adopt SMR. And so they simply slapped a cheap and easy firmware hack on top of the SMR drives to make them work with non-SMR-aware software.


Where can one buy SMR drives that the host can at least reliably coerce into not using their cache/staging area? It should be quite feasible to tune/adapt btrfs to exhibit write patterns that don't make the drive use its staging area, but it'd still be compatible because the drive would start to make use of that staging area when needed. Some way to overwrite the superblocks without triggering SMR problems or something like that would be nice, but that's of lower importance to me, at least.

Just supporting TRIM and exposing/documenting the SMR layout/regions would be enough, I think. I'd like to buy that.


There exist "host-managed SMR" drives. In the SCSI/SAS specifications, these use the "Zoned Block Devices" command set, rather than the SCSI Block Commands (SBC) command set.

As far as I can tell it's seemingly impossible for an individual to buy host-managed SMR (ZBC) drives. Presumably they're only sold in bulk to datacentres. Maybe that could change if some demand materialised and was communicated to a suitable retailer...


Yes, I found no reliable supply of WD Ultrastars. Just popping up on alibaba from small wholesalers from time to time in small lots.


> Where can one buy SMR drives that the host can at least reliably coerce into not using their cache/staging area?

From salesmen of "enterprise class" overpriced hardware.

https://www.westerndigital.com/products/data-center-drives/u...


What exactly do companies gain by such deceptive obfuscation and secrecy of the details about the products they sell? Especially when some testing can reveal the truth easily. It seems to be a bit of an industry trend in general, since very old HDD datasheets would specify even things like how many tracks per inch the platters have, how many heads, sectors per track, spindle motor spin-up time, etc. Somewhat recently, the WD Greens famously did not specify their rotational speed.

As I alluded to above, HDDs aren't the only things affected by this trend; SSDs, or more precisely NAND flash manufacturers, have become equally secretive about endurance and data retention. Older datasheets were freely available and specified the number of cycles each block could be guaranteed to be rewritten and how long the data would stay, but datasheets for newer flash are basically all NDA-only and even then you won't get the exact details. At least in that case, I suspect they are trying to hide the inconvenient truth of decreasing reliability; software workarounds can only go so far.


"There are three ways to make a living in this business: be first, be smarter, or cheat." -Margin Call

It feels like the first two are increasingly taken, so businesses are lazily switching to various versions of #3. I don't know where the regulators and punishment for doing that have gone.


The FTC should get involved if WD refuses to remedy. To falsely label your products can't be legal. That or class action.


I don't understand enough of the technical details in the article - can someone please explain what this means in practice?

I use WD Reds in my Synology NAS in a SHR configuration for daily backups. Should I be worried? Should I replace the disks with other models? What are the risks if I keep the current configuration?


Write performance (particularly random access writes) is degraded because of the way data is written to the disk.

Unfortunately, the nature of RAID means that you can be dealing with a lot of random writes. Further, when a drive fails and you want to replace it, you are pushing massive amounts of writes to the new disk.

A lot of this is due to the Shingling. Like shingles on your roof, the physical locations where data is written overlap. The problem is that if you need to update an already written location, you may also need to update surrounding bits. This is where the random write performance degradation comes from.


If they worked so far, they will continue to work in the future.

With my limited understanding, the new models will fail under intensive work-loads. For example adding them to existing RAID setups to replace a failed drive, where in the beginning they have to be sync-ed and they fail under load after a few hours (either by dropping in performance or not being recognized at all).

They will work reasonably well under lower loads (more common for home NAS setups).

The biggest problem, and what the article is trying to point out, is that Western Digital (and Seagate) are doing all they can to hide this info from customers. They even advertise the HDDs as being NAS/RAID friendly, when they are clearly not meant for that type of load.


Essentially they’re selling disks as “raid compatible”. This implies the disk will perform well under constant pressure: resilvering a raid disk of these sizes can easily take several days.

However, they have cut corners / optimized costs that assumes the drive is not under constant pressure, but rather mostly idle. What this implies is that they likely just have a hybrid magnetic/ssd backend, which is what the outrage is about.


> What this implies is that they likely just have a hybrid magnetic/ssd backend

No need for hybrids. DM-SMR drives usually have a non-SMR zone acting as a write buffer, similar to the SLC write buffers in non-SLC SSDs.


Well, you’re right, but technically that makes it “hybrid SMR”.


This has been discussed a bit on r/DataHoarder, you might want to go read there.



The site the article links to put up a companion article that explains the technology:

https://blocksandfiles.com/2020/04/15/shingled-drives-have-n...


It's not you, the article was written by someone that doesn't understand what they're trying to report on and is repeating facts they've been told are pertinent. Can't blame the reader for being confused.


Wow, why would they risk their brand like that. Seems silly.


Isn’t the hard drive space all owned by WD since they bought Seagate? It doesn’t matter as much to their brand because there’s no competition in this area of large platter drives.


They bought HGST, not Seagate.


Why are they using SMR on small capacity drives? I thought the driving force behind SMR was for higher capacity? I know cheaper 8TB+ drives use SMR.

Is it so they can use fewer platters to get the same capacity, thus reducing the manufacturing cost of the drive? Or is there some other technical reason to do it on low capacity drives?


Excerpt:

"However, SMR drives are not intended for random write IO use cases because the write performance is much slower than with a non-SMR drive."

Might this be a way to detect SMR usage by allegedly non-SMR drives?

Do a whole lot of random write IO -- and time and statistically sort the results?


It should be pretty easy to detect SMR. It's probably sufficient to just compare random read and random write performance and look for a big disparity. Probing for cache size and zone size would be a bit trickier.
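
Something like this crude sketch could work (destructive random-write test, so only run it against a scratch disk you can wipe; /dev/sdX is a placeholder):

    # Heuristic: sustain random 4K synchronous writes long enough to fill
    # any CMR cache zone. A conventional drive holds a steady ~0.5-1 MB/s;
    # a DM-SMR drive typically starts faster, then falls off a cliff.
    import os
    import random
    import time

    DEV, BLOCK, COUNT = "/dev/sdX", 4096, 20_000

    fd = os.open(DEV, os.O_WRONLY | os.O_SYNC)
    size = os.lseek(fd, 0, os.SEEK_END)
    buf = os.urandom(BLOCK)

    start = time.monotonic()
    for _ in range(COUNT):
        os.pwrite(fd, buf, random.randrange(size // BLOCK) * BLOCK)
    elapsed = time.monotonic() - start
    os.close(fd)

    print(f"random 4K write throughput: {COUNT * BLOCK / elapsed / 1e6:.2f} MB/s")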


Does this mean anything for the Purple line that are specifically for surveillance applications?

My assumption would be that the Purple drives become the default for NAS usage as the surveillance use-case much more closely matches the requirements of a RAID rebuild (a fairly constant write-stream).

My anecdata sample size of 3 has WD Red drives priced very closely to equivalently sized WD Purple drives, with WD Black drives a significant percentage more expensive (40% +).

Does this mean WD Black drives are the last hold-out against SMR for WD brand?


Are the 8TB-14TB drives actually fully cleared? I assume this is all about the official Red (EFAX) line - I own a WD100EMAZ drive (10TB, obtained from an Elements Desktop case) the performance of which has been a little disappointing - I suspect it might be shingled as well.


Can this be construed as lying? I'd be interested whether this might be grounds for a lawsuit.


It should be. It’s a very different technology being sold under the same brand and model, and it doesn’t function exactly the same.

I hope they have a class action suit coming.


Resilvering isn't random IO though, it's almost perfect for SMR, one continuous write from sector 0 to the end.

I just put a ST8000DM004 8TB drive into my 12 disk raidz2, on zfs 0.8, it took 8 days (so terribly slow 9MB/sec), but no errors resilvering


Resilvering on ZFS is not linear; it involves quite a lot of random IO.

A ST8000DM004 is capable of linear reads above 200MB/s, and a normal drive should be able to handle the same speeds with linear writes.

I would not recommend those drives for use with ZFS from my own experience: one of my drives disappeared after 1.5TB of writes while moving a dataset, and only went online again (with a SMART failure) after a power cycle.


Isn't that what sequential resilvering in zfs 0.8 means?


Hmm this is scary. I still need reasonably fast spinny disks (getting enough ssd space is too expensive, so two tiers).

Up to now I was assuming any random 7200 rpm drive will do. Apparently I now need to be extremely cautious?


For what it's worth, F2FS performs dramatically better on an external USB shingled disk I have, compared to ext4/btrfs. No "write pause" with it.


Has anyone seen an MD04ACA400 fail, btw? I'm wondering whether I should replace it with an MN or MG or just go elsewhere; I've seen it said that the DT line is not as good, though.


FWIW Backblaze has the ABA model in their report: https://www.backblaze.com/blog/hard-drive-stats-for-2019/


I have got some old 2TB Reds. Is there a way to check if a drive is SMR given its serial, part or model number?


The general rule I use is to look at the DRAM size, if it is:

* 256 MB (or more) - most likely it’s an SMR drive

* 64-128 MB - almost guaranteed it's a PMR drive

I base this on my observation, that I’ve never seen an SMR drive with less than 256 MB of cache.


It really depends; newer Toshiba and HGST drives in "massive" (i.e. > 8TB) sizes almost all have 256MB of cache for improved in-memory write reordering, but HGST calls out at least some of their drives as not only being PMR but also being better because they're PMR [0] (it's just the most recent one we purchased for our ZFS pool). We haven't had any issues with it (first order of business is writing random values to the disk, then reading them back to verify integrity and checking SMART values afterwards) and had sustained write speeds the entire time.

"Designed to handle workloads up to 550TB per year, the Ultrastar DC HC520 is the industry’s first 12TB drive and uses traditional perpendicular magnetic recording (PMR) technology to make it dropin ready for any enterprise-capacity application or environment."

[0]: https://documents.westerndigital.com/content/dam/doc-library... (search PMR)


If they are old, they aren't SMR. In the article it is stated that only past year's models are affected. SMR itself is also only a few years old.


That’s right, SMR has only appeared in hard drives recently, although SMR has been used in other media (like tape) for a long time.


Based on the reports cited in the article, it looks like earlier models (WDx0EFRX) were not SMR, but newer ones (WDx0EFAX) are, up to 6TB.

Their larger disks are still claimed to be CMR at the moment, though how much credibility anyone should give a WD spokesperson at this point is obviously open to debate.


Funny, EFAX must be horrible then. All of my dead WD REDs are EFRX.


Same thing. I just hope I'm going to be able to get more 2TB EFRXes.


one way to check, seen on another forum, is to see if the drive reports supporting the TRIM function via smartctl/camcontrol - if it does it's one of these 'DMSMR' style drives
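
For example, something along these lines on Linux (a sketch; it assumes smartctl prints a "TRIM Command" line when the drive advertises TRIM, as it does for SATA SSDs, and /dev/sdX is a placeholder):

    # A rotational drive that advertises TRIM is a strong hint of DM-SMR,
    # since a conventional drive has no use for the command.
    import subprocess

    info = subprocess.run(["smartctl", "-i", "/dev/sdX"],
                          capture_output=True, text=True).stdout
    rotational = "Solid State Device" not in info
    trim = "TRIM Command" in info and "Unavailable" not in info
    print("likely DM-SMR" if rotational and trim else "no TRIM advertised / not conclusive")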


Does this apply to Red Pro drives too?


What about WD Gold drives? Are they also using shingled magnetic recording?


So, who's the most dependable spinning rust company now?


Just bought 16 HGST Ultrastar ... Not sure I can believe in that.


The only two times my HDDs failed, it was with WD. The first one was after the 'shipping refurbished drives as new' scandal in the 2000s.


[flagged]


Your average investor doesn't know yet or doesn't care yet. Also, pre-market.


Oh I see the pre-market is down a little, maybe there’s hope.

I still find it hard to believe no one knew about this until this day.


If you think people already knew, why do you expect the stock to go down today? If people already knew, EMH says it was already priced in.

Your point is weak anyway, it is like claiming climate change is false because of today's weather. Stock prices move everyday for many reasons, you can't say bad news had no effect just because the price didn't go down compared to yesterday. It's possible the price would have gone up more without the bad news.


I'm saying I would expect the stock to have performed differently recently as more people found out. I checked and it seemed to behave exactly the same as other stocks.


I'm pretty sure this is a similar problem to securities rating agencies in the finance world.

Luckily that problem only led to the largest (so far!) recession / crisis humans have seen.

Ok, snark over, but we have seen things like this before - take steam boiler production in the mid-to-late 19th century. Cleaning that up took insurance companies simply refusing to insure anyone who did not manufacture boilers to their standards.

Something similar is needed now. Do you know why you cannot claim your covid-19 cancelled holiday from your insurance - because pandemics are not covered by travel insurance - do you know why? Because Re-insurance companies look at the likelihood of pandemics and simply refuse.

I think something different is needed - I think insurance is related to it. Basically we treat government as the insurer of last resort - but we have no real idea of the costs we are shouldering - or the risks we are ignoring.

So what if we had to have obligatory true-cost insurance? Buy a house? Great, is it insured against flooding? Oh, you built on a flood plain, like half of England? Fine, your premiums are now twice the cost of your mortgage.

Are you selling crappy products through your business? Great please show business insurance that will make the people you rip off whole. Don't have that insurance? good luck getting anyone to buy off you.

Ok, ok, it's still an idea in very early alpha.


> Because Re-insurance companies look at the likelihood of pandemics and simply refuse.

I believe it's not likelihood alone. It's the combination of a very low likelihood with a very high cost, which doesn't make it worth it for most.

In the end it likely wasn't the re-insurers' decision. They cover pretty much everything for the right price, pandemics included. But no end customer would've paid the extra price for pandemic coverage, so no insurers offered it. Unless it's outright illegal, I've yet to hear of an event that no insurance company would cover.


A nice example of "ensurers will ensure everything" is the Ansari X Prize (space competition). They ensured against someone actually winning the competition:

> The funding for the US $10,000,000 prize was unconventional in being "backed by an insurance policy to guarantee that the $10 million is in place on the day that the prize is won."

https://en.m.wikipedia.org/wiki/Ansari_X_Prize#Funding


That might be a useful idea. To work however, you need people who want to buy the insurance.

Lots of people seem happy enough taking the risk of slightly crappy products?

Established brands with a long-running reputation are another way to achieve similar goals to your insurance proposal.


Not anymore. Almost all the established brands have cashed out, to Private Equity or self brand-destruction.


I don't know what you mean? I can name dozens of still trusted brands. Eg Germany is full of them. (Danish) Lego is as good as ever, too.


Is this happening on the Red Pro hard drives, their enterprise versions? If not, it sounds like a fairly standard hard drive company marketing move, fine for NAS as long as you use our definition of NAS. Like their megabyte definition change. Disappointing, but not surprising.


The article opens

> Some users are experiencing problems adding the latest WD Red NAS drives to RAID arrays

No, these drives are not 'fine' for NAS use


You are missing my (likely poorly made) point: WD seems to have a different view of what non-enterprise "NAS use" entails than those users having problems.


While they might have some concept of what home NAS use is, these drives are literally marked as dead by common NAS filesystems/RAID drivers when they are first put into an array, because the filesystem thinks they are malfunctioning due to them being so slow.

>keep[s] getting kicked out of RAID arrays due to errors during resilvering

This problem is independent of what type of workload you actually use the NAS for.


See, by NAS HDD they mean "fit for the rapper Nas," so any problems these users have are their fault for not being Nas.

This is hyperbole, but "network attached storage" could basically refer to every hard drive now and not be false advertising. No matter how unacceptable you view this, it's the kind of shit hardware manufacturers often do.


I think that’s an exaggeration. They are specifically advertised as being for RAID configurations.


Not every raid configuration though. Just looked at the Red spec sheet, and it specifically says "without being tested for compatibility with your NAS system, optimum performance is not guaranteed." You'd need to find a setup authorized by their compatibility list that breaks for them to admit any fault.



