Tell HN: Server Status
273 points by kogir on Jan 16, 2014 | 120 comments
HN went down for nearly all of Monday the 6th. I suspected failing hardware.

I configured a new machine that is nearly identical to the old one, but using ZFS instead of UFS. This machine can tolerate the loss of up to two disks. I switched over to it early morning on the 16th, around 1AM PST.

Performance wasn't great. Timeouts were pretty frequent. I looked into it quickly, couldn't see anything obvious, and decided to sleep on it. I switched back to the old server, expecting to call it a night.

Then the old server went down. Again. The filesystem was corrupted. Again. So I switched back to the new server. During this switch some data was lost, but hopefully no more than an hour.

And here we are. I'm sorry that performance is poor, but we're up. I'll work to speed things up as soon as I can, and I'll provide a better write-up once things are over. I'm also really sorry for the data loss, both on the 6th and today.




By tolerating the loss of two disks, do you mean raidz2 or do you mean 3-way mirror?

Raidz2 is not fast. In fact, it is slow. Also, it is less reliable than a two-way mirror in most configurations, because recovering from a disk loss requires reading the entirety of every other disk, whereas recovering from loss in a mirror requires reading the entirety of one disk. The multiplication of the probabilities doesn't work out particularly well as you scale up in disk count (even taking into account that raidz2 tolerates a disk failure mid-recovery). And mirroring is much faster, since it can distribute seeks across multiple disks, something raidz2 cannot do. Raidz2 essentially synchronizes the spindles on all disks.

Raidz2 is more or less suitable for archival-style storage where you can't afford the space loss from mirroring. For example, I have an 11 disk raidz2 array in my home NAS, spread across two separate PCIe x8 8-port 6Gbps SAS/SATA cards, and don't usually see read or write speeds for files[1] exceeding 200MB/sec. The drives individually are capable of over 100MB/sec - in a non-raidz2 setup, I'd be potentially seeing over 1GB/sec on reads of large contiguous files.

Personally I'm going to move to multiple 4-disk raid10 vdevs. I can afford the space loss, and the performance characteristics are much better.

[1] Scrub speeds are higher, but not really relevant to FS performance.
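
For reference, the striped-mirror layout I'm describing would be built roughly like this (pool name and disk names are just placeholders, not anyone's actual setup):

    # two 2-disk mirror vdevs striped together ("raid10"-style)
    zpool create tank mirror da0 da1 mirror da2 da3
    # versus a single raidz2 vdev over the same four disks:
    # zpool create tank raidz2 da0 da1 da2 da3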


Reading 2 disks is not very much slower than 1. It's always the I/O bandwidth that is at issue, so it very much depends on the connectivity.

And it all pretty much impacts your customer. We've all suffered under the 4-hour mirror rebuild, the whole machine made inoperable by the constant disk load. The only way to alleviate that is to design in extra bandwidth (e.g. another cable and controller) that's used exclusively for recovery.

I've designed/built storage that worked that way, but it was for enterprise. The home user doesn't want to pay for extra bandwidth and then not use it most of the time. And if they DO use it, then they notice when a rebuild is triggered, and complain. It's a catch-22.


>Reading 2 disks is not very much slower than 1. It's always the I/O bandwidth that is at issue, so it very much depends on the connectivity.

This was a problem in the old shared-bus u320 days... but now that we've got a 3 or 6 gigabit serial link to each disk? the bottleneck, unless you have some super-fancy SSD shit going on, is going to be getting the bits off the disk. I don't know of any spinning disk that can consistently saturate even a 2 gigabit link.

That's the thing... random access on spinning rust is staggeringly slow compared to almost everything else your computer does... and while a rebuild is sequential access, mostly, if you are trying to use the system during the rebuild? well, simultaneous sequential accesses become random access, so yea, your system is gonna suck during the rebuild anyhow. Add to this, well, disk diagnostics suck. Quite often a single disk will perform under-spec for some time before failing, slowing down the whole raid.

But SATA solved almost all of the bus bottleneck issues when it comes to disks.


Can you elaborate on the second controller? Are you referring to two disks on two controllers, using mdadm (or similar) to create a software RAID 1 across the two?
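
i.e., something along these lines (device names purely illustrative), with each disk hanging off a different controller:

    mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda1 /dev/sdb1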


I'm not up on PC controllers these days. We were building a Fibre-Channel switch/router for Dell back then; all proprietary hardware.


Why not Raid 60 with Btrfs? It'll tolerate the loss of two disks with proactive parity protection via btrfs, and be faster and provide you with more disk space.


Do you feel comfortable with reliability of btrfs?

When I do a Google search, I want to see confirmed cases of fs lockups / data loss from fs bugs to be multiple years in the past. btrfs isn't there yet.

I like btrfs's ability to e.g. switch raid levels without offlining the array. I think it's promising. But I wouldn't trust my data to it tomorrow.


Raid 60 is not really that good. The rebuild time of a raid 6 slice of 12 disks is 36 hours (3TB SAS disks).

Raid 10 is more expensive on space, but is the best compromise of traditional raid.

LSI have some fancy magic virtual disk chunking mechanism to make rebuilding large raid 6 LUNs much quicker (4-8 hours).


For a home NAS intended to stream video, my experience is 'no, don't'. I ran btrfs on my home NAS (on mdadm RAID1), and playing video from it was awful - pausing every few minutes. I thought it was just my misconfiguration of samba, but a rebuild using xfs fixed it, and a friend having the same problems on a btrfs (no raid) home media server now make me gun-shy. I don't need the benefits btrfs has on my box, so I'll stick with xfs.

I am very much not a filesystem expert, but my experience is that for me, btrfs is no benefits and significant problems.


They are running BSD. I don't think BtrFS is an option and ZFS can do pretty much every trick BtrFS can.

The data corruption on the first machine seems like a hardware problem.


ZFS has built-in RAID and much, much friendlier tools.

The BTRFS tools are like mdadm and LILO had a child.


Because btrfs isn't even feature-stable yet.


... or supported on FreeBSD, which is what the server runs.


The trend I'm noticing is people mentioning that if only HN was moved to <insert-cloud-provider>, problems would go away.

Instead of doing that, they probably dropped a bit more than a thousand dollars on a box, and are probably saving thousands in costs per year. This is money coming out of someone's pocket.

This site is here, and it's a charity, being provided free of cost, to you. Who cares if HN is down for a few hours? Seriously? Has anyone been hurt because of this, yet?


HN is not a charity, it is a marketing platform for YC with some community aspects.

There is a very strong bias to everything YC.

The HN community has also outgrown the software HN was built on; you can see this in threads like: https://news.ycombinator.com/item?id=7051091

But even that thread is an extreme example; many front-page items that gain traction are hard to go through because of things like the lack of foldable comments. Other things that are extremely noticeable are expiring links, which pg has said he doesn't think are important enough to fix. There are many small UI issues that won't be fixed for the community.


HN is not a charity, it is a marketing platform for YC with some community aspects.

Feels more like a community with some YC marketing aspects to me.


I disagree. It feels very YC driven.


Here's a bookmarklet that turns each timestamp into a comment tree folder (collapsing that branch):

https://gist.github.com/maxerickson/8456792

Credit to http://alexander.kirk.at/js/hackernews-collapsible-threads-v... which I referenced for dealing with the structure.



Not to discount the value of HackerNews, however, there is great benefit to YCombinator being associated with it; they could let it go down and never host it again, but they would lose value.


Yes, but I don't think such shortages do them much harm.


I think it's pretty bad that something run by YCombinator, a company trying to build billion dollar companies, has outages - and where they don't have immediate redundancy.


"do them much harm."

Hard to quantify "harm".

But I will say that anything that is a habit, when broken, opens you up to the possibility of being exposed to a new habit "the addiction".[1]

Along those lines, 1 day of downtime probably isn't going to shift attention much. But an extreme, 2 weeks, would certainly break some addictions as people would fork to a new (I don't know, insert some French or Latin word here!). So who is to say where that slippery slope is? (Somewhere between 1 minute and 2 weeks.)

[1] This is the reason I have heard that Starbucks renovates its restaurants at greater expense while keeping them open at least some hours: because people's habits are mercurial.


Yup. This is why practices like fasting, or change in environment / traveling is good to help you get out of routines - to see what you've been conditioned to and what habits you've formed.


Sadly, I agree. But only because there's not a viable alternative. It's like if YouTube goes down for an hour, everyone whines but really nothing changes.


I wouldn't say that. I think they are competitive because they focus on what is important and disregard the rest as much as they can. And high availability is clearly not what is important for success in this case. With Youtube I wouldn't be so sure...


These types of threads always end up being "cloud vs dedicated" when in my opinion it should be about the architecture. It shouldn't matter if you are running dedicated vs cloud, assuming a fairly reliable provider. It should be about avoiding single points of failure and having redundancy.

eg. If there is a hardware failure, can the site still remain operational? Of course resources cost money, but you could have 2 cheaper/smaller servers load balanced for around the price of 1 pricier/bigger server.


It's hardly a charity when it creates real value for ycombinator.


I'm sure it has been asked many times before, but I'd love to hear the latest thinking... Why in 2013 is HN still running on bespoke hardware and software? If a startup came to you with this sort of legacy thinking you'd laugh them out of the room.


this sort of legacy thinking

That's the kind of facile statement that makes people riotously mock the entire startup community, like "MongoDB is webscale" but even less valid.

Cloud services are not a panacea, and there are myriad situations in which running one's own infrastructure can be a good idea. What matters is that the issues and benefits are taken into account; if one can show research demonstrating that a custom infrastructure is cheaper, or more reliable, or less prone to legal issues, for example, then there's nothing to laugh at.

And remember that PaaS in particular can cost a buttload of money - I'm certain it's contributed to the downfall of more than one otherwise promising startup.


The advantage of cloud servers is if one experiences corruption or goes down you just kill it and start a new one. If your cloudy EBS equivalent experiences corruption, you restore from snapshot and off you go again. Either way it involves less downtime than HN seems to have. The downside is it costs more (usually; it depends on how high your server management and data center overheads are). I'd like to point out that despite several high-profile downtime incidents in AWS, never was there a case that I'm aware of where you couldn't just restore from your last snapshot to another availability zone or region.
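
For example (a rough sketch with the aws CLI; all IDs are made up), recreating a volume from the last snapshot in a different AZ and attaching it to a fresh instance looks something like:

    aws ec2 create-volume --snapshot-id snap-12345678 --availability-zone us-east-1b
    aws ec2 attach-volume --volume-id vol-87654321 --instance-id i-0abc1234 --device /dev/sdf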


You realize cloud services are vulnerable to data loss as well?[1] The cloud isn't some magic machine off in a datacenter somewhere. It's a bunch of servers and SANs just like what you or I would roll out if we needed bare metal infrastructure. The only difference is the extraordinary markup that you're paying Amazon to use their servers.

[1]http://blogs.computerworld.com/18198/oops_amazon_web_service...


Yes, obviously. That's why I said you should restore from backups in that event. If you lose your backups on S3, congratulations, you had better odds of winning in your state lottery. The big difference is not just the price as you say, but the flexibility.


Before the cloud we just called those backups.


And how do you guarantee the validity of your snapshot? Has your process been writing corrupt records for the last n hours without you noticing?

Do you even have access to that snapshot if the system is down?

The cloud simply exchanges known levels of failure for unknown levels of failure. You pay someone else to think about it for you.

If there is a massive data loss at amazon, and people have been backing up to glacier, the recovery times will jump from hours to days even weeks, simply because there aren't enough drives to recover from.

The cloud may be more reliable, but when it goes down, it'll go down hard, leaving you high and dry if you don't have real backups.


The same with any other incremental backup. You have a backup retention policy and go back to the last good one. Being in the cloud doesn't change that.


Yeah, you can do private cloud as well. Docker lets you do that. It can cost a bit of money in the effort spent, but the payoff in the long run is usually worth it - at scale.


Yes, it's great if you want a lot of multi-tenant instances on one box. Why would HN benefit from that, though?


HN is about hosting a forum for intelligent people to communicate various intellectual topics, not about being a dev/ops challenge. There is no good reason that they can't run this on some services/hardware that are more stable than what they've got. People behind the YC community (VCs investing in the HN companies and many of the successful founders) are millionaires+ so the "price" argument really doesn't make sense. YC should be setting the example of how to do it right.


it's definitely not more reliable in this case...


Bespoke hardware is much cheaper, has much more consistent performance, and a beefy dedicated machine is much faster than a large AWS slice. If you can run your entire web app on a single server you have much less infrastructure to worry about, which is a big plus. All this vertical vs horizontal scaling nonsense makes your software so much more complex, and with a tiny bit of optimization it's often not needed.

Besides, HN also uses cloudflare as CDN and DDOS protection, so it's not like they're stuck in 1999.


If you can plan your hardware capacity, which is not that hard for a simple website like HN, it becomes far more cost effective to run on real hardware, even with a large overhead.

This is quite hard for startups to do, because their core function is to keep adding and removing features, experimenting and scaling, and the initial cost of buying hardware to support these functions is too large. This plus all the services provided by AWS et al. saves a lot of time and effort.

Nothing to do with "cloud" vs "non-cloud".


Periodic HN downtime helps software businesses be more productive.

Repeated disk corruption may indicate a hardware problem, unless the file system isn't up to snuff. Generally those file systems are pretty reliable.


FWIW, our start-up began on AWS during the early days, but finally, due to cost constraints, migrated most of our servers to our own h/w setup. We had a substantial number of servers of varying types on AWS (live servers, full hadoop clusters, etc.), in east and west regions, lots of EBS storage, etc., so not some trivial setup.

We really liked AWS, and the support we got for most of the times was good, but as we had 24x7 live traffic, even with reserved instances, the cost finally caught up to justify the move to our own hosting solution.

One thing that made the migration easier for us was that from day 1, we decided to treat AWS as a co-location, hence we set things up with the usual open-source s/w stack (i.e. avoided proprietary Amazon solutions like dynamo, messaging solutions, etc.; maybe just used S3 for offline archiving of logs. It was tempting to build on their components, but we ended up building it ourselves.), including our own hadoop setup. When the time came, we could easily migrate things out.

Hope this gives some perspective. Still a fan of AWS, and if I were do another start-up, would follow the same script all over again.


Maybe because it's fun? What's the point of getting rich if not to do what you really want?

Also, I suspect that if a startup came to them with a loyal following like HN enjoys, they wouldn't be laughed out of the room at all...


If HN was on AWS, where would we go to discuss AWS outages?


There are downsides to AWS other than reliability (a la Netflix). One of those is privacy. If you host with AWS, Amazon has access to all of your user data, and with Amazon, likely the NSA. If you want to maintain a free and open forum for people to express support for people like Snowden and Assange and question the NSA or the military/industrial complex's share of the federal budget (>50%), then you have to run your own servers in a skilled, ethical, small to mid-sized ISP's facility.

This doesn't explain the penny-wise (IMO) allocation of hardware or dev/sysadmin resources of course.


Just don't put it in AWS US East, and you can talk about AWS outages just fine.


if a startup came to YC with on the order of 10k uniques within minutes of posting a top link, and the kind of user accounts and engagement that HN has, they wouldn't care if it was written in Perl 2.0 running on a repurposed old 747's partial fly-by-wire computer. Really, Nobody Cares.

Dropbox was an rsync executed every two minutes.[1] AngelList was done via email.

[1] can't find this reference atm, could be apocryphal


HN isn't a mission-critical business. It's a fun, experimental distraction (albeit one that we all rate very highly, of course).


> It's a fun, experimental distraction

It seems to incur maintenance costs and reputation hits that are avoidable though, isn't that reason enough to invest some time in building a reliable solution?


No-one's going to say "Yeah I was going to apply to YC but HN keeps going down so I don't think I will after all".


Speak for yourself


Could you elaborate on why that's a factor at all? Given that YC's core business has nothing to do with HN, I wonder if the idiosyncrasies of HN aren't actually contributing something like intermittent reinforcement training to community members.


Why's it a factor at all? Because I wouldn't want to work with an organization that half-asses things. I don't understand the second half of your comment.


"Positive reinforcement" would be a bar which gives some food to a rat who pushes it three times. If a rat is hungry, he pushes the bar enough times to get as much food as he needs. "Intermittent reinforcement" is a bar that gives some food after a random number of presses each time. A rat will push the bar frantically until he's exhausted, even if he's not very hungry. The technique works pretty well on humans too. https://en.wikipedia.org/wiki/Reinforcement#Intermittent_rei...


And also because that is the hallmark of a true hacker - roll your own - rather than rely on a ready made solution!


Kids these days. Next thing you say you can't whistle at 2400 baud. Sheesh.


2400 Hz, if I'm not mistaken. Baud indicates how many signals you can send in a second. Hz is the wave frequency, which you have to mimic for the signaling tones.


No, 2400 baud - phase shifting is used to encode the bits, so the actual frequencies used are complex. 300 baud used two tones for Tx and two tones for Rx so it is easy to whistle (albeit the characters coming out are garbage). Higher baud rates are probably impossible to whistle because the encoding is much more complex. I never tried; acoustic couplers didn't work above 300 baud. ;-)

Ref: http://en.wikipedia.org/wiki/Modem#1200_and_2400_bps


I am pretty sure he meant 2600Hz anyway

Besides, everyone knows nobody whistles 2600Hz; they just get the toy out of the Cap'n Crunch box to do it for them.

http://en.wikipedia.org/wiki/John_Draper#Phreaking


Or rehabilitate an old payphone with an in-house PBX, and then phreak it with a modded Radio Shack tone dialer. http://www.citizenengineer.com/


Well, no. It was a joke, so it was 2600 baud, not Hz.


Yes, there was a chain of people who didn't get the joke; I was just pointing out that 2400Hz wasn't a thing either, though now I see it was a thing, maybe still is, thanks Wikipedia[]

: http://en.wikipedia.org/wiki/Blue_box


Because it is actually 2014 :)

Kidding aside, running a bare metal server you own or rent is always cheaper, assuming you cannot save money by turning things on and off and you know your capacity needs. Sure, HN grows, but not nearly as fast as FB, Twitter, etc., and the service is big enough that it would require expensive virtual servers. This is the best case scenario for a physical hardware box run by people who are familiar with such things.


hn is mostly a single process application, so dedicated hardware is really the only way they can scale performance (and adding stuff in front like a proxy, cloudflare, etc.)

They even went out of their way to get the fastest single-core Xeon possible, which is a fairly mid-range CPU, since they care about process performance vs. overall.


because it's cheap and works my dear fellow.

Yes, it may seem archaic, but a real server has real disk IO, something which doesn't come cheap on Amazon and the like.


It's 2014 already!


If a startup came to you with this sort of legacy thinking you'd laugh them out of the room.

In many circles, you would be laughed out of the room for cargo culting like this, and for trying to draw a bias against a legitimate deployment choice by declaring it "legacy".

There are many scenarios where hosting on your own hardware is a superior option for a variety of reasons: financial, security, performance, flexibility.

In the case of HN it seems like it's on some pretty meager hardware, making compromises like software RAID. If this were a critical system for YC, they would have it on redundant machines with redundant, flash-based, hardware-RAID equipped platforms, clustered with redundant 10Gb cross connects, etc. Criticizing a deployment strategy because of the peculiar issues they have faced is like writing off AWS because someone's unbacked up small instance got killed and they had no strategy for it.


There's a lot of tuning that can be done on a ZFS setup to improve performance. I'm not a pro, so others will have more feedback and knowledge, but some things off the top of my head to get you started:

Add a flash memory based (SSD) ZIL or L2ARC or both to the box. That'll help improve read/write performance. I believe the ZIL (ZFS intent log) is used to cache during writes, and the L2ARC is used during reads.
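
Adding them is a one-liner each; roughly like this, assuming a pool named tank and two spare SSDs (device names are placeholders):

    # dedicated log device (ZIL) on one SSD, L2ARC read cache on the other
    zpool add tank log ada2
    zpool add tank cache ada3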

You might want to look into disabling atime, so that the pool isn't wasting energy keeping access times on files up to date. Not sure if this is relevant with the architecture of HN or not. This can be done with

    zfs set atime=off srv/ycombinator
Finally, ZFS needs a LOT of memory to be a happy camper. Like 3-5GB of RAM per TB of storage.
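
On FreeBSD you can check how big the ARC currently is, and cap it at boot if it's starving the application (the 8G value is just an example, not a recommendation):

    # current ARC size in bytes
    sysctl kstat.zfs.misc.arcstats.size
    # cap the ARC via /boot/loader.conf (takes effect on reboot)
    echo 'vfs.zfs.arc_max="8G"' >> /boot/loader.conf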

I actually think you'll probably have a lot of fun with ZFS tuning, if that's the problem with news.yc. FreeBSD's page is pretty detailed: https://wiki.freebsd.org/ZFSTuningGuide


> I believe the ZIL (ZFS intent log) is used to cache during writes, and the L2ARC is used during reads.

I think the ZIL (zfs intent log) is an intermediary for synchronous writes only. My understanding is that it effectively turns the sync write into an async write (from the standpoint of the zpool) -- this is why it requires a faster device than the pool it is used with. If it is absent, the pool itself houses the zil.


ARC basically is a read cache.

ZIL improves performance of writes, but by itself is almost never read (only written to) except on failures. It will be read for example on power failure to finish writing the data to the disk. It is used to speed up synchronous writes.


Not really related but any update on releasing the HN code again?

[the current release is pretty old: https://github.com/wting/hackernews]


Being the sysadmin on a site frequented by sysadmins has to be frustrating at times.

Thanks for all you do!


This reminds me I'm still looking for (PKI?-)encrypted ZFS snapshots as a backup service, /wink-wink @anyone

Hoping the box has ECC ram, otherwise zfs, too, can be unreliable (http://research.cs.wisc.edu/adsl/Publications/zfs-corruption...)


Tarsnap?


could possibly work if wrapped around zfs send, something plug-n-play would be nice though
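
A rough sketch of the wrapping I mean (dataset, key, and host names are all invented):

    # snapshot, then ship the stream off-box encrypted
    zfs snapshot tank/news@nightly
    zfs send tank/news@nightly | gpg --encrypt -r backups@example.com | \
        ssh backup-host 'cat > news-nightly.zfs.gpg'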


Using DTrace to profile zfs:

http://dtrace.org/blogs/brendan/files/2011/02/DTrace_Chapter...

I'm sure other more experienced DTrace users can offer tips but I remember reading this book and learning a lot. And I believe all the referenced scripts were open source and available.
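
A starter one-liner in that vein (assumes the fbt provider is available, as on FreeBSD/illumos; the probe names are kernel functions, so they may differ by version):

    # count ZFS read/write entry points per function; prints totals on Ctrl-C
    dtrace -n 'fbt::zfs_read:entry,fbt::zfs_write:entry { @[probefunc] = count(); }'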


may i ask where are the machines hosted? is that on AWS? if not, why don't you move to a more reliable hosting, like AWS?


>if not, why don't you move to a more reliable hosting, like AWS?

Ah yes, because AWS has worked out so well for the reddit folk. :-)


Reddit is also a couple of orders of magnitude more active than HN. Also, I thought Reddit's AWS issues were resolved years ago.


What makes you think AWS is more reliable?


from the downvotes i gather i may not ask...


Probably due to the fact that you levelled an unqualified, unsubstantiated claim that AWS is better than any alternative


That's not what he said. There's nothing about AWS being better than alternatives to AWS.

> "a more reliable hosting, like AWS"


Also because if you had RTFA you would have already known it's not on AWS


Softlayer, which is generally much more robust (physical hw) vs. AWS.


on a nginx server on cloudflare


why don't you move to a more reliable hosting, like AWS?

As a fairly heavy ($x00,000/month) user of AWS, this is the funniest and most-misguided thing I've read in a long time. AWS is horrible. It's extraordinarily bad, with all sorts of insanely complex failure modes to account for. There are some reasons to consider using it, but reliability is most definitely NOT amongst them.


i agree. we have taken customers from $x0,000/month AWS footprints to $x,000/month managed hardware footprints. an order of magnitude.

aws is a lot of things, but cheap and reliable aren't on that list.


I've been reading this site regularly for almost 7 years. 6-Jan-2014 is the only downtime I remember, and it was really a very minor inconvenience. Sucks about the data loss though, always hard to own that when doing system administration. Thanks for the explanation.


It's not the only one I remember. It seems like it's down for several hours every couple months.


Have you thought about perhaps open sourcing the server setup scripts for HN? I'd love (and I'm sure many others here would) to help with the configuration. Perhaps a github repo for some chef recipes that people could work on given the current servers?


Thanks for the info!

Out of curiosity, do you have an idea about the source of the corruption problems?


The OP is saying "the loss of up to two disks", maybe a hard drive failure?


That's on the new server.


Assuming the disk footprint is small...

Would recommend a new SSD-based ZFS box (Samsung 840 Pros have been great even for pretty write-intensive load), with raidz3 for protection and zfs send (and/or rsync from hourly/N-minute snapshots for data protection, which should eliminate copying FS metadata corruption; not sure if zfs send will).
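
The snapshot-plus-rsync variant would look roughly like this (dataset, mountpoint, and host names invented):

    # snapshot gives a stable point-in-time copy to read from
    SNAP=$(date +%Y%m%d-%H%M)
    zfs snapshot tank/news@$SNAP
    rsync -a /tank/news/.zfs/snapshot/$SNAP/ backup-host:/backups/news/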

Happy to provide and/or host such a box or two if helpful.


Thanks for the update. No worries, it's just a news message board and no businesses are hurt when it's down. I quite enjoy seeing how these things are solved and I'm sure all will be forgiven if you post a meaty post-mortem.


ZFS instead of UFS on what, an Illumos derivative, FBSD, or actual Oracle Solaris?


FreeBSD.


I'd love to show you how HybridCluster, which automates ZFS replication and failover (FreeBSD + ZFS + jails), might be able to help. Relatedly, we've just announced free non-commercial licences which would be perfect for HN: http://www.hybridcluster.com/blog/containers-distributed-sto...


Are you bottlenecking on high iowait? or something else?

just one random bit to try... Obviously, I have no insight into your system and I'm not saying I know more than you or anything, but I've been seeing more situations lately where I had massive latency but reasonable throughput and the disks mostly looked okay wrt. smart, and I mostly just wanted to write about it:

    [lsc@mcgrigor ~]$ sudo iostat -x /dev/sda /dev/sdb /dev/sdc /dev/sdd
    Linux 2.6.18-371.3.1.el5xen (mcgrigor.prgmr.com)   01/16/2014

    avg-cpu:  %user   %nice %system %iowait  %steal   %idle
               0.00    0.00    0.05    0.02    0.00   99.93

    Device:  rrqm/s  wrqm/s    r/s   w/s   rsec/s   wsec/s avgrq-sz avgqu-sz  await  svctm  %util
    sda        0.70   75.11  35.66  1.38  4568.62   611.67   139.85     0.36   9.61   0.53   1.95
    sdb        0.46   75.10  35.62  1.39  4566.77   611.67   139.89     0.22   5.89   0.45   1.66
    sdc        0.80   75.14  35.63  1.35  4569.63   611.63   140.10     0.64  17.18   0.57   2.10
    sdd        0.46   75.09  35.62  1.40  4566.60   611.63   139.87     0.13   3.47   0.40   1.49

(this is a new server built out of older disks that appears to have the problem. It's not so bad that I get significant iowait when idle, but if you try to do anything, you are in a world of hurt.)

Check out the await value. Re-do the same command with a '1' after /dev/sdd and it will repeat every second. If sdd consistently has a much worse await, it is what is killing your RAID. Drop the drive from the raid. If performance is better, replace the drive. If performance is worse (and with raidz2, it should be worse if you dropped the drive), the drive was fine.
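
i.e. something like this (device and pool names here follow my box above; substitute whatever the real ones are):

    # repeat every second and watch whether one device's await stays much higher
    iostat -x /dev/sda /dev/sdb /dev/sdc /dev/sdd 1
    # then take the suspect drive out of the pool and compare
    zpool offline tank sdd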

(Of course you want to do the usual check with smart and the like before this)

The interesting part of this failure mode that I have seen is that /throughput/ isn't that much worse than healthy. You get reasonable speeds on your dd tests, but latency makes the whole thing unusable.


How about having the error page show the last static HN page? Most people just need links.


I used http://www.hckrnews.com/ for that :-) But I actually usually prefer reading the comments more than the links ^_^


Why on earth are you not using SSD's? The HN footprint can't be that large. The extra speed and reliability from a pair of SSD's has to far outweigh the costs.


I'd guesstimate that the READ load is served practically entirely from RAM (file cache) and the WRITE load is non-critical enough that it's done "eventually consistent" (e.g. synchronous_commit=off in PostgreSQL, or fsync=off elsewhere) - or at least that's how I'd run it. YMMV.


Failing SSDs get data corruption too.


Thanks for the writeup.


Maybe you could provide details on the current configuration and architecture and some suggestions could be made on how to improve. Just a thought.


I still like hardware RAID because it's conceptually simple and nicely isolated. Sometimes horrible things happen to it, though, too.

I didn't realize HN had enough disk storage needs to need more than one drive. I guess you could have 1+2 redundancy or something.


Don't worry about it. I visited facebook for the first time in years when hn went down. Is hn on linux using zfs or bsd?


The world would be better place if software could exist without hardware.


It already does, just take your pen and some paper and you're set.

Oh, and don't forget the aspirin, you'll need it...


I remember what software was like before hardware. It sucked!


I am curious to know the server configuration, architecture and the number of hits it is getting.

If someone does offer a new software architecture, and hosting, would people be open to move hackernews there?


Good you posted this, but it came a little late. After the first series of timeouts you could've posted an update so everybody knew what was going on. But hey, thanks for the update, this clears up a lot.


This passive-aggressive attitude is going to get you nowhere in life. Back-handed compliments are only cool in pre-teen TV shows.


Hi, I'm the CEO of http://www.clever-cloud.com/ and I'll be happy to help you on this, ping me on twitter : @waxzce


Can you prove that you are the CEO? I'm doubting between this being a bad prank from someone impersonating you, and you actually placing advertisements for your own company here.


I know he is, he has posted on HN several times before.



