Tell HN: Server Status
273 points by kogir on Jan 16, 2014 | 120 comments
HN went down for nearly all of Monday the 6th. I suspected failing hardware.

I configured a new machine that is nearly identical to the old one, but using ZFS instead of UFS. This machine can tolerate the loss of up to two disks. I switched over to it early morning on the 16th, around 1AM PST.

Performance wasn't great. Timeouts were pretty frequent. I looked into it quickly, couldn't see anything obvious, and decided to sleep on it. I switched back to the old server, expecting to call it a night.

Then the old server went down. Again. The filesystem was corrupted. Again. So I switched back to the new server. During this switch some data was lost, but hopefully no more than an hour.

And here we are. I'm sorry that performance is poor, but we're up. I'll work to speed things up as soon as I can, and I'll provide a better write-up once things are over. I'm also really sorry for the data loss, both on the 6th and today.




By tolerating the loss of two disks, do you mean raidz2 or do you mean 3-way mirror?

Raidz2 is not fast. In fact, it is slow. Also, it is less reliable than a two-way mirror in most configurations, because recovering from a disk loss requires reading the entirety of every other disk, whereas recovering from loss in a mirror requires reading the entirety of one disk. The multiplication of the probabilities doesn't work out particularly well as you scale up in disk count (even taking into account that raidz2 tolerates a disk failure mid-recovery). And mirroring is much faster, since it can distribute seeks across multiple disks, something raidz2 cannot do. Raidz2 essentially synchronizes the spindles on all disks.

Raidz2 is more or less suitable for archival-style storage where you can't afford the space loss from mirroring. For example, I have an 11 disk raidz2 array in my home NAS, spread across two separate PCIe x8 8-port 6Gbps SAS/SATA cards, and don't usually see read or write speeds for files[1] exceeding 200MB/sec. The drives individually are capable of over 100MB/sec - in a non-raidz2 setup, I'd be potentially seeing over 1GB/sec on reads of large contiguous files.

Personally I'm going to move to multiple 4-disk raid10 vdevs. I can afford the space loss, and the performance characteristics are much better.

[1] Scrub speeds are higher, but not really relevant to FS performance.
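
For reference, the striped-mirror layout I'm describing would be built roughly like this (pool name and disk names are just placeholders, not anyone's actual setup):

    # two 2-disk mirror vdevs striped together ("raid10"-style)
    zpool create tank mirror da0 da1 mirror da2 da3
    # versus a single raidz2 vdev over the same four disks:
    # zpool create tank raidz2 da0 da1 da2 da3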


Reading 2 disks is not very much slower than 1. It's always the I/O bandwidth that is at issue, so it very much depends on the connectivity.

And it all pretty much impacts your customer. We've all suffered under the 4-hour mirror rebuild, the whole machine made inoperable by the constant disk load. The only way to alleviate that is to design in extra bandwidth (e.g. another cable and controller) that's used exclusively for recovery.

I've designed/built storage that worked that way, but it was for enterprise. The home user doesn't want to pay for extra bandwidth and then not use it most of the time. And if they DO use it, then they notice when a rebuild is triggered, and complain. It's a catch-22.


>Reading 2 disks is not very much slower than 1. It's always the I/O bandwidth that is at issue, so it very much depends on the connectivity.

This was a problem in the old shared-bus u320 days... but now that we've got a 3 or 6 gigabit serial link to each disk? the bottleneck, unless you have some super-fancy SSD shit going on, is going to be getting the bits off the disk. I don't know of any spinning disk that can consistently saturate even a 2 gigabit link.

That's the thing... random access on spinning rust is staggeringly slow compared to almost everything else your computer does... and while a rebuild is sequential access, mostly, if you are trying to use the system during the rebuild? well, simultaneous sequential accesses become random access, so yea, your system is gonna suck during the rebuild anyhow. Add to this, well, disk diagnostics suck. Quite often a single disk will perform under-spec for some time before failing, slowing down the whole raid.

But SATA solved almost all of the bus bottleneck issues when it comes to disks.


Can you elaborate on the second controller? Are you referring to two disks on two controllers, using mdadm (or similar) to create a software RAID 1 across the two?
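
i.e., something along these lines (device names purely illustrative), with each disk hanging off a different controller:

    mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda1 /dev/sdb1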


I'm not up on PC controllers these days. We were building a Fibre-Channel switch/router for Dell back then; all proprietary hardware.


Why not Raid 60 with Btrfs? It'll tolerate the loss of two disks with proactive parity protection via btrfs, and be faster and provide you with more disk space.


Do you feel comfortable with reliability of btrfs?

When I do a Google search, I want to see confirmed cases of fs lockups / data loss from fs bugs to be multiple years in the past. btrfs isn't there yet.

I like btrfs's ability to e.g. switch raid levels without offlining the array. I think it's promising. But I wouldn't trust my data to it tomorrow.


Raid 60 is not really that good. The rebuild time of a raid 6 slice of 12 disks is 36 hours (3TB SAS disks).

Raid 10 is more expensive on space, but is the best compromise of traditional raid.

LSI have some fancy magic virtual disk chunking mechanism to make rebuilding large raid 6 LUNs much quicker (4-8 hours).


For a home NAS intended to stream video, my experience is 'no, don't'. I ran btrfs on my home NAS (on mdadm RAID1), and playing video from it was awful - pausing every few minutes. I thought it was just my misconfiguration of samba, but a rebuild using xfs fixed it, and a friend having the same problems on a btrfs (no raid) home media server now make me gun-shy. I don't need the benefits btrfs has on my box, so I'll stick with xfs.

I am very much not a filesystem expert, but my experience is that for me, btrfs is no benefits and significant problems.


They are running BSD. I don't think BtrFS is an option and ZFS can do pretty much every trick BtrFS can.

The data corruption on the first machine seems like a hardware problem.


ZFS has built-in RAID and much, much friendlier tools.

The BTRFS tools are like mdadm and LILO had a child.


Because btrfs isn't even feature-stable yet.


... or supported on FreeBSD, which is what the server runs.


The trend I'm noticing is people mentioning that if only HN was moved to <insert-cloud-provider>, problems would go away.

Instead of doing that, they probably dropped a bit more than a thousand dollars on a box, and are probably saving thousands in costs per year. This is money coming out of someone's pocket.

This site is here, and it's a charity, being provided free of cost, to you. Who cares if HN is down for a few hours? Seriously? Has anyone been hurt because of this, yet?


HN is not a charity, it is a marketing platform for YC with some community aspects.

There is a very strong bias to everything YC.

The HN community has also outgrown the software HN was built on; you can see this in threads like: https://news.ycombinator.com/item?id=7051091

But even that thread is an extreme example; many front-page items that gain traction are hard to go through because of things like the lack of foldable comments. Other things that are extremely noticeable are expiring links, which pg has said he doesn't think are important enough to fix. There are many small UI issues that won't be fixed for the community.


HN is not a charity, it is a marketing platform for YC with some community aspects.

Feels more like a community with some YC marketing aspects to me.


I disagree. It feels very YC driven.


Here's a bookmarklet that turns each timestamp into a comment tree folder (collapsing that branch):

https://gist.github.com/maxerickson/8456792

Credit to http://alexander.kirk.at/js/hackernews-collapsible-threads-v... which I referenced for dealing with the structure.



Not to discount the value of HackerNews, however, there is great benefit to YCombinator being associated with it; they could let it go down and never host it again, but they would lose value.


Yes, but I don't think such shortages do them much harm.


I think it's pretty bad that something run by YCombinator, a company trying to build billion dollar companies, has outages - and where they don't have immediate redundancy.


"do them much harm."

Hard to quantify "harm".

But I will say that anything that is a habit, when broken, opens you up to the possibility of being exposed to a new habit "the addiction".[1]

Along those lines, 1 day of downtime probably isn't going to shift attention much. But an extreme, 2 weeks, would certainly break some addictions as people would fork to a new (I don't know, insert some French or Latin word here!). So who is to say where that slippery slope is? (Somewhere between 1 minute and 2 weeks.)

[1] This is the reason I have heard that Starbucks renovates its restaurants at greater expense while keeping them open at least some hours: because people's habits are mercurial.


Yup. This is why practices like fasting, or change in environment / traveling is good to help you get out of routines - to see what you've been conditioned to and what habits you've formed.


Sadly, I agree. But only because there's not a viable alternative. It's like if YouTube goes down for an hour, everyone whines but really nothing changes.


I wouldn't say that. I think they are competitive because they focus on what is important and disregard the rest as much as they can. And high availability is clearly not what is important for success in this case. With Youtube I wouldn't be so sure...


These types of threads always end up being "cloud vs dedicated" when in my opinion it should be about the architecture. It shouldn't matter if you are running dedicated vs cloud, assuming a fairly reliable provider. It should be about avoiding single points of failure and having redundancy.

eg. If there is a hardware failure, can the site still remain operational? Of course resources cost money, but you could have 2 cheaper/smaller servers load balanced for around the price of 1 pricier/bigger server.


It's hardly a charity when it creates real value for ycombinator.


I'm sure it has been asked many times before, but I'd love to hear the latest thinking... Why in 2013 is HN still running on bespoke hardware and software? If a startup came to you with this sort of legacy thinking you'd laugh them out of the room.


this sort of legacy thinking

That's the kind of facile statement that makes people riotously mock the entire startup community, like "MongoDB is webscale" but even less valid.

Cloud services are not a panacea, and there are myriad situations in which running one's own infrastructure can be a good idea. What matters is that the issues and benefits are taken into account; if one can show research demonstrating that a custom infrastructure is cheaper, or more reliable, or less prone to legal issues, for example, then there's nothing to laugh at.

And remember that PaaS in particular can cost a buttload of money - I'm certain it's contributed to the downfall of more than one otherwise promising startup.


The advantage of cloud servers is if one experiences corruption or goes down you just kill it and start a new one. If your cloudy EBS equivalent experiences corruption, you restore from snapshot and off you go again. Either way it involves less downtime than HN seems to have. The downside is it costs more (usually; it depends on how high your server management and data center overheads are). I'd like to point out that despite several high-profile downtime incidents in AWS, never was there a case that I'm aware of where you couldn't just restore from your last snapshot to another availability zone or region.
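
For example (a rough sketch with the aws CLI; all IDs are made up), recreating a volume from the last snapshot in a different AZ and attaching it to a fresh instance looks something like:

    aws ec2 create-volume --snapshot-id snap-12345678 --availability-zone us-east-1b
    aws ec2 attach-volume --volume-id vol-87654321 --instance-id i-0abc1234 --device /dev/sdf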


You realize cloud services are vulnerable to data loss as well?[1] The cloud isn't some magic machine off in a datacenter somewhere. It's a bunch of servers and SANs just like what you or I would roll out if we needed bare metal infrastructure. The only difference is the extraordinary markup that you're paying Amazon to use their servers.

[1]http://blogs.computerworld.com/18198/oops_amazon_web_service...


Yes, obviously. That's why I said you should restore from backups in that event. If you lose your backups on S3, congratulations, you had better odds of winning in your state lottery. The big difference is not just the price as you say, but the flexibility.


Before the cloud we just called those backups.


And how do you guarantee the validity of your snapshot? Has your process been writing corrupt records for the last n hours without you noticing?

Do you even have access to that snapshot if the system is down?

The cloud simply exchanges known levels of failure for unknown levels of failure. You pay someone else to think about it for you.

If there is a massive data loss at amazon, and people have been backing up to glacier, the recovery times will jump from hours to days even weeks, simply because there aren't enough drives to recover from.

The cloud may be more reliable, but when it goes down, it'll go down hard, leaving you high and dry if you don't have real backups.


The same with any other incremental backup. You have a backup retention policy and go back to the last good one. Being in the cloud doesn't change that.


Yeah, you can do private cloud as well. Docker lets you do that. It can cost a bit of money in the effort spent, but the payoff in the long run is usually worth it - at scale.


Yes, it's great if you want a lot of multi-tenant instances on one box. Why would HN benefit from that, though?


HN is about hosting a forum for intelligent people to communicate various intellectual topics, not about being a dev/ops challenge. There is no good reason that they can't run this on some services/hardware that are more stable than what they've got. People behind the YC community (VCs investing in the HN companies and many of the successful founders) are millionaires+ so the "price" argument really doesn't make sense. YC should be setting the example of how to do it right.


it's definitely not more reliable in this case...


Bespoke hardware is much cheaper, has much more consistent performance, and a beefy dedicated machine is much faster than a large AWS slice. If you can run your entire web app on a single server you have much less infrastructure to worry about, which is a big plus. All this vertical vs horizontal scaling nonsense makes your software so much more complex, and with a tiny bit of optimization it's often not needed.

Besides, HN also uses cloudflare as CDN and DDOS protection, so it's not like they're stuck in 1999.


If you can plan your hardware capacity, which is not that hard for a simple website like HN, it becomes far more cost effective to run on real hardware, even with a large overhead.

This is quite hard for startups to do, because their core function is to keep adding and removing features, experimenting and scaling, and the initial cost of buying hardware to support these functions is too large. This plus all the services provided by AWS et al. saves a lot of time and effort.

Nothing to do with "cloud" vs "non-cloud".


Periodic HN downtime helps software businesses be more productive.

Repeated disk corruption may indicate a hardware problem, unless the file system isn't up to snuff. Generally those file systems are pretty reliable.


FWIW, our start-up began on AWS during the early days, but finally, due to cost constraints, migrated most of our servers to our own h/w setup. We had a substantial number of servers of varying types on AWS (live servers, full hadoop clusters, etc.), in east and west regions, lots of EBS storage, etc., so not some trivial setup.

We really liked AWS, and the support we got for most of the times was good, but as we had 24x7 live traffic, even with reserved instances, the cost finally caught up to justify the move to our own hosting solution.

One thing that made the migration easier for us was that from day 1, we decided to treat AWS as a co-location, hence we set things up with the usual open-source s/w stack (i.e. avoided proprietary Amazon solutions like dynamo, messaging solutions, etc.; maybe just used S3 for offline archiving of logs. It was tempting to build on their components, but we ended up building it ourselves.), including our own hadoop setup. When the time came, we could easily migrate things out.

Hope this gives some perspective. Still a fan of AWS, and if I were do another start-up, would follow the same script all over again.


Maybe because it's fun? What's the point of getting rich if not to do what you really want?

Also, I suspect that if a startup came to them with a loyal following like HN enjoys, they wouldn't be laughed out of the room at all...


If HN was on AWS, where would we go to discuss AWS outages?


There are downsides to AWS other than reliability (a la Netflix). One of those is privacy. If you host with AWS, Amazon has access to all of your user data, and with Amazon, likely the NSA. If you want to maintain a free and open forum for people to express support for people like Snowden and Assange and question the NSA or the military/industrial complex's share of the federal budget (>50%), then you have to run your own servers in a skilled, ethical, small to mid-sized ISP's facility.

This doesn't explain the penny-wise (IMO) allocation of hardware or dev/sysadmin resources of course.


Just don't put it in AWS US East, and you can talk about AWS outages just fine.


if a startup came to YC with on the order of 10k uniques within minutes of posting a top link, and the kind of user accounts and engagement that HN has, they wouldn't care if it was written in Perl 2.0 running on a repurposed old 747's partial fly-by-wire computer. Really, Nobody Cares.

Dropbox was an rsync executed every two minutes.[1] AngelList was done via email.

[1] can't find this reference atm, could be apocryphal


HN isn't a mission-critical business. It's a fun, experimental distraction (albeit one that we all rate very highly, of course).


> It's a fun, experimental distraction

It seems to incur maintenance costs and reputation hits that are avoidable though, isn't that reason enough to invest some time in building a reliable solution?


No-one's going to say "Yeah I was going to apply to YC but HN keeps going down so I don't think I will after all".


Speak for yourself


Could you elaborate on why that's a factor at all? Given that YC's core business has nothing to do with HN, I wonder if the idiosyncrasies of HN aren't actually contributing something like intermittent reinforcement training to community members.


Why's it a factor at all? Because I wouldn't want to work with an organization that half-asses things. I don't understand the second half of your comment.


"Positive reinforcement" would be a bar which gives some food to a rat who pushes it three times. If a rat is hungry, he pushes the bar enough times to get as much food as he needs. "Intermittent reinforcement" is a bar that gives some food after a random number of presses each time. A rat will push the bar frantically until he's exhausted, even if he's not very hungry. The technique works pretty well on humans too. https://en.wikipedia.org/wiki/Reinforcement#Intermittent_rei...


And also because that is the hallmark of a true hacker - roll your own - rather than rely on a ready made solution!


Kids these days. Next thing you say you can't whistle at 2400 baud. Sheesh.


2400 Hz, if I'm not mistaken. Baud indicates how many signals you can send in a second. Hz is the wave frequency, which you have to mimic for the signaling tones.


No, 2400 baud - phase shifting is used to encode the bits, so the actual frequencies used are complex. 300 baud used two tones for Tx and two tones for Rx so it is easy to whistle (albeit the characters coming out are garbage). Higher baud rates are probably impossible to whistle because the encoding is much more complex. I never tried; acoustic couplers didn't work above 300 baud. ;-)

Ref: http://en.wikipedia.org/wiki/Modem#1200_and_2400_bps


I am pretty sure he meant 2600Hz anyway

Besides, everyone knows nobody whistles 2600Hz; they just get the toy out of the Cap'n Crunch box to do it for them.

http://en.wikipedia.org/wiki/John_Draper#Phreaking


Or rehabilitate an old payphone with an in-house PBX, and then phreak it with a modded Radio Shack tone dialer. http://www.citizenengineer.com/


Well, no. It was a joke, so it was 2600 baud, not Hz.


Yes, there was a chain of people who didn't get the joke; I was just pointing out that 2400Hz wasn't a thing either, though now I see it was a thing, maybe still is, thanks Wikipedia[]

: http://en.wikipedia.org/wiki/Blue_box


Because it is actually 2014 :)

Kidding aside, running a bare metal server you own or rent is always cheaper, assuming you cannot save money by turning things on and off and you know your capacity needs. Sure, HN grows, but not nearly as fast as FB, Twitter, etc., and the service is big enough that it would require expensive virtual servers. This is the best case scenario for a physical hardware box run by people who are familiar with such things.


hn is mostly a single process application, so dedicated hardware is really the only way they can scale performance (and adding stuff in front like a proxy, cloudflare, etc.)

They even went out of their way to get the fastest single-core Xeon possible, which is a fairly mid-range CPU, since they care about process performance vs. overall.


because it's cheap and works my dear fellow.

Yes, it may seem archaic, but a real server has real disk IO, something which doesn't come cheap on Amazon and the like.


It's 2014 already!


If a startup came to you with this sort of legacy thinking you'd laugh them out of the room.

In many circles, you would be laughed out of the room for cargo culting like this, and for trying to draw a bias against a legitimate deployment choice by declaring it "legacy".

There are many scenarios where hosting on your own hardware is a superior option for a variety of reasons: financial, security, performance, flexibility.

In the case of HN it seems like it's on some pretty meager hardware, making compromises like software RAID. If this were a critical system for YC, they would have it on redundant machines with redundant, flash-based, hardware-RAID equipped platforms, clustered with redundant 10Gb cross connects, etc. Criticizing a deployment strategy because of the peculiar issues they have faced is like writing off AWS because someone's unbacked up small instance got killed and they had no strategy for it.


There's a lot of tuning that can be done on a ZFS setup to improve performance. I'm not a pro, so others will have more feedback and knowledge, but some things off the top of my head to get you started:

Add a flash memory based (SSD) ZIL or L2ARC or both to the box. That'll help improve read/write performance. I believe the ZIL (ZFS intent log) is used to cache during writes, and the L2ARC is used during reads.
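
Adding them is a one-liner each; roughly like this, assuming a pool named tank and two spare SSDs (device names are placeholders):

    # dedicated log device (ZIL) on one SSD, L2ARC read cache on the other
    zpool add tank log ada2
    zpool add tank cache ada3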

You might want to look into disabling atime, so that the pool isn't wasting energy keeping access times on files up to date. Not sure if this is relevant with the architecture of HN or not. This can be done with

    zfs set atime=off srv/ycombinator
Finally, ZFS needs a LOT of memory to be a happy camper. Like 3-5GB of RAM per TB of storage.
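
On FreeBSD you can check how big the ARC currently is, and cap it at boot if it's starving the application (the 8G value is just an example, not a recommendation):

    # current ARC size in bytes
    sysctl kstat.zfs.misc.arcstats.size
    # cap the ARC via /boot/loader.conf (takes effect on reboot)
    echo 'vfs.zfs.arc_max="8G"' >> /boot/loader.conf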

I actually think you'll probably have a lot of fun with ZFS tuning, if that's the problem with news.yc. FreeBSD's page is pretty detailed: https://wiki.freebsd.org/ZFSTuningGuide


> I believe the ZIL (ZFS intent log) is used to cache during writes, and the L2ARC is used during reads.

I think the ZIL (zfs intent log) is an intermediary for synchronous writes only. My understanding is that it effectively turns the sync write into an async write (from the standpoint of the zpool) -- this is why it requires a faster device than the pool it is used with. If it is absent, the pool itself houses the zil.


ARC basically is a read cache.

ZIL improves performance of writes, but by itself is almost never read (only written to) except on failures. It will be read for example on power failure to finish writing the data to the disk. It is used to speed up synchronous writes.


Not really related but any update on releasing the HN code again?

[the current release is pretty old: https://github.com/wting/hackernews]


Being the sysadmin on a site frequented by sysadmins has to be frustrating at times.

Thanks for all you do!


This reminds me I'm still looking for (PKI?-)encrypted ZFS snapshots as a backup service, /wink-wink @anyone

Hoping the box has ECC ram, otherwise zfs, too, can be unreliable (http://research.cs.wisc.edu/adsl/Publications/zfs-corruption...)


Tarsnap?


could possibly work if wrapped around zfs send, something plug-n-play would be nice though
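
A rough sketch of the wrapping I mean (dataset, key, and host names are all invented):

    # snapshot, then ship the stream off-box encrypted
    zfs snapshot tank/news@nightly
    zfs send tank/news@nightly | gpg --encrypt -r backups@example.com | \
        ssh backup-host 'cat > news-nightly.zfs.gpg'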


Using DTrace to profile zfs:

http://dtrace.org/blogs/brendan/files/2011/02/DTrace_Chapter...

I'm sure other more experienced DTrace users can offer tips but I remember reading this book and learning a lot. And I believe all the referenced scripts were open source and available.
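
A starter one-liner in that vein (assumes the fbt provider is available, as on FreeBSD/illumos; the probe names are kernel functions, so they may differ by version):

    # count ZFS read/write entry points per function; prints totals on Ctrl-C
    dtrace -n 'fbt::zfs_read:entry,fbt::zfs_write:entry { @[probefunc] = count(); }'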


may i ask where are the machines hosted? is that on AWS? if not, why don't you move to a more reliable hosting, like AWS?


>if not, why don't you move to a more reliable hosting, like AWS?

Ah yes, because AWS has worked out so well for the reddit folk. :-)


Reddit is also a couple of orders of magnitude more active than HN. Also, I thought Reddit's AWS issues were resolved years ago.


What makes you think AWS is more reliable?


from the downvotes i gather i may not ask...


Probably due to the fact that you levelled an unqualified, unsubstantiated claim that AWS is better than any alternative


That's not what he said. There's nothing about AWS being better than alternatives to AWS.

> "a more reliable hosting, like AWS"


Also because if you had RTFA you would have already known it's not on AWS


Softlayer, which is generally much more robust (physical hw) vs. AWS.


on a nginx server on cloudflare


why don't you move to a more reliable hosting, like AWS?

As a fairly heavy ($x00,000/month) user of AWS, this is the funniest and most-misguided thing I've read in a long time. AWS is horrible. It's extraordinarily bad, with all sorts of insanely complex failure modes to account for. There are some reasons to consider using it, but reliability is most definitely NOT amongst them.


i agree. we have taken customers from $x0,000/month AWS footprints to $x,000/month managed hardware footprints. an order of magnitude.

aws is a lot of things, but cheap and reliable aren't on that list.


I've been reading this site regularly for almost 7 years. 6-Jan-2014 is the only downtime I remember, and it was really a very minor inconvenience. Sucks about the data loss though, always hard to own that when doing system administration. Thanks for the explanation.


It's not the only one I remember. It seems like it's down for several hours every couple months.


Have you thought about perhaps open sourcing the server setup scripts for HN? I'd love (and I'm sure many others here would) to help with the configuration. Perhaps a github repo for some chef recipes that people could work on given the current servers?


Thanks for the info!

Out of curiosity, do you have an idea about the source of the corruption problems?


The OP is saying "the loss of up to two disks", maybe a hard drive failure?


That's on the new server.


Assuming the disk footprint is small...

Would recommend a new SSD-based ZFS box (Samsung 840 Pros have been great even for pretty write-intensive load), with raidz3 for protection and zfs send (and/or rsync from hourly/N-minute snapshots for data protection, which should eliminate copying FS metadata corruption; not sure if zfs send will).
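
The snapshot-plus-rsync variant would look roughly like this (dataset, mountpoint, and host names invented):

    # snapshot gives a stable point-in-time copy to read from
    SNAP=$(date +%Y%m%d-%H%M)
    zfs snapshot tank/news@$SNAP
    rsync -a /tank/news/.zfs/snapshot/$SNAP/ backup-host:/backups/news/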

Happy to provide and/or host such a box or two if helpful.


Thanks for the update. No worries, it's just a news message board and no businesses are hurt when it's down. I quite enjoy seeing how these things are solved and I'm sure all will be forgiven if you post a meaty post-mortem.


ZFS instead of UFS on what, an Illumos derivative, FBSD, or actual Oracle Solaris?


FreeBSD.


I'd love to show you how HybridCluster, which automates ZFS replication and failover (FreeBSD + ZFS + jails), might be able to help. Relatedly, we've just announced free non-commercial licences which would be perfect for HN: http://www.hybridcluster.com/blog/containers-distributed-sto...


Are you bottlenecking on high iowait? or something else?

just one random bit to try... Obviously, I have no insight into your system and I'm not saying I know more than you or anything, but I've been seeing more situations lately where I had massive latency but reasonable throughput and the disks mostly looked okay wrt. smart, and I mostly just wanted to write about it:

    [lsc@mcgrigor ~]$ sudo iostat -x /dev/sda /dev/sdb /dev/sdc /dev/sdd
    Linux 2.6.18-371.3.1.el5xen (mcgrigor.prgmr.com)   01/16/2014

    avg-cpu:  %user   %nice %system %iowait  %steal   %idle
               0.00    0.00    0.05    0.02    0.00   99.93

    Device:  rrqm/s  wrqm/s    r/s   w/s   rsec/s   wsec/s avgrq-sz avgqu-sz  await  svctm  %util
    sda        0.70   75.11  35.66  1.38  4568.62   611.67   139.85     0.36   9.61   0.53   1.95
    sdb        0.46   75.10  35.62  1.39  4566.77   611.67   139.89     0.22   5.89   0.45   1.66
    sdc        0.80   75.14  35.63  1.35  4569.63   611.63   140.10     0.64  17.18   0.57   2.10
    sdd        0.46   75.09  35.62  1.40  4566.60   611.63   139.87     0.13   3.47   0.40   1.49

(this is a new server built out of older disks that appears to have the problem. It's not so bad that I get significant iowait when idle, but if you try to do anything, you are in a world of hurt.)

Check out the await value. Re-do the same command with a '1' after /dev/sdd and it will repeat every second. If sdd consistently has a much worse await, it is what is killing your RAID. Drop the drive from the raid. If performance is better, replace the drive. If performance is worse (and with raidz2, it should be worse if you dropped the drive), the drive was fine.
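
i.e. something like this (device and pool names here follow my box above; substitute whatever the real ones are):

    # repeat every second and watch whether one device's await stays much higher
    iostat -x /dev/sda /dev/sdb /dev/sdc /dev/sdd 1
    # then take the suspect drive out of the pool and compare
    zpool offline tank sdd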

(Of course you want to do the usual check with smart and the like before this)

The interesting part of this failure mode that I have seen is that /throughput/ isn't that much worse than healthy. You get reasonable speeds on your dd tests, but latency makes the whole thing unusable.


How about having the error page show the last static HN page? Most people just need links.


I used http://www.hckrnews.com/ for that :-) But I actually usually prefer reading the comments more than the links ^_^


Why on earth are you not using SSD's? The HN footprint can't be that large. The extra speed and reliability from a pair of SSD's has to far outweigh the costs.


I'd guesstimate that the READ load is served practically entirely from RAM (file cache) and the WRITE load is non-critical enough that it's done "eventually consistent" (e.g. synchronous_commit=off in PostgreSQL, or fsync=off elsewhere) - or at least that's how I'd run it. YMMV.


Failing SSDs get data corruption too.


Thanks for the writeup.


Maybe you could provide details on the current configuration and architecture and some suggestions could be made on how to improve. Just a thought.


I still like hardware RAID because it's conceptually simple and nicely isolated. Sometimes horrible things happen to it, though, too.

I didn't realize HN had enough disk storage needs to need more than one drive. I guess you could have 1+2 redundancy or something.


Don't worry about it. I visited facebook for the first time in years when hn went down. Is hn on linux using zfs or bsd?


The world would be better place if software could exist without hardware.


It already does, just take your pen and some paper and you're set.

Oh, and don't forget the aspirin, you'll need it...


I remember what software was like before hardware. It sucked!


I am curious to know the server configuration, architecture and the number of hits it is getting.

If someone does offer a new software architecture, and hosting, would people be open to move hackernews there?


Good you posted this, but it came a little late. After the first series of timeouts you could've posted an update so everybody knew what was going on. But hey, thanks for the update, this clears up a lot.


This passive-aggressive attitude is going to get you nowhere in life. Back-handed compliments are only cool in pre-teen TV shows.


Hi, I'm the CEO of http://www.clever-cloud.com/ and I'll be happy to help you on this, ping me on twitter : @waxzce


Can you prove that you are the CEO? I'm doubting between this being a bad prank from someone impersonating you, and you actually placing advertisements for your own company here.


I know he is, he has posted on HN several times before.



