Why HN was slow and how Rtm fixed it

mmaunder · on Jan 19, 2011

"In 7 seconds, a hundred or more connections accumulate. So the server ends up with hundreds of threads, most of them probably waiting for input (waiting for the HTTP request). MzScheme can be inefficient when there are 100s of threads waiting for input -- when it wants to find a thread to run, it asks the O/S kernel about each socket in turn to see if any input is ready, and that's a lot of asking per thread switch if there are lots of threads. So the server is able to complete fewer requests per second when there is a big backlog, which lets more backlog accumulate, and perhaps it takes a long time for the server to recover."

I may have misunderstood but it sounds like you have MzScheme facing the open internet? Try putting nginx (or another epoll/kqueue based server) in front of MzScheme. It will handle the thousands of connections you have that are waiting for IO with very little incremental CPU load and with a single thread. Then when nginx reverse proxies to MzScheme each request happens very fast because it's local which means you need much fewer threads for your app server. That means less memory and less of the other overhead that you get with a high thread count.

An additional advantage is that you can enable keepalive again (right now you have it disabled it looks like) which makes things a faster for first-time visitors. It also makes it slightly faster for us regulars because the conditional gets we do for the gif's and css won't have to reestablish connections. Less connections established means you give your OS a break too with fewer syn/syn-ack/ack TCP handshakes.

Someone mentioned below that reverse proxies won't work for HN. They mean that caching won't work - but a reverse proxy like nginx that doesn't cache but handles high concurrency efficiently should give you a huge perf improvement.

PS: I'd love to help implement this free. I run a 600 req/sec site using nginx reverse proxying to apache.

sedachv · on Jan 19, 2011

Or I don't know, use continuations in a place that's actually appropriate? John Fremlin showed that even with horrible CPS rewriting and epoll you can get way better throughput in SBCL (TPD2) than nginx. MzScheme comes with native continuations. It's not hard to call out to epoll.

Instead everyone in the Lisp community (pg included) is still enamored with using continuations to produce ugly URLs and unmaintainable web applications.

pg · on Jan 19, 2011

Instead everyone in the Lisp community (pg included) is still enamored with using continuations to produce ugly URLs and unmaintainable web applications.

If you read the source of HN, you'll see that it doesn't actually use continuations.

I find the source of HN very clear. Have you read it? Is there a specific part you found so complicated as to be unmaintainable?

sedachv · on Jan 20, 2011

Pagination/"More" uses fnids; looking at the source it's a callback, but from an HTTP client perspective it might as well be a continuation.

How do you test and debug things like that that have random URIs and function names and get GCed on a regular basis? That's what I mean when I say continuations lead to unmaintainable web apps.

pg · on Jan 20, 2011

I've been using this technique since 1995 and it has never once been a problem. It's an instance of programming with closures, which has been common in the Lisp world for even longer. One doesn't need to examine something represented as a closure any more than one needs to examine a particular invocation of a recursive function.

Perhaps the reason I've never had a problem is that I've been careful to use this technique in fairly restricted ways. Lisp macros are the same sort of thing. They could yield unreadable code if abused. But good programmers use them in principled ways, and when used with restraint they are clearly a net win.

blasdel · on Jan 20, 2011

It's a problem for me when the fnids in every reply <form> cause them to expire several times a day when the server crashes.

Edit: also when the server redirects me back to the wrong origin, I was sent to http://news.ycombinator.com/threads?id=pg instead of http://news.ycombinator.com/item?id=2120756 after posting this reply initially.

pg · on Jan 20, 2011

also when the server redirects me back to the wrong origin

Sounds like that could be a bug. Was http://news.ycombinator.com/item?id=2120756 the page you were on when you replied?

blasdel · on Jan 20, 2011

Yes, I clicked the reply link on your comment from there.

I figured you had to know about this bug, since it happens to me regularly (maybe 10% of eligible comments) when I comment in active threads during standard procrastination hours. The misdirect is usually to the threads page of a user further up the comment tree, though sometimes it's to the permalink of a grandparent comment.

Seems like you're mixing up the redirects of concurrent users but never across comment hierarchies so it's not omnipresent.

jules · on Jan 20, 2011

> I've been using this technique since 1995 and it has never once been a problem.

Thousands of HN users experience the problems every day: link expired.

pg · on Jan 20, 2011

The issue we were talking about was maintainability.

Giving links longer expirations is trivially easy, and I already do it in cases where it matters, like submit buttons on big forms.

jules · on Jan 20, 2011

The problem is that links expire at all. There are two products: HN with "link expired" and HN without "link expired". You can write the former in a highly maintainable way.

Also, even if you accept that links expire, it's not trivial to make the links not expire for a long time. You can add a lot of RAM to a single machine, but only up to a point. Supporting multiple machines is very hard, although it can be done (distributed object system). RAM is not the only problem however. The server is inevitably going to restart once in a while. Perhaps not in the literal sense of killing and restarting the mzscheme process if you're very careful, but still in the practical sense. The data structures for storing content change as you develop a web app/site, thereby invalidating old closures hanging around.

Flow · on Jan 20, 2011

I get the same problem on Reddit, except they call it "there does not seem to be anything here".

axod · on Jan 19, 2011

> If you read the source of HN, you'll see that it doesn't actually use continuations.

> It had to be some dialect of Lisp with continuations, which meant Scheme, and MzScheme seemed the best.

(From further down the page).

I'm confused. What needs continuations?

pg · on Jan 20, 2011

I just wanted to have them in the language. The fact that I don't currently use them in HN doesn't mean they're useless.

axod · on Jan 20, 2011

ah ok thanks for clarifying.

ezalor · on Jan 20, 2011

I always thought the purpose of Arc was to be cruft-free, "don't include it unless it is actually needed".

dauphin · on Jan 19, 2011

Errors/exceptions, for one, are implemented using continuations.

axod · on Jan 20, 2011

Sounds terribly inefficient to me, but what do I know -shrug-

jrockway · on Jan 20, 2011

All control flow is a subset of continuations. The stack is a continuation (calling a function is call-with-current-continuation, return is just calling the "current continuation"), loops are continuations (with the non-local control flow, like break/last/redo/etc.), exceptions are continuations (like functions, but returning to the frame with the error handler), etc. Continuations are the general solution to things that are normally treated as different. So continuations are just as efficient (or inefficient) as calling functions or throwing exceptions.

In a web app context, though, it's kind of silly to keep a stack around to handler something like clicking a link that returns the contents of database row foo. People do this, call it continuations, and then run into problems. The problem is not continuations, the problem is that you are treating HTTP as a session, not as a series of request/responses. (The opposite of this style is REST.)

sedachv · on Jan 20, 2011

In theory yes, in practice you need to reify the stack (even for one-shot continuations). Clinger, Hartheimer and Ost have a really good survey paper of the different ways to do that:

http://www.scribd.com/doc/47221367/Clinger-Implementation-St...

gchpaco · on Jan 20, 2011

MzScheme/Racket's continuations are of the "copy the C stack" variety, or were last time I checked. They are in no way efficient; it would probably be better to CPS transform your own code than try to use MzScheme/Racket's continuations directly in performance sensitive code.

sayrer · on Jan 21, 2011

They said caching wouldn't work, but they could be wrong. You can't change the Cache-Control header to public for HN responses, because the same URL can appear different to different users. There may be some ways around this, including giving each logged-in user their own URL to browse with.

But that might be a lot of work. You can still set up a proxy that kicks in only for requests that don't contain a session cookie. Then, requests without a cookie can be responded to with a cached copy from Varnish, and Varnish could refresh every 30 seconds or so. That might reduce the number of connections to MzScheme by quite a lot.

joshu · on Jan 19, 2011

Exactly this.

esbcupper · on Jan 20, 2011

Filo's BSD hack to buffer the entire HTTP request and then pass it on has the same effect as using nginx here.

It doesn't help with keepalive though, but that's probably not needed.

akkartik · on Jan 20, 2011

"it sounds like you have MzScheme facing the open internet?"

Yep. The arc webserver runs directly on port 80, for which it needs to run as root. To avoid all sorts of security headaches, it runs:

  (setuid 2)

soon after startup.

The whole thing seems hacky.

rarrrrrr · on Jan 19, 2011

Since no one has mentioned it yet - Varnish-cache.org, written by a FreeBSD kernel hacker, has a very nice feature, in that it will put all overlapping concurrent requests for the same cacheable resource "on hold", only fetch that resource once from the backend, then serve the same copy to all. Nearly all the expensive content on HN would be cacheable by varnish. Then you can get it down to pretty close to "1 backend request per content change" and stop worrying about how arbitrarily slow the actual backend server is, how many threads, how you deal with the socket, garbage collection, and all that.

j_baker · on Jan 19, 2011

Can't you only use varnish for mostly non-dynamic content? Like for example, wouldn't the fact that it displays my username and karma score at the top of the page make it so that you couldn't use varnish (or at least make it more difficult)?

seiji · on Jan 19, 2011

Check out http://www.varnish-cache.org/trac/wiki/ESIfeatures

There could be a private internal URL to just return username and karma to populate the user info header.

j_baker · on Jan 19, 2011

Doesn't that kind of defeat the purpose though? The point of using varnish is that it keeps you from having to access the backend altogether. This is getting into an area where something like memcache might be more appropriate.

danudey · on Jan 19, 2011

Well, the point of using varnish is to keep you from having to access the backend any more than is absolutely necessary. It's incredibly trivial to generate HTML showing a user's username and karma, and even if it weren't it could be stored in memcached. Generating the front page, the comments pages, etc. is the hard part, and varnish can keep that from being generated any more than is necessary.

j_baker · on Jan 20, 2011

Of course, but I seem to recall pg writing at some point that one of the goals of HN being to prove that "slow" languages can scale using caching. I assume, therefore, that he already has caching of some kind in place for those things. If varnish isn't going to save an access to the server (which seems to be the primary thing that's slowing things down), what value is varnish providing above what pg already has in place?

danudey · on Jan 20, 2011

The requests won't queue up as badly because the server will be able to clean out 'simple' requests in a much lower time than generating much larger pages. They won't queue up as much because the requests take less time to handle, so they can be cleared out faster than they come in (compared to larger requests that queue up faster than they can be handled).

dminor · on Jan 19, 2011

Yes, you'd have to somehow separate the dynamic content from the static content so they could be fetched in different requests and then combined (probably via ajax). If it's just your username and karma then it's simple enough, but if the comments are displayed differently for different people then it could be bear.

rarrrrrr · on Jan 19, 2011

I think the only way PG would try this is if it's clear that javascript, client side hacks, etc are not required -- and I think that's the case.

Aside from the header, there's only a relatively small number of variations for any given content, right? Showdead, ability to downvote, etc? So, each of these variations gets a distinct ESI URL. Like /item_$showdead_$downvote_$etc right in the internal URL, so any combination of these is a distinct URL. Only the first user to hit any particular combination would result in a request to the backend, and that could remain in cache until its content changed. No wizardry required.

yummyfajitas · on Jan 19, 2011

Comments are displayed differently for different people.

I once posted a comment which was immediately invisible to everyone besides me - I'm guessing it was marked as spam for some reason, but left visible to me so I think I successfully posted it.

amethyst · on Jan 19, 2011

Reverse proxies won't work for HN, because requests for the same resource from multiple users can't use the same results. Not only are certain bits of info customized for the user (like your name/link at the top), but even things like the comments and links are custom per user.

Things like users' showdead value, as well as whether the user is deaded, can drastically change the output of each page. Eg, comments by a deaded user won't show as dead to that user, but they will for everyone else...

jjoe · on Jan 19, 2011

There's cookie-based caching in Varnish (and in some other proxy caches too). Essentially, the key is going to be made of the usual hash + the cookie like this:

sub vcl_hash { set req.hash += req.http.cookie; }

What this means is that the cache is per-logged-in-user and pretty much personalized. The server's going to need a lot more RAM than usual. You can set a low TTL on the cache entries so they're flushed and not kept in memory indefinitely. But the performance boost is great.

This is not recommended as an always-on measure. We wrote an entry about accomplishing something similar w/ python&varnish. Here it is if you're interesting in reading about it: http://blog.unixy.net/2010/11/3-state-throttle-web-server/

Regards

piotrSikora · on Jan 20, 2011

Of course it will work. The whole point of reverse-proxy is to buffer slow requests and send them fast over LAN to your back-end servers that cannot handle high concurrency efficiently.

The FreeBSD's accept_filter() used by Rtm does more or less that (you can think of it as of reverse-proxy in the kernel), but it only works for plain HTTP and HEAD/GET methods.

aonic · on Jan 19, 2011

Varnish supports edge side includes. The header bar could be an ESI and the rest of the page could be cached

amethyst · on Jan 19, 2011

> and the rest of the page could be cached

Except they can't, for the reasons I mentioned above. Eg, if my account is deaded, when I view a thread with one of my own comments, it looks different than if someone else was viewing that some thread, especially for those of us with or without showdead checked in our profiles.

Its not as straightforward as you would like it to be.

aonic · on Jan 20, 2011

Special cookies could be set for dead users and users who enable showdead to bypass the cache.

For example, one of the sites I run has about 50K pageviews/day by logged in users, and another 600K pageviews/day by anonymous users coming from referrals or search engines. Logged in users have similar customization options so we bypass cache for these users by detecting a cookie.

Obviously going the cache route would require some changes to how things are setup, its not a turn-key solution. But the insignificant amount of changes are well worth it for most content sites, but for a user generated content site like HN it would also depend on how the TTLs and cache purging are setup.

nitrogen · on Jan 20, 2011

The majority of requests probably come from live accounts in good standing or from people not even logged in, so the majority of requests could still be cached.

nkurz · on Jan 20, 2011

Interesting point: what percentage of viewers are logged in? I was presuming it was high, but I guess I really don't know.

rfugger · on Jan 19, 2011

To make this work you could do all the per-user stuff with javascript and ajax calls in the browser. It would be quite a bit of revamping though.

Aaronontheweb · on Jan 19, 2011

So, you can't do donut caching in Varnish?

esbcupper · on Jan 20, 2011

Instead of having concurrent requests wait for the single request to the backend it's almost always better to use stale-while-revalidate (http://tools.ietf.org/html/draft-nottingham-http-stale-while...). AFAIK both varnish and squid support this.

dauphin · on Jan 19, 2011

You clearly don't understand the problem. Even mod_pagespeed or memcached would be more appropriate here: They are rate-limited by the LISP kernel anyway (we are talking about dynamic content here).

rarrrrrr · on Jan 19, 2011

Varnish sits in front of the backend, responding to all requests that it has cached content for directly, without bothering the backend at all. It lives higher in the stack than than pagespeed or memcache, and not limited by the backend's speed in any way.

ximeng · on Jan 19, 2011

Off-topically, piecing together the years that the previous posts were made, it looks like February 4 2009 had a downvote cap of 100. It's 500 as of late last year. That suggests the annual karma inflation rate in the HN economy is sqrt(5), or ~224%.

kabdib · on Jan 19, 2011

I worked at a startup once that made a network card that did this type of buffering (wait for whole HTTP requests, then forward as a lump to the host, across a local fast bus).

Pretty whizzy, definitely helped server scaling.

We started shipping in 2001; the dot-com bust more or less canceled any interest in the product, and canceled the company, too . . .

pclark · on Jan 19, 2011

What was the name of the company/product?

kabdib · on Jan 20, 2011

Akamba

rdj · on Jan 19, 2011

related but a little different (just FYI): some Intel nics can handle interrupt modulation. Decreased latency but wouldn't fix the open handle/scaling issues here.

axod · on Jan 19, 2011

1 thread per connection??? Not doing continual GC in a separate thread and instead taking 7 seconds and blocking everything?

What is this the 1990s?

pg · on Jan 19, 2011

Feel free to fork MzScheme and replace the garbage collector with a new one that runs continuously.

acangiano · on Jan 19, 2011

Do you ever question the choice of MzScheme?

pg · on Jan 19, 2011

It had to be some dialect of Lisp with continuations, which meant Scheme, and MzScheme seemed the best. Our first version ran on Scheme48, but that was such a pain that we switched.

axod · on Jan 19, 2011

Surely there's a way to target the JVM?

I'd trust my life on the JVM, it's pretty battle tested, and the GC is simply awesome.

defen · on Jan 19, 2011

> I'd trust my life on the JVM

You sure you want to do that? :-)

> Java technology is not fault tolerant and is not designed, manufactured, or intended for use or resale as on-line control equipment in hazardous environments requiring fail-safe performance, such as in the operation of nuclear facilities, aircraft navigation or communication systems, air traffic control, direct life support machines, or weapons systems, in which the failure of Java technology could lead directly to death, personal injury, or severe physical or environmental damage

http://technet.microsoft.com/en-us/library/cc976720.aspx

axod · on Jan 19, 2011

Yeah I'd trust my life on it. I've had java processes running for several months without issue. The hardware fails before the jvm does.

I wouldn't trust anything from microsoft though.

projectileboy · on Jan 22, 2011

I've helped build rail control systems that run on Java, so you're probably trusting your life to Java daily without knowing it.

shortlived · on Jan 20, 2011

The JVM really is great and the garbage collecting outstanding. Anyone who has written a real high performance app can tell you that.

marcamillion · on Jan 20, 2011

I love this response. Would be really interested in a follow-up from axod (or by pg about axod's version).

It's easy to criticize, but let's see what happens when the pedal hits the metal.

axod · on Jan 20, 2011

My response TBH, would be that there is no point spending months tweaking the engine of a ford to try and get it to perform like a porche. Just get a porche in the first place.

I don't value "being able to write it in my favorite language" at all. From what I've read, pg does. To the extent that the product suffers.

There would be absolutely no point me trying to improve mzScheme when you can do exactly the same job in other languages/platforms, and the user doesn't care/know the difference. HN could be rewritten in a weekend, in PHP/python/whatever and we wouldn't be sitting here waiting for pages to load.

(I run Mibbit, which handles a few thousand HTTP requests a second, on VPS level hardware. In Java).

jmtulloss · on Jan 20, 2011

Neither Python nor PHP run garbage collection continuously in a separate thread. Python is reference counted, and as far as I know PHP is too.

axod · on Jan 20, 2011

But I'd bet both of them perform far better than MzScheme.

In any event, it's a "solved problem". The sheer amount of time and effort going into this is silly. Just pick a better language/platform and use it.

marcamillion · on Jan 20, 2011

I guess the point is, axod, if you aren't willing to backup talk with action, it might be best to think about what you say before you say it.

Don't slam something saying I could do better, and then have your bluff called.

Makes you look silly. This applies to even if the person calling your bluff was NOT pg.

jmtulloss · on Jan 21, 2011

It depends on what you mean by "perform better". Any reference-counted GC mechanism will have memory leaks, which is a serious problem in a long-running process. Python gets around this by also occasionally running a traditional GC, which, btw, is a stop-the-world GC.

projectileboy · on Jan 22, 2011

Of course you can write a link aggregator in a weekend, but you can't write HN in a weekend - there's a lot of complexity in the HN source around controlling voting rings, spam, etc. When you're dealing with complex issues, it's a net win to use a language that enables you to think at the highest level of abstraction possible.

jhrobert · on Jan 19, 2011

There is no such thing as a free lunch

jlouis · on Jan 20, 2011

Let me add that one "thread" per connection works in Erlang for up to 80k connections easily. And since each "thread" is actually a process and each with their own GC, long pause-times are never a problem.

It is not the year, but the naivety that is the problem here, if any.

samdk · on Jan 19, 2011

The traffic graphs linked in this post [0] are an interesting addition to the "How often do you visit HN?" poll [1] that was done a week ago. From the graphs, it looks like there are about 10x as many page views as unique IPs.

[0] http://ycombinator.com/images/hntraffic-17jan11.png [1] http://news.ycombinator.com/item?id=2090191

b_emery · on Jan 19, 2011

Anyone have an idea what the huge spikes in unique IP's are all about?

cperciva · on Jan 19, 2011

pg / rtm: If you need any FreeBSD-related help, please let me know (preferably not in the next couple of days, though...). There are lots of HN fans in the FreeBSD developer community.

jwr · on Jan 19, 2011

Which is why it's good to have a mature VM underneath your language. Paul's choice of basing an implementation of Arc on MzScheme was a very good one (I remember people criticizing him for not building a standalone implementation with a new VM).

I write time-critical applications in Clojure and JVM's -XX:+UseConcMarkSweepGC flag is a lifesaver. We no longer get those multi-second pauses when full GC occurs.

nc17 · on Jan 19, 2011

YC ranks 2400 on Alexa, and I'm sure most of the traffic is HN. I bet you'd be hard-pressed to find a top 10k site written in Scheme. Does anyone know of one?

http://www.alexa.com/siteinfo/ycombinator.com

ximeng · on Jan 19, 2011

And, presumably thanks to tptacek's hard work, one of the key search terms bringing people in here is "sous vide supreme".

brlewis · on Jan 19, 2011

Not yet. Working on it.

joshu · on Jan 19, 2011

Or you can put a reverse proxy in front.

http://joshua.schachter.org/2008/01/proxy.html

(Like I suggested in 2009...)

ars · on Jan 19, 2011

BTW, you suggest pound for the slow client problem, but according to this email it doesn't help for that.

http://www.apsis.ch/pound/pound_list/archive/2010/2010-11/12...

joshu · on Jan 19, 2011

Pound helped this problem for delicious in 2005, and by the time I wrote this article it was starting to not be the right answer. In 2011, it's definitely wrong :)

taylorbuley · on Jan 19, 2011

I feel silly asking, but who or what is 'rtm'?

kujawa · on Jan 19, 2011

Some punk skript kiddie who broke the internet in 1988.

steveklabnik · on Jan 19, 2011

I hear he crashed 1507 systems in one day.

meifun · on Jan 19, 2011

Zero Cool....

cschep · on Jan 19, 2011

Yo, this is RTM!

alanfalcon · on Jan 20, 2011

Well that's great. There goes MIT.

kirubakaran · on Jan 19, 2011

pg's friend. YC cofounder.

http://en.wikipedia.org/wiki/Robert_Tappan_Morris

taylorbuley · on Jan 19, 2011

Thanks for not being this guy: http://news.ycombinator.com/item?id=2121511

bear · on Jan 19, 2011

"Robert Morris, the other founder of Viaweb, whose username is Rtm"

http://lib.store.yahoo.net/lib/paulgraham/bbnexcerpts.txt

dmuino · on Jan 19, 2011

http://en.wikipedia.org/wiki/Robert_Tappan_Morris

mcantor · on Jan 20, 2011

Someone with a Wikipedia page where the only thing in the "See also" section is "List of convicted computer criminals."

Diagnosis: Badass.

ananthrk · on Jan 20, 2011

http://news.ycombinator.com/threads?id=rtm

Most recently, one of the creators of Arc (along with pg), the language used to create/run HN.

nkassis · on Jan 19, 2011

First guy to completely crash the internet ;p

dauphin · on Jan 19, 2011

1: https://secure.wikimedia.org/wikipedia/en/wiki/Robert_Tappan...

2: same acronym as RTFM, but polite.

deno · on Jan 20, 2011

Read The Friendly Manual. How much more polite can you be?

dauphin · on Jan 19, 2011

Book worm, nerd. Think Comic Book Guy, but healthier and smarter. Totally a bad-ass.

ezalor · on Jan 19, 2011

http://en.wikipedia.org/wiki/Robert_Tappan_Morris

allwein · on Jan 19, 2011

When I read this headline, my immediate thought was "Oh, he must have forgotten to shut down the copy of his worm that was running on the HN servers."

http://en.wikipedia.org/wiki/Morris_worm

mahmud · on Jan 20, 2011

Please, retire that stupid "joke". It's 22 years too old.

mcantor · on Jan 20, 2011

First time I had heard it.

antirez · on Jan 19, 2011

pg: you could probably try to write an event driven HTTP server on top of Arc, so that you don't have this kind of problems. Something like node.arc

Also if I understand correctly you use flat files that are loaded into memory at startup. It seems like that switching to Redis could be an interesting idea in theory, as it is more or less the implementation of this concept in an efficient and networked way.

Probably with such changes you can go from 20 to a few hundreds requests per second without problems.

jey · on Jan 19, 2011

> when [MzScheme] wants to find a thread to run, it asks the O/S kernel about each socket in turn to see if any input is ready

They've never heard of select()? </snark>

But really, is there some reason that it's hard to collect up all the fds at once or something?

tmsh · on Jan 19, 2011

I imagine mzscheme is using select under the hood. But I'm surprised they're not using poll. There's something up with the racket page, so I'm not able to download the mzscheme source just now (via http://arclanguage.org/install), but that's one thing I've noticed in Python web server implementations. Side-stepping epoll, libevent and all that for a second, there's a tendency to use select (which uses an array of file descriptors; 0 to highest file descriptor) instead of poll (which isn't encumbered the same way).

I think in part there is a tendency, among server developers, correctly, to fear anything that looks like a busy wait (e.g., with the name poll). But really poll is just as asynchronous as select in this context (I don't know about FreeBSD's implementation -- but Linux puts to sleep wait queues the same way, afaik). It just doesn't suffer from the crazy indexing scheme of select....

At any rate, I didn't get a chance to finish probing the internals of what mzscheme uses. But if there's a way to substitute poll for select, it can often alleviate those issues of 900 requests queue up and you eventually have an fd with a value of 1024 or greater -- even though you may not have 1024 actual concurrent requests....

Though others feel free to correct me if I'm wrong. I only comment because I came across a similar issue recently. This link may be useful too:

http://www.makelinux.net/ldd3/chp-6-sect-3.shtml

ETA. i finally got a copy of the most recent racket source (though probably not the one rtm and pg are using). but if anyone is curious, browse racket/src/network.c. the source version for mac uses a bunch of selects (e.g., for tcp-accept). replacing with poll might help.... the max number of FDs per login session is often 1024 by default. so you might want to bump that up if it's not already. and consider using poll.... just an idea.

kujawa · on Jan 19, 2011

Read C10K? Both select() and poll() have this problem internally. You have to use one of the more advanced techniques available if you really want to scale. epoll(), kqueue() or friends.

wmf · on Jan 19, 2011

Based on RTM's description, it sounds like even select() would be a huge improvement. HN needs to solve the C100 problem before worrying about C10K.

jey · on Jan 19, 2011

Rtm's phrasing implies that MzScheme is making a syscall for each socket in turn, so it sounds like it isn't even using the basic select() or poll().

mahmoudimus · on Jan 19, 2011

I thought the same exact thing when I read rtm's description. I thought to myself select()/poll() is much more efficient, and it sounds like that's not what's implemented underneath.

caf · on Jan 20, 2011

poll() is slightly better than select(), because you only have to iterate over the file descriptors that were passed, rather than from 0 to nfds.

maw · on Jan 20, 2011

It doesn't hurt that its interface is far more pleasant to use than select's, either.

idlewords · on Jan 20, 2011

Not being able to handle 20 requests/sec quickly, in 2011, for a read-mostly website is just shameful.

jwhite · on Jan 20, 2011

I think that all depends on the purpose of the site. If I were paying a subscription to access the site I would be within my rights to object. As it is, with HN being a free service, built to be the application spurring the development of Arc, whose reference implementation is intended for exploring language design and not performance, I wouldn't choose the word "shameful" to describe this situation.

idlewords · on Jan 20, 2011

I guess it depends on your priorities. Personally, I think the community discussion here is far more interesting than the toy language project it runs on. If you think Arc is the future of computing you may think of this discussion board as just a convenient test suite for the language.

Either way, it's 2011 and that really is some spectacular slowness.

jwhite · on Jan 20, 2011

My comment was not expressing an opinion on the relative values of Arc and the HN discussion community. HN delivers lots of value for the modest price of your time. Claiming that its performance is shameful when it isn't being directly monetized, or even indirectly monetized like Facebook &co., is unfair.

If you were talking about Facebook, Twitter, or Basecamp, that would be a different matter.

blinkingled · on Jan 19, 2011

Sounds like there is a scalability issue within MzScheme in that it iterates over the number of threads, asking each thread about the sockets it has. As one can tell, once # of threads and # of sockets grow - finding which thread to run in user space becomes awfully expensive. As any clever admin will do, a least invasive fix involving limiting the number of connections and threads was done - with what sounds like immediate results!

I have no idea what MzScheme is but I am curious about why is HN running threads in user space in 2011? The OS kernel knows best what thread to pick to run and that is a very well tuned, O(1) operation for Linux and Solaris.

metageek · on Jan 19, 2011

MzScheme is an implementation of Scheme (dialect of Lisp); it implements its own threading. This is not uncommon for languages which support (or used to support) many OSes, with many different versions of threading: it's easier to write it yourself, once, than maintain N+1 OS-specific versions.

Of course, these days, N+1 is probably 2, since everything except Windows supports pthreads.

blinkingled · on Jan 19, 2011

If it's just a porting issue, there is Pthreads-win32 which worked well enough the last time I used it few years ago.

metageek · on Jan 19, 2011

I don't know the details. I suspect the answer is that the threading support was written when pthread support was less common, and the MzScheme developers haven't been sufficiently interested in rewriting it.

elibarzilay · on Jan 24, 2011

Racket now includes a new facility -- "features" -- which are essentially a lightweight OS-level thread. There's also another -- "places" -- which is a more separated heavy threads (closer to a new process), but that one is not enabled by default.

klochner · on Jan 19, 2011

I think this whitepaper covers the plt web server bundled with MzScheme (now 'Racket'):

http://www.cs.brown.edu/~sk/Publications/Papers/Published/kh...

idoh · on Jan 19, 2011

Hacker News uses a web server written in arc.

svlla · on Jan 19, 2011

not to mention that one thread per connection is, well, extremely outdated.

jrockway · on Jan 19, 2011

I don't know much about MzScheme, but it's quite possible that "thread" means "stack", not "OS thread". One context stack per TCP connection is quite sustainable; with Haskell's threads and Perl's coros, I run out of fds long before I'm using any significant amount of memory. (This is somewhere around 30,000 open connections on my un-tweaked Linux desktop. I know I can do a lot more if I tried.)

The issue, in the case of HN, is with O(n) IO watchers. Most sockets are idle most of the time, so you really want an algorithm that is O(n) over active sockets, not O(n) over active and inactive sockets. You typically have so few active fds at any time that the n is really tiny, making massively scalable network servers trivial to write. But you also have a lot of connections at any one time, so if you are O(n) over active and inactive fds, then you are going to have performance issues. Basically, you don't want to pay for connections that aren't doing anything.

Fortunately, we have the technology; epoll on Linux, kqueue on BSDs, /dev/poll on Solaris. You just need to use an event loop, so it does all the hard stuff for you (and so you don't have to worry about the OS differences). Hacking a proper event loop into MzScheme may be hard, but it's absolutely necessary for writing scalable network servers. Handling 10k+ open connections is trivial with today's technology. And, all the cool kids are doing it (node.js, GHC, etc.).

swannodette · on Jan 19, 2011

My understanding is that MzScheme / Racket has a proper event loop.

jrockway · on Jan 20, 2011

Yeah, I have no idea. All I know is that proper threads do not bloat anything, and that proper IO watchers are not O(n) over inactive connections.

PStamatiou · on Jan 19, 2011

Would be interesting to see if traffic goes up after this and is elastic. Marissa Mayer had a talk at some conference in 2009 where she explained her early tests on number of search results on Google - 10, 20, 25, 30 - but in the end it was just about the speed associated with loading the pages that accounted for the number of pageviews and visitors.

Luyt · on Jan 19, 2011

"It turns out there is a hack in FreeBSD, invented by Filo, that causes the O/S not to give a new connection to the server until an entire HTTP request has arrived."

I wouldn't call it a hack, but a feature ;-)

    # Buffer a HTTP request in the kernel 
    # until it's completely read.
    apache22_http_accept_enable="yes"

Is HackerNews web scale?

makmanalp · on Jan 19, 2011

What does "web scale" mean? I see it thrown around a lot without much explanation.

klochner · on Jan 19, 2011

http://www.youtube.com/watch?v=b2F-DItXtZs

Luyt · on Jan 19, 2011

A transcript can be found at http://mongodb-is-web-scale.com/

"Shards are the secret ingredient in the web scale sauce. They just work."

Luyt · on Jan 19, 2011

I think it means that your web app can survive being slashdotted. So, a lightweight C10K-capable frontend, all static content on a CDN, everything heavily cached, database queries optimized to the hilt, etc... etc... Some people think that just using a NoSQL database is sufficient, but there's much more to it. Interesting talk on this stuff (was posted on HN earlier): http://ontwik.com/python/django-deployment-workshop-by-jacob...

earle · on Jan 19, 2011

HN only supports 20 req per second???

imp · on Jan 19, 2011

Maybe that's just for dynamic pages. Probably most of the most popular pages are cached and wouldn't be considered in that 20 req/sec. Just my own wild guess though.

PStamatiou · on Jan 19, 2011

flat files, no database

pg · on Jan 19, 2011

That's not the bottleneck. Essentially there's an in-memory database (known as hash tables). Stuff is lazily loaded off disk into memory, but most of the frequently needed stuff is loaded once at startup.

The bottleneck is the amount of garbage created by generating pages. IIRC there is some horrible inefficiency involving UTF-8 characters.

gills · on Jan 19, 2011

Are you using any sort of in-memory fragment caching? That seems like it might reduce some render overhead.

pg · on Jan 19, 2011

A great deal, and it does.

svlla · on Jan 19, 2011

perhaps continuations could be used more judiciously as well

ezalor · on Jan 19, 2011

> That's not the bottleneck.

What is the main bottleneck of HN?

pg · on Jan 19, 2011

See the second paragraph.

dauphin · on Jan 20, 2011

That's 0.05 second for a single request: actually pretty good.

alain94040 · on Jan 20, 2011

I have a proposal to settle flamewars by the way. I had meant to propose something like this (a debate solver) for years. Here it is:

After 4 levels of back and forth (Joe says "...", Tim replies, then Joe replies once more, then Tim replies again), freeze that branch, hide it from the general public, and turn the branch into a settlement: both Tim and Joe are allowed one final comment each, that they both approve. Only once they have posted this compromise, is it shown in-place, where the original sub-thread used to be.

Simple. Prevents endless arguments. Good for everyone.

brc · on Jan 20, 2011

Hmm, I have had ideas for something similar, but which involves having to choose a side of the argument (ie, agree with parent or disagree) before posting. Once chosen, you can only vote on your 'side' (either up or down). Poor arguments on your 'side' can be killed with sufficient downvotes, so that the ensuing set of arguments hopefully ends up being the best set. This tends to happen in an informal way on HN, but only because people largely behave. In other forums, not so much. Perhaps glomming together your idea of maximum posts per user on a topic, along with side-based voting, some type of civil debating platform coudl be developed. After all, in actual debates you get 2 chances to state your position and a final sum-up.

bootload · on Jan 19, 2011

is there any reason the noobs url (29 Apr: Faster, Fewer Flamewars) ~ http://news.ycombinator.com/noobs shows no results?

pg · on Jan 20, 2011

It has since been split into noobstories and noobcomments.

gms · on Jan 19, 2011

Who's Filo? David Filo?

strlen · on Jan 19, 2011

Yes. Having worked at Yahoo, this isn't even a bit surprising. rtm's capitalization of 'filo' as 'Filo' -- everybody spelled it lowercase i.e., as an /etc/passwd entry not a lastname -- is the surprising part.

Regardless of what you think of Yahoo's current situation, somebody who could easily retire wealthy but still hacks and flies economy class on Southwest to meetings in remote offices is worthy of respect.

sh1mmer · on Jan 19, 2011

Yahoo's entire BSD kernel team report to filo still.

Last time I had lunch with him we talked about the minutia of DNS server implementations because I was working on some optimization tricks and he was really interested in seeing them get implemented.

filo is a really amazing guy. He's the most down to earth billionaire I know. He talks way more about his family and hacking more than 'stuff'. If he isn't hacking on code as much as he used to it's because he cares about his company and is doing an important job looking after technical stuff that needs doing, even if it isn't interesting.

aristus · on Jan 19, 2011

Yep. He's still a mensch. Yahoo did some incredible stuff on Apache & FreeBSD back in the day. I remember a hack that added hardcoded HTTP headers to the image files on disk, to squeeze that extra nilth percent out of the server.

jcapote · on Jan 19, 2011

That's awesome!

PStamatiou · on Jan 19, 2011

That was my first thought and it would appear so. Filo was a big FreeBSDer

http://zer0.org/daemons/yahoobsd.html

spidaman · on Jan 19, 2011

Filo was a total hacker, reportedly still is. At least into the late 90's, he was active on the FreeBSD mailing lists with encyclopedic knowledge of SCSI card and NIC drivers and assorted other hardware and FreeBSD stuff. He was oft found sitting on the hallway floor in the first colo I used (ISI in Mountain View), screwdrivering the chassis' of Yahoo's early servers.

zandorg · on Jan 19, 2011

Chief Yahoo.

ezalor · on Jan 19, 2011

David Filo is the co-founder of Yahoo! if you don't know him.

rograndom · on Jan 19, 2011

I was getting connection errors less than an hour ago, so I'm not sure if it actually worked.

richcollins · on Jan 19, 2011

Sounds like a switch to async I/O would be helpful.

dauphin · on Jan 19, 2011

Well, HN is written in Arc, which is a layer on top of MzScheme. MzScheme handling of sockets is actually already done with the select() syscall, and its "threads" are lightweight non-blocking threads (think Erlang). So it's already async but with "sugar".

js4all · on Jan 19, 2011

  > In 7 seconds, a hundred or more connections accumulate. So  
  > the server ends up with hundreds of threads, most of them  
  > probably waiting for input

This is why Nginx handles large site much better. The request are queued without spawning threads. Evented I/O for the rescue.

jacquesm · on Jan 19, 2011

Let's hope that puts an end to all the time-outs!

Thanks for the work and it sure seems to be a lot more responsive.

spydum · on Jan 19, 2011

Does seem a bit faster -- could be placebo though :)

MikeCapone · on Jan 19, 2011

Neat. All I can say is: Thanks!

ezalor · on Jan 19, 2011

Reverse-proxying via nginx would solve this problem and more: the arbitrary 30 second limit on form submission (hotspots sometimes are slow...), nginx could handle rate limiting & logging instead of srv.arc, etc. The Arc codebase would btw be smaller and cleaner (no policy/sanitization code, etc.).

Serving static content via Apache was a first step ;-)

Don't reinvent the wheel!

djcapelis · on Jan 19, 2011

Did you just tell people running a company that reinvented funding with their custom written news site written in their own programming language with a custom web server that they shouldn't re-invent the wheel?

They think they can build a better wheel. They seem to like doing it and have a habit of it. There's nothing wrong with that.

kirubakaran · on Jan 19, 2011

Unless they built their server with nand gates, I don't see running nginx reverse proxy to be incoherent with their philosophy.

djcapelis · on Jan 19, 2011

Right, they only re-invent wheels when they feel they can make a better one. I don't think they feel they can go through the effort of fabricating a better chip. Though I wouldn't put it past rtm to try.

The philosophy "Don't reinvent the wheel" however, is definitely inconsistent with their philosophy. They will reinvent the wheel whenever they feel they can make a better one. Just because they haven't reinvented every wheel does not mean "don't reinvent the wheel" applies to this group.

They chose to create the best solution they think they can. They don't seem to care whether or not that involves reinventing wheels. The original argument that they should seems pretty silly.

pig · on Jan 19, 2011

I think he means that the nginx reverse proxy is just a part of the infrastructure, like the server, OS, MzScheme etc they use.

ezalor · on Jan 20, 2011

Totally. "If you want to make an apple pie from scratch, you first need to recreate the Universe" -- Sagan.

dauphin · on Jan 19, 2011

> It turns out there is a hack in FreeBSD, invented by Filo, that causes the O/S not to give a new connection to the server until an entire HTTP request has arrived. This might reduce the number of threads a lot, and thus improve performance; I'll give it a try today or tomorrow.

Anyone know if they're referring to "accept filters" here? FreeBSD folks can "man accf_http" if they're curious, which does prevent a request from being handed off to the application until the complete (and valid?) request has been made. Certainly not a "hack" but a feature of the OS itself.

Or they could use a proxy. All this "fuck me I'm famous" attitude is stupid.

SageRaven · on Jan 19, 2011

Anyone know if they're referring to "accept filters" here? FreeBSD folks can "man accf_http" if they're curious, which does prevent a request from being handed off to the application until the complete (and valid?) request has been made. Certainly not a "hack" but a feature of the OS itself.

svlla · on Jan 19, 2011

this seems impossible for item pages due to how continuation ids are used for replies

dauphin · on Jan 19, 2011

This could be resolved using consistent hashing or a critbit tree.

dauphin · on Jan 19, 2011

Lisp is definitely not a slow language: you can handle the crazy rate of 20 requests/second on a multi-core server!

btilly · on Jan 19, 2011

They are only using one core. See http://news.ycombinator.com/item?id=2116987 for verification.

dauphin · on Jan 19, 2011

Ooops, yes, you are right, sorry. Then I guess it's a pretty decent rate for a Core2 duo.

mahmud · on Jan 20, 2011

If you choose to. We pushed north of 800 r/s in production, and just shy of 4k in our LAN, that's using stock hunchentoot with just mere customization.

This guy here broke the 10k barrier:

http://john.freml.in/teepeedee2-c10k

seanfchan · on Jan 19, 2011

Yes placebo, oh our brains, though its not mid day just yet :)

dhimes · on Jan 19, 2011

Sorry, guys, it was entirely my fault. I got Michael Grinich's iphone HN app and just love the damn thing. :)

mbubb · on Jan 19, 2011

I find it disturbing to see people asking "Who is Rtm?" "Who is filo?"

I understand if you are in tech you might not know figures in history or literature... but these guys?

Every time you login to a UNIX/Linux system you use the passwd file and related setup - authored at least in part by Rtm's father.

http://www.manpages.info/freebsd/passwd.1.html

Rtm has done lots in his own right as the wikipdia pages show.

But seriously - if you don't know who these people are you really should.

Read this: http://www.princeton.edu/~hos/Mahoney/unixhistory

and maybe ESR's writings and that online anthology of the early Apple days and old issues of 2600, etc, etc

I am sorry - but it is really irritating to me that someone would be on this site and really not be aware of the deeper history and culture. It is not that deep - 1950s to present (to cover Lisp).

As Jay-Z (whom you probably know) says - "Go read a book you illiterate son of a bitch and step up your vocab ..."

viraptor · on Jan 19, 2011

It's really nothing to get excited about. There are so many important programmers in the world that noone will remember them all. Not everyone works in the area where those people will mean anything more than a credit line in some tool they use... and that's fine.

I work with VoIP daily and could name lots of people who you "really should" know - you're using a phone all the time after all. Or people who create amazing stuff right now. But no... actually I don't expect that. Everyone has their own area of interest. I appreciate that someone wrote `cat` or one hundreds of other nice utilities, but I'm not going to read their history unless I've got a lot of free time and want to do that.

marcamillion · on Jan 20, 2011

This has to be one of the most unnecessary comments I have seen recently on HN.

Civility people. What happened to that?

Btw, I down vote me if you like, but it's true. It's easy for us to get caught up in our own brilliance that we talk down to others that don't know as much in a particular subject as we do.

Ironically, it shows more about you, than it does them.

mbubb · on Jan 20, 2011

Civility? Go to a sports blog - put on you Bio that you are a journalist for sports illustrated and then in the commentary of an article on American football ask the question "who is Joe Namath?" Then judge my commentary by its relative civility. Do you not read about the authors of books you read - music you listen to? Why then wouldn't you read up on the people whose software you use. I am sorry - it is ignorance and on HN that deserves to be called out.

taylorbuley · on Jan 19, 2011

You're making little sense. So "go learn" but "don't ask?"

mbubb · on Jan 20, 2011

You write for a major magazine on the subject. Yes - I would hope you know more about the history of this topic than I do.

biot · on Jan 20, 2011

Who is ESR? ;)