I think the server costs around $350/mo. I don't like to think how much time HN actually takes up, but when I was traveling recently I found that checking in for about 30 min a day was enough to keep things under control.
On a weekday we get about 350k pageviews from about 30k unique ip addrs.
Oh I don't know, perhaps it doesn't include the 4-projector monitoring screen in the basement, relaying real-time hits and anomalous voting patterns.
PG: "The new server seems to be about 2x as fast. The frontpage renders for me in about 50 msec. But the site should seem more than 2x faster (for logged-in users) because many requests will terminate before being interrupted. There's now enough memory that we can fit all the links and comments in memory at once again. We should be good for another year or so." (traffic: http://ycombinator.com/newsnews.html#15jan09)
Not sure how or where the server is hosted or what else is used (e.g. router/firewall/bandwidth/UPS/utilities), but if pg/rtm billed for their time, all other costs would be insignificant.
Maintenance:
A lot less than most people considering it's rtm and pg.
You could get similar services from a colocation company as well. You could purchase a fully decked 1U server for $2000, then pay $50 a month for colocation (that's what I'm doing).
Just as an example, you can rent a Xeon 5570, a rather nice 2.93 GHz machine with 12 GB RAM and 2 TB of monthly bandwidth, for $724 from Softlayer and others.
You're right. My math was off. I read it as 24 GB RAM, instead of 12 GB RAM.
I was actually checking out Mediatemple's nitro server. Low on the memory and speed end (2.33 GHz and 8 GB RAM) but it seems to be getting glowing reviews for its customer service.
I still think it's amazing that we live in a world where we can store 25,769,803,776 bytes on a computer that we entrepreneurs can afford!
It causes a huge change in the way people look at databases. In the past, you needed huge database servers so that you could keep everything stored on disk, and intelligently cached. Today, I'd imagine that 99% of companies' data can be kept entirely in memory.
Stunning, really. It entirely changes how you think about data storage.
> It entirely changes how you think about data storage.
It should entirely change how you think about data storage. What's stunning to me is the number of people who are stuck on "Fully ACID, fully RDBMS or it's crap and you will fail and deservedly so."
In health care applications? In financial applications? I would never trust anything less than fully ACID with all the bells and whistles.
In a website where people aren't going to be that bothered if a few transactions are inconsistent when some major snafu happens? (Status updates get lost in the ether, for example.) The cost isn't worth the benefit of high power.
If Hacker News is anything like my site, both of which basically need to serve up little bits of text to lots of people, then hosting is cheap.
I bet you could serve 20,000 people a day on an HN-like site for well under $1,000/month on AWS, assuming you are smart about caching and you don't thump the database with every request.
So, even if you were totally clueless about ads and just ran random AdSense, the thing would more than pay for itself.
It's incredibly cheap to run a site that doesn't need much bandwidth. Basically, if you are serving no pictures or media, you should be able to build a website at super-scale in your garage, or using some Cloud Service.
Under $1000 a month? I'd hope so! That would pay for a couple of extra large instances, which is way overkill for a simple site like this. It could be hosted on a simple VPS for $50 if you wanted to (well, with a different architecture; hosting it all in memory makes a VPS not very efficient).
HN has very little user-specific content in its hot pages. Also, since it's mostly time based, there's a lot of locality. It should cache like mad.
My guessing math puts it at 300 MB for the hot data set (1000 pages at 100 KB each, 20k users at 10 KB each). With 10 page views per unique, a 90% hit rate on the page/user cache and 20% of page views being writes, I'd guess the average is less than 1 IOP a second. Even at high peak-to-average ratios it's likely within what a single SATA disk can do.
I'd say it might be a little tight on a 512MB vps, but possible if you optimized carefully. On a reasonable dedicated machine it should run very well without any particular effort paid to optimization beyond basic caching.
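For anyone who wants to poke at the 300 MB guess, here is a minimal sketch of that sizing arithmetic in Python. The function name is made up for illustration, the page and user counts are the guesses from the comment above rather than measured values, and decimal units (1 KB = 1000 bytes) are assumed.

```python
# Rough hot-data-set sizing, using the guessed figures from the comment above.
# Assumption: decimal units (1 KB = 1000 bytes, 1 MB = 1000 KB).
KB = 1000
MB = 1000 * KB


def hot_set_bytes(pages, page_bytes, users, user_bytes):
    """Total bytes needed to keep the hot pages and user records in memory."""
    return pages * page_bytes + users * user_bytes


total = hot_set_bytes(pages=1000, page_bytes=100 * KB,
                      users=20_000, user_bytes=10 * KB)
print(total / MB)  # 300.0
```

Changing either guess shows quickly how sensitive the total is: the user records account for two thirds of it.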
Given the "more" links on the front page, I'm guessing HN is using continuations as state storage mechanisms, and that they're stored on a per-user basis. So, I'd expect the site's RAM requirements to be pretty hefty just on that basis, hence the 12GB of RAM.
350k hits per day is only 4 req/s. Assuming there are typical peak hours, I'm guessing it could reach 300 req/s during peak hours. Nginx on a crappy 128MB server can deliver around 8000 req/s on a static page. Even if there was zero caching, I would hope you could manage 300 req/s.
(And yes, I actually tested this on a crappy 128MB VPS. I also tested Rails-, Merb-, and Compojure-generated pages and was able to reach 500-1000 req/s easily.)
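The 4 req/s figure is just the daily hit count spread evenly over a day. A small Python sketch of that arithmetic follows; the function name is hypothetical, and the 300 req/s peak is the guess from the comment above, not a measurement.

```python
# Average request rate implied by a daily hit count.
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def average_rps(hits_per_day):
    """Requests per second if the day's hits were spread perfectly evenly."""
    return hits_per_day / SECONDS_PER_DAY


avg = average_rps(350_000)
print(round(avg, 2))  # 4.05

# The guessed 300 req/s peak, expressed as a multiple of the average.
guessed_peak = 300
print(round(guessed_peak / avg))  # 74
```

So the guessed peak is roughly 74 times the daily average, which looks like a deliberately pessimistic bound.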
Why not? That's only a couple page views a second. Around $50/m can get you a gig of RAM. Use some for memcached, spend the rest running Nginx or Lighttpd.
20,000 people a day is definitely worth 35 dollars per day. That would mean, if your ads were impression-based, a CPM of $1.75.
You pay $45 CPM for eyeballs on a site like backpacker.com, which actually has a tight audience, but you're gonna get $1.75 unless your site is just spam.
CPM means cost per thousand eyeballs. I'm talking about impressions here just to simplify things.
The reason I ask is that my site gets about 7,000 unique visitors a day. If 20,000 = $35 a day, 7,000 = $12.25 a day. I make about half that.
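The numbers above can be reproduced with a quick Python sketch. It follows the thread's own simplification that one visitor counts as one impression, and the function names are invented for illustration.

```python
# CPM arithmetic from the thread. Simplifying assumption (same as the thread):
# one visitor per day counts as one ad impression.


def cpm(revenue, impressions):
    """Revenue per thousand impressions."""
    return revenue * 1000 / impressions


def revenue_at(cpm_rate, impressions):
    """Revenue for a given impression count at a given CPM."""
    return cpm_rate * impressions / 1000


rate = cpm(revenue=35, impressions=20_000)
print(rate)                     # 1.75
print(revenue_at(rate, 7_000))  # 12.25
```

Making about half of $12.25 a day on 7,000 visitors works out to a CPM a little under $0.90 on the same basis.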
I don't recall seeing any articles on HN about advertising options (AdSense vs. AdBrite vs...) or optimization. (Though it's possible I've simply missed them.) Any good resources I should investigate?
I don't know why you are showing AdSense ads for dictionaries on your site. I can't imagine your audience is the sort that buys dictionaries. Given that yours is a slang dictionary, I imagine your users are young. And most young people get their dictionaries online.
It seems like such an oxymoron that you would have an ad to buy something printed on your dictionary website. Your AdSense literally shows ads for a printed competitor to your site.
It seems to me you have your AdSense set up wrong, or naively. If I go to a page and look up a slang word for something like girlfriend, you should show me ads for dating! When I look something up in your dictionary, then you start to know who I am and you can show me relevant ads.
As it was, all I get is banners for online colleges and slang dictionaries and a Google ad for Ask.com.
You just need a bit better ads... I guess I was wrong about being able to put up random ads, but I think you could make more if you thought about it more.
I'd be quite happy with better ads, but my understanding is that one's control over what AdSense ads display on one's site is limited. Using the "competitive ad filter" you can explicitly block specific ads, but there's no opposite analogue: you can't say what kind of ads you want to run on certain pages.
The ads that run are up to Google's discretion. It does its best to determine what the page is about and shows ads accordingly. Perhaps I need to tweak the template text, meta description, and meta keywords - but again, Google has final control over what ads are run.
In my own opinion, yes. If I were the owner of the site, I would use a Link Ad just below "The Online Slang Dictionary" with the same background color and white text color.
And I would add another Link Ad below "Welcome to The Online Slang Dictionary", this time with a white background.
Then I would remove the big vertical ad and place it in a box similar to the "Subscribe to updates" and "Bookmark or share" boxes and label it with maybe "Other Resources" -- after all Google text ads are relevant :P
I actually used to have the big vertical ad ("wide skyscraper") in the right sidebar (where the "Subscribe to updates" etc. boxes are) but it performed terribly. Putting it where it is now increased clickthroughs by something like 10x.
If it were a very cool clustered, load-balanced setup with analog monitoring gauges set up in a nuclear bunker, then I'm guessing we would've heard about it by now :) So it's probably just your good old boring rack server.