I'm happy that I received so much feedback, thank you all!
However, I see a few comments that missed the point a bit, and this is probably my fault because I did not specify something very important:
Once you substitute not just a hello world template but a few templates N times (since you have N comments on the page), performance gets so slow that you can't serve more than about 10 requests per second, just because template substitution is so damn slow if you don't preload templates, and the obvious way to do the substitution is calling :erb (you see this in all the Sinatra code around, in the examples, and so forth).
10 requests per second, instead of the 1000 you get even with 'N' includes if you perform a faster substitution, preload the templates, or use any other trick, is a huge difference.
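To make the difference concrete, here is a rough sketch in plain Ruby ERB (not the code from the post; the template path and the comment object are made up) of the two approaches, re-parsing the template for every comment versus compiling it once and reusing it:

    require 'erb'

    # Naive path: read and re-parse the template for every comment rendered.
    # With N comments per page you pay the parse/compile cost N times per request.
    def render_comment_slow(comment)
      ERB.new(File.read('views/comment.erb')).result(binding)
    end

    # Preloaded path: parse the template once at boot, then only evaluate it.
    COMMENT_TEMPLATE = ERB.new(File.read('views/comment.erb'))

    def render_comment_fast(comment)
      COMMENT_TEMPLATE.result(binding)
    end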
I don't get your article. You're yak shaving, bothering with meaningless issues.
If you want raw rendering speed, you don't choose PHP or Ruby or any other high-level language. You go straight for C/C++ instead -- that's what companies like Google do; that's what Adobe did for Photoshop.com; that's what Yahoo does too for various web services they provide.
We pick PHP, Ruby, Python or whatever because there's a point of diminishing returns. For our apps 10 extra requests per second do NOT matter.
1000 reqs per second do matter, but for a real-world app, not for a dumb hello world that doesn't do anything -- and even that may be insufficient, as depending on your use case you may need 100,000 requests per second out of a single server.
AND performance on the web == page rendering time, not requests per second. It gets pretty damn hard to have 300ms page loads, and IMHO this is where template rendering could make a difference, but again, not that much.
"you don't choose PHP or Ruby or any other high-level language".
Why? There is no technological limit: template substitution should be fast, and with minor coding effort it can be fast. It's just lame code. We are not talking about algorithmic code here.
Probably language speed is not involved at all; it's something as lame as the framework reloading the template at every page view. And even lamer than that, given that even my test, which loaded the template every time, was faster.
So, how much do you spend on web servers (not including databases) per week? You could just multiply that by 10, or you could try to overhaul Ruby templating. It's a boring problem.
Databases are much more interesting. You simply can't scale databases up trivially. An N-times linear improvement in the code (due to, say, caching) means you don't have to throw N^2 servers (if that's how your database scales) at the problem.
Linear load reductions have superlinear cost reductions for things like databases. Linear load reductions have linear cost reductions for web servers. This is why people don't care so much.
Though yes, it would be nice to see Ruby templating a little bit faster.
In my freshman year (7 years ago) I was an efficiency and premature optimization zealot, despising anything that wasn't written in C/C++.
Then I met a hacker who told me how awesome dynamic languages were (in particular Python and Lisp). I told him "but they are some 10-20 times slower than C!". And he replied "Yes, but it takes 10 times less time to code. If the outcome is not fast enough, you can spend the rest of the time optimizing". Since then, Python has been my language of choice, relegating C++ to the optimization of the (few) bottlenecks.
Why is this funny? Because that hacker was antirez :)
It's unfortunate that he included benchmark numbers because people will discuss the numbers rather than his main point. Since we're discussing the numbers anyway, antirez, were you running Sinatra in production mode? My benchmarks on my MBP look like this:
"hello", development mode 1620 req/s
erb :index, development mode 1000 req/s
"hello", production mode 1620 req/s
erb :index, production mode 1350 req/s
I assume Sinatra is doing some kind of template reloading in dev mode which may explain the speed difference?
That said, I think antirez is right that there are lots of opportunities for making ruby faster. I'm particularly hopeful about JRuby's use of invokedynamic and all of the work that is going into Rubinius.
The thing is, once you're able to scale out horizontally then you're back to riding the Moore's Law cost curve. If my ruby app requires 4 machines now, it'll require 2 machines in two years. Should I pay the up-front cost now of writing it in something lower-level? It depends, but in lots of cases probably not.
Hello Mnutt, I'm running the test on an 11" MBA, which is very, very slow; this is why these numbers are so different. The MBA is a good development machine, as everything seems slow even if it is just a bit slow :)
Btw the PHP code is reloading the template at every request for sure. I pasted the example code I used on the blog.
Sorry, I just posted numbers to show the difference between dev mode and production mode.
I guess what I don't understand is: if you're running a site in development, isn't 250 req/s enough? And if you're running in production, do you need to reload the template on each request?
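For reference, here is a minimal sketch of forcing the cached behaviour in a classic-style Sinatra app (assuming a Sinatra version that supports the reload_templates setting and an existing views/index.erb):

    require 'sinatra'

    # Either start the app with RACK_ENV=production, or force the settings here.
    set :environment, :production
    set :reload_templates, false  # development mode re-reads templates on each request

    get '/' do
      erb :index
    end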
OK, my fault that I did not say this in the blog post.
I was playing with Sinatra, and my real page involved substituting a few nested templates for different parts of the site: news box, comment boxes, ...
The result was... 30 requests per second once a few templates started to stack, and with mostly toy code.
It is easy to see how things can get worse with just a few mistakes along this path.
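For illustration, a page like that might look roughly like this in Sinatra (a sketch, not the actual code; load_comments and the template names are made up):

    require 'sinatra'

    def load_comments
      # Stand-in for the real data source: ~50 comment-ish objects.
      Array.new(50) { |i| { author: "user#{i}", body: "comment body #{i}" } }
    end

    # index.erb loops over @comments and pulls in a partial for each one, e.g.:
    #   <% @comments.each do |c| %>
    #     <%= erb :_comment, locals: { comment: c }, layout: false %>
    #   <% end %>
    # Every one of those erb calls goes through the full template lookup/compile
    # path, so a page with 50 comments pays the template cost dozens of times.
    get '/' do
      @comments = load_comments
      erb :index   # index.erb also renders the news box and other partials
    end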
It would be interesting to see the results of this simple experiment for different web frameworks: an almost empty template and no other processing => how many requests per second? If it's much slower than the I/O time, something should probably be done about it.
This would be a much better experiment. Who builds web apps on bare languages anymore? And any decent framework will have some caching implemented, so you don't re-render the same template if nothing has changed.
"Your only bottleneck in a web app should be: the databases, the workers. There are no excuses for the page generation to be slow. In the web speed is scalability because every web server is conceptually a parallel computation unit. So if the web page generation takes 10 ms instead of 100 ms I can server everything with just 10% of the hardware."
If the database is the bottleneck, then the speed of the web page matters much less. A page with a single 50 ms query will take 40% of the hardware (60 ms versus 150 ms).
This is even less of an issue if you take into account the actual times from his test. PHP served 1500 requests per second vs Ruby's 250 requests per second, which works out to 0.7 ms per page vs 4 ms per page. Assuming again that you have a single 50 ms database query, you are looking at 50.7 ms vs 54 ms, which means you will need ~94% as much hardware. This assumes that the database and web server are on the same machine.
If one puts them on separate machines, then the execution time does not matter as long as the time it takes to render the page is less than the time it takes to query the database. Now this is bad in terms of page load, as 50 ms + 49 ms for page rendering is much more than 50 ms + 0.1 ms, but in both cases you will be able to serve the same number of requests per second. This of course assumes a multi-threaded environment, which allows one thread to sleep and other threads to run while waiting for a response from the database.
Slow template rendering only matters if your pages are heavily personalized for each user. Stick Varnish in front of the page, serve over 2,500 requests/sec, and use whatever framework you are most comfortable with.
Not all things can be cached heavily. And even for things that can, you still need to regenerate the cached data to some degree. Code that is ten times slower will require ten times more hardware even if you cache; the cache only makes the number you're multiplying by ten smaller.
This is why Facebook spent so much time and energy working on HipHop. I'm sure they have many caching layers as well, but caching only gets you so far -- eventually you have to look at your code.
Interesting how things come and go and come back. Some years ago it was Yahoo moving their web layer from C++-based templating to something higher level; now Facebook [implicitly] moves their layer down to C++ ...
HipHop engineer, here. Unless you consider programming in C++ to be [implicitly] programming in machine language, this doesn't make a lot of sense. Our application engineers write real, no-fooling PHP, at exactly the same level of abstraction as they always have. They usually don't bother compiling it either, instead iterating using our interpreter which behaves more like a drop-in replacement for Apache+Zend.
I've been looking at HipHop lately -- can you give me a general idea of how much you have to work around HipHop for it to still compile? I know that the references I've seen to it have all indicated that it might choke on certain types of code, but I'm curious as to how mindful you have to be?
If you stick to the brightly lit parts of the language, things will work. The Facebook codebase was and is enormous, so the vast majority of PHP has to work as advertised. The one big thing that's ruled out is eval, but other wacky PHP tricks ($f(), $C::staticMethod, $obj->$methodName(), foreach order, etc.) work. Outside of eval, you have to go out of your way to break it.
That said, it's a young project, and things can be a bit rocky. Including compilation in your deployment process is also a pain; do not kid yourself about that. It's kind of "industrial strength" in general; unless you care about how many PHP requests you can squeeze out of a unit of hardware, HipHop doesn't have much to offer.
This is logically just nonsense. So the PHP compiles to C++ now? The devs still write in PHP; they don't move to another language (well, they use a somewhat restricted set of PHP, but a lot of places do that anyway for various reasons).
It is as silly as saying anyone programming in a JVM language is just programming in Java, or anyone using a language which compiles to machine code is just using assembler. (implicitly of course!)
It seems a lot of people overlook ESI support in Varnish. Proper use of ESI on pages that have content sections updating at different intervals can make a large impact.
I agree, not only because I've run benchmarks myself but because simpler and faster code is possible. Just a comparison of everything Sinatra does when rendering (http://goo.gl/6U7yr) vs what Cuba does (http://goo.gl/z1Oes) can shed some light on this issue.
Antirez, why did you title the HN post "almost the same thing"? They ARE the same thing. You are 200% correct. Let's turn it around: is there an example where speed and scalability are different? Maybe the level of concurrency, which behaves differently because performance can drop suddenly when you max out, but really that's just another type of speed. Even in an async environment like node.js you may handle a lot of connections, but if there's no speed they hang around for too long waiting for something to return, so we're back to speed === scalability in all environments.
Let's turn it around: is there an example where speed and scalability are different?
Well, actually there are a lot of such examples. For instance consider the idea of caching data in shared memory. This is very fast. But the second you've done it with something that has to remain consistent across requests, you can't have 2 webservers any more. You're fast, but can't scale. (Don't laugh, I know of a number of websites that made this mistake at some point in their history. http://www.kuro5hin.org/ and http://slashdot.org/ are two good examples.)
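As a toy illustration of that failure mode (not how those sites actually implemented it), consider a per-process in-memory cache of something that has to stay consistent:

    # Fast: no network hop, no serialization. But the hash lives inside one Ruby
    # process, so the moment you add a second web server (or a second worker
    # process) each copy silently drifts out of sync with the others.
    SESSION_CACHE = {}

    def store_session(session_id, data)
      SESSION_CACHE[session_id] = data   # invisible to every other process/server
    end

    def fetch_session(session_id)
      SESSION_CACHE[session_id]
    end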
Concurrency and locking issues provide lots more examples. Having fewer locks over large things is much faster than having lots of little locks, but you get less concurrency. For a toy website, MySQL with MyISAM is fine: lock whole tables. Row-level locking as done in Oracle, PostgreSQL, or MySQL with InnoDB is much slower, but it scales.
I know that you think that this is "just another type of speed", but it really isn't. If a critical lock somewhere deep in a database is operating at 95% of maximum capacity, there is basically no visible effect on performance from the lock contention. At 101% of capacity, your database falls over. The characteristic failure mode isn't that you get slower as the request rate increases. It is that you max out your capacity and then crash if you try to go faster. I've been there, and it isn't fun.
Now with all of this said, there is a large, obvious, connection between speed and scalability. Suppose that you are comparing 2 languages, one of which runs twice as fast as the other. You can scale the slow one - just run twice as many boxes. But now you need twice as many database connections. Holding those connections open consumes resources, so you need a bigger database. And so on.
Therefore the faster environment can frequently push off the point at which you start encountering other scalability issues. But speed and scalability are not at all the same thing.
I can give you an example where speed and scalability are different.
Step 1: Create two WordPress blogs served by two Apache servers — one with KeepAlive on and one with it off
Step 2: Benchmark the speeds — you'll see that the KeepAlive one is faster
Step 3: Get a link to each blog on Daring Fireball
Step 4: Notice which server is still accessible (hint: not the faster one)
The server with KeepAlive will fulfill requests faster up to a certain number of people within a certain time span, but past that number it will simply start turning people away while the other server keeps delivering pages a little bit more slowly because Apache's KeepAlive trades capacity for speed.
At any rate, in a real production environment, there are measures you can take to make sure your site is both scalable and fast. Toy examples of poor configurations aren't very informative IMO.
Speed at scale and scalability are the same thing, but speed and scalability most certainly aren't. A server that performs really well on a single request, but slows down as requests are added would be considered 'fast', but not 'fast at scale'.
Yes, there are plenty of instances in which this is the case, and most notably, this is a very common affliction with databases. Also, even in scenarios where you can add more web servers to an application stack, you can't necessarily add more database instances, so you scale those UP instead of OUT.
In summary, the statements made in the article are true, but yours aren't really. In the case of web servers, yeah, they're almost the same thing. In the case of other things, they're generally not.
When you become profitable, it is easy to optimize/fix CPU usage and speed (or at least it is not disruptive for your entire product); it is very hard to fix the scalability part (including optimizing for memory usage).
Think about it this way: Speed is the number of requests/second a single server can process.
Scalability is the amount of computing power you lose when splitting a job amongst multiple machines.
Intuitively you can think of it like asymptotic complexity. You can have a really fast implementation of quick sort, but at a large enough input size it will be flattened by a very slow version of radix sort.
Similarly, you can have the fastest web server on the planet, but if you scale poorly, at some number of nodes even the slowest albeit well-scaling system will outperform you.
All I see is a guy who set up a naive test app in Sinatra (served by WEBrick for all we know) and found that it served fewer requests per second than a test app he wrote in PHP. Then, where most people would have tried to find the slowdown, he just skipped all that and decided that "apparently" Ruby is slow and "probably there is some lame reason why this happens." On the basis of one application with one particular set of libraries that he didn't even benchmark.
Like, if I run a C++ program off a 2X CD-RW drive in a computer with a Motorola 68000 and it's slower than my Ruby site is on my 8-core Mac Pro with an SSD, do I get to write a rant about how slow C++ is?
I can guarantee it. I was thinking of an actual computer I had, and the read speed of the drive alone was slower than the time it takes Mongrel to serve up a page with Rails.
It doesn't matter if you can serve 10,000 requests per second if the cost of that request exceeds what you're paid for it.
For most sites Ruby is profitable; the value add from Ruby is decreased cycle time between releases, which can increase the profitability of your site more than increasing the pages per second can.
The vast majority of sites don't need to scale beyond a single server, and when you do need to scale I prefer things like drop-in support for S3, drop-in support for memcached, and a whole host of other performance-increasing techniques over raw pages per second.
The benchmark covers the very simple case of a single PHP page; what real PHP app have you used that is a single PHP page? The request times for something like WordPress are insane because of the number of PHP files that need to be interpreted per request.
edit: To clarify, the gist is that low-margin activities, where increasing the speed of your page by 5x would have a great impact on your profitability, are not the areas where you should focus your efforts. Instead, focus on high-margin areas where you could write your pages in SnailScript and still be profitable.
1) AMZN and GOOG have proven that pageload speed does in fact directly relate to profitability.
2) Regarding "The vast majority of sites don't need to scale beyond a single server," you could make the same point that the vast majority of sites are not profitable either. It's a total non sequitur in either case.
3) Your point that it "costs less" to develop in Ruby, which can increase the profitability of your site, is an incorrect assertion. It may be true that developing in Ruby can decrease your R&D costs, but it does not affect your cost of goods sold and has no impact on your margins. If you just say "it costs me $100 less to develop in Ruby, thus I've made an extra $100," you need to take an accounting course.
More to the point, if you are building your business with your optimization being focused on decreasing development time, you have already lost sight of the goal.
PHP sucks, but there are a ton of profitable companies that use it because it is wicked wicked fast. In my mind, that makes it suck less. :-)
On your point #3: actually, you are incorrect. Profit is revenue minus all costs. Whether or not it counts as COGS is only relevant for gross margin, a metric which isn't the best one for judging software companies. R&D is an expense, so it affects your profitability and your net margin. If your company spends $100 less, that's $100 more you have in the bank, $100 less that you need to sell people on.
Gross margin is unquestionably the metric used to gauge software companies. That's why we are so scalable, especially SaaS: our COGS are typically so low that we can afford relatively large R&D budgets (compared to other industries).
Lots of software companies have gross margins north of 60% or even 80% (GOOG, afaik), which is unheard of in other industries.
If you focus your business on growing revenue and having a reasonable gross profit margin, you are focusing on the right knobs. If you are focusing on decreasing R&D as a means of maximizing net profit, you are focused on the wrong things.
> PHP sucks, but there are a ton of profitable companies that use it because it is wicked wicked fast. In my mind, that makes it suck less. :-)
I think in this mini benchmark, it's not so much that "PHP" is fast (as a language, it isn't, really), rather that Ruby + its various frameworks are pretty pokey.
In any case though, while I remain a Ruby fan, the grandparent sort of misses the point: by being more efficient, you have a wider range of things that can make you money. If you have to have a huge amount of resources to get some pages up, doing ads or some other low-margin activity might not even be feasible the way it would be with a faster solution.
You're actually on to my point, which is to stop doing low-margin activities and focus on high-margin activities. Low-margin activities are where writing a page in C/C++ might be a really good idea. The performance-critical parts of your site can always be rewritten in assembler if need be.
If your margin on a page request is 5%, your business is probably fucked anyway (unless you're serving billions of pageviews per month; there are only a few sites that work on this business model). I'd much rather have a business with orders of magnitude fewer requests and a profit margin on the order of 10s of thousands per page request.
What do you think 37Signals margin on a page request for basecamp is? My guess is at least 10,000%.
"You're actually on to my point which is to stop doing low margin activities and focus on high margin activities."
That's something I would certainly endorse. I didn't get that out of your post, I completely agree. Engineers who start companies often overlook this fact and focus on the entirely wrong set of problems when building their business.
My point is that out of the gate I'd rather have 2011 tools running on 1999 hardware than 1999 software running on 2011 hardware. A 2011 dev team will run circles around a 1999 dev team because their tools are better and allow them to iterate much more quickly. By the time your startup overwhelms a Pentium 3 server you should have enough profit to buy new stuff.
Good luck getting a PIII to serve Ruby on Rails content fast enough to keep people interested and cheap enough to still make a profit from advertising.
I actually tried this recently. Page load speed isn't too bad at all, but the time it takes to start up the rails stack is awful. It's unusable for dev work.
"AMZN and GOOG have proven that pageload speed does in fact directly relate to profitability."
Let's not forget that's the client-side speed, not the server-side speed. The difference is important: the time to generate HTML on the server may be just 10-20 percent of the total load time.
On the other hand, client-side optimization does help your servers: imagine if you cut from 60 HTTP requests to just 6, and did that with a proper caching policy, so your servers won't be hammered on subsequent page loads just to answer "304 Not Modified".
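In Sinatra terms, that kind of caching policy is just a couple of helpers (a sketch only; the asset path, max-age, and ETag value are arbitrary):

    require 'sinatra'

    get '/assets/app.css' do
      # Let browsers and proxies keep this for 30 days, and answer conditional
      # requests with a cheap 304 instead of re-sending the whole body.
      cache_control :public, max_age: 60 * 60 * 24 * 30
      etag 'app-css-v1'                # arbitrary version token for this sketch
      send_file 'public/app.css'
    end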
Ever since I got interested in client-side optimization (a bit over two years now) I am amazed how neglected this aspect is. I may be biased, of course.
What's stopping you from profiling your Rails app after you've finished it, before falling back to the home-built PHP solution? You always want to decrease development time; product out the door is what matters. As Windows and Linux/Unix prove, even with superior alternatives around (Plan 9, Inferno), being first and having momentum is infinitely more valuable.
As per point #3, remember that you amortize the cost of R&D over some payback period. As such, it does affect your margins.
I do agree with your point though. Decreasing development time is the means to the end of making more money out of the product and making a greater return for your (money/time) investment.
Page load speed does relate to profitability, but your language choice most likely won't move the needle so far that you notice a difference. My startup's Rails-based homepage averages ~11ms render time.
That's all well and true but honestly, how many currently unprofitable sites would become monstrously profitable if they could increase their requests per second by a factor of 5?
The study you link to shows a decrease in revenue of 5% from adding 2 seconds of load time vs. 50 ms. There are very few businesses whose costs are dominated by servers; most spend more on a single developer than on servers. Yes, Google, Twitter, and Facebook could probably do well by doubling their requests per second, but the average company's costs are dominated by employees; that's why it's called ramen profitable and not EC2 Micro profitable.
There is nothing in 99% of the Ruby slowness culture that is due to optimizing for the programmer, IMHO. Most of the time it is just a missing profiling step, or a missing design choice that would make it fast. So many things can be improved without killing any abstraction opportunity.
On the other hand, profitable or not, users don't like waiting those 200 extra milliseconds because the code is not written the right way.