
Hi saosebastiao,

As Pat points out, we definitely look forward to implementing some more computationally intensive request types in the future. This round does include the first server-side template test. We'd like to hear the community's opinions about more tests.

That said, I feel most of the frameworks' implementations of the existing tests are not cheating. Our objective in this project is to measure every framework with a realistic, production-style implementation of the tests. No doubt there is temptation to trim out unnecessary functionality and focus on the benchmark's particular behavior. We have attempted to identify any such tests that remove framework features to target the benchmark as "Stripped," and those can now be filtered out of the list.

In other words, our aim is that the implementation of each framework's test is idiomatic to that framework and platform. And if that's not the case for a test, we want to correct it.

Your concern could be clarified by pointing out that framework authors may be tuning up their JSON serialization, database connection pools, and template processing in order to improve their position on these charts. And, to be clear, I have already seen evidence of that in my interaction with framework authors. To that concern, however, I would say: That is awesome. I want those features to be fast.




I would like to pile my thanks onto this list as well. I'm the author of Phreeze and I can say that I'm grateful that fairness is being encouraged. There is certainly glory in ranking well on any benchmark and I have to admit, as I was implementing the tests in Phreeze, I saw many opportunities to "cheat." For example, skipping the framework routing, not using the "proper" way to communicate between the layers, etc., and substituting things with "raw" code would have had the potential to skew the results. I feel that would be missing the entire point of a benchmark, so I'm glad that is being considered.

I can also say that this benchmark inspired me to take a hard look at class loading and I was able to make some improvements to the framework's efficiency in general. So, in a way, I did some tuning - not for the benchmark, but rather as a result of the benchmark. Thanks to this benchmark all Phreeze users will gain a little performance.

I would also like to suggest a test idea. I think the biggest challenge for frameworks comes into play when you have to do table joins. Something like looping through all purchase orders and displaying the customer name from a second table - that would be a very real-world type of test. I think foreign-key queries are more telling about an ORM than a single-table query.
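
To sketch what such a test might exercise, here is a rough Python illustration using the standard sqlite3 module (the schema and names are hypothetical, not part of the benchmark):

    import sqlite3

    # Hypothetical schema: purchase_order carries a customer_id foreign key
    # into customer. Rendering the customer name next to each order forces
    # an ORM to either emit a join or issue per-row lookups.
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE purchase_order (
            id INTEGER PRIMARY KEY,
            customer_id INTEGER REFERENCES customer(id),
            total REAL);
        INSERT INTO customer VALUES (1, 'Acme'), (2, 'Globex');
        INSERT INTO purchase_order VALUES (10, 1, 99.50), (11, 2, 12.00);
    """)

    rows = conn.execute("""
        SELECT po.id, c.name, po.total
        FROM purchase_order AS po
        JOIN customer AS c ON c.id = po.customer_id
    """).fetchall()

    for order_id, customer_name, total in rows:
        print(order_id, customer_name, total)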

Thanks again!


Jakejake, perhaps I've said it before, but it bears repeating: your reaction to and participation in this project has been precisely the kind we hoped we would see (but weren't sure we actually would in practice). Thank you very much for joining in and having fun with it. It sounds like you've been able to get some increased performance from your tuning, and I hope you don't mind us feeling a little bit of pride in having inspired that.

Some readers may feel we are attempting to paint some frameworks in a poor light. Yes, we do have favorites, but we are absolutely intent on keeping this open and fair. If we're doing something wrong, help us fix it! A pull request is very happily received.

When I read reactions of that sort, I selfishly want to point the author to Jakejake's comments to demonstrate how awesome it is to see a framework improving. Speaking of that, I want to eventually have the ability to show performance over time (e.g., compare Round 1 to Round X) as a potentially interesting illustration of a framework's intent to improve performance.

Also, thanks for the idea for a future test. That sounds like a good one.


Can you share with us the tuning you did with class loading, for instance? Thanks for your comment.


Oh sure, nothing too complicated. Basically I just happened to notice that I loaded several classes that were not always needed. I was able to tune up the framework to load some of them on demand instead.

One example is that the framework loaded a lot of MySQL classes whether or not you do a DB query. So, now I wait to initialize the DB stuff until after you make a call that requires it. Phreeze has always been lazy about opening the DB connection, but now it's even lazier and doesn't even load the classes until you need them!
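
Roughly, in Python terms (Phreeze itself is PHP, so this is only an analogy and none of these names come from the framework), the pattern looks like this:

    import sqlite3  # stand-in for whatever DB driver the framework uses

    class LazyDatabase:
        """Defers connection setup until a query is actually issued,
        so requests that never touch the DB pay none of the cost."""

        def __init__(self, dsn):
            self.dsn = dsn
            self._conn = None  # nothing opened yet

        def _connection(self):
            if self._conn is None:
                # The expensive work happens only on first use.
                self._conn = sqlite3.connect(self.dsn)
            return self._conn

        def query(self, sql, params=()):
            return self._connection().execute(sql, params).fetchall()

    db = LazyDatabase(":memory:")
    # A request handler that never calls db.query() never opens a connection.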

There were some other utility-type classes like XML parsing and such that probably don't even get used much. So that is lazy loaded now too.

For a non-DB request I was able to get it down from about 37 loaded files to around 20. For a DB request I think it's still around 30 files, but I definitely consider that a performance improvement. The benchmark led me to scrutinize what is being loaded, so I think it has already improved the framework.


The logic you have previously posted on HN for these benchmarks is that they measure the minimum overhead available on the platform, so that you cannot get faster than the benchmarked numbers. If a framework is too slow, the framework-chooser can exclude it from consideration, because the resulting project just can't be any faster than the framework's benchmark numbers. Sounds reasonable.

Except now it is clear that you are refusing optimizations for some frameworks due to a vague, aesthetic judgement of 'stripped', which means you actually aren't measuring the minimum framework overhead. You are measuring the overhead of the defaults, or the overhead of not taking optimization seriously, with large amounts of performance left on the table. Worse, selectively applying optimizations means you are comparing one framework's defaults to another framework's minimum overhead. And since you have abandoned minimum overhead, there is little sense in measuring performance independent of normal first-resort tactics like caching (who is running Cake without caching?).

If you were going to do that, you should have benchmarked defaults right down the line and allowed a full, normal range of simple deployment optimizations. Instead we have selective optimization and totally unrealistic deploys, so it really indicates very little.


Hi Pekk,

I'm not sure where you get the impression that we are refusing tuned tests (what we call "Stripped" tests). We have accepted two of those and would accept further tests of that nature. An implementation of course still needs to work and meet the obligations of the test scenario. For example, each row must be fetched from the database individually and the response must be serialized JSON. We did "reject" one test that fetched all 20 rows using a WHERE IN clause, but that implementation was quickly reconfigured by the submitter to match our specification.
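
To illustrate the distinction with a rough Python sketch (the World table and randomNumber column mirror the benchmark's test schema, but the code itself is not from the project):

    import random
    import sqlite3

    # Minimal stand-in for the benchmark's World table.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE World (id INTEGER PRIMARY KEY, randomNumber INTEGER)")
    conn.executemany("INSERT INTO World VALUES (?, ?)",
                     [(i, random.randint(1, 10000)) for i in range(1, 10001)])

    ids = [random.randint(1, 10000) for _ in range(20)]

    # What the specification asks for: each of the 20 rows is fetched
    # with its own query against the primary key.
    rows = [conn.execute("SELECT id, randomNumber FROM World WHERE id = ?",
                         (i,)).fetchone()
            for i in ids]

    # What the rejected submission did: one round trip pulling all 20 rows.
    placeholders = ",".join("?" * len(ids))
    batched = conn.execute(
        "SELECT id, randomNumber FROM World WHERE id IN (%s)" % placeholders,
        ids).fetchall()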

We are expressly not including reverse proxy caches in these tests. We're not benchmarking the performance of the nginx proxy cache, Apache HTTPD's proxy cache, Varnish, or anything similar. You can find such benchmarks elsewhere. We are benchmarking the performance of the application framework for requests that do reach the application server. The tests are intended to be a viable minimum stand-in for application functionality in order to fulfill requests that, for whatever reason, reach your application server.

If the scenario is difficult to conceive, imagine your site cannot leverage a proxy cache because every request is providing private user information.

To be clear: none of the frameworks are being tested with a front-end cache.

Presently, none of the tests use a back-end cache either, but future rounds will include tests of back-end in-memory and near-memory caches.


I think quite a few of these frameworks were tuned for this benchmark but are not marked as stripped.

For example, Yesod has client sessions and logging disabled. I'm also sure that quite a few other frameworks have logging disabled.

Does that not count as "stripped" since it deviates from the norm for deployment?


Hi Apkdn,

These are very good points you bring up and I will need to address them in the site's FAQ in addition to this response. I would appreciate any follow-ups as I am open to revising the opinions I include below.

First, if there are any specific examples of frameworks that have been mis-characterized, I would appreciate it if we addressed each individually as a GitHub issue. For example, I will create an issue to discuss the Yesod test and its session configuration [1].

Here is our basic thinking on sessions. None of the current test types exercise sessions, but session functionality should remain available within the framework in case future test types make use of it.

If a particular test implementation/configuration has gone out of its way to remove support for sessions from the framework, we consider that Stripped. If session functionality remains available but simply isn't being exercised because the test types we've created to date don't use sessions, then at least with respect to sessions, that is Realistic.

Logging is an important point that we need to address. We intentionally disabled logging in all of the tests we created and will need to be careful to review the configuration of community-contributed tests to do the same.

You're correct, disabling logging is not consistent with the production-class goal. So, why did we opt to disable logging? A few reasons:

* We didn't want to deal with cleaning up old log files in the test scripts.

* We didn't want to deal with normalizing the logging granularity across frameworks. (Or deal with not doing so.)

* In spot checks, we didn't observe much performance differential when logging is enabled.

We're not immovable on logging, however, and if there is sufficient community demand, we would switch to leaving logging enabled [2].

[1] https://github.com/TechEmpower/FrameworkBenchmarks/issues/25...

[2] https://github.com/TechEmpower/FrameworkBenchmarks/issues/25...


I fully understand why logging is disabled. What I am pointing out is that the numbers you see are probably not indicative of a framework's production performance. I realize that logging adds another variable to the mix, but in my opinion it is something worth knowing, as it gives an idea of the actual performance of a framework. And on the contrary, I find that logging impacts performance noticeably depending on the implementation and granularity. I also think that cleaning up the logs on server shutdown should be fairly trivial. That said, the cons you listed do make a fairly compelling argument for disabling logging.

As for sessions, I just used Yesod as an example, but it applies to all frameworks and other "middleware" as well, and this is something I am mixed on. Some platforms do not support any middleware at all, so should these be classified as "stripped" or "barebones" as well? What I'm getting at is: is this really a fair comparison? From a glance at the benchmark page, it is not apparent which frameworks have which configuration or feature if you're not familiar with the framework itself, and it can get really complicated. I think labeling the frameworks in terms of size is a huge step in the right direction, but my belief is that more information is needed.




