Million requests per second with Python (medium.com/squeaky_pl)
452 points by d_theorist on Feb 1, 2017 | 169 comments



>HTTP pipelining is crucial here since it’s one of the optimizations that Japronto takes into account when executing requests.

https://en.wikipedia.org/wiki/HTTP_pipelining

> Of all the major browsers, only Opera based on Presto layout engine had a fully working implementation that was enabled by default. In all other browsers HTTP pipelining is disabled or not implemented.[3]

>Internet Explorer 8 does not pipeline requests, due to concerns regarding buggy proxies and head-of-line blocking.[7]

>Internet Explorer 11 does not support pipelining. [8]

>Mozilla browsers (such as Mozilla Firefox, SeaMonkey and Camino) support pipelining; however, it is disabled by default.[9][10] Pipelining is disabled by default to avoid issues with misbehaving servers.[11] When pipelining is enabled, Mozilla browsers use some heuristics, especially to turn pipelining off for older IIS servers.[12]

>Konqueror 2.0 supports pipelining, but it's disabled by default.[citation needed]

>Google Chrome previously supported pipelining, but it has been disabled due to bugs and problems with poorly behaving servers.[13]

That seems like optimizing for a benchmark.


HTTP/1.1 pipelining works beautifully for requesting web pages or other files in bulk from a single domain.

I would imagine "bots" like Googlebot make use of it.

I have been using it daily for 15 years with no problems. I mainly use netcat. This is how I digest large quantities of information as text, at a scale that would not be possible with a browser.

Most websites allow pipelining. Usual max is 100 requests per connection.
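For the curious, a minimal sketch of that kind of pipelined bulk fetch with a raw socket in Python (example.com is a placeholder host; the last request sends Connection: close so the read loop terminates):

  import socket

  HOST = "example.com"  # placeholder; real servers may cap requests per connection
  requests = []
  for i in range(5):
      conn = "close" if i == 4 else "keep-alive"
      requests.append(
          ("GET /page%d HTTP/1.1\r\nHost: %s\r\nConnection: %s\r\n\r\n"
           % (i, HOST, conn)).encode())

  with socket.create_connection((HOST, 80)) as s:
      s.sendall(b"".join(requests))  # all five requests in one write
      data = b""
      while True:
          chunk = s.recv(4096)       # five responses come back, in order
          if not chunk:
              break
          data += chunk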

But there are rare exceptions where pipelining is disabled. For example, users cannot request pages of results for 100 different queries from Google in a single connection.

Efficient. But not allowed. Always wondered why.

Advertising-filled web pages opened in a "modern browser" are expected to auto-request files from many third party domains. I will call these "conglomerate" pages for lack of a better term.

Not sure HTTP/1.1 pipelining works very well for that. Hence HTTP/2.

HTTP/1.1 benefits users like me. HTTP/2 benefits advertisers, Google, and other companies in the web advertising racket, but not sure how it would benefit users except to serve them advertising and conglomerate pages more efficiently.

As has been posted to HN before the number of outgoing connections that "modern browsers" make to third party servers upon loading a single webpage is staggering. At least, it looks staggering if you have been using the web since the early days, before web advertising grew to its current proportions.


> But there are rare exceptions where pipelining is disabled. For example, users cannot request pages of results for 100 different queries from Google in a single connection.

> Efficient. But not allowed. Always wondered why.

The likelihood is that your HTTP requests are getting routed to a backend server/instance/whatever that knows about your query, and can thus return data about that query, but that a different search would get routed somewhere else, to an instance which knows about that query specifically.
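A hedged sketch of that routing idea (the backend pool and hashing scheme are invented for illustration):

  import hashlib

  BACKENDS = ["search-1", "search-2", "search-3"]  # hypothetical pool

  def backend_for(query):
      # The same query always hashes to the same backend, which is the
      # one holding cached state for that query.
      h = int(hashlib.sha1(query.encode()).hexdigest(), 16)
      return BACKENDS[h % len(BACKENDS)]

Since one pipelined connection can only reach one backend, 100 different queries on a single connection simply don't fit that architecture.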

> Advertising-filled web pages opened in a "modern browser" are expected to auto-request files from many third party domains. I will call these "conglomerate" pages for lack of a better term.

> Not sure HTTP/1.1 pipelining works very well for that. Hence HTTP/2.

> HTTP/1.1 benefits users like me. HTTP/2 benefits advertisers, Google, and other companies in the web advertising racket, but not sure how it would benefit users except to serve them advertising and conglomerate pages more efficiently.

I think you misunderstand how HTTP/2 works.

HTTP/1.1 pipelining allows you to send five requests to a server, in order, and receive five responses in order. Very straightforward.

HTTP/2 allows you to send any number of requests to a server, in any order, and for the client or server to provide a priority for each of them, and for them to download out of order or in parallel.

What doesn't change between the two is that you're still connecting to one specific server, the same as you were before in HTTP/1.x, and not any number of third party servers, for any reason, in any context. It has zero impact on third-party services like ad networks, trackers, or the like.
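To make the difference concrete, a sketch using the third-party httpx client (http2=True requires the httpx[http2] extra; example.com is a placeholder):

  import asyncio
  import httpx

  async def main():
      # One HTTP/2 connection; the five requests are multiplexed over it
      # and may complete in any order.
      async with httpx.AsyncClient(http2=True) as client:
          urls = ["https://example.com/page%d" % i for i in range(5)]
          responses = await asyncio.gather(*(client.get(u) for u in urls))
          for r in responses:
              print(r.http_version, r.status_code)

  asyncio.run(main())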


Let me clarify a few things about my usage. I like the responses received to be in order, and groups of say 100 responses to be separated by a header (these can act as delimiters for my text filters as I import into my databases). I have no need for compressed headers, especially huge cookies, because I have no need for cookies. Nor have I any need for preferential serving of my requests, because I only want what I ask for, not what the server thinks I need. Finally, I am not doing interactive "browsing" or "user experience" and do not need a "warm" connection for "pushing" anything to the server. I have no need for Javascript, CSS and other window dressing. I am requesting information, preferably 7-bit ASCII, in bulk. Boring and old, but useful. There is no room for advertising, garnering "impressions", etc. This is www information retrieval. For this usage, ye olde HTTP/1.1 pipelining works well.

The companies/organizations behind HTTP/2 rely on advertising. If HTTP/2 did not benefit the business of serving ads (i.e., serving very heavy pages which are heavy because they are full of ads), then I am not sure why it was introduced. It stands to reason that pages without ads, devoid of "interactive features" to entice users to click, and served without large cookies for tracking, should not need HTTP/2. But I am not an expert on HTTP/2, and I am not running a business that needs to serve web pages with ads. I am just an ordinary dumb web user who prefers plain text, likes HTTP/1.1 pipelining, and does not like ads and other web cruft.


So basically you have a very nonstandard workflow, and expect the protocol to be optimised towards your needs?


With all due respect for your comment, that is not how I see it. I adjusted my "workflow" to what was available for the past decades: HTTP/1.1 pipelining. It has worked for me with no problems. And I am sharing this experience with others.


As soon as I read "HTTP/1.1" and "HTTP pipelining" I was thinking the same. We want real-world performance, tested with the real-world browsers out there. And HTTP/2 (not 1.1) is the way forward.


The client isn't always a browser though.


But it is often a proxy, and proxies are both clients and servers. The number of misbehaving proxies is large enough that it isn't reasonable to turn on pipelining globally, which is why browsers don't.


True, but that's not directly relevant to the observable problems with pipelining. Any client implementing the http/1.1 pipelining semantics will run into the same issues because of the limitations built into the design.


If you go like this:

client ---(HTTP/2)---> nginx ---(pipelining)---> japronto

Would that not take benefit of japronto's optimisation?


Correct me if I'm wrong, but nginx doesn't support pipelining outbound network requests.

Even if it did, the benefits described in the post (mostly due to making fewer syscalls and having fewer packets to parse) go away because now nginx has to do the work instead of Python. There's probably also some overhead required to collect multiple requests together to pipeline them. You'd just be moving the load around.


I would imagine that nginx would probably do a better job of web serving (pipelineless and otherwise) than Python would, mostly due to its age and maturity.

And if the nginx process somehow ends up using more CPU than the Python process, then running multiple nginx servers would be fairly easy, compared to multiple application servers.


Would it make sense to optimize for HTTP/2 then?


It'd probably be easier to put a reverse proxy in front of it that speaks HTTP/2 and reverse proxy the requests using HTTP/1.1 with pipelining enabled.


I guess the problem there is client support. The server still has to scale in the same way when legacy clients are connected, too.


That makes little sense. Of course it wouldn't scale in the same way for legacy clients that can't be optimized.


HTTP/2 is a big step forward, but I wish web servers and HTTP clients supported HTTP/2 over cleartext TCP (h2c); TLS encryption is useless for internal microservices.


Isn't that thinking what got Google tapped by the NSA in those infamous PRISM slides? (Honest question, I don't know if there are other solutions)


Yes, and it wasn't just Google.

Client or even frontend https wouldn't have solved the problem in most cases (it is necessary but not sufficient).

The database (or key value store, etc) replication stream is a prime target, as are any backend protocols that let edge data center storage proxy internal requests for core data centers.


> TLS encryption is useless for internal microservices

This is very much untrue, especially if you have multiple datacenters, run in the cloud, or operate out of a colo facility.


Funny, TLS + internal CA + multiple roots is how my microservices determine who is allowed to talk to whom.


You probably don't want clear text. A number of major ISPs (and proxies) poorly implement HTTP. They are either buggy or intentionally modify your connection. There was a blogpost by Youtube a while back about how turning on SSL actually increased the speed and reliability of video serving.


Can you elaborate?


I would interpret it as:

a) HTTP/2 is often only supported in encrypted mode (h2). E.g. all browsers only supported the encrypted version, and lots of server side libraries then went the same route.

b) encryption might not be needed if the services already communicate with each other in a secure environment (e.g. are located on the same host, run in an encrypted/secure network, etc.) In such environments the additional encryption at the HTTP level will only show its downsides, like lower performance and additional deployment/maintenance concerns (certificate deployment, etc.).

I'm no expert in cloud/microservice environments, so I don't know what the usual configuration there is. But in general I would agree that there are some environments where HTTPS is simply not needed, like using it for localhost IPC on trusted systems.


I'm just wondering how safe an optimization pipelining actually is. Let's say a browser has to issue 2 HTTP requests and decides to write them to a single connection using pipelining. Now the server decides that the first one requires a large response (file download) or even an infinite response (server-side streaming) -> the second response is stuck in that case. And the browser might not even be able to safely repeat it on another connection if it doesn't know whether the request is idempotent. Or is pipelining only allowed for GET requests?


Pipelining doesn't specify which requests are okay to be pipelined and which aren't, but pipelined requests should be idempotent: GET and HEAD requests should be assumed to be idempotent, while POST requests should not.
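As a client-side check that rule is tiny (the method set follows RFC 2616; the function name is just illustrative):

  # Methods RFC 2616 defines as idempotent; only these are safe to
  # pipeline, since a failed pipelined request may have to be resent.
  IDEMPOTENT = {"GET", "HEAD", "PUT", "DELETE", "OPTIONS", "TRACE"}

  def safe_to_pipeline(method):
      return method.upper() in IDEMPOTENT  # POST stays out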


I wish there was a header a server could send to declare it supports pipelining.


"HTTP/1.1"


Yeah. More like I wish implementors would follow standard.


Pipelining support is optional. It's up to the client to back off if its attempts fail.

  8.1.2.2
  Clients MUST also be prepared to resend their requests if
  the server closes the connection before sending all of the
  corresponding responses.
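A sketch of what that obligation means for a pipelining client (conn_factory, send_all and read_responses are hypothetical helpers, not a real API):

  def fetch_pipelined(conn_factory, requests):
      # Resend whatever the server never answered before it closed the
      # connection (RFC 2616 section 8.1.2.2); a real client would also
      # cap retries and back off.
      pending = list(requests)
      responses = []
      while pending:
          conn = conn_factory()        # fresh connection
          conn.send_all(pending)       # pipeline the remaining requests
          got = conn.read_responses()  # whatever arrived, in order
          responses.extend(got)
          pending = pending[len(got):]
      return responses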


> HTTP/1.1 conforming servers are required to support pipelining. This does not mean that servers are required to pipeline responses, but that they are required to not fail if a client chooses to pipeline requests.

http://www-archive.mozilla.org/projects/netlib/http/pipelini...

8.1.2.2 that you quoted [RFC2616] continues:

> A server MUST send its responses to those requests in the same order that the requests were received.


> That seems like optimizing for a benchmark.

Perish the thought, we've never seen that before in claims about which languages are fastest


The project is very interesting but the benchmark results must be completely wrong.

On my machine, using https://github.com/rakyll/hey to test:

1 concurrent request: node.js does ~8000 req/second, go does ~9000, japronto does ~10000

10 concurrent requests: node.js does ~24000 req/second, go does ~45000, japronto does ~55000

Nowhere near a million. That's without pipelining though. Can pipelining really add SO MUCH performance?

UPDATE: tried wrk with pipelining. japronto does 700000, golang 150000, node 35000. Holy shit, pipelining is epic.

But the benchmark is not fair when only one server supports pipelining.


I'd rather stick with Golang here because it's not hiding anything. Everything in the Go stack is written in Go, and all your http handler (business) logic is written in Go.

With this mix of C and Python it's impossible to tell what will happen to the system's performance when you actually include significant amounts of business logic written in Python on top of the C framework here. The author even makes a point about how little actual Python code and/or data structures are in use. If use of the host language is discouraged in the framework, how can I trust the performance of the framework with my code written in the host language?

I bet if you put anything non-trivial on top and try to connect with real-world http clients which mostly don't use pipelining, it will fall over hard. Enjoy debugging what you can't understand.

The emphasis on http parsing using SSE intrinsics is odd, as http parsing is rarely a bottleneck. I'm not saying it can't be, but that'll only be when you've got the rest of your stack highly tuned, performance is predictable, and profiling has shown that http parsing is your actual bottleneck. Even so, http/2 in theory alleviates this bottleneck and you don't have to use processor-specific intrinsics, as well as providing a better solution to pipelining problems with its multiplexing of requests.

EDITED: "how can I trust the performance of the framework with my code written in the host language?" Original text lacked emphasized addition.


Python always made it possible to use C, that's a design feature. It's ridiculous to characterize this as untrustworthy.


I only meant trustworthiness in terms of predictability of performance.

The point was that Go's http stack and standard library and runtime being written all in Go gives me confidence that any Go code I write on top will enjoy the same performance characteristics and that there will be very few surprises. It's predictable due to its uniformity.


I assume your OS and hardware firmware are also written in Go...


Performance is not a predictable quantity at macroscopic (application) scales.


This is also why I am looking into more analytics work in Go. Uniformity of performance across different data analysis processes.


I wouldn't say untrustworthy, but it's definitely misleading to say Python achieves this performance when it's actually calling into C for most of it.


It may be misleading to imply that this is pure Python, but if the C is encapsulated well enough that someone can write their server solely in Python then it's easy to argue that this is Python. After all, Go has bits that are written in assembly, but we don't have to qualify every Go benchmark with an asterisk and a footnote to mention that.


Well, I guess that's a good argument. Though I feel like unless it's part of the base language implementation, you'd have to consider it an extension to the language.

So CPython cannot in fact deliver that performance on its own. But CPython can be extended through Japronto to achieve it. I'd say it's still misleading, since casually, when we refer to Python, we mean CPython.

Go's primary implementation provides all the goodies needed for fast performance. There could be a slow Go compiler which didn't, but when you say Go, it refers to Google's implementation.

You could argue that, at least, CPython makes it easy to extend the language with C, so Python language extensions are trivial to use compared to some other languages. I'll give you that.


That's how scripting languages have always worked; it's typically the reason one language is considered a scripting language while another, like Go, is not. I tend to like the Python arrangement more because it's about as high-level as you can go, and even pure Python performance suits me. But they're both good setups.


Many parts of the stdlib in Python are written in C, so C and Python already work closely together.


Yes, because the real bottleneck is TCP. Pipelining reduces the number of packets sent. However, very few clients actually support pipelining, so it's almost useless.


Also, it's measuring the number of static responses to a single client. I guess there probably is a use case for this, but a more typical situation is many clients making one or a few requests each, and also doing the TLS handshake.


Yeah, HTTP/1.1 pipelining is useless. But the good news is, HTTP/2 multiplexes requests (and doesn't have the head-of-line blocking problem). And HTTP/2 is supported by modern browsers.

But anyway… HTTP/2, or 1.1 pipelining for that matter, is usually terminated at the reverse proxy level. So it's really not necessary in a web framework! Just makes these unfair benchmark results possible.


I think that their benchmark isn't pushing the server hard enough (and having one "hey" requesting system may not be enough either?). And that if it were, you'd actually see the benefit of using fewer blocking OS threads.

If you look at the Plaintext TechEmpower benchmark [1], for instance, the "echo-prefork" Go benchmark hits 3.6M requests per second. fasthttp hits 2.8M. On a less powerful cloud server, fasthttp hits 850000 and echo-prefork hits 742000 (yes, fasthttp is faster on a slower system...the fun of benchmarks).

Not sure how fast your machine is, but I'm sticking with Go for my performance-critical code. As a parallel comment points out, Go is also fast no matter what code you're running, and Go's optimizations are throughout the stack, including some pretty extreme garbage collection optimizations, so when you have a complex, long-running server, GC won't be killing latency at a crucial time.

[1] https://www.techempower.com/benchmarks/#section=data-r13&hw=...


I measured single thread because the author measured single thread.


I missed that fact.

In that case, the author is measuring something that's completely useless. The entire advantage of Go is that it has really good multithreaded asynchronous behavior.

A benchmark that's measuring single thread performance of a task that's optimized to do well in a single thread (i.e., it doesn't actually do anything other than return static text) is entirely worthless.

If you're going to be returning static text, may as well compare to Nginx, which I'm certain can return static text even faster. If you're going to be doing processing in Python, then do at least something.

And run in multiple threads, since that's Go's native environment.

Rating: PANTS ON FIRE. [1]

[1] Not your comparison, but the original author's performance claims.


The author doesn't care about the advantages of Go; he cares about a fair comparison to his project, which is single-threaded Python.


> But the benchmark is not fair when only one server supports pipelining.

Anyone know why Go went from 45k req/s to 150k when pipelining was enabled, without supporting pipelining? Or is your last run with yet more concurrent requests?


'coz enabling pipelining for the client has some gains (all requests in one packet). Even if the server won't pipeline.


But isn't the flow then:

  1) client: gimme 10
  2) server: here, have 1
  3) client: ok, gimme 9!
  4) server: here, have 1
  ...

?


No. Client sends one packet containing the 10 requests. Server sends 10 packets. Basically, the client avoids 9 extra round-trips and sits and waits for the ten replies. However, this whole thing has many cons; that's why it's not really used anywhere.

It's more like this:

    1) client: gimme 10, gimme 9...
    2) server: here, have 10
    3) server: here, have 9
    ...
Edit: There is not always a 1:1 relationship between packets and requests/responses, but it's easier to explain that way. You end up sending fewer packets when pipelining, plus you don't need to wait for a response before sending the next request, which also saves time (i.e. the client has sent every request, and the server doesn't have to wait for the client to receive one response before sending the next).
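A back-of-the-envelope illustration of the saving (numbers are invented; serialization and server time are ignored):

  rtt = 0.050           # assume a 50 ms round trip
  n = 10                # requests
  sequential = n * rtt  # one request per round trip: 0.50 s
  pipelined = 1 * rtt   # one batch of requests:      0.05 s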


Have you tried running the benchmark using Gatling? This load testing tool seems promising and more accurate than others I've used.


ITT people complaining that the underlying library is written in C.

Who cares? The author wrote the C code and Python bindings for his library so we can _all_ benefit from handling more requests per second just by writing Python code and using his framework.

I appreciate the work and will be exploring its potential use. Thank you squeaky_pl for writing it.


It matters because all the heavy lifting is done by C and has little if anything to do with Python itself.

You could give the same treatment to virtually any other host scripting language (Ruby, Lua, PHP) and then make outlandish claims that that host language is now capable of ridiculous things when it had nothing to do with the alleged success other than that it exposes a FFI that lets you create something like this.


He never claims anything to the contrary.

"All the techniques that were mentioned here are not really specific to Python. They could be probably employed in other languages like Ruby, JavaScript or PHP even. I would be interested in doing such work as well but sadly this will not happen unless somebody funds it."


> You could give the same treatment to virtually any other host scripting language (Ruby, Lua, PHP) and then make outlandish claims that that host language is now capable of ridiculous things when

Making extensions in C and FFI are a feature of Python. Should we then say that the scientific community is not really using Python because it's using all the C modules in numpy and other optimization libraries to do all the fast work?
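For example, a sketch where the hot loop runs entirely in compiled C, yet everyone calls it Python:

  import numpy as np

  a = np.arange(10**7, dtype=np.float64)
  total = a.sum()  # the summation loop runs in C, not in the interpreter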

> You could give the same treatment to virtually any other host scripting language (Ruby, Lua, PHP) and then make outlandish claims that that host language is now capable of ridiculous things when

Nothing wrong with it. Then you'd have a fast framework in that language too. You can have 3 fast frameworks in 3 different scripting languages. That is totally ok.


Your same argument can be applied to virtually any other language:

This ruby benchmark doesn't matter because ruby is written in C!

This nodejs benchmark doesn't matter because v8 is written in C/C++!

etc.


> Your same argument can be applied to virtually any other language

No it can't. Each of those languages you listed (as well as Python) has its own runtime system designed to implement the desired semantics and features of the language. When writing a C extension to one of these runtimes, you're escaping the confines of the runtime system in order to improve performance or capabilities not native to the host runtime.

The point isn't "because it's written in C"; the point is that it's not written in Python and therefore the results of the benchmark cannot be attributed to Python, to say nothing of the merit of the benchmark in the first place.


The article mentions that the framework avoids creating Python data structures (e.g. a dictionary for the HTTP headers) to improve benchmark performance. It's a clever trick, but I am unconvinced that a less-than-simple application will demonstrate performance anywhere close to what was claimed in the article.


I think his argument is that YMMV as you add more Python code on top of this native-heavy benchmark.


ITT: Web developers being surprised by the meaninglessness of microbenchmarks.


No, we're not surprised by microbenchmarks. We're surprised by the amount of investment in trying to improve the foundations of toppling-over buildings when that investment could be better spent elsewhere.


Go is written in Go.


That's how it has always been. The whole aspiration of Pythonic code is to make it look like business logic, and you can't do that without abstracting things away. This is how, for example, uvloop adds performance when used with asyncio.

I was able to do network streaming (including live encoding) of audio and video in Python. Do you think that would be possible if it were native Python code? Of course not; I used GStreamer, which does that in C. My objective was not to write an encoder but to stream the video, so as far as I am concerned my code was in Python.

One of the big strengths of Python is that when you need to improve performance, you just locate the bottlenecks and rewrite them in a faster language. This is especially great for code that is not your primary objective: in a project like this, you as a developer care about what your code returns for specific requests; you don't really care about how the service handles each request. This is what a high-level programming language is about.


>It matters because all the heavy lifting is done by C and has little if anything to do with Python itself.

Except it's all exposed via a Python API that can be easily installed with pip, and you can use it without even realizing that you're calling C.

This is how python was always supposed to be optimized.


Well, it's certainly great work, but I still can't shake the feeling that "a million RPS with Python" and "a million RPS with C code launched via a Python wrapper" are slightly different things. Not that the latter is not good, but it's not as surprising as if it were actual Python code.


A million requests per second in Python, if you write all the code that does work in C.


You use it in Python. OP writes the C code for you.


As soon as you do anything other than return 'hello, world' in Python, your performance will drop dramatically. Just parsing a tiny JSON string a million times in Python takes 7s (i5-2467M).

  $ time python -c "import json;
  for i in range(0, 1000000):
      json.loads('{\"data\": \"hello, world\"}')"

  real	0m7.055s
  user	0m6.951s
  sys	0m0.063s
It is no use optimising one part of a system if other parts take 99% of the time.
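That's Amdahl's law; a quick sanity check (the 1% figure is illustrative):

  p = 0.01               # fraction of total time spent in the HTTP layer
  speedup = 1 / (1 - p)  # best case if that layer becomes free: ~1.01x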

As it stands, the title of the post would be more accurately stated as 'million requests per second with C'.


json:

    $ time python -c "import json;
    for i in range(0, 1000000):
          json.loads('{\"data\": \"hello, world\"}')"

    real	0m3.142s
    user	0m3.098s
    sys	0m0.034s
ujson

    $ time python -c "import ujson as json;
    for i in range(0, 1000000):
          json.loads('{\"data\": \"hello, world\"}')"

    real	0m0.444s
    user	0m0.399s
    sys	0m0.028s
Like with many things, it's a question of being smart about your choice of libraries.

Also, just because I was curious:

    $ time pypy -c "import json;
    for i in range(0, 1000000):
          json.loads('{\"data\": \"hello, world\"}')"

    real	0m0.422s
    user	0m0.386s
    sys	0m0.032s


If you use a C library and have hardly any Python code running it obviously goes faster. That misses the point I was making. At some point you'll have to write some Python code, and that will lower the number of requests dramatically. If you're talking millions of requests per second, you will have maybe 10s of thousands of CPU cycles per request. That doesn't buy you much in Python. Case in point: parsing a trivial json string in Python eats up your budget.
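Rough numbers behind that budget (the 3 GHz clock is an assumption):

  clock_hz = 3 * 10**9     # one assumed 3 GHz core
  rps = 10**6              # the headline request rate
  budget = clock_hz / rps  # ~3,000 cycles per request per core; a few
                           # cores gets you into the tens of thousands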

Of course, most websites aren't dealing with anywhere near that volume of requests, and then Python is fine.

PyPy is interesting, but last time I looked it didn't support C extensions and hence can't be used with the poster's library.


FWIW, you can use ujson, which is much faster than the builtin stdlib json parser as well. I've seen some larger webapps in Python do just this.


You might want to use the timeit module instead:

  $ python -m timeit --setup 'import json' "json.loads('{\"data\": \"hello, world\"}')"
  100000 loops, best of 3: 9.34 usec per loop


  $ time python -c "import ujson as json
  for i in xrange(0, 1000000):
      json.loads('{\"data\": \"hello, world\"}')"

0.291s

Well, that was an easy way to reduce 7 seconds to 0.3.


And then you get all the fun of debugging clang internals when things go wrong.


And as little actual Python code in the routes as possible...


Funny that people bother to do this when there are plenty of other languages that will handle the load without issues:

Java, Go, Rust, Scala, maybe C#, C.

The semi-famous TechEmpower benchmarks show several popular frameworks that can handle over 1M req/sec.


pretty sure Phoenix with Elixir can handle that load too.


Elixir/Erlang is really slow. The strengths of Erlang/Elixir are easy multithreading, easy distributed computing (just as easy as local multithreading), easy fault tolerance, and easy low latency. If you have to do something single-threaded and performance-critical, then you do the same thing in Erlang/Elixir as you do in Python: FFI to native code.


I wasn't implying Elixir/Erlang was fast.

My statement was that it could handle the load. Taking into account the fact that the language is built around concurrency, it would handle the load across multiple threads without a hitch.


By using C for the hot paths (e.g. HTTP parsing) and Python for business logic you can get code that is both faster and easier to maintain than, say, Java.


HTTP is not the bottleneck of anyone's application. Any interesting business logic will almost certainly cause Python to perform worse than Java, Go, etc. I'm also skeptical of the maintainability claim, because I write Python professionally.


Writing web code in C isn't my idea of easy to maintain.


If you used this framework you wouldn't be maintaining web code in C.


That's why you write in Python and use this library


The Python part would slow the whole thing down again.


Anything not written in assembly would slow things down. But very few developers write web apps in assembly these days.


>wrk with 1 thread, 100 connections and 24 simultaneous

Say what? When I do this kind of testing using wrk:

  1. I use multiple clients
  2. Each client uses a thread for every logical CPU
  3. I usually use at least 256 connections
What is actually being tested here? This smells like a test to get a desired result.

Not to mention, in regards to Go/others:

  1. He is using an RC
  2. By default it uses all cores, why handicap it?
  3. He didn't share his code for his contestants, how can I trust this?


The contestants are in the repo. One core is used "to be fair" (the Python stuff only uses one core). But the test, yeah. What's being tested is the presence of pipelining :D


>What's being tested is the presence of pipelining :D

Which is silly considering a majority of browsers in the wild now support HTTP/2.

Thanks for pointing out the repo, I missed it before: https://github.com/squeaky-pl/japronto

His Go isn't horrible, but it could be written like this:

  func hello(respWtr http.ResponseWriter, req *http.Request) {
    text, status := []byte("Hello world!"), http.StatusOK
    if req.URL.Path != "/" {
      text, status = []byte("Not Found"), http.StatusNotFound
    }
    respWtr.WriteHeader(status)
    respWtr.Write(text)
  }
Rather than: https://github.com/squeaky-pl/japronto/blob/master/benchmark...

Also, limiting Go to use a single core is unfair because the Go GC is optimized for multi-core operation.


Yeah the hard part is having customers. If you have that, PHP, 5 layers of proxies, worst case latency, who cares? The pig can be made to fly at that point.

"optimised for loopback!" - which is nobody's actual problem.

It's a cool thing though, just cynical I guess because it doesn't solve my actual problems which are all self inflicted and organisational. Thinking about deleting this comment because it would logically apply to every incremental technical advance :(


I don't think it's cynical at all. These edge-case optimizations really are just entertainment.

The author's claim that the other http servers are deficient is misleading.


This looks like a contrived benchmark to me. If that's really your use case, use nginx instead. The HTTP parser might be useful, but it's not very well documented and not super readable at first glance. I do prefer not to parse HTTP at every layer if I can, but other people seem to like it.

Along those lines, what happened to uwsgi async mode? uwsgi has been rock solid for me, with a straightforward binary framing and hot loading, and it would be great to use it for the (admittedly, few) cases where async might be useful. It seems like one of those obvious good ideas that no one uses?


How does this get to the front page? Has nobody seen this? https://www.techempower.com/benchmarks/#section=data-r13

Much more thorough than this toy benchmark article, which reads like a marketing release...


I certainly hadn't. Great site.


If you work out a way to model the performance of your own systems based on these benchmarks, then on the way you will discover what is wrong with them. Or switch to Urweb for all your future sites, I guess?


The source to all the benchmarks is available: https://github.com/TechEmpower/FrameworkBenchmarks

If you want to make your chosen platform fast you can see what they're doing in the src to get those results. The ones not marked "stripped" and with "full" for ORM are very realistic examples. Not sure where you're going with this? Nobody's real workload looks exactly like benchmarks.


I checked the benchmark code out. I have asked the author for more information. I am not sure if all the tests were limited to using only 1 core during the run. GOMAXPROCS=1 severely limits the Go version.

1 Go worker process can utilise all the cores in the server. I just want to get that clarified.


From the article:

> To be fair all the contestants (including Go) were running single worker process

That seems to imply that they were all limited to a single core.


A single worker process sounds like what it says it is. A single process containing possibly many OS threads. Just because it's a single process doesn't mean it won't use all cores. Where's the implication that process = core? GOMAXPROCS defaults to the number of cores in the system.


GOMAXPROCS is the number of cores used.


> To be fair all the contestants (including Go) were running single worker process.

This makes the benchmark results much less impressive. Sure, it's cool that this micro framework allows you to serve a million requests per second with a single process, but no one is going to be running a single process per server in production if they value their uptime.


This is a great project. However, it is not a very useful result (1M RPS in Python). Once you start layering more work into your handlers (JSON deserialization, admittedly C-optimized, business logic, IO, etc.), you can be assured that your HTTP serving layer is not the bottleneck.


I wonder how it would compare to the fasthttp library https://github.com/valyala/fasthttp

I don't suppose the hand-tuned micro HTTP parser supports h2?

Pipelining is a dead technology that was never really alive. For those who aren't familiar with it, it's based on an HTTP "hack": after HTTP/1.1 was released and the end of a response could be signaled via something other than EOF on the TCP connection, they figured you could send n requests in a row and receive n responses in a row. This was in the days when good manners required you to open no more than 2 simultaneous connections to a server.

Nobody ever really supported it, and the lack of support for out of order responses meant that any one slow request could slow down everything else. The h2 model allows at least hundreds of simultaneous requests and responses multiplexed over one TCP stream.


There is no pure Python here. It's Python with bindings to highly optimized C code. It would be fairer to say it's C that gives you almost a million RPS in some tweaked tests.


Can this be retrofitted into Flask (just the performance improvements)?

I would pay for that.. a lot (as much as I can afford). And I'm pretty sure so would a lot of others.


I'd also pay for this to happen.


I think one of the main points of writing a framework in Python or another high-level language is the ease of future extension by the community.


Where are the "ease of extension" benchmarks?

What kind of data do people look at when they want to prove their language is better, or choose the best language?


Evidently there's at least one group of developers who mainly derive their decisions from irrelevant and often incorrectly conducted micro-benchmarks.


HTTP pipelining should not be the basis for a performance benchmark. It's an unrealistic scenario.

To make the benchmark a bit fairer, add a graph of memory usage; pipelining might have a huge impact on resource usage on the server side.


When something sounds too good to be true, it probably isn't true. Even if it were written in assembly these would be dubious results. The graph alone tells me the quality of the claims.


Sorta unrelated, but the author made an interesting claim that PyPy will reach 3.5 "conformance" in 2017 in the context of talking about NumPy. Does that mean PyPy is expected to work seamlessly with C extensions like those used in SciPy or generated by Cython?


Django needs this kind of performance boost, without depending on gunicorn/uwsgi servers


Nothing will save Django for free, because if you needed that, your real problem would most likely be scaling Postgres or MySQL.

Use as many appservers as your customer base will pay you for.

Basically, cry in your beer because you're so rich and successful now that the framework won't handle your stuff.


Agreed. Scaling Django isn't difficult really. Just spin up another instance if you need to. The major problem is scaling the DB. Making sure you're not bogging Postgres down by making it have to deal with possible race conditions is difficult.


Why would you prefer to depend on this (Japronto), instead of Gunicorn or uWSGI?


I don't think those are comparable for now; uWSGI is a LOT more than a 0.1 release.


uWSGI has very decent HTTP/1.1 performance

http://uwsgi-docs.readthedocs.io/en/latest/HTTP.html


A million requests per second with Python.

Spoiler: technically, it's not "with Python".


When did anyone start to believe that languages scale?


I like the cute expression that you're referencing, but for a majority of projects, to a major extent, language choice greatly matters. If you're getting 10x performance from Go over RoR, that's roughly a tenth of the servers you need to deal with, and a lot less thinking about network concurrency.


Meanwhile, most web applications are content with 1/10th of the computing resources that are deployed for them, regardless of language or framework.


That's totally fair, but the underlying suggestion that "language choice isn't really important to scaling" is more what I was addressing.


what do you mean?


It might be a variant of the old argument that "languages aren't slow, implementations are", meaning that scalability is a property of the implementation, not the language.

In any case, this is written in C, so I'm not sure it proves anything about the Python language.


It says something about the performance that is available from the Python language. This is how Python has always been, starting from the interpreter and built-in data structures in CPython, which are written in C. That isn't cheating; that's normal, realistic usage of the tool. Why should benchmarks go out of their way to use only unrealistic Python programs just to prove that it cannot "x += 1" as fast as C? Who is surprised by that, and why would it matter? Node uses libuv and nobody blinks an eye.


I think he meant, in a slightly acid (but valid) way, that languages don't scale, but concepts and good programming techniques do.

You can scale up whatever code you want in whatever language, you just need to structure it in a way that allows for that.


I love Python and use it as the main tool at my day job but I have to call it out.

> It lets you do synchronous and asynchronous programming with asyncio and it’s shamelessly fast. Even faster than NodeJS and Go.

and then,

> To be fair all the contestants (including Go) were running single worker process.

What is the point of such a benchmark? It is completely useless. You are deliberately crippling other tech before comparing your tech with them.

What am I missing here HN?


While I certainly don't want to give this sort of "benchmarking" any credibility at all:

allocating significantly different compute resources to different samples would be at least as laughable (e.g. giving Go a dozen cores [per its defaults] but constraining Python to one worker [per its defaults]).


If I'm reading this correctly, and he set GOMAXPROCS to 1, that means that Go uses one system thread, but it still serves many requests concurrently, as there is an asynchronous IO loop under the hood. Same goes for Node.


Just want to say that this post was inspiring to me. Thanks for writing it in such detail!

Sounds like you have a long and fun road ahead of you with this project.


Trying to grasp the concepts here. But if I understand pipelining correctly, it's a great boost for a single browser making a million requests but won't really help you in dealing with a million browsers firing one request each.

If that is correctly understood then the benchmark is highly artificial.

But yeah, feel free to correct me if I got it wrong.


I was impressed until he revealed most of Japronto is C... so much for learning some nifty new Python technique :/


Have you tried load testing it with Gatling (http://gatling.io/), which seems to provide more accurate results, including potential hiccups you can't see with other load testing frameworks?


Who cares about a Hello World benchmark?


Great work. The name looks to be Brazilian PT for "already ready," aka now!


This is a ruse. All the code which does the work here is written in C.


So? The API is still Python. How do you think numpy got so popular?


I think the difference here is that numpy is not flaunting its speed as a benefit of using Python. They are very open that they are bringing the performance of BLAS and LAPACK to Python.


As someone else further down this thread said, "It matters because all the heavy lifting is done by C and has little if anything to do with Python itself."

I'm getting slammed for my previous comment!


Couldn't get this to work on OS X. Initially I got an error from strip in src/picohttpparser/build, which was fixed by running `strip -x libpicohttpparser.so`; that got `python setup.py develop` to work.

But then running the demo app resulted in:

  ImportError: dlopen(/tmp/japronto/src/japronto/protocol/cprotocol.cpython-35m-darwin.so, 2): Library not loaded: libpicohttpparser.so
    Referenced from: /tmp/japronto/src/japronto/protocol/cprotocol.cpython-35m-darwin.so
    Reason: image not found
Didn't bother after that


Exactly, so you need to manually copy src/picohttpparser/libpicohttpparser.so to /usr/local/lib


Doesn't LD_LIBRARY_PATH work on a Mac?


It's DYLD_LIBRARY_PATH or something on macOS.


It's worth considering Python asyncio as well for this type of work https://magic.io/blog/asyncpg-1m-rows-from-postgres-to-pytho...


The current situation of Python's performance is really pathetic. Thanks for the project; we need to make Python great again, otherwise the future belongs to Go, Elixir and Rust.


People are obsessed with performance, while my experience is that the enormous, vast majority of projects could run on something 10 times slower than Python.

Actually, a friend of mine runs a streaming website. He banks 15k€ a month from 400k daily users.

Python perf has zero impact on his project. Python's ease of use and maintainability mean that he, a terrible (really terrible) programmer, has been able to keep the site up and running for 5 years.

I worked in Africa for a while. Need for Python perf? Zero. Need for code that is understandable and easy to write? A LOT.

Then I moved to geographers working on GIS. Need for Python perf? Hell, the GIS engine does the heavy work; they don't care. But they don't understand jack about programming and need things done. Python lets them do it cleanly.

Then I moved to embedded programmers. Hardcore C coders who use Python to test their boards. Need for Python perf? They don't even know what you are talking about. For them, perf means assembly. They want something that is fast to write, with 10000 libs available.

As a trainer and dev I teach Python on a monthly basis. JS training? Jeez, where do I begin? If you let students do anything more than one-file stuff, they are lost. Python? They are project-ready in no time.

Data mongers? They have numpy and databases.

Now myself? I code in Python all the time. I've seen many slow sites. It's always either the DB queries or the static assets. Never the server-side language, be it Python, JS, Ruby or PHP.

Coding in Go, Elixir or Rust for tasks where my productivity is more important than the machine's output makes no sense. Especially in a world where an unmetered VPS costs 3€/month (https://www.ovh.com/us/vps/) with 2 GB RAM and 10 GB SSD. I can just take 100 of those and build a freaking Skynet without making my company's accountant blink.

We are not Google or Facebook (and they do use Python, BTW; FB just announced they are migrating to Python 3 massively).

So yes, more perf is nice. I would literally pay for better perf, because it's always a good thing. But stop pretending you need it so badly; judging from all the forum posts where people claim they do, the odds are you are not in the 0.1% that actually does.

It's a nice-to-have.

If you want to switch to something else, just admit you are following a trend. Devs always are. They love what's new and shiny. Nothing wrong with it. But those "Python is slow, I'm out" posts are ridiculous. Dropbox can say that. Bank of America can say that.


Yeah. I mostly agree with this. Developer productivity is the best thing to optimize for. These are the most important metrics for that, in my opinion: Code readability, good editor w/ intellisense, fast test suite, solid library ecosystem.

It would be interesting to see an analysis of developer productivity across languages to find out which language provides the best overall value for new and existing codebases.


Of the higher-level languages I've used (PHP, JS, Ruby, Python, Basic, Bash, Perl), Ruby and Python are the clear winners: easier to read, easier to organize, and they promote more solid code.

But Python provided me with a better experience than Ruby because:

- it's very useful outside of the Web as well. Ruby is still very much tied to RoR.

- the Python community values documentation and tests, and dislikes too much magic, like monkey patching.

- as a trainer, making people set up Python + venv and use it is wayyyy easier than rvm.

Second to them would be JS. It's a terrible language, but it's the only one on the browser platform, which is amazing. Having code that natively manipulates GUI, sound and video in a few lines is the most motivating experience you can get when you start coding.


Perl would be my bet.


Perl fails on big code bases, particularly because the cognitive load of reading it is high.


I'm not really buying into that argument. I have seen Python code that is unreadable, actually a lot worse than 90s PHP/Perl. Take a look at modern Perl please.


Perl 6 is better, but it still offloads a lot of the logic parsing to the human brain.

Create an array or string, take the last element and display it:

<A B C>[* - 1].say;

The same in Python:

print(['A', 'B', 'C'][-1])

The same in Ruby:

print ['A', 'B', 'C'][-1]

The first version is a good example of what Perl is good at: it's short to write. The Python version is more verbose.

But, your brain has to do the following:

- Scan the elements to realise A, B and C are strings. Perl can have quoted strings like "A", but here you can write them without quotes. So when you read, you need context to know what's going on. The Python version makes it clear they are strings.

- Then figure out that the separation is on the spaces. Python puts commas, so you don't have to parse whitespace in your head.

- Then you have the "*" you need to interpret to work out which element is used. Then you get the ";", which is just noise.

Small elements like those add up very, very quickly in a code base. Every time your brain has to gather context before understanding what it reads, you add cognitive load. You can't scan the code.

PHP, Perl and JS by nature have a higher cost than Ruby or Python. The latter have been designed with readability in mind. You have fewer tricks, they are more boring. And easier to read.

Of course you have many other small issues with Perl, like figuring out why you use .say in lowercase but .WHAT in uppercase. Why the shell by default on some Linux setups has a broken history. Why you can leave out the .say most of the time in the shell, but sometimes if you don't, nothing is displayed. Why you have to install the rakudo package on Ubuntu but run the perl6 command to start the shell, etc.

All in all, Perl is a neat technical achievement, but it has not been designed with ergonomics in mind.


Is Russian the hardest language to read for someone that doesn't know Russian?

C-style brackets and semicolons are second nature to a lot of people.

Perl has a lot of built-in stuff that makes a lot of sense when you have learned it.

Your example in Perl 5:

  print qw/A B C/[-1];
I instantly know that qw has to do with a list of words, and the qw// can be replaced with qw(), qw[], qw##, etc to make the code the most readable to you.

Can be written like this if you like the commas and quotation marks:

  print qw/'A', 'B', 'C'/[-1]; #Will warn
  print(('A', 'B', 'C')[-1]);
'say' instead of 'print' adds a newline:

  say qw/A B C/[-1];
Print the three first elements:

  print qw/A B C D/[0..2];
Assign elements to variables:

  my ($a, $b, $c) = qw/A B C/;
Print the number of elements and assign it to a variable at the same time:

  print my $num =()= qw/A B C/;
You CAN very easily write "boring" readable Perl if you want to. But the power is there when needed. One letter variables, insane uncommented regexes, removing all whitespace etc is generally not how modern Perl programmers code.

Take a look at some code in Python that is done by someone less disciplined (or intentionally obfuscated). It is as unreadable (or worse) as the obfuscated C and Perl from the late 90s.


Of course you can get used to anything. But even if you get used to running, walking is always easier.

Using so much context to convey meaning implies a lot of processing even if you are trained for it, so you carry an additional load. The same goes for using so many symbols to convey meaning.

By design, Perl is harder to read if you write it the way it's intended to be used.

Just like Russian is harder to read than English because the language has concepts like letters affecting other letters, which implies your brain needs to backtrack.

Eventually, if you become a master at it, you won't feel it. But you can only master so many things in life. For most things, at best, we become very good, and very good means you still use your brain's CPU. There is no way around it.

That's why language design is important, and to me, Perl failed this big time.

It's not just about habit either. Take Erlang for example. It is a very different language. It's hard to reason about. But it's not very hard to parse in your head because it's quite consistent and regular in the way it unfolds.

Erlang has other issues of course, and I wouldn't recommend it over Perl for tasks that are not highly parallel.

But yes, Perl has been written to be clever. This is not a quality in language design. Handy yes. Intelligent yes. Clever, not so much.


"Using so many symbols to convey meaning as well." Actually as a visually oriented person I find it very helpful. And logical:

  $ - a single thing
  @ - a list of things
  % - a hash (associative array, key-value pair list)
  \ - a reference to any of the above

  my @fruits = qw/Apple Pear/;
  my $company = $fruits[0];
  
  # A reference can be declared directly, this is a \%
  # with a nested \@. Encodes nicely to JSON as an example.
  my $hash_ref = {
      fruits => [
          'Apple',
          'Pear'
          ]
      };

  my $apple = $hash_ref->{fruits}[0];
"By design, Perl is harder to read if you write it the way it's intended to be used."

What way is it intended to be used? Perl 4 style mixing Perl and shell script?

Perl 5 has had (roughly) yearly releases since 1994 and is now at 5.24. Quite a bit has changed in that time and the language has evolved and improved.

There are modern non blocking web frameworks, great database support, several modern OO systems etc.

And Perl still works great shell script style (you can choose to code in a clean style) and for the most parts there is great backwards compatibility.

Perl is definitively about getting the job done. Hence the slogans "make easy things easy and hard things possible" and "there is more than one way to do it".

For someone well versed in Perl something like idiomatic Python becomes a prison where it is hard to get the job done because you can't freely express yourself.

I think it's not just philosophy or language design but people are wired differently. And Perl is a lot of fun for creative individuals.

Lastly, I am looking forward to the next few years, when the Python community has to clean up because of its current popularity with novice programmers. That's what Perl has been through, and that old novice code is also where most of the undeserved reputation comes from.


Perl 6 is a different language, by the way. It is interesting but hardly production-ready if you ask me. It would be better if it were called something else, but BDFL, you know.. The capital-letters stuff comes from C#, since one of the main Perl 6 devs was strong in that..


Exactly. Every language can scale the web servers easily. No language can scale the SQL server easily.

The highest-level language that's maintainable wins. That's why Python is incredibly popular. The real snag with Python is the flippant disregard for backwards compatibility even though it's already massively popular, which fragments the ecosystem.


>> Actually, I have a friend of mine running a streaming website. He banks 15k€ a month for 400k daily users.

Wow. Which streaming website is this?


My experience is that the people who scream the loudest about "performance" are often the ones who make micro-benchmarks and derive ridiculous claims from them, and they also often seem to be the people who otherwise don't know much about "performance", or have no real need for it.

For by far the largest share of web applications, easily everything up to the 99th percentile, performance in the "sense" of this benchmark is wholly irrelevant.


>For the by far largest share of web applications, easily everything to the 99th percentile, performance, in the "sense" of this benchmark, is wholly irrelevant.

Amen to that. Performance is incredibly low on the list of concerns for most businesses I've worked for. Getting code out quickly and making it work reliably are usually priorities #1 and #2.

Performance is a "sexy" problem though.


Not so sure that I would want to use Python. For anything.


Some very petulant pythonista downvoters here. Thinking that no one should have a different opinion than themselves. Actually that profile fits the language... only one way... sad people.



