I am Doing HTTP Wrong (pocoo.org)
107 points by admp on May 13, 2012 | hide | past | favorite | 26 comments



Oh wow. I did not expect this to hit Hacker News, considering the slides are totally not intended to be consumed without the notes. Let me just quickly point out a few things before I write up something longer about this.

First things first:

> I'd like to add that maybe Fireteam should have not chosen HTTP as a transfer protocol for games, where state is everywhere and HTTP is stateless. Although I don't feel qualified to suggest alternatives.

We have very good reasons for supporting HTTP but it's not the only protocol we're implementing. As pointed out on one slide: HTTP is better supported than raw socket connections on certain devices we care about.

> Trying to over-abstract network communication is one of the classic mistakes in distributed system design.

We're not doing that. If you look at the slides, we're 100% HTTP compliant; we don't build some fancy-schmancy protocol that just uses HTTP as a transport layer without embracing it. I had the "pleasure" of using SOAP in the past and that is not what we're after.

We're basically treating HTTP as an implementation detail for JSON/urlencoded etc. REST, and we do that by sprinkling a ton of meta information around in the code. Once the request is dispatched to the function, it's reduced to the valuable information; one layer higher, all the HTTP logic happens.
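To illustrate the idea (this is a hypothetical sketch, not Fireteam's actual code — `endpoint`, `_registry`, and `dispatch` are made-up names): the HTTP details live in meta information next to the function, while the handler itself only sees and returns plain Python data.

```python
import json

# Hypothetical sketch: routing/serialization meta info is attached
# right next to the function; the handler never touches HTTP.
_registry = {}

def endpoint(rule, method="GET"):
    """Attach routing meta information to a plain function."""
    def decorate(fn):
        _registry[(method, rule)] = fn
        return fn
    return decorate

@endpoint("/score", method="POST")
def submit_score(payload):
    # Pure business logic: dict in, dict out. No headers, no status codes.
    return {"accepted": True, "score": payload["score"]}

def dispatch(method, rule, body):
    """One layer higher: all the HTTP/JSON handling happens here."""
    fn = _registry[(method, rule)]
    result = fn(json.loads(body))
    return "200 OK", json.dumps(result)

status, body = dispatch("POST", "/score", '{"score": 10}')
```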

The general workflow, however, is fundamentally different from what I did in the past with Flask, or from how Django works, which also explains the title.

I will write something up because I think from the slides alone you get a completely wrong impression about why this is cool :-)


Although this wasn't the entire message of the original, I'm intrigued by the general idea of committing to a line-in-the-sand approach on streaming vs. buffering. I've been writing a fair number of file format parsers lately for various audio data, and one of the basic architecture questions that comes up is whether to go for a streamed implementation when the format allows it, even though my use case clearly doesn't need it (and probably won't be subjected to production-grade tests at any time in the near future).

Streaming is pretty much universally the more complex option - and when you use a streamed implementation in a buffered fashion, it's wasted engineering effort. Treating data as just data, with the associated ability to copy it around, will always be simpler than maintaining protocol state.

So after reading this and reflecting on my own process, I've got a more solid feeling about how to make that call: support streaming if a particular application demands it. Otherwise, do the simple thing. Don't entangle the two implementations, because they have different intentions; write both independently and refactor out the reusable parts instead.
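A toy sketch of that "two independent implementations" approach (assuming a made-up chunked format with a 4-byte tag and a 4-byte big-endian length, purely for illustration): a simple buffered parser and a separate stateful streaming one, sharing only a small refactored helper.

```python
def parse_header(chunk):
    """Reusable part: decode a 4-byte tag + 4-byte big-endian length."""
    tag = chunk[:4]
    length = int.from_bytes(chunk[4:8], "big")
    return tag, length

def parse_buffered(data):
    """Simple case: all bytes are in memory, just walk them."""
    chunks, pos = [], 0
    while pos < len(data):
        tag, length = parse_header(data[pos:pos + 8])
        chunks.append((tag, data[pos + 8:pos + 8 + length]))
        pos += 8 + length
    return chunks

class StreamingParser:
    """Separate implementation that keeps protocol state between feeds."""
    def __init__(self):
        self.buf = b""
        self.chunks = []

    def feed(self, data):
        self.buf += data
        while len(self.buf) >= 8:
            tag, length = parse_header(self.buf)
            if len(self.buf) < 8 + length:
                return  # wait for more data
            self.chunks.append((tag, self.buf[8:8 + length]))
            self.buf = self.buf[8 + length:]
```

Note how the streaming version drags buffer state around while the buffered one is a plain loop — exactly the complexity gap the comment describes.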


Interesting! This seems like something that might work well with 3.0's function annotation support, with the ability to specify that the handler's input argument conform to either the streaming or buffered interface.

As a point of curiosity about best practices, how are you handling the meta information right now?

It feels like decorators provide the cleanest solution currently, but I've always wondered if there are alternatives.


> Interesting! This seems like something that might work well with 3.0's function annotation support, with the ability to specify that the handler's input argument conform to either the streaming or buffered interface.

I thought so, but the annotations in Python 3 are largely useless. They would work if you could forward the signatures through a chain of decorators, but there is no guarantee for that. I did play around with Python 3, but it's actually not in any way better for what we're doing here.
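A small illustration of the "no guarantee" point (hypothetical handler names; the annotation strings are placeholders): `functools.wraps` copies `__annotations__` onto a wrapper, but nothing forces decorator authors to use it, so annotations can silently vanish anywhere in a decorator chain.

```python
import functools

def handler(request: "buffered") -> dict:
    """An annotated handler we might want to introspect later."""
    return {}

def naive(fn):
    # A decorator written without functools.wraps: the wrapper
    # carries no annotations at all.
    def wrapper(*args, **kwargs):
        return fn(*args, **kwargs)
    return wrapper

def careful(fn):
    # functools.wraps copies __annotations__ (among other attributes)
    # onto the wrapper, so introspection keeps working.
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        return fn(*args, **kwargs)
    return wrapper
```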

> It feels like decorators provide the cleanest solution currently, but I've always wondered if there are alternatives.

Decorators require a routing system that can resolve routing "dependencies". We're using the Werkzeug routing which was designed to do that. The basic idea is that it does not matter in which order you define the URL rules, the routing system figures out the ordering.

If you don't have that, you would have to move the definitions to a central file (akin to urls.py in Django) to make this work because you need to define the ordering.
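A toy sketch of the order-independence idea (this is not Werkzeug's actual implementation, just an illustration with made-up names): rules are sorted by specificity at match time, so definition order doesn't matter and no central urls.py-style file is needed.

```python
class Router:
    def __init__(self):
        self.rules = []

    def add(self, rule, endpoint):
        parts = rule.strip("/").split("/")
        # Fewer dynamic segments == more specific == tried first.
        weight = sum(1 for p in parts if p.startswith("<"))
        self.rules.append((weight, parts, endpoint))

    def match(self, path):
        parts = path.strip("/").split("/")
        for _, pattern, endpoint in sorted(self.rules, key=lambda r: r[0]):
            if len(pattern) != len(parts):
                continue
            args = {}
            for pat, got in zip(pattern, parts):
                if pat.startswith("<"):
                    args[pat[1:-1]] = got
                elif pat != got:
                    break
            else:
                return endpoint, args
        raise LookupError(path)

router = Router()
router.add("/users/<name>", "user_detail")   # dynamic rule defined first
router.add("/users/new", "user_new")         # static rule still wins
```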

We also experimented with having the meta information in JSON files but it was not really worth it. Having it next to the function makes it easier to maintain it.


I'm always eager to hear whatever Armin has to say, but I feel so-so this time, probably because I didn't fully understand it, or his point was not clearly communicated.

The premise is that you can serialize request and response objects only after you've made sure they are fully independent of the underlying protocol.

However, when the protocol is in fact underlying, you cannot fully get rid of its nature, and, for example, resources are still going to be requested by their URLs. The middleware he develops already takes care of the details like the protocol headers and encoding. I believe that even with the best efforts and intentions, HTTP will remain implied.

It is also not clear to me what something like Flask would gain from buffered and serializable objects. At first sight it is a performance penalty rather than a practical improvement, isn't it? I'd like to hear more.

Edit: I'd like to add that maybe Fireteam should have not chosen HTTP as a transfer protocol for games, where state is everywhere and HTTP is stateless. Although I don't feel qualified to suggest alternatives.


I have to say that I was equally puzzled by this post the first time I read it. Perhaps having some background information about what exactly he's working on might have helped.

On a second read, it looks like the meat of the post is about decoupling the implementation of your API from the underlying network protocol. Unless I'm missing something, that would be MVC.

Controllers are completely oblivious to the fact that HTTP is being used behind the scene. The routing mechanism takes care of parsing the request, deserializing the request body, instantiating the appropriate controller and invoking the correct action passing the deserialized objects as parameters. Once the controller's action has completed and returned its result, the View is responsible for serialising the response to HTML, JSON, XML or whatever serialisation format you use, and for setting the correct HTTP status code and headers.

So still puzzled, even after a second read.


Trying to over-abstract network communication is one of the classic mistakes in distributed system design.

Remember the old "RPC" idea from the 1980's? Despite a huge amount of effort to optimize remote procedure calls, nobody could ever get an RPC to be within orders of magnitude of a real procedure call in speed.

The RPC concept only became mainstream in the 2000's when people had given up on performance; SOAP and POX systems use inefficient XML and JSON serializations and are integrated into http stacks that were designed to use something else. These systems flourished in the 2000's because interoperability, not performance, was the driver.

Now, if you're in an AJAX, Flex or Silverlight environment, you've got the whole asynchronous communications issue -- you can't paper that over with an abstraction layer, you've got to build the whole application around the fact of async comm if you want to build something that really works.


Abstraction gone wrong almost always comes down to abstracting for the wrong reasons. RPC was all about pretending, from the client side, that accessing a remote resource was the same as calling a local method. This was and still is nonsense, and a prime example of a leaky abstraction.

My understanding of the OP's post is that he is talking about server-side issues rather than client-side though. Others have pointed out that he's talking about the design of a specific python library or framework that I'm not familiar with - I'll need to read up on this first.


We're not abstracting HTTP away, we're just looking at the whole thing differently. In a way our code is RESTful and considers HTTP a way to implement REST. XMLRPC and other things throw all of HTTP overboard with the exception of the entity body, which removes all the advantages of and reasons to use HTTP.

Our stuff gives us a unified API, and it fully embraces HTTP.


One of the direct consequences, for instance, is that the first WSGI request object that starts consuming data ends up being the only one that can have it.

Protocols like HTTP that lack length-prefixing of messages have some unfortunate implications for streams of messages.

1. The reader of the first message must buffer. Since it doesn't know how much it should read, it may buffer too much, and consume some of the second message from the stream.

2. The reader of the second message needs access to the stream, plus that extra data.

3. If your stream is a non-seekable file descriptor like a socket, there's no way to rewind, or somehow put that extra data back onto the original stream. This forces you to either (a) convey the stream + extra data, which is less natural or (b) create a new stream for the second message reader, copy the extra data to it, and then copy (in userspace!) data from the original stream into the new stream.

Take a look at any CGI or FastCGI implementation and you'll see something like 3b.

It sucks, and I think makes a good argument for using length-prefixed messages in your internal protocols (or whatever ones you can control).
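A minimal sketch of the length-prefixed alternative: because each message announces its size up front, the reader consumes exactly one message and leaves the stream positioned at the start of the next one — no over-buffering and no userspace copying to "put data back".

```python
import io
import struct

def write_message(stream, payload):
    # 4-byte big-endian length prefix, then the payload itself.
    stream.write(struct.pack(">I", len(payload)))
    stream.write(payload)

def read_message(stream):
    # Read exactly one message: the prefix tells us how far to go.
    (length,) = struct.unpack(">I", stream.read(4))
    return stream.read(length)

buf = io.BytesIO()
write_message(buf, b"first")
write_message(buf, b"second")
buf.seek(0)
first = read_message(buf)
second = read_message(buf)  # second reader needs only the stream, nothing extra
```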


I am not sure whether Ronacher realizes it, but this is exactly the motivation behind Pump [1], an HTTP abstraction that takes a different approach than WSGI. It was criticized a lot by HN [2] and by Ronacher himself [3] but I’m glad that he understands my point of view now.

[1] http://adeel.github.com/pump/manual.html

[2] http://news.ycombinator.com/item?id=2810373

[3] http://lucumr.pocoo.org/2011/7/27/the-pluggable-pipedream/


To be honest, I never understood why WSGI conflated buffered and streaming modes in one API. Having two separate, specialized APIs makes much more sense.


Yeah, all the warts in WSGI are basically because of streaming. They should have been two APIs. Nobody likes start_response, and I think that exists because of streaming. Although I suppose you could still return a 3-tuple (status, headers, body) and have the body be iterable, so maybe I'm wrong.


There isn't any notion of 'buffered mode' in WSGI. It is just left up to the app to buffer as needed before yielding. Middleware and servers are not supposed to perform buffering. So I don't see how the two are conflated. How am I misunderstanding you?
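To make the point concrete, both of these are equally valid WSGI applications (a minimal sketch; the handler names are made up). WSGI itself has no "buffered mode" — buffering is just an app that happens to yield its whole body in one chunk.

```python
def buffered_app(environ, start_response):
    # The app buffers: the body is fully built before responding.
    body = b"hello world"
    start_response("200 OK", [("Content-Type", "text/plain"),
                              ("Content-Length", str(len(body)))])
    return [body]  # one-element iterable

def streaming_app(environ, start_response):
    # The app streams: chunks are produced incrementally.
    start_response("200 OK", [("Content-Type", "text/plain")])
    def generate():
        yield b"hello "
        yield b"world"
    return generate()
```

From the server's point of view both return an iterable of byte strings; whether to buffer is entirely the app's decision.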


> There isn't any notion of 'buffered mode' in WSGI. It is just left up to the app to buffer as needed before yielding. Middleware and servers are not supposed to perform buffering. So I don't see how the two are conflated. How am I misunderstanding you?

The problem is that you would have to support streaming because you don't know what the inner WSGI application is doing. Our stuff does not work on top of WSGI because of that.

At the end of the day it is implemented as a WSGI application but it only uses WSGI for talking to the server.


I don't think I have any criticism for your approach on your project.

I was trying to comment on the idea that WSGI somehow conflates buffered mode with streaming mode. How is that even possible when WSGI does not even have a notion of buffered mode?

If you want your app or even your framework to buffer, you can just do that in the app layer. I can't see that the server interface should have additional moving parts just to do what is cleanly doable in the app layer.

I can only imagine this being a disappointment if someone wanted to use WSGI middleware to carry out buffering. But if (edit: internal) WSGI compatibility is not a high priority, then why would you want to use WSGI middleware for this (generally pretty messy)? Do it in your framework or in your app.

So I do not understand the complaint about the 'conflation' of buffering and streaming. I think it is appropriate to leave the buffering decision in the app layer. And it is also reasonable to use WSGI as just a way to talk to a server (what else is it really for? Obviously not to offer a big fat servlet API...)

If you want to do separate buffering and streaming APIs in your app layer then that could be a smart way to reduce complexity but I don't see why this is some sort of complaint about WSGI.


I'm missing something: why is Armin mentioning that "slides are useless without the talk" if the link is to a text with no reference to a talk?


Sounds like we may be getting another name or acronym to add to the long list of failed technologies that have already tried to promote a similar idea.

We already have ONC RPC, DCE, Java's Remote Method Invocation, Jini, CORBA, SOAP, and .NET Remoting, among many others. Does this family of failure really need to grow any larger?


No, this is about how the libraries that bind to HTTP don't actually match HTTP's model of operation.

I've banged on this drum in a couple of cases myself with framework authors, in some cases even right at the beginning of the framework's life, and generally hit a brick wall.

HTTP is not a request/response protocol. It certainly once was back in HTTP/1.0, but with pipelining and reusing connections and now websockets (and soon SPDY's server push/hint), it's a streaming protocol that has a common use case where it is used for request/response, and that requires a completely, completely different API than a protocol that is truly request/response.

It is a lot easier to build a streaming base that has a special case for request/response than it is to build a request/response base and then hack in bizarre conceptually-impure bullshit for streaming, but every web framework I can find is the latter, if indeed they don't simply punt entirely on streaming because they've written it so thoroughly out of the API. (I'm vaguely aware of some that are actually sensible, but all in languages I don't know and haven't gotten to yet, so I'm not sure if they really are built on a sensible base or if it merely raises the bizarre crap to API-blessed status.)


Can you describe this further? Reading this guy's post, it sounds like he's complaining about the exact opposite problem: that the web framework is designed for streaming and, while I agree with you that specializing request/response out of that should be easy, he claims that this seems to make it awkward or impossible for him to have his request/response model.


> We already have ONC RPC, DCE, Java's Remote Method Invocation, Jini, CORBA, SOAP, and .NET Remoting, among many others. Does this family of failure really need to grow any larger?

Except that what we're doing is none of that. It's plain old HTTP with JSON. The implementation on the server side is the difference.


Could someone add a little more context to this please?


I'd guess the impetus for this post came from this meeting at PyCon:

http://kennethreitz.com/the-future-of-python-http.html


I will write something up later, unfortunately there is no recording from the talk and the slides by themselves are entirely useless.


Thanks. I'm glad you showed up on the thread!


How about the latest buzz, WebSockets?



