
This essentially became the laser of death for us the other day and led to a cascading failure which eventually brought down our system. That's why I'm posting this.

Very few people know about this and it's really scary. I'm happy people are voting this up to raise awareness of it.

Potential workarounds: You might think disabling `proxy_next_upstream timeout` will do the trick, but that also disables retries on connection timeouts, which is not what you want!

Increasing `proxy_connect_timeout` is not an option either, because then you risk tying up too many connections in the nginx instance if the upstream server swallows SYN packets or whatnot.
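
For reference, the default that bites you here looks something like this (upstream names made up):

    upstream backend {
        server app1.example.com:8080;
        server app2.example.com:8080;
    }

    server {
        location / {
            proxy_pass http://backend;
            # "error timeout" is the default; a POST that merely
            # times out waiting for the response gets silently
            # resubmitted to the next server
            proxy_next_upstream error timeout;
            # and removing "timeout" here also kills retries when
            # the TCP connect itself times out
        }
    }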

The real workaround: Use haproxy. Seriously.




We've hit 3 problems with nginx:

1. Exactly this: we had mystery double trades from our clients, and it took us a long time to realise it was nginx assuming we had timed out and routing the request to the next server

2. It doesn't do health checks. When a server goes down, it will send 1 out of every 8 real requests to the down server to see if it responds. Having disabled resubmitting of requests to avoid the double-trade issue above, this means that when one of our servers is down, 1 out of every 8 requests gets an nginx proxy error, which is significant when you have multiple API calls on a single page (more on tuning this in the sketch after this list)

3. This isn't something I've personally hit so can't explain the nitty gritty, but it's something one of my coworkers dealt with: Outlook webmail does something weird where it opens a connection with a 1GB Content-Length, then sends data continually through that connection, sort of like a push notification hack. Nginx, instead of passing traffic straight through, will buffer the response until it reaches the content size given in the header (or until the connection is closed). I don't know if nginx is to blame for this one or not, but I do feel that when I send data through the proxy, it should go right through to the client, not be held at the proxy until more data is sent.
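
For what it's worth, the probe rate in point 2 comes from nginx's passive check parameters, which you can at least tune (the numbers here are just examples):

    upstream backend {
        # after max_fails failed attempts, the server is marked
        # down for fail_timeout; the "probes" that bring it back
        # are real client requests, hence the proxy errors
        server app1.example.com:8080 max_fails=3 fail_timeout=30s;
        server app2.example.com:8080 max_fails=3 fail_timeout=30s;
    }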

HAProxy also solved our issues and is now my go-to proxy. Data goes straight through, it has separate health checks, and it better adheres to HTTP standards. It can also be used for other network protocols which is a bonus.


Whilst Nginx doesn't do health checks, they are available in Nginx Plus. I appreciate that it is a paid-for product, but it has a number of strong features over and above the OSS version, and of course support (who are very responsive indeed).
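
From memory, the active checks in Plus look roughly like this (parameters here are just examples):

    upstream backend {
        zone backend 64k;  # shared memory zone the checks need
        server app1.example.com:8080;
        server app2.example.com:8080;
    }

    server {
        location / {
            proxy_pass http://backend;
            # NGINX Plus only: probe the upstream out of band
            # instead of sacrificing real client requests
            health_check interval=5 fails=2 passes=2;
        }
    }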


3. is the reason why NGinX is the recommended proxy in front of webapps with scarce parallelism (for example Ruby with Unicorn; see http://unicorn.bogomips.org/PHILOSOPHY.html for an explanation) when "slow clients" are to be expected. NGinX protects the webapp from having its workers blocked by slow clients, and Outlook Webmail seems to behave just like one. I don't know off the top of my head how to tune this behavior if one wants to avoid it, but this property is the main reason we use NGinX.


That's a… unique - and wrong - way of spelling the name. (Pet peeve of mine; people spell my app's name in all sorts of bizarre ways too.)


This sounds like something else. In the Outlook case, their servers seem to use the connection as a stream (which is actually valid, although not really supported by browsers outside of the event-stream class), where the server only writes little chunks of data at a time. But the server there is not hindered from writing by a slow client - it simply has no more data to write at that point in time.


Regarding 3, buffering behavior is highly configurable in nginx (e.g. proxy_request_buffering and proxy_buffering can be switched on/off).
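
A sketch of the relevant knobs (the location name is made up):

    location /stream/ {
        proxy_pass http://backend;
        # pass response bytes to the client as they arrive
        proxy_buffering off;
        # stream the request body to the upstream instead of
        # spooling it first (needs a recent nginx; see the
        # 1.8 note below)
        proxy_request_buffering off;
    }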


Wrote a post where I ran into (and fixed) this problem with streaming uploads through nginx: http://killtheradio.net/technology/nginx-returns-error-on-fi...


It's only as of 1.8 that you can disable buffering of incoming requests, though. That's just a few months old, iirc.


Nginx can also be used for other protocols; see the stream block.
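
E.g., plain TCP load balancing looks something like this (addresses made up):

    stream {
        upstream db {
            server 10.0.0.1:5432;
            server 10.0.0.2:5432;
        }
        server {
            listen 5432;
            proxy_pass db;  # raw TCP, no HTTP semantics
        }
    }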


I didn't know that. Thanks for the correction.


"but that will also disable connection timeout retry which is not what you want!"

Why is this not what you want? Are you using the reverse proxy as a load balancer to multiple servers? Otherwise, if it's a 1:1 proxy (for something like SSL termination), wouldn't it be acceptable for nginx to fail/time out when the server does?


> Are you using the reverse proxy as a load balancer to multiple servers?

That's extremely likely.

Somebody with an Nginx reverse proxy is probably using it for high availability, load balancing and static file caching, possibly all at the same time. That is what it is good for.


Using NGINX as a reverse proxy is an extremely common scenario. In fact that's what I currently run (with a support subscription), but I will be evaluating a move to HAProxy if their tech department does not provide a way to resolve this issue (which is actually a very big deal for me, and one I was not aware of).


This is Owen from NGINX. We have a workaround for this behavior (https://gist.github.com/thresheek/2fa6479ffb7aca710493), and are tracking a separate new feature request. Please submit a support ticket or send me an email, owen@nginx.com.


Thank you, I will be opening the ticket tomorrow. Regarding the gist you just posted, it seems this simply disables proxy_next_upstream for any and all non-idempotent requests.

However, what would really need to happen is to only disable proxy_next_upstream if data has been written to or read from the backend (preferably configurable per backend or location for either of those two options). Right now you basically lose the redundancy for non-idempotent requests and immediately return the error. Or maybe I read the configuration incorrectly.
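
If I'm reading it right, the shape of the workaround is roughly this (my paraphrase, not the gist verbatim):

    # decide per request method whether retrying is safe
    map $request_method $retryable {
        default 0;
        GET     1;
        HEAD    1;
        OPTIONS 1;
    }

    server {
        location / {
            # dispatch non-idempotent requests to a location
            # that never retries another upstream
            error_page 418 = @no_retry;
            if ($retryable = 0) { return 418; }
            proxy_pass http://backend;
            proxy_next_upstream error timeout;
        }
        location @no_retry {
            proxy_pass http://backend;
            proxy_next_upstream off;
        }
    }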


Yes, I have multiple servers behind nginx. It's very common.



No. Requests will still be retried.


What about `proxy_next_upstream off;`?


If I temporarily bring an upstream application down for an upgrade, I want nginx to retry the next upstream. This is a very common scenario when doing reverse proxying. Disabling next upstream breaks this.
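
For planned upgrades specifically, you can also sidestep retries entirely by marking the box as down and reloading, e.g.:

    upstream backend {
        server app1.example.com:8080 down;  # being upgraded
        server app2.example.com:8080;
    }

Then `nginx -s reload` picks up the change gracefully, without dropping in-flight connections.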



