Mitigating DoS Attacks with Nginx (nginx.com)
205 points by Garbage on Dec 23, 2015 | 57 comments



I love nginx. But most of these are mitigations for DoS attacks targeting only the web server, not DDoS. They're explaining how to throttle a single IP. The whole point of DDoS is to distribute the attack to bypass mechanisms that throttle single IPs (plus to amplify it).

Also the DDoS attacks that we've been hit with actually target our uplinks by saturating them with traffic, not our services. We have a 1 Gbps port and the last DDoS we were hit with was over 20 Gbps, which is a relatively small one. The mitigation we used was to have our hosting facility get their upstream provider to route the traffic through a layer 7 DDoS mitigation filter provided by an external company. It worked wonderfully.

These are cool features, but when your link is saturated it doesn't matter what a daemon listening on a port does.
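For reference, the per-IP throttling the article describes boils down to roughly the following (the zone name and rates are placeholders) - useful against a single abusive client, but irrelevant once the pipe itself is full:

    # in the http{} context: track clients by address, ~10 requests/second each
    limit_req_zone $binary_remote_addr zone=perip:10m rate=10r/s;

    server {
        location / {
            # absorb short bursts of up to 20 requests, reject the rest (503 by default)
            limit_req zone=perip burst=20;
        }
    }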


And once IPv6 picks up steam, welcome to a world of people with millions of "addresses" to attack from. One advantage of IPv4 is that it was accidentally pretty granular.

That'll be a while, of course, but already we see attackers with access to a tremendous number of unique IP addresses in the IPv4 space; they'll have many orders of magnitude more soon.


> welcome to a world of people with millions of "addresses"

... in the same /64 range for the most part, so as easy to block/filter/limit as one IPv4 address.

You risk inconveniencing people who are assigned just a few addresses because you potentially end up blocking many of them due to the actions of a few on the same subnet, but you can't be held responsible for hosts/ISPs doing IPv6 wrong.
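For what it's worth, nginx's allow/deny directives already accept IPv6 CIDR notation, so treating the whole /64 as the unit to block is a one-liner (the prefix below is a documentation-range placeholder):

    location / {
        # block the misbehaving /64 rather than chasing individual addresses
        deny  2001:db8:abcd:1234::/64;
        allow all;
    }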


> You risk inconveniencing people who are assigned just a few addresses because you potentially end up blocking many of them due to the actions of a few on the same subnet, but you can't be held responsible for hosts/ISPs doing IPv6 wrong.

Not to mention that some ISPs do carrier-grade NAT specifically due to the limitations of IPv4, so blocking a single IP(v4) might affect multiple people as well.


> ... in the same /64 range for the most part, so as easy to block/filter/limit as one IPv4 address.

Possibly even easier, because of IPv4 deaggregation.

(Because of IPv4 address scarcity, many providers have discontiguous IPv4 address space. This is mostly a problem for the core, because it leads to much larger BGP routing tables.)


CoughDigitalOceanCough


Lots of people say so, but that doesn't make it a useful model of IPv6.

When you think about IPv6, start with /64 networks. That's the basic unit. What a DSL customer gets from the ISP is a /64 network, not some number of individual addresses. The customer may use two, ten or 2047 of them; it doesn't matter. The point is "one owned DSL subscriber = one /64 network".

Just like with IPv4, some people have larger allocations. But the basic allocation unit is a /64 network.


Here is a sweet trick for dropping traffic with NGINX. By drop I mean don't send a response; literally terminate the connection.

    location = / {
        if ($http_user_agent ~* foo|bar) {
            # return non-standard (NGINX only) 444 code
            # closes the connection without sending a response header
            return 444;
        }
    }
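Easy to sanity-check, by the way: curl -A foo http://yourserver/ should come back with something like "curl: (52) Empty reply from server", while a normal user agent gets the page as usual.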


If I'm not mistaken, the above implies accepting a TCP connection (3-way handshake) and a request from the attacker; then you look up the contents of the user-agent header and decide to stop replying based on its contents.

This will get you nothing in a typical DDoS scenario, but thanks for sharing as it may come in handy for other situations.


Filtering like that also has a rather big performance impact on the nginx side and makes things worse under moderate, but not DDoS-y, traffic conditions.


I'd love to be able to "drop" a connection without sending a FIN to the attacker. Leave them hanging on a timeout.
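As far as I know you can't do that from nginx itself (444 still closes the socket), but a packet-filter DROP - as opposed to REJECT - does exactly that: nothing goes back and the client is left waiting for its own timeout. Roughly, with a placeholder address:

    # silently discard the attacker's packets; no RST or FIN is ever sent back
    iptables -A INPUT -p tcp --dport 80 -s 203.0.113.5 -j DROP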


You can't solve a traditional DDoS attack at the destination. No matter what you do with the incoming flood, the fact is, your bandwidth is full of the attack requests, leaving no room for legitimate traffic. Do what you like to the attack requests, but the pipe is still full.

To mitigate a DDoS, you need to go upstream to your network providers and filter out the traffic before it reaches you.


True, but not all DDoS attacks involve bandwidth saturation; if the site has a decent pipe, those are harder to achieve. Resource depletion based on forcing the server to perform heavy tasks is common.


Nginx is great, and I absolutely love it. However, if you're under a true DDoS attack, the box is going to be completely bogged down at the kernel level well before traffic is even close to being accepted and processed by NGINX. So this post is not very useful against an attack of any decent magnitude.


For all but the most basic attack, you really want a script putting these IPs into iptables. Using the application itself to block them still requires the connection setup/teardown resources to be used, as well as the application itself.


I can see how doing it lower level would be more efficient. Are there any scripts anyone could recommend as a starting point?


fail2ban can watch the nginx logs for throttling and/or blocking messages and add iptables rules for you.

I haven't read this all the way through, but on a cursory glance it looks reasonable:

https://easyengine.io/tutorials/nginx/fail2ban/
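The gist of it is a jail pointed at the nginx error log plus a filter matching the "limiting requests ... client: <HOST>" lines (recent fail2ban releases ship one called nginx-limit-req). Something like this, with the numbers purely illustrative:

    # /etc/fail2ban/jail.local
    [nginx-limit-req]
    enabled  = true
    filter   = nginx-limit-req
    port     = http,https
    logpath  = /var/log/nginx/error.log
    findtime = 600
    maxretry = 10
    bantime  = 7200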


Urgh, log watching actively pains me these days. So much wasted effort string-parsing what was originally binary data anyway.

I'm starting to think that we need some agreement where instead of logs, we just get apps to emit a stream of protocol buffers and a format string for the messages and data.

Which does make me wonder if you couldn't LD_PRELOAD something which replaced fprintf and the like...


"So much waste string parsing what was originally binary data anyway."

"So much" is pretty imprecise. How much waste do you believe string parsing incurs in this case?


You can go the systemd route if you want, or just accept the fact that parsing strings works, is mostly reliable, and really isn't as much overhead as people make it out to be. How many system profiles have identified it as a problem?


Of course if you can write the filter, it must be simple enough to script changes to iptables based on recent packet statistics directly.

But often times you don't even know what domain is being connected to at the network layer. You need output from the process holding the connection key. And you want very clear separation from that task...


Streams of plain text are what UNIX was built on. If you want binary APIs, look outside the *nix family.


Or modernize the applications. Throw out the ad-hoc formats and parsers, replace them with machine-readable equivalents.

For example, systemd finally provides a logging system that allows structured logging with key/value fields.


You can also use ulogd and iptables accounting to count hits per IP or per subnet if you want something more lightweight. If you have a non-trivial system you can probably produce a netflow stream from your router, which gives you the accounting without a performance penalty.


I wonder how big of a performance penalty we're talking about here, in any case. Systems are so fast, and text processing is so cheap, I doubt anyone is going to find that tailing the log and grepping out some strings is going to impact their system in any meaningful way. It would require tremendous request rates to be notable, and the bottleneck in such a scenario would be far, far, up the stack (probably database, followed by web app, followed by web server, followed by a hundred other things on the system, with tailing the log somewhere way down at the bottom).

I'm not opposed to other ways of solving it, but I think the belief about how much resources it takes to process a text log file that people are expressing in this thread is at least a few orders of magnitude off.


That's neat, thanks for the fail2ban pointer. I wonder if there is something that would tail nginx/apache logs and compute aggregate request counts by response code, error code for monitoring/alerting. Not looking for log file collection itself, just aggregates.


Just set up fail2ban and it works great, thanks.


You can do some stuff around iptables recent:

http://blog.zioup.org/2008/iptables_recent/
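The usual pattern with the recent match looks roughly like the rules below (thresholds are arbitrary; note the module's default list size caps --hitcount at 20 unless you raise ip_pkt_list_tot):

    # remember every new connection to port 80, then drop sources making
    # more than 20 new connections within 60 seconds
    iptables -A INPUT -p tcp --dport 80 -m state --state NEW -m recent --name HTTP --set
    iptables -A INPUT -p tcp --dport 80 -m state --state NEW -m recent --name HTTP \
        --update --seconds 60 --hitcount 20 -j DROP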


I suppose that's where Lua would come in handy?


I personally have an access_by_lua script that counts accesses per IP and applies a (very generous) rate limit. If the limit is reached, the user is presented with a page explaining that they hit a rate limit, with a button that runs some JavaScript to verify they're not a bot, which in turn whitelists them. This strategy has worked really well so far - though I haven't been the target of anything too bad yet. It's a very good and cheap way to go for smaller sites.
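For the curious, the general shape of such a script with lua-nginx-module / OpenResty is something like this (the dict name, thresholds and challenge page are made up for illustration):

    # in the http{} context
    lua_shared_dict ratelimit 10m;

    server {
        access_by_lua_block {
            local dict = ngx.shared.ratelimit
            local key  = ngx.var.binary_remote_addr

            -- bump the per-IP counter; start a fresh 60-second window on the first hit
            local hits = dict:incr(key, 1)
            if not hits then
                dict:set(key, 1, 60)
                hits = 1
            end

            -- over the (generous) limit: serve the challenge page instead of the request
            if hits > 300 then
                return ngx.exec("/rate_limited.html")
            end
        }
    }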


Can a bot dedicated to your site just ping back whatever the javascript would have done anyway to cancel the rate limit?


It /could/; the next step would be a captcha or something harder for bots to solve - haven't had to go that far yet though. But I usually only have to deal with script kiddies who rent a botnet, enter a URL and click the "attack" button.


If someone wants to take you down they'll just bombard you with traffic, and this won't help you there. Having been the victim of several DDoS attacks over the years, almost all of them haven't been on the application layer.


Cloudflare, for example, is good at preventing non-application-layer DDoS attacks, but for application-layer attacks they can't help much.

This blog post is a good starting point for the kinds of strategies you need to fill that gap in protection.


We tried it; setup was easy, but our response time for dynamic content increased by 150 ms, so it didn't work for us. It's worth noting that their model is different from a CDN - they proxy all of your traffic through their own servers.


That's not atypical for a CDN these days; Fastly and CloudFront can work the same way, e.g. https://aws.amazon.com/cloudfront/dynamic-content/. How else do you expect them to cache and serve your dynamic content?


I don't recommend it, but you could use different domains for static vs dynamic content.


Some organisations do just that. But having your entire site behind a CDN does have additional benefits besides mitigating DDoS attacks, such as allowing you to handle other kinds of service outages more effectively (e.g. busy pages). They can offer you analytics, allow you to separate different traffic under the same domain name (sometimes handy for SEO), etc. Some CDN providers also do some cool stuff like enabling IPv6 on your site even if your origin servers are only running IPv4 - but that's more a niche time-saving feature than some "must have" deal breaker.


I like analytics if the price is less than 50ms per request. We use GA and statcounter for analytics anyways. Charts that show how much static traffic you saved are nice, but with bandwidth close to free, it's not a big deal. CDN analytics need to be better than GA at which point I will not only trade off latency but convert to premium all the way.


> I like analytics if the price is less than 50ms per request. We use GA and statcounter for analytics anyways.

GA would cost you more than 50ms too. More so than a CDN controlled analytics. But obviously that cost with CDN is an upfront latency rather than the more hidden cost with background loading of GA. So arguably GA's cost is less "bad" than the CDN's cost.

Personally speaking, I prefer the CDN approach as it produces web pages with a lower browser footprint, which I think does improve the user experience (though I'm not implying that GA gives a bad user experience!).

GA does give a greater breadth of information than CDN analytics though. Often that's the real deal breaker since analytics is usually driven by project managers / clients rather than by the developers.

> Charts that show how much static traffic you saved are nice, but with bandwidth close to free, it's not a big deal.

Oh it's definitely a big deal if you serve high traffic websites ;) I've spent hours working against those kinds of reports on projects that were seeing 100k concurrent users. I will say that these graphs aren't so much about judging what bandwidth can be saved but more about judging what requests can be offloaded. The idea being the fewer calls to your origin servers you need to make, the more resources you have available in your farm for generating the dynamic content (dynamic content you cannot cache!). This also has the potential to save you money in server costs (depending on how they're licensed) as well as improving site performance at peak times.

> CDN analytics need to be better than GA at which point I will not only trade off latency but convert to premium all the way.

Indeed. GA will likely always be better from an account management perspective. But as a devops engineer, CDN analytics fulfils my needs. The great thing is the multitude of options we have available :)


Unfortunately, CDN analytics is no alternative to GA, so it's an either/or kind of choice for us. Hence, a full-proxy type of CDN means that the latency is additive.


I wasn't aware the point of our discourse was for me to sell your business additional CDN services ;)

In all seriousness though, it might help to look beyond the very specific setup of your present company when asking why other people opt for other CDN services. But for what it's worth, I've not experienced the same degree of latency issues with either Cloudflare or Akamai that you've described. And I have done extensive load tests.


I'm really interested in knowing if other HN members have similar data points on this topic. I tried Cloudflare one year ago and had the same issue (response time increased a lot).


Curious about your experience with CloudFlare. If interested I'm jgc @ cloudflare com.


Hi John,

I remember listening to your talk at dotGo 2014 :-)

I tried CloudFlare in November 2012 (3 years ago, and not 1 year ago as I wrote in my previous comment). At that time, the origin server was hosted by Typhon in France. I remember that after having enabled CloudFlare, the latency was significantly increased. I haven't kept the specific timings, but to give you an idea, the response time was like 100 ms without CloudFlare and 500 ms through CloudFlare.

That said, it was a long time ago and I can guess things have changed a lot since. So I did a new test today. The origin server is hosted by DigitalOcean in Amsterdam. The median response time from my machine is around 100 ms. After enabling CloudFlare, I cannot see a significant difference in response time. The median response time, and the distribution of response times, look very similar.

I guess that during the last few years you have expanded your network and your connections with the major hosting providers (Amazon, Google Cloud, DigitalOcean, Linode, etc.). Maybe it explains the difference between today's test and 3 years ago?

In general, is it useful and/or recommended to use CloudFlare in front of a fully dynamic service, for example an HTTP-JSON API, with no static content (no images, no stylesheets, no scripts), and thus no need for the CDN feature?


Yes. A lot has changed since then. Including a whole lot of stability and expansion. I think you'd have a different experience today.

> In general, is it useful and/or recommended to use CloudFlare in front of a fully dynamic service, for example an HTTP-JSON API, with no static content (no images, no stylesheets, no scripts), and thus no need for the CDN feature?

We do have lots of customers who do that. Two reasons: Railgun and Security. Railgun gives speedups for the JSON because of the ability to diff the boilerplate JSON. Security for APIs is of course important and clearly attackers like to go after APIs.


I have difficulty imagining what I can gain from the JSON "diffing" made possible by Railgun: could you provide an example?

About security, what are the specific security features you're thinking of?


What layers were they on? Lots of traffic means network?

Just hoping you can give a concrete example, as I'm not super familiar with this stuff.


AWS said at re:Invent 2015 that about 15% of DDoS attacks on AWS were application layer [1]. Another 20% were state-exhaustion attacks (SYN floods, etc.), but the majority (65%) were "volumetric" attacks at the network layer, like DNS reflection and SSDP reflection.

[1] https://youtu.be/Ys0gG1koqJA?t=172


Exactly the kind of information I was looking for; thanks.


This is really helpful and practical. When an app starts getting more popular and people start writing scrapers and things like that, they can sometimes mistakenly send an insane number of requests, so this surely helps in those cases, since it makes more sense to solve that at the level of the server rather than the app itself.

As for DDoS, I don't think there's any cheap solution for that...


I really wish Nginx didn't think it was so important to their business model to hold the health check hostage.

That's a dealbreaker for me at their prices.


Nginx, Varnish, etc. set criminal precedents by adding basic features to their paid offerings. From cache purging to basic health checks - this arm-twisting ruins the experience! Instead of providing support and nice dashboards for paid users, they put the basics behind a paywall. I'm not sure how much they make from their Plus offerings, but I bet it's less than what Elastic collects in a much cleverer way!


The Tengine module is available for compiling into the core nginx codebase. You just have to roll your own binary though

https://github.com/yaoweibin/nginx_upstream_check_module


Take a look at Tengine, from Taobao. It's based on nginx.


It is, just like OpenResty [0], but it has been lagging severely behind recently - not sure why.

Edit: Linked to Tengine in [1].

[0]: http://openresty.org/

[1]: http://tengine.taobao.org/


All the characteristics of DDoS attacks listed seem trivial to defeat... e.g. constantly spoofing IP addresses, etc.



