I love nginx. But most of these are mitigations for a DoS targeting the web server, not a DDoS. They're explaining how to throttle a single IP. The whole point of DDoS is to distribute the attack to bypass mechanisms that throttle single IPs (plus to amplify).
Also the DDoS attacks that we've been hit with actually target our uplinks by saturating them with traffic, not our services. We have a 1 Gbps port and the last DDoS we were hit with was over 20 Gbps, which is a relatively small one. The mitigation we used was to have our hosting facility get their upstream provider to route the traffic through a layer 7 DDoS mitigation filter provided by an external company. It worked wonderfully.
These are cool features, but when your link is saturated it doesn't matter what a daemon listening on a port does.
And once IPv6 gets up to speed, welcome to a world of people with millions of "addresses" to attack from. One advantage of IPv4 is that it was accidentally pretty granular.
That'll be a while, of course, but we already see attackers with access to a tremendous number of unique IP addresses in the IPv4 space... they'll have many orders of magnitude more soon.
> welcome to a world of people with millions of "addresses"
... in the same /64 range for the most part, so as easy to block/filter/limit as one IPv4 address.
You risk inconveniencing people who are assigned just a few addresses because you potentially end up blocking many of them due to the actions of a few on the same subnet, but you can't be held responsible for hosts/ISPs doing IPv6 wrong.
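To make that concrete, here's a minimal Python sketch of keying your rate-limit counters or bans on the /64 rather than the full address; the function name and usage are illustrative, not from any particular tool:

    import ipaddress

    def rate_limit_key(remote_addr: str) -> str:
        """Key to count/ban on: the /64 for IPv6, the address itself for IPv4."""
        addr = ipaddress.ip_address(remote_addr)
        if addr.version == 6:
            # Collapse the whole /64 into one bucket, mirroring how a
            # single IPv4 address is treated.
            return str(ipaddress.ip_network((addr, 64), strict=False))
        return str(addr)

    # rate_limit_key("2001:db8:1:2:aaaa:bbbb:cccc:dddd") -> "2001:db8:1:2::/64"
    # rate_limit_key("203.0.113.7")                      -> "203.0.113.7"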
Not to mention that some ISPs do carrier-grade NAT specifically due to the limitations of IPv4, so blocking a single IP(v4) might affect multiple people as well.
> ... in the same /64 range for the most part, so as easy to block/filter/limit as one IPv4 address.
Possibly even easier, because of IPv4 deaggregation.
(Because of IPv4 address scarcity, many providers have discontinuous IPv4 address space. This is mostly a problem for the core, because it leads to much larger BGP routing tables.)
Lots of people say so, but that doesn't make it a useful model of IPv6.
When you think about IPv6, start with /64 networks. That's the basic unit. What a DSL customer gets from the ISP is a /64 network, not some number of individual addresses. The customer may use two, ten, or 2047 of them; it doesn't matter. The point is "one owned DSL subscriber = one /64 network".
Just like with IPv4, some people have larger allocations. But the basic allocation unit is a /64 network.
If I'm not mistaken, the above implies accepting a TCP connection (3-way handshake) and a request from the attacker; then you look at the User-Agent header and decide to stop replying based on its contents.
This will get you nothing in a typical DDoS scenario, but thanks for sharing as it may come handy for other situations.
Filtering like that also has a rather big performance impact on the nginx side and makes things worse under moderate, but not DDoS-y, traffic conditions.
You can't solve a traditional DDoS attack at the destination. No matter what you do with the incoming flood, the fact is, your bandwidth is full of the attack requests, leaving no room for legitimate traffic. Do what you like to the attack requests, but the pipe is still full.
To mitigate a DDoS, you need to go upstream to your network providers and filter out the traffic before it reaches you.
True, but not all DDoS involve bandwidth saturation, since if the site has a decent pipe, those are harder to achieve. Resource depletion based on forcing the server to perform heavy tasks is common.
Nginx is great, and I absolutely love it. However, if you're under a true DDoS attack, the box is going to be completely bogged down at the kernel level way before traffic is even close to being accepted and processed by NGINX. So this post is not very useful against a decent magnitude attack.
For all but the most basic attack, you really want a script putting these IPs into iptables. Using the application itself to block them still requires the connection setup/teardown resources to be used, as well as the application itself.
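A very rough sketch of that kind of script in Python, assuming the default combined log format at /var/log/nginx/access.log and made-up thresholds (it needs root for iptables, and in practice fail2ban or an ipset is the saner route):

    #!/usr/bin/env python3
    # Sketch: tail the nginx access log and DROP IPs that exceed a request
    # threshold within a time window. Path, threshold and window are assumptions.
    import collections
    import subprocess
    import time

    LOG = "/var/log/nginx/access.log"
    THRESHOLD = 300   # requests per window before blocking
    WINDOW = 60       # seconds

    def block(ip):
        # Insert a DROP rule at the top of the INPUT chain.
        subprocess.run(["iptables", "-I", "INPUT", "-s", ip, "-j", "DROP"], check=False)

    def main():
        counts = collections.Counter()
        blocked = set()
        window_start = time.time()
        with open(LOG) as f:
            f.seek(0, 2)  # start at the end of the file, like tail -f
            while True:
                line = f.readline()
                if not line:
                    time.sleep(0.5)
                    continue
                ip = line.split(" ", 1)[0]  # combined format: client IP is the first field
                counts[ip] += 1
                if counts[ip] > THRESHOLD and ip not in blocked:
                    block(ip)
                    blocked.add(ip)
                if time.time() - window_start > WINDOW:
                    counts.clear()
                    window_start = time.time()

    if __name__ == "__main__":
        main()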
Urgh, log watching actively pains me these days. So much wasted effort string-parsing what was originally binary data anyway.
I'm starting to think that we need some agreement where instead of logs, we just get apps to emit a stream of protocol buffers and a format string for the messages and data.
Which does make me wonder if you couldn't LD_PRELOAD something which replaced fprintf and the like...
You can go the systemd route if you want, or just know the fact that parsing strings works, is mostly reliable, and really isn't as much overhead as people make it out to be. How many system profiles have identified it as a problem?
Of course if you can write the filter, it must be simple enough to script changes to iptables based on recent packet statistics directly.
But often you don't even know what domain is being connected to at the network layer. You need output from the process holding the connection key. And you want very clear separation from that task...
You can also use ulogd and iptables accounting to count hits per IP or per subnet if you want something more lightweight. If you have a non-trivial system you can probably produce a netflow stream from your router, which gives you the accounting without a performance penalty.
I wonder how big of a performance penalty we're talking about here, in any case. Systems are so fast, and text processing is so cheap, I doubt anyone is going to find that tailing the log and grepping out some strings is going to impact their system in any meaningful way. It would require tremendous request rates to be notable, and the bottleneck in such a scenario would be far, far, up the stack (probably database, followed by web app, followed by web server, followed by a hundred other things on the system, with tailing the log somewhere way down at the bottom).
I'm not opposed to other ways of solving it, but I think the belief about how much resources it takes to process a text log file that people are expressing in this thread is at least a few orders of magnitude off.
That's neat, thanks for the fail2ban pointer. I wonder if there is something that would tail nginx/apache logs and compute aggregate request counts by response code, error code for monitoring/alerting. Not looking for log file collection itself, just aggregates.
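For quick one-off numbers, a few lines of Python over the access log get you most of the way there. A sketch assuming the default combined log format - it only relies on the status code being the first field after the quoted request line, and it's no substitute for real monitoring:

    import collections
    import sys

    def status_counts(path):
        """Count requests per HTTP status code in an nginx 'combined' access log."""
        counts = collections.Counter()
        with open(path) as f:
            for line in f:
                try:
                    # ... "GET / HTTP/1.1" 200 1234 ...
                    status = line.split('"')[2].split()[0]
                except IndexError:
                    continue  # skip malformed lines
                counts[status] += 1
        return counts

    if __name__ == "__main__":
        for status, n in status_counts(sys.argv[1]).most_common():
            print(status, n)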
I personally have an access_by_lua script that counts accesses per IP and applies a (very generous) rate limit.
If the limit is reached, the user is presented with a page explaining they hit a rate limit, plus a button that runs some JavaScript to verify they're not a bot, which in turn whitelists them.
This strategy has worked really well so far - I haven't been the target of anything too bad yet, though.
It's a very good and cheap way to go for smaller sites.
It /could/; the next step would be a captcha or something harder for bots to solve - haven't had to go that far yet, though.
But I usually only have to deal with script kiddies who rent a botnet, enter a URL, and click the "attack" button.
If someone wants to take you down they'll just bombard you with traffic, and this won't help you there. Having been the victim of several DDoS attacks over the years, almost all of them haven't been on the application layer.
We tried it, setup was easy, but our response time for dynamic content increased by 150 ms, so it didn't work for us. It's worth noting that their model is different from a traditional CDN - they proxy all of your traffic through their own servers.
That's not atypical for a CDN these days; fastly and cloudfront can work the same way, e.g. https://aws.amazon.com/cloudfront/dynamic-content/. How else do you expect them to cache and serve your dynamic content?
Some organisations do just that. But having your entire site behind a CDN does have additional benefits besides mitigating DDoS attacks, such as allowing you to handle other kinds of service outages more effectively (e.g. busy pages). They can offer you analytics, allow you to separate different traffic under the same domain name (sometimes handy for SEO), etc. Some CDN providers also do some cool stuff like enable IPv6 on your site even if your origin servers are only running IPv4 - but that's more a niche time-saving feature than some "must have" deal breaker.
I like analytics if the price is less than 50ms per request. We use GA and statcounter for analytics anyways. Charts that show how much static traffic you saved are nice, but with bandwidth close to free, it's not a big deal. CDN analytics need to be better than GA at which point I will not only trade off latency but convert to premium all the way.
> I like analytics if the price is less than 50ms per request. We use GA and statcounter for analytics anyways.
GA would cost you more than 50ms too - more so than CDN-controlled analytics. But obviously the CDN's cost is upfront latency, whereas GA's is the more hidden cost of background loading. So arguably GA's cost is less "bad" than the CDN's cost.
Personally speaking, I prefer the CDN approach as it produces web pages with a lower browser footprint, which I think does improve the user experience (though I'm not implying that GA gives a bad user experience!).
GA does give a greater breadth of information than CDN analytics though. Often that's the real deal breaker since analytics is usually driven by project managers / clients rather than by the developers.
> Charts that show how much static traffic you saved are nice, but with bandwidth close to free, it's not a big deal.
Oh it's definitely a big deal if you serve high traffic websites ;) I've spent hours working against those kind of reports on projects that were seeing 100k concurrent users. I will say that these graphs aren't so much about judging what bandwidth can be saved but more about judging what requests can be offloaded. The idea being the fewer calls to your origin servers you need to make, the more resources you have available in your farm for generating the dynamic content (dynamic content you cannot cache!). This also has the potential to save you money in server costs (depending on how they're licenced) as well as improving site performance at peak times.
> CDN analytics need to be better than GA at which point I will not only trade off latency but convert to premium all the way.
Indeed. GA will likely always be better from an account management perspective. But as a devops engineer, CDN analytics fulfils my needs. The great thing is the multitude of options we have available :)
Unfortunately, CDN analytics is no alternative to GA, so it's an either/or kind of choice for us. Hence, a full-proxy type of CDN means that the latency is additive.
I wasn't aware the point of our discourse was for me to sell your business additional CDN services ;)
In all seriousness though, it might help to look beyond the very specific setup of your present company when asking why other people opt for other CDN services. But for what it's worth, I've not experienced the same degree of latency issues with either Cloudflare or Akamai that you've described. And I have done extensive load tests.
I'm really interested in knowing if other HN members have similar data points on this topic. I tried Cloudflare one year ago and had the same issue (response time increased a lot).
I remember listening to your talk at dotGo 2014 :-)
I tried CloudFlare in November 2012 (3 years ago, and not 1 year ago as I wrote in my previous comment). At that time, the origin server was hosted by Typhon in France. I remember that after having enabled CloudFlare, the latency was significantly increased. I haven't kept the specific timings, but to give you an idea, the response time was like 100 ms without CloudFlare and 500 ms through CloudFlare.
That said, it was a long time ago and I can guess things have changed a lot since. So I did a new test today. The origin server is hosted by DigitalOcean in Amsterdam. The median response time from my machine is around 100 ms. After enabling CloudFlare, I cannot see a significant difference in response time. The median response time and the distribution of response times look very similar.
I guess that during the last few years you have expanded your network and your connections with the major hosting providers (Amazon, Google Cloud, DigitalOcean, Linode, etc.). Maybe it explains the difference between today's test and 3 years ago?
In general, is it useful and/or recommended to use CloudFlare in front of a fully dynamic service, for example an HTTP JSON API, with no static content (no images, no stylesheets, no scripts), and thus no need for the CDN feature?
Yes. A lot has changed since then. Including a whole lot of stability and expansion. I think you'd have a different experience today.
> In general, is it useful and/or recommended to use CloudFlare in front of a fully dynamic service, for example an HTTP JSON API, with no static content (no images, no stylesheets, no scripts), and thus no need for the CDN feature?
We do have lots of customers who do that. Two reasons: Railgun and Security. Railgun gives speedups for the JSON because of the ability to diff the boilerplate JSON. Security for APIs is of course important and clearly attackers like to go after APIs.
AWS said at re:Invent 2015 that about 15% of DDoS attacks on AWS were application layer [1]. About 20% were state exhaustion (SYN floods, etc.), but the vast majority (65%) were "volumetric" attacks, meaning layer three like DNS reflection and SSDP reflection.
This is really helpful and practical. When an app starts getting more popular and people start writing scrapers and things like that, they can mistakenly send an insane number of requests, so this surely helps in those cases, since it makes more sense to solve that at the level of the server, not the app itself.
As for DDoS, I don't think there's any cheap solution for that...
Nginx, Varnish, etc. set a criminal precedent by moving basic features into their paid offerings. From cache purging to basic health checks - this arm-twisting ruins the experience! Instead of providing support and nice dashboards for paid users, they put the basics behind a paywall. I'm not sure how much they make from their Plus offerings, but I bet it's less than what Elastic collects in a much cleverer way!