Ask HN: How is DDoS protection implemented?
260 points by elephant0xffff on May 13, 2018 | 54 comments
The big services (Google, Cloudflare, etc.) provide DDoS attack mitigation (and seem to succeed), but details on their tactics are rare (at least I did not find in-depth information on that).

I guess to make this work well you have to do classification (regular request vs. malicious) on several protocol layers and then reroute or drop packets accordingly. But how does that prevent severe service degradation? You still have to do some kind of work (in computation and energy) on the listening side, or can fat edge servers just eat that up?




I was lead developer on Arbor Networks' DDoS product in the early 2000s (I left in 2005 to start Matasano Security). My information on this is surely dated, but people seem to still be using the same terminology now as then.

You can break down DDoS into roughly three categories:

1. Volumetric (brute force)

2. Application (targeting specific app endpoints)

3. Protocol (exploiting protocol vulnerabilities)

DDoS mitigation providers concentrate on 1 & 3.

The basic idea is: attempt to characterize the malicious traffic if you can, and/or divert all traffic for the target. Send the diverted traffic to a regional "scrubbing center"; dirty traffic in, clean traffic out.

The scrubbing centers buy or build mitigation boxes that take large volumes of traffic in and then do heuristic checks (liveness of sender, protocol anomalies, special queueing) before passing it to the target. There's some in-line layer 7 filtering happening, and there's continuous source characterization feeding basic network-layer filters pushed back towards ingress.

You can get pretty far with pretty simple statistical anomaly models, both for attacker source classification and for tracking targets and being selective about what needs to be diverted.
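
A rough sketch of the kind of per-source anomaly model meant here (illustrative only; the table size, smoothing factor, and threshold are made up, this is not Arbor's actual implementation):

    /* Illustrative per-source anomaly check: keep an exponentially weighted
     * moving average (EWMA) of packets/sec per source and flag sources that
     * suddenly exceed a multiple of their own baseline. */
    #include <stdint.h>
    #include <stdio.h>

    #define MAX_SOURCES 65536   /* made-up table size */
    #define ALPHA       0.05    /* EWMA smoothing factor */
    #define THRESHOLD   8.0     /* flag if current rate > 8x baseline */

    struct source_stats {
        uint32_t addr;      /* IPv4 source address */
        double   baseline;  /* smoothed packets/sec */
    };

    static struct source_stats table[MAX_SOURCES];

    /* Called once per measurement interval; returns 1 if the source looks
     * anomalous and is a candidate for diversion/filtering. */
    int update_and_classify(uint32_t saddr, double pkts_per_sec)
    {
        struct source_stats *s = &table[saddr % MAX_SOURCES]; /* toy hashing */
        if (s->addr != saddr) {            /* new or colliding entry: reset */
            s->addr = saddr;
            s->baseline = pkts_per_sec;
            return 0;
        }
        int anomalous = s->baseline > 0.0 &&
                        pkts_per_sec > THRESHOLD * s->baseline;
        s->baseline = (1.0 - ALPHA) * s->baseline + ALPHA * pkts_per_sec;
        return anomalous;
    }

    int main(void)
    {
        uint32_t attacker = 0x0a000001;              /* 10.0.0.1 */
        for (int i = 0; i < 20; i++) {
            double rate = (i < 10) ? 50.0 : 5000.0;  /* ramps up at i == 10 */
            if (update_and_classify(attacker, rate))
                printf("interval %d: source flagged (%.0f pps)\n", i, rate);
        }
        return 0;
    }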

A lot of major volumetric attacks are, at the network layer, pretty unsophisticated; they're things like memcached or NTP floods. When you're special-casing traffic to a particular target through a scrubbing center, it's pretty easy to strip that kind of stuff off.


It sounds like we were working on the same problem at the same time and came to roughly the same conclusion (see my sibling comment about eBay's DDOS mitigation system). :)


>"The scrubbing centers buy or build mitigation boxes that take large volumes of traffic in and then do heuristic checks (liveness of sender, protocol anomalies, special queueing) before passing it to the target. There's some in-line layer 7 filtering happening, and there's continuous source'

Were these heuristics done in hardware then? ASICs? FPGAs? Could you elaborate on what the "liveness of sender" and "special queueing" heuristics are?


Yeah, custom hardware (ASIC/FPGA depending). Liveness checking is about detecting things like Slowloris [0], using timeouts, SYN cookies (which ask the client to do some minor work), etc.

[0] - https://en.wikipedia.org/wiki/Slowloris_(computer_security)
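
For a sense of how SYN cookies keep the listener stateless, here is a toy sketch (not the actual Linux algorithm, which also encodes the MSS; the hash and time-slot handling are simplified for illustration):

    /* Toy SYN cookie sketch: encode the connection 4-tuple plus a coarse
     * timestamp into the SYN-ACK's initial sequence number, keeping no
     * per-connection state. When the final ACK arrives, recompute the
     * cookie and compare it against (ack - 1); only then allocate state. */
    #include <stdint.h>
    #include <stdio.h>

    /* Toy 32-bit mixer; a real implementation uses a keyed cryptographic hash. */
    static uint32_t mix(uint32_t x)
    {
        x ^= x >> 16; x *= 0x7feb352dU;
        x ^= x >> 15; x *= 0x846ca68bU;
        x ^= x >> 16;
        return x;
    }

    static uint32_t secret = 0xdeadbeefU;  /* would be random and rotated */

    static uint32_t syn_cookie(uint32_t saddr, uint32_t daddr,
                               uint16_t sport, uint16_t dport, uint32_t t)
    {
        uint32_t h = mix(saddr ^ secret);
        h = mix(h ^ daddr);
        h = mix(h ^ ((uint32_t)sport << 16 | dport));
        return mix(h ^ t);        /* cookies from old time slots stop validating */
    }

    /* The ACK completing the handshake must acknowledge cookie + 1,
     * for the current or previous time slot. */
    static int cookie_valid(uint32_t saddr, uint32_t daddr,
                            uint16_t sport, uint16_t dport,
                            uint32_t ack, uint32_t now)
    {
        for (uint32_t t = now; t + 2 > now; t--)
            if (ack - 1 == syn_cookie(saddr, daddr, sport, dport, t))
                return 1;
        return 0;
    }

    int main(void)
    {
        uint32_t now = 1234;
        uint32_t isn = syn_cookie(0x0a000001, 0xc0a80001, 40000, 80, now);
        printf("valid: %d\n",
               cookie_valid(0x0a000001, 0xc0a80001, 40000, 80, isn + 1, now + 1));
        return 0;
    }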


It was silicon (or, at least, optimized general compute) in the mid-2000s, but who knows anymore? It could all be user land TCP/IP on Linux today. High speed network processing got weird.


It's a mix depending on what market segment you're looking at. I watch it from afar. There's still a lot of silicon use, especially for accelerating TCP/IP or decryption. I also recently found a presentation you all might enjoy, with slides on using a GPU:

http://on-demand.gputechconf.com/gtc/2017/presentation/s7468...


Thanks for the link. This is really interesting. Might you know if the talk that accompanied this is available somewhere?


A10 actually lists the number of FPGAs in their mitigation appliances for sizing purposes.

"Select Thunder TPS models have high-performance FPGA-based Flexible Traffic Acceleration (FTA) technology to detect and mitigate up to 60 common attack vectors immediately in hardware — before data CPUs are involved. "


>"High speed network processing got weird."

I was curious about this statement. Can you elaborate, weird how?


It shifted from hardware-intensive (ASICs, FPGAs) to software, so we can now do high-speed packet mangling on commodity hardware. Initially that was pretty involved with DPDK etc., but it has become much easier as of late with XDP+eBPF.

e.g.

https://jvns.ca/blog/2017/04/07/xdp-bpf-tutorial/

https://netdevconf.org/2.1/papers/Gilberto_Bertin_XDP_in_pra...

https://people.netfilter.org/hawk/presentations/OpenSourceDa...
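
As a flavor of what these XDP programs look like, here is a minimal sketch along the lines of those tutorials that drops the memcached/NTP reflection traffic mentioned upthread (illustrative only; real programs match on maps of offending sources, counters, and so on):

    /* drop_reflection.c - minimal XDP sketch: drop UDP packets whose source
     * port is 11211 (memcached) or 123 (NTP), the classic reflection vectors.
     * Build/attach with a typical libbpf/iproute2 toolchain, e.g.:
     *   clang -O2 -g -target bpf -c drop_reflection.c -o drop_reflection.o
     *   ip link set dev eth0 xdp obj drop_reflection.o sec xdp */
    #include <linux/bpf.h>
    #include <linux/if_ether.h>
    #include <linux/in.h>
    #include <linux/ip.h>
    #include <linux/udp.h>
    #include <bpf/bpf_helpers.h>
    #include <bpf/bpf_endian.h>

    SEC("xdp")
    int drop_reflection(struct xdp_md *ctx)
    {
        void *data     = (void *)(long)ctx->data;
        void *data_end = (void *)(long)ctx->data_end;

        struct ethhdr *eth = data;
        if ((void *)(eth + 1) > data_end)
            return XDP_PASS;
        if (eth->h_proto != bpf_htons(ETH_P_IP))
            return XDP_PASS;

        struct iphdr *ip = (void *)(eth + 1);
        if ((void *)(ip + 1) > data_end)
            return XDP_PASS;
        if (ip->protocol != IPPROTO_UDP || ip->ihl != 5)
            return XDP_PASS;    /* skip non-UDP and IP options for simplicity */

        struct udphdr *udp = (void *)(ip + 1);
        if ((void *)(udp + 1) > data_end)
            return XDP_PASS;

        __u16 sport = bpf_ntohs(udp->source);
        if (sport == 11211 || sport == 123)
            return XDP_DROP;    /* reflection/amplification traffic */

        return XDP_PASS;
    }

    char _license[] SEC("license") = "GPL";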


Not the GP, but I worked in the DDoS space for a spell a few years ago, helping develop the company's 3rd generation product. Their 1st generation was ASIC-based; 2nd generation a manycore CPU (Tilera) running a custom OS mostly written in assembly; 3rd generation used the next generation of that CPU (Tile GX) which provided lots of dedicated highly-parallel network processing hardware (including a programmable coprocessor), some of which was designed following feedback from our CTO.

The Tile GX (including the hardware) was available for general-purpose use from Linux (which we ran), but could also be programmed directly to do lots of packet classification even before the packets got to the CPU and main memory (which we did). The Cavium network processor worked similarly.


What happens when it doesn't work? For instance why does something like Mirai happen? The first D is too D?


Yeah, I don't know. The biggest Mirai traffic spike involved a pretty simple volumetric GRE attack; GRE is its own IP protocol, so it's trivial to filter, but also lots of middleboxes won't even forward it in the first place. There was some confusion about how bad the Mirai attack was because the propagation code for Mirai, independent of the DDoS attacks, managed to crash some routers.

It's definitely not the case that all DDoS attacks can be reliably cleaned up in an ISP scrubbing center.


You call up Krebs and the FBI and they'll dox/arrest the attacker.


No.


I worked on the eBay DDoS prevention system in the early 2000s. My coworkers filed a patent on part of the system.

https://patents.google.com/patent/US7992192

Once an attack was detected, its signature was sent to a second system, a series of hardware devices optimized for layer 7 packet inspection. The devices were updated with signatures of current attacks and then checked every incoming packet for those signatures. Any packet that matched was parsed for where it was coming from, and then the router was updated to drop traffic from that source for a period of time.

As far as I know, today's techniques are fairly similar, along with just having a whole lot of computers that can absorb the traffic.


What does a "signature" look like specifically, or generally if you can't be specific? Would love to hear about what is actually getting sent to the L7 optimized hardware.


In the early 2000s you could get a long way with just the 5-tuple, some basic aggregation inference, and an RRD histogram. The tricky parts were having the ability to divert and process the traffic once characterized. The actual processing wasn't that complicated; it just needed way bigger rule sets than could fit in a switch TCAM.
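
Roughly, a 5-tuple rule in a software scrubbing path is just this (an illustrative sketch; the struct layout and the zero-as-wildcard convention are made up for the example):

    /* Illustrative 5-tuple filter rule of the kind that won't fit in a
     * switch TCAM at scale but is trivial to apply in software.
     * A zero field acts as a wildcard. */
    #include <stddef.h>
    #include <stdint.h>
    #include <stdio.h>

    struct five_tuple {
        uint32_t saddr, daddr;    /* IPv4 addresses, host byte order */
        uint16_t sport, dport;
        uint8_t  proto;           /* 6 = TCP, 17 = UDP */
    };

    struct rule {
        struct five_tuple match;  /* 0 = wildcard */
        int drop;                 /* 1 = drop, 0 = pass */
    };

    static int field_matches(uint32_t rule_val, uint32_t pkt_val)
    {
        return rule_val == 0 || rule_val == pkt_val;
    }

    /* Returns 1 if the packet should be dropped under the first matching rule. */
    int classify(const struct rule *rules, size_t n, const struct five_tuple *p)
    {
        for (size_t i = 0; i < n; i++) {
            const struct five_tuple *m = &rules[i].match;
            if (field_matches(m->saddr, p->saddr) &&
                field_matches(m->daddr, p->daddr) &&
                field_matches(m->sport, p->sport) &&
                field_matches(m->dport, p->dport) &&
                field_matches(m->proto, p->proto))
                return rules[i].drop;
        }
        return 0;   /* default: pass */
    }

    int main(void)
    {
        /* Drop UDP traffic sourced from port 11211 aimed at the victim IP. */
        struct rule rules[] = {
            { .match = { .daddr = 0xc6336414 /* 198.51.100.20 */,
                         .sport = 11211, .proto = 17 },
              .drop = 1 },
        };
        struct five_tuple pkt = { .saddr = 0x0a000001, .daddr = 0xc6336414,
                                  .sport = 11211, .dport = 50000, .proto = 17 };
        printf("drop: %d\n", classify(rules, 1, &pkt));
        return 0;
    }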


Not the GP though I've also worked in the DDoS space. Think fancy regexps (augmented with e.g. fast string search engines, counters, etc.), running inside a protocol-specific interpreter.

At least in the product I worked on, L7 processing was done purely in software. You could probably make hardware to do that but there's not a ton of benefit as you're pretty much constrained by memory bandwidth, not CPU power, once you start looking at anything past fixed headers.

(Our product also performed deep-packet inspection – in fact that was its original function – so the L7 processing was probably a bit more general than DDoS-only products.)


It would look at the HTTP request and break it down by URI, host, parameters, cookies, etc. A signature was some combination of those.

It also did layer 2 and 3 detection and looked for the stuff mentioned elsewhere in the thread, like IP and port and whether the 3-way handshake was "normal". Stuff like that.
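
A toy version of that kind of layer-7 signature, just to make the idea concrete (field extraction from the raw request is assumed to happen elsewhere; the chosen fields and hashing are made up for illustration, not eBay's actual scheme):

    /* Toy layer-7 "signature": a combination of request fields (here host,
     * URI, and User-Agent) hashed into one key, so matching requests can be
     * counted and their sources acted on. */
    #include <stdint.h>
    #include <stdio.h>

    /* FNV-1a over a string, folded into an accumulated hash. */
    static uint64_t fnv1a(uint64_t h, const char *s)
    {
        for (; *s; s++) {
            h ^= (uint8_t)*s;
            h *= 0x100000001b3ULL;
        }
        return h;
    }

    static uint64_t request_signature(const char *host, const char *uri,
                                      const char *user_agent)
    {
        uint64_t h = 0xcbf29ce484222325ULL;   /* FNV offset basis */
        h = fnv1a(h, host);
        h = fnv1a(h, uri);
        h = fnv1a(h, user_agent);
        return h;
    }

    int main(void)
    {
        /* A flood of identical requests shows up as one hot signature key. */
        uint64_t flood  = request_signature("example.com", "/login.php?u=admin",
                                            "Mozilla/4.0");
        uint64_t benign = request_signature("example.com", "/index.html",
                                            "Mozilla/5.0");
        printf("flood %016llx vs benign %016llx\n",
               (unsigned long long)flood, (unsigned long long)benign);
        return 0;
    }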


>"Once the traffic was detected, the signature was sent to a second system that was a series of hardware optimized for layer 7 packet inspection."

Was this custom DPI hardware or something from a vendor?


It was off-the-shelf hardware, but the software was customized by my coworker. He had to write some C and assembly for it.


I wonder if someone with more industry knowledge might be able to hazard a guess at what kind of off-the-shelf hardware.

You didn't say VHDL (or maybe this was irrelevant for switching circa 2000?), so I'm really curious what architecture was in use. (That's really all my question is; CPU architecture / system design)


By far the biggest part of attack mitigation in my experience is out-scaling the attack. A well-written and well-configured application stack can handle a decent amount of traffic itself before becoming bogged down processing malicious traffic, but if your application is running in just one place, at some point you'll cap out either the application, the NIC, the upstream switch, the router, or the ISP line. To get around that, huge providers like the ones you listed are heavily multi-homed. This means they announce their routes to the internet from multiple locations, so traffic naturally flows to the closest (hop-wise, not necessarily geographically) endpoint.

From there, you can add layers of protection ranging from simple things like blocking traffic that is obviously malicious (TCP flags, port numbers, etc.) to more complex things like pattern recognition, both in the overall trends of the data and on a per-packet basis. After you've decided with decent certainty that it's not malicious traffic, you pass it off to the actual backend service.

For systems that are designed to scale horizontally, that may be a neighboring machine (or even the same machine) in that data center. For single-homed backend systems that can't scale horizontally to multiple locations, that "clean" traffic is then sent via some mechanism (possibly a GRE tunnel, possibly just raw internet traffic to a secret IP) to the backend service. Depending on the methodology used, the filtering may be a true bidirectional proxy, in which case the reply goes back to the scrubber and then out to the original sender, or it may be a unidirectional proxy, in which case the reply goes directly back to the original sender.

All attack mitigation works in some way like this, whether it be by designing your application from the beginning to be multi-homed and able to run in multiple datacenters, or by installing a separate mitigation layer that scrubs attack traffic.


From my personal low-end server perspective (which has stood up to simple attacks from Russian IPs), I have the following:

1. Static page caching (in RAM ideally) - dynamically generated content will kill you quicker than anything else, especially calls to a database. WordPress is very easy to kill in its default state.

2. Kill high-frequency requests from the same source as quickly as possible (make sure your response is smaller than the data they send you - ultimately you want their systems to be busier than yours). You want to free the port up as quickly as possible; see the sketch at the end of this comment.

3. Move anybody you can identify as a legitimate user (credentials, low frequency requests) out to another server if possible.

Firewall-wise, my system sits in the cloud, so usually high-frequency traffic is the only issue I have to deal with. Interested to hear any advice from other people here.
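
For point 2, a minimal per-source token bucket sketch (illustrative; the table size, rate, and burst are made up for the example):

    /* Per-source token bucket: each source earns RATE tokens per second up
     * to BURST; a request with no token available is dropped as cheaply as
     * possible. */
    #include <stdint.h>
    #include <stdio.h>
    #include <time.h>

    #define BUCKETS 4096     /* made-up table size */
    #define RATE    10.0     /* tokens (requests) per second */
    #define BURST   20.0     /* bucket capacity */

    struct bucket {
        uint32_t addr;
        double   tokens;
        double   last;       /* last refill time, seconds */
    };

    static struct bucket table[BUCKETS];

    static double now_sec(void)
    {
        struct timespec ts;
        clock_gettime(CLOCK_MONOTONIC, &ts);
        return ts.tv_sec + ts.tv_nsec / 1e9;
    }

    /* Returns 1 if the request from saddr should be served, 0 if dropped. */
    int allow_request(uint32_t saddr)
    {
        struct bucket *b = &table[saddr % BUCKETS];  /* toy hashing */
        double t = now_sec();

        if (b->addr != saddr) {         /* new or colliding entry: reset */
            b->addr = saddr;
            b->tokens = BURST;
            b->last = t;
        }
        b->tokens += (t - b->last) * RATE;
        if (b->tokens > BURST)
            b->tokens = BURST;
        b->last = t;

        if (b->tokens >= 1.0) {
            b->tokens -= 1.0;
            return 1;
        }
        return 0;
    }

    int main(void)
    {
        int served = 0;
        for (int i = 0; i < 100; i++)   /* 100 back-to-back requests */
            served += allow_request(0x0a000001);
        printf("served %d of 100 burst requests\n", served);  /* ~BURST */
        return 0;
    }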


For #2 -- how do you "kill" high frequency requests? By ignoring them?


Yep. Add the source address (or some more specific yet easily computed identifier) to a table that is checked early in the network path (in hardware if possible).

Or, if you want to be fancy, "tarpit" them (complete the TCP handshake and then ignore them, forcing the attacker to actually commit resources), but apparently that's of questionable value these days. [1]

[1] https://en.wikipedia.org/wiki/Tarpit_(networking)
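
A bare-bones user-space tarpit looks something like this (illustrative only; a serious implementation, such as the xtables-addons TARPIT target, does this at the TCP level in the kernel so it doesn't hold a socket per connection):

    /* Minimal TCP tarpit sketch: accept connections, then hold them open
     * without reading or replying, so a naive attacker burns a connection
     * slot on their side. Note this also costs us one fd per connection;
     * kernel-level tarpits avoid that. */
    #include <arpa/inet.h>
    #include <netinet/in.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/socket.h>

    int main(void)
    {
        int srv = socket(AF_INET, SOCK_STREAM, 0);
        int one = 1;
        setsockopt(srv, SOL_SOCKET, SO_REUSEADDR, &one, sizeof(one));

        struct sockaddr_in addr;
        memset(&addr, 0, sizeof(addr));
        addr.sin_family = AF_INET;
        addr.sin_addr.s_addr = htonl(INADDR_ANY);
        addr.sin_port = htons(8123);             /* made-up tarpit port */

        if (bind(srv, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
            listen(srv, 128) < 0) {
            perror("bind/listen");
            return 1;
        }

        for (;;) {
            int c = accept(srv, NULL, NULL);     /* complete the handshake... */
            if (c >= 0)
                printf("holding connection on fd %d\n", c);
            /* ...and never read from or close it; the peer is left hanging. */
        }
    }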


Interesting, thanks.


In addition to @colanderman's suggestion I also do this on an application level - although not everyone will be able to do this.


Or by banning the source, especially repeat offenders.

I find that using a combination of nginx's limit_req and fail2ban over nginx logs is an easy measure that already goes a long way in handling basic types of DoS, like clients producing an abnormally high volume of requests.
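
For reference, the relevant nginx bits look roughly like this (the zone name, rate, and backend address are made up; limit_req_zone lives in the http {} block):

    # Allow each client IP ~10 requests/sec with a small burst; excess
    # requests are rejected before they reach the application.
    limit_req_zone $binary_remote_addr zone=perip:10m rate=10r/s;

    server {
        listen 80;
        location / {
            limit_req zone=perip burst=20 nodelay;
            proxy_pass http://127.0.0.1:8080;   # hypothetical backend
        }
    }

Rejected requests get a 503 by default, which fail2ban can then pick out of the logs to ban repeat offenders at the firewall.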


It's essentially still the same thing: having the bigger pipe.

A distributed DoS attack has many sources, and when botnets on infected consumer systems are involved, those sources are legitimate addresses/devices as well. This defeats most "blackhole the source" options, because the sources are the same devices as your legitimate visitors/customers.

So for a DDoS that simply tries to saturate your link(s) and where you can't blackhole the source, the only 'protection' is having more bandwidth than the attacker(s) has (or have).

After that a few other things come into play: attack traffic from legit sources may have a pattern, so while you can't blackhole upstream, you can prevent traffic matching that pattern from reaching the actual application/site. This is relevant in cases where you might suffer from application overload before link overload. If your link can handle the DDoS traffic but your application can't, you're still screwed. (And by application I include load balancers, databases, storage, etc.)


Fastly had a good presentation about DDoS trends and how they mitigate them at one of their recent Altitude conferences. Video at https://vimeo.com/212305516 - the mitigation stuff starts around 14:45.


The easy answer: Load balancing

Anycast is the most important piece of the puzzle, allowing you to route traffic to a bunch of different locations.

Let's say you can handle 10 Gbps at a single location. If the traffic is evenly split between 100 locations, then you can have a single IP that can handle 1 Tbps of traffic.

Of course, the setup behind these IPs might vary a lot, and one might even use DNS load balancing in front of the IPs.


Load balancing is in place for all but the most trivial sites, though, so what you're really saying is horizontal scaling. Which is fine, but expensive compared to pattern-based mitigation techniques.


Actually, very few services use this kind of load balancing without relying on another service that does it for them.

If you're going to find patterns to decide what to block then you first need to make sure you receive all the traffic. If a single entry point can't handle it, well, then you need to load balance the incoming traffic.


I don't think this is about regular load balancing. DDoS is coming from a large number of infected machines, but they can't control how their traffic is routed. By using anycast you're splitting the machines that are used to attack into small groups that your pattern based mitigation or even your regular reverse proxies can handle.


CDN networks are well equipped for this because of their large geographical footprint. If they can terminate "bad" requests close to where those requests originate, the requests don't pile up nearly as badly at the application server.


One "trick" to know is that transit links are generally billed as the higher of inbound or outbound traffic. If you have a service which is unbalanced and pushing out a lot of data (like most hosting services), your inbound is thus essentially "free" up to a very high volume.


This might give you a nice idea of how HAProxy provides the feature. https://www.haproxy.com/blog/use-a-load-balancer-as-a-first-...


I'm working on DDoS protection at Cloudflare. AMA

We try to publish most of what we do; the more obvious links:

https://blog.cloudflare.com/how-cloudflares-architecture-all...

https://blog.cloudflare.com/meet-gatebot-a-bot-that-allows-u...

https://blog.cloudflare.com/the-root-cause-of-large-ddos-ip-...

https://blog.cloudflare.com/memcrashed-major-amplification-a...

https://blog.cloudflare.com/syn-packet-handling-in-the-wild/

https://blog.cloudflare.com/reflections-on-reflections/

https://blog.cloudflare.com/say-cheese-a-snapshot-of-the-mas...

https://blog.cloudflare.com/the-new-ddos-landscape/

https://blog.cloudflare.com/unmetered-mitigation/

https://blog.cloudflare.com/introducing-the-p0f-bpf-compiler...

And many more.

Also two talks:

https://idea.popcount.org/2016-02-01-enigma---building-a-dos...

https://idea.popcount.org/2015-11-16-black-hat-eu---defendin...

> But how does that prevent severe service degradation

It doesn't. You DROP the most specific thing you can. To avoid collateral damage we are able to do "scattering" (move the customer across IPs in the hope the attack won't follow), and can, for example, apply the more controversial limits only in certain geographical areas (our anycast network allows this).

> you still have to do some kind of work (in computation and energy) on the listening side

Yes. BPF for L3 works like a charm. Read up on XDP.

> or can fat edge-servers just eat that up?

Yes and no. You have to specifically optimize; whatever you do, a stock Apache or IIS probably won't survive a DDoS. Most vendors use "scrubbing centres", where they can have a small number of beefy dedicated servers. We didn't find this architecture sufficient though, so in our case the edge servers do handle the load. But we do spend time tuning the servers and our applications.


One way these companies mitigate DDoS attacks is by being huge. If you have a small house w/ one entrance, there's no great way to manage 1000 people trying to get through the front door. If you have a huge house w/ dozens of entrances, dealing with 1000 people trying to get in the building is much more manageable :)

From https://en.wikipedia.org/wiki/DDoS_mitigation:

One technique is to pass network traffic addressed to a potential target network through high-capacity networks with "traffic scrubbing" filters.


But if all those people get in and all try to get into the single elevator in the building it will be a problem :p (link DoS vs. application DoS)


Not an expert but I would guess at least the following: traffic filtering, peer traffic filtering by (possibly dynamic and automated) agreement, traffic classification and anomaly detection (DNS/TCP/HTTP(S)/etc.), routing different clients (based on origin AS and/or geolocation) to different IPs through DNS, hosted web frontends, web-level active user challenges, potentially dynamically altering the advertisement of routes, and by charging so much money the moment you need to use them that buying extra bandwidth and netblocks isn't an issue for them. Probably some of them also drop to high-overhead traffic reduction modes which can expand frontend IPs and DNS response segmentation, dropping DNS TTLs and spinning up new proxy systems in order to better filter out robotic attackers. Many also probably create/profile/buy various browser fingerprinting techniques and may have a library of non publicly disclosed approaches available for additional mitigation during high bandwidth attacks. Oh yeah, and replicating a static cache as a cheap means of degraded service provisioning.


We've open sourced and talked about a lot of how we do DDoS mitigation. Details are in the following blog posts:

No Scrubs: The Architecture That Made Unmetered Mitigation Possible - https://blog.cloudflare.com/no-scrubs-architecture-unmetered...

Meet Gatebot - a bot that allows us to sleep - https://blog.cloudflare.com/meet-gatebot-a-bot-that-allows-u...

How Cloudflare's Architecture Allows Us to Scale to Stop the Largest Attacks - https://blog.cloudflare.com/how-cloudflares-architecture-all...

Kernel bypass - https://blog.cloudflare.com/kernel-bypass/

SYN packet handling in the wild - https://blog.cloudflare.com/syn-packet-handling-in-the-wild/

How to achieve low latency with 10Gbps Ethernet - https://blog.cloudflare.com/how-to-achieve-low-latency/

How to receive a million packets per second - https://blog.cloudflare.com/how-to-receive-a-million-packets...

Introducing the BPF Tools - https://blog.cloudflare.com/introducing-the-bpf-tools/

BPF - The Forgotten Bytecode - https://blog.cloudflare.com/bpf-the-forgotten-bytecode/

Introducing the p0f BPF compiler - https://blog.cloudflare.com/introducing-the-p0f-bpf-compiler...

Single RX queue kernel bypass in Netmap for high packet rate networking - https://blog.cloudflare.com/single-rx-queue-kernel-bypass-wi...


You can watch a live DDoS attack for a demo of how it works: https://www.youtube.com/watch?v=FIQUUFVE6tU

They are also doing a webinar (apologies for the link) so you can see exactly how it's implemented: https://www.incapsula.com/blog/want-to-see-what-a-live-ddos-...


BGP Flowspec is commonly used by ISPs. Sadly, they won't extend that to their customers.

They'd rather sell yet another service than support open protocols.


There are two! There are two ISPs that allow customers to send Flowspec to their backbone! https://twitter.com/flockforward/status/909090299724664832


I use remote black hole routing announcements to the upstream ISPs to keep traffic for certain source or destination addresses from traversing the congested link.


I wonder where you are, very vaguely speaking. (Just in case people might be near (downstream of) you unaware that they could take advantage of these announcements.) Maybe this is a tricky question (because of domain nontriviality, or because of privacy), which is fine.

PSA: this user's profile definitely deserves reading, everyone go look


I wonder if anyone has ever tried a counter-attack. The downside is that you in turn DoS the origins, which are often victims themselves, like infected hosts in a botnet. Double-edged sword. But it would be very interesting to see how quickly one could defeat the attack.

I also wonder why attacks often last only a few hours.


1. That would be just as illegal as the original attack; cybersecurity laws have no provisions for self-defense. (It's true that nations are attempting to negotiate clauses like that in international relations, but even if that pans out, it will definitely never be a privilege afforded to individuals)

2. Attack what? It's a distributed DoS, the calls are coming from all over. You mean going after every node sending traffic? What would "attacking them" even mean? It's not like you can shut them down.

3. All those nodes are innocent and being used unknowingly. Attacking them would be both illegal (see point 1) and pretty unethical: you're deliberately aiming at innocents and not the attacker (whom you have no chance of locating). Imagine if you took down a hospital attempting to stop an NTP flood on your dumb blog. Have fun explaining why that was necessary.

"Counter-hacking" sounds cool and sexy, but there are reasons why it is never done.


I know next to nothing about botnets, but I wonder if you could divert traffic from botnets to a fake server that does nothing other than try to keep the connection open for as long as possible (or just be super slow in general), to drive up the number of open connections on the bot's side and stall it from opening new connections.


Usually one would set up a sinkhole to divert traffic away from real hosts, but I don't think the bots keep connections open; they usually just send packets and disconnect. Smurf and SYN flood attacks are the classic examples.


> I also wonder why attack often last only a few hours.

Using the botnets costs either money (if you're renting one) or opportunity (if you own one and could be renting it out).



