BGP is one of those things that make the internet fragile yet resilient at the same time.
Fragile in that anyone who controls an Autonomous System can advertise routes, and if your neighboring ASes are not configured to filter them out, those routes will just keep propagating. Imagine a government deciding to maliciously reroute traffic to its own data center. Obviously a good chunk of the internet will go WTF and blacklist the route, but not before you get a deluge of data.
In the same way, it is easy to route traffic around problems. Congestion in the Midwest? No problem. Send the traffic down to Texas. All of this is built, for the most part, on network admins trusting each other.
I'm a little hazy on the technical details of BGP routing - am I correct in thinking that if you're a first-class BGP citizen, you can just advertise yourself as handling others' traffic and that's it, it just comes?
That's pretty much my understanding of it, though "it just comes" is a bit of a simplification.
Edit (clarification):
- BGP has no authentication, so anyone can advertise 'anything' (it still has to be a valid address).
- "it just comes" isn't entirely accurate, as the changes propagate outward through your peers (in p2p fashion). I'm not sure what happens technically when two networks advertise themselves as serving a particular IP, though (I have a fuzzy idea, but don't know how the edge cases would be resolved).
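On the question of two networks announcing the same address: routers forward on the most specific (longest) matching prefix, and BGP's path comparison only decides between routes for the identical prefix. A minimal Python sketch of the longest-prefix part, using the standard `ipaddress` module; the prefixes (from the 203.0.113.0/24 documentation range) and the AS labels are hypothetical.

```python
import ipaddress
from typing import Optional

# Hypothetical routing table: announced prefix -> AS that announced it.
ROUTES = {
    ipaddress.ip_network("203.0.113.0/24"): "AS-legit",
    ipaddress.ip_network("203.0.113.128/25"): "AS-hijacker",  # more specific
}

def lookup(addr: str) -> Optional[str]:
    """Return the announcer of the longest prefix containing addr, or None."""
    ip = ipaddress.ip_address(addr)
    matches = [net for net in ROUTES if ip in net]
    if not matches:
        return None
    # The most specific prefix (largest prefix length) wins.
    return ROUTES[max(matches, key=lambda net: net.prefixlen)]

print(lookup("203.0.113.10"))   # AS-legit: only the /24 matches
print(lookup("203.0.113.200"))  # AS-hijacker: the /25 is more specific
```

This is why an announcement of a more specific prefix can pull traffic away even when the legitimate route would otherwise be preferred.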
Well I imagine it would only be a portion of traffic. I wouldn't think a router in the UK would capture any US to US traffic through this method, for example. Is that what you mean?
BGP hops are actually AS (autonomous system) numbers, which are not closely correlated with routers or distance.
If an ISP in Los Angeles transits five autonomous systems to reach an ISP in New York, but four to reach an ISP in London, and the London ISP suddenly starts advertising the New York ISP's IP addresses, oops, your packets are now headed to the UK.
Great care is taken to avoid bizarre physical routes for packets, but it still happens constantly. One DSL connection I used in Santa Clara had a propensity for routing packets through Seattle to get to Dallas for a while.
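The Los Angeles example can be sketched by modelling only the AS-path-length step of BGP best-path selection. Real implementations compare other attributes (such as LOCAL_PREF) before path length, so treat this as a simplification; the AS numbers come from the 64496 to 64511 documentation range and the neighbor names are hypothetical.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Route:
    prefix: str
    as_path: List[int]  # AS numbers, nearest neighbor first, origin last
    neighbor: str

def best_route(candidates: List[Route]) -> Route:
    # Simplified selection: the shortest AS path wins (ties keep the first seen).
    return min(candidates, key=lambda r: len(r.as_path))

# New York ISP (AS 64500) reached through five ASes.
legit = Route("198.51.100.0/24", [64496, 64497, 64498, 64499, 64500], "us-peer")
# London ISP (AS 64504) wrongly originates the same prefix, four ASes away.
leak = Route("198.51.100.0/24", [64501, 64502, 64503, 64504], "london-peer")

print(best_route([legit, leak]).neighbor)  # london-peer
```

Note that hop count here is measured in ASes, not routers or kilometres, which is how the packets end up crossing the Atlantic.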
I added some clarification, but that's the general idea. The routers update their routing tables based on the advertisements they get from/through peer networks (as I understand it).
From the perspective of a router trying to route packets, you basically make choices based on what your peer networks are advertising that they can (directly or indirectly -- e.g. route through) service.
[Disclaimer: This is all stuff that I "know" but I've never worked directly with BGP so I'm open to correction.]
So, in summary, when I connect to an IP the only assurance I have (putting aside the application layer e.g. HTTPS) that I'm connecting to the right box is the fact that I made it to some machine with an interface configured to that address? Oh dear.
It is useful to remember that IP is a connectionless protocol. The role of IP is just to shuffle packets around from node to node (where many of the nodes are neither the source nor the final intended destination of the packets) in a fairly simplistic way.
> ... to some machine with an interface configured to that address
Technically, the other endpoint does not even need to have an interface configured with that address. You can quite easily configure a box to send replies to any packets that happen to end up at it.
> Oh dear.
There is a good reason why IPsec (etc) was invented.
The problem with end-to-end crypto is that we often think of its security properties mathematically and neglect its practical performance. Obviously this is increasingly not true (Heartbleed probably did more to educate the world on crypto than anything else in history) but if you think of crypto as what it so often turns out to be - something waiting to be broken in semi-spectacular fashion - then I don't think it's so out of line to wish for additional assurances from complementary systems.
Yes, because your computer doesn't know anything about the destination besides its address. So it's not possible for your computer to verify anything else about the host.
Well yes, that's obviously true, which is why it's (idealistically speaking) important to verify the routing mechanism.
Of course most of this is mitigated by end-to-end crypto, but given how often we see that crypto fail, this topic remains of interest. I mean, if crypto fails and leaks your private key (à la Heartbleed) and the key falls into the hands of an attacker who can hijack some BGP routes, then that attacker is potentially in a very powerful position. We've seen BGP hijacking by spammers needing clean IPs in the past, so this isn't a totally implausible situation.
Top-tier providers "peer" with each other. They share routes via BGP with their routing "peers". When you know an AS is authoritative only for XYZ, you can ACL them so they are only accepted as authoritative for XYZ.
For the most part, yes. If you advertise a route, it will be accepted. Whether or not traffic flows over the link depends on the BGP path selection algorithm.
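A per-peer prefix filter of the kind described here might look like the following minimal sketch. The peer AS number and its allowed block are hypothetical, and this version accepts more-specific announcements inside an allowed block, which a stricter policy might reject.

```python
import ipaddress

# Hypothetical allowlist: neighbor AS -> blocks it is authoritative for.
ALLOWED = {
    64510: [ipaddress.ip_network("198.51.100.0/24")],
}

def accept(peer_as: int, prefix: str) -> bool:
    """Accept an announcement only if it falls inside a block allowed for this peer."""
    net = ipaddress.ip_network(prefix)
    allowed = ALLOWED.get(peer_as, [])  # unknown peers get nothing
    # subnet_of() requires matching IP versions, so check that first.
    return any(net.version == b.version and net.subnet_of(b) for b in allowed)

print(accept(64510, "198.51.100.0/24"))  # True
print(accept(64510, "203.0.113.0/24"))   # False: not this peer's space
```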
It's common for one AS to advertise routes for locations not within their network. It's called transit.
In a standard setup, a transit network announces its own routes, as well as its customers' routes, to its peers, its transit providers, and its other customers. Re-announcing routes from peer to transit, transit to transit, or peer to peer should never be done.
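The export rule in this comment matches what are commonly called the Gao-Rexford (or "valley-free") export rules: your own routes and routes learned from customers may go to anyone, while routes learned from peers or transit providers go only to customers. A minimal sketch with hypothetical names:

```python
from enum import Enum
from typing import Optional

class Rel(Enum):
    CUSTOMER = "customer"
    PEER = "peer"
    TRANSIT = "transit"  # an upstream we buy transit from

def should_export(learned_from: Optional[Rel], export_to: Rel) -> bool:
    """learned_from is None for routes we originate ourselves."""
    if learned_from is None or learned_from is Rel.CUSTOMER:
        return True  # own and customer routes go to everyone
    # Routes from peers or transit providers are re-announced only to customers.
    return export_to is Rel.CUSTOMER

print(should_export(Rel.PEER, Rel.TRANSIT))   # False: peer -> transit
print(should_export(Rel.CUSTOMER, Rel.PEER))  # True
```

Many route leaks are violations of exactly this rule, which is why the "should" carries so much weight.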
It's meant as no criticism but "should" always worries me in contexts like this. I find that often the reason it's "should" as opposed to "will" is because there's potentially dangerous human input somewhere in the process - as appears to be the case here.
It's hard to be comfortable when this is true of systems as important as those that route the internet or PKI, for example, because it's impossible to know what might happen next. Perhaps it's wrong to take the "structure" in "infrastructure" literally, but in the context of the internet that word seems increasingly like a misnomer to me.
> For the time being, we have quarantined the Medellín data center
> and disabled connectivity with Internexa.
Does this imply that it was CloudFlare's trust of Internexa's announced BGP routing which caused or contributed to the outage? Did CloudFlare redirect traffic that it should have handled internally to Internexa? If so, isn't it more appropriate for CloudFlare to prioritize its own routes over those claimed by external parties?
I would have anticipated that Internexa's routes would have affected routing by third party networks (eg: Internexa claims it handles traffic for CloudFlare's IP range, and Level3 redirects all that traffic to them) which isn't something that CloudFlare can do much about other than notify Internexa and those third parties of the problem and hope they resolve it themselves. Having not worked directly with BGP, I'm sure I'm misunderstanding something and would appreciate any additional clarification.
edit: Incidentally, this is the kind of scenario which prevented me from using CloudFlare recently. I wanted to only CNAME our production web site to CF's systems which is something they only offer with the $200/month Business plan and not with the $20/month Pro plan. Otherwise, you have to delegate ALL of your DNS for the entire domain to CloudFlare. As one user in the comments says in response to how someone could have worked around the outage:
"That [bypassing CF] would be a good idea, except cloudflare.com and control panel was inaccessible during this period too, so not sure how this could have been done..."
I really hope they revisit their policy of not allowing Pro customers to CNAME individual sites to CloudFlare. Putting all your eggs into CloudFlare's basket limits the ability to mitigate around these kinds of issues.
>>Does this imply that it was CloudFlare's trust of Internexa's announced BGP routing which caused or contributed to the outage? Did CloudFlare redirect traffic that it should have handled internally to Internexa? If so, isn't it more appropriate for CloudFlare to prioritize its own routes over those claimed by external parties?
"This downtime was the result of a BGP route leak by Internexa, an ISP in Latin America. Internexa accidentally directed large amounts of traffic destined for CloudFlare data centers around the world to a single data center in Medellín, Colombia. This was the result of Internexa announcing via BGP that their network, instead of ours, handled traffic for CloudFlare. This miscommunication caused a flood of traffic to quickly overwhelm the data center in Medellín. The incident lasted 49 minutes, from 15:08UTC to 15:57UTC."
The problem wasn't that CloudFlare thought their IPs were somewhere else; the problem was that the rest of the internet[1] thought CloudFlare's addresses were reached via Internexa in Medellín rather than via CloudFlare's own network.
[1] Complete hyperbole, but "huge swathes"[2] of the internet had the wrong idea.
[2] "The exact impact of the route leak to our customers’ visitors depended on the geography of the Internet. Traffic to CloudFlare’s customers sites dropped by 50% in North America and 12% in Europe. The impact on our network in Asia was isolated to China. Traffic from South America was also affected as data centers there had to cope with an influx of traffic normally handled elsewhere."
I understand what you quoted (refer to what I wrote in the second paragraph). If the problem is with the rest of the internet, why did CloudFlare quarantine the Medellín datacenter? To use an analogy, if your company's upstream phone provider redirected calls to your phone number to Siberia, what good does it do for you to quarantine Siberia?
There is an obvious solution, but it's an enormous undertaking.
Each level of delegation must be cryptographically validated. Route announcements not signed using a certificate to which authority for that block has been delegated must be rejected.
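Something along these lines exists today as RPKI route origin validation (RFC 6811), though it authorizes only the origin AS, not the whole path. A minimal sketch of the validation step, assuming the signed authorizations (ROAs) have already been cryptographically verified; the ROA contents are hypothetical.

```python
import ipaddress
from dataclasses import dataclass
from typing import Iterable

@dataclass(frozen=True)
class Roa:
    prefix: str       # block the holder authorized
    max_length: int   # longest prefix length that may be announced
    origin_as: int    # AS allowed to originate it

def validate(prefix: str, origin_as: int, roas: Iterable[Roa]) -> str:
    """Classify an announcement as 'valid', 'invalid' or 'not-found'."""
    route = ipaddress.ip_network(prefix)
    covered = False
    for roa in roas:
        block = ipaddress.ip_network(roa.prefix)
        if route.version != block.version or not route.subnet_of(block):
            continue
        covered = True  # some authorization speaks for this space
        if roa.origin_as == origin_as and route.prefixlen <= roa.max_length:
            return "valid"
    # Covered but no matching authorization means the announcement is rejected.
    return "invalid" if covered else "not-found"
```

The "not-found" state is the practical catch: until every block holder publishes authorizations, unsigned space has to be accepted anyway.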
That would work but as they say the devil is in the details.
For example: are we checking CRLs, and would that slow propagation (and what if the CRL server is down)? What about countries unable to get CA-issued certificates because of political embargoes (either their money is blocked or businesses are simply barred from dealing with them, e.g. Iran, North Korea, etc.)? It would also grant the US government even more control over the internet (it already controls many root DNS servers and root CAs), which it could use for military or political purposes, and so on.
I'm not saying that the concept doesn't have merit. BGP is quite evidently deeply flawed. However this solution has so many gotchas, question marks, and complexity to it you really need to dig deep down into the proposal before knowing if it is even a good plan.
You could sign that you trust your next-to-last hops to deliver traffic to you but transitive delivery isn't guaranteed and not every chain of valid hops is actually workable as a route capacity-wise. A completely signature-valid route with near 100% packet loss isn't much better than an undeliverable leak.
The entire set of possible AS paths doesn't need to be signed to make a difference. Even just signing the last 1-2 ASes would stop most of the big screwups.
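Checking only the last hop could work roughly like this minimal sketch: each origin AS publishes (in a real system, signs) the set of upstream ASes it actually connects to, and a path that ends in that origin via any other neighbor is rejected. The registry and AS numbers are hypothetical, and signature verification is assumed to happen elsewhere.

```python
from typing import Dict, List, Set

# Hypothetical registry: origin AS -> upstream ASes it has vouched for.
AUTHORIZED_UPSTREAMS: Dict[int, Set[int]] = {64500: {64496, 64497}}

def last_hops_ok(as_path: List[int]) -> bool:
    """as_path is nearest-first, so the origin AS is the last element."""
    if not as_path:
        return False
    origin = as_path[-1]
    upstreams = AUTHORIZED_UPSTREAMS.get(origin)
    if upstreams is None:
        return True  # origin published nothing, so we cannot judge
    if len(as_path) == 1:
        return True  # learned directly from the origin itself
    return as_path[-2] in upstreams

print(last_hops_ok([64510, 64496, 64500]))  # True
print(last_hops_ok([64510, 64505, 64500]))  # False: unvouched neighbor
```

A deliberate attacker can still forge a path ending in a genuine origin-neighbor pair, so this targets accidents more than attacks.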
I spent a good bit of time trying to figure out why GAuthify had brief downtime so I could do a post-mortem. Luckily I read most CF blog posts here on HN and saw this. Is CloudFlare planning to have any real-time notification system for things like this? I'm sure it would save a lot of headache when trying to figure out what happened, especially if you don't read the CloudFlare blog.
It's very likely that I, and especially other non-HN folks, could have missed this altogether (checking CloudFlare for an issue is likely the last item on my checklist, thanks to its amazing record).
Yes. Without going very deep into what exactly happened here, the other NOC is the only network with control over the route announcements and so you get on the phone.
No, aiui, the trust model is the same. If a peer announces routes to you, all you can do is:
- throw away "martians" (routes to known-reserved/bad parts of IP space)
- throw away stuff you know they shouldn't be announcing (your stuff basically)
Other than that, I think you have to trust them when they say that they have a magic route to IP 16.0.0.1 which is shorter than any other you know about (since it might be true).
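The two filters in this list can be sketched as follows. The martian list is a small, non-exhaustive sample of reserved IPv4 ranges, and the own-prefix list is hypothetical; anything that passes both checks is, as the comment says, taken on trust.

```python
import ipaddress

# A few well-known reserved ("martian") IPv4 ranges; not an exhaustive list.
MARTIANS = [ipaddress.ip_network(p) for p in (
    "0.0.0.0/8", "10.0.0.0/8", "127.0.0.0/8", "169.254.0.0/16",
    "172.16.0.0/12", "192.168.0.0/16", "224.0.0.0/4",
)]
# Hypothetical prefixes that belong to our own network.
OWN_PREFIXES = [ipaddress.ip_network("198.51.100.0/24")]

def overlaps_any(net, blocks) -> bool:
    return any(net.version == b.version and net.overlaps(b) for b in blocks)

def accept_from_peer(prefix: str) -> bool:
    net = ipaddress.ip_network(prefix)
    if overlaps_any(net, MARTIANS):
        return False  # reserved space should never be announced
    if overlaps_any(net, OWN_PREFIXES):
        return False  # a peer should not announce our own space
    return True  # everything else has to be taken on trust

print(accept_from_peer("16.0.0.0/8"))  # True
print(accept_from_peer("10.1.0.0/16"))  # False
```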
My understanding is that it would not do anything to help prevent deliberate attacks on BGP. But I do think it may help limit the damage of accidental BGP misconfigurations since the routing tables are less confusing and crowded.
Why don't we use cryptography to guard against false routes? If each IP range were signed-for, then I think a relatively simple protocol could be used to document which networks a network could route for.