Network Update: Multihomed, Increased Transit, Peering (linode.com)
82 points by stenius on Nov 2, 2016 | 32 comments



> per-customer VLANs

I am looking forward to that! Linode is my go-to hosting service, but it's a little troubling that anyone in the datacenter can hit your private IPs [1]. On the other hand, maybe it shouldn't matter, and you should always act like the network is compromised. Isn't trusting their private network how Google leaked traffic to the NSA? Still, it seems like a nice improvement that would make compromises less likely.

[1] https://blog.linode.com/2008/03/14/private-back-end-network-...


Agreed. Back when I was using Linode I had a crazy setup that used iptables on each server to accept traffic only from my other servers. With the benefit of hindsight I might have maintained it differently, but needless to say, it was a bit fragile.
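It was roughly the shape of the sketch below (Python that just prints the rules; the peer addresses and the private range are placeholders, not my actual config):

    # Illustrative only: accept my own boxes' private IPs, drop the rest of the
    # shared private range. Addresses here are placeholders.
    TRUSTED_PEERS = ["192.168.133.10", "192.168.140.22"]  # my other servers' private IPs
    PRIVATE_RANGE = "192.168.0.0/16"                      # the shared back-end network

    def rules():
        # Accept trusted peers first, then drop everything else from the private range.
        for peer in TRUSTED_PEERS:
            yield "iptables -A INPUT -s {} -j ACCEPT".format(peer)
        yield "iptables -A INPUT -s {} -j DROP".format(PRIVATE_RANGE)

    if __name__ == "__main__":
        print("\n".join(rules()))

The fragile part was remembering to regenerate and push the rules to every box whenever a server was added or renumbered.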


> On the other hand, maybe it shouldn't matter, and you should always act like the network is compromised.

What about cases like AWS's VPCs?


To add some color to both your comment and the parent's:

Everything at Google Cloud is encrypted at rest and in transit [0]. Any GCE project is essentially a VPC by default, and a global one at that [1] (i.e. no need to VPN between regions). Traffic between GCE zones/regions never hits the public wire by default, and Google will carry your packet to the nearest Google POP around the world on its private backbone [2].

(work at Google Cloud, but not on networking/GCE)

[0] https://cloud.google.com/security/encryption-at-rest/

[1] https://cloud.google.com/docs/compare/aws/

[2] http://peering.google.com/#/infrastructure


Nice, great information. Google Cloud has the networking model right.


What's the data-at-rest model? LUKS?

Edit: Never mind, I see the link.


I appreciate how transparent they are about their locations, transit and peering.

I'm looking to replace one of my VPSes at DigitalOcean because of stability issues (I need to reboot the VM every couple of months; it just drops off the network entirely).

Linode seems like a good alternative. My criteria for this application are ≥ 1 GB of RAM, SSD storage, fast RTT to my other VPSes, and native IPv6 support.
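For the RTT and IPv6 criteria, a rough comparison script is enough (a sketch only; the candidate hostnames and the SSH port are placeholders):

    import socket, time

    CANDIDATES = ["candidate-a.example.net", "candidate-b.example.net"]  # placeholders

    def tcp_rtt_ms(host, port=22, timeout=3.0):
        # TCP connect time is a good-enough RTT proxy for comparing VPS locations.
        start = time.monotonic()
        with socket.create_connection((host, port), timeout=timeout):
            pass
        return (time.monotonic() - start) * 1000

    def has_aaaa(host):
        # Checks for an AAAA record; whether the provider routes IPv6 natively
        # is a separate question.
        try:
            return bool(socket.getaddrinfo(host, None, socket.AF_INET6))
        except socket.gaierror:
            return False

    for host in CANDIDATES:
        print(host, "%.1f ms" % tcp_rtt_ms(host), "IPv6" if has_aaaa(host) else "no AAAA")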


I fucking love Linode. I've been hosting with them for years and over time I've gotten more performance and more data transfer for the same money.

https://jsonip.com is hosted with Linode and supports millions of requests a day. It's been a great home for the service.


Just don't store cryptocurrency there:

https://news.ycombinator.com/item?id=3655137


Don't store it on any cloud service.


I guess it never occurred to me before, but with the increased number of attacks lately it's been near the top of my mind. These guys seem to be throwing around the physical addresses of data centers pretty freely. What is the security of these places like? How decentralized are we really? It seems like a few strategic strikes could deal a devastating blow to our edge infrastructure. I know personally, my servers are only hosted in a single datacenter. The company I work for is in 3 datacenters, but I'm not sure the other 2 could handle the full load for an extended period if the primary one were completely taken down.

Granted, it's not as big of a deal as power plants, etc., but if you're looking for soft targets, it's a scary thought.


These are all very well-known datacenters with several layers of physical security (there are standards and certifications for datacenters). Their locations aren't exactly secret.

Most datacenters have fences/gates, require access cards and/or biometrics to get in and move around inside the building. Once inside, you can only get into your own cages.

It's not like you can walk up, knock the door in with a battering ram, and then have access to everything inside.


>It's not like you can walk up, knock the door in with a battering ram, and then have access to everything inside.

Well, I mean, yes you can. The actual doors/gates used aren't 'milspec' intrusion-rated. They're better-than-Home-Depot doors (all steel, steel door frames, reinforced). Cage doors are often hilariously flimsy (thin metal sheets/bars).

Certainly breachable by even modestly equipped attackers.

The reason you pick real DCs and not the basement of your fortified house is that there's human security in addition to the physical security measures. Which means someone will notice if you try to bust down the door.


And here I thought the reasons to choose a DC over my basement were multi-homing, abundant bandwidth, redundant air conditioning, and battery/diesel electrical backup.


We were talking about security...

But yes, it's almost like DCs were built with this in mind. Who would've guessed...


Security in my basement > security in the average data center. Similar-quality locked doors. No visitors. Armed response by the owner.


> It seems like a few strategic strikes could deal a devastating blow to our edge infrastructure.

Imagine a large meteorite impact :).


Are they still running a hard-to-audit ColdFusion CMS?


Wanted to know that as well. They were working on a major rewrite. Not sure if it is finished yet.


This is good news for their users, including us, given the frequency of DDoS attacks lately. Hardly a month goes by without their status page flagging an incident involving increased traffic to one of their datacentres as a result of a DDoS attack.


"we now manage our own true service provider network, allowing us to deliver robust and reliable connectivity."

What's needed to combat DDoS attacks is distributed defense. Without their own backbone / private transport links between all of their locations, their network is just a disparate set of data centres, and there is no advantage to their having multiple locations, so far as protection from DDoS attacks is concerned.

They also fail to mention the capacity of each link. They could be anywhere from 1Gbps to 100Gbps, but I presume they'd mention anything 40Gbps and up as a selling point, so to give them the benefit of the doubt, let's assume they're using all 10Gbps links and not 1Gbps. That puts them in the range of 50Gbps (Singapore) to 100Gbps (London) per location.

It's an impressive list to look at in aggregate, but not really that much for any one location in 2016, especially for a company of their size and visibility, when you can rent shared access to a 200Gbps+ botnet for $19.99. https://www.nanog.org/sites/default/files/20161015_Winward_T...

Instead of buying transit from up to 7 carriers per location, when there are starkly diminishing returns after 3 or 4 so far as routing performance is concerned, they should have instead bought higher capacity to each provider (to ensure at least 10Gbps of unused capacity per provider outside of regular legitimate traffic), external DDoS mitigation, or domestic backbone links and turned up more capacity at the LA Any2 (for Asia) and NYIIX (for Europe) to absorb the majority of DDoS traffic which comes from those regions. With up to 7 carriers, they simply have 7x different points of failure each at only 10Gbps, while getting worse deals on transit pricing due to lower volumes with each provider.


You don't need a private backbone to be able to mitigate attacks across multiple locations. I've done it while fighting off multi-hundred-Gbps attacks and it was never an issue. You can QoS your own inter-site GRE tunnels.
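For anyone wondering what that looks like on the wire, here is a rough scapy sketch (all addresses are placeholders; the DSCP mark on the outer header is one simple hook for prioritizing tunnel traffic):

    from scapy.all import IP, GRE, TCP, send

    # Clean (post-scrubbing) customer traffic destined for another site...
    inner = IP(src="10.1.0.5", dst="10.2.0.9") / TCP(dport=443)

    # ...wrapped in GRE between the two sites' tunnel endpoints. tos=0xB8 is
    # DSCP EF, so your own gear can prioritize the tunnel end to end.
    outer = IP(src="203.0.113.1", dst="198.51.100.1", tos=0xB8) / GRE() / inner

    send(outer)  # needs root; in practice the tunnel is a kernel/router interface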

Linode is moving 200-300 Gbps globally. That is about 37.5 Gbps per location, and when you figure in a 20% utilization (because you need to be able to burst)... they have about 300 Gbps of transit per location. Spread across 3-5 carriers I would guess they have 40-100 Gbps from each. Way more than your estimated 10 Gbps.

As far as "routing performance" goes, they appear to be buying from a few Tier 1 networks per location and a mix of regional Tier 2s. That is in line with best practices. Sometimes to reach the right networks you do need to spin up circuits with multiple Tier 2s; there is no such thing as "diminishing returns" if you are doing traffic engineering properly.

The right way to build networks is to meet your performance needs first and foremost, have enough headroom to grow and serve your customers, and work with your upstreams to manage incoming attacks. An external scrubbing service makes no sense when you can adapt your network as Linode has done, so that they can easily blackhole targets at their upstreams' edge.

I applaud their efforts. This is some smart network engineering.


> Linode is moving 200-300 Gbps globally. That is about 37.5 Gbps per location.

We're actually moving to 5-10x that figure per location. We aren't playing around!


"Moving to" suggests you're not actually there yet. What are your aggregate capacities in each location as of this announcement?


Why the heck would you send traffic out transit links using GRE tunnels, when it's more cost-effective and you have more control over your own private backbone links? At the scale I speculated, 1/10th of what you're suggesting, it would already have been cost-effective to operate a backbone. If they're anywhere near the scale you're suggesting, then it should be a no-brainer for them to operate their own backbone of 100Gb waves.

I'm quite skeptical of the numbers you're citing, though. Their PeeringDB profile suggests they only have 10-20Gbps of peering per city. Considering their profile was updated just a few days ago, I would be inclined to consider those listed capacities accurate. Although they mention 'hundreds of Gbps' of capacity per city, they also mention sending up to 50% of traffic through peering in London, where they only have 50Gbps of total peering capacity. Perhaps you're right about that 300Gbps of capacity per location, in which case they would run much lower utilization rates on transit than on peering. But that would be an even worse allocation of spending than in my initial assessment, considering a port at an exchange is much cheaper than a transit link with a CDR. It also leaves them highly vulnerable to DDoS attacks through exchanges.

For a content network, public peering at exchanges in North America just with route servers and networks that have open policies would result in 30-40% of traffic going through the exchange. They would easily do more traffic at the exchange than any one transit in a mix of 3-4, let alone in a mix of 5-7.

With any significant private peering, easily 60% of traffic could be settlement-free. And guess what, most significant peers require peering at multiple locations with a full set of prefixes, which requires you to have a backbone. With the traffic levels you're suggesting they run, they should be able to negotiate settlement-free peering with many major regional Tier 2s, making it even less sensible to be purchasing from multiple ones. Considering most Tier 2s within a given region will all peer with each other, there are very few improvements to be had by turning up additional ones.

Where there's the most room for improvement is being on the right long-haul fiber paths, in which case, given their North American focus, they should be buying from Level3, and they probably could at rates comparable to their current agreements by concentrating more of their commits with fewer providers. If they had their own transport, they could also determine which fiber paths they take across their backbone to ensure optimal latency. Beyond that, given that Tier 1s all peer with each other by definition, it's just a matter of dumping local traffic out any one of them without traversing a congested peering link. The microseconds it takes to go an extra AS hop within a city have an indistinguishable impact on performance.

I'm not sure what best practices you're referring to. Who else can you name that utilizes up to 7 transit providers in a given city without operating their own backbone? The only one I can personally think of is Internap, and they abandoned building their own backbone halfway through turning it up. Ask their former network engineers, from their golden years when they had their highest market share, how that worked out for their network and for their business.

Are you a current Linode customer in one or more locations? If you were, you'd probably have experienced packet loss issues on a regular basis due to DDoS attacks. There's a reason why they're performing these network upgrades; they've had near-daily network interruptions due to DDoS attacks since Christmas of last year, with some outages lasting almost a day. Smart network engineering would never have let their network become that unreliable in the first place. And if you were going to blame a lack of budget for that, my suggestions would be even more appropriate, as they would allow them to scale their network in a much more cost-effective way. An external scrubbing service makes sense when they've been ineffective at mitigating attacks to date. Your network is only as resilient to DDoS attacks as your weakest links, and spreading capacity across a larger number of providers instead of concentrating higher capacities with fewer ones makes it much easier to saturate connectivity to one of them.

The only way Linode's current network strategy makes sense, assuming it's not due to technical oversight, is if it's marketing-driven. That's a fair reason, but it should also be fair to call them out on it. I'm not sure why you feel that strategy is in any way optimal, when it's the opposite of the model used by the hosting companies most renowned for their networks. Take, for example, SoftLayer, who went to great lengths to build out their own backbone fairly early on. I may halfheartedly agree that Linode's network upgrade strategy might be smart marketing, but I would wholeheartedly disagree that it's smart network engineering. It's not cost-effective, is suboptimal for resiliency against attacks, and fails to leverage peering effectively.


> Why the heck would you send traffic out transit links using GRE tunnels

I was responding to your statement that you needed a backbone to be able to deal with DDoS attacks. That is simply untrue. Most hosting providers announce separate space from each location and do not backhaul. If you do want to sink attacks closer to the source (which in itself only really makes sense if you have highly diverse POPs) you can GRE the clean traffic between sites.

Your PeeringDB profile indicates you push 10 Gbps at peak. As this is getting quite long for an HN thread, email me next time you are in the Bay Area. I'd happily share some operational tips for running high-volume and attack-sinking networks that really require a whiteboard. Heck, maybe we can get someone from Linode to join us too for beers. :)


We're usually around at NANOG!


Welp, there's a lot that needs addressing in this comment, but I'll stick to the heavy-hitters.

> they've had near daily network interruptions due to DDoS attacks since Christmas of last year, with some outages lasting up to almost a day

Whoa, nope[1].

> It also leaves them highly vulnerable to DDoS attacks through exchanges.

We aggressively de-peer with networks that regularly originate attack traffic, allowing us to size our ports according to utilization rather than worst case attack sizes. Multilateral peering is actually a bit more expensive than transit these days - much more expensive if you intend to significantly overprovision capacity.

> Who else can you name that utilizes up to 7 transit providers in a given city

That's fair, but missing context. We're figuring out who works best for our traffic profile. We will scale back/remove the underperformers and scale up those that prove their worth.

> It's not cost effective, is sub-optimal for resiliency against attacks, and fails to leverage peering effectively.

All of this is dramatically incorrect.

[1] https://cloudharmony.com/status


> Whoa, nope[1].

Perhaps fewer issues in the last month, but your own status page shows a litany of issues in September and earlier months. And I am fairly certain your status page has not covered every instance of attacks / packet loss on your network(s).

> We aggressively de-peer with networks that regularly originate attack traffic, allowing us to size our ports according to utilization rather than worst case attack sizes.

The Internet is far from perfect, and every network of sufficient scale is going to have a number of compromised machines on it, especially eyeball networks, which should be your most desirable peers. Rather than blanket de-peering such networks, you should be looking at PNIs with them.

True, there are a number of 'bulletproof' networks, but they are few and far between.

> Multilateral peering is actually a bit more expensive than transit these days - much more expensive if you intend to significantly overprovision capacity.

That is simply not true either. Even taking the worst case of leasing waves across 3 major long-haul routes across the continent, you're looking at about $0.20/Mb. 100Gb waves have gotten cheap in the last year, to almost half of what they were not too long ago. Unlike transit, though, you get full use of both directions, so your effective rate as a content-heavy network is going to be closer to $0.10/Mb. And this is just a small minority of your traffic; most peering traffic is local, and much of it would travel much shorter distances, as the majority of non-local destinations aren't going to be at extreme ends of the continent. And at the end of the day, you can simply decline to accept peering routes in a given city from other cities that are too far away, and push those out through local transit instead if you prefer.

Some exchanges even charge only a one-time fee, and most others are priced quite reasonably. You peer with the networks available on those exchanges first, and only peer with networks at the more expensive exchanges if they're exclusively there. Believe it or not, other networks like to save money too, and those worth peering with are usually at the more cost-effective exchanges.

> That's fair, but missing context. We're figuring out who works best for our traffic profile. We will scale back/remove the underperformers and scale up those that prove their worth.

That's a very expensive way to do things, considering the minimum 1-year terms from most providers. It would've been significantly cheaper for you to hire some consultants with extensive experience working with the networks you were considering. Heck, you could have just sent some of the more outgoing members of your network team to a NANOG and gathered feedback for free over a few vendor-sponsored beers.

It's also not exactly rocket science to figure out which Tier 1s are strongest in which corner of the world. At the end of the day, you're not dependent on their support if you're multi-homing (and you shouldn't be, because most of them are terrible).

> All of this is dramatically incorrect.

See points above.


Reminds me of DigitalOcean in 2015. Curious what a "per-customer VLAN" is in reality.


Maybe one way to describe MPLS is that it's kind of a VPN for VLANs. Or a way to put VLANs in something like a VLAN, sort of.

I can't speak for them, but I worked at what boils down to a semi-competitor a decade ago doing network stuff. MPLS is old stuff now, and you can Google the specific Cisco model numbers and MPLS if you'd like to read configuration guides.

Superficially, only having 4096 VLAN IDs on an Ethernet connection appears to be a big problem if you have more than 4096 customers. However, the MPLS label space is 20 bits, so you're good to about a million customers.
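Just to put numbers on that (nothing vendor-specific, just bit arithmetic):

    # 802.1Q VLAN IDs are 12 bits; MPLS labels are 20 bits.
    print(2 ** 12)   # 4096    - runs out once you pass ~4k customers
    print(2 ** 20)   # 1048576 - roughly a million labels to hand out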

Then you have some "fun" mapping games such that your router connects traffic on MPLS label 123456 (which is your customer number) to local Ethernet interface port wtf on VLAN 100, or whatever you have been given.

At least that would have been cutting edge a decade ago and probably still is today.

It's unlikely to be any more, or any less, secure than anything else in a virtualized cloudy environment.


> Or a way to put VLANs in something like a VLAN sorta.

802.1ad, a.k.a. "Q-in-Q" [0]

> However MPLS label space is 20 bits so you're good to a million customers.

VXLAN, cf. RFC 7348 [1], is the latest coolness, allowing for up to 16M "virtual" networks (using 24 bits) and bridging layer 2 over IP (4789/UDP); a minimal header sketch follows below the links.

[0]: https://en.m.wikipedia.org/wiki/IEEE_802.1ad

[1]: https://tools.ietf.org/html/rfc7348
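To make the 24-bit VNI concrete, here's a minimal sketch of the VXLAN header from RFC 7348 (illustrative only; real deployments rely on the kernel/switch to build this):

    import struct

    def vxlan_header(vni):
        # 8-byte VXLAN header, carried over UDP port 4789:
        # flags (I bit set -> VNI valid) + 24 reserved bits, then 24-bit VNI + 8 reserved bits.
        assert 0 <= vni < 2 ** 24      # 16,777,216 possible virtual networks
        return struct.pack("!II", 0x08 << 24, vni << 8)

    print(vxlan_header(5001).hex())    # -> '0800000000138900'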



