Going Multi-Cloud with AWS and GCP: Lessons Learned at Scale

nodesocket · on Aug 21, 2017

One of the biggest benefits of Google Cloud is networking. By default GCE instances in VPC's can communicate with all instances across zones and regions. This is a huge plus.

On AWS, multi region involves setting up VPN and NAT instances. Not rocket science, but wasted brain cycles.

Generally, with GCP setting up clusters that span three regions should provide ample high availability and most users don't need to deal with the multi cloud headaches. KISS. You can even get pretty good latency between regions if you setup North Carolina, South Carolina, and Iowa. Soon West Coast clusters will be possible between Oregon and Los Angels (region coming soon).

manigandham · on Aug 22, 2017

This is one of the biggest features that you appreciate a lot when you dont have it, and makes global apps incredibly easy. Softlayer has a similar network with default region peering but not as advanced.

Of course anything can be setup using custom VPN but this is a lot more work and will never be as easy, reliable, automated or cost effective.

That being said, AWS is rolling out automatic VPC peering, running on their own private backbones between regions so there should be functional parity soon, although with different price and performance compared to GCP.

autotune · on Aug 22, 2017

Having used SoftLayer and experienced their API, support, and GUI, I would not consider that a redeeming feature.

manigandham · on Aug 22, 2017

It's a feature they have. It might not make up for other parts of their platform that don't fit what you need but we've found their servers to be efficient and their support is quick and helpful.

They're overshadowed now by the scale, efficiency and managed services of the major clouds but can still be useful if you're running on their dedicated machines. Last I checked, Keen.IO runs on softlayer.

autotune · on Aug 22, 2017

Digital Ocean is a far better alternative if you don't want to use the other bigger cloud providers, quite frankly. AWS dedicated servers are also far less expensive and you get more bang for the buck. m4.2xlarge, for example, is the most comparable hourly offering based on memory and CPU cores to SL base dedicated server and outperforms the hell out of it. See here:

https://aws.amazon.com/ec2/purchasing-options/dedicated-inst...

http://www.softlayer.com/bare-metal-servers%20

One success story is not enough compared to thousands elsewhere.

sdrothrock · on Aug 22, 2017

I may be completely off here, but isn't this due to their underlying architecture decisions? That is, AWS from the start has kept all regions completely separate, so that problems in one region do not influence another. But GCP has has issues with failure across regions IIRC.

outworlder · on Aug 22, 2017

Having a software defined networking spanning across regions and failure cascades across regions are two different things. There's nothing preventing a vendor from presenting to you a single network, while they are actually distinct networks.

hossbeast · on Aug 22, 2017

Having distinct networks in different regions encourages you to architect your application in a fault tolerant way.

kakwa_ · on Aug 22, 2017

Or the contrary. In most cases there is something to synchronize between regions, like a replica of the data.

With difficult interconnection of regions, it makes it somewhat harder to do, and it can easily end-up with "meh, AZs are good enough".

_pfwi · on Aug 22, 2017

It is also potentially due to Google owing their private fiber backbone that connects all regions and as well as their software Defined Network that allow high bandwidth and low latency routing of packets across regions.

manigandham · on Aug 22, 2017

AWS also has a private backbones and offers (or will soon) VPC peering run on top of this.

ranman · on Aug 22, 2017

I work for AWS.

Just as an FYI you don't have to use a NAT instance there are also NAT gateways which I find easier to manage: http://docs.aws.amazon.com/AmazonVPC/latest/UserGuide/vpc-na...

ad_hominem · on Aug 22, 2017

Regardless they both cost money to run. Looks like the cheapest NAT gateway[1] is currently $0.045/hr which is about $32.76 per month. Then you also get charged $0.045 per gigabyte transferred as a "data processing charge" in addition to standard AWS data transfer charges.

[1]: https://aws.amazon.com/vpc/pricing/

fha · on Aug 22, 2017

AWS needs to release a more affordable and simpler feature for inter region connectivity. Even MS Azure has a Vnet to Vnet connectivity option in which traffic flows through the Azure backbone vs the internet and it doesn't cost much.

joeseeder · on Aug 22, 2017

That Vnet to Vnet is unreliable when you start using it at scale.

We had issues as soon as we started launching instances ( after connecting vnets ) , and azure supports response was to give them the ids so they can manually add them to routing between vnets.

Also BGP routing, was impossible to do beyond their tutorial level setup.

ad_hominem · on Aug 22, 2017

If any Google Cloud people are listening I wish you had an equivalent to AWS's Certificate Manager. Provisioning a TLS certificate which automatically renews for eternity (no out-of-band Let's Encrypt renewal process needed) and attaching it to a load balancer is so nice compared to Google Cloud's manual SslCertificate resource creation flow[1].

To a lesser extent, it's also nice registering domains within AWS and setting them to auto renew. Since Google Domains already exists, it would be neat to have this feature right inside Google Cloud.

[1]: https://cloud.google.com/compute/docs/load-balancing/http/ss...

rmhrisk · on Aug 22, 2017

We hear you, while I can't speak to future products and features I can say we understand there is room to improve the SSL provisioning and lifecycle management story in our products and we are making investments in that area.

manigandham · on Aug 22, 2017

It's in progress, star this issue to vote: https://issuetracker.google.com/issues/35900034

vira28 · on Aug 22, 2017

One thing that I liked with GCP is their recommendation for cost saving. I spun up a compute engine for a hobby project and within minutes they gave recommendations to reduce the instance size and how much i can save. I don't think AWS offers something like that. Correct me if I am wrong.

_pfwi · on Aug 22, 2017

Even better are Google managed services (PubSub / Dataflow / Datastore), which scale up and down based on usage (cloud native products) and thus save money automatically compared to their equivalents in AWS (Kinesis / Kinesis Analytics / DynamoDB) which does not autoscale.

ranman · on Aug 22, 2017

DDB recently added autoscaling -- https://aws.amazon.com/blogs/aws/new-auto-scaling-for-amazon...

CSDude · on Aug 22, 2017

It does not work well, it gives late responses

ranman · on Aug 22, 2017

Really? Feel free to email me about that randhunt@amazon

mianosm · on Aug 22, 2017

AWS has the trusted advisor, and it will offer to assist you in cost savings in terms of:

* Idle Load Balancers

* Underutilization of EBS volumes

* Unassociated Elastic IP addresses

* Idle RDS intsances

* R53 latency resource record sets

* etc...

BurritoAlPastor · on Aug 22, 2017

Most of the Trusted Advisor checks are only available if you're on a Business or higher tier support plan. And those are now priced as a percentage of your monthly spend – not cheap.

illumin8 · on Aug 22, 2017

If you're running your business on any provider, wouldn't you want to make sure you had support?

user5994461 · on Aug 25, 2017

Not when the support is useless and not required.

azurezyq · on Aug 21, 2017

One extra point for tracking VM bills:

GCE bills are aggregated across instances. To get more detailed breakdown, you can apply labels to them and the bills will have label information attached in BQ.

Alternatively, you can leverage GCE usage exports here:

https://cloud.google.com/compute/docs/usage-export

Which has per-instance per-day per-item usage data for GCE.

Disclosure: I work for Google Cloud but not on GCE.

manigandham · on Aug 22, 2017

When it comes to GCP:

- They have Role Based Support plans which offer flat prices per subscribed user which is a much better model. [1]

- Live migration for VMs mean host maintenance and failures are a minor issue, even if all your apps are running on the same machine. It's pretty much magical and when combined with persistent disks, effectively gives you a very reliable "machine" in the cloud. [2]

1. https://cloud.google.com/support/role-based/

2. https://cloud.google.com/compute/docs/instances/live-migrati...

user5994461 · on Aug 22, 2017

>>> on AWS you have the option of getting dedicated machines which you can use to guarantee no two machines of yours run on the same underlying motherboard, or you can just use the largest instance type of its class (ex: r3.8xlarge) to probably have a whole motherboard to yourself.

Not at all. Major mistake here.

When you buy a dedicated instances on AWS, you reserve an entire server for yourself. All the VMs you buy subsequently will go to that same physical machine.

In effect, your VMs are on the same motherboard and will all die together if the hardware experiences a failure. It's the exact opposite of what you wanted to do!

ranman · on Aug 22, 2017

I think two concepts are being conflated:

Dedicated Instances: https://aws.amazon.com/ec2/purchasing-options/dedicated-inst...

and

Dedicated Hosts: https://aws.amazon.com/ec2/dedicated-hosts/

stephengillie · on Aug 22, 2017

At my current job, we're looking into DIs to reduce our SQL costs. With standard Spot/RIs, we're paying per-core for SQL Server. But with a DI, we're expecting to be able to license against the physical sockets instead.

> You can use Dedicated Hosts and Dedicated instances to launch Amazon EC2 instances on physical servers that are dedicated for your use. Dedicated Instances are Amazon EC2 instances that run in a VPC on hardware that's dedicated to a single customer. You can also use Dedicated Hosts to launch Amazon EC2 instances on physical servers that are dedicated for your use.

> Dedicated instances may share hardware with other instances from the same AWS account that are not Dedicated instances.

> An important difference between a Dedicated Host and a Dedicated instance is that a Dedicated Host gives you additional visibility and control over how instances are placed on a physical server, and you can consistently deploy your instances to the same physical server over time.

It looks like you can launch DIs on your DHs, or on any arbitrary host; but once you have a DI on an arbitrary host, only your VMs will run there; so a de facto Affinity policy. And any instance you launch on your DH is automatically a DI.

Is there a benefit to running DIs without having a DH? It sounds like having a DI gives you 90% of a DH. The DH gives you is a few hardware details (which might be essential for licensing), and like GP suggested would let you choose Affinity (or Anti-Affinity) between them manually.

As a result, Dedicated Hosts enable you to use your existing server-bound software licenses like Windows Server and address corporate compliance and regulatory requirements.

This is the first I'm hearing about DHs, and it sounds like that might be what we need, instead of the DIs we've been telling other teams about.

ranman · on Aug 22, 2017

I'm not an expert on DI and DH, sorry. I can find answers for you though and post them back here. You can also email me: randhunt@amazon.com (actually anyone on these threads is welcome to email me any question they have) and I can try to get back to you there.

samstave · on Aug 22, 2017

If you have hipaa reqs, your signing an agreement with Amazon will require you to host pii/phi on a dh

otp124 · on Aug 22, 2017

This changed https://aws.amazon.com/blogs/apn/aws-hipaa-program-update-re...

samstave · on Aug 22, 2017

Wow. Thanks!

e12e · on Aug 22, 2017

I'm not sure what "dedicated machines" mean here - as far as I can tell from:

http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/dedicated...

You can buy up to two of each type/location and schedule your vms to run on different physical hosts?

stephengillie · on Aug 22, 2017

Effectively, Dedicated Instances are an Affinity policy. You're looking for an Anti-Affinity policy, which isn't common. Here's an article about Affinity and Anti-Affinity on OpenStack: https://techglimpse.com/affinity-anti-affinity-policies-open...

dswalter · on Aug 21, 2017

If AWS were to go to a per-minute billing cycle, they would be instantly more price-competitive with Google's offering. Or, to put it the other way around, those leftover minutes form a significant chunk of AWS's profit margin.

_pfwi · on Aug 21, 2017

I don't think so. GCP's bill is usually about 50% of AWS's bill for same application, if you run it full hour (from my personal experiences and from several others as well: https://thehftguy.com/2016/11/18/google-cloud-is-50-cheaper-...). GCP has lot more cost saving features like seamless scalability, custom shapes, sustained discounts and so on. If you workloads span less than hour, GCP can offer more then 50% savings.

ranman · on Aug 22, 2017

I refuted some of the networking claims in that article previously (I work for AWS). Especially the bizarre claims that you have to get a C4.4xlarge for 1gpbs... The 220 mpbs network cap claim is just not true. Just run iperf3 on any aws instance to a GCE instance and you can see greater than 220mpbs.

isatty · on Aug 22, 2017

Honestly we all know that the small instances have terrible CPU that doesn't let you use the advertized 1Gbps anyway. Other than that, even if AWS let 1Gbps traffic go on for a while, you get throttled pretty quickly from my experience.

user5994461 · on Aug 22, 2017

Author of the quoted article here.

The run of iperf refuted your refutation.

kuschku · on Aug 22, 2017

> Just run iperf3 on any aws instance to a GCE instance and you can see greater than 220mpbs.

For how long is the question. Historically, it’s been considered common knowledge (might just be an urban legend) that AWS, even if you pay for more traffic, at some point just throttles you, the same way that they do with IO.

boulos · on Aug 22, 2017

He said "more" price-competitive :). I think we're all saying the same thing.

boulos · on Aug 21, 2017

Agreed, and I hope they do so!

Though there would still be other things like the lower on-demand rates, custom shapes, networking that scales with shape (rather than being coarsely grouped), being able to attach SSD / GPUs semi-arbitrarily, and so on. For those that care, not having to pay up front for the best price is also a huge deal. You see the same thing in GCS vs S3 as well: Glacier and S3-IA have a few rounding up gotchas that catch many people out.

All that said, I hope we all get to per-minute billing.

Disclosure: I work on Google Cloud (but haven't talked to the Metamarkets folks)

MichaelRenor · on Aug 22, 2017

> As we investigated growth strategies outside of a single AZ, we realized a lot of the infrastructure changes we needed to make to accommodate multiple availability zones were the same changes we would need to make to accommodate multiple clouds.

Maybe he author means multiple regions? Multi az is so easy. Everything works. Multi region is much harder.

whatsmyhandle · on Aug 21, 2017

Very nice writeup! A nice, detailed read that was easy to understand.

It seems to focus more on raw infrastructure (EC2 vs GCE) instead of each company's PaaS offerings. Obviously AWS has the front runner lead here, but would be super curious in a comparison of RDS vs. Cloud Spanner for instance. (pun unintentional, but then realized, and left in there)

outworlder · on Aug 22, 2017

This should be RDS vs Cloud SQL

vira28 · on Aug 22, 2017

Did you mean to say AWS Aurora vs Cloud Spanner? Because, I don't think you can compare RDS vs Cloud Spanner. RDS is a managed for the most of the famous RDBMS out in the market (except Aurora). Cloud Spanner is a google proprietary db running only on GC.

swozey · on Aug 21, 2017

Great thorough comparison and falls very into line with my experience. Definitely worth the read. Thanks!

throwaway0071 · on Aug 21, 2017

Off Topic: it's frustrating that these companies spend quite a lot of time and money learning about the complexities of their infrastructure but when you're interviewing at such companies, you're expected to have answers for everything and a complete strategy for the cloud.

/rant

hobolord · on Aug 22, 2017

Great post! How difficult is it to switch from an AWS EC2 instance to the GCP version?

mrg3_2013 · on Aug 21, 2017

Nice post! I will be using it as a reference.