We're in the AWS ecosystem, and the database offerings are really subpar. DynamoDB, which I originally expected to be somewhat comparable to MongoDB, is an incredibly frustrating (and expensive) product to use. AWS Data Pipeline is extremely confusing and very expensive as well.
AWS's offerings really lag behind Google's offerings (like BigQuery) in this space. Hopefully AWS can catch up because I'd rather not have requests bouncing between data centers.
If you are in AWS there are also the four RDS options (Oracle, MySQL, PostgreSQL, SQL Server) as well as Redshift. Also, the best thing about AWS is that there are so many third-party choices, e.g. MongoLab, MongoHQ, Instaclustr, Cloudant.
Databases is not the area I would be choosing Google for.
I think App Engine's Datastore is generally an under-appreciated gem. Possibly because you have to use App Engine to use it without sacrificing performance, or maybe because it's not easy enough to use if all you have is some JavaScript + JSON and you don't want to (or don't know how to) write Python/Java/Go.
But it's actually the only generally available product I know of that solves all the hard problems (availability, partition tolerance, some - but well-defined - consistency, with cross-entity transactions) with zero hassle for you.
If you read through http://aphyr.com/tags/Jepsen, you get some appreciation for how hard this is to pull off without running into operational nightmares (massive data loss, split brains, etc).
Disclaimer: I work for Google, though not on Datastore.
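For anyone curious what those cross-entity transactions actually look like, here's a minimal sketch using the Python NDB client on App Engine. The Account/AuditLog models and the debit scenario are made-up examples, not anything from a real product:

    from google.appengine.ext import ndb

    class Account(ndb.Model):
        balance = ndb.IntegerProperty(default=0)

    class AuditLog(ndb.Model):
        message = ndb.StringProperty()

    # xg=True makes this a cross-entity-group transaction (limited to a
    # small number of entity groups), so both writes commit or neither does.
    @ndb.transactional(xg=True)
    def debit_and_log(account_key, amount):
        account = account_key.get()
        account.balance -= amount
        account.put()
        AuditLog(message='debited %d' % amount).put()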
We've had good luck with DynamoDB, but it could be that it just fits our use case very well. What sort of frustration were you running into? (Honestly interested to avoid trouble down the line)
Most recently: a hot hash key. DynamoDB uses the hash key of the object being persisted to route it to the right partition.
We're a SaaS company with lots of tiny customers and a few very large customers. We need to keep an index so we can show a specific customer only their data, which means the index for our largest customers gets hit a lot. The problem with this structure is that we have to pay as if all of our customers were as popular as our biggest customers, or we get throttled. And even though the DynamoDB console shows that you have provisioned 10x above your current usage, you still get throttled, because the throttling happens on a single partition, not the table as a whole.
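The common workaround is write sharding: spread a hot customer across N synthetic hash keys and fan the reads back in. Just to illustrate (not what we shipped, and the table/attribute names here are made up), a rough sketch with boto3:

    import random
    import boto3
    from boto3.dynamodb.conditions import Key

    NUM_SHARDS = 10  # spread one hot customer across 10 hash keys
    table = boto3.resource('dynamodb').Table('customer_events')

    def put_event(customer_id, event):
        # Write under "bigcustomer#7" instead of "bigcustomer", so one
        # customer's writes land on several partitions instead of one.
        shard = random.randint(0, NUM_SHARDS - 1)
        table.put_item(Item=dict(event, customer_shard='%s#%d' % (customer_id, shard)))

    def query_events(customer_id):
        # Reads now have to fan out over every shard and merge the results.
        items = []
        for shard in range(NUM_SHARDS):
            resp = table.query(
                KeyConditionExpression=Key('customer_shard').eq('%s#%d' % (customer_id, shard)))
            items.extend(resp['Items'])
        return items

Of course, that pushes a bunch of complexity into the application, which is exactly the kind of thing you were hoping the managed service would handle.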
So, let's say you solve that problem, but now you need to drop the troublesome index on a billion+ row table. With DynamoDB you can't change a table's indexes, so you have to migrate your table to a new table. Doing that without downtime is an incredible challenge.
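The usual recipe, roughly (a sketch of the general dual-write-and-backfill pattern, not exactly what we did; table names are placeholders): write every new item to both tables, backfill history from the old table, then flip reads over.

    import boto3

    dynamodb = boto3.resource('dynamodb')
    old_table = dynamodb.Table('events_v1')   # table with the index we want gone
    new_table = dynamodb.Table('events_v2')   # recreated with the desired indexes

    def write_event(item):
        # Phase 1: the application dual-writes every new item to both tables.
        old_table.put_item(Item=item)
        new_table.put_item(Item=item)

    def backfill():
        # Phase 2: copy historical rows with a paginated scan. On a
        # billion-row table this is slow, so in practice it has to be
        # rate-limited and resumable.
        kwargs = {}
        while True:
            page = old_table.scan(**kwargs)
            with new_table.batch_writer() as batch:
                for item in page['Items']:
                    batch.put_item(Item=item)
            if 'LastEvaluatedKey' not in page:
                break
            kwargs['ExclusiveStartKey'] = page['LastEvaluatedKey']

    # Phase 3: switch reads to events_v2, then drop events_v1.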
Which reminds me of when they announced indexes. We were so excited, only to find out we couldn't add indexes to our existing tables and instead had to recreate them all.
The whole point of SaaS is to make our lives easier, but with DynamoDB our lives were much more difficult than just using Mongo.
Anyway, I need to do a blog post on this -- it's a bit too complicated for a HN comment. :-)
Yeah, I think all NoSQL DBs will have that issue if you have extremely unbalanced sharding. This is an application-level fault and should be solved there.
But the thing that is extra bad about DynamoDB is that they are paying for 10x higher provisioning as a stopgap and still getting throttled. That sucks.
DataFlow is not Kinesis. It's more like Kinesis plus Esper plus BigQuery and you still wouldn't have one set of queries to run against streaming and batch data like you do with DataFlow.
I've been using Simple Workflow (in particular, the Flow framework: http://aws.amazon.com/swf/details/flow/) a lot recently to manage complex asynchronous distributed workflows, and it's a revelation. Some people laugh about the "Simple" part in the name, but it's kind of true - once you get over the initial learning curve. It's a bit like Git that way (first a lot of banging your head, then a lot of bang for your buck).
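For people who haven't looked at SWF: the Flow framework itself is Java and annotation-based, but even the raw API gives a feel for the model - you start an execution, deciders and activity workers poll for tasks, and SWF keeps the full event history. A minimal sketch of just kicking off an execution with boto3 in Python (domain, task list and type names are made up, and this is the plain API, not Flow):

    import boto3

    swf = boto3.client('swf')

    swf.start_workflow_execution(
        domain='my-domain',
        workflowId='order-12345',
        workflowType={'name': 'ProcessOrder', 'version': '1.0'},
        taskList={'name': 'order-deciders'},
        input='{"orderId": 12345}',
        executionStartToCloseTimeout='3600',
        taskStartToCloseTimeout='300',
    )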
Google Dataflow seems to be the big one here, especially if it works well for stream processing. Fault-tolerant stream processing with huge scalability? Perfect for the IoT!
Judging from the code samples they showed during the keynote, I'd guess that Google Cloud Dataflow is based on (or an extension of, or a public version of...) FlumeJava, described in this PLDI 2010 paper: http://pages.cs.wisc.edu/~akella/CS838/F12/838-CloudPapers/F...
The streaming data stuff looks extremely interesting. My main concern is around cost; unfortunately, many of these things are great if you've got a massive data problem but not particularly worth it if you've got much smaller data.
I'm in a rather awkward phase of having small enough data that I don't need "scale to 1000 machines!"; I just want one or a few machines occasionally, but managed for me (turn on, run code, shut off). Tutum works very well for this, but I'd like to use more of the ecosystem available at Google or AWS (pay-per-usage data storage, for example). GCE is pretty decent, if a bit awkward, although the new Docker support helps (though I've had problems even getting it working).
It looks like an attempt to respond to AWS Kinesis, which was released last year. The monitoring stuff seems to be built on the software they got when they bought Stackdriver.
DataDog is an important part of monitoring at SmarterAgent, where we use their APIs and integrations heavily, especially the CloudWatch integrations for Amazon Web Services (AWS). Using these, we have been able to put up effective dashboards for new environments in a matter of minutes.
We leverage DD's API primarily for eventing. For example, deployment notifications are posted to Datadog, where they overlay our metrics. This has proven very useful in tracking changes due to deploys and/or configuration changes.
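If you're using the Python client (an assumption on my part; any of their clients or plain HTTP works the same way), posting a deploy event is only a few lines. The keys, title and tags here are placeholders:

    from datadog import initialize, api

    initialize(api_key='YOUR_API_KEY', app_key='YOUR_APP_KEY')

    # Shows up as an overlay on any dashboard/graph that matches these tags.
    api.Event.create(
        title='Deployed web to production',
        text='git sha abc1234 rolled out to the web tier',
        tags=['env:production', 'role:web', 'deploy'],
    )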
While we do leverage the DataDog agent for standard and custom metrics, DataDog’s ability to put together dashboards (and alerting) for AWS without any modifications to the host is what really closed the deal for us.
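For the agent-side custom metrics, the usual pattern is to emit them through the local dogstatsd listener; again just a sketch, with a made-up metric name:

    from datadog import statsd

    # The agent aggregates these locally and forwards them to Datadog,
    # so the application only ever talks to localhost over UDP.
    statsd.increment('myapp.signups', tags=['env:production'])
    statsd.gauge('myapp.queue_depth', 42, tags=['env:production', 'queue:emails'])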
I probably will end up building the bare minimum to meet my needs and moving on tbh.
It was basically a monitoring/metrics system to merge how I handle the monitoring of crons, the work queue, system metrics, analytics, etc. into a single service. Right now, I'm stuck using three separate services.
Sure, I could just build something to merge it together ... but at that point, I'm halfway to building my own.
I was about to do the same thing. App Engine sorely lacks those features currently, so I'm very excited for this (assuming it has good support for App Engine in addition to Compute Engine which I saw in the keynote).
Hi, I am a developer and hacker and wanted to see if I can offer help here. The reason: I wanted to see what the typical needs and use cases would be and learn from the experience.
I can be contacted on sid4it@gmail.com
We use Datadog at pelotoncycle.com. My past experience was with Nagios, but Datadog is so much nicer as well as easier to scale. I looked into other services (cheaper and more expensive) before deciding on Datadog.