Serverless computing on DC/OS with Galactic Fog (mesosphere.com)
97 points by realbot on July 22, 2016 | 23 comments



In my book, the innovation in Lambda is, above everything else, about the billing model. My company moved the work of 40 dedicated servers onto Lambda and in doing so decimated our costs. Paying for 1500 cores (our current AWS limit) in 100ms increments has been a game changer.

I'm sure there are upsides to adopting the same programming model with your own hardware or VMs, but the financial benefit of Lambda will not be there.


There is no disputing that the financial model behind AWS Lambda is revolutionary. I know of a visitor NDA signing service that charges $5 per visitor. The backend runs entirely on AWS API Gateway and Lambda, and their cost per visitor transaction to AWS is $0.25. Think about how powerful this is! Now you can develop an app in your spare time and bring it to market for practically nothing. Once developed, your fixed infrastructure costs are zero.

That said, there are financial benefits on premise too. Short-lived processes that need 1 CPU and 512 MB of memory are ideal candidates for oversubscription. If you had a server where CPU utilization never peaked beyond 60% and 15 GB of memory was free, you could fit 20 Lambda functions into this 'slack capacity' without resource contention. Driving up utilization when the cost of the server is sunk is effectively capacity for free.
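The slack-capacity arithmetic above can be sketched quickly. The server specs here (a 32-core host, 0.5 cores per function) are assumed for illustration; only the 60% CPU peak, 15 GB free memory, and 512 MB per-function figures come from the comment:

```python
# Back-of-the-envelope check of how many small functions fit in a
# server's slack capacity, sized by both free memory and CPU headroom.
free_memory_gb = 15          # memory sitting idle (from the comment)
cpu_headroom = 0.40          # utilization never peaks beyond 60%
cores = 32                   # hypothetical host size (assumed)

fn_memory_gb = 0.5           # 512 MB per function
fn_cpu_cores = 0.5           # assumed average CPU draw per function

by_memory = int(free_memory_gb / fn_memory_gb)
by_cpu = int(cores * cpu_headroom / fn_cpu_cores)
print(min(by_memory, by_cpu), "functions fit in the slack")
```

With these numbers the binding constraint is CPU, not memory, which is why the comment's figure of 20 functions is comfortably conservative on the memory side (20 x 512 MB = 10 GB of the 15 GB free).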


I'm struggling to imagine what about an NDA signing service would cost 25c per visitor. 25c buys you ~4 GB-hours in Lambda!
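The ~4 GB-hours figure checks out against the 2016-era duration price of $0.00001667 per GB-second (request charges ignored; both the price and the omission are assumptions of this sketch):

```python
# How much Lambda compute 25 cents buys, duration charges only.
GB_SECOND_PRICE = 0.00001667        # dollars per GB-second (2016 pricing)
gb_seconds = 0.25 / GB_SECOND_PRICE
gb_hours = gb_seconds / 3600
print(f"{gb_hours:.2f} GB-hours per 25 cents")
```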


Isn't the cost significantly higher? Even with bulk pricing, Lambda was more than 8x more expensive than EC2 for constant compute.


It depends on your traffic.

If you have a million 1ms transactions per second, you'll spend $500k/month, or $6m/year.

https://s3.amazonaws.com/lambda-tools/pricing-calculator.htm...


That pricing calculator isn't correct.

From the AWS Lambda pricing page:

    Duration is calculated from the time your code begins
    executing until it returns or otherwise terminates,
    rounded up to the nearest 100ms

So it would really be closer to $1 million per month.
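The correction is easy to reproduce. Assuming the 2016-era public prices ($0.20 per million requests, $0.00001667 per GB-second, 128 MB minimum memory) and the 100 ms round-up quoted above, a million 1 ms transactions per second bills as follows:

```python
# Monthly Lambda cost for high-frequency, very short invocations,
# with duration rounded UP to the nearest 100 ms as AWS bills it.
REQUEST_PRICE = 0.20 / 1_000_000     # dollars per invocation
GB_SECOND_PRICE = 0.00001667         # dollars per GB-second
MEMORY_GB = 128 / 1024               # smallest memory tier
SECONDS_PER_MONTH = 30 * 24 * 3600

def monthly_cost(tx_per_second, duration_ms):
    invocations = tx_per_second * SECONDS_PER_MONTH
    billed_seconds = -(-duration_ms // 100) * 0.1   # ceil to 100 ms
    request_cost = invocations * REQUEST_PRICE
    duration_cost = invocations * billed_seconds * MEMORY_GB * GB_SECOND_PRICE
    return request_cost, duration_cost

req, dur = monthly_cost(1_000_000, 1)   # a million 1 ms transactions/sec
print(f"requests: ${req:,.0f}  duration: ${dur:,.0f}  total: ${req + dur:,.0f}")
```

Request charges alone come to roughly $518k/month, and the rounded-up duration charges add a further ~$540k, which is where the ~$1M/month figure comes from.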


That's only if you compare raw compute. A VM you create on AWS typically gets <10% utilization (often close to 1%). You also have to overprovision for varying load, so if you have many services and the deployment unit is the VM, you end up with tons of VMs doing nothing. With containers the deployment model becomes more granular, so you can share the VM and increase utilization, but you still pay for idle time. With hosted functions (like Lambda) you don't pay for idle time, so in many cases it's much more efficient.

Indeed, if you host your own "lambda" implementation, you may or may not have cost improvements. In this case it may be just a matter of operational efficiency.


It's highly dependent on your usage. If you have a cron job that isn't on your production server and only runs every few hours, it's great to schedule that on AWS Lambda; EC2 would be more expensive.

Trying to process thousands of requests on AWS Lambda could end up hitting Lambda's limits and costing you more.

I use Lambda to process EXIF data in images as they are uploaded to S3. Which is awesome, because it doesn't require services running on the web server and costs me nothing since it falls within the AWS Lambda free tier.


Oh, and database access from Lambda sucks balls. Because you cannot pool connections, you must open/close DB connections on every invocation. If you fail to close them you will quickly exhaust the database's connections and kill your app until they expire.


Phillip - have you tried making your database connection in 'static' code? With container re-use (see https://aws.amazon.com/blogs/compute/container-reuse-in-lamb...), you should see multiple invokes going to the same container, and you can use the same connection for all of them. That will only happen if your traffic is high enough.
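The "static code" pattern suggested here is just a module-level connection that warm containers reuse. A minimal sketch, using sqlite3 as a stand-in for a real database client (the handler name and event shape are illustrative, not any particular framework's API):

```python
# Container-reuse pattern: anything in module scope survives between
# invocations that land on the same warm container, so the connection
# is opened once per container instead of once per request.
import sqlite3

_conn = None  # lives for the lifetime of the container

def get_connection():
    global _conn
    if _conn is None:
        _conn = sqlite3.connect(":memory:")  # connect only on cold start
    return _conn

def handler(event, context=None):
    conn = get_connection()
    cur = conn.execute("SELECT 1")
    return cur.fetchone()[0]
```

As the parent notes, this only pays off when traffic is steady enough to keep containers warm; a cold start still opens a fresh connection.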


I find it kinda strange that every official doc says not to rely on this, but every example I find relies on it.


I wonder why Google Compute charges a minimum of ten minutes to spin up an instance. I understand there is some setup cost, but ten minutes is $0.00056 for the cheapest (f1-micro) instance [1]. I know this sounds sort of cheap, but that's the cost just to deliver the interface (the VM) that is going to be used. That's a relatively large amount of money to spend on not having any work done at all yet. A diskless VM, which can just send data for storage to a database over HTTP, shouldn't require much setup work, as it really just is the execution of CPU instructions (with the OS being the CPU interface rather than JavaScript).

Couldn't someone write a virtualized VM that runs inside Lambda, thus providing zero (or Lambda-equivalent) startup costs for generic VMs? Then the VM would just be a function running inside Lambda, transforming incoming user data and then storing the result in a database, after which it would disappear.

Or maybe the OS just is the problem? One model of bypassing it is Lambda-like services, which would be the more centralized solution, while the more decentralized solution (of the two) would be bare metal/unikernels, which basically achieve the same thing: (close to) zero startup time/cost (~30 ms for a HaLVM unikernel).

[1] https://cloud.google.com/compute/pricing#billingmodel


I talk about cost being the main factor in this post about the benefits of Serverless. It's definitely the driving factor: https://www.iron.io/what-is-serverless-computing/


Wasn't this the very earliest billing model in multiuser computing? Give us a program to run on our big iron, we charge you for the CPU time it consumes?

This is how university supercomputers work currently. You don't get dedicated hosts; your project has a budget of CPU hours.


That's interesting - it's the first time I read about a 10x cost improvement.

Would you mind sharing a bit more details?

p.s. if you're in the Bay Area I wouldn't mind getting together over coffee/drinks for a deeper chat.


There are financial benefits to being able to very easily overcommit resources (although Lambda is just one way to do that).


How do Mesosphere ops (who are developers first, maybe not so experienced with the operations side) normally manage these services, like Gestalt? How do you keep track of what should be running and how resources should be allocated? Do you just install directly from this service repository, and then update resource allocations as needed, through Marathon or the service itself? Or, do you have a system for managing and versioning the entire cluster?

I'm new to Mesosphere, and right now, I'm figuring out a process for managing the cluster that would work well for a small but growing team. It would be nice to have a specification of what the cluster should look like, and how it has changed over time. For that, I'm thinking of having a "{company}-DC" git repository with a collection of Ansible playbooks that would set up the DCOS cluster, and then install, configure and set up scaling policies for the services and applications that we want to run. Is this how most people do it? Do you see problems with the general idea of keeping all of Mesosphere under configuration management? Where do secrets fit into this, where do you store them and how do you make them available to your applications?


Also, there's a tool called mesosctl (https://github.com/mesoshq/mesosctl) if you want to start small.


You may want to ping the DC/OS community Slack channel or mailing list at users@dcos.io.

(https://dcos.io/community)


I'm pretty clueless about this new architecture, but a concern I frequently hear about is lock-in (both in language and vendor).

Does this style of computing/engineering hinder open source adoption in languages? E.g., OS images pre-set-up with some type of PHP-style provider, letting you run whatever language you want with low startup time to handle each request?

I'm sure much of this is way off the mark, so to explain it differently: I'd love to be able to work with a Rust framework tailored to this "serverless" model, hosted on any generic box I want (or fleets of boxes, etc.).

And of course, apologies for the ignorance I'm sure is visible :)


If we want to replace AWS Lambda with a self-hosted service for Java, what are the recommendations? Apache Mesos?


Actually, Galactic Fog runs on DC/OS, which is at its core a Mesos distribution. As far as I understood, it can't currently run Java lambdas...


So their docs seem pretty clear that their lambda engine runs JVM-based languages. Or at least Java and Scala, but if you can run Java you can run Scala and Clojure and JRuby etc.



