I’ve had a chance to use the service for a couple of weeks. My quick summary review is that it’s a little tricky setting up the IAM roles and security groups, but once you have that going, it works great! I see a ton of potential here in transforming the way people use AWS.
I also put together the Netflix use cases in the keynote so if you have any questions I’ll try to answer them!
One thing I am struggling to understand is how much of my applications can be replaced with Lambda.
I think it would be really awesome to make applications that do not have any dedicated instances associated with them, but are entirely based on the various AWS platforms.
For some things, it seems like Lambda could be a drop-in replacement, whereas for others it would not quite fit the bill. Do you think it will be possible to build a viable "server-free" application with Lambda and the other services (S3, DynamoDB)? Or is it probably going to be limited to the types of use cases listed on the landing page[1]?
It is completely possible. Consider that the major uses for a server are interacting with clients, storing data (a backend database), and manipulating that data. Here, clients can interact directly with S3/Dynamo; this in turn will trigger Lambda functions to manipulate the data, and the round trip can be accomplished via SNS.
Of course, it would be much more viable to have some sort of server running (for example I doubt Netflix has replaced their entire backend with Lambda functions), but a great majority of workflows dealing with stateless data manipulation and events can now be streamlined and even replaced.
I recently had to set up a system to generate thumbnails from images uploaded into S3; precisely the example they give. This would be a godsend and replace a lot of complicated queues and ruby processes running on an army of EC2 instances.
Oh what I would have given to have this capability a year ago. :)
You can't (at present) execute code on a GET request, so you need to have static content pushed to S3 as the basis for your app. You can update it on POST/PUT, though, so you can construct an application that way.
I think it's completely possible. Amazon has all the pieces you need (compute and storage). That being said, you may need to wait for ec2 events to do some really complicated stuff.
True, it's likely just like all the other limits: arbitrarily small for everyone by default, to prevent an errant process from spinning up a ton of resources that you'd be expected to pay for.
Just like how you can get specific increases on your spot bids, so long as you acknowledge that if there's a spot spike and your personal limit is far greater than the spike, you'll be charged your rate.
Interesting question. I think some advanced apps already use these techniques. For example we already do event driven things at Netflix.
What this does is make it a lot easier for smaller companies to do it without having to invest in all the infrastructure.
So I think yes, it will be revolutionary in the sense that it will get a lot more people using what I think is the future direction of compute and application building.
No idea; for the time being the only supported language is JS. I'm trying to figure out whether I can run Ruby applications by glancing here and there... I haven't had the time to look into it yet, but everyone seems very excited about it.
Not exactly revolutionary per se, it's an idea that's been around at least as long as shells, and as mentioned in the keynote it's a natural level of abstraction above AWS APIs. The neat part is how this simple idea has been translated to applications at scale.
IMO what's really going to be revolutionary is what we end up doing with it, but we're going to have to wait a bit to find that out :)
Could you use it more generically, meaning not "just" within an app but also as part of an overall control plane, sending commands to other APIs like a notification system?
Not directly, no. However, the documentation states that you can use CloudTrail to log the requests from SQS to S3, then have changes to S3 trigger the events that can get your Lambda function to respond.
It's a little convoluted, but that seems to be the way they're going for now.
Wouldn't that be kind of unnecessary, when it's not too tough to write an SQS-reactive client?
Starting with S3 fills a gap as there is AFAIK otherwise no way to start processing based on changes to S3, without writing a rather wasteful program to list the resources, maintain state, and hunt for differences.
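The "rather wasteful program" in question would look something like this sketch: poll the bucket on a timer, keep the previous listing in memory, and diff to find new objects. This is exactly the busywork an S3-triggered Lambda function removes.

```javascript
// Naive S3 change-poller sketch; bucket name is illustrative.
var AWS; // lazy require so diffKeys can be exercised without the SDK

// Return keys present in curr but not in prev.
function diffKeys(prev, curr) {
  return curr.filter(function (k) { return prev.indexOf(k) === -1; });
}

var seen = [];
function pollOnce(bucket, onNew) {
  AWS = AWS || require('aws-sdk');
  new AWS.S3().listObjects({ Bucket: bucket }, function (err, data) {
    if (err) return console.error(err);
    var keys = data.Contents.map(function (o) { return o.Key; });
    diffKeys(seen, keys).forEach(onNew); // react to each new object
    seen = keys;
  });
}

// Poll every 30 seconds, forever, whether anything changed or not:
// setInterval(function () { pollOnce('my-bucket', console.log); }, 30000);
```

Every one of those list calls costs money and most of them find nothing, which is the waste being described.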
But if you have an SQS reactive client it would sit in EC2. With a service like Lambda, they are promising millisecond based pricing. A very ambitious project by them.
From hearing the details, it seems that there is no real limitation on what you can execute as long as it runs on the instance. I get the feeling that if you zipped up all of the parts of Python or Ruby you need to run, you could execute a script using either of them, as long as you initiated it from your Node script. The key word here is probably "support", and that may be more of an issue if you want to edit your scripts with their web IDE.
Django is the problem. Even if I don't trigger the database to connect, the startup costs are noticeable to validate models and load things that won't ever be used.
So maybe Lambda isn't suitable for things that take less than a few seconds.
I could write Python with no Django and try to structure things so they don't need the database, but then I would have to take input from, and put output data back to, a queue of some sort.
>> A quick back-of-the-envelope test
And actually, Python on mine was even faster than Node when both are doing nothing, so my assumption was wrong (but it was based mostly on using Django, and all my Node stuff is small servers and tools).
» time node noop.js
node noop.js 0.04s user 0.01s system 98% cpu 0.045 total
» time python noop.py
python noop.py 0.01s user 0.01s system 90% cpu 0.026 total
but importing significant numbers of libraries would be the real test.
the main benefits to lambda would still be:
- scaling for high demand
- no cost for idle time
and let's not worry about the micro pennies in startup times
I don't think so; the docs suggest you get a full Linux environment your code runs on, not just a limited sandbox, and one of the blogs mentioned running binary code. So I don't think it is being sandboxed at the language level, especially as they talked about adding more languages soon.
As MBlume mentioned[1], they seem to support Node.js.
From the FAQ:
"At launch AWS Lambda supports code written in Node.js (JavaScript). Your code can include existing Node.js libraries, even native ones."
So it seems that you can avoid lock-in if most of your JavaScript code is not too coupled to Amazon's API.
And they claim on the detail page:
"With AWS Lambda, there are no new languages, tools, or frameworks to learn. You can use any third party library, even native ones. At launch, AWS Lambda will support Node.js code, with support for other languages coming in the future."
Err, I think the parent's point wasn't lockin in terms of the code you write, but in terms of your architecture. Setting up your own system driven off of s3 or dynamodb events (if you wanted to move the functions onto your own bare metal) would be a pretty big project.
that's a REALLY good point! this feature seems really solid, but they seem to be putting a sort-of "golden handcuff" on you. If you have a distributed system on AWS which implements "lambda functions," you can forget about switching to Azure, or your own bare metal servers without having to go through some pain. "vendor lock-in" definitely seems to have been their motto here, very clever & great observation!
Well, yes. That's the thing about AWS. It's a very feature rich & sophisticated platform, which is hard to replicate elsewhere with the same harmony without spending a great deal of investment.
I won't be surprised if someone writes a generic lambda library which, given a cloud provider, generates the relevant mappings/code. That would take care of some of the vendor lock-in concerns.
I think this would be the solution. I guess lambda is basically IFTTT at scale with more configurable "T" portion.
Anyway, libcloud seems to be leading the way in this kind of standardization, so that's good news. There's also providers that have drop in AWS-compatible APIs for some services. Then there is stuff like apache mesos for creating that infrastructure, which is less relevant to smaller companies.
But I suspect that the last thing Amazon wants is to turn into a commodity, so it's in their interest to release as many differentiators with non-standard APIs as they can.
As someone who is non-technical, how much code/infrastructure would this save? I'm trying to understand how locked in this would really make you, and how difficult it would be to replace this on some other IaaS provider?
Any use of a non-standardized service will only lock you in further. Though a lot of companies/start-ups do not care, as long as it saves time/cost, and perhaps they are already deeply invested in the ecosystem.
It seems that this system puts together all events generated by Amazon services to trigger new events. This is usually not that complicated to code, but the catch is to make these (usually) scripts/programs highly available, which Amazon provides.
I'm guessing that this Lambda functionality is built on the new ECS (EC2 Container Service), which is built on Docker. Therefore, it should be reasonably doable to recreate it on-premises and escape the AWS lock-in. "Should" being the operative word there. :)
It's no different from stored procedures and triggers in a database, or a shell language on an operating system. At some point you will have to invest time creating logic, for your purpose, on somebody else's system.
I wish they hadn't named the units of computation "Lambda functions". Cause, you know there's already something known as a "Lambda function" in computer science.
But kudos for Amazon for furthering the datacenter-is-the-computer approach. It is simply the right thing to do.
Really douchey to use a computer science term as your product name. Are they going to trademark that, or are they just happy with the confusion it will cause?
I have a hard time believing there is cause for confusion here; have physicists been stumped by Amazon Redshift? I'm not being facetious, I just don't see how a product name that is niche to people using Amazon services is likely to confuse people.
Also, CS hardly owns the Greek letter lambda; like many Greek letters, it has been used for many products and features over the years. Delta is an airline, among others, and they're doing OK. I'm just curious what your issue with this is, and whether you can come up with some concrete scenarios where this could really confuse people.
The meaning of "lambda" in computer science is functionally (no pun intended) similar enough to what amazon is selling that people learning programming and nontechnicals or even IT pros who work with programmers will inevitably be confused.
It seems like you're seriously saying that people who are studying computer science lack the ability to tell the difference.
Naming the product something that closely resembles its functionality seems like a great idea for a product. It's still Amazon Lambda, and I think it's very unlikely that people will simply refer to it as "Lambda." If CS students find themselves hopelessly confused by this product name, they are going to have a real hard time tying their own shoes.
If you search for Y Combinator, luckily, https://en.wikipedia.org/wiki/Y_combinator still shows up on the first page of results, I expect the same will be true for Lambda, so I can confidently say we won't lose a generation of CS students to this faux pas.
I'm not arguing they will not be able to tell the difference... I'm arguing it will cause confusion at some point, and maybe that's a negative externality of naming this product "Lambda." Nobody said "lose a generation of CS students."
And, as I've noticed in the past, searching for "y combinator" + any language name will often keep useful results away until you finally block this site. I'm not sure if this is still comprehensively true, but it seemed so to me at one point. So there is a very real impact on search results that can happen.
I was being overly sarcastic there, sorry about that. I just don't think this will ever be a problem for people, if anything Amazon adopting it may help people learn more about it, all the same, I guess it could go either way, I'll quiz some candidates in a few years to see if they ran into Lambda as a stumbling point :)
I'm just the opposite: when I heard they were releasing a new service called Lambda, I immediately wondered whether it was like the inline lambda function in Python, and when I read through the description, they were very similar in nature.
Amazon has a LOT of services, and the better the naming scheme they use, the easier it will be to remember what they each are. I'm genuinely happy that they used a name that will make it trivial to map the function (heh) it serves in my head.
Holy mind-blowing awesomeness, this changes everything, yet I feel like this was such an obvious thing to do. So obvious that I can't believe it is real. This, ladies and gentlemen, is why AWS defines the cloud: they are so far beyond everyone else and still innovating, even at the most basic level.
Watch out for the limit on the number of concurrent invalidation batches. You'll probably want to scan for "pending" PUT objects and try to aggregate them into an invalidation batch.
Good point. I guess this would need to be "S3 PUT event triggers code which adds object name to a list and starts a timer"; "timer expiry triggers code which issues a batch invalidate for the list (and empties the list)".
I wonder if there's any good way to generate that timer expiry event...
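The accumulate-then-flush idea above can be sketched as follows: each PUT adds a path to a pending list, and a timer flushes the list as a single CloudFront invalidation batch. The distribution ID is made up, and how you'd host the timer inside Lambda itself is exactly the open question here.

```javascript
// Sketch: batch CloudFront invalidations instead of issuing one per PUT.
var AWS; // lazy require keeps the batching logic testable offline

var pending = [];
var timer = null;

// Queue a path; start the flush timer on the first item only.
// Returns the current pending count.
function addPath(path, flushFn, delayMs) {
  pending.push(path);
  if (!timer) {
    timer = setTimeout(function () {
      var batch = pending;
      pending = [];
      timer = null;
      flushFn(batch);               // one invalidation for the whole batch
    }, delayMs);
  }
  return pending.length;
}

// Issue a single invalidation covering every queued path.
function flushToCloudFront(paths) {
  AWS = AWS || require('aws-sdk');
  new AWS.CloudFront().createInvalidation({
    DistributionId: 'EDFDVBD6EXAMPLE',            // assumed distribution
    InvalidationBatch: {
      CallerReference: String(Date.now()),
      Paths: { Quantity: paths.length, Items: paths }
    }
  }, function (err) { if (err) console.error(err); });
}
```

On a regular server this is trivial; in Lambda, the batch state and the timer both need somewhere durable to live between invocations.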
Seems like a much more interesting announcement than the container service. I can see the architectures of dozens of our systems collapsing down with this.
At a guess, Container Service was built in order to build this. I.e. Lambda is probably an ECS cluster creating and running Node.js-based Docker containers. (A bit like Heroku's original Ruby stack, actually.)
It's a bit amusing to note that they could have built Lambda on top of their own previous Elastic Beanstalk Docker offering instead, but chose to build ECS presumably for cost reasons (which everyone who has evaluated Elastic Beanstalk's Docker support will understand.)
Should I think of this essentially as an abstraction that can replace my asynchronous task queue + workers (e.g. RabbitMQ + Celery workers, obviously assuming you aren't using MQ for additional messaging)? I hate managing those pieces and would be happy to hand that infrastructure to Lambda, but are there additional benefits or use cases that are opened up?
I guess I would have expected others to describe this the same way ("replaces your distributed task queue"), but since I'm not seeing that description I wonder if I've misunderstood.
I guess maybe part of what you might be missing is that these "Lambda functions" can (will be able to?) be kicked off by all sorts of different AWS events, so it can replace queue + workers where you're generating tasks from your own code, but it might also be able to replace the code that generates those tasks too.
Now that you say it, I'm sure we'll see wrappers to support this type of thing. For instance, I could imagine a django library that provides a declarative way to wire Django signals to AWS Lambda functions.
lambda_signal(Model, signal_name, lambda_fn_name)
When that signal fired on the model, the model would get serialized to JSON and sent to the lambda function. It'd be fun to whip this up if no one beats me to the punch.
Rabbit and Celery are the backend system I have now. I was doing a lot of things on demand, event-driven, but this resulted in overloading the database (currently my system constraint) or in elongating the queue too much. Some events would flood the queue with thousands of tasks. Now I have a periodic task that on each iteration chooses the most pressing task, and the system is stable, but I have a bad backlog.
So I'd love to have a reactive task system where I didn't have to worry about the resource usage. Think how much precious developer time I've wasted just managing resource expenditure. Much of my coding this week has been just trying to optimize for this stupid constraint: how many processors do I have, how much disk IO, is the memory getting close to swapping?
Is it just me, or is AWS releasing too many services? There is a service for everything. I wonder if they are just throwing stuff out there to see what sticks... kinda like landing pages.
To be fair, unlike Google, they are not cancelling any services. I think Amazon builds these services for internal use first, and then releases them to the public.
To be fair, you have to pay per unit for these services, so as long as people are using them, Amazon is making money. Putting a service out to pasture in maintenance mode probably wouldn't cost that much.
Maintenance on all of their services is non-trivial in terms of people at least. However if they are used internally, directly or indirectly then they're already paying for it.
If you listen to Amazon at their shindigs, they say that all of these new services and features are in response to requests from existing customers.
Claiming that your code is a 'lambda function' makes it sound sexy, but... isn't it really just a procedure? Unless I'm missing something and there is some higher-order capability for composing AWS Lambda functions together in a way that permits the platform to perform lambda reductions or optimize resource allocation...
Lambda function, not lambda function. Capitalization is key (the former is just a brand name).
Having reductions and composition would be really cool, but from what I could glean from the demos each function is containerized. Stateless and idempotent like lambda functions, but right now chaining more than one together requires an intermediary event.
Except that it can have side effects, can't it - it can trigger services, cause stuff to be changed in a datastore. It doesn't have any local stateful storage, but it can make use of remote state. In that respect, how is it different from basically any web request processor?
I'm excited for this. This replaces what I wanted to use SQS for.
SQS always felt like too much vendor lock-in to me to justify not using something like RabbitMQ or Beanstalkd.
With Lambda, the resource consuming the queue is managed for me - that is huge. Also, the pay by time model is perfect for this - instead of managing when to stop/start resource intensive instances, I don't even have to think about the problem. I only get charged what I used, which can be crucial during growth stages of a business or prototype.
The big penalty is the vendor lock-in, but this tips the scales for me for certain things.
> You can use any third party library, even native ones. [1]
I realize they are starting with Node.js, but I wonder how this will work? It sounds like they plan to support arbitrary dependencies. Will you upload a Docker container with your necessary dependencies? They talk about milliseconds until your code is running, and containers may be (?) slower than that. Or am I hoping for too much?
It's definitely possible to do this if you're not set on containers.
We do it at http://wym.io/ by putting our runtime library in the environment that the code runs in and keeping the runtimes around, keeping the environment clean along the way.
We're working towards doing all that (and more than) Lambda does in a deploy-your-own fashion...
One difficult part of doing event triggered processing is in the progress reporting and keeping the code related to it simple. I wonder how they deal with that.
I'd pretty much given up on AWS for compute and moved most everything to Linode and some bare metal servers, but this service looks very compelling for discrete compute tasks.
The ability to pay only for the fractions of seconds actually used, and the ability to scale quickly without provisioning (or over-provisioning) EC2 instances, is awfully attractive.
Plus, Amazon has priced this pretty aggressively — i.e., it looks shockingly cheap.
Good stuff. Basically it seems to do the bind+listen for you if you are the trigger subscriber. If you are the trigger generator, then it does the socket.write for you. But the big deal is that you don't pay for 'listen'; you just pay for the function execution.
The one thing that will surely happen with this is that the code written will be 'locked in' to run only on aws territory.
Agreed. However, Amazon's pricing and rather complicated infrastructure means that there will always be a market for people like the DigitalOcean's of the world (I just want a cheap VM, now).
And yet, they don't. Compare that to Oracle. Plus, when Amazon makes a new cloud architecture that gets popular, others copy it. OpenStack will have this, if need be, in a year or two, and then you can port to OS in five years when the rent goes up.
Oh, and FWIW, when has AWS raised a price? My servers have gotten cheaper every year I've had them with AWS (3).
Regarding OpenStack and something similar, there is the underlying workflow service (not quite the same I know) called Mistral. A project called StackStorm that is related but adds event handling and a rule engine launched a couple weeks ago.
This looks quite interesting, and a lot more fun to work with than maintaining a pool of servers ready to handle events and spinning up new ones based on capacity.
Anyone know of any similar mechanisms for the OpenStack world, or more generally for any cloud infrastructure other than AWS?
Similar, yes. We're excited that developers are seeing the power of the task/worker/lambda being the scalable unit of work rather than server or VM or even container.
Haven't used Lambda much yet but off hand we see many IronWorker advantages:
- Supports all languages
- Easily scale to thousands of parallel workers
- Integrated logging, error reporting, alerts, auto-retries on failure
- Integration with IronMQ our queue service to achieve stream processing and pipelining
- A great dashboard to manage your workers/tasks
And many innovations/announcements coming soon that'll make using IronWorker a great choice for developers both on AWS and inside the enterprise firewall.
Yes, seems like a direct competitor to IronWorker to me.
IronWorker has a few benefits such as being able to run any docker container and react to a HTTP Post (webhook) directly, but overall very similar to Lambda.
You can also configure your IronWorker more so than you can a lambda. For instance, I have an IronWorker that requires a library not installed by default, so I use the "deb" command to install it during the build process.
And of course, it supports more than just Node.js.
It sounds like this is essentially what Joyent's Manta is, which we've been using in production for the last year and have found to be absolutely fantastic. Are there differences that I'm not seeing?
How is it different from Google App Engine? Conceptually the two seem very similar to me; that is, developers do not have to worry about the underlying infrastructure at all, just write code and deploy.
As a tl;dr, not-quite-right distinction: GAE is more of a platform, whereas Lambda is more of a service.
I feel that GAE is a much more generalized solution for building/deploying applications, e.g. it's very integrated with BigTable and has caching/cron/queuing support. To a point you do not have to worry about the infrastructure, but write any GAE app at scale and soon you'll run into questions of caching some content and optimizing DB writes.
Whereas Lambda, at least right now, is a more specific use case: respond to an event by running some code. It's too early to tell whether managing Lambda functions will require fine-tuning the "advanced settings" (in all honesty it probably will) but it seems like it's much simpler.
Haven't had much time to read the docs. Sorry if it's already evident, but does it allow for running Lambda code on cron as opposed to listening to some event?
AWS Lambda functions can be triggered by external event timers, so functions can be run during regularly scheduled maintenance times or non-peak hours. For example, you can trigger an AWS Lambda function to perform nightly archive cleanups during non-busy hours.
From that example it looks like it doesn't support NPM directly; instead, you download the dependencies locally and include the node_modules directory in the zip file you upload. Interestingly, the example makes use of ImageMagick, which I don't believe is in the default AMI.
Haven't looked too much into it, but I don't see why it couldn't. Use an S3 website, with an event notification to signal that a user accessed a file (web page), and generate a new custom file in your bucket. The generated file has some pre-shared known info between the two in the file name (e.g. client IP address or cookie). The accessed web page either sends back a page reload to the newly generated page, or just an async page that will render the custom content.
Edit: oh, and also delete the generated file once accessed, using Lambda, via either a delete-on-GET notification or just a Lambda scheduled event.
Using Lambda for the slaves is probably not the best idea because, as @saynay mentioned, you couldn't run builds any longer than a few minutes.
You could "wrap" a Jenkins fleet running on EC2 with Lambda functions to set up on a build request event and tear down on a build finish event, and theoretically this would reduce the amount of resources you consume if your builds are few and far between. Think of GitHub webhooks triggering Travis CI as an analogy.
Whether or not that would actually be useful is debatable.
It looks like Lambda functions are limited in complexity and runtime duration.
The way I see it is pairing it with the container service (ECS) and spawning containers in response to events, possibly spawning EC2 instances if you need more compute.
It looks like Lambdas can also be set to execute at a scheduled time, so you could make a delayed Lambda that would shut down any EC2 instances you spawned as well.
Huh? Shared server infrastructure? That's really what this sounds like. Welcome to web hosting in 1999 guys. Most of the point of AWS was that you have your own dedicated resources. Sure, this is a scaling solution, but revolutionary?
Ain't a 'server' boss. They're running a segment of code. When someone hits your PHP page on an oversubscribed DreamHost box with 4000 other sites contending for resources, it's pretty much the same thing. Here we're getting some dedicated CPU/Mem, but it ain't a 'server.' Where's the dedicated disk and ability to install/run whatever I want? Nope, just a lil segment of JS running on a shared resource and billed in 100ms increments.
What about dependencies? What if you need a specific environment set up first in order to process? Would you end up paying a minute for each request just so that it can start installing a bunch of stuff? Is it possible to just set up a VM of some sort and use that environment each time?
If that's possible, Lambda would be like PiCloud but without Python, and will (hopefully) stick around.