Hacker News
Chalice: Python Serverless Microframework for AWS (amazon.com)
262 points by fideloper on July 11, 2016 | 93 comments



I think I would have preferred this to be some type of plugin/add-on to Flask rather than a full replacement that now locks me to AWS and is a brand new project without any docs :(

If it was a cli with decorators or something that uses Flask I would feel more comfortable and give it a shot.


I've looked into doing this. Flask is too integrated with the WSGI protocol to be easily adapted to this model. It presupposes a number of things about the request and response protocol that are not true in Lambda. You would end up having to write an unwieldy WSGI emulation layer (what Zappa did) and still wouldn't be able to achieve 100% WSGI compatibility, breaking a subset of the plugins.
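To make the mismatch concrete, here's a minimal sketch (with hypothetical handler names) contrasting the two calling conventions: WSGI hands your app a CGI-style environ dict plus a start_response callback, while a Lambda function behind API Gateway just receives an event dict and returns a dict.

```python
# WSGI: the server drives the request/response protocol.
def wsgi_app(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello"]

# Lambda behind API Gateway: a plain function -- no streaming,
# no start_response, just an event in and a dict out.
def lambda_handler(event, context):
    return {"statusCode": 200, "body": "hello"}
```

An emulation layer has to fake the entire environ/start_response machinery on top of the event dict, which is exactly the unwieldy part.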

Most people love Flask because of its beautiful, minimalistic route decorators and hooks. That subset of the API can be easily replicated - Chalice is part of the way there - and I'm hoping it emerges as an API standard, separate from WSGI.
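That decorator subset really is small enough to sketch in a few lines of plain Python (a toy illustration of the style, not Chalice's actual implementation):

```python
class MiniApp:
    """A toy router exposing only the Flask-style route decorator."""

    def __init__(self):
        self.routes = {}

    def route(self, path):
        # Returns a decorator that registers the view for this path.
        def register(view_fn):
            self.routes[path] = view_fn
            return view_fn
        return register

    def dispatch(self, path):
        # Look up and invoke the view registered for this path.
        return self.routes[path]()

app = MiniApp()

@app.route("/hello")
def hello():
    return {"hello": "world"}
```

Nothing here presupposes WSGI, which is why the decorator surface could plausibly be standardized separately from it.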


You're not wrong, but there are plenty of examples of using Flask without WSGI. I use Flask-Frozen to generate static sites from the CLI, for example.


You may be right overall, but that's a particularly bad example, as Frozen simply replaces the "flask" portion that would be deployed to the server with static files, obviating the need for WSGI altogether.

Serving a Frozen app on lambda means serving static files on lambda, which is very doable, and a very different use case than developing something like an API to be deployed to lambda.

Edit: Re-reading our comments, you may be confusing the difficulty -- it isn't in building the CLI for Flask, it's in getting Flask to be served over lambda. Presumably, the OP could easily re-write the deploy functionality to deploy a flask application, it's just that the flask application is unlikely to work because it defaults to WSGI, which lambda (apparently) does not provide.


https://github.com/Miserlou/Zappa might be more useful to you.


I definitely wouldn't build a business on this for the reasons you mention, but there are still very good use cases. E.g. let's say you have a website where you need to strip metadata from images. That's only two or three lines of code with Pillow, but it might be better opsec to do it elsewhere so that if your app somehow gets pwned you don't have the metadata flowing through your system. But if doing that involved setting up an entirely new webserver then that probably wouldn't make sense, especially since the new server would probably be subject to the same vulnerabilities as the original.
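The Pillow approach mentioned here is roughly the following (a sketch: copying only the pixel data into a fresh image drops EXIF and other metadata, though it also recompresses the image):

```python
from PIL import Image

def strip_metadata(src_path, dst_path):
    """Copy only the pixel data into a fresh image, leaving
    EXIF and other metadata behind, then save the clean copy."""
    img = Image.open(src_path)
    clean = Image.new(img.mode, img.size)
    clean.putdata(list(img.getdata()))
    clean.save(dst_path)
```

Being this small and self-contained is what makes it a natural fit for an isolated function rather than a dedicated server.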


Hopefully the Flask team will whip something like that up.

I also want to see this for Django.


Ok, I have seen the phrase "serverless" a few times recently. Can someone explain to me what it is (as I am pretty sure it involves a server - it runs on AWS ffs) and why I should want to use it? And most importantly, is it web scale?


Well, if you look at how we've done things over the years:

- We used to have our own machines in our own data centers

- Then we started renting machines in data centers

- We then moved to the cloud model where we would get compute capacity on demand. But the minimum unit was an hour

- But what if you could deploy your code and be charged only for the compute and memory used to fulfill each request? That is the "serverless" model. You don't constantly run a server like Apache; instead, when a user request arrives, the relevant function is called and executed, and the results are returned to the user. You are billed for the RAM and CPU time consumed.

This has many benefits:

- For low traffic sites, this has significant cost savings

- For high traffic sites, this is auto-scaling without thinking about launching machine instances

This isn't without problems, naturally. It lends itself to certain types of problems better than others. For example, if you need lots of hot-cache data, the response times on a serverless stack will be slower.

But as you can see from the above, this is the logical direction cloud computing will evolve. Infrastructure will truly be shared and you should be able to extract efficiencies down to the minute.


It's like running CGI on a shared server, except that it scales better.


I wonder if that's really true (that it scales better than running more-or-less bog standard CGI across a fleet of thousands of servers, behind a load balancer).

If I've understood lambda correctly, rather than spinning up a process (as with CGI), it spins up an entire vm/container. I suppose it might do the fast-cgi thing - spin up a container, and keep it running while there are requests coming in, and then kill it off.

I'm sure there are other benefits of "container-on-demand" vs "process-on-demand" -- but I'm not sure "scales better" is one of them. Well, I'm sure it "scales better" in the sense of organization (human resources), but not necessarily in terms of machine resources.

At any rate, I like the analogy.


The concept goes back even further. It's really transaction processing as used in the IBM Customer Information Control System, first used in 1964. Load a small program image on demand, run it, discard it. (Or, optionally, reuse it for another transaction.) This is usually tied to a database, and if the transaction fails, the database changes are rolled back. This is how IBM mainframes do transactions. More than fifty years later, versions of CICS are still in wide active use and are supported IBM products.


That's not really how things were.

We used to have "web hotels". Then they grew server-side programming support, such as Perl and later PHP. Customers were at first not separated at all, but later they were, with Virtuozzo and similar systems.

This was later rebranded PaaS, to differentiate from cheap PHP hosting of yore. Some have now rebranded to serverless. It's not so much a "direction" as it is market differentiation.


Then why is it something new? Isn't this what Google App Engine / Heroku have been doing for years?


GAE will start up a machine occasionally to deal with increased demand and keep them on for 15 minutes at least. With something like Lambda (my old favourite was PiCloud) I can scale out to 50 processes for 10 seconds and only pay for 500 seconds of processing time.

For me, most of the benefit comes from data processing, where my usage is very bursty. The classic example used on most of the tutorials is image thumbnails/scaling/similar. All you want to say is "Do X to every file in this folder of 50k images" and let someone else handle starting and stopping as many machines as they can in order to get this done as quickly as possible.


No, Heroku and, to a lesser extent, App Engine run at the whole-application level. Lambda and Google's and Azure's function offerings are literally individual functions that are deployed separately. They can be piped together to make a whole app but don't need to be.


One comparison I've heard is: can the platform spin up a new instance on demand in under 100ms? AWS Lambda can.


Probably, but when I look at examples of companies using Lambda, the use cases seem rather generic.


It means that you no longer have to deal with the server configuration - the servers are completely hidden from you. You just give your functions to AWS, and they run them in a black-magic-box of theirs. I've tested it at least to the extent of 100-200 rq/sec, and it holds up nicely.

Ultimately, there is a server somewhere, as you might expect. It's just not yours to manage/worry about.


It's just marketing hype. As you noted there is no such thing as serverless. It is another architecture that forces you to use AWS Lambda and API gateway and a few other services to glue everything together. The idea being that if you are just shuttling bytes back and forth then the overhead of the traditional deployment is too much and you're better served with what looks like an event bus with some transformers sitting in the middle.


It seems like it's really a serverless pricing model.


With Lambda you pay per 100 ms of compute time and per million requests. So, yes, it's very proprietary at the moment, but for any service that sits idle most of the time it can be very cheap.

https://aws.amazon.com/lambda/pricing/

If you were hosting something like a little message board for your friends and family, or something you are demoing to possible employers, etc., it'd be borderline free, but available all the time.
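The arithmetic is easy to sketch (using the pricing in effect at the time: roughly $0.00001667 per GB-second of compute and $0.20 per million requests, ignoring the free tier; treat the numbers as illustrative):

```python
def lambda_monthly_cost(memory_gb, avg_duration_ms, requests_per_month,
                        gb_second=0.00001667, per_million_requests=0.20):
    """Estimate a monthly Lambda bill, ignoring the free tier."""
    compute = (memory_gb * (avg_duration_ms / 1000.0)
               * requests_per_month * gb_second)
    request_fees = requests_per_month / 1_000_000 * per_million_requests
    return compute + request_fees

# A hobby site: 128MB functions, ~200ms each, 100k requests/month.
cost = lambda_monthly_cost(0.125, 200, 100_000)  # well under a dollar
```

The key property is that an idle month costs nothing at all, which no always-on instance can match.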


I have a SaaS running on EC2s/Beanstalk where I have a feature that a user runs at most a few times per day (if at all). It's a small Python algorithm that finds the optimal combination in an array of items.

For that, Lambda is great. We have lowered our price since migrating to Lambda for this relatively simple script.


It just means you don't have to worry at all about managing the servers, using things like AWS Lambda, which handle it for you. Generally these services will scale up and down for you as well.


That just worries me all the more. You basically get software engineers that wall themselves into their garden. It's like nodejs all over again...


That's been the trend in the software industry since about 2000. When I started then, the job description was just "software engineer", and you were expected to know networking, databases, low-level code, UI code, scripting, etc. Now you have "frontend software engineer", "backend software engineer", "full-stack software engineer" (which seems to mean web framework + database, which is a very different meaning from what "full stack" meant when I started, or what it means at big companies like Google), "AWS specialist", "web developer", "Android developer", "iOS developer", "SRE", "devops", "data scientist", "machine-learning researcher", "infrastructure engineer", etc.

It's actually a boon to entrepreneurial types who really do know the full stack inside and out. So many startups these days are "slap a web or mobile app in front of a manual process", so people who really do understand computation and all the things computers can do have less competition.


We're still figuring out how to separate responsibilities for building software... it's the same cycle of maturation that construction, manufacturing, and every other large domain goes through.


If you don't understand why GOOD startups do this, take a look at the "Doing things that don't scale[1]" lecture from "How to Start a Startup" series by Sam Altman at Stanford.

[1] https://clip.mn/video/yt-oQOC-qy-GDY


I think this type of thing will end up being useful for special cases, but in general we'll still want to run our own servers. Their use case of creating a public API is a great one: it's a simple interface and having Lambda handle the auto-scaling is pretty awesome.


We are using Lambda to export CloudWatch metrics to another system. Haven't exceeded the free tier of Lambda yet, and CW itself was only $5 last month.

Having a t2 up the whole time would have been way more, plus it would have been subject to our Chaos Monkey.


Wouldn't it be a lot cheaper if you had a dedicated box where you can run your own VMs and add and remove services as you please? So far, running one at base cost with leased hardware costs about 1/10th of any 'cloud' provider out there, with lots more resources to go around.

Right now, if I want to spin up a new instance, container or process, I can do it locally, in Vagrant, or on a VM on the dedicated box, all with the same config and management ease of the configuration management stack (we use SaltStack). It also works just fine with stuff like GCE or AWS, but those are simply never the 'cheapest' option. At best, they are an in-between solution when we need more hardware. It's great for temporary scaling, but for nominal resource usage there is no point in using those services at all, at least for us.

The cost of a dedicated box is static, no matter how much you use it, and yet it is always lower than the mean cost or 95th percentile of any AWS/GCE kind of thing. For (converted from euro) about 100 USD we get 16 E5 cores, 92GB RAM, 1TB SSD RAID10, 10TB HDD RAID10 and 8TB transfer on a dual 1Gbit link. For the life of me I can't find any combination on Amazon to match that.



In this case, it's the FaaS version, which is really just a slight twist on traditional PaaS.

Really, the "serverless" term does more to obscure than to illuminate.


It means it runs using AWS Lambda and allows access via the API Gateway... it's literally in the second sentence with links to explain what those technologies are.

Lambdas are just units of code that run on demand, without your having to manage one or more EC2 instances or a container platform.


Also known as: an expensive and lazy way to make something work 'a little bit' until it gets too complex and times out.


It's an app server. It's like PHP.


That's exactly what it is: fancy CGI, except one pays for each script execution instead of a monthly subscription. The "Oh, but it doesn't use HTTP but a pubsub architecture" is meaningless; that's still CGI for hipsters. And you still need to use (and pay for) API Gateway on AWS in order to execute Lambda scripts from the outside.


AWS Lambda is cool and all, but aren't people doing the math on this? Lambda seems like a really expensive way to deliver almost anything. Likewise, the AWS API Gateway is expensive, but at least provides some additional capabilities. Lambda seems like its profitable niche would be very small: a limited computing environment, very high cost (relative to almost every other way to host an API), and a whole bunch of new APIs and processes to learn to make it all spin.

Am I missing something?


I've found Lambda to be really ridiculously fucking cheap. We moved image resizing onto it and saved thousands a month.

Don't compare the cost of Lambda per 100ms to the cost of a virtual machine per month, since Lambda only charges as you use it. You'd have to have the CPU pegged at 100% usage to make that a fair comparison. Mind you, even if you took the cost of Lambda per 100ms and multiplied it out to a monthly cost, it's still about the same as an EC2 instance of the same capacity. (E.g. if I had a 1GB Lambda running for a total of 1 month of compute time in the uswest2 region, it'd cost $32.)

Oh and the cost of API gateway is $3.50 per million requests and the bandwidth cost is completely negligible for us (we redirect to the resized image hosted on S3, fronted by cloudfront). We get 3 million image resizing requests a day. That's over 30 per second. It would take a lot of VMs to handle that (image resizing isn't trivial). Or we could just pay the whole $11.50.

I have to ask have you done the math? Lambda beats everything in cost except for some funky solutions using unreliable and less scalable lowendbox.com services.


Whenever I've looked half-seriously at any cloud offering, I've come to the conclusion that it's the data transfer/bandwidth that kills vis-a-vis dedicated services. If you don't need ~10TB/month, then sure - it doesn't really matter. If you do then you need to get a lot of reduced ops work for your effort.

I suppose that if while testing/starting out you use little bandwidth, and any additional bandwidth/users comes with income - it doesn't really matter that a chunk of that goes to AWS, and not to your business.

For those that actually do run non-trivial things on AWS (or other clouds) - do you feel this is an accurate assessment? That bandwidth is still really expensive in the cloud?


I would say that it strongly depends on the type of requests you are handling.

Saying that you can run a 1GB Lambda for 1 month nonstop for $32 does not impress me, and it's kind of strange that it impresses you.

You could run a t2.micro with the same RAM for $6/mo if you reserve it for a year. And if you want to give that t2.micro a workload where it only has to respond to 1 request at a time, the same terms we are affording the lambda system, then it's no comparison.

Now, for responding to disparate requests every once in a while? Lambda is ideal. You can spin it up and you don't have to pay when it isn't processing requests. But continuous requests? Hard no.

We can argue that the cost overhead at full load is worth it for the ability to "infinitely" scale with no effort. Alright, that's fine. But let's not pretend that it is "cheaper".


You can't compare it to a t2.micro, because that's not what you're getting with Lambda. A t2.micro is a single, weak piece of hardware; is it meant to handle 30 requests per second? I foresee issues there. For the same traffic, Lambda functions would be invoked in parallel when spikes require it, which is something you would not get without scaling out the t2.micro and handling concurrency yourself. And that is ultimately where the comparison comes in. Because hardware needs to accommodate unpredictable traffic, it runs all the time, and it is beefier than it needs to be. That performance buffer is eliminated when using FaaS.

I still use dedicated hardware where I deem it appropriate, but the proof is in the pudding. For the right services, pulling them out into lambdas is proving to be a huge savings and performance boost for countless architectures.


I don't have a strong opinion one way or the other, but I'd like to point out that micro instances also have diminished networking (besides processing/memory). Depending on what you're doing, this could be a factor.


We can afford a t2.medium or an m3.medium at the price he quoted, which will have more RAM, more processing capability, and probably equivalent networking capability, so that isn't really the takeaway point.


The point is, his lambda setup has vastly better capability than your instance, should he need it. Unless you're able to bring instances up fast enough to address your need, which clearly doesn't put you in the same ballpark as him.


What makes you say Lambda is more expensive than things like EC2?

Obviously lambda is a hammer, not a swiss army knife - you shouldn't shoehorn it into your app just because it's the new tech on the block. But the use case isn't niche at all.

We're using lambda for our service to ingest and process Hearthstone replays. An HTTP POST is sent to the API gateway, this triggers a lambda which parses a 0.5-2MB file, scrapes the data in it and stores data in the db + converts it to another format. The whole process takes 3-6 seconds. Lambda lets us easily scale to dozens of requests per second without having to worry about anything other than the DB server. Our frontend/app servers don't need to be touched.

The alternative would be to provision and manage a lot of EC2 instances to handle the load. We'd have to worry about provisioning them dynamically so that it's economically viable. We'd have to manage the load balancing ourselves. Lambda's been a huge time saver.

PS: Let it be known that Lambda only supports python 2.7, not python 3, and that really frickin sucks. Get your shit together, Amazon.


"Let it be known that Lambda only supports python 2.7, not python 3, and that really frickin sucks. Get your shit together, Amazon."

Node 4 support came, what, a month ago? It's pretty ridiculous.


I specifically don't understand the Python 3 situation. They support Python 2, it can't be that complicated to have a separate Python 3 runtime.


There is no significant benefit to python 3 over python 2.


You're wrong, and it's irrelevant anyway. Our app is in Python 3. Having to support Python 2 just because lambda doesn't support 3.x is a pain.


Unicode....?

I mean, that's reason enough. Python 2's string processing suuuuuuucks.


Except that it is the way forward, and many people choose to write Python 3-compatible code.



I've got (virtual) servers all over the place. I run my tiny little "needs a minute" tasks on one or more of those. It's nice, I guess, to make it a wholly isolated thing, but I rarely actually want to go to the trouble to design it to be isolated. My little jobs often need at least some of the rest of my production environment to be available (database, logs, libs, something). Making a Lambda thing just for those tasks seems like something I'd rarely want to go to extra effort for.

But, if building on a very large scale with strict isolation requirements for your individual tasks it would make sense. Which is the niche I'm talking about; maybe it's just bigger than I assume it is.


Do you leave your database server publicly open (but password protected obviously), or can you use IAM policies for this as well?


A quick google search of more 3-letter-acronyms than there are in the US government yields this:

- https://docs.aws.amazon.com/lambda/latest/dg/vpc-rds.html

- https://docs.aws.amazon.com/lambda/latest/dg/access-control-...


I think the appeal of Lambda is that most services get zero-to-low usage, and so something which is "pay only for usage" may be a net win over dedicated computing resources if you use it for the right parts.

I've had a number of startup ideas that went nowhere, and it would've been handy to test & demo them without springing for the full $17/month for an EC2 t2.micro instance. Usually they have some key aspect that's prevented me from running them on Lambda (eg. you can't exactly run a hacked Node.js binary on Lambda, and sending push notifications from it requires going through SNS and adopting its API), but if I could run arbitrary code & third-party packages and pay for only the computing time I use, it's an attractive proposition.

In general you use AWS for convenience anyway - if you're concerned about minimizing costs, you can sometimes get 10x price/performance ratios by moving to dedicated hosting.


As already mentioned, $4-$5 gets you a tiny instance all month. But, for me, I just run my small jobs on my existing virtual machines, generally speaking. I have a bunch of servers, and they already have my necessary libraries, files, database connections, caching, and so on set up. Running a cronjob, or something triggered or pulled from a queue, is trivial and practically free.

My feeling on this is that if the job is small enough and infrequent enough, I don't care about its impact on other services so I can run it anywhere (and it's effectively free in such a case), and if it's large enough and frequent enough then setting up for it in Lambda every time is wasteful and I probably want a system (even if a small one) dedicated to the job, for probably less money. There's probably a sweet spot in there somewhere...but, it's not high on my list of things to integrate into my infrastructure.


Really, $17 for a micro? A baby GCE machine is $4.


    google micro (1 cpu, 0.6GB ram): $0.006/hr
    amazon nano  (1 cpu, 0.5GB ram): $0.0065/hr
    
    https://cloud.google.com/compute/pricing
    https://aws.amazon.com/ec2/pricing/
    in both cases, +disk, +traffic
Not exactly a crippling difference.


Oh interesting...are nano instances new? Didn't see them last time I was looking for a "just throw up a demo" machine.


They've been around for a few months, but like all new AWS things, they're being slowly rolled out across regions, so may not be available everywhere. Flick through the 'region' dropdown in that pricing link to see if they're available where you want them.

The micro went from 0.6GB to 1GB of RAM in the 't2' generation, and the nano fills the need people have for the old micro size. I use them for bastion/NAT boxes :)


Except Google gives you a discount just for using the instance. Amazon requires prepayment.


Swings and roundabouts. AWS gets flak for having expensive data transfer costs; Google's data transfer costs are even higher (both listed at same links above).

My point is that when you compare apples to apples, it's really not all that different.


Yup. AWS prices suck, in general. If it weren't for the large (multi-terabyte) dataset I need for my current project that they're hosting for free on aws-publicdatasets, I'd probably switch to Google Cloud. I may yet switch, which illustrates the danger of relying on proprietary APIs like Lambda.


Yes, you're missing the fact that you don't have to run servers all the time.

Unless your CPUs are constantly at 100% utilization, there is some slack in your infrastructure. With Lambda you literally only pay for the compute time you actually use.

I migrated a smallish app to Lambda from EC2 and shaved my AWS bill from >$100/m to <$5/m.


AFAIK Lambda is the only way to react to events on S3 buckets. A lot of what I've seen it used for is as a mechanism for kicking off a workflow from a new item in S3.


You can push S3 events to SQS and then consume them anywhere (e.g. we had file converters running on Heroku, listening to S3 events through SQS).

Likewise, Lambda can execute events from SNS, Cloudwatch Events, Dynamo and a lot more, including the Amazon workflow system.
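Whichever consumer you point at the queue, the work is mostly parsing the S3 event notification out of the message body. A sketch, assuming the raw S3 event JSON lands in the SQS body:

```python
import json

def s3_objects_from_message(body):
    """Extract (bucket, key) pairs from an S3 event notification
    delivered via SQS (or SNS, after unwrapping the envelope)."""
    event = json.loads(body)
    return [
        (rec["s3"]["bucket"]["name"], rec["s3"]["object"]["key"])
        for rec in event.get("Records", [])
    ]
```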


SNS can be consumed from anywhere not just Lambda.


Question:

With these "Serverless" frameworks (Zappa, Chalice, Serverless, etc), do you have to redeploy to Lambda every time you want to test your changes during development?

Is there a way to develop locally and get quick feedback?


I am not sure about the frameworks, but a lambda function is just that, a function. You can easily run your function and pass in an event object (a dict in Python) to test various scenarios. You don't need to deploy to test. Once you are ready you can deploy to lambda without publishing to fully test before publishing live.
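In other words, local testing can be as simple as calling the function with a hand-built event dict (the handler here is a made-up example):

```python
def handler(event, context):
    # A trivial handler: greet whoever the event names.
    name = event.get("name", "world")
    return {"statusCode": 200, "body": "hello " + name}

# Invoke locally with a fake event and no context -- no deploy needed.
result = handler({"name": "HN"}, None)
```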


Thank you. Are these not meant to render user interfaces and just be for API-style computing?


I am currently using it for what you could probably call a user interface. My use case is forms stored in S3, served in an iframe, which POST via JS to AWS API Gateway, which calls Lambda to process the form and return the result to the user.


Correct. That being said, there's nothing stopping you from emitting, say, HTML from such a function. But there's nothing really helping you either.

It's basically function calls in the cloud; a remote procedure call where the remote procedure is executing on their infrastructure.
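Emitting HTML is just a matter of setting the content type in the returned dict (a sketch of the API-Gateway-style response shape):

```python
def html_handler(event, context):
    # Return an API-Gateway-style response carrying HTML instead of JSON.
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "text/html"},
        "body": "<h1>Hello from a function</h1>",
    }
```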


I'm also guessing there's no session state or cookies?


Well, I guess there wouldn't be without a server. You'd have to use JWT or the like.
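A JWT-style token is essentially a signed payload the client carries between stateless invocations. A stdlib-only sketch of the idea (not a real JWT implementation; the key handling is deliberately simplistic):

```python
import base64
import hashlib
import hmac
import json

SECRET = b"demo-secret"  # hypothetical; load from config in practice

def issue_token(payload):
    """Encode the payload and append an HMAC signature."""
    body = base64.urlsafe_b64encode(json.dumps(payload).encode()).decode()
    sig = hmac.new(SECRET, body.encode(), hashlib.sha256).hexdigest()
    return body + "." + sig

def verify_token(token):
    """Return the payload if the signature checks out, else None."""
    body, sig = token.rsplit(".", 1)
    expected = hmac.new(SECRET, body.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig, expected):
        return None  # tampered or signed with a different key
    return json.loads(base64.urlsafe_b64decode(body))
```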


Or any server side store, maybe DynamoDB.


The functions are stateless, but you can access S3 and DynamoDB.


But if you want to emulate the aws gateway + lambda cycle locally it seems like you need a bit of rigging to do this easily, especially if you want to talk to it with your browser.
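The rigging can stay pretty small, though: translate the incoming HTTP request into an event dict, call the handler, and write the dict back out. A minimal stdlib-only sketch (the handler and the event shape are illustrative, not the exact API Gateway format):

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlparse

def handler(event, context):
    # Illustrative handler under test.
    name = event["queryStringParameters"].get("name", ["world"])[0]
    return {"statusCode": 200, "body": json.dumps({"hello": name})}

def make_event(raw_path):
    """Build a minimal API-Gateway-like event from a request path."""
    parsed = urlparse(raw_path)
    return {"path": parsed.path,
            "queryStringParameters": parse_qs(parsed.query)}

class LocalGateway(BaseHTTPRequestHandler):
    """Serve the handler over plain HTTP for browser testing."""
    def do_GET(self):
        result = handler(make_event(self.path), None)
        self.send_response(result["statusCode"])
        self.end_headers()
        self.wfile.write(result["body"].encode())

# To try it in a browser:
# HTTPServer(("127.0.0.1", 8000), LocalGateway).serve_forever()
```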


This is awesome - it takes care of a lot of the pain of setting up Lambda, including the packaging, the tricky IAM policies and the crazy API Gateway incantations. It also follows the Flask decorator conventions!

I could have really used this a few months ago. I ended up writing a library of my own, but I'd much rather have used this, as it's supported by the AWS Python team.

This really lowers the barrier to entry to deploying Python-based API servers!


Has anyone successfully and painlessly used Lambda+API Gateway in production as their main backend? It appears Lambda has many limitations, but I see rather mature frameworks such as Serverless (formerly JAWS) being promoted here, and now Amazon is also working on their standard framework.

It would be great to just use CDNs and this; get all benefits of using a PaaS with IaaS-level costs only!


CloudSploit did - and just wrote an article [1] about it a few days ago. It isn't for everyone, but it definitely saves money in the right situations.

[1] https://medium.com/@CloudSploit/we-made-the-whole-company-se...


We use it now as the main backend for MindMup 2, migrated from Heroku.

I recently published the scripts that evolved from our usage to simplify deployments as https://Github.com/claudiajs - the tool is mature, and allows teams to use API GW almost as easily as if it were a lightweight web server.


So basically each endpoint is a function? How do you test your whole app locally? I mean integration tests and acceptance tests on your local machine, if you depend entirely on AWS's vendor-locked infrastructure.


Serverless computing seems to just be reinventing hosted CGI. Is this not just a trade of Perl and PHP for Python and JavaScript?


I think one could engineer CGI to get many of the same benefits - I'm not entirely sure most people did. First, you'd want to run each process in a separate jail, as a separate user - and yet be able to (easily) use FIFOs or pipes (possibly sockets) to pipe data through.

At that point, you're already a fair distance from the typical PHP/Perl CGI setup with its ephemeral data in /tmp.

But I think you're right in that it borrows some of the good ideas from CGI (chief among them simplicity, assuming that you were doing stateless stuff).


I love the name. :) Fits right in with "flask" (http://flask.pocoo.org) and "bottle" (http://bottlepy.org/docs/dev/index.html).


Shameless plug: come join over 6,200 of us at https://flask.reddit.com discussing Python and Flask!


I think this is great! While I'm no Python expert - as much as I love Flask - I get annoyed at the parts needed to set up the uWSGI stuff. It looks like with Chalice, maybe I don't have to worry about that?


Very cool. Is it still Python 2.7.x only, though, like the AWS Lambda Python runtime?


Seems that way, until Lambda adds support for Python 3.


Consider https://getsandbox.com it provides great support for viewing state and route logs. Various hosting options.


What is the recommended way of keeping the URL of the service stable? Setting up a DNS record pointing to the DNS name Amazon generates, with a "low enough" TTL?



