
That does sound like quite a good approach, but there are a few issues:

1. Some rough throttling is useful, but with Lambda I can only set it at the account level rather than per function. Calling the API in bursts of 10,000 requests may be problematic, so capping concurrency account-wide would limit our use of Lambda for other tasks (for which we may be happy running that many in parallel). The default limit of 100 would probably serve well enough, though.

2. I've now got to add splitting and recombining code on each end, with the risk of failed jobs being silently missed from the final output file. That said, the extra work may leave me with a better way of handling a few failed jobs out of a large file. Hmm.
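
To make the recombination worry concrete, here is a rough sketch of the bookkeeping, not a worked-out design. The names `split_records` and `recombine` and the dict of results keyed by chunk id are all assumptions for illustration. The idea is just that the recombining side knows how many chunks to expect, so a failed job shows up as a missing id instead of quietly vanishing from the output.

```python
def split_records(records, chunk_size):
    """Yield (chunk_id, chunk) pairs so each job can be tracked by id."""
    for start in range(0, len(records), chunk_size):
        yield start // chunk_size, records[start:start + chunk_size]


def recombine(results_by_chunk, expected_chunks):
    """Merge per-chunk results in chunk order.

    Returns (merged, missing), where missing lists the chunk ids that
    never produced a result, so failures are reported rather than dropped.
    """
    missing = [i for i in range(expected_chunks) if i not in results_by_chunk]
    merged = []
    for i in range(expected_chunks):
        merged.extend(results_by_chunk.get(i, []))
    return merged, missing
```

Anything listed in `missing` could then be resubmitted on its own rather than rerunning the whole file.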

Part of the issue is that without the execution limit, this is an amazingly simple script. I've got a 10-20 line Python script which does the actual processing of a file (read, hit API, store, with a pool of N concurrent requests). Lambda is impressive because it adds only a small level of complexity to a problem and gives you a lot in return; it's just that my use case is so simple that even that small amount of complexity is relatively quite a lot.
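
For a sense of scale, the shape of that kind of script is roughly the following. This is a sketch under assumptions, not the actual script: `process_file` is a made-up name, one record per line is assumed, and `call_api` stands in for whatever function makes the real request.

```python
from concurrent.futures import ThreadPoolExecutor


def process_file(in_path, out_path, call_api, workers=10):
    """Read one record per line, send each to call_api, store the results."""
    with open(in_path) as src:
        records = [line.rstrip("\n") for line in src]

    # At most `workers` calls are in flight at once; map() returns
    # results in input order even though the calls run concurrently.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(call_api, records))

    with open(out_path, "w") as dst:
        for result in results:
            dst.write(f"{result}\n")
    return len(results)
```

The pool size doubles as a crude throttle on the API, which is part of why the single-machine version is so short.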

Currently the setup doesn't hit the API, it just creates a dedicated instance to process a batch of data locally, but I'm hoping to simplify things to send everything through the API and just scale & load balance separately. Having code to automatically turn on & be responsible for turning off machines makes me a bit nervous :) I've already missed that I'd deleted the shutdown command in a script and left a box running for a day while developing.

Thanks for the suggestion though. I'll try to work through it in more detail today and see if I can find a clean way of dealing with the recombination. I think that's the side I'm less clear on at the moment.




The main problem I see in implementing what you are asking for is the bandwidth cost.

I still haven't figured out how to measure how much bandwidth a trivial application is using, so my price is based only on time.

If I let you use the service for as long as you like, either you would need to pay A LOT or I couldn't allow you to use the network connection...



