> To overcome the timeout limitation, I suggested using POST requests (with URL as data) to send jobs to an instance, and use multiple instances in parallel instead of using one instance serially. Because each instance in Cloud Run would only be scraping one page, it would never time out, process all pages in parallel (scale), and also be highly optimized because Cloud Run usage is accurate to milliseconds.
> If you look closely, the flow is missing few important pieces.
> Exponential Recursion without Break: The instances wouldn’t know when to break, as there was no break statement.
> The POST requests could be of the same URLs. If there’s a back link to the previous page, the Cloud Run service will be stuck in infinite recursion, but what’s worst is, that this recursion is multiplying exponentially (our max instances were set to 1000!)
Did you not consider how to stop this blowing up before implementing? Having one cloud function trigger another like this with no way to control how many functions are running at the same time with no simple and quickly met termination condition (with uncapped billing) is playing with fire. It's not going to be optimal either if most of the time each function is waiting for the URL data to download.
You need to be using something like a work queue, or just keep life simple and keep it on a single server if you can.
We've all had a program crash from a stack overflow. The problem seems to be that instead of the "serverless panacea" they were promised, the code they built can now only run on one of many Google servers, none of which are theirs. No way to kick the tires at all.
It honestly reminds me of debugging a Jenkins pipeline. Something that was designed to be super generic of a runtime but yet the tooling can inexplicably only live on computers that are not your local development machine, and all of it is maximally painful to stub or test or debug to seduce you into "just running it live".
It's like the opposite of the "small agile team" thing they were talking about. If your program requires 7 API keys and some cloud environment to do a test run, I want no part of it.
> We've all had a program crash from a stack overflow.
Launching a cloud function that recursively triggers the same cloud function, that doesn't have a simple safeguard for it looping or blowing up, and where billing scales with the number of cloud functions ticks the "very high risk" and "very high impact" boxes for me. A program running on a single server isn't similar here (you could accidentally create a DoS attack though).
Typical cloud function use is some event gets triggered like a user sign up, the function executes, then it halts. The above isn't a standard use case and is so incredibly risky this approach shouldn't be attempted in my opinion.
> To overcome the timeout limitation, I suggested using POST requests (with URL as data) to send jobs to an instance, and use multiple instances in parallel instead of using one instance serially. Because each instance in Cloud Run would only be scraping one page, it would never time out, process all pages in parallel (scale), and also be highly optimized because Cloud Run usage is accurate to milliseconds.
> If you look closely, the flow is missing few important pieces.
> Exponential Recursion without Break: The instances wouldn’t know when to break, as there was no break statement.
> The POST requests could be of the same URLs. If there’s a back link to the previous page, the Cloud Run service will be stuck in infinite recursion, but what’s worst is, that this recursion is multiplying exponentially (our max instances were set to 1000!)
Did you not consider how to stop this blowing up before implementing? Having one cloud function trigger another like this with no way to control how many functions are running at the same time with no simple and quickly met termination condition (with uncapped billing) is playing with fire. It's not going to be optimal either if most of the time each function is waiting for the URL data to download.
You need to be using something like a work queue, or just keep life simple and keep it on a single server if you can.