Note: If I come across as a hater, I don't think I am one. I'm a fan, a fan who is frustrated!
We moved to Cloudflare from a larger CDN and used Workers to duplicate the functionality otherwise not provided by Cloudflare. The initial results were great. As our Workers became more complicated, especially when interacting with Cloudflare's internal stack, the cracks started to show. Specifically, when you run into a problem, the documentation is too light to resolve corner-case issues yourself, and support draws a big zero when it comes to Workers. Even as an Enterprise customer, it can take months to get a clear answer/resolution, and that's if you nag them.
As a platform, Workers has great promise, and Cloudflare is making strides. As with most things Cloudflare, they could spend more time improving documentation and exposing details of their internal stack so you can self-service and resolve your own issues. What I find is that Cloudflare is on to the next thing before fully completing, polishing, and documenting the previous three things they started.
For Workers, I'd like to see a trace tool built for Workers where you can debug what happened in a Live environment. Currently, they recommend outputting debug as JSON in a header.
The linked article is worthless. See the blog post, as this poster mentioned:
> For Workers, I'd like to see a trace tool built for Workers where you can debug what happened in a Live environment. Currently, they recommend outputting debug as JSON in a header.
That is a general problem I have had with Cloudflare over the years. It's hard to know exactly what happened when things fail or even the fact that things failed!
They always branded themselves as that super scalable and stable CDN or DDoS shield that will keep your service up when it would normally break down. And then there are those absolutely horrible "You <-> Cloudflare OK, host down" error pages that suggest it's the host that has the problem and at the same time advertise Cloudflare to your visitors. Well, it turns out the vast majority of issues that clients reported (automated or not) were caused by Cloudflare themselves. And what really makes it painful is that 1. it's sometimes hard to debug and 2. they won't make this visible to you unless, I guess, you pay for the highest Enterprise plan, which offers raw logs. They just don't tell you even how many requests failed. There's no simple error.log. An example: at some point Cloudflare started blocking a lot of requests from Turkey to an API. The only way to solve it was to whitelist Turkey as a country.
We've moved off of Cloudflare and had a lot fewer issues.
PS: every feature since the simple CDN stuff seems to be now about lock-in. It's all Cloudflare specific. This is a dangerous path that moves away from the commoditization of "hosting" where people can easily move from one hoster to another. I hope it does not succeed. For the sake of the open web.
For CDN we just put servers in different geographical locations, some using cloud services and some dedicated. In fact the system itself was made distributed wherever possible.
For the anti DDoS: it was not a big/common issue and in the rare cases we can work with providers to mitigate.
We started with GeoDNS (e.g. AWS Route53) and later on added Anycast into the mix. The combination of these two is very powerful but I wouldn't recommend Anycast for projects without good network staff.
For most, I think GeoDNS is good enough and very simple. The number of users using nameservers that break it (like Cloudflare's 1.1.1.1)* is still small.
*: It breaks because they don't send along the EDNS client subnet to the upstream nameserver for "privacy reasons". I disagree with that notion because 1. it's just the subnet and 2. the website/service will see the full client IP anyway when it gets the HTTP request. I mean, how many times does one resolve a hostname and then not connect to it? And we are talking about the authoritative nameserver, not the ISP nameserver.
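To make the "it's just the subnet" point concrete, here is a minimal Python sketch of the truncation a resolver would do before forwarding a client address in EDNS Client Subnet. The function name `client_subnet` is made up for illustration, and the /24 (IPv4) and /56 (IPv6) prefix lengths are the commonly recommended defaults as far as I know, not something stated in this thread.

```python
import ipaddress


def client_subnet(ip: str, v4_prefix: int = 24, v6_prefix: int = 56):
    """Return the network a resolver would forward instead of the full IP.

    The prefix lengths are assumptions for this sketch.
    """
    addr = ipaddress.ip_address(ip)
    prefix = v4_prefix if addr.version == 4 else v6_prefix
    # strict=False drops the host bits rather than raising an error.
    return ipaddress.ip_network(f"{addr}/{prefix}", strict=False)


print(client_subnet("203.0.113.77"))           # 203.0.113.0/24
print(client_subnet("2001:db8:1234:5678::1"))  # 2001:db8:1234:5600::/56
```

The authoritative nameserver only sees the truncated network, which is still enough for GeoDNS to pick a nearby server.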
I also noticed random increases in latency when you put a REST API server behind Cloudflare. Since then, I don't put API servers behind Cloudflare anymore and limit Cloudflare use to traditional websites only. I suspect Cloudflare might flag API clients as bots (especially those with non-browser user agents) and do additional checks on those requests, thus increasing the latency.
I don't think we do anything that would increase latency for API clients specifically.
But if your API is not implemented on the edge (e.g. in a Worker), and the responses are not cacheable at the edge, and you aren't using Argo, then there's not much else Cloudflare can do to make the request faster. Cloudflare is just proxying requests at that point, and the presence of a proxy will indeed probably add a few milliseconds of latency vs. the client talking directly to the origin. (Of course, you do get other benefits like DDoS protection.)
Is it possible that the timeouts were due to Cloudflare blocking some user agent, maybe the browser integrity check/bad browser detection? When the API is used in a web browser, there were no reported issues with latency/timeouts from users or logging, and the latency is not much different than without proxying on my end. But when it was used in an Android and iOS app with a custom user agent, I did see occasional large latency and some users reported random timeouts. Unfortunately I didn't test updating the app to use a Chrome user agent and simply turned off proxying for that API server (and other API servers that are used by non-browser clients). It was happening last year though; I haven't tried it again yet.
Ah sorry, I didn't realize you meant that things were outright timing out, rather than just being slower than when not proxied. That seems like something went wrong. I don't think we have anti-bot measures that would cause API clients to time out. If anything they would get an error back or maybe get served a captcha or JavaScript challenge.
I'm a product manager working on Workers — didn't come across hater-y to me at all.
We will actually be rolling out a new version of our docs soon — hopefully make them easier to navigate, and expose deeper information. If there are any specific subjects or corner cases you've run into that you'd like to see better documented — our docs are open source (so issues are welcome!). We definitely want to make it as easy as possible for you to self-service.
Observability and debugging is also a heavy area of investment for us (and I think we've made significant progress here since the launch of Workers). As Kenton mentioned below, wrangler tail is a great way to get a lot of information out of the Worker (and its subrequests); you can basically log anything. We'd also like to make it easier to log to existing logging platforms from a Worker. If there's any missing information you'd like to see, please do send me a note.
Will there ever be a way to put workers behind the cache (eg maybe the worker can set in the response “yes, please cache this”, the way you already can for the origin, except have it then serve that response rather than run the worker)? Or will workers have an API to interact with firewall rules?
Mainly I’m thinking that a major use case for using CF is stuff like DDoS protection, but workers have a per request cost and according to [1] CF’s DDoS protection won’t automatically protect against a layer 7 attack. It seems the options are to either use CF’s rate limiting product (which charges per good request 10x the price of a worker request. Maybe that’s worth it, but it sure makes workers look less affordable than at first), or set firewall rules to block the traffic. For that second option, it would be great if it could be done from inside the worker (although I’m not sure yet how the worker would actually go about detecting that a request ought to be blocked. [1] suggested running fail2ban on the origin and using the firewall API to dynamically update rules; I guess I’m wondering if there is a simpler way). Rate limiting seems ideal, I’m just not sold on paying 10x when I don’t know if I’ll even ever need it: maybe simply being able to set optional limits for worker requests after which further requests don’t get serviced, rather than getting billed for an attack? Is that possible? I guess I could count requests somehow myself and disable the worker. At least then I can make a call between whether uptime is worth the cost of rate limiting or if I can just accept some downtime.
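The "count requests myself and stop serving past a limit" idea could look roughly like the fixed-window counter below. This is a plain Python sketch, not Worker code: `RequestBudget` and its parameters are hypothetical names, and the counter lives in one process's memory, whereas a real Worker would need some shared store and would have to cope with concurrent updates.

```python
class RequestBudget:
    """Fixed-window request cap: allow at most `limit` requests per window."""

    def __init__(self, limit: int, window_seconds: int) -> None:
        self.limit = limit
        self.window_seconds = window_seconds
        self.current_window = None  # index of the window being counted
        self.count = 0

    def allow(self, now: float) -> bool:
        window = int(now // self.window_seconds)
        if window != self.current_window:
            # A new window started: reset the counter.
            self.current_window = window
            self.count = 0
        if self.count >= self.limit:
            return False  # budget spent; the caller should refuse the request
        self.count += 1
        return True


budget = RequestBudget(limit=3, window_seconds=60)
decisions = [budget.allow(t) for t in (0, 1, 2, 3, 60)]
print(decisions)  # [True, True, True, False, True]
```

The trade-off is the one described above: once the budget is spent, legitimate visitors are refused too until the next window starts.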
This is the same problem GitLab suffers from. I don’t know if it’s necessarily a bad thing: perhaps their success is because of the focus on breadth of features rather than the quality of each individual feature.
I don’t have an answer to that but it’s worth considering if the cost of using the value created by CloudFlare _is_ the lack of polish when features are launched.
GitLab is a good example of this phenomenon because it can be compared to GitHub: GitHub delivers high quality slowly whereas GitLab delivers lower quality faster... maybe CloudFlare is the GitLab to Akamai’s GitHub, both with their valuable place in the market.
It's a hard problem to solve: getting to some useful amount of "done" where it solves actual problems for customers is a long way from delivering a truly polished, quality product.
There are diminishing returns from 80% to 100% done, plus there's the human condition: it's a lot more fun to solve the hard problems than do the polishing at the end. I think that teams can get too enamored of the features they create; it's hard to get perspective when you've been staring at something for a while.
I've often wondered if the answer is to have multiple teams (pods) so that one team can get a feature to MVP, then hand it over. You're then not invested in "your" product, there's space to be objective and revisit the path from MVP onward.
What I'm looking for is a full stack trace of a request, and how it progressed through their platform, through pops, Argo, Caches, etc. Akamai and others have tools like this, and it makes it much easier to debug gremlins on the inside.
From the docs:
"If your Workers application is returning an 1101 error code, your code is throwing an exception. You can catch the exception in your code using a try/catch block:"
That's great, but you end up just returning those details in a header or sending them somewhere else. We have to catch errors and send them to our backend for logging so we can debug them at a later point.
This doesn't help you debug things like sub-requests from the Worker or how they interact with other Cloudflare features like Cache.
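For anyone who hasn't seen the pattern being criticized here, this is roughly what "catch the exception and return the details in a header" amounts to. It's a Python stand-in rather than actual Worker JavaScript; `with_debug_header`, the dict-shaped response, and the `X-Debug` header name are all made up for illustration.

```python
import json


def with_debug_header(handler):
    """Wrap a request handler so exceptions become a 500 with debug info."""

    def wrapped(request):
        try:
            return handler(request)
        except Exception as exc:
            # Serialize what we know about the failure into a header.
            debug = {"error": type(exc).__name__, "message": str(exc)}
            return {
                "status": 500,
                "headers": {"X-Debug": json.dumps(debug)},  # hypothetical header
                "body": "internal error",
            }

    return wrapped


@with_debug_header
def flaky_handler(request):
    raise ValueError("boom")


response = flaky_handler({"path": "/"})
print(response["headers"]["X-Debug"])
```

Note that this only captures exceptions thrown by your own code, which is exactly the limitation: it says nothing about sub-requests or what the cache did.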
We're working on this tracing tool as we move things like Page Rules and Rate Limiting into the same engine that runs Firewall Rules. Lots of advantages of using the same Rust-based engine/syntax to let customers express their configuration rules, e.g., adding rule-based alerting and combining different types of filters and actions.
Focus so far in the design process has been how requests get evaluated against a list of Custom and Managed Rules but if you imagine a world in which everything is a rule (incl. routing to/firing a Worker) it becomes tractable to do what you're asking about.
We'll be in discussions with the Workers team about how to best illustrate the handoff to Workers and what happens downstream: cache, origin fetch, etc.
Kenton, not exactly topic-related, but I am deeply grateful for what you did in Sandstorm. Cloudflare or not, you're a great guy! And I am sure you still have a bunch of magic tricks in your pocket.
It's like buying a car. Sure once in a while you're stuck dealing with the dealership waiting on a part (and criticism is warranted) but that's hardly a reason all on its own to stick to walking everywhere.
Let's be real, if they're using it, it's because it's cheaper than doing it themselves. If you were to offer them millions of dollars, I'm sure they'd use a custom system.
Nice to see more and more pushed to the edge, and good that it's only the beginning of the week.
I'm a little surprised they stuck with the isolates model when moving to include generalized compute. We found isolates to be ideal for a cache layer, but it doesn't have first class support for many languages: https://community.cloudflare.com/t/native-golang-support-for...
Very interested to see if they expand on their storage options throughout the week. Additional consistency guarantees in their KV offering would make it more competitive with Dynamo. I'd love to see a managed relational offering like Cockroach, or perhaps the addition of CockroachCloud to their bandwidth alliance.
One point I'm disappointed about is that they list API Gateway and DNS Queries (per MM requests) at $0.
This may be true if you're deploying on a single domain with Cloudflare, but if you're a SaaS and need SSL for custom domains, Cloudflare bumps you into their enterprise offering that costs a minimum of thousands per month, and (we found) netted out to about $5-10 per month per hostname:
https://www.cloudflare.com/ssl-for-saas-providers/
(I recognize this is a different way of metering, but listing both at $0 feels like they're arguing DNS & TLS termination cost nothing at Cloudflare, and that was far from our experience)
The private beta sign-up form is here for those interested in Workers Unbound: https://www.cloudflare.com/workers-unbound-beta/ (disable adblockers to see the form). One key thing that sticks out is that Cloudflare would charge for outgoing bandwidth at $0.09 per GB, a stark departure from the unlimited bandwidth stance they're known for, especially wrt Workers Bundled [0].
A few questions, if anyone from Cloudflare is here:
0. Can we expect in Workers Unbound support for Websockets, WebRTC, HTTP Connect, and Server Sent Events?
1. Would there be support for protocols other than HTTP? Say, a raw TCP / UDP connection that is handed off by Spectrum to a Worker [1] (Spectrum here is redundant?)?
2. Inability to rate limit Workers in a cost effective way gives me sleepless nights [2]. The current rate limiting plans are too expensive when compared to Workers, almost 10x if expected good requests are to the tune of 10 million [3] (in my case good requests have potential to be much much higher). Any announcements due in this space?
3. Will the same isolate (Unbounded Worker), if kept running for long, be able to handle multiple connections/requests concurrently, or would each HTTP connection create a new isolate (that is, a new Unbounded Worker instance)?
> Can we expect in Workers Unbound support for Websockets, WebRTC, HTTP Connect, and Server Sent Events?
Definitely yes on WebSocket. In fact, the only reason we haven't rolled out WebSocket support already is because it's not very useful without long-running CPU. Workers Unbound fixes that so I'll be dusting off my old WebSocket implementation PR soon.
I think Server Sent Events are just streaming HTTP? That should work today (but with the same caveat that CPU limits may cause trouble).
WebRTC is a peer-to-peer (browser-to-browser) protocol. We don't currently have plans to implement it directly in Workers, though that's an interesting idea.
By "HTTP connect", I assume you mean the CONNECT HTTP method, usually used to tunnel TLS connections through forward proxies? That's a request I haven't heard before, but it's interesting. I guess what you'd get, basically, is the ability to establish TCP-like connections that are addressed to HTTP URLs and connect over port 80/443? Is this commonly used for things other than forward proxies?
> Would there be support for protocols other than HTTP? Say, a raw TCP / UDP connection that is handed off by Spectrum to a Worker [1]?
Not part of the current plan but definitely something we'd like to do eventually.
> Inability to rate limit Workers in a cost effective way gives me sleepless nights
Yeah, our rate limiting product is not really intended to be used to rate-limit the whole site to control costs. It is more intended to, say, rate-limit your login form to prevent brute-force password guessing.
That said, we do have layer 7 DDoS protections, and those run in front of your worker. If an attack gets past them and hits your worker, I'd encourage you to raise a support ticket asking for compensation for the attack traffic that we failed to block. I am not personally in a position to promise that we'd refund you but I believe it has happened in the past.
> Will the same isolate (Unbounded Worker), if kept running for long, be able to handle multiple connections/requests concurrently,
JavaScript is inherently single-threaded, so an isolate can only be executing code on behalf of one request at a time. That said, it is already the case that multiple concurrent requests may be handled by the same isolate (one request may be executing while another is e.g. waiting for a response from a remote server). But today you can't really take advantage of that, because you have no control over exactly which isolate receives any particular request, nor any way to communicate between neighboring isolates. This is definitely something we're working on but nothing to announce at this time.
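As an analogy for the single-threaded concurrency described above, here is a small Python asyncio sketch (not Workers code; the `handle` function and the delays are invented). One thread runs both "requests", and B gets to run while A is waiting on simulated I/O.

```python
import asyncio


async def handle(name: str, io_delay: float, log: list) -> None:
    log.append(f"{name}: start")
    # Stands in for waiting on a response from a remote server.
    await asyncio.sleep(io_delay)
    log.append(f"{name}: done")


async def main() -> list:
    log = []
    # Both requests share one event loop, i.e. one thread.
    await asyncio.gather(handle("A", 0.05, log), handle("B", 0.01, log))
    return log


events = asyncio.run(main())
print(events)  # ['A: start', 'B: start', 'B: done', 'A: done']
```

Only one handler executes at any instant, but the two requests overlap in time because A yields while it waits.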
Thanks a lot Kenton. Appreciate the reply and excited to see what the Workers team build next.
> WebRTC is a peer-to-peer (browser-to-browser) protocol. We don't currently have plans to implement it directly in Workers, though that's an interesting idea.
The reason I asked is, since overlay distributed networks are being built on top of WebRTC (like u/feross' https://WebTorrent.io), Workers Unbound, if it supported WebRTC, could potentially then be a participant in such distributed networks. Maybe I could seed torrents through Workers Unbound or build a P2P CDN on top of it which is at least as fast as usual CDNs when there are not very many connectable peers nearby. Or, have Workers Unbound participate as a host in a P2P database / filesystem like GunDB [0], for example.
> Is this commonly used for things other than forward proxies?
Yep, pretty much the use-case I was going for with Workers Unbound, tunneling traffic through HTTP to bypass firewalls and navigate NATs. A TURN over HTTPS of sorts, like Tailscale's DERP [1] but with servers worldwide. Will this be supported out-of-the-box?
At present we don't have any specific plans around WebRTC or HTTP CONNECT, but I'll keep those in mind. It might be easy to support CONNECT if/when we support raw TCP; it'd just be another entry point to the same subsystem. WebRTC would take a lot more effort.
If you can manage to tunnel over WebSocket, though, that'll be supported a lot sooner.
> JavaScript is inherently single-threaded, so an isolate can only be executing code on behalf of one request at a time. That said, it is already the case that multiple concurrent requests may be handled by the same isolate (one request may be executing while another is e.g. waiting for a response from a remote server). But today you can't really take advantage of that, because you have no control over exactly which isolate receives any particular request, nor any way to communicate between neighboring isolates. This is definitely something we're working on but nothing to announce at this time.
Is this generally seen as ok? I'm planning on building similar things for sending page metrics in parallel to serving traffic.
> > Inability to rate limit Workers in a cost effective way gives me sleepless nights
Agreed, there is something very challenging with the current design of how requests flow through cloudflare.
I have played around with dynamic rules in firewall by manually handling "firewall rules" in a worker as part of a request flow and using KV to share this state. I would then update the Cloudflare firewall via the API. I had to create a separate service to poll both the API and KV to maintain them, though.
Can you drop me an email with some more details about what you're trying to do with Rate Limiting? I want to understand your use case/challenge a bit better: pat at cloudflare dot com.
We use a variation of this technique, and it works.
> I have played around with dynamic rules in firewall by manually handling "firewall rules" in a worker as part of a request flow and using KV to share this state.
You and I have pretty much the same problem then. Beware though of race conditions.
Also, a still cheaper way to keep a counter would be to use a Count-Min Sketch and related probabilistic data structures, but as before, data races need to be handled.
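For readers who haven't met it, a Count-Min Sketch keeps approximate per-key counts in fixed memory, at the cost of sometimes overestimating. A minimal Python sketch follows; the class, the width/depth defaults, and the use of salted BLAKE2 hashes are my choices for illustration, and it does nothing about the race conditions mentioned above.

```python
import hashlib


class CountMinSketch:
    """Approximate counter: estimates never undercount, but may overcount."""

    def __init__(self, width: int = 1024, depth: int = 4) -> None:
        self.width = width
        self.depth = depth
        self.rows = [[0] * width for _ in range(depth)]

    def _cells(self, key: str):
        # One independent-ish hash per row, obtained by salting BLAKE2b.
        for row in range(self.depth):
            digest = hashlib.blake2b(
                key.encode(), digest_size=8, salt=row.to_bytes(8, "little")
            ).digest()
            yield row, int.from_bytes(digest, "little") % self.width

    def add(self, key: str, count: int = 1) -> None:
        for row, col in self._cells(key):
            self.rows[row][col] += count

    def estimate(self, key: str) -> int:
        # Collisions only ever inflate a cell, so the minimum is the best guess.
        return min(self.rows[row][col] for row, col in self._cells(key))


sketch = CountMinSketch()
for _ in range(5):
    sketch.add("203.0.113.7")
sketch.add("198.51.100.2")
print(sketch.estimate("203.0.113.7"))  # at least 5, never an undercount
```

Memory stays at width x depth integers no matter how many distinct clients show up, which is what makes it cheaper than one counter per key.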
> Is this generally seen as ok? I'm planning on building similar things for sending page metrics in parallel to serving traffic.
It's OK, but you won't see a ton of benefit unless you have a very large amount of traffic -- enough that you're commonly seeing single isolates handling concurrent requests.
We're working on something that should be a lot better for this.
> It's OK, but you won't see a ton of benefit unless you have a very large amount of traffic
Our traffic patterns are such that per active user we would add 100K requests per month when we go live. We send metrics out batched every 10s or so, but stumbled upon this post that warns of an internal limit on the total number of requests originating from Workers per second, set at 2000, account-wide [0].
Is there a way to discuss our use case and get whitelisted for limits as our traffic isn't malicious, hopefully without having to subscribe to the enterprise plan? I raised a support ticket but a bot replied that since we are "free customers" it hasn't been looked at.
> limit on total number of requests originating from Workers per second set at 2000, account-wide [0].
Oh, there is no such limit.
The post you link to is quoting another poster who is quoting something they heard from someone else. It sounds like something that may have originated with someone hearing about a particular implementation detail of our anti-abuse systems but the game of telephone has transformed it into something that's just not true at all.
As described in my comment on that thread, we do have certain anti-abuse heuristics which might result in a 1015 error. It's very unusual for people to hit these in legitimate use. Have you seen such an error? 100k requests per user per month doesn't sound like it would come anywhere near being an issue (even with 10M users).
FWIW the heuristics are designed such that simply adding more users (who behave the same as existing users) shouldn't ever cause a problem.
> 100k requests per user per month doesn't sound like it would come anywhere near being an issue (even with 10M users).
Nice. Thanks! That's reassuring.
> It sounds like something that may have originated with someone hearing about a particular implementation detail of our anti-abuse systems but the game of telephone has transformed it into something that's just not true at all.
A Cloudflare employee engaged in this thread [0] confirmed the existence of this internal limit: "...basically 2k subrequests _per minute_ per colo / zone / IP. Once you hit that you start serving 429s.", which another customer hit with ~5M requests per day (per a screenshot shared later in the thread) and had to get whitelisted for.
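Putting rough numbers on the figures quoted in this subthread (a back-of-the-envelope Python sketch; the 30-day month and the assumption of perfectly even traffic are mine):

```python
SECONDS_PER_MONTH = 30 * 24 * 60 * 60  # assumes a 30-day month
MINUTES_PER_DAY = 24 * 60

# 100K requests per active user per month, spread evenly.
seconds_between_requests = SECONDS_PER_MONTH / 100_000

# ~5M requests per day, spread evenly, expressed per minute.
requests_per_minute = 5_000_000 / MINUTES_PER_DAY

print(round(seconds_between_requests, 2))  # 25.92
print(round(requests_per_minute))          # 3472
```

So a single user works out to about one request every 26 seconds, while 5M per day averages about 3.5k per minute. That would exceed the quoted 2k per minute if it all went through a single colo / zone / IP; real traffic is spread out and bursty, so treat this as a rough comparison only.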
I've run into 3 big limitations when using Workers:
1) The CPU limit.
2) Not being able to use Node modules.
3) Not being able to add domains other than adding those to our CF account. SSL for SaaS solves this but, so far, it's only available to enterprise customers.
The CPU limit was not so bad when considering Workers were not intended as a general use-case for serverless functions. Now that this has been lifted I feel the Workers environment limitations are going to be much more important.
You can't, for example, use many Node packages in Workers since these are running in a custom V8 environment which resembles browser service workers. I say "resembles" because the API is not 100% identical. For example, you can use streams, but you don't get the full API.
A different runtime from Node.js is more a feature than an issue. The Workers runtime is much closer to Deno than to Node.js from a philosophy standpoint, and I expect a lot of synergies between CF Workers and Deno in the future. CF Workers being as compatible as possible with the Service Worker API makes them the only FaaS solution close to a web standard, so vendor lock-in will only be an issue until compatible offerings appear. A killer argument for this approach is also that you can be sure you can run the CF Worker code in a browser's service worker too (with minimal modification). This allows more flexible architectures and also more unified development between backend and frontend.
Also regarding the domains: Cloudflare routes the requests to the edge centers based on DNS anycast, so Cloudflare Workers cannot really be separated from the DNS functionality of Cloudflare; see it as a FaaS offering inside your worldwide distributed DNS. If that is not what you want, it is still possible to use CF Workers with worker domains and use your own domain and DNS setup.
Ahhh, sorry, I did not know "SSL for SaaS" was a product name for a specific feature. I totally agree, this would have been really nice to be able to offer my clients, but it is currently not really accessible. There are a few things like this, such as custom error pages etc. Let's hope more enterprise offerings will be more accessible for smaller projects in the future...
True, Cloudflare communicates that extremely well in the talks they give about it and in online presentations, but the documentation is often not as easy to read.
This is such a major dealbreaker that it needs to be called out earlier.
Adhering to an interface to use their service (incoming event, outgoing response) is one thing; that type of coupling is fair and expected. But the underlying runtime is custom, without full support for standard APIs?! What's the upgrade path? Can they port things over in a reasonable amount of time? Is it well documented what's supported and what isn't? Imagine developing against this and resolving these issues as they come up. Oof.
One alternative that's available right now is Fly.io — Fly actually runs Docker containers at the edge, so you get a lot of the benefits including autoscaling up and down, your application following the sun across the world, etc.
I'm using them for a couple of small production projects and wrote up a gRPC example, quite happy with the service so far.
"At the edge" is going to mean different things on different clouds. Cloudflare has hundreds of points of presence. What's more edgy about fly.io that you don't get from Cloudflare, App Engine, or AWS Lambda?
We're more like Fargate than Lambda, but Fly apps scale and balance across regions. I wouldn't really call it "more edgy", it's just built specifically for running app servers close to users (vs emulating a traditional single region datacenter).
We actually started with a JS runtime like CloudFlare has, but it was wrong for our customers. They're better off just running Docker images.
Sorry if this is somewhat unrelated to the main thread, but considering that someone working on Fly.io is here, I wanted to ask some questions that I couldn't find the answers to in your documentation.
1. What's the average cold start time when spinning up a new instance? For comparison, AWS Fargate often has cold starts of around 1 minute.
2. Is there a limit on the container size created by the Dockerfile?
3. Is there a limit on the number of concurrent instances a single user can run?
1. Cold start times are wildly variable. They can be <1s for optimized, fast booting docker images. We actually mask cold start time as much as possible, though, so you'll almost never see one.
2. There's no hard limit. We've had 8GB images running (that I know of). Cold starts can be pretty bad on those just because pulling down an 8GB filesystem is slow.
3. Also no hard limit. I'm curious what you're doing though! Feel free to email me: kurt at fly.io
I'm using Go, so the default no-config container is about 140MB. But it's possible to use a multi-stage build to bring that down to about 10MB (build it in a full container and copy the result into a very lightweight scratch container). That's in the range of Lambda or CloudFlare worker sizes, so startup speeds shouldn't be an issue.
Yeah, Fly isn't "edge" as defined in the US, but sort of is as defined in India.
CloudFlare, CloudFront etc tend to have one or two edge PoPs in India, and one in Singapore, and Fly has one (upcoming) in India, which is coincidentally in the city where I live. So for me, Fly and CloudFlare and CloudFront have servers a few blocks away.
Same case for Singapore — everyone has an edge there.
But of course for the rest of the world it depends. The full-blown CDNs will try to put edges in every city, but Fly and AWS Outposts will likely have only a few per continent / one per timezone / country.
Yeah, I actually moved a project from Workers to Fly. It's more expensive to run but it's still very affordable and you solve all the limitations workers have.
We looked into this and spoke to their sales team about it. We could bring our own certs for a single domain, but Cloudflare wouldn't accept traffic from our customer domains unless we purchased SSL for SaaS.
Bandwidth is one of the things that gets significantly cheaper at scale. For most users, if they call a wholesale bandwidth provider the rates they get will be more than what a public cloud provider will offer. So it looks like a good deal. That's much more true than with something like hardware/compute, where the difference between what you can buy a server for and what a big cloud provider can buy one for is less extreme. That's not to say Amazon or Google can't get better deals on servers than someone who just phones up Dell, but it's nothing like the savings you can get at scale on bandwidth. What's been interesting is how little competition there has been between the different public clouds on bandwidth pricing. That's starting to change and I'm proud of how we've pushed that at Cloudflare with things like the Bandwidth Alliance (https://www.cloudflare.com/bandwidth-alliance/). Oracle's cloud is also pushing on this (see Zoom's comments on why they chose the Oracle cloud). If you saw the margins that AWS gets on bandwidth you'd throw up in your mouth.
I find this comment rather confusing under an announcement for Workers Unbound, which charges $0.09 per GB. That's AWS/GCP/Azure-level pricing, and definitely does not push competition on that front.
> If you saw the margins that AWS gets on bandwidth you'd throw up in your mouth.
Presumably the thinking behind this approach is that it allows them to advertise cheap instance costs and storage costs, while still making bank, assuming the customer's traffic volumes are non-negotiable?
Also has the effect of lowering prices for casual dabblers, who might want to play about with what's on offer without moving much traffic.
If you're selling hosting, you can either sell CPU at close to cost and make money on bandwidth, or make your margins on CPU. If people are building things that don't stress the CPUs at all, you can load up servers and make ~$60-70/mo margins on CPUs.
When they expect dedicated CPUs and memory, they're suddenly going to price shop across cloud providers and won't pay enough to make those same margins.
>>> Whether you think that regulations that appear to require local data storage and processing are a good idea or not — and I personally think they are bad policies that will stifle innovation — my sense is the momentum behind them is significant enough that they are, at this point, likely inevitable
Agree it's stealth protectionism and security overreach. But do data sovereignty laws really form the basis of an inevitable future federated internet? Trade agreements like CPTPP and USMCA seem to prohibit them, or restrict them to sensitive finance or health data. And most users of consumer internet apps like TikTok aren't too concerned. They just want fresh memes ;)
Just signed up to the Cloudflare Workers Unbound private beta (feel free to fast track). And thanks for all your hard work. $NET has been a standout performer!
The Internet was always supposed to be federated/decentralized. The idea of megacorps centralizing all of your data isn't the Internet. And given that we live in the real world and the Internet's servers exist in countries subject to various laws, there was no way a global Internet was ever going to be without any sort of borders.
The idea that the Internet is universally global and where your data is doesn't matter is idealistic fiction, not reality.
Your user will almost always benefit from the data being stored in the same country as them. Using an app across an ocean tends to suck. And the user gets to benefit from their country's consumer protection laws.
Which is to say, the only parties that benefit from being able to store your data in other countries are often global monopolies shifting your data to where they have the least legal oversight.
If you can legislate data location then it's better to just legislate access and privacy rights to that data instead of the location.
Forcing data to be stored within borders just makes it harder, more expensive, and more inefficient to serve more customers for no real security benefit. And a company with illegitimate operations isn't going to be following these guidelines anyway.
What do you think about 5G vendors being able to also provide edge compute, which is more "edgy" than any CDN right now? (They are literally going to be on every single block or street.)
While 5G receivers will be on every block, nobody is going to be deploying custom code on the receivers. It's about a 10 ms round trip from there to the metro hub, where custom code will be running. (So roughly equivalent to CDN for wired/wifi connections).
> nobody is going to be deploying custom code on the receivers
Why do you say so? There's nothing technically stopping a telco from adding extra nodes there, using 10% for themselves and renting out the remaining 90% of the resources.
Technically, maybe, but telcos are not interested in that business. They know nothing about hosting user code, have no way to support a gazillion instances of random customers running random code on a gazillion nodes around the world in heat, rain, ice, whatever. How would you stand a chance doing customer support on that?
There's also security; they can't just store/process customer certificates on minimally secured boxes on street corners. Just about any useful code requires some kind of cert or secret or connection string somewhere. Cloud providers require all sorts of physical security restrictions, guards, cameras, sharks with lasers and such in order to host these things.
The way it's working now is telcos are outsourcing it to the cloud providers, and cloud providers are sticking racks in the carriers' metro hubs (with shark/laser requirements). These racks show up like mini regions in cloud providers' portals, so users can deploy VMs there just like any other region. Much lower barrier to entry that way.
Maybe someday you'll get sub-millisecond latency on custom apps from some box hanging on your street, but that's not coming anytime soon AFAIK.
> They know nothing about hosting user code, have no way to support a gazillion instances of random customers running random code on a gazillion nodes around the world in heat, rain, ice, whatever. How would you stand a chance doing customer support on that?
They are already doing that for their own code, and there's no reason any CRUD app would behave much differently than telco code. In fact, telco code has a much higher SLO than regular web services.
> There's also security; they can't just store/process customer certificates on minimally secured boxes on street corners. Just about any useful code requires some kind of cert or secret or connection string somewhere.
That's an already solved problem: use SDS for mutual TLS.
> The way it's working now is telcos are outsourcing it to the cloud providers, and cloud providers are sticking racks in the carriers' metro hubs (with shark/laser requirements). These racks show up like mini regions in cloud providers' portals, so users can deploy VMs there just like any other region.
Do you have any source for this? Which cloud is providing such services and with which telco?
I work in the field, but the info is all out there. Yeah AWS has partnerships with Verizon and a couple of internationals. Microsoft has AT&T. I don't think the deals are exclusive, just first attempts to get something going.
Interestingly when I took my job I also thought we were going to be deploying to every intersection, and was a bit disappointed when I found that the reality was much less interesting.
And just so you know, no, the security angle is huge. Something like SDS may be fine for some purposes, but if there is anything valuable resident in memory or running over PCIe or whatever else, side channel attacks are real and people will steal them. Look up some hacks people do just to cheat at Xbox, it's insane. If you can't prove you're secure against that, you'll never get big money customers on your platform. Currently that still means sharks with laser beams.
What is your use case? In my experience it's hard to use them properly. I failed to use modern GPUs (V100s) effectively for even large computing tasks. I found a good compiler producing AVX2 code on many cores to be easier and cheaper.
We're using GPUs already on AWS for inference. The big problem is we have a worker pool of GPU servers and a task queue, leading to some latency if we don't have enough servers running and by the time the EC2 auto scaling group adds one the queue could be drained. Also we are seeing extra latency from requests originating from Asia since we currently have our servers located in a US region.
So basically all the classic benefits of serverless and edge computing would apply, the difference being we would need some more drivers, python libraries and a GPU attached.
I see CPU mentioned a bit, but nothing about memory. It seems like 128 MB isn't enough to run Nuxt.js (Vue server-side rendering), but it would be awesome to be able to run that (or Next.js for React) at the edge. I haven't actually tried yet on Workers, but they seem to recommend 512 MB memory for App Engine for example.
Are Cloudflare's machine sizes unsuitable for being able to offer more memory as an option?
Any rough guesses on when that could happen? Weeks, months, or next year? Depending on the plans, I'd consider waiting for a Workers-based solution for an upcoming project instead of working on a Cloud Run based deployment.
Will you sign up for the beta at https://www.cloudflare.com/workers-unbound-beta/ linking to this post? My team or I will follow up directly to understand your use case and see if it's a good fit.
It’s a configuration change that I believe we can support on a per-user basis today. I've encouraged the Workers team to follow up here and see if we can get you set up on the beta.
This is a genuine question: why run a database-bound application at the edge? Why not just cache HTML on the edge when you can and run the application traditionally?
The data transfer pricing is a big letdown considering Cloudflare has always had free transfer.
When transfer is so expensive, it makes it harder to switch from other clouds that have more than just compute capacity, especially since you might be paying double to transfer between cloud->CF->user.
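To put rough numbers on the double-transfer point, here is a back-of-the-envelope sketch. The $0.09/GB figure for Workers Unbound is the one quoted earlier in the thread; the origin cloud's egress rate is an assumption, set to the same value, and `transferCostCents` is a hypothetical helper, not anything Cloudflare provides. Amounts are kept in whole cents so the arithmetic stays in integers.

```typescript
// Cost of moving `gb` gigabytes across a chain of hops, each billed per GB.
// Rates are in cents per GB so the arithmetic stays in integers.
function transferCostCents(gb: number, ratesCentsPerGb: number[]): number {
  return ratesCentsPerGb.reduce((total, rate) => total + gb * rate, 0);
}

const ORIGIN_EGRESS_CENTS = 9; // assumption: origin cloud charges $0.09/GB
const WORKERS_EGRESS_CENTS = 9; // $0.09/GB, as quoted earlier in the thread

// 1 TB (1000 GB) served cloud -> CF -> user versus cloud -> user directly.
const viaWorkers = transferCostCents(1000, [ORIGIN_EGRESS_CENTS, WORKERS_EGRESS_CENTS]);
const direct = transferCostCents(1000, [ORIGIN_EGRESS_CENTS]);
console.log(`via Workers: $${viaWorkers / 100}, direct: $${direct / 100}`);
```

Under those assumed rates, 1 TB works out to $180 across both hops versus $90 going direct, which is the "paying double" case.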
Cloudflare should focus on getting their existing services working. I tried their 1.1.1.1 Warp VPN on Android as a paying subscriber and it just broke my internet connection most of the time.
Hey rosywoozlechan, I am the Product Manager for this app. Can you file an in-app bug with this name so we can look into your issue? (Look for the little bug icon on the main screen to file an issue.)
Does anybody know whether this will remove the current maximum runtime limitation? As far as I can tell, workers have a maximum duration of 10ms on the free plan and 50ms on the paid one.
I would assume that it does, considering that it's being compared to AWS Lambda, which has a timeout of 15 minutes.
LuaJIT support, please. Other than that some tighter consistency guarantees around KV would be nice. I’m PoCing a small game server on workers and the KV is working great for this so far (turn by turn, but still).
Does this also remove the 30 scripts per account limit? I looked into using Cloudflare workers to power a cloud runtime for WASM (think thousands of dynamically generated WASM binaries serving traffic for different sub-domains), but the script limit and CPU limits were a major blocker. I've had great experiences with Cloudflare and would love to use workers if it opens up these limits.
The 30-script limit exists because, currently, we copy every script to every server on our edge upfront, hence the storage is rather expensive and we don't want people storing a bunch of scripts they aren't actually using. We'll likely improve this in the future by allowing you to have more scripts as long as you're OK with some of them being fetched from a central location on-demand (which might make them a little slower to start up). I don't have a timeline for when we'll implement this, but it is a common request.
At one point, Cloudflare suggested they might open source some of their v8 work; does anyone know if that still might happen? (Even just the bindings for WebCrypto would be interesting.)
I'd love to open source our V8 API binding layer, which I think is a lot easier to use than the other solutions out there. But, to be honest, the main blocker has been finding time to work on it -- there's just way too much going on.
Our WebCrypto implementation is just a thin wrapper around BoringSSL, FWIW.
On the topic of open source, I'd say about 10% of the coding we do for the Workers Runtime actually lands in Cap'n Proto / KJ, which has been open source all along: https://github.com/capnproto/capnproto
I learned about Cloudflare Workers early this year and they definitely seem simpler than the AWS Lambda + Gateway combo to deploy and roll out.
Unless one is invested in AWS for other needs or needs one of the languages that Workers doesn’t support (Java? Go?), it might be a fair option to consider.
Part of their defense against CPU side channel attacks was the narrow execution limits; did they decide those are unnecessary? (That was also how they prevented V8 from allocating tons of memory.)
> Part of their defense against cpu side channel attacks was the narrow execution limits;
Some people have claimed narrow execution limits provide a defense, but we on the Workers team don't believe that's true and I don't think we've made that claim. Specifically, it's easy for an attacker to store state between requests and so continue an attack across many requests. In our current implementation, you can store that state in global variables, but even if we wiped the worker's state after every request, it would still be easy for an attacker to store their state remotely.
We'll be posting more about Spectre later this week.
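A minimal sketch of the global-variable point, for anyone who hasn't run into it: anything declared at module scope lives as long as the isolate that loaded the script, so it carries over from one request to the next. `handleRequest` here is a hypothetical stand-in that takes a path string; a real Worker handler receives a Request and returns a Response.

```typescript
// Module-level state: created once when the script is loaded, then shared
// by every request that the same isolate handles afterwards.
let requestsSeen = 0;
const pathsSeen: string[] = [];

// Hypothetical handler, simplified to strings for illustration.
function handleRequest(path: string): string {
  requestsSeen += 1;
  const earlier = pathsSeen.join(",");
  pathsSeen.push(path);
  return `request #${requestsSeen}, earlier paths: [${earlier}]`;
}

// Two separate "requests" hitting the same isolate.
console.log(handleRequest("/a"));
console.log(handleRequest("/b"));
```

The second call can see what the first one left behind, which is why a short per-request limit alone does not stop work from being spread across many requests.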
> That was also how they prevented v8 from allocating tons of memory.
No, we explicitly limit V8 memory usage independent of CPU time.
> No, we explicitly limit V8 memory usage independent of CPU time.
Ok! I had just gotten that from one of your talks on Workers: you had said V8 doesn't really offer a useful hook to limit memory usage, as it aborted the process rather than just stopping the isolate, and that the CPU resource limit was thereby an important part of the memory limitation (as, with the exception of typed arrays, using memory in JavaScript requires code to actively set memory). If you did change this, it makes me all the more interested in the changes you are making in V8 ;P (though it is also possible these changes were actually just part of upstream and I failed to notice).
They should work on reliability before new niche features. I work with another developer on his clients' sites and he’s a Cloudflare fan. Almost every complaint and failure we’ve had reported has come from Cloudflare errors. Not issues with our stuff, issues with Cloudflare. I use CloudFront with a different set of clients with semi-high visibility sites and have never had an issue stemming from it.
> Isolates are far more lightweight than containers, a central tenet of most other serverless providers’ architecture. Containers effectively run a virtual machine, and there’s a lot of overhead associated with them.
Kill me, now. Containers are in no respect virtual machines. Cloudflare Workers, by contrast, literally are virtual machines: your code runs on V8.
An isolate is a v8 "virtual machine" running in-process with many other such machines, virtualization is provided by a userspace program
A process is a "virtual machine" running on-host with many other such machines, virtualization is provided by the kernel
A host is a "virtual machine" running on metal with many other such machines, virtualization is provided by the CPU
Metal is a "virtual machine" running on microcode with up to 3 sets of sibling architectural state (did any vendor ever implement 8-way SMT?), virtualization is provided by the silicon
It's turtles all the way down
Even better if the isolate is itself running an emulator, in which case it's also turtles all the way up
Yes, 8-way SMT is available in scale-up POWER 9. But it just merges pairs of 4-way SMT cores instead of adding more threads to each core. The overall number of threads is not affected.
- These days, it has become common to use the term "Virtual Machine" specifically to refer to hardware virtualization. The old meaning of the term, referring to language runtimes, is (sadly) now so uncommon as to be outright confusing to many people.
- Most other Serverless environments technically do wrap their containers in virtual machines (like Firecracker), since plain containers aren't considered sufficiently secure on their own.
Isn't this statement incorrect?
"Isolates are far more lightweight than containers ... Containers effectively run a virtual machine, and there’s a lot of overhead associated with them"
Everything I have read suggests containers (being based on cgroups) add virtually no overhead at all (in the vicinity of 1% at worst, from memory).
Containers are very lightweight compared to a VM, but each FaaS container is required to run its own instance of V8. This implies some startup time (tens of ms?) and some memory use (tens of MB?).
My understanding of isolates is they are more or less a container, but at the v8 level instead of OS. This makes spinning up a function more akin to opening a new tab vs opening a new browser instance.
It looks like Cloudflare Workers and AWS Lambda are just re-implementations of Google App Engine. It is sad that Google half gave up on App Engine, by switching to a bad pricing scheme and not putting enough energy into developing GAE and further innovating in this area.
Has the status changed on time synchronization on endpoints? Using workers to deliver time over HTTP would work really well if the servers they're hosted on are guaranteed to be synchronized through standard NTP.
Workers don't expose accurate time afaik to mitigate Meltdown/Spectre [0], but wouldn't their free-to-use NTP server work for your use case? https://www.cloudflare.com/time/
Workers can read the clock, but the clock doesn't advance while the worker is executing, which prevents it from using the clock to measure its own execution time. For a service that just returns the current time, that shouldn't make a difference.
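To illustrate with a simulated clock rather than the real runtime: `frozenClock` below is a stand-in for the behaviour described (the value does not change during one execution), and `measure` and `timeResponseBody` are hypothetical helpers made up for this sketch.

```typescript
type Clock = () => number;

// Stand-in for a clock whose value is fixed for the duration of one execution.
function frozenClock(atMs: number): Clock {
  return () => atMs;
}

// Tries to time a piece of work by reading the clock before and after it.
function measure(work: () => void, now: Clock): number {
  const start = now();
  work();
  return now() - start;
}

// Body of a "what time is it" response: a single clock read is all it needs.
function timeResponseBody(now: Clock): string {
  const ms = now();
  return JSON.stringify({ unixMs: ms, iso: new Date(ms).toISOString() });
}

const clock = frozenClock(1600000000000);
let sum = 0;
const elapsed = measure(() => {
  for (let i = 0; i < 1000000; i++) sum += i;
}, clock);
console.log(`sum=${sum} elapsed=${elapsed}ms body=${timeResponseBody(clock)}`);
```

With a clock like this, the self-measurement always comes out as zero however long the loop runs, while the time response is unaffected because it only reads the clock once.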
Using an NTP server voids the point of using workers over HTTP to browsers. Workers are at the endpoint in the world closest to the user, which is less latency, so you can guarantee time better.
No, the blog post talks about using V8 isolates (V8 is a JavaScript engine). What CF does is support Rust via WebAssembly, and I think Go WASM might be supported. But they're pretty clear that they're not doing a process model; you'll have to submit code that runs inside V8.
The premise of how this service works so well is due to a feature JS has which Go doesn't, and which Go has no interest in adding, so you will likely be waiting forever.
The linked TechCrunch article is pretty light on actual content. The actual Cloudflare blog post describes what this product is: "We are extending our CPU limits to allow customers to bring all of their workloads onto Workers, no matter how intensive."
I know that Cloudflare people comment here. So can I go on a tangent and ask what’s holding up registering new domains through Cloudflare (posting on the Cloudflare blogs or community forums doesn’t get a response)?
Cloudflare registrar was announced (and released) nearly two years ago, allowing people to transfer their domains to Cloudflare. It was declared then that registering new domains would be coming soon. But here we are nearly two years later and there’s no sign that this is coming this year either. Meanwhile, the company even went public through an IPO. What’s the holdup and why the radio silence? I’d actually prefer a blog post about this on the official site than a “what should we tell and what shouldn’t we tell” response here.
We’re thinking about this. What we don’t want are domain speculators buying domains cheap and then doing nothing with them. That’s all cost and hassle for us. We want to let you register domains that you’re serious about doing something with. We’re running a bunch of experiments to see if we can allow new registrations of domains people are serious about without opening the domain speculation floodgates.
I, on the other hand, want to transfer more domains I've already done something serious with to Cloudflare Registrar, but can't because the TLDs (.club, .dev, .pw, and .sh in my case) aren't supported.
Cloudflare is committed to supporting all available TLDs, with a focus on expanding country-code TLDs, and are working to expand this list. Check back soon for updates.
bet Afilias
black Afilias
blue Afilias
green Afilias
io NIC.io
kim Afilias
lgbt Afilias
mobi Dotmobi (Afilias subsidiary)
pet Afilias
pink Afilias
pro Afilias
promo Get (Afilias subsidiary)
red Afilias
so really just one new TLD (.io) from a new registry, plus an Afilias (best known for .info) bundle. Still just a tiny number of registries, and almost no addition of ccTLDs despite the stated focus.
I wonder if you would be kind enough to shed some light on this situation?
GP here. This is a late reply to your reply, and am not sure if you'd see this. I get that you have an issue with domain speculators and squatters, but what I don't understand is how that's being prevented now. They could still buy domains on another site and transfer them over to take advantage of your lower prices and squat forever. So if you're not taking any precautions during domain transfer to prevent those people from transferring over (and becoming all cost and hassle for you), does it matter for new registrations? On the other hand, if you're taking some precautions to avoid such squatters in the domain transfer process, why not do the same for new registrations as well?
I feel as if I'm totally oblivious to what differentiates offering domain transfers from new domain purchases, considering that millions (or more) of domains are already registered and squatted on now. A 60-day delay in transferring new domains isn't going to deter a squatter from buying new domains elsewhere and transferring them over to Cloudflare.
Could you please clarify if you see this comment? Thanks.
I'm sorry but I don't buy it. Domain speculators who buy thousands and thousands of domains already get very cheap bulk pricing. Thinking that you would contribute to that problem sounds ridiculous. Also, there are plenty of simple ways to prevent this if you wanted to, like limiting the number of domains per account, credit card, etc. Put it in your TOS and kick out people who do it anyway. Come on...
You may not buy it, but that's the rationale. I'm the one vetoing us turning it on so you're literally talking to the decision maker. We've thought about everything you've suggested and it all feels artificially constrained (e.g., what if you have 11 domains that you really care about?). We're running a bunch of different experiments for different options. We'll open up domain registration at some point, but only when we're confident it lives up to our original promise and still accomplishes what's right for our business. We haven't figured that out yet.
We could, but we promised we’d never charge more than the wholesale price. One experiment we’re playing with is bundling a highly discounted Cloudflare plan with new registrations. So you get registration at cost and paid Cloudflare services at a significant discount.
Dual pricing for new vs transfers is industry standard. I'm happy to pay a one-off $1 fee for new registrations (and additional years/renewals are cost price).
I don't even think you've misled anyone by doing this, since it's essentially a new product (new domain registrations).
I'm aware of who you are but I have no qualms saying that I can't take this as a valid reason. And the example of 11 domains... do you really think this would have any noticeable negative effect on the domain speculation problem? Like I said, they already get those bulk prices. And 11 domains is a tiny number. The constraints are artificial, yes, of course; what else could they be? There is no natural limitation here. I guess you meant arbitrary? It will always be arbitrary. Your suggestion to bundle it with a general Cloudflare plan is just as arbitrary. There are many, many ways you can try to lessen the problem. And I get it, bundling it with a budget plan makes sense, though I'm not sure one could then 100% claim that the domain registration is at cost only. Then it's the same as any web host that bundles "free" domains into the mix.
The point of living up to the original promise: the original promise was to offer domain registrations at low prices. You are afraid of not living up to it? Well, you don't, because you don't allow registrations. You still have a landing page that advertises the service. You have calls to action labeled "register now" and the page [1] talks about new registrations and renewals. Then after clicking on it you land on a domain transfer page that doesn't really tell you that that's the only thing that works right now. So by being afraid of not living up to the promise, you actually don't live up to it.
im not really registering domains that i necessarily /know/ how i plan to use them yet, but more as vanity names i like and may want to use in the future--basically i tend to register the domain and then use it for some side project sometime after, since that lets me frontload some of the DNS configuration in advance. im using Cloudflare for the DNS part already for sure, so it's convenient to have both that and the registrar in one place.
totally get that something that enables my silly personal domain purchases more smoothly could be abused by squatters and that their use patterns could create unwanted support/engineering load you wouldn't want to deal with though, i think that's reasonable. the one thing i'd wish for is maybe being able to transfer a domain i've recently purchased elsewhere more quickly--as-is i think there's a waiting period before you can move a domain onto the Cloudflare registrar
I'd think you could just charge a couple of bucks more than the competition (which is justified for users through nice integration into your other systems), but which would be off-putting to speculators, who often operate using bulk purchases. I'm sure you've thought about it a bit more than me though!
Wouldn't some basic rate limiting for new registrations do the trick? I rarely would register more than a domain or two a month for an actual use case. Maybe some logic to not ding you for also buying example.com, example.net, and example.org at the same time, but locking you to only a couple strings of "example" at a time?
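Roughly what I have in mind, as a sketch only: count distinct names (the part before the first dot) instead of individual domains, so the .com/.net/.org set counts once. `nameOf`, `allowRegistration` and the limit of two names are all hypothetical; nothing here reflects how Cloudflare actually handles registrations.

```typescript
// The "string" being registered: everything before the first dot.
function nameOf(domain: string): string {
  const lower = domain.toLowerCase();
  const dot = lower.indexOf(".");
  return dot === -1 ? lower : lower.slice(0, dot);
}

// Allow a new registration if its name is already among this account's
// recent registrations, or if the account is still under the name limit.
function allowRegistration(
  recentDomains: string[],
  requested: string,
  maxNames: number
): boolean {
  const names = new Set(recentDomains.map(nameOf));
  return names.has(nameOf(requested)) || names.size < maxNames;
}

const recent = ["example.com", "example.net", "sideproject.dev"];
console.log(allowRegistration(recent, "example.org", 2)); // same name as before
console.log(allowRegistration(recent, "another.io", 2)); // a third distinct name
```

The window for what counts as "recent" (a month, say) is left to the caller; the function only looks at whatever list it is handed.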
On the same note, domain billing on Cloudflare is broken. Transferring 10 domains to them at the same time (all in the same cart) created 10 separate transactions on my credit card, one after the other. This got my CC flagged and frozen and I had to call in and confirm I authorized the transactions.
I can only imagine what happens when transferring in more domains, or when they start to renew. All these individual charges will freeze any linked CC and the renewals for subsequent domains will fail. So there is a chance you might lose your domain if you experience this and don't check your account.
There are many people who have experienced the same and written about it on the Cloudflare community forum, some posts dating back years, and still nothing has been done about it.
Just letting HN know about the billing side of things when using Cloudflare as a domain registrar.
The linked article is worthless. See the blog post, as this poster mentioned:
https://news.ycombinator.com/item?id=23965764