Durable Objects in Production (linc.sh)
134 points by geelen on Nov 13, 2020 | 56 comments



> I've come away from this experience with a fairly firm belief that the "Stateful Worker" model is genuinely inspired—I can now start to see how they could model virtually every data problem we face, and replace almost every piece of infrastructure we currently use. That's potentially revolutionary, but only time and experience will tell whether it genuinely outperforms existing alternatives. But for our first foray, this was an unbridled success.


What a great write-up! Looks very promising. Right now I'm wondering how everything will be priced, especially the compute time, because if the price is low, you would be able to replace a lot of complicated servers with simpler, more scalable Workers.

According to [0], they will - obviously - charge for both compute time and storage operations, with storage operations expected to be priced around Workers KV levels. Assuming that compute time is charged at or above Workers Unbound rates, using Workers for chat rooms and other WebSocket use cases would be infeasible. Workers Unbound costs $12.50 per million GB-seconds; given that each worker gets 128 MB (the current fixed memory size), the price would be at least $0.0000016 per connected-worker-second. It could get expensive fast.
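
A rough back-of-envelope version of that estimate (assuming Unbound's published $12.50 per million GB-seconds and the fixed 128 MB memory size; this is not official pricing):

    // Back-of-envelope only, not official pricing.
    const ratePerGBSecond = 12.5 / 1_000_000;   // $12.50 per million GB-seconds
    const memoryGB = 128 / 1024;                // fixed 128 MB per worker
    const costPerWorkerSecond = ratePerGBSecond * memoryGB;
    // ≈ $0.0000016 per connected-worker-second

    const secondsPerMonth = 60 * 60 * 24 * 30;
    const costPerIdleSocketMonth = costPerWorkerSecond * secondsPerMonth;
    // ≈ $4.05 per WebSocket held open for a month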

[0] https://news.ycombinator.com/item?id=24616775


We're definitely going to figure out some sort of pricing for WebSockets that doesn't charge full price for idle time, but we haven't nailed it down yet.


> It could get expensive fast.

Also, with paid Workers and no stop-loss, what happens if someone decides to DDoS your app? I searched for this and came away with no clear answers.

Also wondering where Workers KV will fit in. From what I gather, Durable Objects are strictly superior if pricing turns out to be comparable.


Workers KV is still better in many use cases. Durable Objects are the right choice when you need strong consistency. KV is the right choice when you want world-wide low latency access to the same data. Note that these two advantages are fundamentally opposed; it is physically impossible to simultaneously have strong consistency and worldwide low-latency access to a single piece of data. So, this will always be a trade-off.

Note that you could build KV on top of Durable Objects, by implementing your own caching and replication in application logic running in Durable Objects. On the other hand, you can't implement Durable Objects on top of KV; once you've lost strong consistency, it's hard (impossible?) to get it back. So in that sense, Durable Objects are "strictly superior". But in a practical sense, you probably don't really want to do the work to implement your own KV store on top of Durable Objects; it's probably better to just use KV.
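
As a minimal sketch of that idea (assuming the documented Durable Objects class shape; the KvShard name and routing are illustrative, and the caching/replication layer that would make it KV-like is exactly the part you'd still have to build):

    // Illustrative only: a trivial key/value shard backed by a Durable Object.
    export class KvShard {
      constructor(state, env) {
        this.storage = state.storage; // transactional, strongly consistent storage
      }

      async fetch(request) {
        const key = new URL(request.url).pathname.slice(1);

        if (request.method === "PUT") {
          await this.storage.put(key, await request.text());
          return new Response("OK");
        }

        const value = await this.storage.get(key);
        return value === undefined
          ? new Response("Not found", { status: 404 })
          : new Response(value);
      }
    }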


What's the size limit of a single 'Durable Object'?


There's no hard limit, but given that a single durable object is single-threaded, storing a huge amount of data in a single object may make it hard to access that data. Also, the system may be less likely to migrate huge objects to move them closer to their users. So, we recommend aiming for small, fine-grained objects, kilobytes to megabytes in size. But there's nothing fundamentally preventing an object from growing to multiple gigabytes.


We don't charge for malicious traffic like DDoS, which compares favorably to other cloud providers, who do.


It is reassuring to hear that. Could this be featured in the docs somewhere? For example, when I research this issue, this is what comes up as the first result:

https://community.cloudflare.com/t/how-to-protect-cloudflare...

The Workers free tier's failure modes are straightforward and preferable for some of the scenarios I would use them for, but KV is only enabled on the Bundled plan.


You don't charge for worker invocations during an L7 DDoS? How do you determine which requests to charge for and which not to?

Or is the claim that your DDoS protection is good and accurate enough that there are zero worker invocations to charge for, because they all get blocked?


We aim to block attack traffic. If we fail to block an attack and you get charged for it, file a support request to ask for a credit.


From what I understand after going through the docs, KV is currently the only way to store data in Workers, and Durable Objects are going to be the new alternative.

KV is eventually consistent, appropriate for low-value data that is read a lot and written infrequently. Durable Objects will provide consistency, at the cost of losing KV's very low latency, because each object has to run in a single location instead of on the edge. So there seems to be room for both solutions.

About DDoS, I believe Cloudflare is a leader in DDoS protection, so I would hope they include protection for all their Workers (please correct me if I'm wrong).


I read your write-up, but I'm not entirely clear on one part: how do Cloudflare Workers handle WebSocket connections? Are they automatically terminated after the worker spends too much time active? If so, doesn't handling the WS handshake have a lot more overhead than a fetch call? Maybe I'm misunderstanding something here. One of my biggest issues with serverless is its inability to handle WebSockets in a sane way, so this would be huge.


My understanding is that a Durable Object is similar to a long-running JS app in a respawnable/relocatable container (imagine a Kubernetes deployment with 1 pod). There is always exactly one instance, somewhere. And because the instance is long-running, WebSockets become more practical.

Note that because Durable Objects are long-running computations, they are stateful, and deployments are disruptive (clients get disconnected). So even though you could potentially put them in the "serverless" category, the deployment experience isn't quite the same as with short-lived serverless functions / lambdas.

The real novelty of Durable Objects appears to be their intended fine granularity (and the underlying tech that enables it). For example, if you were building a chat room service, you could have one Durable Object per room. Of course, you could conceivably build a chat room service running one Node.js process per chat room on traditional VMs or containers, but that probably wouldn't scale well.
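
A rough sketch of that one-object-per-room routing, assuming the Durable Object namespace binding API from the beta docs (the ROOMS binding and room-name parameter are illustrative):

    // Illustrative front-line Worker: each room name maps to exactly one object.
    export default {
      async fetch(request, env) {
        const roomName = new URL(request.url).searchParams.get("room") || "lobby";

        // Same name -> same Durable Object instance, wherever it currently lives.
        const id = env.ROOMS.idFromName(roomName);
        const room = env.ROOMS.get(id);

        // Forward the request (including WebSocket upgrades) to that object.
        return room.fetch(request);
      }
    };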


Thanks, you explained it much better than I could.


I was struggling to wrap my head around Durable Objects, this was very useful.


Hey, I'm the PM at Cloudflare for WebSockets on Workers. Support is still in beta, so we're still working through the timeout details here.

With the Workers Bundled plan (https://developers.cloudflare.com/workers/platform/pricing#b...), your WebSocket connection will stay open until your 50ms of CPU time expires. On Unbound (https://blog.cloudflare.com/introducing-workers-unbound/), which does not have a CPU time limit, your WebSocket connection will stay alive as long as it remains active and your Worker doesn't exceed its memory limits. If the connection goes idle, it may be terminated. We're currently considering an idle timeout on the order of 1-10 minutes.


I'm having some trouble understanding the pricing of Workers Unbound. How much would it cost to keep one websocket connection open for a month?


We haven't published pricing for WebSockets yet. Obviously directly applying the Workers Unbound duration-based pricing wouldn't work very well; we'll figure out something better.


WebSockets are a feature of Durable Objects.

Workers handle WebSockets in a pretty straightforward way. The server-side API is literally the same WebSocket JavaScript API as in browsers. sock.addEventListener("message", callback), etc.

But if you aren't using Durable Objects, then WebSockets on Workers aren't particularly useful, because there's no way to contact the specific Worker instance that is handling a particular client's WebSocket session, in order to send a message down.

Durable Objects fixes exactly that. Now you can have worker instances that are named, so you can call back to them.

Here's a complete demo that uses WebSockets to implement chat: https://github.com/cloudflare/workers-chat-demo/blob/main/ch...
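
For reference, a minimal server-side sketch of that API (following the public Workers WebSocket docs; error handling omitted):

    // Minimal sketch: accept a WebSocket upgrade inside a Worker / Durable Object.
    async function handleUpgrade(request) {
      if (request.headers.get("Upgrade") !== "websocket") {
        return new Response("Expected a WebSocket upgrade", { status: 426 });
      }

      const pair = new WebSocketPair();
      const [client, server] = Object.values(pair);

      server.accept(); // start delivering events to this end
      server.addEventListener("message", (event) => {
        server.send(`echo: ${event.data}`); // same browser-style API
      });

      // Hand the other end back to the runtime to complete the 101 upgrade.
      return new Response(null, { status: 101, webSocket: client });
    }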

> Are they automatically terminated after the worker spends too much time active?

We're still tweaking timeouts and limits, but in general you should be able to keep a WebSocket alive long-term, and reconnect any time it drops (which any WebSocket application has to do anyway, the internet being unreliable).
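
For what it's worth, the client-side reconnect loop this alludes to can be as simple as the following sketch (the backoff values are arbitrary):

    // Illustrative browser-side reconnect with capped exponential backoff.
    function connect(url, onMessage, attempt = 0) {
      const sock = new WebSocket(url);

      sock.addEventListener("open", () => { attempt = 0; });
      sock.addEventListener("message", (event) => onMessage(event.data));
      sock.addEventListener("close", () => {
        const delay = Math.min(30_000, 1_000 * 2 ** attempt); // cap at 30s
        setTimeout(() => connect(url, onMessage, attempt + 1), delay);
      });
    }

    connect("wss://chat.example.com/?room=lobby", (msg) => console.log(msg));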

> If so, doesn’t handling the WS handshake have a lot more overhead than a fetch call?

Not sure what you mean here. A WS handshake in itself isn't terribly expensive. I suppose a typical application will have to establish some state after the socket opens, and that could be expensive, but it depends on the app.


I've reached out via the beta form because I'm so damn excited for this. I have long tried to map a project of mine (a serverless game server for a TCG, based on Workers) onto the available storage tech (KV), but the consistency guarantees just weren't where I needed them to be. Hope they'll find a slot soon.

Edit:

(If someone from CF has questions, you can reach me at hn@elasticwaffle.com)

My use case is pretty much exactly the one described in the blog post (over on CF) for multiplayer games: using DO for ongoing 1v1 matches in a trading-card-game-style, turn-by-turn strategy game. Players would play against a simple AI (which is itself a worker) or other players. The game is served from Workers Sites. But multiplayer requires stronger coordination, and the delay for KV writes to reach all PoPs is a deal-breaker. DO solves not only that; the interface (just being classes) also allows the game design to mesh with the backend design.


Beta invites are definitely flowing now, but I'd recommend pinging someone on Twitter or elsewhere with a quick description of what you're going to build. I think they had so many signups they're worried about opening the floodgates, but on a case-by-case basis there's plenty of room. Maybe?


We really do prefer people fill out the form rather than try to ping us on Twitter. :) https://www.cloudflare.com/cloudflare-workers-durable-object...

We are taking things slowly. Storing data is a big responsibility. We'll need to test with a small number of users for an extended period (a few months) to work out any reliability issues before we really scale up.


I’d love to hear more about your use case and the consistency limitations you’re running into!


Feel free to reach out (email in parent post).


I'm on the Cloudflare Workers team, and while I can't help you get access any quicker than the usual route through the form, I'd still love to hear about your specific use case(s) and limitations. Is it okay if I email you?


Does there happen to be a local simulator one could use to build out an app?


I don't know about a full-blown local sim, but perhaps https://blog.cloudflare.com/trailblazing-a-development-envir... is close enough to what you have in mind?

Also perhaps try playing around in https://cloudflareworkers.com/

I don't believe Workers _Durable Objects_ specifically is available to play around with outside of the beta, however - apologies if that's what you actually wanted to play with. We're working hard on building and perfecting it into a product that everyone will be able to use soon, so keep an eye on this space.

Is there a specific use case you have in mind? I'd love to hear about it!


I had a little e-vite app I was trying to build with the KV store, but the consistency model wasn’t a great fit. I’ve sent an email in for the Durable Objects beta, but it’d be nice to have access to a simulator of some sort just so I can see if it’s a better fit.


Let's talk consistency models and more! Is there a good email I could reach you at - or alternatively, could you send me an email with your requirements (haneef@)?


There’s one in my profile.


I started building one of these as a weekend project on top of SQLite for funsies but I abandoned it during the election chaos. Oh well.


Yeah, go ahead.


Sent!


This is a great write-up, thanks for sharing it! I've built out a websockets-on-AWS setup at work similar to what they show at the end, and it's definitely not as nice to work with as this looks. That was really just an MVP to push live results; I'd like to extend it to do logs/progress eventually, but I'll definitely evaluate Durable Objects before adding anything to that solution, because this looks way better and cleaner.

Is there any sort of run-it-locally-for-testing story yet? In theory, paid LocalStack supports WebSocket API gateways on the AWS side, though I haven't played with that yet either, so I'm not sure how good it is. Looking at the API being used, and the fact that it's all dynamic JS land, it looks like maybe you could inject some implementations to run the WebSockets and store some state locally?


@glen - hey, great piece!

Wondering if you've also tried the newer real-time sync services like replicache.dev or roomservice.dev for this use case, since they seem to do the same thing except client-side. Also, on the server side, curious whether you've evaluated Temporal.io.


I have not! It was never such a pressing issue that I investigated a dedicated solution. But given that we're already using Cloudflare, and that this is a solution to a much bigger problem (global coordination + actual storage) that also works really nicely for our smaller usage, it felt like a nice way to dip a toe into this new model.


It would be so great to see support for other languages like Java, Python, Go, etc.


Along with various present-day languages, COBOL is also supported [1].

[1] https://blog.cloudflare.com/cloudflare-workers-now-support-c...


It does support other languages, kinda: https://github.com/cloudflare/python-worker-hello-world - it just compiles to JavaScript.


Workers actually can run WebAssembly [1], so any language that can target WASM works. I think the Rust toolchain is best supported, but there's a fun example with Haskell [2].

[1] - https://blog.cloudflare.com/webassembly-on-cloudflare-worker...

[2] - https://blog.cloudflare.com/cloudflare-worker-with-webassemb...
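
As a rough sketch of what that looks like (assuming the compiled module is exposed to the script as a WebAssembly.Module binding; the WASM_MODULE binding name and the add export are hypothetical):

    // Illustrative only: call into a WASM module from a Worker.
    const instance = new WebAssembly.Instance(WASM_MODULE, {});

    addEventListener("fetch", (event) => {
      const sum = instance.exports.add(20, 22); // exported from the WASM module
      event.respondWith(new Response(`add(20, 22) = ${sum}`));
    });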


I tried getting Workers to work with a Haskell Servant application compiled with Asterius, based on [1], but kept running into various errors [2]. An example of the perils of using experimental toolchains, I guess.

[1] https://www.tweag.io/blog/2020-10-09-asterius-cloudflare-wor...

[2] For the curious: the problem that eventually defeated me was the Worker responding with 404 to a route on /ip, defined in Servant. I tried defining a route on / with Servant and using that instead, but got 406 responses when doing that. First time I've ever seen that error code.


[2] Should be fairly easy to debug, as you can see what routes have actually been bound in the Cloudflare dashboard.


Are you referring to the routes set in wrangler.toml? I tried defining the /ip route there, still got 404.


Once your build code runs, wrangler takes any changes you made and writes them using the Cloudflare API. After that, they should be reflected in the dashboard on dash.cloudflare.com, where you can see what was actually set up.


The "Python" support really does feel like too much of a hack to do anything real with



Are there any large companies that use cloudflare workers in production?


The list on their website is: 23andMe, Broadcom, CodePen, Discord, DoorDash, Glossier, Marketo, MaxMind, npm, and ProPublica. That said, it's common for companies not to want to be featured publicly in that way, so the real list is longer.

Obviously Cloudflare is also a big user of Workers in production as well.


I run an agency and we’ve implemented Cloudflare Workers in production for very large companies, including doing the implementation for two of the big companies listed in the sibling comments on this thread.

It works very well and their CLI tool Wrangler is easy to integrate into CI/CD. We’ll probably use it for more. Happy to answer questions people have: matt@happycog.com


Yes: https://www.cloudflare.com/case-studies/?usecase=Deploy+cust...

From our Q3 earnings call:

"Turning to Cloudflare Workers, it's incredibly exciting to see how the platform is taking off. In Q3, more than 27,000 developers wrote and deployed their first Cloudflare Workers. That's up from 15,000 a year ago. History proves with new computing platforms, the more developers they have, the more quickly they improved and the more likely they are to win. Looking at GitHub and other sources of data on developer engagement, we believe more developers right deploy real applications and code on Cloudflare Workers every month than every other edge computing platforms combined. So what are they building?

- One of the most viewed publications during the 2020 elections used Cloudflare Workers to power their elections news platform and ensure it scaled during the unprecedented spike in traffic last Tuesday as well as Wednesday and today.

- A popular health foods company uses Workers to power their online ordering system.

- An online marketing firm working with major brands uses Workers to customize content on a per visitor basis.

- A publicly traded electronics testing firm use Workers to bridge their on-premise and cloud-based infrastructure.

- An innovative start-up is using Workers to power an online crypto scavenger hunt.

- And one of the largest online learning platforms uses Workers to deliver their customized content during this time of skyrocketing demand.

It's great to see more use cases every quarter, but I think we're just scratching the surface. Most use cases today have focused on performance. Over time, I expect those use cases will pale in comparison to what is a much bigger opportunity: helping customers manage the challenges of compliance. As governments around the world increasingly insist on data localization and data residency, sending all your users' data back to AWS for processing will become unacceptable. What our largest, most sophisticated, most compliance-sensitive customers are looking to Workers for is a way to manage this increasingly complex regulatory environment. That's why, during Cloudflare's Birthday Week, our announcement of Durable Objects may have been one of the most important edge computing developments you may have missed. Durable Objects allows developers to define a data structure and store it safely on our network, close to the users that need to access it, in order to ensure performance and consistency. It also allows developers to define where that data can move across our network and where it cannot, such as: this user's data may never leave the EU, or this user's data may never leave Brazil.

Given Cloudflare's network spans more than 200 cities in more than 100 countries worldwide, Durable Objects provides fine-grained control over where data is stored and processed. That functionality is critical for the increasingly complex compliance challenges that face every global company today. In other words, the future of edge computing will be defined as much by intelligent edge storage as by computing. And while others are still working to launch their edge computing platforms, we have products like Durable Objects in market that are defining that future today."


We run censorship-resistant proxies on Workers, neatly domain-fronted (well, IP-fronted) by Cloudflare IPs [0]. It works so well.

> Given Cloudflare's network spans more than 200 cities in more than 100 countries worldwide...

I think only enterprise customers can truly claim the benefit of all 200 PoPs. Free/Pro/Business(?) plans aren't necessarily routed to all 200. If that's not the case, then it doesn't match our interactions with Cloudflare's support.

---

That said, I absolutely love Workers. It is quite easily the best value for money of any edge computing platform. This blog post drives those arguments home: https://medium.com/@zackbloom/serverless-pricing-and-costs-a...

[0] On the flip side, because of CNAME flattening, it is hard to block privacy-eroding solutions such as these: https://www.cloudflare.com/apps/google-analytics


PS: The Live Demo link at the bottom of the page is broken.


So is it CA or CP? No, just C.


Being "partition tolerant" just means the other aspect holds: CP means "consistent in case of a partition", while AP means "available in case of a partition".

A CP system cannot be available in both partitions: by definition the two partitions cannot communicate (otherwise they wouldn't be partitions), so it's logically impossible for two clients in two different partitions to affect a shared state consistently. Thus, in at least one partition, the service will be unavailable.

Forcing all state changes to go through a single "master^Wmain" node (for some key/shard) is a simple way to achieve CP.

This main node can change over time, but in case of a partition it can never move to the side that has no quorum for a leader election.


CP. It sacrifices some availability, in the form of latency, if accessed from outside the local region.



