You can get the most out of it by using it as a "sidecar" next to your native Rust application. That way you get access to the full power of Rust by triggering "native actions" from flawless over HTTP, and you can still use flawless for the code you consider most important, the code that should always run to completion and never silently drop or fail mid-execution.
This is also very useful if you have a micro-service architecture and you need to call multiple micro-services to end up in a consistent state. It's a way to reliably implement the saga pattern: https://microservices.io/patterns/data/saga.html
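To make that concrete, here is a rough sketch in plain Rust of a two-step saga with a compensating action. The services and function names are made up for illustration and this is not the flawless API; in flawless each step would additionally be persisted to the log, so a crash between steps resumes instead of restarting from scratch.

    // A hand-rolled saga across two imaginary services: reserve inventory, then
    // charge the card. If the second step fails, the first one is compensated.
    #[derive(Debug)]
    struct StepFailed(&'static str);

    fn reserve_inventory(order_id: u64) -> Result<u64, StepFailed> {
        // Imagine an HTTP POST to the inventory service here.
        println!("reserved inventory for order {order_id}");
        Ok(4242) // reservation id
    }

    fn release_inventory(reservation_id: u64) {
        // Compensating action: undo the reservation.
        println!("released reservation {reservation_id}");
    }

    fn charge_card(_order_id: u64) -> Result<(), StepFailed> {
        // Imagine an HTTP POST to the payment service here.
        Err(StepFailed("payment service is down"))
    }

    fn checkout_saga(order_id: u64) -> Result<(), StepFailed> {
        let reservation = reserve_inventory(order_id)?;
        if let Err(e) = charge_card(order_id) {
            // Second step failed: roll the first one back so both systems stay consistent.
            release_inventory(reservation);
            return Err(e);
        }
        Ok(())
    }

    fn main() {
        if let Err(e) = checkout_saga(1) {
            eprintln!("saga rolled back: {}", e.0);
        }
    }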
This is something I am very interested in using on a personal project, but it's closed source, and as far as I can see there is no narrative or explanation
about the long-term plans, or about whether it is a good idea to start using it for projects yet.
Can you talk more about your plans, and whether it might be a good idea to use it as a dependency in a project?
I've been keeping an eye on flawless since stumbling across your blog posts from last year; it's great to see more durable execution options entering the market.
Given that you're compiling functions down to WebAssembly, I was curious which runtime you ended up going with for building out the platform?
I'm also curious whether you have any plans (or any you'd be willing to share, anyway) to expand the languages supported by Flawless beyond Rust any time soon?
In this scenario, the workflow is marked as "failed" and requires manual approval to continue. Sometimes you just can't resolve the issue without human input.
Exactly-once execution is one of the guarantees I want to provide. Flawless should always give you peace of mind, and in cases where it can't guarantee exactly-once execution it will sacrifice progress. When interacting with the real world, hardware and software can fail in creative ways, and it's not always possible to recover automatically; sometimes an issue simply can't be resolved without looping in a human. But flawless will not repeat a call unless it can guarantee the call was never performed, or the call is idempotent.
For many features, like HTTP requests, flawless uses a dual-commit system. In cases where data was sent to an external server but no response was received (the request timed out), we can't know what the external system observed, so progress is not allowed to continue (the workflow fails). You can relax this requirement by marking the HTTP call as idempotent.
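As a conceptual illustration only (these types are made up and are not flawless internals), the recovery rule described above boils down to something like this: a call is recorded before it is sent and again when a response arrives, and a "sent but no response" entry only allows a retry if the call was explicitly marked idempotent.

    #[derive(Debug)]
    enum LogEntry {
        HttpRequestSent { idempotent: bool },
        HttpResponseReceived { status: u16 },
    }

    enum Recovery {
        ReplayResponse(u16), // the outcome is known, replay it from the log
        RetryCall,           // safe to send again
        FailWorkflow,        // can't know what the other side saw, stop and involve a human
    }

    fn recover(last: &LogEntry) -> Recovery {
        match last {
            LogEntry::HttpResponseReceived { status } => Recovery::ReplayResponse(*status),
            LogEntry::HttpRequestSent { idempotent: true } => Recovery::RetryCall,
            LogEntry::HttpRequestSent { idempotent: false } => Recovery::FailWorkflow,
        }
    }

    fn main() {
        // The workflow crashed after sending a non-idempotent request.
        let stuck = LogEntry::HttpRequestSent { idempotent: false };
        match recover(&stuck) {
            Recovery::FailWorkflow => println!("request was sent, outcome unknown: failing the workflow"),
            _ => unreachable!(),
        }
    }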
I'm not sure exactly-once can be done in the presence of failure. When you come back online, if you're not sure whether a call happened or not, you either drop it or try again - and that gives you at-most-once or at-least-once.
Durable execution is meant to complement your application; you will never want to model everything with it. It solves the problem of having to decide how often to manually snapshot some important state: with durable execution this becomes implicit. Workflows in flawless can still fail, because you could call the `panic` function or divide by zero. In the end it's arbitrary compute.
"External" state is one of the text book examples for using durable execution. If you are interacting with 5 different services and calling 5 different API endpoints, you sometimes want to have transactional behaviour. Leave all 5 systems in a consistent state after your interaction. You can't only call 2 and stop. Durable execution and patterns like saga [1] are one of the most straight forward ways (for me) to solve this.
In flawless specifically, I try to give the user enough context about why things failed. It's very easy to reconstruct the whole computation from the log and let the user decide if they want to re-run the workflow. If you charge someone's credit card, but the call to extend their subscription fails (service down), you can't safely just re-run the whole thing. You have two choices: either you continue progressing and roll back the charge, or you fail and have someone manually look at it. In general, you want to use flawless in scenarios where the "called exactly once" guarantee is important. If you can just throw away the state and it's safe to re-run from the start, then you don't need flawless for that part of the app. The less state you have to care about, the better.
EDIT: The alternative would be to manually construct a state machine backed by a database: "Check if the credit card was charged. Call Stripe. I finished charging the credit card, save this information. Call the subscription service, it failed, restart everything. Check if the credit card was charged ...". Depending on your workflow, this can be a very complicated process where 90% of your code is just dealing with possible failures, and it becomes especially tricky when a failure happens right at the edge of one of the calls.
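A rough sketch of what that hand-rolled version tends to look like (placeholder step functions and an in-memory stand-in for the database): most of the code is bookkeeping, and a crash between a step and its "save progress" line is exactly the tricky edge mentioned above.

    #[derive(Clone, Copy, Debug, PartialEq)]
    enum CheckoutState {
        NotStarted,
        CardCharged,
        SubscriptionExtended,
    }

    // Stand-in for a real database row holding the workflow's progress.
    struct Db {
        state: CheckoutState,
    }

    impl Db {
        fn load(&self) -> CheckoutState { self.state }
        fn save(&mut self, s: CheckoutState) { self.state = s; }
    }

    fn charge_card() -> Result<(), &'static str> { Ok(()) }
    fn extend_subscription() -> Result<(), &'static str> { Err("subscription service down") }

    fn run_checkout(db: &mut Db) -> Result<(), &'static str> {
        // On every (re)start, first check how far we already got.
        if db.load() == CheckoutState::NotStarted {
            charge_card()?;
            db.save(CheckoutState::CardCharged); // a crash before this line risks a double charge
        }
        if db.load() == CheckoutState::CardCharged {
            extend_subscription()?; // fails here: someone must now decide how to roll back the charge
            db.save(CheckoutState::SubscriptionExtended);
        }
        Ok(())
    }

    fn main() {
        let mut db = Db { state: CheckoutState::NotStarted };
        if let Err(e) = run_checkout(&mut db) {
            eprintln!("stuck in state {:?}: {e}", db.load());
        }
    }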
I feel like this approach might still pose some challenges or issues with regard to time or stale data. A couple of problematic scenarios:
- The application requests a JWT. It then crashes and gets restarted. It gets past the problematic point, but later, when trying to make a request, it crashes because the cached token has expired.
- The application interacts with the current time in a meaningful way. Due to the log replay, it will always live in the past, and when switching from the cache-sourced time to the current time, issues might occur, like deltas being larger than expected.
- The application goes through a webshop checkout flow. After a restart, some of the items in its cart have already been sold, but the app doesn't check this, since it already went through a (cached) check and got a result.
Funnily enough, this is actually a massive problem when working with cloud automation APIs. Terraform and the like kinda handle this problem by calculating and storing the "goal state", then looking at the system's current state and coming up with a "plan" to reconcile the two.
Unfortunately, cloud provider APIs are usually eventually consistent, and getting a full snapshot at scale is nigh impossible.
So, in order to work around this, I effectively built a write-ahead-log style system atop Postgres. Something like Sagas would have been great, but as far as I can tell there was no real pattern for coordinating multiple Sagas operating on global state. This is where Postgres SSI (serializable snapshot isolation) came in handy: I could read the assumed state of the system, and if another worker came in and manipulated it, the write-ahead entry wouldn't get written because the transaction would fail to commit. The write-ahead entry would then get processed asynchronously by another worker, in case the first worker failed.
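For anyone curious, a minimal sketch of that idea, assuming sqlx and two hypothetical tables (`resources` holding the assumed state with a revision column, and `wal_entries` holding the pending work): the read and the insert happen in one SERIALIZABLE transaction, so if another worker invalidates the read, the commit fails (typically SQLSTATE 40001) and the entry never becomes visible.

    use sqlx::PgPool;

    async fn enqueue_action(
        pool: &PgPool,
        resource: &str,
        action: &str,
    ) -> Result<(), sqlx::Error> {
        let mut tx = pool.begin().await?;
        sqlx::query("SET TRANSACTION ISOLATION LEVEL SERIALIZABLE")
            .execute(&mut *tx)
            .await?;

        // Read the assumed state of the system inside the transaction.
        let revision: i64 = sqlx::query_scalar("SELECT revision FROM resources WHERE name = $1")
            .bind(resource)
            .fetch_one(&mut *tx)
            .await?;

        // Record what we intend to do, tagged with the state the decision was based on.
        sqlx::query("INSERT INTO wal_entries (resource, based_on_revision, action) VALUES ($1, $2, $3)")
            .bind(resource)
            .bind(revision)
            .bind(action)
            .execute(&mut *tx)
            .await?;

        // Under SSI this commit fails if a concurrent transaction invalidated our read;
        // the caller retries (or gives up), and no stale write-ahead entry is ever written.
        tx.commit().await
    }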
I have spent a lot of time thinking about this and believe that the most straightforward solution for long-running (or even forever-running) workflows is to allow hot upgrades.
A hot upgrade would only succeed if you can exactly replay the existing side effect log history with the new code. Basically you do a catchup with the new code and just keep running once you catch up. If the new code diverges, the hot upgrade would fail and revert to the old one. In this case, a human would need to intervene and check what went wrong.
There are other approaches, but I feel like this is the simplest one to understand and use in practice. During development you can already test whether your new code diverges by replaying existing logs.
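A conceptual sketch of that catch-up check (not flawless code, just the shape of the idea): replay the recorded side effects against the new version and accept the upgrade only if the new code requests exactly the same calls in the same order.

    #[derive(Debug, PartialEq)]
    enum SideEffect {
        HttpGet(String),
        Sleep(u64),
    }

    // Stand-in for driving the new module and capturing each side effect it requests.
    fn replay_with_new_code(log: &[SideEffect], new_code_effects: &[SideEffect]) -> Result<(), String> {
        for (i, recorded) in log.iter().enumerate() {
            match new_code_effects.get(i) {
                Some(requested) if requested == recorded => continue,
                Some(requested) => {
                    return Err(format!(
                        "diverged at step {i}: log has {recorded:?}, new code requested {requested:?}"
                    ))
                }
                None => return Err(format!("new code finished early at step {i}")),
            }
        }
        Ok(()) // caught up: the new code can take over from here
    }

    fn main() {
        let log = vec![SideEffect::HttpGet("https://example.com/a".into()), SideEffect::Sleep(60)];
        let new = vec![SideEffect::HttpGet("https://example.com/a".into()), SideEffect::Sleep(30)];
        match replay_with_new_code(&log, &new) {
            Ok(()) => println!("hot upgrade accepted"),
            Err(e) => println!("hot upgrade rejected: {e}"),
        }
    }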
Thank you! I hand-coded it with just HTML, CSS and JavaScript and put a lot of effort and love into it. The code is not the prettiest, but the implementation is straightforward if anyone wants to check it out: https://flawless.dev/js/how-does-it-work-animation.js
    // Update the `keyFrame`.
    keyFrame += 0.5;
    // This is just a huge state machine progression.
    switch (keyFrame) {
      // In the first 4 seconds we just run through the existing log messages.
      case 0.5:
      case 1:
      case 1.5:
      ...
By using WebAssembly, it's kinda foolproof by default. WebAssembly explicitly requires you to declare the host calls your modules use. If you try to use a host call that is not provided by flawless, your module can't be instantiated.
It's also important to note that there are multiple standardisation efforts going on in the WebAssembly space. For example, if you are using the Rust `rand` crate and compiling to WebAssembly, it uses the WASI host functions for generating random numbers. While I'm waiting for WASI, wasi-http and others to be standardised, I expose my own interface for now.
Obviously, this also has a big downside: you can't compile all Rust code to WebAssembly. However, I prefer this reject-by-default approach, so that you are guaranteed to never have unintended side effects.
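To illustrate the reject-by-default behaviour, here is a small standalone example using wasmtime as the host (whether flawless itself uses wasmtime isn't stated here): the module imports a host function the linker never defines, so instantiation fails and the code can never run.

    use wasmtime::{Engine, Linker, Module, Store};

    fn main() -> anyhow::Result<()> {
        let engine = Engine::default();
        // A module that declares an import the host does not provide.
        let module = Module::new(
            &engine,
            r#"(module (import "host" "spawn_process" (func $spawn_process)))"#,
        )?;

        let linker: Linker<()> = Linker::new(&engine);
        let mut store = Store::new(&engine, ());

        // No `spawn_process` was ever registered on the linker, so instantiation
        // returns an error; the undeclared side effect is rejected up front.
        match linker.instantiate(&mut store, &module) {
            Ok(_) => println!("instantiated (unexpected)"),
            Err(e) => println!("rejected: {e}"),
        }
        Ok(())
    }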
Flawless integrates with your existing system. You probably don't need the guarantees that flawless provides for all code, but just specific functions/workflows.
> It is probably important here to realize that async solves concurrency, not parallelism. You can use async with a single threaded runtime for I/O concurrency and mix that with threads for computational parallelism for long running jobs.
In my experience, it's impossible to mix threads and async tasks. They can't communicate or share state: threads need locks, while async tasks require a wake-up mechanism. If you just stick to unbounded channels that don't block on send you can get far, but in 99% of cases you will need to decide upfront on a specific approach.
This has not been my experience at all. Delegating compute-intensive tasks to rayon inside a tokio runtime is not particularly hard (assuming you can pipeline things to separate IO and compute effectively).
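For what it's worth, one sketch of that kind of setup (assuming tokio and rayon): hand the CPU-heavy part to rayon's thread pool and await the result on the async side through a oneshot channel, so the tokio workers are never blocked.

    use tokio::sync::oneshot;

    fn expensive_sum(data: Vec<u64>) -> u64 {
        use rayon::prelude::*;
        // CPU-bound work on rayon's pool.
        data.par_iter().sum()
    }

    async fn sum_without_blocking(data: Vec<u64>) -> u64 {
        let (tx, rx) = oneshot::channel();
        rayon::spawn(move || {
            // Runs on a rayon worker; sending the result wakes the awaiting task.
            let _ = tx.send(expensive_sum(data));
        });
        rx.await.expect("rayon task panicked or was dropped")
    }

    #[tokio::main]
    async fn main() {
        let data: Vec<u64> = (0..10_000_000).collect();
        println!("sum = {}", sum_without_blocking(data).await);
    }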
A pattern that has worked quite well for me is to use