
At some point, you can't gracefully handle bugs in other people's code. If a function you call causes a SEGFAULT, in the vast majority of software, you're not expected to handle that. That's an invariant error, and you probably want some way to detect that it happened so you can fix it, but it's not reasonable to ask every caller of every function to handle that (in the same way we don't consider "the earth blew up" to be a reasonable thing to protect against, even though it is technically possible). There's simply not enough time and money to protect against every possible edge case in most software (NASA projects aside).

The argument here is that network issues are exceedingly common in microservice environments and so aren't actually an edge failure case, so you actually have to worry about them way more than you would worry about a function in a different module causing a SEGFAULT.




The point is not to handle individual bugs; it is to handle all failures. This is the difference between a "defensive programming" approach and the "let it crash" / "Zen of Erlang" approach. Actors are designed to have failure isolation, which means they can react to errors in other actors without worrying about their own state. They then have two options, depending on which of two error classes they are facing: transient or persistent.

Persistent errors are propagated to the supervisor. Transient errors are either retried or propagated.

It doesn't matter if it's a network error, a disk error, a timeout, a crash, a cosmic radiation bit flip - your approach is always one of those two. So adding more failure cases doesn't "matter" in terms of your error handling, although you may want to adopt helpful patterns in the nuances of "retry".
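To make the two-way split concrete, here is a minimal Python sketch of the idea, not anything taken from Erlang/OTP. The names `TransientError`, `PersistentError`, and `call_with_retry` are hypothetical and exist only for illustration, and treating exhausted retries as a persistent failure is my assumption about what "retried or propagated" means in practice.

```python
import time


class TransientError(Exception):
    """Hypothetical: a failure that might go away on retry (timeout, dropped connection)."""


class PersistentError(Exception):
    """Hypothetical: a failure that retrying will not fix; the caller's supervisor decides."""


def call_with_retry(operation, max_attempts=3, base_delay=0.01, sleep=time.sleep):
    """Retry transient failures with exponential backoff; let everything else propagate."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TransientError as exc:
            if attempt == max_attempts:
                # Assumption: once retries are exhausted, escalate as persistent.
                raise PersistentError(f"gave up after {attempt} attempts") from exc
            # Back off before the next attempt: base_delay, 2x, 4x, ...
            sleep(base_delay * 2 ** (attempt - 1))
        # PersistentError and any other exception are deliberately not caught here,
        # so they propagate to whoever is supervising this call.
```

The caller never inspects whether the underlying cause was the network, the disk, or a crash. It only needs to know which of the two classes the failure falls into, which is why adding more failure sources does not add more branches.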

The frequency of errors will obviously increase once a network is involved (arguably by very, very little), but the pattern is fundamental to resiliency.

If your network is truly so unreliable that you cannot pay that cost, don't do it. I don't think most people are developing on networks that frequently fail for long periods of time.


But now you are talking squarely about Erlang actors, not microservices in general. The runtime gives you all the needed guarantees here.


I talk about services and actors interchangeably because there are no interesting differences between them.


Other than automatic handling of network exceptions, safe failures and the shitton of other features Erlang runtimes have?


I'm not sure what you're talking about. What automatic handling of network exceptions? What safe failures? BEAM has lots of great features, no question, but they have very little to do with the implementation of actors - BEAM primarily provides names and linking as useful primitives.



