Hmm, panic as in stop all work? I like that in Java, units of work can continue...

cmrdporcupine · on Feb 15, 2023

Yes, panic, stop all work. If it's truly an exceptional circumstance, it's unlikely that anything further up can "fix it." Don't even try. Kill the process and restart. Or force the author to fix the bug.

If it's an "expected" runtime condition that you can manage and recover from, then it's not "exceptional", is it? So don't use an exception. Pass the information to the caller that needs it, and adjust state accordingly.
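A minimal sketch of "pass the information to the caller", written in Python for brevity. `ParseResult` and `parse_quantity` are hypothetical names invented for this illustration, not anything from the thread: malformed input is treated as an expected condition and comes back as a plain value instead of an exception.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ParseResult:
    """Outcome of a parse: either `value` or `error` is set (hypothetical type)."""
    value: Optional[int] = None
    error: Optional[str] = None

    @property
    def ok(self) -> bool:
        return self.error is None


def parse_quantity(text: str) -> ParseResult:
    """Bad input is an expected condition, so report it as data, not as a raise."""
    stripped = text.strip()
    # isascii() guards against digits like "²" that isdigit() accepts but int() rejects.
    if not (stripped.isascii() and stripped.isdigit()):
        return ParseResult(error=f"not a non-negative integer: {text!r}")
    return ParseResult(value=int(stripped))


result = parse_quantity("4x")
if not result.ok:
    # The caller decides what to do: reject the request, ask again, use a default.
    print("rejected:", result.error)
```

The caller has to look at `ok` before touching `value`, which is the point: the failure path is part of the normal control flow rather than something unwinding through the stack.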

That's my take these days. I've seen too many systems degrade in cascading failures because of misguided attempts to "recover." Deadlocks, partial failures, explosions, etc. Real fun to diagnose.

xwolfi · on Feb 16, 2023

But what if we need it for critical processes, what if we receive loosely constrained input, what if we want to change it so often that bugs must happen? (Or are you the sort of manager who thinks bugs can be entirely prevented?)

Why are exceptions supposed to be rare? They're exceptional in the context of what we told the software could happen, but not in the context where we're fallible humans pissing out code as fast as clients can pay us.

I've never had a problem diagnosing a corrupted state following an exception; it's pretty clear. It's much harder to tell dozens of clients that there will be no trading in Hong Kong this afternoon because one of them sent an illegal character we didn't think to sanitize, or the exchange inverted two messages against their spec, or a network router dropped a packet. All of these are cases I've seen in the last few years: we lost one order in each case, kept the million others trading as normal, and handled the potential surprise in the next release...
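The "lose one order, keep the others trading" pattern can be sketched roughly as below, in Python for brevity. Everything here is assumed for illustration: `process_all` and `handle_order` are hypothetical names, and orders are plain strings rather than anything a real trading system would use.

```python
import logging
from typing import Callable, Iterable, List, Tuple

logger = logging.getLogger("orders")


def handle_order(order: str) -> None:
    """Hypothetical handler: rejects input containing non-printable characters."""
    if not order.isprintable():
        raise ValueError(f"illegal character in order: {order!r}")
    # Real processing would go here.


def process_all(
    orders: Iterable[str], handle: Callable[[str], None]
) -> Tuple[int, List[Tuple[str, Exception]]]:
    """Run each order as its own unit of work; one failure does not stop the rest."""
    processed = 0
    failed: List[Tuple[str, Exception]] = []
    for order in orders:
        try:
            handle(order)
            processed += 1
        except Exception as exc:
            # Log with traceback and keep the failed order for later diagnosis.
            logger.exception("order failed, continuing: %r", order)
            failed.append((order, exc))
    return processed, failed


processed, failed = process_all(
    ["BUY 10 HSBC", "BUY 5 \x00BAD", "SELL 3 AIA"], handle_order
)
print(processed, "processed,", len(failed), "failed")
```

Catching `Exception` rather than `BaseException` means `KeyboardInterrupt` and `SystemExit` still stop the loop. The sketch is only safe if a failed `handle` call leaves no shared state half-updated, which is the "strict set of constraints" that this kind of recovery depends on.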

Recovery design can be done, but you need a strict set of constraints. How do you even recover with a restart after a bug fix? It takes hours just to do, the world has moved on, the state you recovered is useless, and you'll sort it out the next day?

Maybe imagine a plane's software stopping all work because the human pressed two buttons at the same time and the programmer, a human too, forgot this possibility? Or am I misunderstanding you? Maybe you work on more one-off things like data science, where you're the only person interacting with the inputs and outputs?