Limits of TLA+ - It cannot compile to working code - Steep learning curve Opport...

skybrian · 2024-08-28T19:56:42 1724875002

Is that a practical example? Has anyone specified a transformer using TLA+? More generally, is TLA+ practical for code that uses a lot of matrix multiplication?

drdrey · 2024-08-28T20:14:27 1724876067

It’s really not, TLA+ works best for modeling state machines with few discrete states and concurrent systems. It can find interesting interleaving of events that would leave to a violation of your system properties

fiddlerwoaroof · 2024-08-28T20:36:08 1724877368

The most practical examples I’m aware of are the usage of TLA+ to specify systems at AWS: https://lamport.azurewebsites.net/tla/formal-methods-amazon....

westurner · 2024-08-28T22:29:08 1724884148

From "Use of Formal Methods at Amazon Web Services" (2014) https://lamport.azurewebsites.net/tla/formal-methods-amazon.... :

> What Formal Specification Is Not Good For: We are concerned with two major classes of problems with large distributed systems: 1) bugs and operator errors that cause a departure from the logical intent of the system, and 2) surprising ‘sustained emergent performance degradation’ of complex systems that inevitably contain feedback loops. We know how to use formal specification to find the first class of problems. However, problems in the second category can cripple a system even though no logic bug is involved. A common example is when a momentary slowdown in a server (perhaps due to Java garbage collection) causes timeouts to be breached on clients, which causes the clients to retry requests, which adds more load to the server, which causes further slowdown. In such scenarios the system will eventually make progress; it is not stuck in a logical deadlock, livelock, or other cycle. But from the customer's perspective it is effectively unavailable due to sustained unacceptable response times. TLA+ could be used to specify an upper bound on response time, as a real-time safety property. However, our systems are built on infrastructure (disks, operating systems, network) that do not support hard real-time scheduling or guarantees, so real-time safety properties would not be realistic. We build soft real-time systems in which very short periods of slow responses are not considered errors. However, prolonged severe slowdowns are considered errors. We don’t yet know of a feasible way to model a real system that would enable tools to predict such emergent behavior. We use other techniques to mitigate those risks.

Delay, cycles, feedback; [complex] [adaptive] nonlinearity

Formal methods including TLA+ also can't/don't prevent or can only workaround side channels in hardware and firmware that is not verified. But that's a different layer.

> This raised a challenge; how to convey the purpose and benefits of formal methods to an audience of software engineers? Engineers think in terms of debugging rather than ‘verification’, so we called the presentation “Debugging Designs” [8] . Continuing that metaphor, we have found that software engineers more readily grasp the concept and practical value of TLA+ if we dub it:

  Exhaustively testable pseudo-code

> We initially avoid the words ‘formal’, ‘verification’, and ‘proof’, due to the widespread view that formal methods are impractical. We also initially avoid mentioning what the acronym ‘TLA’ stands for, as doing so would give an incorrect impression of complexity.

Isn't there a hello world with vector clocks tutorial? A simple, formally-verified hello world kernel module with each of the potential methods would be demonstrative, but then don't you need to model the kernel with abstract distributed concurrency primitives too?

From https://news.ycombinator.com/item?id=40980370 ;

> - [ ] DOC: learnxinyminutes for tlaplus

> TLAplus: https://en.wikipedia.org/wiki/TLA%2B

> awesome-tlaplus > Books, (University) courses teaching (with) TLA+: https://github.com/tlaplus/awesome-tlaplus#books

FizzBee, Nagini, deal-solver, z3, dafny; https://news.ycombinator.com/item?id=39904256#39938759 ,

"Industry forms consortium to drive adoption of Rust in safety-critical systems" (2024) https://news.ycombinator.com/item?id=40680722

awesome-safety-critical:

koolala · 2024-08-28T20:41:52 1724877712

Sounds similar to UML: Unified Modeling Language diagrams.

I wonder if TLA+ could convert to diagrams instead of Math notation.

hackermeows · 2024-08-28T21:01:01 1724878861

you are way off. These is a tool to simulate and validate systems and expose edge/race conditions of that system.

candiddevmike · 2024-08-28T20:28:05 1724876885

Are there any RFCs written in TLA+?