Using TLA+ for fun and profit in the development of ElasticSearch [video]

jaytaylor · on Sept 19, 2019

If you're already familiar with Elasticsearch, the interesting TLA+ presentation begins ~8 minutes in.

https://www.youtube.com/watch?v=qYDcbcOVurc&t=479

ignoramous · on Sept 19, 2019

I spent time frequently mucking with the elasticsearch codebase from 5.x through till 6.4 strictly from durability and resiliency PoV [0]. The improvements it is seeing in 7.x are tremendous. It still had the occasional tendency to go hay-wire and show behaviour that simply didn't make sense and couldn't be recovered from, but I'm confident 7.x will push it to making it even more resilient than 5.x+ [1].

The sheer number of configurable options for the cluster bringup, index setup, plugable components turns it into a very hard to tame beast. There is an opportunity here somewhere for someone to make Elasticsearch on Rails (convention over configuration). With 7.x, I feel the time is just abt right to try something like that.

What excites me most abt 7.x is the overhaul of underlying distributed engine that's been going on for over 2yrs [2] that will enable a whole slew of super nice features [3] that'd put it on par with state-of-the-art distributed databases in terms of not only features but resiliency.

[0] https://www.elastic.co/guide/en/elasticsearch/resiliency/cur...

[1] https://www.microsoft.com/en-us/research/publication/pacific...

[2] https://github.com/elastic/elasticsearch/issues/10708

[3] https://www.elastic.co/blog/elasticsearch-sequence-ids-6-0

hcnews · on Sept 19, 2019

Can someone summarize TLA+ functionality in 2-3 sentences. It keeps coming up on hackernews but I haven't heard it any other context (and I have worked at faang, so should've heard of it if it was generally useful). Just trying to figure out if its a niche or something I should know as a distributed systems person.

latencyloser · on Sept 19, 2019

TLA+ is less a programming language and more a systems modeling language. It's like writing a mathematical proof for your system where you describe each action the system takes, the invariants to describe "correctness" of the system and its state, and the TLA checker runs some kind of analysis of every possible state described by your model and that the invariants still hold true for all possible states. To me, it's like running a unit test in parallel where all possible inputs are provided concurrently.

I'm not an expert on it, but I've worked with it some to verify distributed systems at a FAANG company (with the oversight of real experts)--so it does get used there!

gamegoblin · on Sept 19, 2019

TLA+ allows you to encode your system as a state machine, define invariants that must always be true, and the checker will explore your state machine and verify that the invariants are always true.

An example would be encoding a distributed locking system, with an invariant saying "no lock is owned by more than 1 node at a time". You would encode all of the locking and unlocking behavior in your state machine, and then the checker would verify it.

jsjolen · on Sept 19, 2019

So to ensure that your system follows this state machine you need to write a proof for that in some other theory (like an abstract interpretaion of your program)?

dwohnitmok · on Sept 19, 2019

In theory (no pun intended) yes.

In practice this is not done. Perhaps never done. I know of only one toy example that you could maybe extend to do this. TLA+ is meant to double-check design level questions and intuitions rather than code level or implementation level ones. That is it attempts to give guidance on whether a design is flawed rather than whether an implementation successfully follows a specified design.

This may sound limiting, but the fact that this is an artifact completely separate from code makes TLA+ much more attractive from a business risk perspective in my opinion.

pavlov · on Sept 19, 2019

I can recommend Hillel Wayne's book "Practical TLA+". It helped me understand scenarios where TLA+ specifications and the modelling approach would be essential in discovering bugs before writing any code. (Still waiting to put my fledgling TLA+ knowledge to some actual application though, as I'm currently working on some graphics stuff with nothing distributed about it...)

Wayne also has a great blog: https://www.hillelwayne.com

Btw, not all great technology finds adoption at the FAANGs. At least Facebook and Google tend to be quite insular: they certainly don't frown on open source, but it must be something that can be assimilated into the worldview of hulking monorepos, company IDEs and tools teams.

To put it another way: if Facebook/Google are giant Borg cubes, then TLA+ is a clever little Vulcan ship applying pure logic to stay in the game.

dwohnitmok · on Sept 19, 2019

TLA+ is a formal language for specifying a state machine and properties it may satisfy (e.g. eventually/never reaches a given state). Since many systems, from simple algorithms to entire distributed systems, can be thought of as state machines, TLA+ serves as a useful formal design language for documenting them and the expected behavior they should follow. On top of this, the TLA+ ecosystem comes with tools to verify that your proposed design indeed satisfies the properties you have written, increasing confidence that your design is sound and doesn't have weird corner cases you haven't thought about.

dqpb · on Sept 19, 2019

TLA+ is a formal specification language developed by Leslie Lamport. It is used to design, model, document, and verify concurrent systems. TLA+ has been described as exhaustively-testable pseudocode, and its use likened to drawing blueprints for software systems

https://en.m.wikipedia.org/wiki/TLA%2B

deboflo · on Sept 19, 2019

You can also use the model checker to, via brute force, find a sequence of states that result in a desired outcome (which is expressed as a set of conditions called an “invariant”). For example, steps to solve the tower of honoi or the Die Hard 3 jug problems.

pron · on Sept 19, 2019

Leslie Lamport, creator of TLA+, has described it as blueprints for your system. You can describe your system at any level of detail, and state assertions/assumptions about it. You can then use tools to check if your assertions are correct.

mad44 · on Sept 19, 2019

http://muratbuffalo.blogspot.com/2018/10/debugging-designs-w...

baq · on Sept 19, 2019

it's kind of a proof assistant, not really a programming language, focusing on finding gaps in invariants during state transitions.

i suggest spending a few hours with author's own video course: https://lamport.azurewebsites.net/video/videos.html

it is generally useful, what it doesn't do is emit working prototypes.

pron · on Sept 19, 2019

For other reports/examples, visit https://old.reddit.com/r/tlaplus/