The State of State Machines (googleprojectzero.blogspot.com)
426 points by arkadiyt on Jan 19, 2021 | 66 comments



I know this is probably what most users want, but I hate the way Chrome's camera/mic permissions work. Your options are:

(1) block a domain from camera/mic access

(2) give a domain camera/mic access forever

(2) is what people pick. From that point on, that domain, even in an iframe, can turn on your camera/mic. That means if messenger.com or meet.google.com or zoom.com wanted to make zoom.com/embed/this/url/as/iframe/to/spy/on/people.html, they could, and then sell that as a service for advertisers to embed in their ads. At that point they have the permissions needed to turn on the mic/camera whenever they want, no more questions asked.

That bugs me.

Firefox at least has "allow camera access once".

I don't know what a better solution is. One might be mic/camera permission per URL instead of per domain. I suppose that's not really any different if you can just append ?dothespything=true

Another might just be that it always asks. It's not like the phone feature of a smartphone has the option to just always answer immediately. Instead you choose whether or not to take the call. Does it need to be any different on a webpage?

PS: I like that webpages have this feature. I like that I can video chat/conference in a sandboxed environment instead of having to install a native app that can spy on me in other ways too. I just wish this permission issue worked differently.


Safari has only two options. One is to block a domain. The other is to remember that you allowed it for one day. I think this is probably the best compromise.


I have no idea about the implementation details, but what if the controls to turn the camera/mic on and off weren't in the browser or app at all but in the OS? A big red button on the taskbar that you can toggle for a given application or URL on demand. It would be great if the OS logged access to the devices too.

Browsers are over-trusted for things like this and should never have direct access. After a crash, or when the browser was shut down and restarted with all the old tabs, Google Meet would open up your meeting again, hidden behind a bunch of other tabs. No one was ever on them long after the meeting, but it was freaky. Thankfully this isn't the behaviour I've seen recently.


I agree it would be better as an OS-level thing, and personally I'd want the option to have to grant camera access every single time an app asks. Unfortunately I'm not sure how that would be enforced for most apps. Take a browser: mine runs pretty much until I reboot, so if I gave it permission once it would keep that permission until I quit. So it seems like the browser needs a permission system on top of the OS's: the OS is per app, the browser is per domain/page/tab or something.


I think a simple camera icon in the status bar that appears when an app is using the camera, which the user can toggle on/off, would do it. When toggled off, the camera just starts feeding black frames to the app. Have the icon disappear once no app holds the device handle open.

This isn’t a difficult feature to design and a pretty simple one to implement at the OS level.


Because it would never work everywhere, and it is not for browsers to specify how OSes are supposed to work.


Doesn't Chrome already block third-party iframes from those permissions?

https://sites.google.com/a/chromium.org/dev/Home/chromium-se...


Well, I guess it depends on who is controlling the iframe; it just requires an allow="microphone; camera" attribute.


Good analyses, but I was expecting more discussion of how best to work with (conceptual) state machines when writing code.

I'm no expert on the matter, but there are code-generative tools out there that help make state machines very explicit: you describe the state machine, and the system generates the code. This should help avoid transitions you failed to consider, which may be of real value, since such unexpected transitions can pose a serious security issue when the state machine is implemented 'by hand' in code. More so than with many design patterns, it seems to make good sense to use a code-generative approach for state machines.

This is of course related to how regex works, as regex is implemented with state machines, but state-machine code-generator systems might not make use of the usual regex syntax.

Related reading: Turning vaguely reassuring finite-state machines into regular expressions, https://news.ycombinator.com/item?id=25496045
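
Not a real generator, but to make the "describe the machine, generate the code" idea concrete, here's a rough Rust sketch where the machine is written down as a declarative table and a generic runner interprets it (the call-flavoured states and events are made up):

  #[derive(Clone, Copy, PartialEq, Debug)]
  enum State { Idle, Ringing, Connected }

  #[derive(Clone, Copy, PartialEq)]
  enum Event { IncomingCall, Answer, HangUp }

  // The whole machine is this table; a generator would emit the dispatch code,
  // but even interpreted, anything not listed here simply cannot happen.
  const TRANSITIONS: &[(State, Event, State)] = &[
      (State::Idle,      Event::IncomingCall, State::Ringing),
      (State::Ringing,   Event::Answer,       State::Connected),
      (State::Ringing,   Event::HangUp,       State::Idle),
      (State::Connected, Event::HangUp,       State::Idle),
  ];

  fn step(current: State, event: Event) -> Option<State> {
      TRANSITIONS
          .iter()
          .find(|(from, ev, _)| *from == current && *ev == event)
          .map(|(_, _, to)| *to)
  }

Anything you forgot to write down comes back as None, which is exactly the class of transition-you-never-considered this style forces you to confront.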


That's one of the reasons I love ADTs and pattern matching in systems code so much.

For instance in Rust, where both Self and Event are enums (the sketch assumes `use State::*;` and `use Event::*;` are in scope, and that State implements Default so `std::mem::take` can move the old value out of `self`):

  fn on_event(&mut self, event: Event) {
    // Take the old state by value so its data can move into the new state.
    *self = match (std::mem::take(self), event) {
        (State1(data), Event1(args)) => {
            // ... process Event1 while in State1 ...
            State2(new_data)
        }
        // All other events in State1 are ignored
        (State1(data), _) => State1(data),

        (State2(data), Event1(args)) => {
            // ... process Event1 while in State2 ...
            State3(new_data)
        }
        // ... pattern match the rest of the matrix ...
    }
  }
will do a great job of highlighting missing transitions in the state machine's transition matrix.


My favorite rust state-machine-ism is consuming self by value when you transition. In combination with generics and From, this allows you to ensure that states are only ever entered into via the transitions you encode by implementing the trait, and because the previous state is totally consumed it makes the transitions super clear.
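
Roughly, a minimal sketch of that pattern (the Idle/Active states and the peer field are made up): each state is a type parameter, and From is the only way to construct the next state, consuming the previous one.

  struct Idle;
  struct Active { since_ms: u64 }

  struct Call<S> {
      peer: String, // data carried across every state
      state: S,
  }

  // The only way to reach Call<Active> is through this transition,
  // and it consumes the Call<Idle> entirely.
  impl From<Call<Idle>> for Call<Active> {
      fn from(call: Call<Idle>) -> Self {
          Call { peer: call.peer, state: Active { since_ms: 0 } }
      }
  }

  fn demo() {
      let idle = Call { peer: "alice".to_string(), state: Idle };
      let active: Call<Active> = idle.into();
      // `idle` has been moved; nothing can keep using the stale state.
      let _ = active;
  }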


I started off by doing that, but ultimately didn't like it for my use cases.

* It didn't seem to grant a lot of benefits over pattern matching (errors were pushed to compile time checks in both cases).

* It had larger cognitive overhead, since the allowed transitions either were scattered all over the source instead of being in one match statement, or weren't colocated with their structs.

* It didn't seem nearly as good at attacking the combinatorial complexity you see in these matrices as pattern matching is.


It's definitely not a great fit if you have a ton of different state transitions, since the From impls add a lot of boilerplate. But I've found it perfect for situations where there are only a few very well-defined transitions, like the Raft implementation suggested in the blog post I linked in the parent reply.


I'd love to read a blog post detailing exactly how this is done.



Much appreciated, thank you!


Could you explain a little more? That sounds really cool.


Here's a good blog on the topic: https://hoverbear.org/blog/rust-state-machine-pattern/. The last implementation of Raft uses this pattern.


Not Rust, but I often find that with ADTs + pattern matching I end up with a state machine without even attempting to write one.

ADTs + pattern matching make it a very easy pattern to implement. If your language supports exhaustive pattern matching, it's also a very easy way to leverage the compiler to find bugs in your logic.


I'm certainly not an expert in this, but the big issues when not using a "real" state machine but e.g. a bunch of separate booleans representing various parts of the state (isLoading, isError, ...) are the following:

- you can be in an inconsistent state that should not exist, e.g. isLoading and isError at the same time

- state transitions that should not be possible can be performed if you make a mistake

The first one you can easily fix without a library by using a state enum instead of a bunch of booleans. The second one is probably better handled by a library that implements a declared state machine or state chart. In Javascript that would be e.g. xstate (https://xstate.js.org/).
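
A tiny sketch of that first fix, with made-up request states:

  // Separate flags allow nonsense combinations like both being true...
  struct RequestFlags {
      is_loading: bool,
      is_error: bool,
  }

  // ...while a single enum makes the impossible states unrepresentable.
  enum RequestState {
      Idle,
      Loading,
      Error(String),
      Loaded(Vec<u8>),
  }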

I found that simply thinking about the states and writing them down as a state machine is already very helpful in avoiding certain kinds of issues that arise when you just do this on the fly.


This gets greatly compounded when you have a lot of states. As an easy example, consider the "state machine" for a rubik's cube. All told, this isn't insurmountable, but you are almost certainly not going to write down all of the allowed states with all of the possible transitions.

Which is not to say you can't make a program to reason about what the allowed states are. I started a probably bad attempt at that a while back at https://taeric.github.io/cube-permutations-1.html. (And, I just noticed the mathjax included on that page isn't loading. Almost certainly my fault... :( )


I think your answer boils down to "pick the right tool for the job". Of course state machines are cumbersome for systems with loads of different states and transitions.

But state machines are really useful when designing safety-critical systems for aeronautical, automotive or train components, especially if one uses timed automata for modelling.

It's possible to either generate the code from the model or use the model to automatically generate a complete test suite for the modelled system. Very useful for verification purposes.


Completely agree. I did not mean my post as an argument against state machines.


The permutations you're getting are Cayley's Theorem in action: https://en.wikipedia.org/wiki/Cayley%27s_theorem

Basically, there's a finite group that acts on the Rubik's cube, and every finite group is a subgroup of a permutation group if you squint at it right. (What's a 'finite group action'? The 'group' is a set of composable and reversible "moves." The 'action' means that the group moves are effectively state transitions acting on a set of states, and that the action is nice and associative: g(hs) = (gh)s. And 'finite' just says that there are only so many states possible...)

Really nice demo, by the way! Very fun to see the squares go flying.


I'll have to read up on Cayley's Theorem. Thanks for the link!

Glad others found the squares flying as much fun to watch as I did. :D


State charts avoid that problem to some extent: you don't explode the number of states like in a simple finite state machine. But that's a point where you certainly want to use a library.


Right. My assertion would then be that many things are modeled with a ton of possible states.

Granted, in the context of this story, I'm in complete agreement with you.


Might be worth reposting that link, it's an interesting topic on its own. Rubik's cube transformations as permutations is one of those "obvious" things that had never occurred to me before, plus the neat interactive demo at the bottom.


Would love to see feedback on it. I posted back when I wrote it. I think it was mostly lost to the noise.


If you implement your favourite variant of the state pattern, state machines also get a lot nicer to work with, depending on the task at least. You wouldn’t bother with it to track a button-press state in embedded code, but it’s nice when the state is complex, has some sort of “on tick” aspect, or there are a lot of states.

https://gameprogrammingpatterns.com/state.html
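
For reference, a rough sketch of that pattern (the Standing/Jumping states are made up): each state is its own type behind a common trait, and handling a tick may hand back a replacement state.

  trait State {
      fn on_tick(self: Box<Self>) -> Box<dyn State>;
  }

  struct Standing;
  struct Jumping { ticks_airborne: u32 }

  impl State for Standing {
      fn on_tick(self: Box<Self>) -> Box<dyn State> {
          // Kick off a jump on the next tick (purely illustrative).
          Box::new(Jumping { ticks_airborne: 0 })
      }
  }

  impl State for Jumping {
      fn on_tick(mut self: Box<Self>) -> Box<dyn State> {
          self.ticks_airborne += 1;
          if self.ticks_airborne >= 10 {
              Box::new(Standing) // land again after ten ticks
          } else {
              self
          }
      }
  }

  fn demo() {
      let mut state: Box<dyn State> = Box::new(Standing);
      for _ in 0..30 {
          state = state.on_tick();
      }
  }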


On a few occasions in C++ I have found that a few boolean flags are superior to an enum in code that evolves. What happened was that code changes effectively turned a deterministic state machine into a nondeterministic one. Modeling the relevant parallel states with bits is very natural then.


Ragel is a good tool for this: http://www.colm.net/open-source/ragel/

This can be built into languages as "typestate" but no one ever seems to get around to doing it. It's unfortunate because I think it's the only acceptable way to do OOP.


Heh, I mentioned Ragel in that other thread I linked to, although I see I messed up the link and linked to GraphViz instead.

I have to admit I've only very briefly dabbled with Ragel but yes it does seem a pity that this approach is so rarely used.


One of those code-generative tools is Ragel[1], which was at the center of an incident at Cloudflare[2], though Cloudflare did note that it was a bug in how they used Ragel, not in Ragel itself.

[1] https://www.colm.net/open-source/ragel/

[2] https://blog.cloudflare.com/incident-report-on-memory-leak-c...


This is what Interana does. You define the states, the transition rules, and what you want to compute (e.g. how many times was this pattern matched, what was the p50 duration between steps 2 and 5, etc.). It then compiles this into machine code, using LLVM, to actually scan the column store.


To be honest I wouldn't bother with an existing code generator. However I would definitely consider modelling your state machine and generating code from that. Also model your state machine hierarchically, especially if it has any kind of complexity. We did that with our NBD client, which has 96 states in 17 groups, generating C code: https://github.com/libguestfs/libnbd/tree/master/generator


> code-generative tools out there that help make state machines very explicit: you describe the state machine, and the system generates the code.

J's sequential machine builtin[0] implements something similar to this. It doesn't generate code, but it executes a state machine based purely on descriptions of states and transitions.

0. https://www.jsoftware.com/help/dictionary/d332.htm


tl;dr Software programmers continue to fail to grasp that "protocol" means "state machine". News at 11.

Even beyond just "use an actual state machine", programmers don't grasp that you need to think about:

1) What is my state now?

2) What are my inputs now?

3) What are my outputs now?

4) What is my state next?

5) Where/when do I transition from state now to state next and does it need to be atomic?

Conflating any of these results in bugs. And not using state machines explicitly always results in conflating these.

One of the most important overlooked concepts in protocols is that "time" is an input (interestingly, Carmack talks about this in game development, too). Timeouts and errors are states like anything else.
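
As a rough sketch of that last point (the call states, event names, and 30-second limit are made up), elapsed time arrives as an ordinary input and the timeout is an ordinary state:

  enum Event {
      Answer,
      Tick { elapsed_ms: u64 }, // time is just another input
  }

  enum CallState {
      Ringing { waited_ms: u64 },
      Connected,
      TimedOut, // the timeout is just another state
  }

  fn step(state: CallState, event: Event) -> CallState {
      match (state, event) {
          (CallState::Ringing { waited_ms }, Event::Tick { elapsed_ms }) => {
              let waited_ms = waited_ms + elapsed_ms;
              if waited_ms > 30_000 {
                  CallState::TimedOut
              } else {
                  CallState::Ringing { waited_ms }
              }
          }
          (CallState::Ringing { .. }, Event::Answer) => CallState::Connected,
          // Everything else is ignored in this sketch.
          (state, _) => state,
      }
  }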


> And not using state machines explicitly always results in conflating these.

Not true - (pure) functional programming also enforces that you do exactly that. Even more, its rule-set also enforces that time must be an input.


Most pure functional idioms for handling a protocol are state machines, some of them in disguise, others plainly so.

The ones that aren't state machines are very limited and often way too verbose.

In particular, FRP is worth noting as being exactly a set of states that change by means of events. It's a nice idiom for composing very complex state machines, and it isn't even disguised.


General state machines are useless as an abstraction: they're fundamentally noncompositional, which makes them impossible to reason about as soon as they get big enough. Many things can be represented as state machines, just as any program can be represented as a sequence of machine instructions; that doesn't mean that's a useful representation for working with.


I think it is the other way around.

FRP goes further than state machines, in that it forbids any side effects.

E.g. you could build a state machine that, when in a certain state and receiving a certain event, responds by printing the current time. A classical state machine will thus behave differently depending on when it receives the event. In FRP, however, you _must_ either parametrize the event with the current time, or have something _outside_ the state machine (such as an effect interpreter) that handles effects like printing the current time. A classic state machine does not have these restrictions.
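
A rough sketch of that second option (event, state, and effect names are made up): the transition stays pure and only describes the effect, and an interpreter elsewhere carries it out.

  enum Event { Ping { at_ms: u64 } } // time is carried in as data

  enum State { Waiting, Answered }

  enum Effect { PrintTime { at_ms: u64 } } // a description, not the I/O itself

  fn step(state: State, event: Event) -> (State, Vec<Effect>) {
      match (state, event) {
          (State::Waiting, Event::Ping { at_ms }) => {
              (State::Answered, vec![Effect::PrintTime { at_ms }])
          }
          (state, _) => (state, Vec::new()),
      }
  }

  // The impure part lives entirely in the interpreter.
  fn run_effects(effects: Vec<Effect>) {
      for effect in effects {
          match effect {
              Effect::PrintTime { at_ms } => println!("handled at {} ms", at_ms),
          }
      }
  }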


Every time a vulnerability like this comes out, especially if it's a company that some people hold a grudge against (and most companies have those), people say "See, that's why you can't trust Company With Vulnerability, I never use their software."

But if you actually pay attention you'll realize that vulnerabilities happen everywhere. Not all software is created equal, some might be more secure and have fewer vulnerabilities than others. But a single exposed vuln, no matter how severe, is pretty much never enough to judge which software is which.

I remember hearing about this exact bug in FaceTime, but I had no idea that, as OP demonstrated, an analogous bug affected pretty much every similar popular A/V group-chat app. Including the vaunted Signal.

Did this one OP find and reveal it in all of them? Wow.

Also, I really wonder how many of these the NSA knew about before we did, but who can say (except the NSA).


This extends beyond vulnerabilities to bugs in general too. No company, or programmer for that matter, is immune to making mistakes. Some people like to think that if you're a very good programmer you'll magically write bug-free code, but that just doesn't happen. This is why extensive testing and having multiple eyes look at something is good practice. And even then, with the multiple layers of redundancy and everything, you still see AWS/GCP/Azure having downtime, you still see bugs and crashes and exploits, from the small startups to the top tech companies.


I feel like we will continue to run into these kinds of issues until the end of time unless we can figure out: 1. How to precisely describe the properties we want the final system to have, 2. How to 'compile' from a higher-level language/layer into a lower-level language while maintaining all those properties. At the least this will require us to: 3. Define the precise semantics of each layer and enforce that every construction in and implementation of that layer adheres to it strictly.

Properties like "no private data is accessible until the user consents", "the final result of the algorithm is independent of the order that messages are received", "decryption must be constant-time and the program must not have any observable effect until the message is known to be verified", etc. Obviously these properties as stated are incomplete and imprecise, but that's my point: we have to figure out how to make these kinds of statements precise enough.

Layers/languages should verifiably maintain these properties through every translation via paths like: eula/privacy policy, requirements, UX, UI, state machine, source code, AST, IR, ASM, machine code. Every one of these layers should have a well-defined execution model that is isomorphic with other layers, modulo metadata. For example: source code === AST [+ comment/whitespace/filesystem metadata], AST === IR [+ name/desugaring/applied-optimizations metadata]. All undefined behavior at every layer must be exterminated by definition or avoided by construction.

One problem is that we throw away too much metadata when moving between layers which makes the task of maintaining these properties through many layers extremely difficult. For example, trying to implement constant time algorithms in a high-level language is nearly impossible to hold over time (robust to changes in source and compiler) because the compiler/optimizer has forgotten (or was never told) what parts must be done in constant time and to not use the partial results of the algorithm until it has completely finished. To implement this, all candidate implementation strategies and optimizations in the lower layer must be identified by whether they preserve constant time execution or not, and optimizer/compilation unit (CU) boundaries inserted to prevent an over-eager optimizer for a non-constant-time CU to observe partial results. Etcetera for every other desirable property and every layer.


A good path to this might be deeper investigation of logic/constraint programming. It's a good tool for formally encoding known, discrete constraints like you described above.


IMO an interesting project in this space is mm0 / MetaMath Zero: closing the loop in proof verification down to verifying the machine code of the verifier. It goes from first-order logic to Peano arithmetic to a model of x86 to a model of the verifier written in x86. Interestingly, it demonstrates that verification of a compact proof can be performed in linear time (!) if the proof is structured correctly. -- https://github.com/digama0/mm0

The fact that proof checking can take linear time (though not proof-finding), and the fact that it incorporates so many 'layers' has emboldened my opinion that such a thing as I described above is possible and has enormous potential.


Agreed! I've been paying attention to constraint/logic programming (CLP) recently, but I haven't found the right thing yet / haven't invested enough time into it to find the right thing. In my very-layman opinion, CLP-based systems need three things to properly take off: 1. some kind of 'type system'-like thing that enables robust symbolic reasoning over pure relations in any mode, 2. a critical mass of defined layers that is 'deep enough' to reach the hardware, 3. a way to express verified alternatives and preference in the order they are applied.


This discussion reminds me of the Alloy[1] language and the associated analyzer.

The analyzer transforms an Alloy model and some formal statement about it (e.g. "a caller can never connect to callee without the callee's consent to that caller connecting") into a giant SAT problem, which it feeds to a SAT solver looking for counterexamples, which it can map back to a series of valid state transitions in the original model.

Very cool stuff, at least 20 years ago when I was playing with it.

[1] http://alloytools.org/about.html


It's still useful, though I haven't used it in a while. Alloy was my introduction to properly modeling systems (versus largely paper-based which has limitations on size/scope).


It was interesting to me how many of these were related to making async calls to adjust state resulting in race conditions. In many cases the argument was made that async calls were not justified. Many times we follow a particular call pattern in order to keep things consistent without fully considering whether it's necessary and/or safe.

Also, I wasn't aware of the tool named Frida[0] that she kept referencing, so I had to look that up.

[0] - https://frida.re/


Solution is this:

Assign UUID to each async call, including retries.

Or disallow retries.

Sprinkle in some optimistic update if you want.

Keep track of results, and mirror the lower-level timeout (75 seconds for HTTP).
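
A rough sketch of the ID idea (a plain counter stands in for a UUID to keep it dependency-free; all names are made up): every outstanding call is tracked by its id, and responses that don't match a pending id are dropped as stale.

  use std::collections::HashMap;

  struct Pending { issued_at_ms: u64 } // room for per-call bookkeeping

  struct Caller {
      pending: HashMap<u64, Pending>,
      next_id: u64,
  }

  impl Caller {
      fn start_call(&mut self, now_ms: u64) -> u64 {
          self.next_id += 1;
          self.pending.insert(self.next_id, Pending { issued_at_ms: now_ms });
          self.next_id
      }

      fn expire(&mut self, now_ms: u64, timeout_ms: u64) {
          // Mirror the lower-level timeout: forget calls that have waited too long.
          self.pending.retain(|_, p| now_ms - p.issued_at_ms < timeout_ms);
      }

      fn on_response(&mut self, id: u64) -> bool {
          // Stale or duplicate responses (unknown id) are simply ignored.
          self.pending.remove(&id).is_some()
      }
  }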


That's why I get upset when I look at complex code that doesn't clearly specify a whitelist of state transitions. It's easy, god damn it: figure out your protocol's state transitions and write them down in your code somewhere. Then every time you change state, just check it against the whitelist of state transitions that you have.
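
Something as simple as this sketch would do it (the states and allowed transitions are made up):

  use std::collections::HashSet;

  #[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
  enum State { Idle, Ringing, Connected, Ended }

  fn whitelist() -> HashSet<(State, State)> {
      use State::*;
      // The protocol's legal transitions, written down in one place.
      [
          (Idle, Ringing),
          (Ringing, Connected),
          (Ringing, Ended),
          (Connected, Ended),
      ]
      .into_iter()
      .collect()
  }

  // The only way to change state: every change is checked against the whitelist.
  fn set_state(current: &mut State, next: State, whitelist: &HashSet<(State, State)>) {
      assert!(
          whitelist.contains(&(*current, next)),
          "illegal transition {:?} -> {:?}",
          *current,
          next
      );
      *current = next;
  }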


One could go so far as to say that a finite state machine with a poorly defined 𝛿 is itself poorly defined.


As much as I try to distance myself from Google as a corporation I really appreciate project zero and their write ups. Always learning something new from them.


For distributed systems and protocols, you have multiple state machines acting concurrently. Such systems can be modeled by Petri nets:

https://en.wikipedia.org/wiki/Petri_net


State machines are incompatible with concurrent systems. For example there is no nondeterministic state machine for the following program:

   IndeterminateLargeNumber:[]→Natural ≡
       []↦
          aCounting←Counting.[], // Bind aCounting to a newly created Counting
          aCounting.go|||  // Send aCounting a go message while concurrently
          aCounting.stop   // sending aCounting a stop message

    Counting:[ ]→Interface go→Void, stop→Natural 
       ≡ []  ↦↦   // constructor has no arguments 
          count:=0,   // The variable count is initially 0
          continue:=False|  // The variable continue is initially False
          go ↦      // When a go message is received: 
            continue cases   // Cases for continue are as follows:
                     True then  // If continue is True,  
                        count:=count+1;         // then increment count and afterward
                        Hole (This Counting).go      // in a hole in the region of mutual exclusion,  send a go message this instance of Counting
                     else Void     // If continue is False, then return Void
          stop ↦    // When a stop message is received:
              continue:=False;       // Assign continue the value False and then
              count     // return the value of count
See the following for an explanation:

https://papers.ssrn.com/abstract=3459566


No mention of behavioural types here in the comments? I'm surprised. Let me add this, as it's related.

https://www.di.fc.ul.pt/~vv/papers/ancona.bono.etal_behav-ty...

As the above is a very academic viewpoint, let me also add a reference to a real-world implementation of some of the concepts:

https://alcestes.github.io/effpi/

Enjoy! :-)


Was hoping to see more than just chat apps. For example, see also:

https://www.libssh.org/security/advisories/CVE-2018-10933.tx...

https://mitls.org/pages/attacks/SMACK


Is there a more detailed description of the logic of these calls to better understand the problem?


That was a really good read. I appreciate the detail that the writer went into describing his investigative process.

Let the warning be heard loud and clear: state is the devil


State isn't the devil. Nor is it a necessary evil. State is essential to many, if not most, interesting applications.

Poorly managed state or ad hoc state machines are the devil. Once you model your system properly as a state machine, many of these issues fall out during the initial design and development (not to say they won't happen anyways, but this mitigates many of the issues). If you design your system so that whatever variables define your state are mutated together to ensure your invariants hold, this helps a lot with addressing poorly managed state. And if you can (you can't always) use a system which is aware of state machines as part of the design then you can mitigate the ad hoc state machine part.


I would go further and say that the modeling isn't even a big deal (in terms of state machine complexity) as long as you are enforcing immutability at your domain boundary. If only domain methods are able to mutate state, it's a lot easier to find where bugs might be hiding. If any consumer of your domain model can mutate anything in any way it sees fit and then invoke some generic UpdateState(model) method, then you have a much bigger problem.

Put differently, as long as you have good accountability of all the things that can change the state and are comfortable with how that set of things operates, then you are probably in a decent position.
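
A rough sketch of what that accountability can look like (the call model and method names are made up): the state field is private, so the only things that can change it are the domain methods, each enforcing its own preconditions.

  mod call {
      enum CallState { Ringing, Connected, Ended }

      pub struct Call {
          state: CallState, // private: consumers can't mutate it directly
      }

      impl Call {
          pub fn incoming() -> Self {
              Call { state: CallState::Ringing }
          }

          pub fn answer(&mut self) -> Result<(), &'static str> {
              match self.state {
                  CallState::Ringing => {
                      self.state = CallState::Connected;
                      Ok(())
                  }
                  _ => Err("can only answer a ringing call"),
              }
          }

          pub fn hang_up(&mut self) {
              self.state = CallState::Ended;
          }
      }
  }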


That's fair, and is similar to what I would've written if I'd elaborated more on my "poorly managed state" comment. If you apply good modular/OO principles you at least constrain where state is managed in a way that makes writing, maintaining, and testing your system more tenable without an explicit model.


* her[1] investigative process

[1] https://twitter.com/natashenka


It's an easy mistake to make because the top of the article says

> Posted by Natalie Silvanovich

but the bottom says

> Posted by Ryan

The latter is just the blogspot username though. It's quite scruffy.



