JS engines have many parts implemented natively, which may be called from JS, an...

johncolanduoni · on Jan 25, 2019

F* is probably the best equipped at the moment to deal with situations like that CVE at the moment, since its library has a concept of heaps. Basically, any function that can access or modify the "heap" (which in F* is just a set of pointers that are guaranteed to point to a value and not alias any others outside of the same heap) must specify what properties of the state of the heap must be true at entry, and what properties are true afterwards. So in pseudo-types (, the functions for accessing a JavaScript array would be something along the lines of

    fn arrayLength(x: JSArray*) -> n: uint (requires nothing) (ensures length of x = n, changes nothing)
    fn callToJson(x: JSValue*) -> JSValue* (requires nothing) (ensures nothing)
    fn arrayAccess(x: JSArray*, m: uint) -> JSValue* (requires length of x > m) (changes nothing)

(NB: F* syntax doesn't look much like this, but I'm guessing this will be readable to more people on HN)

The stuff in parentheses after each function type are the preconditions and post-conditions respectively. So if you do something like:

    let x = arrayLength(someArray)
    for i in range(x) {
      let element = arrayAccess(x-1)
    }

It will typecheck just fine. But if you add the call to toJSON:

    let x = arrayLength(someArray)
    for i in range(x) {
      let element = arrayAccess(x-1)
      let transformed = callToJson(element)
      // ERROR: (requires length of x > m) not satisfied for all runs of loop body
    }

Since callToJson cannot ensure any property of the heap after it runs. In this way you can elide range checks when needed for performance without worrying that you've sacrificed safety.

Covering all the cases a JS engine would need without adding 10 million lines of proofs to the size of SpiderMonkey is still an open problem, but this general approach (known as Hoare Logic[1]) is very enticing, and the type systems that languages like Idris and F* have are definitely the closest to realizing it in more places. There are real software engineering efforts using descendants of Hoare logic like TLA+ (notably Amazon IIRC), but it's rare to see it even in huge projects like browsers.

It's also critical to note that the heap concept of F* is not a totally fixed part of the language; most of the specification of how heaps work are actually in the standard library. That level of flexibility is what I think makes these languages likely to become capable of tackling these problems: something like a JS engine or any optimizing compiler is exactly the kind of place where being able to come up with your own type-level verification model is worth the effort.

[1]: https://en.wikipedia.org/wiki/Hoare_logic