Hacker News new | past | comments | ask | show | jobs | submit login

A JS engine is a high-risk, high-reward problem for Rust. High-reward because JS engines are, to your point, a major source of vulnerabilities; high-risk because JS-engine theory is rather outside of Rust's wheelhouse.

One class of vulnerabilities in JS engines is use-after-move. A raw pointer is extracted, an allocating function is called (triggering a GC), then the raw pointer is used, pointing into nowhere. It's awkward to express in Rust that a function may modify state inaccessible from its parameters.

A second class of vulnerabilities is type-confusion. A value is resolved to (a pointer to) some concrete type, but some later code mutates the value. Now the concrete type is wrong. Again this possibility is awkward to express in Rust.

The problem is complicated by the NaN-boxing and JIT aspects of JS engines, which interfere with Rust's tree-ownership dreams.

People smarter and way better at Rust than myself are working on it; I'm excited by the prospect of novel solutions that can defeat entire classes of problems.




I'm curious what proportion of vulnerabilities in JS engines are due to mis-generated JIT code vs direct errors in their compiled code. Rust allows you to express some nice properties not always directly related to memory safety (e.g. checked consumption, convenient and safe ADTs), but unless there is a novel application of these facilities to the structure of a JIT engine it won't help a ton with the former kind of vulnerabilities.

I'm excited to see a practical programming language that implements full dependent typing; languages like Idris are actually really good at dealing with precisely the kinds of situations you mention.


JS engines have many parts implemented natively, which may be called from JS, and in turn call back into JS. An example is CVE-2015-6764: this grabs an array length, which quickly becomes stale, because accessing one of the array's elements invokes a custom toJSON which in turn modifies the array's length.

This feels like a hopeless problem; can any of Rust's powers be brought to bear here? Could Idris?


F* is probably the best equipped at the moment to deal with situations like that CVE at the moment, since its library has a concept of heaps. Basically, any function that can access or modify the "heap" (which in F* is just a set of pointers that are guaranteed to point to a value and not alias any others outside of the same heap) must specify what properties of the state of the heap must be true at entry, and what properties are true afterwards. So in pseudo-types (, the functions for accessing a JavaScript array would be something along the lines of

    fn arrayLength(x: JSArray*) -> n: uint (requires nothing) (ensures length of x = n, changes nothing)
    fn callToJson(x: JSValue*) -> JSValue* (requires nothing) (ensures nothing)
    fn arrayAccess(x: JSArray*, m: uint) -> JSValue* (requires length of x > m) (changes nothing)
(NB: F* syntax doesn't look much like this, but I'm guessing this will be readable to more people on HN)

The stuff in parentheses after each function type are the preconditions and post-conditions respectively. So if you do something like:

    let x = arrayLength(someArray)
    for i in range(x) {
      let element = arrayAccess(x-1)
    }
It will typecheck just fine. But if you add the call to toJSON:

    let x = arrayLength(someArray)
    for i in range(x) {
      let element = arrayAccess(x-1)
      let transformed = callToJson(element)
      // ERROR: (requires length of x > m) not satisfied for all runs of loop body
    }
Since callToJson cannot ensure any property of the heap after it runs. In this way you can elide range checks when needed for performance without worrying that you've sacrificed safety.

Covering all the cases a JS engine would need without adding 10 million lines of proofs to the size of SpiderMonkey is still an open problem, but this general approach (known as Hoare Logic[1]) is very enticing, and the type systems that languages like Idris and F* have are definitely the closest to realizing it in more places. There are real software engineering efforts using descendants of Hoare logic like TLA+ (notably Amazon IIRC), but it's rare to see it even in huge projects like browsers.

It's also critical to note that the heap concept of F* is not a totally fixed part of the language; most of the specification of how heaps work are actually in the standard library. That level of flexibility is what I think makes these languages likely to become capable of tackling these problems: something like a JS engine or any optimizing compiler is exactly the kind of place where being able to come up with your own type-level verification model is worth the effort.

[1]: https://en.wikipedia.org/wiki/Hoare_logic




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: