Another Big Milestone for Servo: Acid2 (blog.mozilla.org)
345 points by dherman on April 17, 2014 | 96 comments



Servo is the kind of project that launches a thousand research papers. Some of the early results are staggering and the project is still just getting going. It is a great example of doing serious research to practical ends.

Some examples:

- firstly, the entire foundation, Rust, is itself an ambitious research project that solves many long-standing problems in the domain.

- Servo has, or has plans for, parallelism (combinations of task-, data-parallelism, SIMD, GPU) at every level of the stack.

- The entirety of CSS layout (one of the most difficult and important parts of the stack) is already parallelized, and it's fast.

- It puts all DOM objects in the JS heap, eliminating the nightmarish cross-heap reference counting that historically plagues browser architectures (this is part of Blink's "oilpan" architecture).


> - It puts all DOM objects in the JS heap, eliminating the nightmarish cross-heap reference counting that historically plagues browser architectures (this is part of Blink's "oilpan" architecture).

Does Gecko not already do this? Certainly Presto did, and (I believe) Trident does.

But to add another bit of research into the mix, and what I think is arguably the most important point:

- Builds on all the work around HTML(5) and CSS 2.1, each the best part of a decade of work, both aiming to get browsers actually implementing to specification, and specifying behaviour in sufficient detail (with minimal undefined behaviour) that a web page cannot easily distinguish two user agents. By specifying the web platform as it actually exists, it becomes far more practical for new browsers to enter the market (compared with the previous hellish situation, where the hardest part of developing a browser was reverse-engineering the existing browsers with enough marketshare that web developers make sure their pages work in them). If this is shown to be viable, suddenly we have a truly open platform!


Here's another really fantastic reason for the research Servo is doing: by using concurrency and parallelism pervasively, Servo is discovering and mapping out all the places where sequentialism is baked into the web standards themselves. With any luck, their experiences will help to guide the next generation of web standards and prevent them from being inadvertently hostile to parallelism.


This is certainly part of the intent, and it has already started to happen in some limited ways. For example, Bobby Holley noticed that certain document.domain changes could make implementing parallelism hard and worked to address this in the spec.


I've been following Servo for a long time and had never begun to consider this. How incredibly exciting it is that they could actually get to the point where the only thing holding back their performance wasn't even in the code.


floats :(


The float taketh away and the clear: both giveth back again:

http://pcwalton.github.io/blog/2014/02/25/revamped-parallel-...


DOM objects are reference counted in Gecko, and there is a cycle collector to collect JS/C++ cycles. They aren't allocated in the JS heap.

WebKit is similar, but without the cycle collector.


Ah — I'd presumed since you detected cycles that it was in the GC'd JS heap, as opposed to having a separate cycle collector.


The main improvement that Servo's architecture brings over things like Oilpan (and, it sounds like, some other browser engines too) is that our GC is single-threaded, despite the browser itself being highly multithreaded, and it's precise instead of conservative. (Well, to be more exact, it will be precise when we upgrade our copy of SpiderMonkey. Also note that Oilpan is only mostly conservative, in that it can do precise collections if no C++ is on the stack.)

Single-threaded GC is nice because it doesn't have to stop any other threads in order to run; it also tends to reduce the size of the heap that must be scanned. For example, while script is collecting garbage, the layout and painting threads can run unimpeded. Of course, using a single-threaded GC in a multithreaded app means you have to be extremely careful in order to avoid the GC freeing things you don't want to be freed, and that is why it is seldom done. Our trick is to use the Rust type system to enforce that this is done correctly at compile time, which is something that is not possible in C++.
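
For a rough feel of how that looks, here is a minimal sketch in current Rust syntax (not Servo's actual DOM binding types, just an illustration of how a non-Send handle keeps GC-managed data confined to its owning thread):

  use std::rc::Rc;

  // Hypothetical stand-in for a handle to a GC-managed DOM object. Because it
  // wraps an Rc (which is neither Send nor Sync), the compiler refuses to let
  // it leave the thread that owns the single-threaded heap.
  struct DomHandle {
      node: Rc<String>, // placeholder for real GC-managed data
  }

  fn main() {
      let handle = DomHandle { node: Rc::new("<div>".to_string()) };

      // Fine: use the handle on the owning (script) thread.
      println!("{}", handle.node);

      // Uncommenting this is a compile error ("Rc<String> cannot be sent
      // between threads safely"), so the GC never has to worry about other
      // threads holding references it can't see.
      // std::thread::spawn(move || println!("{}", handle.node));
  }

The real machinery (rooting, tracing and so on) is much more involved; this only shows the thread-confinement part.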


Here is a presentation (and its slides) describing the implementation details of Firefox's cycle collector:

https://air.mozilla.org/gecko-cycle-collector-intro/

http://mozilla.pettay.fi/cycle_collection_optimizations/


> (I believe) Trident does.

I doubt it; Trident's DOM is made of COM objects. Not only are the heaps separate, the DOM heap is page-independent, and in the old days you could leak DOM objects for whole sessions if you created a circular reference between JS and the DOM (for instance, you set up an event handler closure: the DOM element would reference the closure and the closure would close over the DOM element). There were tools (Drip and later sIEve) whose sole job was to wrap MSHTML, reload pages and detect objects remaining from the page you'd just left.


Trident moved away from using COM for JS/DOM communication with the introduction of Chakra in IE9. I believe as of IE9 they allocate DOM objects as host objects in the JS heap.


You're describing the situation 10 years ago. It's not at all clear whether that still holds today.

Also note, just because objects can have a COM-accessible reference doesn't mean they always need to have one. It's quite possible they're just wrappers created on demand.


> that a web page cannot easily distinguish two user agents.

Don't need to, to be more precise.


Both, really — any vector that can be used to distinguish UAs often is (even very dubious ones — HTMLUnknownElement implying Gecko, for example!), and that's harmful to the open web.


>is already parallelized, and it's fast.

benchmarks?


Benchmarks at this point would be pointless: Servo doesn't do so many of the things Gecko does that any comparison would be meaningless.

E.g. Gecko renders every element on the page according to spec and spends 10 seconds, while Servo renders like two of them in a millisecond (because it stubs out the others).

To make matters worse, Gecko is super-optimized for browser benchmarks (e.g. caching previous results of functions like sin(x)), so Servo will look way slower than Gecko, or maybe even slower than Firefox 6.


If I'm not mistaken, it was never planned for Servo to have its own Javascript engine; it just does layout. Most benchmarks just test JS execution time, so you wouldn't even run those ones.


I don't know what this means, but we're totally implementing the DOM and JavaScript execution using SpiderMonkey.


With the bad press Mozilla has had the past few weeks it is easy for people to forget about some of the awesome things Mozilla are working on such as Rust and Servo. I really like the look of Rust and feel it might be the future native language for high performance applications. It is very exciting!


Indeed. Brendan's resignation was dismaying and distracting, but we haven't stopped working on stuff.

Just after Brendan resigned, I read a comment somewhere to the effect of "I guess that's the end of Firefox OS". Um, no. A major project run by a company of 1,000 employees (and many volunteers) doesn't stop because one person left, even if that person is at the top. Especially when it's progressing well.


This is awesome. I wonder if there is a more constrained web rendering engine somewhere, something where the goal, rather than 'render everything we've ever seen', is 'render the following HTML "standards" correctly' (or at least predictably). I was looking for something like this for a modern-day sort of serial terminal thing.


We try to support the newer things that are generalizations of the older things. One example of this is doing all your list styling with a user agent stylesheet using generated content, instead of hardcoding how list bullets and such work. Also, in some sense, we are doing what you want simply because we don't have a lot of features yet and we prioritize things that are important and things which would have some effect on parallelism or architecture.



This is one of those only on HN moments. From grandparent's web page:

"A long time ago, in a galaxy far far away, I worked for a company called Sun Microsystems. One of the things I did there was to join a renegade band of engineers who were off in Palo Alto working on a technology that nobody within Sun could see any possible use for. That technology was of course Java."

http://www.mcmanis.com/chuck/java/


Out of interest, what are the reasons you don't want to be able to 'render everything we've ever seen'?


It is the complexity tradeoff. I built a really simple widget using an LCD, to which you can send "standard" HTML (as in the base DTD) over a serial line. Built using a Cortex M3, an SPI-connected LCD, basically libft2, libpng, and a minimal drawing package. If you look at LED signs, this is one of those, but with a screen instead of LEDs and a more 'standard' layout engine. Three fonts (serif, sans, fixed), three weights (normal, bold, italic), old skool HTML (no CSS, no script) and a simplified img tag. The most complex thing in the code is the table rendering code.

Basically if you want to throw up a quick status display, it can do quite a bit.

So I thought I would update it to HTML4, maybe a bit of ECMAScript for animations. Did a couple of runs at it and found most existing code had a bunch of stuff in it so that it could render "commonly occurring, but non-standard" HTML pages. In order to fit into a limited-memory-footprint system I was ripping things out left and right, but stuff has more side effects than Cymbalta. So I tabled the project.

Starting from the premise of rendering "just the stuff we promise we can render" allows me to add stuff until I run out of memory and then stop.


I worked for a medical software company which had to deal with incoming digital images (DICOM) from all kinds of manufacturers (Siemens, GE, Philips, and tons of other smaller players).

I once heard one of our lead developers say that 80% of the code we wrote was to work around problems in everyone else's implementations. The DICOM format has a bunch of predefined fields and supports custom fields that manufacturers can use, but a lot of manufacturers would, for no reason I could figure out, use fields incorrectly a surprising amount of the time. They would store one of the most basic pieces of information in the wrong field, and then store that data nowhere, or put it in a custom field, or store it in the wrong format, etc.

Enough manufacturers doing 95% of things right and 5% of things wrong, making everyone else jump through hoops to handle their broken behaviour, and suddenly your code grows massively. Sure enough, we had to write an entire system for remapping fields from certain machines so that incoming data from broken implementations would be somehow magically coerced into being correct (either by simply copying data over to fields, or more advanced post-processing).
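
A heavily simplified sketch of what such a remapping layer can look like (the device name, tag numbers and semantics here are invented purely for illustration):

  use std::collections::HashMap;

  // For data coming from a known-misbehaving device model, copy values out of
  // the (wrong) tags it used into the tags the rest of the pipeline expects.
  fn remap_fields(model: &str, fields: &mut HashMap<(u16, u16), String>) {
      if model == "VendorX Scanner 3000" {
          // This hypothetical vendor puts the study description in a private
          // tag; copy it into the standard slot if that slot is empty.
          if !fields.contains_key(&(0x0008, 0x1030)) {
              let private = fields.get(&(0x0029, 0x1010)).cloned();
              if let Some(v) = private {
                  fields.insert((0x0008, 0x1030), v);
              }
          }
      }
  }

  fn main() {
      let mut fields = HashMap::new();
      fields.insert((0x0029, 0x1010), "CHEST CT".to_string());
      remap_fields("VendorX Scanner 3000", &mut fields);
      assert!(fields.contains_key(&(0x0008, 0x1030)));
  }

Multiply that by every quirky device model you have to ingest from, and the code grows exactly the way described above.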

The less time browsers spend wondering what to do with inherently, irredeemably broken HTML, the faster they can render content which actually does work properly.

Unfortunately, that only works in controlled environments, like store kiosks, intranets, etc. Otherwise you end up choking on the 95% of the content which is only 5% broken.


Your story is a perfect example of the dark side of Postel's law: "Be conservative in what you send, be liberal in what you accept." The people who care about writing software that Just Works™ end up maintaining software that is crushed under its own gravity. :)


I think the wisdom of Postel's law is questionable at best.


Apparently what we think of as Postel's law is a bastardization. http://erlang.org/pipermail/erlang-questions/2014-March/0781....


Added complexity and baggage for his use case?


I want Rust scripting support in Servo.

  <script type="text/x-rust" src="foo.rs">
Since Rust is a safe language this should be possible without compromising security, though I don't think anyone's yet attempted to write a JIT compiler for Rust. Has the Servo team considered this as a possibility?


Rust is safe in the sense that it avoids many categories of bugs; it's not safe in the sense that you can run untrusted code without causing security issues. For instance, you don't want a script in a webpage to be able to read/write arbitrary files on your computer, but a Rust program would be allowed to do that. You would have to restrict a lot of capabilities before it would be safe for your browser to run untrusted Rust scripts.
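
For example, the following is 100% "safe" Rust in the memory-safety sense, with no unsafe blocks anywhere, yet it's obviously not something a browser could let an untrusted page run (the path is just for illustration):

  use std::fs;

  fn main() {
      // Memory-safe, but not capability-safe: nothing in the language stops
      // safe Rust from touching the filesystem, the network, spawning
      // processes, and so on.
      if let Ok(contents) = fs::read_to_string("/etc/passwd") {
          println!("first line: {:?}", contents.lines().next());
      }
  }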


I'd imagine that a Mozilla project would just focus on compiling to asm.js...


There are a lot of problems with asm.js: the heap can't be resized after creation, the code size is really large (esp. with things like C++ templates, which are duplicated for each specialization), threads don't work, datatypes like 32-bit floats and 64-bit integers are problematic, interop with JS and DOM APIs is slow. These things can only be improved with browser cooperation, which sort of defeats asm.js's main advantage of working in any browser.

Native Rust support would be far better.


All of those things you mentioned are currently being worked on and will be added to JavaScript directly.


"Coming in a browser near you in another 5-6 years"...

I've followed Harmony / ES6 for like 7 years now.

Don't hold your breath. (And I know that you can get lots of these features with special switches in Chrome, FF etc. Also know about Traceur. I mean don't hold your breath about having it released and available all-around for real web work, frameworks updated for it, etc).


On Traceur: We're currently using a subset[1] of Traceur for our frontend coding and I can't say I've noticed any particular problems doing so. I also don't see any problem if the libraries we use were to change to Traceur, so long as they distributed the compiled output. Yes, we'd have to include the Traceur runtime, but we already do, so... (There could be versioning issues, but I suspect that the Traceur runtime is relatively static and so could remain "compatible enough" across versions. One could also easily imagine JS libraries built on Traceur distributing both the original ES6 source and the transpiled JS.)

[1] The caveat here is that we have to support IE8 (for now at least) which the Traceur runtime doesn't support, so there are a few features we cannot use (yet).


Does Traceur always polyfill ES6 features, even if the browser supports them natively?


Of course pure source transformations (classes, let, const, splats and such) just yield ES5 source directly, and so you lose native browser support for those. That's no big deal since most browsers aren't really up to scratch yet in ES6 support.

I'm not sure about the runtime, but I think I remember it doing some (very basic) feature detection while browsing through the generated code during a debugging session.

EDIT: Frankly, I think it would probably be a good idea to polyfill all browsers until they all really implement the ES6 spec.


What's the solution being worked on for code size?


Code size currently is about equivalent to native builds, if you gzip both of them. The emitted code is basically similar to a native build except it's in text format (so the difference mostly vanishes after gzip).

You mentioned things like templates in C++ - those will be an issue in asm.js just as they are in a native build, no more and no less.


My experience has been that emscripten builds are significantly larger than native even after gzip. Templates are of course an issue in native builds too, but native builds are usually less size-constrained than web apps. Sending (minimized) Rust source code, as we do with JavaScript, would result in significantly smaller downloads (and less code to compile) vs. asm.js. Also shipping a standard library with the browser (as JS does) would be another significant reduction.


I would be curious to see your code, if that's possible - perhaps we can optimize things better or you are hitting a bug. How much larger was it, after gzip?

I'm not sure you'd want to send Rust source code - you'd need to wait for it to compile on the other side. (Rust is designed for many things, but fast compilation is not a primary goal.) Most likely you'd want to build it ahead of time on the server, just like C++, but then I'm not sure there would be a size advantage to Rust.


It seems to me like making a fast Rust compiler would be an easier problem than making JavaScript run native code fast :)

Thanks for the offer of help. Unfortunately I can't send our code, but I just talked to our guy working on emscripten and he said he's already been in touch (though not about code size yet). https://groups.google.com/d/msg/emscripten-discuss/n1qfKPPAy...

My impression of emscripten code size might be out of date, I know that development on asm.js has been pretty rapid.


It's not just a question of engineering resources; some problems are intrinsically hard. There may well be things (like floats in css) in rust that make fast compilation hard if not impossible - depending on what exactly you want to achieve.

After all, the point is to generate efficient code, so a solution that simply disregards most optimizations isn't an option; and most of the type system is likely unavoidable and/or critical to the optimizer.

By contrast, asm.js is fairly simple: All the heavy lifting of compiling has mostly been done - the remaining JS was chosen so that it fairly straightforwardly represents assembler instructions. Decisions about types and what that means for semantics (and safety) have been done, inlining, code rearranging etc. - that's all largely done. In a way, asm.js is assembler with a weird syntax and environment. I really doubt you'd ever get something like Rust (or really most any other statically compiled language) to ever compile as fast, given comparable resources.


Why do you want to script with Rust?


The same reason people want asm.js: speed, and avoiding the many problems of JavaScript.


Javascript is a terrible language; even though Rust is experimental, it would be a big improvement. And Rust does have attractive aspects as a language.

More broadly, breaking the iron grip Javascript has on client-side development would be wonderful and Mozilla is the last stumbling block in front of this goal.


Insofar as scripting languages go, Racket/Scheme would also be a nice addition. After all, that is what Eich was going for when he created JavaScript. He wanted a language with Scheme-like semantics, but other forces wanted it to have a C-style syntax.


Rust as a language is compiled, so interpreting it doesn't make much sense, and JIT only makes sense for interpreters.


This is one of my pet peeves. Any language can be either compiled or interpreted. I think what you mean is, "No one has written an interpreter for Rust." But that is also nonsense, as all compilers contain an interpreter. The difference is just that instead of streaming instructions straight to the CPU, compilers wrap the instructions in an executable format and save them to a file.


I was commenting specifically on "JIT Compiler". It should really be "JIT Interpreter". Naturally, it's possible to write an interpreter for any language that is compiled (though sometimes the semantics don't make sense). For a language like Rust, you'd probably want to compile to a bytecode to interpret, to allow all the compiler checks to be preserved.

Out of interest, I did a search, and there does seem to be a (deprecated) Rust REPL.


Like the interpreted language that is Java? :) A JIT compiler could make sense for Rust, though I'd agree that the usecases for that are probably not very interesting compared to some other languages.


But java bytecode itself is interpreted. For things like the web, you really want a language that can start executing immediately, so you really need to interpret, or have an agnostic bytecode.


I think it's a bit of a stretch to call the JVM an interpreter. Having said that, if you want to deliver Rust as bytecode, emscripten would seem to satisfy that usecase (though obviously that's not what the original author was talking about, and it doesn't work properly at the moment).



Note that the previous announcement was a bit premature: three weeks ago, on the date of that discussion, Servo was passing Acid2 only on a feature branch. As of two weeks ago the master branch should be passing as well.


When can I expect Servo to be in Firefox instead of the current engine? 2015/2016? do you have a rough idea?


There are no plans yet to ever integrate Servo directly into Firefox. At the very least Servo will certainly exist as a standalone browser engine (and unlike Gecko, Servo is designed to be embeddable, so it will be an actual alternative to Webkit in that space).

The biggest impetus for Servo at the moment is that it's researching the biggest wins to be gained from concurrency and parallelism in browser engines, thereby guiding efforts to tack such things onto Gecko.


Well, then I'll hope for a Servo equivalent of Luakit before too long.


Servo aims to be "dogfoodable" by the end of the year. It's still an experiment, really, so you won't be able to get anybody to commit to any plans further out than that. The design is different enough from Gecko that I suspect you'd never be able to wedge it into the existing Firefox, but it wouldn't be entirely implausible if a usable browser called Firefox used Servo within 5 years.


As well as the other replies to this, note there's no plan to implement a lot of the non-standard parts of Gecko that Firefox (and most extensions) rely upon, such as XUL and XBL. Servo is concerned with the web (at least for now) — not replicating Gecko's feature set.


As far as I know there are no short- or even mid-term plans to integrate Servo into Firefox. It's still a research project first. Rust for example doesn't even have a stable API yet.


It may never happen--It's a research engine, not necessarily something that's going to replace Gecko anytime soon.


> Many kinds of browser security bugs, such as the recent Heartbleed vulnerability, are prevented automatically by the Rust compiler.

Does anyone care to explain how this would work? If you used OpenSSL from Rust you would still be vulnerable to Heartbleed. Or am I missing something?


I believe the implication is that if OpenSSL had been written in Rust, Heartbleed would not have been possible.


Which might be true but seems odd to bring up in the blog post.

Also they say:

>> Many kinds of browser security bugs, such as the recent Heartbleed vulnerability, are prevented automatically by the Rust compiler.

Are they referencing reverse heartbleed here? Browsers themselves were not vulnerable to heartbleed, and I don't even think they were vulnerable to reverse heartbleed.

There is no way Servo would have prevented the heartbleed bug, no browser could have, I feel like that sentence has no place in this blog post.


If the wording were "such as vulnerabilities similar to Heartbleed", would that make it better?

The point I was trying to make is that you can't make that kind of mistake in the safe Rust language. You will fail a bounds check even if you decide to trust client provided lengths.
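
Roughly, the Heartbleed-shaped mistake looks like this when you try to write it in Rust (names and sizes are made up; this is neither OpenSSL nor Servo code):

  // `payload_len` plays the role of the length field the peer claims to have
  // sent; the payload actually received is much shorter.
  fn heartbeat_response(payload: &[u8], payload_len: usize) -> Vec<u8> {
      // In C, memcpy(dst, payload, payload_len) happily reads past the end of
      // the buffer and echoes back whatever memory happens to follow it. Rust
      // slicing is bounds-checked, so this line panics instead of leaking.
      payload[..payload_len].to_vec()
  }

  fn main() {
      let payload = b"bird";        // 4 bytes actually sent
      let claimed_len = 64 * 1024;  // the length the peer claims it sent
      let _echo = heartbeat_response(payload, claimed_len); // panics here
  }

A panic is still a denial-of-service bug, but it's not an information leak, and it's loud.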


I think that would have made it better. I can believe that Rust protects you from the mistake that led to heartbleed but the statement read (to me at least) more like:

"Servo protects you from that really scary thing the internet has been atwitter with for the past week whereas other browsers leave you high and dry"

I know that is not what you said and probably not what you meant, but it just seemed like an odd way to word it, IMHO.


I updated the blog post with improved wording.


I believe it's suggesting that Rust prevents the missing bounds check bug that caused Heartbleed. Heartbleed is just the latest example of such a bug.


> Browsers themselves were not vulnerable to heartbleed,

Clients could have been vulnerable to Heartbleed. Feel free to correct me on this, but I believe the only reason they weren't is that Chrome uses OpenSSL compiled without the heartbeat feature, and Firefox uses NSS.


Both Firefox and Chrome use NSS (although I believe Chrome has tentative plans to move to OpenSSL at some point in the future).


Chrome on Android uses OpenSSL, FWIW. I have no idea whether it supported the Heartbeat extension though.


If you wrote OpenSSL in Rust, it would avoid bugs like Heartbleed.

If you called OpenSSL from Rust, it would be the same as calling C from Java.

Gist of the issue: Rust's static type checker would prevent bugs that caused Heartbleed.
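
To make the FFI point concrete, here's a minimal sketch; strlen stands in for any C library function, OpenSSL included, and even this tiny call has to go through an unsafe block:

  use std::os::raw::c_char;

  // strlen comes from the C library that Rust programs already link against,
  // so this builds and runs. The point is that the compiler cannot check what
  // the C side does with the pointer it is handed, so the call site must be
  // marked unsafe.
  extern "C" {
      fn strlen(s: *const c_char) -> usize;
  }

  fn main() {
      let s = b"hello\0";
      let len = unsafe { strlen(s.as_ptr() as *const c_char) };
      println!("{}", len); // prints 5
  }

So a Rust wrapper around OpenSSL inherits OpenSSL's bugs; only a reimplementation in (safe) Rust gets you the compiler's checks.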


> Rust's static type checker would prevent bugs that caused Heartbleed

I believe it's more a case that it would have been a run-time error, due to Rust's automatic bounds checking. This is because Rust uses 'fat pointers' for strings, vectors and slices that include bounds information, rather than a single raw pointer like in C. Do note there are unsafe ways around bounds checking, but these are restricted to unsafe blocks, which makes them easier to audit.
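
A toy example (not Servo code):

  fn main() {
      let buf = vec![1u8, 2, 3, 4];
      let slice: &[u8] = &buf;

      // A slice is a "fat pointer": a data pointer plus a length, so every
      // indexing operation can be checked against that length.
      println!("length carried with the pointer: {}", slice.len());

      // Out of bounds is a deterministic panic, not a silent read of whatever
      // memory happens to sit next to the buffer:
      // let oops = slice[10]; // panics: index out of bounds

      // The unchecked version exists, but only inside an `unsafe` block,
      // which keeps such call sites easy to grep for and audit.
      let last = unsafe { *slice.get_unchecked(3) };
      println!("last byte: {}", last);
  }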


No, you're being too specific. They don't mean Heartbleed and OpenSSL, they just mean memory bugs in general. I say this because they also brought that up in the Hacker News "Who's Hiring" thread in March:

"It is designed to be more memory safe (far and away the #1 cause of browser engine security bugs!)"


Of course it's true that Rust can't protect you from things done in libraries called across its FFI. It's also true that at the moment most of, or at least a lot of, real work done in Rust eventually ends up calling into some C library. But I think that will be less and less true as time goes on.


Lots of bits of Servo use the FFI, but we've been replacing them with Rust versions as we go along. We did this so we could stand up a whole browser as fast as possible and then iterate on the important pieces first.

As an example, we used to use Netsurf's C library for CSS stuff, but now we have our own parser and style system written in 100% Rust.


I think it's a great approach, and I plan to whole-heartedly enjoy watching more and more of those chunks get whittled away as time goes on. I'd love to see a pure-rust spidermonkey replacement at some point!


> It's also true that at the moment most of, or at least a lot of, real work done in Rust eventually ends up calling into some C library.

I'm not entirely sure this is a correct claim, unless you mean 'system calls are implemented by the kernel, which is written in C.'

While it's true that FFI in Rust is really good, most things (notably, _not_ crypto) are just straight-up written in Rust.


That's fair, and I used the phrase "a lot of" intentionally vaguely. Yes, I'm thinking of calls into the system or common system libraries (like openssl), but also things like graphics libraries, e.g. sdl or glut. Certainly there are more pure-Rust turtles standing on one another than is the case in most languages, but the bottom turtle is still C.

Edit: Servo's src/support directory is a good example of my general sense that large Rust projects still tend to rely on a good deal of C libraries:

https://github.com/mozilla/servo/tree/master/src/support


Yes, and that's the primary reason why we still need to use OS sandboxing features: the Rust type system can't protect the C code that we're using, and we have to use some C code, even if it's someday just kernel32.dll. But I'm confident that the sheer number of memory-related browser vulnerabilities that have been found and continue to be found in browser engine code means that the Rust safety features are a significant security advance. It's about reducing the attack surface.


This is what I see when I run Acid2 in Servo. Perhaps they haven't merged the changes into the public repo yet.

http://cl.ly/image/1b123r220P3u


Yes, the fix (a submodule update for rust-layers) is blocked on a Rust upgrade which is currently in progress.


Is Servo using GC?


For the DOM thread only, yes. None of the layout data structures use the garbage collector.


And for those who don't follow Rust closely, it's important that it's "the DOM thread only," as Rust's opt-in GC is on a per-task (thread) basis, so the threads that don't use GC don't even know it's there or running.


(Note that Rust doesn't actually have its own GC yet; Servo is just using SpiderMonkey's JS GC. pcwalton's comment implies it is being used in a single-threaded way, though.)


There can be multiple script tasks even within a single page. For example, cross-domain or sandboxed iframes will have each DOM and script in its own task. Also, each of those will have their own layout and rendering tasks.


We use Spidermonkey's GC for DOM objects.


I'm curious what Chrome's plans are for the future, in particular related to parallelization. Has anyone seen any articles about that anywhere?


Eric Seidel talks here about some of the ideas in parallelizing things: https://www.youtube.com/watch?v=4Sm-DbIOqiU#t=818

And a 2014 Brainstorm thread with similar ideas: https://groups.google.com/a/chromium.org/d/msg/blink-dev/Z5O...



