Very good points. We proposed a way to deal with DOM manipulation in
the paper [1], but Stefan omitted this in the blog post. Specifically,
Section 4 of the paper (the "Page access" paragraph) briefly describes
this. (Sorry for referring you to the paper, but our wording in the
paper is probably better than my attempt to paraphrase here.)
Of course there are other ways malicious extensions can be used to leak
data---pick your favorite covert channel. But the idea was to propose
APIs (and mechanisms) that are not overtly leaky. (We are early in the
process of actually building this though.)
"To ensure that extensions cannot
leak through the page’s DOM, we argue that extensions
should instead write to a shadow-copy of the page
DOM—any content loading as a result of modifying the
shadow runs with the privilege of the extension and not
the page. This ensures that the extension’s changes to the
page are isolated from that of the page, while giving the
appearance of a single layout"
Could you elaborate more on this? Do you mean that you'll compare the network requests made from the main and the shadow pages? What if the main script is sensitive to the request receive time? Then the shadow DOM may act differently.
From a more practical standpoint, having two DOMs for every page will eat even more of my laptop's RAM.
"Do you mean that you'll compare the network requests made from the main and the shadow pages?"
Essentially, yes. Requests from the extension should be treated as if
they come from an origin different from the page's. (We could potentially
piggy-back on existing notions of security principals, e.g., the ones
Firefox has, to avoid huge performance hits.) And if the extension is
tainted, the kinds of requests it can make will be restricted according
to the taint (as in COWL [1], likely using CSP for the underlying
enforcement).
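To make that a bit more concrete, here is the kind of check I have in
mind. This is a sketch with made-up names (TaintLabel, mayRequest), not
our actual API:

    // Sketch: decide whether a request attributed to an extension context
    // may go out, given the extension's taint label. Toy types for illustration.
    type TaintLabel = { allowedOrigins: Set<string> };
    type OutgoingRequest = { url: string; initiator: "page" | "extension" };

    function mayRequest(req: OutgoingRequest, taint: TaintLabel | null): boolean {
      // Requests from the page itself are governed by the page's own policy.
      if (req.initiator === "page") return true;
      // An untainted extension behaves like an ordinary cross-origin principal.
      if (taint === null) return true;
      // A tainted extension may only talk to origins its label permits.
      const origin = new URL(req.url).origin;
      return taint.allowedOrigins.has(origin);
    }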
"What if the main script is sensitive to the request receive time? Then the shadow DOM may act differently."
If by main script you mean a script on the page, then there should be no real difference.
"From more practical standpoint, having two DOMs for every page will eat even more of my laptop's RAM."
I hope this won't be so bad down the line (assuming we'd be able to
leverage some underlying shadow DOM infrastructure that performs
relatively well).
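For a rough sense of the kind of infrastructure I mean, this is just the
standard Shadow DOM API; attributing loads triggered from the shadow tree
to the extension's principal is the part we would have to add on top:

    // The extension writes into a shadow root attached to a container it
    // owns, so its nodes stay apart from the page's tree while still
    // rendering inline. (Standard attachShadow; no privilege separation yet.)
    const container = document.createElement("div");
    document.body.appendChild(container);

    const shadow = container.attachShadow({ mode: "closed" });
    const banner = document.createElement("p");
    banner.textContent = "Added by the extension";
    shadow.appendChild(banner);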
I think Chrome already implements part of what you are proposing and calls it "isolated worlds". Chrome extensions don't operate directly on the page's DOM; they have an isolated version of it (https://developer.chrome.com/extensions/content_scripts).
So in principle, we already have the context from which we can decide which network requests to allow or block (this is already used today to allow cross-origin XHR from content scripts).
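To make the isolated-world point concrete, a minimal content script looks something like this (standard content script environment; the endpoint is a placeholder):

    // content_script.ts -- runs in Chrome's "isolated world" for a page:
    // it shares the page's DOM but not its JS heap.
    const secret = "page scripts cannot read this variable";

    // DOM mutations are visible to the page...
    const note = document.createElement("div");
    note.textContent = "injected by an extension";
    document.body.appendChild(note);

    // ...while requests made here are attributed to the extension's context,
    // which is the hook used to permit cross-origin XHR/fetch from content
    // scripts when the manifest grants the host permission.
    fetch("https://example.com/api")
      .then(r => r.text())
      .then(body => console.log(secret, body.length));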
However, it is super gnarly to implement your idea in practice because:
1. There has to be a connection traced from every network request back to the JS context which ultimately caused the request (e.g., by mutating the DOM). This is doable; it's just a lot of work (rough sketch at the end of this comment).
2. There can't be any way to execute JavaScript in the page - even with the page's principal. Such mechanisms exist today by design because developers desire them.
3. Even if you do 1 and 2, there are still channels such as hyperlinks. The extension could add a hyperlink and get the user to click on it. I suppose you could try and tie that back to the script context that created or modified the hyperlink.
4. Even if you do 1-3, if you can induce the page script (or any other extension, or the browser, or the user) to request a URL of your design, you win.
Sigh. Still seems fun as a research project to see how close you could get.
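To give a flavor of what the tracing in point 1 would involve, here is a rough sketch with invented names (nothing like Chrome's actual internals):

    // Remember which JS context created or last touched a node, so a later
    // load (or a clicked hyperlink) can be traced back to it.
    type ContextId = string; // e.g. "page" or an extension id

    const nodeOwner = new WeakMap<Node, ContextId>();

    function recordMutation(node: Node, ctx: ContextId): void {
      nodeOwner.set(node, ctx);
    }

    function contextForRequest(initiatingNode: Node): ContextId {
      // Fall back to the page's context if nobody tagged this node.
      return nodeOwner.get(initiatingNode) ?? "page";
    }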
Yep, isolated worlds is definitely what we want, and part of the inspiration for the particular (DOM modification) feature we proposed.
I think CSP helps with 1 & 2, unless I'm missing something? (Our labels map down to CSP policies pretty naturally.)
Points 3-4 and phishing, in general, are definitely a concern. Unfortunately, I'm not sure that a great solution that does not get in the way exists, but we'll see how close we can get :)
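To make the label-to-CSP mapping a bit more concrete, it is roughly this shape (toy Label type; real COWL labels are conjunctions/disjunctions of origins):

    // Treat a label as the set of origins the tainted data may flow to; a
    // context carrying that taint may only talk to those origins, which CSP
    // can express via connect-src (and related directives).
    type Label = { origins: string[] }; // simplification of real COWL labels

    function labelToCSP(taint: Label): string {
      const sources = taint.origins.length > 0 ? taint.origins.join(" ") : "'none'";
      return [
        `default-src 'none'`,
        `connect-src ${sources}`, // XHR / fetch / WebSocket
        `img-src ${sources}`,     // loads triggered by DOM writes
      ].join("; ");
    }

    // labelToCSP({ origins: ["https://bank.example"] })
    //   => "default-src 'none'; connect-src https://bank.example; img-src https://bank.example"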
CSP does help, but it is document-specific, not js-context specific. At least in Chrome, tracing the js context that was responsible for some network request would be significantly more difficult to implement.
[1] https://www.usenix.org/conference/hotos15/workshop-program/p...