Cloudflare ReCAPTCHA De-Anonymizes Tor Users

mmaunder · on July 19, 2016

"The Tor design doesn't try to protect against an attacker who can see or measure both traffic going into the Tor network and also traffic coming out of the Tor network. That's because if you can see both flows, some simple statistics let you decide whether they match up."

https://blog.torproject.org/blog/one-cell-enough
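
To make "simple statistics" concrete, here is a minimal sketch of one way such matching could work: bin the packet timestamps seen on each side into fixed time windows and compare the per-window counts with a Pearson correlation. The flows are simulated, and every name and number here (`bursty_flow`, the 2-second bins, the 0.3 s delay) is a made-up assumption for illustration, not something taken from the Tor blog post.

```python
import math
import random

def bin_counts(timestamps, bin_width, duration):
    """Count packets per fixed-width time bin."""
    n_bins = math.ceil(duration / bin_width)
    counts = [0] * n_bins
    for t in timestamps:
        i = int(t // bin_width)
        if 0 <= i < n_bins:
            counts[i] += 1
    return counts

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)

def bursty_flow(rng, duration, n_bursts, packets_per_burst):
    """Hypothetical traffic: short bursts of packets at random times."""
    times = []
    for _ in range(n_bursts):
        start = rng.uniform(0, duration - 1)
        times.extend(start + rng.uniform(0, 0.2) for _ in range(packets_per_burst))
    return sorted(times)

rng = random.Random(1)
DURATION = 300.0   # seconds of observation (assumed)
BIN = 2.0          # seconds per bin (assumed)

entry = bursty_flow(rng, DURATION, 10, 25)                   # seen entering Tor
exit_same = [t + 0.3 + rng.uniform(0, 0.1) for t in entry]   # same flow leaving, delayed and jittered
other = bursty_flow(rng, DURATION, 10, 25)                   # an unrelated user's flow

a = bin_counts(entry, BIN, DURATION)
b = bin_counts(exit_same, BIN, DURATION)
c = bin_counts(other, BIN, DURATION)

print("same flow:     ", round(pearson(a, b), 3))
print("unrelated flow:", round(pearson(a, c), 3))
```

The matching pair should score far higher than the unrelated pair, which is the whole attack: no payload is decrypted, only timing is compared.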

Work on a client to try and mitigate the risk of timing attacks:

https://news.ycombinator.com/item?id=9585466

djsumdog · on July 19, 2016

I remember someone at a security conference talking about a kid at a University who sent a bomb threat via Tor.

The University simply looked through their logs to see who was connecting to known Tor nodes, narrowed it down by time and found the kid.

Source: http://www.theregister.co.uk/2013/12/18/harvard_bomb_hoax_ch...

mmaunder · on July 19, 2016

Good opsec involves multiple layers of security. There's a fun talk at defcon next month on extending wifi range to avoid detection along with signal 'hiding' via SDR:

https://www.defcon.org/html/defcon-23/dc-23-speakers.html#Gr...

_asummers · on July 19, 2016

There's another good talk on more general opsec from Defcon a few years ago called "Don't fuck it up".

https://youtube.com/watch?v=J1q4Ir2J8P8

rev_bird · on July 19, 2016

This is great, thanks for linking it. I'm really interested in this kind of stuff, but it's so hard to find resources on it. Lots of people on Twitter making fun of mistakes people make, not a lot of folks giving advice.

j_s · on July 20, 2016

https://news.ycombinator.com/item?id=6521145

https://news.ycombinator.com/item?id=6521517

These discussions are definitely post-mortem analysis but there is also discussion on another OpSec presentation:

http://www.youtube.com/watch?v=9XaYdCdwiWU

blastrat · on July 19, 2016

good opsec also involves logging access to the internet and seeing who sent bomb threats... oh wait, whose side am I on?

zaqwsxcde · on July 20, 2016

In asymmetric warfare, being the little guy tends to be the greater challenge, and carries harsh, personally felt consequences.

Protecting livestock, so they can be fleeced by their proper owner, on the other hand, doesn't imbue the same sort of charisma upon the good shepherd.

ayyn0n0n0 · on July 19, 2016

I will so be there!

Relys · on July 19, 2016

Same! =D

jsmthrowaway · on July 19, 2016

That situation becomes quite relevant when debating "VPN then Tor" or "Tor then VPN," which I've seen people come down on both ways. Ultimately it depends on who the threat is.

Example: https://thetinhat.com/tutorials/darknets/tor-vpn-using-both....

drewbug · on July 19, 2016

What about VPN -> Tor -> VPN?

prdonahue · on July 19, 2016

Yeah, because 5,000ms latency is fun.

lucb1e · on July 19, 2016

For a bomb threat? Or more benignly, uploading a few documents to a whistleblower platform (some news organizations have one)? No problem I'd say.

jsmthrowaway · on July 19, 2016

Have you tried TCP with 5sec latency? It can barely window. Shit, dialup was better, and that would still cost you half a second or so for a full-MTU packet.

I see your point, don't worry, it would just be a lot more rough than you're implying, particularly to upload many heavy PDFs. (I kind of want to lab it now that we've discussed it.)
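
A rough back-of-the-envelope for the two numbers above, as a sketch under stated assumptions: the 5 seconds is treated as round-trip time, the window is the classic 65,535-byte maximum without the window-scale option, and the modem runs at 28.8 kbit/s (the comment does not say which speed was meant). The function names are made up for illustration.

```python
def max_throughput_bytes_per_s(window_bytes, rtt_s):
    """At most one full window can be in flight per round trip."""
    return window_bytes / rtt_s

def serialization_delay_s(packet_bytes, link_bits_per_s):
    """Time to push one packet onto a link of the given speed."""
    return packet_bytes * 8 / link_bits_per_s

CLASSIC_MAX_WINDOW = 65535   # bytes, TCP without window scaling
RTT = 5.0                    # seconds (assumed round trip)

print(max_throughput_bytes_per_s(CLASSIC_MAX_WINDOW, RTT))  # bytes per second ceiling
print(serialization_delay_s(1500, 28_800))                  # seconds for a 1500-byte packet
```

So even a perfectly behaved connection tops out around 13 KB/s at that latency, before counting the handshake round trips and any retransmission timeouts.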

lucb1e · on July 19, 2016

I meant a delay of a few seconds, not strictly >=5s, but still I wanted to prove you wrong even about 5 seconds.

Turns out Cloudflare deems 5 seconds latency too much. I thought most default timeouts were something like 30 seconds, and when writing applications myself I usually limit them to 8 or 10 seconds (to be able to get back to the user quick enough with an "unable to connect" error, but to also give it a moment). I expected that 5 seconds latency would be slow, but not unbearable. Instead it breaks stuff completely.

From my testing, 3.5 seconds latency works fine. Slow, but it consistently works.

Adding 5 seconds latency just breaks TLS connections to Cloudflare, though DNS, TCP and HTTP work. I was able to retrieve a webpage (via netcat) from my site, the redirect to HTTPS from http://news.ycombinator.com worked, and pinging showed a consistent 5030ms +/- 10ms latency.

Adding 3.5 to 5 seconds latency (random variance) makes, as expected, some connections fail. On my second try, I was able to load the Hacker News homepage, consisting of: 1) the homepage; 2) robots.txt (wget retrieves this); 3) css file; 4) javascript file; 5) favicon; 6-10) 4 images. In total 10 resources, taking a minute and 20 seconds. After establishing a TLS connection, it could reuse that connection for multiple requests. I think this is equivalent to hitting reload a few times in a browser.

So VPN-TOR-VPN might work still, though it is indeed more on the edge than I expected. Thanks for making me venture here, I learned something!

shivsta · on July 19, 2016

Tor is already slow - people use it because they want security. The addition of another VPN increases security greatly and only adds a minimal amount of more latency.

lucb1e · on July 19, 2016

> Tor is already slow

Relatively, sure. But I've found it very usable in recent times actually. Used it almost full-time (besides a normal Firefox instance for the company's intranet) to get around some silly firewall that wouldn't let me download "hack tools" (I was an intern in the cyber security department, security tools were part of my job). There were times where I didn't notice at all that I was using Tor, and most of the time it was comparable to mediocre wifi.

rycfan · on July 20, 2016

You probably should have spent some time fixing that hole in the firewall that let you bypass your company's download restrictions. ;-)

halfcat · on July 24, 2016

You would usually not try to block Tor at the network level. You would lock down your computers so employees can't make changes, and only allow them to run executables from locations which they have no write access to.

lucb1e · on July 20, 2016

Tor can and should circumvent any firewall using obfuscation proxies that use AWS, GCS, Azure, etc. You'd need to block most of the internet to kill Tor.

And as for monitoring, I guess it might be possible, but if someone thinks to use bridge nodes that's also defeated.

616c · on July 20, 2016

God I wish grugq still wrote currently.

https://grugq.github.io/blog/2013/12/21/in-search-of-opsec-m...

I discovered he moved to Github pages after years only to realize he has not published much since I first heard of him.

nyolfen · on July 20, 2016

he has a tumblr which he updates frequently, but it's mostly just links to things he finds interesting: https://grugq.tumblr.com/

mrb · on July 20, 2016

https://medium.com/@thegrugq

meowface · on July 19, 2016

It's funny, because just with a little hostname and MAC spoofing, he probably could've gotten away with it even if he sent it from their wireless network with no anonymization whatsoever. (Depending on how it does authentication, at least.)

MichaelGG · on July 19, 2016

Nope, not needed. He had gotten away with it! There was no evidence that the target was on campus. Just a lucky lead for the university. All he had to say was he was browsing "personal" sites, or looking up Onion sites out of curiosity, etc. Just a basic cover. Instead, he caved immediately.

Maybe they would have found other evidence. But it wouldn't have been just him connecting to Tor.

jgrahamc · on July 19, 2016

This short piece doesn't have much detail. But if reCAPTCHA is usable to deanonymize Tor users then I would like to know about it in detail so I can do something about it.

tedunangst · on July 19, 2016

I didn't see anything that makes it unique to reCAPTCHA. Any fingerprintable traffic pattern that can be observed coming and going will work.

I could make a website that adds random(1, 64) one pixel images to each page. As you browse the site, you'll be broadcasting 6 bits of identifier with every click.
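
The arithmetic behind "6 bits" is log2(64) = 6. A small simulation of the idea, assuming the site logs how many images it attached to each page per session and a second observer counts image-sized requests on the user's side; `page_image_counts`, the session names, and the 1000-session log are all hypothetical.

```python
import math
import random

def bits_per_page(choices):
    """Information carried by one uniformly random pick among `choices` values."""
    return math.log2(choices)

def page_image_counts(rng, n_pages, choices=64):
    """Number of one-pixel images attached to each page view, random(1, choices)."""
    return [rng.randint(1, choices) for _ in range(n_pages)]

rng = random.Random(7)

# What the website recorded for each session (5 page views each).
server_log = {f"session-{i}": page_image_counts(rng, 5) for i in range(1000)}

# What a watcher near the user counted for one unknown Tor user.
observed = list(server_log["session-42"])

matches = [name for name, counts in server_log.items() if counts == observed]
print(bits_per_page(64), "bits per click;", len(observed), "clicks observed")
print("candidate sessions:", matches)
```

Five clicks already carry 30 bits, so among a thousand sessions an accidental collision is very unlikely.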

jerf · on July 19, 2016

I don't see anything that makes this unique to CloudFlare, either.

(You imply this in your point, but given the specificity of the accusation, I think it's worth clearly pointing out.)

SwellJoe · on July 19, 2016

I believe the "unique to CloudFlare" element is that CloudFlare effectively sees traffic for significant portions of the web...but is one entity. So, a powerful enough hostile actor (say, a state) would only need to compromise one entity (CloudFlare) to exploit users of thousands of websites, including many major ones. Er, well, two entities, because they also need entrance data. So, if a state were to compromise an ISP and CloudFlare, it would give that state a lot of Tor users' identities.

Very few small-ish entities have such a large reach and can interject themselves into so many connections on the web.

jerf · on July 19, 2016

But if we're talking about The Adversary, then they're already deeper in than CloudFlare will ever be, so... what's different?

SwellJoe · on July 19, 2016

CloudFlare is an endpoint for predictable actions; and serves overseas websites. So, even a state that doesn't necessarily have power to infiltrate a foreign host might still be able to identify users of those foreign hosts.

So, say I use Tor to make political comments on a foreign website; one that I have reasonable trust is outside the reach of my government. But, say that website uses CloudFlare and CloudFlare has servers that are within reach of my government. That's the difference. It is a difference of degree, rather than kind, but a difference nonetheless.

Powerful actors have always had some ability to compromise Tor by compromising the requesting side (the ISP of the target of an investigation, for example), and then the receiving side (the website where your suspect does the thing you're investigating them for...possibly a honeypot set up specifically to catch people who do this thing, or possibly a website whose owner has already been arrested, prosecuted and made a deal that allowed access to the systems). CloudFlare just adds an additional element of uncertainty for Tor users: Will this CAPTCHA take place in a way and place that allows someone to narrow down my identity?

As with a lot of the security concerns about Tor, one has to take it as weights on a scale. Who are my attackers, and what level of attack can they bring against my traffic? If your privacy concerns don't include state level actors, then this is probably a theoretical attack. If your attackers do include state level actors, then it is a concern to be aware of. State level actors have other means of compromising your identity and traffic, of course, but this is one of them, and if I understand it correctly, it is a valid concern.

yoo1I · on July 19, 2016

The difference is that reCAPTCHA provides a detectable traffic pattern and is already widely deployed. This provides plausible deniability. Other than that, I don't see a difference.

sitkack · on July 19, 2016

The problem is that traffic is passed all the way through the network. A solution would be to have a transformative proxy on the inside of the tor network running as a hidden service that made requests on your behalf. Then it could possibly respond with .har files or some other transformed asset that doesn't match the same traffic signature. VPN into Tor, terminate at a high level proxy and then exit Tor through this intermediary.

onecooldev24 · on July 19, 2016

Data between you and Tor nodes is encrypted; no way your idea will work.

tedunangst · on July 19, 2016

Encryption doesn't obscure the size or frequency of requests.

onecooldev24 · on July 19, 2016

It does obscure the size, maybe not the frequency. Best have JS disabled when you come across this.

garrettr_ · on July 19, 2016

> It does obscure the size

Encryption does not inherently obscure the size of plaintext. Protocols may choose to pad plaintext for various reasons, and both Tor (since Tor always sends fixed-width cells) and TLS (when it uses a block cipher mode) do so. However, the amount of padding is typically small and can hardly be said to "obscure" the size of a request - it is not a defense against traffic analysis.

onecooldev24 · on July 19, 2016

> since Tor always sends fixed-width cells

Whether you send 1px of data or 500px, Tor always sends fixed-width cells. There is no question of padding here.

mynameisvlad · on July 19, 2016

I believe what he's saying is that it'll pad to fit those cells. If you're sending 500b of data in 16b chunks, you'll need some padding (12b) in there to fit into 32x16 = 512b.
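
The padding arithmetic in that example, plus what it means for fixed-size cells, can be sketched as follows. The 512-byte cell size is an assumption for illustration (per-cell headers are ignored), and both helper names are made up.

```python
def padded_length(n_bytes, unit):
    """Smallest multiple of `unit` that can hold n_bytes."""
    return -(-n_bytes // unit) * unit

def cells_needed(n_bytes, cell_payload=512):
    """How many fixed-size cells a payload occupies (headers ignored)."""
    return -(-n_bytes // cell_payload)

# The example from the comment: 500 bytes in 16-byte chunks.
total = padded_length(500, 16)
print(total, "bytes on the wire,", total - 500, "bytes of padding")

# Fixed-size cells hide the difference between 1 byte and 500 bytes...
print(cells_needed(1), cells_needed(500))
# ...but not the difference between a small response and a large one.
print(cells_needed(100_000))
```

Padding rounds sizes up to the next unit, so an observer loses the exact byte count but can still count units, which is why it is not a defense against traffic analysis on its own.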

onecooldev24 · on July 19, 2016

If the data is just a few bytes, tor would pad it with null and then encrypt it. The final encrypted cell would have no revealing information except for the time it was sent out at.

drdaeman · on July 19, 2016

As I get it, it's irrelevant to JS. "Select all X" image-based captcha can be done without it. For example, a set of checkboxes with CSS background-image for its :checked (or :hover/:active/whatever) states would also do the trick (unless you patch/configure your browser to not optimize/delay the load of resources until they're actually required)

yoo1I · on July 19, 2016

It has enough detail:

The claim is that an adversary who can measure traffic on CloudFlare's side (i.e. you) and the user's ISP (i.e. your hypothetical friend mallory) can collude by measuring and comparing the bursts of packets generated during puzzle solving on the ISP side and the receipt of said packets on CF's side.

This information is enough to figure out that Alice wanted to reach example.com via TOR.

This works because reCAPTCHA has a detectable data signature. But you are in the position to inject any javascript you like anyhow, so it's not really reCAPTCHA specific in a technical sense, it's just that that would be a good coverstory if Eve were to try to make you and mallory cooperate to de-anonymize Alice.

jgrahamc · on July 19, 2016

But it's a Google-served reCAPTCHA so there's nothing to measure on CloudFlare.

So I disagree that there's detail here. Need real technical detail to be able to take action.

If this were a paper or PoC then would be different.

If there's a way to do that then please report it to us.

yoo1I · on July 19, 2016

You're right of course wrt Google serving reCAPTCHA; in this case you'd just be providing plausible deniability if Google and Mallory-ISP were to collude to exploit this.

Tor users on google fiber take note.

lucb1e · on July 19, 2016

> users on google fiber take note.

This is why I got quite scared when I first heard of Google Fiber.

It's in Google's interest to provide good, fast and cheap service: they will gain more customers and more people will be able to use more Internet services (many of which are from Google or use Google -- adwords, analytics, etc.). Thus they provide speeds for prices that are very hard to compete with for normal ISPs, since normal ISPs don't have the luxury of being the world's most popular, well, so many things (search engine, mapping service, email service, ad service, etc.).

If one company knows everything about you and controls a big enough stake in your life, that sounds very scary to me. Not because Google is bad, but because it's one company able to control many basic services.

jsmthrowaway · on July 19, 2016

That's all well and good except for, to my knowledge, the reCAPTCHA widget being served by Google and communicating solely with Google as 'jgrahamc points out. The amount of Cloudflare blame in this article does not mesh well with plain logic (I'm kinda disappointed with Cryptome, TBH). I believe the only thing handed to Cloudflare during and after the reCAPTCHA is solved is a token of some kind indicating Google's confidence the user is real. If I'm wrong about this I'd be surprised, because it would then be possible for server-side operators to tamper with Google's machine learning that they're doing with reCAPTCHA users.

I'm almost positive these claims are completely false, for example:

> Cloudflare can conveniently serve few more images to specific users

> Each click on one of the images in the puzzle generates a total of about 50 packets between Tor user's computer and the Cloudflare's server (about half are requests and half are real-time responses from the server.)

> The packet group has predictable sizes and patterns, so all the adversary has to do is note the easily detectable signature of the "image click" event, and correlate it with the same on the Cloudflare side.

There is no API documentation in the reCAPTCHA widget about your server having to handle real-time requests from users solving the widget or serve images, so there is no Cloudflare side. It wouldn't make sense from an API perspective; why would I have to add a bunch of code to my server to handle this stuff? Google runs that. Look here:

https://developers.google.com/recaptcha/docs/display

Do you see a "handle real time image click events" API here for Cloudflare to deploy? You do not. Google would have to build backends for their machine learning and fraud detection algorithms in every language an API user would ever run, and then they also lose obscurity by shipping them. The image click events almost certainly go only to Google, never Cloudflare, so I think whoever sent this tip didn't understand what they were looking at in Wireshark.

The possible threat vector here is Google, not Cloudflare. Cloudflare just happens to have deployed Google's reCAPTCHA widely. The article is misleading and incredibly light on important detail; how about even a screenshot of a packet capture showing traffic to Cloudflare? If you want my honest take, I read this as a Tor user annoyed they have to solve reCAPTCHAs on Cloudflare sites (the "insistence" and quoted "protects" bits are the clue) and looking for something to hit them with, and a lack of diligence on Cryptome's part before posting it.

https://www.gstatic.com/recaptcha/api2/r20160712125018/recap... is the current version of the widget if anybody is curious, but I haven't looked closely.

drostie · on July 19, 2016

That's because this is a speculation; e.g. "this obvious opportunity is not the proof" is admitted in the text itself. As mentioned, the requirement is the ability to correlate two different traffic signals: from your computer to Tor, and from Tor to the exit. So, the agency trying to trace you needs to be listening at both of those points. Their approach is simply to have one of those be the ISPs (presumably this ranges into the hundreds of thousands if not millions of computers) of US citizens (foreign ISPs would seem much harder to monitor), and the other being the CloudFlare servers (only maybe hundreds or thousands of machines needed to log this?).

Actually, watching the entrance and exit nodes in this fashion is probably more expensive than simply hosting your own entrance and exit nodes. It would be within the NSA's power to, say, host or monitor 500 of the 1000ish exit nodes by now, collecting 50% of the exit traffic at almost no real cost. Entrance traffic is harder as the network is larger, but if you hosted (or, again, captured the traffic to) another 2k non-exit relays you might be able to capture 10-20% of the entrance traffic. The basic points I'm making here are: (1) that there are way fewer relay nodes to monitor than there are ISPs, if you would prefer surveillance; and (2) you are not restricted to surveillance or even to your own nation--there's literally nothing stopping the NSA from purchasing VPSes in the Netherlands and Germany and Sweden and running Tor on them, and it'll seem like a very geographically diverse set when you're looking at it with Vidalia.

Combined together the NSA can maybe deanonymize about 5-10% of the Tor traffic to the Internet right now with a much cheaper method, and this is where it gets interesting: the Tor default is to have 3 hops, which means that in addition to correlating traffic patterns you get to correlate on the IP address of the hop in the middle, even if that hop is not colluding with you. So even in the face of network jitter you have a 32-bit identifier which links together packets above and beyond simple network traffic into or out of Tor. And you only need to operate a few thousand computers to do it -- far fewer than you'd need to monitor the US ISPs in general.

You can also try to watch specific popular exits like Cloudflare, but doing this removes this awesome IP address that you get for the middle hop, and you still need either a relay node or else to be tapping a given user's IP, to try to deanonymize them.
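
The 5-10% estimate above follows from multiplying the two fractions, assuming entry and exit are picked independently and in proportion to the share being watched. That is a simplification (real path selection is bandwidth-weighted and clients keep the same guards for long periods), and `both_ends_observed` is a made-up name.

```python
def both_ends_observed(entry_fraction, exit_fraction):
    """Chance that a circuit's entry and exit are both watched,
    assuming the two are chosen independently."""
    return entry_fraction * exit_fraction

EXIT_SHARE = 0.50  # "500 of the 1000ish exit nodes"

for entry_share in (0.10, 0.20):  # "10-20% of the entrance traffic"
    p = both_ends_observed(entry_share, EXIT_SHARE)
    print(f"entry {entry_share:.0%}, exit {EXIT_SHARE:.0%} -> {p:.0%} of circuits")
```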

akerro · on July 19, 2016

If deanonymization were based on the size and patterns of TCP connections made to Cloudflare servers, you could use the BSD configuration flag net.inet.ip.random_id=1 to make them more random. But as long as we can't measure this... it's no proof and no defense.

tedunangst · on July 19, 2016

Random IP IDs won't do jack to prevent correlation. Probably even make it easier if actual fragments are involved.

pyromine · on July 19, 2016

I didn't realize just how fragile TOR is. . . While I understand that remaining anonymous requires adjusting your browser habits somewhat extensively, the fact that a ReCAPTCHA is enough to (theoretically) de-anonymize a user suggests to me that it's not able to anonymize at all when browsing.

While TOR may be useful for evading firewalls, my general perception of the project has changed from general anonymity tool to a tool tailored for very specific use.

Granted, this is probably what my understanding always should have been.

0xmohit · on July 19, 2016

> I didn't realize just how fragile TOR is. . .

It's JavaScript that causes it (you could choose to disable it [0]). The FAQ [1] warns of it:

But there's a third issue: websites can easily determine whether you have allowed JavaScript for them, and if you disable JavaScript by default but then allow a few websites to run scripts (the way most people use NoScript), then your choice of whitelisted websites acts as a sort of cookie that makes you recognizable (and distinguishable), thus harming your anonymity.

...

Until we get there, feel free to leave JavaScript on or off depending on your security, anonymity, and usability priorities.

[0] https://www.torproject.org/docs/faq#DisableJS

[1] https://www.torproject.org/docs/faq#TBBJavaScriptEnabled

onecooldev24 · on July 19, 2016

Not only JavaScript: an HTTP server that sends timed responses/packets would work just as well, as long as network traffic is being monitored at both the modified server and the ISP.

walrus01 · on July 19, 2016

If I were a national signals intelligence agency with a correspondingly huge multi-billion dollar budget, it would be trivial to run a large percentage of tor exit nodes... You could probably achieve it with 500 individual 1U servers colocated with random hosting companies around the globe at a budget of $250/mo * 500 = $125,000/mo, which is a tiny drop in the bucket compared to the traffic analysis capability it would give you, with the ability to capture all traffic entering/exiting each node's world-facing public ipv4/ipv6 interfaces.

edit: The major challenge would probably be continually violating various hosting companies' TOS/AUPs and getting service shut off, which would be a continual churn of provisioning new physical servers, shipping them to locations, arranging for plausibly deniable billing, etc.

nitrogen · on July 19, 2016

Hypothetically an intelligence agency could "persuade" hosting companies not to shut down their boxes. One could speculate that they might do so by staging an investigation of their own box, thus getting two boxes in place.

nickpsecurity · on July 19, 2016

I formulated the attack myself, and my numbers were similar to yours. It's one of the reasons I didn't trust Tor. The success rate described in the Snowden docs indicates the NSA might be doing this experimentally. I don't think they're fully committed to the point where they're running most nodes or anything. Being careful.

The difficulties wouldn't be as much as it seems. They probably wouldn't even be shut down that often. Just a small number of high-bandwidth nodes from front companies would net them a lot of intel. They could also partner with Five Eyes and Euro agencies as they all seem to want to de-anonymize Tor users. Each could have fronts doing it with their own operational techniques to muddy the situation up. Again, probably already do in a small way.

We haven't even discussed QUANTUM-ing the Tor servers. They really, really need memory-safe machines & implementations from CPU up if they're expecting to withstand high-strength attackers. Haven't looked at code or supported OS's in a while but I'm guessing default implementation doesn't fit that bill. ;)

fweespeech · on July 19, 2016

Wouldn't running your own entry node be enough protection still (assuming you could guarantee it wouldn't be compromised)?

As far as I'm aware, control of entry & exit is required for these sorts of attacks.

Running the entry node with a consistent entry point and using it as a random walk crawler with a real browser would seem to be enough for personal use as long as you aren't a criminal worth active, serious investigation that is targeted to reveal you.

tedunangst · on July 19, 2016

These attacks require observation of entry and exit.

nickpsecurity · on July 19, 2016

Exactly. Taps further enable that. Malware even more. Just a metadata-recording system could do plenty and not take much bandwidth to leak.

SEJeff · on July 19, 2016

Tor was sponsored by the USG to allow intelligence informants to not be uncovered. That is the "very specific use" it was originally envisioned for.

See: http://cryptome.org/0003/tor-spy.htm

mikeash · on July 19, 2016

That was exactly my reaction upon seeing the description of the problem. Seems like the title of the article should really be, "Tor kind of sucks at anonymizing users." If all it takes is 25 requests sent in quick succession, then surely half the web pages out there share this same problem just from loading various resources.

rohit89 · on July 19, 2016

As I understand it, this is a fundamental problem for any low latency network. You could fix the problem by introducing delays but that would break the low latency requirement.

mikeash · on July 20, 2016

Makes sense. I wonder if that could be made a tuneable parameter, so users could choose what sort of tradeoff they preferred. That might impose unacceptable storage costs on the router nodes, at the least.

marcosdumay · on July 20, 2016

Anonymity is fragile. One can only ever break it, never fix.

bostik · on July 19, 2016

In other news, a global passive adversary can use traffic analysis, timing data, and known patterns to deanonymise a Tor user.

The only "new" thing here was the rough traffic pattern analysis of CF captcha page.

cuonic · on July 19, 2016

One way around this is to disable javascript for ReCAPTCHA, the service provides you with a rather primitive HTML form with checkboxes over the images, generating only one request on submit.

happyslobro · on July 19, 2016

Yeah, this again. You can't secure your system, if you are running your adversary's code. Tor is upfront about this, this is why Javascript is disabled by default, and why there is a warning if you enable it globally. I suppose this does make for decent clickbait headlines though.

subliminalbrad · on July 19, 2016

TorBrowser does not disable Javascript by default, and neither does TAILS.

happyslobro · on July 19, 2016

It is shipped with the noscript plugin enabled, that part is pretty important. You aren't just referring to the fact that JS is disabled via a plugin, are you?

sp332 · on July 19, 2016

Isn't this explicitly outside Tor's threat model? https://svn.torproject.org/svn/projects/design-paper/tor-des... See section 3.1

Johnny555 · on July 19, 2016

Why is this phrased as if it's Cloudflare's fault?

If it's this easy for a side effect of a recapcha image to de-anonymize a Tor user, then this seems like a failing of the Tor protocol that they should fix. Maybe they need to introduce more jitter, repackage requests into a single stream with consistent (or randomized) packet size, or pad the packets with random data.

Grollicus · on July 19, 2016

It is reCAPTCHA's fault. (kind of, I mean it's just the way it's built)

Still, if you browse the web via Tor you will see reCAPTCHAs everywhere because of Cloudflare. And that IS Cloudflare's fault, and it's really annoying, which is probably why this article is phrased this way.

rohit89 · on July 19, 2016

Introducing jitter would mean increasing latency and generally slowing down your browsing.

softawre · on July 19, 2016

as a tradeoff for security? Ok..

I mean, using a VPN is slower than not, but tons of people use them

rohit89 · on July 19, 2016

The effect will be exaggerated since tor traffic goes through three nodes and random jitter would need to be added for each. In general, beating traffic analysis in a low latency network is a hard problem. You could have perfect anonymity if you could add arbitrary delays but that would make the network unusably slow.

Johnny555 · on July 19, 2016

So make it a configurable option that you can turn on and off when you need it.

Having to wait a few seconds for a web page to load seems like a small price to pay to avoid government agents banging on your door because you're looking at "subversive content".

rohit89 · on July 19, 2016

The delays would have to be longer than that, not just a few seconds. You're looking at minutes to hours for strong guarantees. Also having a configurable option could make you more fingerprintable.

mikegerwitz · on July 19, 2016

Traffic analysis is always a problem; this is a specific case, but I'm not sure this is anything new.

Many attacks on Tor are facilitated by or require JavaScript. Consider disabling it rather than executing arbitrary, untrusted software on your computer automatically.

daxorid · on July 19, 2016

Traffic analysis is always a problem

Not always. High-latency mix networks with fixed message size and randomized transmission are very robust against traffic analysis.

The elephant in the room is, as usual, PEBKAC. The demand by users for low latency will always be the killer for anonymity networks.
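
A toy sketch of what "fixed message size and randomized transmission" could look like: a threshold mix that buffers messages, pads each to the same length, and releases a whole batch in shuffled order. `ThresholdMix`, `MESSAGE_SIZE`, and the 2-byte length prefix are assumptions for illustration. A real mix also re-encrypts at every hop and would use a cryptographic random source, both of which are left out here.

```python
import random

MESSAGE_SIZE = 1024  # hypothetical fixed size, in bytes

def pad(message, size=MESSAGE_SIZE):
    """Prefix a 2-byte length and zero-fill to the fixed size."""
    if len(message) > size - 2:
        raise ValueError("message too long for one fixed-size slot")
    return len(message).to_bytes(2, "big") + message + bytes(size - 2 - len(message))

def unpad(padded):
    """Recover the original message from a padded slot."""
    n = int.from_bytes(padded[:2], "big")
    return padded[2:2 + n]

class ThresholdMix:
    """Holds messages until `batch_size` have arrived, then flushes them shuffled."""

    def __init__(self, batch_size, rng=None):
        self.batch_size = batch_size
        self.rng = rng or random.Random()
        self.pool = []

    def submit(self, message):
        """Buffer one message; return the shuffled batch once full, else []."""
        self.pool.append(pad(message))
        if len(self.pool) < self.batch_size:
            return []
        batch, self.pool = self.pool, []
        self.rng.shuffle(batch)
        return batch

mix = ThresholdMix(batch_size=3)
for msg in (b"hello", b"a much longer message", b"x"):
    released = mix.submit(msg)
print(len(released), "messages released, sizes:", {len(m) for m in released})
```

The latency cost is visible in the structure itself: the first message waits until enough others arrive, which is exactly the delay users of a low-latency network refuse to accept.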

captainmuon · on July 19, 2016

Huh, I always thought that Tor breaks up traffic in a random, but deterministic (not data dependent) way - sometimes joining data from two packets into one network packet, sometimes splitting packets and holding data for a while [x]. That's how I explained the jitter to myself. Sometimes a connection would be really fast, and sometimes it would hang on a single packet for hundreds of ms. Seems I was mistaken.

In this case, it would have helped a bit, since an attacker would not have seen the characteristic staccato of the reCAPTCHA exchange. They would have seen a few kB in either direction, in 40-100 packets, over a period of a few seconds. If the implementation is clever, one end would even have a different signature than the other.

At least this is something I would have included in Tor. Now that I think about it, randomly introduced delays (from the outside) might actually be a technique to deanonymize users....

([x] You'd generate packet sizes and minimum transmission times from a known seed. First packet is 501 B, 24 ms later a packet of 2048 B, then 15 ms later one of 1718 B, and so on. If there is not enough data after a grace period, pad with junk. If you constantly need more time to send packets than allowed, or need to pad, then adjust the model. Also choose the model to match regular traffic if possible. Disclaimer: I'm just making this up on the spot and am no expert, but it seems plausible and obvious to me.)
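
A minimal sketch of the footnote's idea, with the same caveat the commenter gives: this illustrates the scheme as described, not anything Tor does. Packet sizes and gaps come only from a seed, real data is poured into those slots, and junk (zero bytes here) fills whatever is left. The names `schedule` and `shape` and all the size and gap ranges are made up. The "adjust the model" step and the framing a receiver would need to strip the padding are omitted.

```python
import random

def schedule(seed, n_slots, min_size=400, max_size=2048, min_gap_ms=5, max_gap_ms=50):
    """Derive (gap_ms, packet_size) slots from the seed alone, independent of the data."""
    rng = random.Random(seed)
    return [(rng.randint(min_gap_ms, max_gap_ms), rng.randint(min_size, max_size))
            for _ in range(n_slots)]

def shape(data, slots):
    """Fill each slot with the next bytes of data, padding with zeros when data runs out.

    Returns (packets, leftover): packets is a list of (gap_ms, payload) and
    leftover is any data that did not fit in the given slots.
    """
    packets = []
    offset = 0
    for gap_ms, size in slots:
        chunk = data[offset:offset + size]
        offset += len(chunk)
        packets.append((gap_ms, chunk + bytes(size - len(chunk))))
    return packets, data[offset:]

slots = schedule(seed=42, n_slots=10)
small, _ = shape(b"x" * 100, slots)
large, _ = shape(b"y" * 3000, slots)

# An observer sees the same sizes and gaps regardless of what was sent.
print([(gap, len(payload)) for gap, payload in small] ==
      [(gap, len(payload)) for gap, payload in large])
```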

baby · on July 19, 2016

This is called a mix network and only one server currently does that in Tor's network.

niij · on July 19, 2016

Could you please expand on this? I tried to find more info of a Tor relay running mixing, but couldn't find anything. I would like to turn it on for my servers if possible.

hewhowhineth · on July 19, 2016

I stopped visiting sites with image recognition reCAPTCHAs. It has to be one of the worst UX patterns ever devised. It's dirt cheap to automate them away so it doesn't really stop any self-respecting bot maker, and it comes at a price of being a huge pain in the ass for a real user. Every time I run into them I feel used and abused.

It's really sad. So much brain power and this is what they come up with.

Apologies for the rant, couldn't help it. ReCAPTCHA is one of very few things I genuinely hate.

mpitt · on July 20, 2016

> It's dirt cheap to automate them away

Have any examples?

tlrobinson · on July 19, 2016

Are there any anonymity networks that transmit streams of packets between nodes at a constant rate regardless of whether it's being actively used?

Obviously it would be a very bandwidth hungry network, though if exit node bandwidth is currently the limiting factor (is it?) then maybe not entirely impractical.
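
To make the question concrete, here is a rough sketch of a constant-rate link, assuming one fixed-size cell per clock tick. `CELL_SIZE` and `constant_rate_stream` are illustrative names and values, not taken from any particular network.

```python
from collections import deque

CELL_SIZE = 512  # illustrative fixed cell size in bytes


def constant_rate_stream(queue, ticks):
    """Emit exactly one fixed-size cell per tick.

    Queued payloads (each at most CELL_SIZE bytes) are sent when available,
    otherwise a pure padding cell goes out, so an observer sees the same
    rate whether or not the link is in use.
    """
    cells = []
    for _ in range(ticks):
        payload = queue.popleft() if queue else b""
        cells.append(payload.ljust(CELL_SIZE, b"\x00"))
    return cells
```

An idle link and a busy link emit the same number of cells of the same size. A real design would also need a length field inside the encrypted cell so the receiver can tell data from padding.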

sp332 · on July 19, 2016

There are a few systems based on the Dining Cryptographers protocol, but the ones I've seen implemented are very slow and don't support many users.

MichaelRenor · on July 19, 2016

It's bizarre that this article is critical of Cloudflare. If Tor can't stand up to a reCAPTCHA without leaking PII, then it sounds like Tor ultimately needs to be fixed.

I stand by Cloudflare. So much malicious traffic comes through Tor that administrators need to do a lot to protect themselves from it.

rvern · on July 19, 2016

Almost all the pages I see CAPTCHAs on with Tor have absolutely nothing to protect. Arguably the website owners who use CloudFlare are far more to blame than CloudFlare itself. Nevertheless, I believe CloudFlare should stop using reCAPTCHA and create a challenge system that does not require JavaScript, and does not require sending requests to any website other than the website being visited. I like their proposed solution[0] of automatically creating an onion service for websites using CloudFlare and redirecting Tor users there.

[0]: https://blog.cloudflare.com/the-trouble-with-tor/

gnud · on July 19, 2016

I wonder why these anti-abuse systems don't use proof-of-work. Instead of a captcha, let the browser chug for 5 seconds, and then POST the solution in order to gain a temporary access cookie.

Sure, this could be attacked - but not at scale, and that's the whole point of the captcha anyway, right?
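
A hashcash-style sketch of what that could look like, assuming SHA-256 and a "leading zero bits" difficulty. The function names and the 8-byte nonce encoding are made up here, and this is not how any existing anti-abuse product is known to work. In practice the solve loop would run in the visitor's browser and only `verify` would run on the server.

```python
import hashlib
import os


def leading_zero_bits(digest):
    """Count the zero bits at the start of a digest."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        bits += 8 - byte.bit_length()
        break
    return bits


def verify(challenge, nonce, difficulty):
    """Server side: a single hash checks the submitted nonce."""
    digest = hashlib.sha256(challenge + nonce.to_bytes(8, "big")).digest()
    return leading_zero_bits(digest) >= difficulty


def solve(challenge, difficulty):
    """Client side: brute-force a nonce, about 2**difficulty hashes on average."""
    nonce = 0
    while not verify(challenge, nonce, difficulty):
        nonce += 1
    return nonce
```

Usage: the server issues `challenge = os.urandom(16)`, the client POSTs `solve(challenge, 12)`, and the server grants the temporary cookie if `verify` returns true. Each extra bit of difficulty doubles the expected client work while verification stays one hash.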

mikegerwitz · on July 19, 2016

That requires JavaScript.

CloudFlare does have a JS-only challenge, which presumably does this type of thing, but this has a couple different problems. From a security perspective, you're executing arbitrary software, which is unwise, especially if you're looking for anonymity. The other issue is that the software is also proprietary.

https://support.cloudflare.com/hc/en-us/articles/204191238-W...

"During a JavaScript challenge you will be shown an interstitial page for about five seconds while CloudFlare performs a series of mathematical challenges to make sure it is a legitimate human visitor."

Related: I have started trying to get into contact with webmasters of sites that enable JS Challenges; my template is at the bottom of this page; it'd be great if others could do the same:

https://gitlab.com/mikegerwitz/dotfiles/blob/master/emacs.d/...

gnud · on July 19, 2016

Good point. The Cloudflare captchas I've seen seemed to use JavaScript, but maybe it's just incredibly good CSS.

Interestingly, while a JavaScript calculation might leak more information to CloudFlare (since they might collect other info besides the result of a proof-of-work function), it would probably leak less to anyone trying to analyze Tor traffic from the outside? Seems to me like it would be harder to correlate the two ends of the Tor circuit.

mikegerwitz · on July 19, 2016

It'd be harder to do traffic analysis, yes, though I really wonder why it makes so many requests to begin with. I'd like to see an analysis of the CAPTCHA.

Considering that they allow it without JS enabled, I wonder why they'd need any requests at all.

JumpCrisscross · on July 19, 2016

> The Cloudflare captchas I've seen seemed to use Javascript

They have a non-JS version, too.

majke · on July 19, 2016

Browsers vary drastically in JavaScript performance. Counting something can take a couple of milliseconds on your desktop and seconds on some old mobile phone.

Ok, so you could differentiate the work by User-Agent, but then one could spoof the User-Agent to get less work. Going this route is not gonna be simple.

mikeash · on July 19, 2016

I set up a system like this for my blog comments. The work was a couple of orders of magnitude slower in the browser than it would be in native code. That means that if my users have to burn 30 seconds of CPU time, an adversary could potentially burn 0.3 seconds of CPU time per spam comment, which is not all that significant. Bumping it up beyond that makes the user experience suck, and this is for a comment system where you still have to spend time writing the comment. For just loading a web page, anything over a second or two will hurt.
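
The arithmetic, spelled out as a quick sketch. The 100x slowdown is an assumption standing in for "a couple of orders of magnitude", and the function names are made up.

```python
def native_cost_seconds(browser_seconds, slowdown):
    """CPU time an attacker running native code needs for the same work."""
    return browser_seconds / slowdown


def comments_per_cpu_hour(browser_seconds, slowdown):
    """How many proofs one native CPU core could produce in an hour."""
    return 3600 * slowdown / browser_seconds


print(native_cost_seconds(30, 100))    # 0.3
print(comments_per_cpu_hour(30, 100))  # 12000.0
```

So a 30-second wait for each legitimate commenter still lets one native core produce about 12,000 proofs per hour under that assumption.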

It's fine for my purposes, because I have no dedicated adversaries to combat, just occasional small-time spammers. But CloudFlare has a tougher time.

tokenizerrr · on July 19, 2016

Not really. This wouldn't do anything against targeted spammers, where the spammer targets your site specifically. This happens more than you might think.

cocotino · on July 19, 2016

This is what TeamSpeak does, by the way.

the8472 · on July 19, 2016

Don't most recaptcha http requests go to google, i.e. wouldn't google be the one with the information/control necessary to de-anonymize?

Illniyar · on July 19, 2016

A lot of comments here talk about recaptcha having a distinctive traffic signature, but I don't understand this.

Why does recaptcha have a distinct signature and if it does couldn't an attacker just make a distinct signature without recaptcha?

And why does recaptcha have a traffic signature that can distinguish between users? I mean how does a simple request response create a distinct traffic?

captainmuon · on July 19, 2016

Right, include Javascript (or heck, just a bunch of images of different sizes) that open requests in Morse code. Long request, long request, short request, long request...

Or, my favorite, the binary search (assuming you control the server / the network in front of the server / some exit nodes, and can monitor the traffic of your targeted user): have sites that cause transmissions for some time (long running JS / requests, or just a lot of content the user interacts with). Freeze 50% of the server's connections. Is the user still connecting? Then s/he is in the 50%. If not, in the other. Repeat until the user is matched to activity on the server.
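
A toy sketch of that binary search, assuming the attacker has an oracle `user_still_active(frozen)` that reports whether the monitored user keeps transmitting while the given set of server connections is frozen. Both names are hypothetical, and real traffic would be far noisier than a clean yes/no answer.

```python
def locate_user(connections, user_still_active):
    """Halve the candidate set each round until one connection is left.

    Returns (connection, rounds_needed).
    """
    candidates = list(connections)
    rounds = 0
    while len(candidates) > 1:
        mid = len(candidates) // 2
        frozen, unfrozen = candidates[:mid], candidates[mid:]
        rounds += 1
        if user_still_active(frozenset(frozen)):
            # Traffic continued, so the user's connection was not frozen.
            candidates = unfrozen
        else:
            candidates = frozen
    return candidates[0], rounds
```

With 1024 concurrent connections this needs 10 freeze rounds, i.e. about log2(N).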

mr_potato_face · on July 19, 2016

One issue here is that recaptcha's traffic signature allows for a passive attacker. So instead of needing to modify the website a user visits, or trick them into going to a domain you control, you are able to just look at the traffic logs and identify them without them knowing.

I'm not too sure about what defines the "distinctive" signature of recaptcha, but I expect timing of when packets of certain sizes were sent (e.g. loading 9 images in quick succession will generate larger packets, and then you often get the incremental couple images with human-length delays between them).
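
As a generic illustration of how a passive observer could match two logs, and not the method from the article, here is a sketch that bins bytes per time window on each side and compares the shapes with a Pearson correlation. The function names, the bin width and the event format `(timestamp_ms, size_bytes)` are all assumptions.

```python
import math


def bin_bytes(events, bin_ms, duration_ms):
    """Sum bytes per time bin for a list of (timestamp_ms, size_bytes) events."""
    bins = [0] * (duration_ms // bin_ms)
    for t, size in events:
        if 0 <= t < len(bins) * bin_ms:
            bins[t // bin_ms] += size
    return bins


def pearson(xs, ys):
    """Pearson correlation of two equal-length series (0.0 if either is flat)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)
```

A bursty flow (a big image load, then smaller loads after human-length pauses) seen at the exit would correlate strongly with the same bursts seen slightly later at the user's side, and weakly with unrelated steady traffic.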

danthejam · on July 20, 2016

Having a dynamic IP from the 3rd world, I can't help but notice how so many of the sites I visit are behind CloudFlare. 70% of the time I have to solve a captcha the first time I enter a domain during my browser session. This space could really do with more competitors on their same level.

mabbo · on July 19, 2016

>No one is that incompetent.

Well, I'm not sure I'd go that far.

libeclipse · on July 20, 2016

If an attacker ran the entry and exit node for Alice's connection, they could exploit this technique and not need access to the relay node.

_lhlo · on July 19, 2016

"No one is that incompetent." Yeah I don't think so. Beautiful article, otherwise.

gcb0 · on July 19, 2016

Does that captcha work without JavaScript?

mikegerwitz · on July 19, 2016

The one they're describing does, yes.

gcb0 · on July 22, 2016

still can't understand why people use Tor with JavaScript enabled.

lumberjack · on July 19, 2016

Browser signatures are probably easier still.

akerro · on July 19, 2016

The point of Tor-browser is to make all signatures the same, when screensize.

akerro · on July 20, 2016

*even screensize.

LinuxFreedom · on July 19, 2016

It is a USA company - that is enough to not trust them.

We do not need any more evidence, there is enough out there about gag orders, secret courts, worldwide compromise of network security.

USA tech company inhabitants and founders, read this: please move out of the country, build your companies in other places, do it now. There is no time to waste. You cannot repair the system that corrupt bureaucrats have irreversibly destroyed.

It will take one or two generations to rebuild a freedom oriented democracy in some other place. Currently Europe still seems to be a good starting point, especially now that the main USA influence channel GB is out.

Please give up the false hope and act now. Get out of that failed state! Freedom cannot be rebuilt in a fascist system without help from the outside - you can help much better from outside!

People who still stay in USA will be seen as cooperators by history, the window of opportunity is closing, hurry on and get out asap. Help to defend freedom in other places!

eeZah7Ux · on July 19, 2016

Why the downvotes?

tedunangst · on July 19, 2016

Seems a little tangential to the original article.

ZoF · on July 19, 2016

What exactly are you ranting about here? Any specifics beyond 'the US is a failed state get out now fascism'?