Abusing WebRTC to reveal coarse location data in Signal

DrPhish · on May 20, 2020

The only universal fix I can think of for this class of attacks is to have routers bound latency to a lower limit (eg. 200ms), with fixed latency buckets (eg. 500ms granularity) when it goes beyond that.

That is, no traffic would traverse the router in less than 200ms, and every other flow would be fixed at 700ms, 1200ms, 1700ms, etc amounts of latency. Tweaked correctly that would limit location to continent, unless I'm missing something.

It would effectively trade quick responses to/from close networks for some extra amount of privacy (in the case that GeoIP has already been taken care of)

The latency would have to be controlled on both ingress and egress to account for internal and external threats. I've got a niggling feeling that an attacker that could control latency of enough geographically diverse networks could find the boundary by manipulating responses to get finer detail, but can't quite work the problem into a solution...

Is there a less horrible or more reliable universal mitigation that I'm not thinking of?

andrewstuart2 · on May 20, 2020

I find that interesting, because you could probably classify physical location revelation attacks with other timing-based attacks for which we've had to resort to constant-time algorithms (e.g. key protection via constant-time cryptographic operations).

There's an interesting fundamental tradeoff somewhere between optimization and information. If you make things as efficient as possible, you'll probably leak information.

jerrysievert · on May 21, 2020

> If you make things as efficient as possible, you'll probably leak information.

at which point the threshold could lower ...

cortesoft · on May 21, 2020

I am confused... this attack doesn't seem to be doing anything with latency?

stagas · on May 21, 2020

You are right, it is only pinpointing the DNS server of the client, which usually resides in the same city, so you can find where the victim lives. It is inferred from the conversation here though that one could potentially exploit this even further to perform ping trilateration and get a better accuracy on one's location using the method described in the article. I think this spawned the comment about the timing attacks.

jrockway · on May 21, 2020

To some extent, this already happens. My cable modem adds about 30ms latency no matter the destination. I think this is a combination of buffer bloat (wait for buffer to fill before talking on the network) and waiting for a transmit time slot (shared access to the physical layer). I haven't looked at it in detail, but it is very surprising to me that I get 60ms RTT to Blizzard's servers in Chicago (a speed of light distance of 4ms one way) and 25-40ms ping to Google (in what they call "lga15", which is somewhere in Manahttan, probably 60 Hudson).

I realize that ping is a very poor benchmark as most routers do not handle ping in their fast path, but it's not adding 40ms of latency. So I suspect my modem.

dannyw · on May 21, 2020

This is extremely unusual to me. Personally I notice it when my ping to local game servers go from 5 to 25.

I’d recommended you look into it, and potentially get another modem. If all my requests started taking another 30ms, I’d consider my network degraded.

lxgr · on May 21, 2020

On a congested upstream, DOCSIS unfortunately seems to behave like this due to contention for and the insufficient granularity of upsteam transmission timeslots.

This effectively means that even if your share of the total upstream link of a network "collision domain" (shared coaxial medium really) is low (i.e you're using less than 1/n and accordingly should not be experiencing queueing in the modem), you might be seeing latency spikes due to having to compete and wait for transmission timeslots.

My local DOCSIS link is experiencing anything between 0 and 60 milliseconds of latency which I suspect is mostly due to this (since it's inversely correlated to upstream bandwith).

jrockway · on May 21, 2020

Seems like everyone with any cable ISP has this problem. It could be a modem problem, but it's happened with a variety of modems. It could be an ISP problem, but both Comcast and Spectrum users see the same thing in my experience.

I'd be interested in hearing counterexamples, but I imagine that people with good experiences are on DSL or Fiber.

zrm · on May 21, 2020

I can ping 1.1.1.1 right now at 12-15ms, or Google at around 20ms, via Comcast. Less than 10ms to either one on Optimum. But it's 25-30ms to ping Optimum from Comcast.

lxgr · on May 21, 2020

> buffer bloat (wait for buffer to fill before talking on the network)

That's not buffer bloat (but might nevertheless be something that happens on DOCSIS modems, although I haven't heard of buffering several packets before contending for an upstream send grant).

Buffer bloat, while also rampant especially in shitty CPE like most DOCSIS modem/router combinations, would only occur when your upstream is saturated.

Supposedly though, on DOCSIS, the upstream access contention algorithm used can sometimes add the latency you describe, adding latency even for single packets.

skybrian · on May 21, 2020

One problem might be with latencies right at the boundary between two buckets. A bit of jitter would let you know you're near the edge with enough sampling. And if you can add a little latency then you can move the boundary where you like.

It seems like the only way to avoid that is with one bucket (constant time).

stagas · on May 21, 2020

Even if it's one bucket, isn't it still vulnerable to the same attack you describe? If you push the latency to the edge, you'd know how much is yours and how much the bucket's. Even if it was randomized you could still figure out the window of the added noise with enough sampling and you're back to where you started. I don't see a way out, other than if someone really wants to avoid these types of attacks, then using a VPN is the best bet here.

verdverm · on May 20, 2020

I recall seeing a paper where they showed how close you can geolocate with various numbers of peers to the target, by using network latency alone

stagas · on May 20, 2020

That's funny, I was just working on a POC like this today[0] - it's accurate most of the time for my location but I haven't tested from other locations. You'd need to tweak the 'known' servers on and off to find the optimal arrangement because you'd need to be somehow inside the polygon. I was planning to find a way to discover these itself and other tweaks (like trying multiple times then averaging out)

Edit: the paper I found related to this is here[1]

[0]: https://github.com/stagas/http-geolocate

[1]: https://homes.cs.washington.edu/~tom/support/geoloc.pdf

RL_Quine · on May 20, 2020

https://vercel.com/edge-network

This page contains websocket addresses of a CDN that returns ping pong from a huge number of locations. It works surprisingly well for working out very fine grain location in just playing with it.

stagas · on May 20, 2020

Oh wow, they're actually doing the same thing, triangulating on latency. Hive mind, I guess. I used universities because I figured a)they won't mind, and b)they're more likely to have their servers on premises, then used a public reverse geoip to get their locations. I'll try to see if I can integrate with Vercel's edge network, seems more ideal.

Edit: no, I misunderstood, the location dot that's being displayed isn't the product of triangulation, they're just doing reverse geoip lookup. So, I wonder now if the edge network would perform better.

Update: it doesn't perform better. Either there is some kind of proxy redirecting their traffic or these servers aren't where they say they are, the center is skewed out completely. The universities win so far being correct and accurate most of the time.

LargoLasskhyfv · on May 20, 2020

Unimpressed with yours. Why? Because my ISP got some IPv4 space recently, which was formerly allocated to .ua, .ru, .iq, & .ir. While being physically next to or in Hamburg, .de. That led to all sorts of inconveniences for some people, suddenly barred from logging on, or using the sites they frequent, because they used outdated geo-ip data. I didn't even notice, except for the outrage in their customer forum.

Now yours consistently puts me somewhere into, or onto the shores of the Black Sea, while Vercels doesn't. So that makes me suspicious of your claim by using latency alone.

Edit: There seems to be outdated geo-ip information factored in somewhere. Why else it would put me IN the Black Sea?

stagas · on May 20, 2020

I mentioned I just started working on it. To try for yourself you'd need to clone and tweak the servers list to find an optimal arrangement for you. The problem as I see it is that jumping continents occurs really artificial delays which skews the result significantly, so it first needs to identify your relative whereabouts, then decide on an optimal set of servers. If you clone and tweak the servers to place yourself inside the polygon you'd see it does locate you. Vercel is doing a reverse geoip lookup, so your location is preconfigured in some database based on your ip.

LargoLasskhyfv · on May 20, 2020

At least they are current ;-)

gregoryl · on May 21, 2020

I hope you can come back in a day or so, and re-read this conversation. You're not being very nice, or fair, and it doesn't portray you in a good light.

LargoLasskhyfv · on May 21, 2020

I probably will. It may have come across harsh, but wasn't intended as such. IMO i outlined the reasons why sufficiently in my post and edit.

stagas · on May 21, 2020

Maybe you misunderstood what the tool is doing. It is not doing geoip, it is using the latency of pings between your client and some selected servers with known locations, then calculates the center based on those timings and these known locations. Very simple, you can look at the code and see what it's doing. It's not factoring in any kind of real latency, such as the speed of light, hop delays, etc. Unfortunately, as others pointed out, this method works poorly, especially in such long distances. In my tests you can only find a certain window that performs well for your location, still within a radius of hundreds or thousands of kms, which makes it pretty unusable as you point out. Knowing what _doesn't_ work is also valuable information.

LargoLasskhyfv · on May 21, 2020

I understood that. Let me replay it from my mind, how i percieved it. There is the dark map with white contours. And yellow points on it, which represent the triangualiton servers, i guess? Even clickable, at least the RED point which represents my assumed location sometimes moved. Most notably when i clicked on London, then it hopped to Odessa. Otherwise, it was mostly in the northeast region of the Black Sea. In the light of the recent IPv4-space acquisition of my ISP, which i wrote about containing addresses formerly used in the Ukraine, Russia, Irak and Iran, causing "funny" issues...what would you have thought?

LargoLasskhyfv · on May 21, 2020

edit: btw. tested in latest Firefox with uBlock Origin on/off. and Iridium Browser (Chromium derivative), retested in Iridium only after 24h dyn-ip renewal. Same results.

Anyways, time to sleep. Sun's getting too bright.

mike_d · on May 20, 2020

This is commonly referred to as "ping triangulation" and seems to be reinvented every few years. It sounds good in theory but in practice performs poorly unless ran consistently for days, which is why few researchers end up writing it up or publishing POCs for the next person to find. :)

You should take a look at network path convergence for geolocation instead. You can see a demo of the tool I wrote at https://traceroute.guru/tr/209.216.230.240

toast0 · on May 21, 2020

I think the key thing to keep in mind with this approach is that physical distance and network distance are only loosely related.

You can easily be next door to someone, and your packets go all the way across the country to get there.

I've definitely seen cases where my latency to two servers in the same building were wildly different, depending on the paths my ISP and their ISP(s) routed the traffic.

I'm about 22 ms roundtrip away from ISPs facility at the local internet exchange in Seattle. If a server is connected to that exchange, and traffic flows through the exchange both ways, I see ping times of about 22 ms. Sometimes, my ISP will send traffic through San Jose instead, but the server returns the traffic in Seattle, and that adds about 26ms, so I get a 48 ms ping. If both sides route through San Jose for whatever reason, I'll get 74 ms, which is close to what I'd get if the routing was sensible and the server was in the Washington, DC area. (You normally can't see the routing back from the server, but sometimes you control the server, too).

If I were building something like this, I'd want to try to determine how much of the latency was the user getting to where their ISP interconnects with other networks (or with multiple routes within their own network), and then where that interconnection location is. I'd guess you can get a pretty good idea of the interconnection location, based on reasonable network paths from there, but distance from the interconnection point is going to be tricky, most residential networking technologies add much more latency than the speed of light, so your upper bound of distance is going to be pretty far off. Of the roughly 22 ms I see to Seattle, about 20 ms is just coming from the DSL termination; when I had AT&T GPON, it added about 4 ms to my pings; that's a lot of distance.

ummonk · on May 20, 2020

Weird. Your POC seems to be placing me almost directly opposite in the caucasus.

stagas · on May 20, 2020

Yeah, it doesn't work really well :) Work in progress

cjbprime · on May 20, 2020

Would be interesting to see a link! That doesn't sound like it would get an accurate guess to me, given facts like "light travels slower in copper than fiber" and "your packets have to enter a country through specific large hubs" etc etc.

Avamander · on May 20, 2020

If you have enough machines you can just use machines after those large hubs. The copper/glass light speed difference doesn't matter much because the speed differences added by hops in the middle are usually orders of magnitudes higher.

In the end, unless you've got machines next door pinging your target you won't be able to differentiate between houses or city blocks.

ravenstine · on May 20, 2020

That's pretty interesting. So, effectively, a triangulation based on latency times?

Disposition · on May 20, 2020

Hate to be a pedant, but it's technically trilateration.

stagas · on May 21, 2020

I didn't know this term, thank you for being such a pedant :) knowing the right term makes it easier to find information on the subject.

Avamander · on May 20, 2020

> So, effectively, a triangulation based on latency times?

Yes.

kodablah · on May 20, 2020

I can see where a FQDN candidate is no biggie in a browser's offer/answer since DNS lookups occur all the time. But I imagine the simple fix for Signal's WebRTC use, since they control both sides of the exchange, is to just disregard non-IP candidates. Or even better, don't do anything with the candidates until the call is accepted. Worst case, could just have a geographically centralized signaling server (or shared IP). Granted, since Signal controls both sides, might as well only serve fixed "host" candidates and disallow any offer/answer with custom crafted ones.

One also wonders, to prevent other forms of leaks, if Signal can make a blanket policy to prevent DNS lookups or in general get tighter control on outbound network.

pthatcherg · on May 20, 2020

Disregarding non-IP candidates is exactly what we've chosen to do (and which the new versions of the app do).

The downside of disregarding all candidates until the call is accepted is that post-accept connectivity would be much slower.

Going through a server to hide your IP is an option in the settings in the app, but it can potentially lead to higher call latency, so there is a trade-off.

To prevent issues like this in the future we are taking more control of WebRTC's behavior with a fork of WebRTC (Signal uses WebRTC) and are providing patches to upstream WebRTC as well.

(I work at Signal on calling)

billme · on May 21, 2020

>> “ PINs will also help facilitate new features like addressing that isn’t based exclusively on phone numbers, since the system address book will no longer be a viable way to maintain your network of contacts.” [1]

Any idea when more information might be available on this? Asked moxie years ago to add this and know 100s of other have too.

Worth noting the FAQ as it relates to the PIN length is not correct, “How long can my PIN be? There is no limit. Feel free to add as many characters as you want.” [2] ...tested it and longest PIN I was able to create was 20 characters all numeric.

[1]: https://signal.org/blog/signal-pins/

[2] https://support.signal.org/hc/en-us/articles/360007059792-Si...

floatboth · on May 21, 2020

> if a Signal user wishes to hide their private/public IP addresses even from contacts who call, then it has an option “Always Relay Calls” in its privacy options

I thought Signal was all about privacy by default? :D

Signal fans love to dunk on Telegram for secret chats not being the only kind of chat.. well turns out on Signal, private is not the only kind of call, and your IP address is exposed by default.

vinay427 · on May 21, 2020

I'm unclear why you claim that "private is not the only kind of call." [EDIT below to clarify.] Also, your IP address is only potentially revealed to your contacts, which is rather different from the Telegram situation in which another party that you don't specifically authorize has access to your data.

EDIT: What I meant by this, as upon re-reading it seems unclear, is that the privacy as I understand it is not supposed to protect one party from the other party with which they are communicating, but rather conceal the conversation from third parties.

lxgr · on May 21, 2020

The confusion seems to stem from two kinds of privacy goals here: Metadata privacy towards third parties (i.e. who is calling who; Signal explicitly does not provide this) and reciprocal location privacy of two calling parties (i.e. I don't know where I am called from and vice versa, only who I am talking to).

Signal's conscious choice is to interpret a user adding another as a contact as an implicit signal to mark them trustworthy enough to forfeit the second kind of privacy in exchange for better voice quality (latency and bandwidth) as well as to lighten the strain on their resources.

lxgr · on May 21, 2020

Telegram has the exact same default (allow P2P calls for contacts only).

In my opinion, this is a reasonable default: Relaying all voice calls would use significant resources and might increase latency for users far away from the nearest relay (topologically or geographically).

Also, what's with the snarkiness? Are Signal's security tradeoffs or vulnerabilities somehow making Telegram more or less secure?

The two of them intentionally make different security/usability tradeoffs (the most significant one being Telegram's choice to provide a server-side message history visible to the service operator).

Of course this tradeoff isn't inherently bad, but weird communication and branding on Telegram's side in the past has given this a weird aftertaste that, at least for me, is still sticking around.

emerongi · on May 21, 2020

I thought Signal is more about privacy for the masses. Signal claims the call quality is better when this option is not enabled, so it makes sense to leave it off if your goal is to actually have users use your app. The call itself is still encrypted.

dancemethis · on May 21, 2020

Signal fans like to selectively forget that the server-side is proprietary software - therefore, the whole platform can't quite be proven to be reliable.

Essentially, they are not much better than Whatsapp stans.

lorenzhs · on May 22, 2020

That’s hilariously wrong, the server source code is at https://github.com/signalapp/Signal-Server. How can you make such a claim in good faith when it’s so absolutely trivial to refute?

ric2b · on May 22, 2020

I don't need the source code for Facebook.com to know what information my open-source browser sends to it. Same concept with Signal, you can read the client source code to know what the protocol sends to the server.

upofadown · on May 20, 2020

>Even Edward Snowden, the well known American Whistleblower, claims “I use Signal every day.”

Well, 5 years ago...

mike_d · on May 20, 2020

Edward Snowden has a vastly different risk profile than anyone else.

It it without a doubt that he is under constant electronic and physical surveillance by the Russian and American governments. His phones and computers are also very likely compromised. At that point your choice of messenger app matters about as much as the color of your socks because the interception is happening at another layer.

cjbprime · on May 21, 2020

Presumably he's been using Signal over Tor, defeating all such attacks as a side-effect.

aesh2Xa1 · on May 21, 2020

How does Edward Snowden acquire a laptop or phone in a way that he can trust it? I don't think it matters what protocols and applications he uses: he does not enjoy privacy.

sadfklsjlkjwt · on May 21, 2020

I order to get a device that is not explicitly compromised with custom targeted malware one could: take a walk, enter a random shop, buy a device. Now you only have the standard malware that everyone gets preinstalled on their devices.

How to keep it free of custom targeted malware? That is another question!

Mediterraneo10 · on May 21, 2020

With a target like Snowden who is under constant surveillance and lives at the whim of his host country, he could expect that any off the shelf hardware he bought would be immediately compromised. His hosts would just make up some bullshit reason to part him from the device for several minutes and do an evil maid attack. Or from afar his hosts or another country's actors could exploit undisclosed vulnerabilities in his device's wifi or bluetooth layer that they have in their toolkit.

sadfklsjlkjwt · on May 22, 2020

I agree I cover that in my OP.

thephyber · on May 21, 2020

s/Edward Snowden/anyone/

Two key words in your comment are both spectrums: trust and privacy.

Most people implicitly trust their hardware more than Snowden does now -- they overestimate the security from the factory and he probably has better expectations of the likelihood of hardware compromise.

On the privacy spectrum, one point is how much privacy you think you have, and the other (unknown) point is how much you actually have. Similarly, I think Snowden's situation and prior experience helps him more accurately understand where those points are; the rest of us are up on the first peak of the Dunning-Kreuger chart.

Daniel_sk · on May 21, 2020

Go in random store and pick random notebook / phone.

mirimir · on May 21, 2020

Can one use Signal via Tor? If so, a URL would be useful.

But one can use Session (a fork of Signal) over Lokinet (an onion routing network, which is similar to Tor).

Even the updated version of Signal merely relays stuff through a proxy. That is, there's just one hop, and that's trivial to deanonymize. With Lokinet, there are multiple hops, so adversaries must compromise multiple nodes.

Also, Session requires no PII for account creation. That's great for anonymity, but there's no built-in authentication. So users must authenticate contacts in meatspace or via other communication channels.

ajconway · on May 21, 2020

Lokinet seem to claim (https://medium.com/@LokiNetwork/lokinet-b8f738fefe7a) that their network is more resistant to sybil attacks by introducing different incentives (an internal cryptocurrency) and not having a central authority (which TOR does have, and which users have to trust). It's unclear how this helps against a wealthy adversary determined to control the network via its own nodes.

mirimir · on May 21, 2020

As I understand it, there's a slowly increasing supply of Loki, a private cryptocurrency. Basically, service nodes earn Loki for caching messages, and relaying traffic. I gather that's analogous to mining in Bitcoin etc.

Creating a new service node requires a providing a stake in Loki, which I believe currently costs on the order of $5000. And the only source is Loki held by existing service nodes. So arguably, as the creation rate for new service nodes increases, the price of the requisite Loki stake increases, perhaps supra linearly, or even exponentially.

There's also the issue that service nodes that behave maliciously lose all of their Loki, both the initial stake, and anything that they've earned.

I don't know specifics, however. So I don't know just how high the bar is for malicious service nodes.

ajconway · on May 21, 2020

If I understand it correctly, a node does not need to be acting malicious, it only needs to cooperate with other compromised nodes.

mirimir · on May 22, 2020

Thanks, I'll look into it. It seems like a subtle distinction. That is, "cooperat[ing] with other compromised nodes" seems like a flavor of malicious activity.

I get that the Tor Project has banned relays for numerous reasons. The tor-relays list is a good (albeit incomplete) source for reports, discussion and decisions.

dep_b · on May 21, 2020

WebRTC and signaling can be an interesting attack vector. If rooms are not protected technically from uninvited people to enter you can get all kinds of information but even worse you can sometimes even hijack a call.

sneak · on May 20, 2020

https://archive.is/SYq8H

I got a blank page on the original domain, perhaps due to DNS adblocking.

extropy · on May 21, 2020

You already have the peers IP address for p2p call right? How is this better than that?

meowface · on May 21, 2020

According to the article, by default Signal will relay all calls from people who aren't contacts. This means if a non-contact calls you or vice versa, they can't see your IP address and you can't see theirs, even if the call is accepted and you're both talking. They also provide an option to enable relaying calls to/from contacts, so that contacts won't see your IP address, either.

Here, regardless of if you have that setting enabled or not, and regardless of if you accept the call, contacts and non-contacts can cause your device to make a DNS request, which will leak your DNS server. And if using a DNS server with EDNS Client Subnets, the first 3 octets of your IP address will also be leaked.

I think there's another issue like what you're describing which can kind of obviate this, though: the vast majority of Signal users probably use Signal on their regular mobile phone and its number, not a burner phone/SIM/number. (Few users probably even own a burner phone/SIM/number or understand what that is or why they might want one or how they'd obtain one.) So... everyone can just see your phone number, which probably has an area code corresponding to your city or close to it, and the other digits can possibly pinpoint it even more precisely than that.

Anyone who isn't tunneling all of their DNS traffic with a VPN or otherwise probably also isn't anonymizing their phone number and just has the app installed on their personal, standard cell phone.

If they aren't traveling and haven't moved recently, you can probably see what city they're in just from that. (This exposure does allow coarse location detection even when someone's traveling, though it's a lot more coarse than the area code, unless the Client Subnet value is being sent.)