Apple Is Sending URLs to Tencent?

amanzi · on Oct 13, 2019

The author of the tweet goes into more depth in a blog post: https://blog.cryptographyengineering.com/2019/10/13/dear-app...

saagarjha · on Oct 13, 2019

Took a quick look, and this appears to be enabled if [NSLocale.currentLocale.countryCode isEqualToString:@"CN"]:

  char ____ZN7Backend6Google12SSBUtilities24shouldConsultWithTencentEv_block_invoke_2(void * _block) {
      rax = [NSLocale currentLocale];
      rax = [rax retain];
      r14 = [[rax countryCode] retain];
      [rax release];
      rbx = [r14 isEqualToString:@"CN"] != 0x0 ? 0x1 : 0x0;
      [r14 release];
      rax = rbx;
      return rax;
  }

Update: the code for Tencent Safe Browsing seems to be very similar to that which talks to Google, down to it being under a "Google" namespace, the API endpoints being named the same, and performing hashing which seems to match the "Update API" here: https://developers.google.com/safe-browsing/v4/update-api. I think this is just "whatever Google could see before, Tencent can see now, if you're in China". I'm no expert, so I have no idea if that's k-anonymous or whatever if Tencent/Google decide they want to track you, but in either case it's just shifting who's getting your hashes.
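
A minimal sketch of the hash-prefix idea, under stated assumptions: the client keeps a local set of short hash prefixes and only contacts the provider when a visited URL's prefix matches one of them. This is not Apple's or Google's code. The names `hash_prefix`, `LOCAL_PREFIXES`, and `needs_full_hash_lookup` are hypothetical, the 4-byte prefix length is an assumption for the sketch, the URL is made up, and URL canonicalization is omitted entirely.

```python
import hashlib

PREFIX_LEN = 4  # bytes; assumed prefix length for this sketch


def hash_prefix(url_expression: str, length: int = PREFIX_LEN) -> bytes:
    """Return the first `length` bytes of the SHA-256 digest of the expression."""
    return hashlib.sha256(url_expression.encode("utf-8")).digest()[:length]


# Hypothetical local database, as if downloaded from the provider ahead of time.
LOCAL_PREFIXES = {hash_prefix("malware.example/bad.html")}


def needs_full_hash_lookup(url_expression: str) -> bool:
    """True only when the local prefix set matches, i.e. when the client
    would send the prefix (not the URL) to the provider for confirmation."""
    return hash_prefix(url_expression) in LOCAL_PREFIXES


print(needs_full_hash_lookup("malware.example/bad.html"))
print(needs_full_hash_lookup("example.org/"))
```

In this sketch nothing leaves the device for URLs whose prefix is not in the local set; whoever runs the provider only sees prefixes for the (hopefully rare) local matches.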

eastendguy · on Oct 13, 2019

> if [NSLocale.currentLocale.countryCode isEqualToString:@"CN"]:

So even for US and EU based users the data is sent to Tencent just because they enabled Chinese language support? Who programmed that?

Gaelan · on Oct 13, 2019

iOS has separate region and language settings. Quick look at Apple docs suggests that this is the former.

gowld · on Oct 14, 2019

No, it's the latter.

In NSLocale, "region" is a subtype of language, as in a regional dialect, not an independent dimension.

https://developer.apple.com/documentation/foundation/nslocal...

https://developer.apple.com/library/archive/documentation/Ma...

dchest · on Oct 15, 2019

Nope, [[NSLocale currentLocale] countryCode] returns Region country code from settings. The same code is also used for language region, so you can end up with something like zh_US.
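
To make the language/region distinction concrete, here is a small sketch using plain string handling (not NSLocale). `split_locale` and `would_consult_tencent` are hypothetical helpers; the second one only mirrors the comparison seen in the decompiled check, and script subtags such as Hans are ignored.

```python
from typing import Tuple


def split_locale(identifier: str) -> Tuple[str, str]:
    """Split an identifier like 'zh_US' or 'zh-HK' into (language, region)."""
    language, _, region = identifier.replace("-", "_").partition("_")
    return language, region


def would_consult_tencent(identifier: str) -> bool:
    """Mirror the decompiled check: only the region code is compared with 'CN'."""
    _, region = split_locale(identifier)
    return region == "CN"


for ident in ("zh_US", "en_CN", "zh_CN", "zh-HK"):
    print(ident, split_locale(ident), would_consult_tencent(ident))
```

Under this reading, Chinese as a language with the region set to the US (zh_US) would not trigger the check, while English with the region set to China (en_CN) would.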

toxik · on Oct 13, 2019

I wonder if there are similar oversights with the US locale, seeing as a lot of developers prefer English interfaces to the somewhat craptastic localizations.

jhanschoo · on Oct 14, 2019

CN is the country code, not the language code. zh-CN is Simplified Chinese localized to mainland China. If you want Simplified Chinese, try something like zh-HK or zh-SG.

But there is valid criticism to be had that Apple should be signposting more visibly the differences between its settings for CN and outside CN.

Aperocky · on Oct 13, 2019

Your data is sent to Google otherwise.

eastendguy · on Oct 14, 2019

...which is what I strongly prefer (rule of law and all that). Of course, as others have said, on-device processing would be the best.

solidangle · on Oct 13, 2019

Not just EU and US based users, but also Hong Kong and Taiwan based users.

yorwba · on Oct 13, 2019

Hong Kong, Macao and Taiwan all have their own ISO 3166 codes and users there are unlikely to accidentally set the region to CN, since the difference between simplified and traditional characters is quite obvious.

giancarlostoro · on Oct 13, 2019

This sounds like the more dangerous story here. What the heck?

bidluo · on Oct 13, 2019

Google is blocked in China, so naturally they'd need a Chinese alternative. With everything going on it's easy to fear monger, but people need to chill out a bit. Locale is probably one of the least intrusive ways to determine location; using GPS would probably cause an even bigger problem if people realise that there's a backdoor to avoid location permission.

Any company that markets/releases in China and relies on some Google service (maps/safe search/safety net/google sign in/firebase/etc) needs to find an alternative, not because everyone is on the Chinese payroll but because more often than not these services are business critical.

busymom0 · on Oct 13, 2019

Wouldn't the locale be set to CN for phones which are in non-China countries too?

ptlu · on Oct 14, 2019

Language and Locale are separate preferences. You can mix and match however you want on iOS.

giancarlostoro · on Oct 14, 2019

Oh I see. That is where I got confused then.

rocqua · on Oct 13, 2019

What kind of code am I looking at? It seems pretty cool. Is this some automatically 'reverse compiled' assembly?

In any case, I'd love to know how you generated this. Would be very cool to get something similar out of an executable.

saagarjha · on Oct 13, 2019

It's "decompilation" of a block invoke for Backend::Google::SSBUtilities::shouldConsultWithTencent() taken by opening /System/Library/PrivateFrameworks/SafariSafeBrowsing.framework/SafariSafeBrowsing in Hopper Disassembler.

carol0987 · on Oct 17, 2019

SafariSafeBrowsing.framework

elwell · on Oct 13, 2019

> ____ZN7Backend6Google12SSBUtilities24shouldConsultWithTencentEv_block_invoke_2

I'm glad I don't use Objective-C... That's some Java level function naming there.

Edit: may have spoken too soon; it appears to be reverse engineered / decompiled?

lelf · on Oct 13, 2019

  bash$ c++filt <<< ____ZN7Backend6Google12SSBUtilities24shouldConsultWithTencentEv_block_invoke_2
  invocation function for block in Backend::Google::SSBUtilities::shouldConsultWithTencent()

Edit: what are downvotes for? That is the standard way to decipher C++ mangling, using built-in (binutils) tools.
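
For anyone curious what c++filt is decoding, the nested-name part of the mangled symbol is a run of length-prefixed identifiers between `_ZN` and `E`. A rough sketch that handles only this simple case follows; `nested_name` is a hypothetical helper, not a general demangler, and it ignores templates, substitutions, and argument types.

```python
import re


def nested_name(symbol: str) -> str:
    """Decode the length-prefixed identifiers that follow '_ZN' in a symbol."""
    i = symbol.index("_ZN") + 3  # skip any extra leading underscores and '_ZN'
    parts = []
    while i < len(symbol) and symbol[i].isdigit():
        digits = re.match(r"\d+", symbol[i:]).group()
        length = int(digits)
        i += len(digits)
        parts.append(symbol[i:i + length])  # the identifier itself
        i += length
    return "::".join(parts)


mangled = "____ZN7Backend6Google12SSBUtilities24shouldConsultWithTencentEv_block_invoke_2"
print(nested_name(mangled))
```

Each number says how many characters of identifier follow it (7 for Backend, 6 for Google, and so on); the trailing `Ev` marks the end of the nested name and an empty parameter list.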

msbarnett · on Oct 13, 2019

You’re looking at the method name after the compiler got done mangling type information into it for the linker. In the source code, what appears to be an anonymous block likely sits inside a function whose human-readable name is “shouldConsultWithTencent”, in a namespace (class?) “Backend::Google::SSBUtilities”.

The other line noise encodes return and argument types via the process of Name Mangling: https://en.wikipedia.org/wiki/Name_mangling

saagarjha · on Oct 13, 2019

This is a block invoke for a mangled C++ function. You'd know it as a lambda inside of Backend::Google::SSBUtilities::shouldConsultWithTencent().

jimktrains2 · on Oct 13, 2019

It appears to be the symbol in the binary. C++ also does similar things. It's called mangling. https://itanium-cxx-abi.github.io/cxx-abi/abi.html#mangling

It's also the reason you sometimes need to use extern "C" or otherwise mark symbols being exported via a C ABI in C++ so that they don't get mangled.

yorwba · on Oct 13, 2019

This is really a "damned if you do, damned if you don't" kind of situation.

They can either use Tencent's Safe Browsing API as a drop-in replacement for Google's API, relying on k-anonymity to leak as little information as possible. That leaves them open to accusations that they allow Tencent (or, for that matter, Google) to track the browsing history of Safari users.

Or they can essentially turn off Safe Browsing in China. (Google's API is collateral damage of the Great Firewall.) That leaves their users unprotected against all kinds of malware and scams.

I think they made the right call here by protecting users against the most common threat (most people are not dissidents), while giving advanced users with a different threat model the opportunity to opt out.

rsync · on Oct 13, 2019

"Or they can essentially turn off Safe Browsing in China."

The OP as well as the associated blog post[1] as well as the Apple-provided fine-print language do not make it clear to me that this "feature" is exclusively enabled for Chinese users (or, perhaps Chinese IPs).

Could someone point to a source that confirms a US person, in the US, with a US-purchased iPhone, would not have their browsing history transformed and sent away for analysis to Tencent?

[1] https://blog.cryptographyengineering.com/2019/10/13/dear-app...

kevinday · on Oct 13, 2019

If this source is to be believed, it's either going to Google or Tencent, but never both:

https://twitter.com/eromang/status/1183422784082530304/photo...

You can try it yourself by going to one of the iOS Safe Browsing test pages on your phone, and when the warning pops up click "Show Details". It'll either say Google or Tencent on the warning message, which should let you know which one got chosen for you.

https://testsafebrowsing.appspot.com

I just tried it, and it says Google for me in the US.

jstsch · on Oct 13, 2019

Great. I disabled safe browsing probably back when it first appeared on my iPhone 3G or 4 and this test confirms I’m still not sending urls to anyone whilst surfing on my iPhone 11. Nice job preserving these settings over countless device upgrades.

mrgalaxy · on Oct 13, 2019

For anyone else wanting to disable it (or at least learn more about it), the feature is labeled in iOS settings under Safari > Fraudulent Website Warning.

saagarjha · on Oct 13, 2019

Interestingly, none of those links triggered a warning for me on my Mac…

tialaramex · on Oct 13, 2019

You've probably switched off Safe Browsing. I wouldn't advise anybody to do that unless they're _so_ sure that they don't need Safe Browsing that when (most likely rather than if) they get infected by Malware or fooled by Phishing they are confident they'd tell everybody they know what an idiot they are.

I have it switched off on my home PCs (but on for work). But then I also don't carry home insurance and when I was flooded I believe my first Facebook message began "This is probably a good time for you to say 'I told you so'" because that seems like the right sentiment.

saagarjha · on Oct 13, 2019

I haven't.

fiddlerwoaroof · on Oct 13, 2019

https://news.ycombinator.com/item?id=21242628

matthewdgreen · on Oct 13, 2019

Alternatively, they could purchase the data from Tencent or another company, and operate their own version of the service. That may even be what they’re doing, but we don’t know, since they launched the service with no details or publicity.

pmoriarty · on Oct 13, 2019

Yet another approach is to send the entire list of all malware URLs to each client and let the client do all the processing on their end.

This way no data (hashed, anonymized, truncated, or otherwise) would need to be sent to Tencent, Apple, or Google, or anyone else.

deogeo · on Oct 13, 2019

Why is this getting downvoted? I'm also interested in why this approach isn't taken.

jhgg · on Oct 13, 2019

Chrome does something like this, you can learn more about it here: https://codereview.chromium.org/6286072/

mokus · on Oct 13, 2019

A couple possibilities I can think of:

* the list may be prohibitively large

* it exposes to the bad actors exactly which of their scams is detected, so they can simply refine their methods until their sites don’t make “the list”

souterrain · on Oct 13, 2019

Bad actors can also occasionally poll the safebrowsing API.

cryptonector · on Oct 13, 2019

Bloom filters take care of the first. There will always be an arms race between attack and defense, so I'm not concerned about the second issue.

yorwba · on Oct 13, 2019

Bloom filters can give false positives, and to eliminate them, you'd need to send "data (hashed, anonymized, truncated, or otherwise)" to some entity that has the full list. That's exactly how Google's Safe Browsing API works.
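
A toy Bloom filter makes the trade-off visible: items that were added are always reported as present, but an item that was never added can also be reported as present. This is a sketch only; the class name, sizes, and hashing scheme are assumptions, and nothing here is taken from any browser's implementation.

```python
import hashlib


class BloomFilter:
    def __init__(self, size_bits: int, num_hashes: int = 3):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, item: str):
        # Derive num_hashes bit positions by salting SHA-256 with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def might_contain(self, item: str) -> bool:
        # True means "possibly present"; False means "definitely absent".
        return all((self.bits >> pos) & 1 for pos in self._positions(item))


bf = BloomFilter(size_bits=1024)
bf.add("phish.example/login")
print(bf.might_contain("phish.example/login"))  # added items are always found

# Degenerate one-bit filter: after a single add, every query is a "hit".
tiny = BloomFilter(size_bits=1)
tiny.add("phish.example/login")
print(tiny.might_contain("harmless.example/"))
```

A "possibly present" answer is where the second step comes in: to rule out a false positive, something with the full list has to be consulted.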

deogeo · on Oct 13, 2019

Can't the bad actors already check each of their sites individually by pretending to be a normal user?

pmoriarty · on Oct 13, 2019

Exactly how large is it?

OS vendors already make a habit of regularly sending gigantic OS updates. I'd have a hard time believing that a compressed list of malware URLs would be noticeably bigger, by comparison.

Also, once the list is sent the first time (or just included with the OS so it'd be already present on your device when you bought it), they could just send the deltas as the list changed, and those deltas (especially once compressed) should be relatively small even compared to the original (probably not that large) list.

NetMageSCW · on Oct 15, 2019

Google Safe Browsing transparency report lists 40k new bad URLs per week - how large do you think the list is now? It is far, far too large for local processing but k-anonymity is perfectly trustworthy when used with cryptographic hashing.

Gaelan · on Oct 13, 2019

Wouldn’t it be pretty easy for the bad actors to check the database anyway? I can’t imagine they would need to query often enough to hit any rate limits.

lostlogin · on Oct 13, 2019

Local handling seems more Appley too.

cryptonector · on Oct 13, 2019

Downvoters who think that might be too much data don't know about Bloom filters.

robocat · on Oct 13, 2019

Bloom filters are likely useless in this situation - following facts for phishing only:

1. Phishing sites have a lifecycle of about 15 hours.

2. Most malicious links are hidden within benign domains.

3. About 400,000 phishing sites are created each month.

From: https://www.itgovernance.co.uk/blog/4-eye-opening-facts-abou...

I haven't run the numbers, but I am guessing that a clientside solution would use a lot of bandwidth, and avoiding false positives is very important.

Also with a clientside solution, how are new phishing URLs detected?

PS: perhaps try to assume HNers know what a Bloom filter is (I've seen them come up lots of times in comments).

cryptonector · on Oct 14, 2019

Google's safe browsing API is probabilistic too. The idea is that you do so many rounds of checking to get closer and closer to the mark. You start with a fairly high false positive probability, high-privacy check, then if you get a positive, you try a lower false positive rate check that also loses you some privacy, and the trade-off is that you don't have to have the full malicious site DB with you at all times (and keep it up to date).

Why did you assume I'd not know about false positives?

pmoriarty · on Oct 13, 2019

400,000 sounds like a lot, but I wonder how many new URLs Tencent adds to its database each month. I expect they don't add every phishing URL but some small subset of them (possibly even a very small subset; we'll probably never know).

But let's say it is 400,000. I took the URL you linked and made a file of 400,000 copies of it. The file size was 28 MB. I didn't bother compressing that particular file since the URL is the same in each instance, but I expect a file full of actual phishing URLs would probably compress pretty well, so it would probably be significantly less than 28 MB.

Considering that OS vendors regularly ship multi-gigabyte size updates, having to download less than 28 MB extra every month shouldn't even be noticeable. If updates needed to be done more frequently, the client could subscribe to get regular updates as they become available.
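
The back-of-the-envelope sizes can be written out as follows. The 28 MB and 400,000 figures come from the experiment above; the hashed variants (32-byte SHA-256 digests, 4-byte prefixes) are added assumptions for comparison, and everything is uncompressed with 1 MB taken as 1,000,000 bytes.

```python
def list_size_mb(entries: int, bytes_per_entry: float) -> float:
    """Uncompressed size in megabytes (1 MB = 1,000,000 bytes)."""
    return entries * bytes_per_entry / 1_000_000


ENTRIES = 400_000

# 28 MB over 400,000 lines implies about 70 bytes per line (URL plus newline).
implied_bytes_per_line = 28_000_000 / ENTRIES

full_urls_mb = list_size_mb(ENTRIES, implied_bytes_per_line)
sha256_hashes_mb = list_size_mb(ENTRIES, 32)  # assumption: full SHA-256 digests
prefixes_mb = list_size_mb(ENTRIES, 4)        # assumption: 4-byte hash prefixes

print(implied_bytes_per_line, full_urls_mb, sha256_hashes_mb, prefixes_mb)
```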

tomxor · on Oct 13, 2019

> 28 MB extra every month shouldn't even be noticeable

Parent comment suggests phishing site life-cycle <15hrs, at 400k a month that's 8333 every 15 hrs. To give an idea of how frequency sensitive this is, assume URLs are added equidistributed in time: that would be a new one every 154ms - for such time critical information it makes no sense to attempt to synchronize clients, it would require constant polling or push updates to have _any_ chance of catching a malicious URL.

At such a frequency, efficiency becomes less about bandwidth and more about the overhead of continuously synchronising so many clients (think of that 28 MiB spread out over 400k separate messages over one month, one every 154ms, that not only inflates the size, but causes a constant network usage and processing that is far less efficient than a single 28MiB download).

Or you could just send the URL hash when you visit a URL... (do you request anywhere near 8k URLs every 15hrs? 1 URL every 154ms? no). It's so clearly a simpler solution that will be faster for everyone without letting bad URLs slip through before a latent sync.
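
Working the quoted figures through, assuming a 30-day month and evenly spaced additions: 400k per month does give about 8333 sites per 15-hour window, but the spacing comes out at roughly 6.5 seconds per URL. The value 0.154 is the per-second rate, so the 154 figure looks like the rate read as an interval.

```python
SITES_PER_MONTH = 400_000
HOURS_PER_MONTH = 30 * 24  # assumption: 30-day month
LIFECYCLE_HOURS = 15

# Number of new sites appearing within one 15-hour lifecycle window.
sites_per_lifecycle = SITES_PER_MONTH * LIFECYCLE_HOURS / HOURS_PER_MONTH

# Average spacing between new sites, and its reciprocal (the rate).
seconds_between_sites = HOURS_PER_MONTH * 3600 / SITES_PER_MONTH
sites_per_second = 1 / seconds_between_sites

print(round(sites_per_lifecycle), seconds_between_sites, round(sites_per_second, 3))
```

The qualitative point about frequent synchronisation still stands at one new URL every few seconds; only the magnitude changes.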

deogeo · on Oct 14, 2019

There's no need to sync every time a new phishing URL is added - only every time a URL is visited by a client.

The delta can be derived just from the version number of the client's URL database, and should be a total of 1 MB in size for a whole day's worth of updates. So ~1 MB for the 1st URL visited in a day, and considerably less afterwards. Compared to average webpage size, that's nothing.

Really, only thing that changes is instead of sending a URL hash, you send the URL DB version, and the reply is the list of changes since that version.
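
A sketch of that version-based exchange, with everything hypothetical: the changelog format, `changes_since`, `apply_changes`, and the example entries are made up for illustration, and a real service would presumably ship hashes rather than readable URLs.

```python
from typing import List, Set, Tuple

Change = Tuple[int, str, str]  # (version, "add" or "remove", entry)

# Hypothetical server-side changelog, ordered by version.
CHANGELOG: List[Change] = [
    (1, "add", "a.example/login"),
    (2, "add", "b.example/pay"),
    (3, "remove", "a.example/login"),
]


def changes_since(client_version: int) -> List[Change]:
    """Server side: everything the client has not seen yet."""
    return [change for change in CHANGELOG if change[0] > client_version]


def apply_changes(local: Set[str], version: int, changes: List[Change]) -> int:
    """Client side: apply the delta in order and return the new version."""
    for change_version, op, entry in changes:
        if op == "add":
            local.add(entry)
        else:
            local.discard(entry)
        version = change_version
    return version


local_db: Set[str] = set()
local_version = apply_changes(local_db, 0, changes_since(0))
print(local_version, sorted(local_db))
print(changes_since(local_version))  # empty list: already up to date
```

The request carries only a version number, so the server learns when a client syncs but not which URL prompted it.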

tomxor · on Oct 14, 2019

> Really, only thing that changes is instead of sending a URL hash, you send the URL DB version, and the reply is the list of changes since that version.

Or none at all and a simple confirmation that the list is up to date. Yes this is a way better idea.

Although it's always going to be less efficient. For instance, I'm not sure how it would scale into the future. Checking URLs server side is optimal: the cost is always going to be relatively constant in proportion to the URL size. With DB deltas, the cost of each URL lookup now depends on both the URL size and the DB update frequency, i.e. as the malicious URL rate increases over time, individual URL lookups will incur greater network cost... this is probably not a big deal for the client, but it would make a significant difference for the provider of the deltas. Or maybe network caching would dissolve it again? I mean there would be a lot of duplicate deltas flying around every minute... basically a content distribution problem but with a high frequency twist.

pmoriarty · on Oct 14, 2019

Do you really think Tencent is detecting a new phishing site every 154ms?

I'd seriously question how many of the total new phishing sites they detect to start off with, and then how frequently they do so.

If a user only downloads the deltas periodically they'd risk being out of sync with the master list (which might not be updated even once a month or at all, for all we know), but that's the price they'd need to pay to avoid having to send any information about their web browsing to parties they don't trust.

One other thing to consider is the likelihood that the URL you happen to be surfing is both a phishing URL to begin with and one of the ones that just appeared since the last delta download you did, compared to the likelihood that it's one of those already in the entire phishing URL database you've already downloaded. I'd expect those odds to be very low.

tomxor · on Oct 14, 2019

> the likelihood that the URL you happen to be surfing is both a phishing URL to begin with and one of the ones that just appeared since the last delta download you did, compared to the likelihood that it's one of those already in the entire phishing URL database you've already downloaded. I'd expect those odds to be very low.

Ignoring the first condition (otherwise why bother with a list at all)... Consider that this information is very transient (average 15hrs), this is pretty simple: deltaT / 54000

This is still horrible, because your safety is determined by how frequently you can sync with the DB.

> If a user only downloads the deltas periodically they'd risk being out of sync with the master list (which might not be updated even once a month or at all, for all we know).

Being updated monthly doesn't match the statistic of average 15hr lifecycle, because it would be useless after that length of time. And while I don't claim to know 15hrs as a fact, it is intuitive that the average will become ever shorter as malicious URL checkers become updated ever faster.

> but that's the price they'd need to pay to avoid having to send any information about their web browsing to parties they don't trust.

Full URL information need not be sent, a hash of the URL domain and path would probably suffice... if that's not enough then it's a dilemma, but that doesn't make continuous syncing a good or fail safe replacement.

pmoriarty · on Oct 14, 2019

"Being updated monthly doesn't match the statistic of average 15hr lifecycle, because it would be useless after that length of time."

And maybe it is useless. We don't actually know, but we should at least recognize that there may be a difference between how frequently phishing sites allegedly appear and how frequently they appear in Tencent's malware URL database.

"This is still horrible, because your safety is determined by how frequently you can sync with the DB."

And being identified by the Chinese government as someone who surfs to forbidden websites might be even more horrible, for some.

morelisp · on Oct 13, 2019

People who think they know about Bloom filters should consider what attack vectors a false positive with such would allow.

(The end result of the thought experiment will be basically what Google does now.)

leoh · on Oct 13, 2019

Running their own service would be most ideal. They are definitely not. https://github.com/Igalia/webkit/blob/9777baa3db09cad7ed5b2c....

wolfgke · on Oct 13, 2019

> This is really a "damned if you do, damned if you don't" kind of situation.

Why? Simply, when setting up the device/browser, let the user choose what safe browsing API the browser shall use (both, one of them, or none).

Letting the user make a conscious choice is the best way to handle "damned if you do, damned if you don't" kind of situation. To make the choice as conscious as possible for the user, provide additional material that explains the advantages and disadvantages of each option for the user so that the user is well-informed before he/she makes his/her choice.

Aperocky · on Oct 13, 2019

That’s entirely against Apple's modus operandi. It’s always to give the user as little choice as possible and assume that users are idiots. The only exception they made is for developers with CLI abilities, and even that they have begun to restrict.

oarsinsync · on Oct 14, 2019

> assume that the users are idiots

Microsoft didn’t make this assumption. Apple did. My entire extended family are now all Apple users. My tech support calls per year can now be counted on one hand.

In my extended family, the users are idiots, and thanks to Apple, I’m poorer financially but significantly richer in free time.

Aperocky · on Oct 14, 2019

I didn't say it was a poor choice, rather, it's a great choice.

I'd rather be an idiot in the areas that I don't have an expertise in, I have no interest in plumbing and I would blindly follow the suggestion that the plumber who came to my house made. And I guess this strategy worked well for Apple.

ottolin · on Oct 14, 2019

That is exactly my experience too. My parents just kept calling me when they were using Android phones. I bought them iPhones to get my free time back.

hesarenu · on Oct 14, 2019

There is not much difference there. iPhone might be more confusing for people who are used to Android.

oarsinsync · on Oct 15, 2019

On an Android device, there are lots of relatively simple to follow guides to be able to download 'free games'.

Rooting your phone is all well and good if you know what you're doing. It's not so good when you don't, and I'm your tech support phone call.

austinkhale · on Oct 13, 2019

I believe an argument could have been made that it was the right call if they had publicized it. Seeing as how it was implemented in the background, I am less inclined to give them the benefit of the doubt.

ddtaylor · on Oct 13, 2019

Do you think a decentralized database of unsafe URLs could exist?

larkeith · on Oct 13, 2019

I'm curious if, as @thefalken brought up [0], this is illegal under the GDPR, given that it's a hidden opt out and should apply to EU citizenry with browser language set to Chinese.

[0] https://mobile.twitter.com/thefalken/status/1183445477645312...

tialaramex · on Oct 13, 2019

Very doubtful, even with the "hidden opt out" that seems to be sufficiently poorly "hidden" that lots of people here have indeed opted out.

Safe Browsing uses very little data (pretty much the least they could get away with to make it work) and you'd have to establish both that Tencent is lying about how it uses that data and that Apple knew or reasonably should have known that it was misused.

URLs never leave your browser, so "Apple is sending URLs" is wrong. The Update API is used, so the URLs stay on your browser but under some circumstances hash prefixes of some URLs are sent to Google/Tencent.

If you choose to assume that Google / Tencent are bad actors then they can probably manipulate this data to target a few URLs and discover who (IP addresses) browsed those URLs. In less well designed browsers like Safari they might be able to tie that to a Google Account independent of the IP address because those browsers don't isolate Safe Browsing API calls from normal web browsing activity (this won't work in e.g. Firefox). If a bad actor did this, it would make performance worse for all users, and the accuracy of the trick would be sabotaged unless the set of target URLs tracked is fairly small. If you were looking for a single PDF filename on a single web site it's definitely possible; if you want to track six thousand different articles about Xi's resemblance to Pooh Bear across tens of thousands of sites, that's going to cause a lot of false positives you have to weed out somehow.

sroussey · on Oct 14, 2019

Forget the URLs, it’s my IP address I’m worried about.

Muromec · on Oct 13, 2019

If region is set to China, not just language. Locale has two components, like in en_US.

lern_too_spel · on Oct 13, 2019

That doesn't mean that the user is in China. It means that the user wants their interface in Chinese as it is written in mainland China. In other words, the CN means simplified Chinese instead of traditional Chinese, which is what the TW region code corresponds to.

derefr · on Oct 13, 2019

The GP poster is incorrect; the Region setting has nothing to do with setting the region code of the Language setting (each language+region pair being its own listing in Languages.) The Region you choose during initial device setup does determine your default Language region, but you can pick a different one while keeping the same Region.

The Region setting in iOS is literally just the question "what Country [or Country-equivalent political region] would you like to be considered to be in, when we make certain OS features be dependent on your country?"

This is separate from what country the phone treats you like you're actually in, geographically, which is determined moment-to-moment by geolocation and cellular profiles. (Time zone? Geolocation. Maps domestic/foreign feature display granularity? Geolocation.)

Whereas, Region is for things like, say, whether you see certain apps or features that are in partial progressive rollout; or whether you see features offered that don't make sense outside of certain regions.

Re: the first example, the News app, which rolled out in the US first, could be made to appear in other countries by setting your Region to the US. When this was done, the News app, if launched, would still detect what country you were actually in (geolocation-wise), and would make a best-effort attempt at showing news from the few sources Apple had made agreements with so far from that country.

Re: the second example, iOS has social-network "Accounts" integration with Sina Weibo, QQ, etc. just like it has integration with Facebook/Twitter/etc. It just doesn't display these sign-in options unless your phone is set to the China "Region." Because, if you're not in China or from China, why would you ever use these networks? (Note that Apple designs iOS under the assumption that people won't bother to change their Region when they travel; so it really is more of a "where are you from" rather than "where are you now" question.)

lern_too_spel · on Oct 14, 2019

This is incorrect. en_GB doesn't mean you're in or from Great Britain. It means you want the device to show English as it is used in Great Britain, with extraneous "u"s and rearranged month and day. A user in the US can request that locale instead of en_US if that is the language they prefer. Locale is for localization of the interface, not for telling where you are from.

See the Australian English example in https://en.m.wikipedia.org/wiki/Locale_%28computer_software%...

Now maybe iOS sets the locale based on where the user is from instead of based on how the user would like their interface localized. If it does, it is doing it wrong. Sending a user's data to Tencent based on a setting instead of based on their location is absolutely wrong.

derefr · on Oct 14, 2019

> This is incorrect. en_GB doesn't mean you're in or from Great Britain.

You misinterpreted. "Region" is a setting in iOS. But iOS "Region" has nothing to do with the "region" part of a locale. Setting your iOS "Region" to "Great Britain" and setting your "Locale" to "English (Great Britain)" are separate things. "Region" is just what iOS happens to call a completely distinct thing. If you like, to lessen your confusion, pretend it is called something different.

> Sending a user's data to Tencent based on a setting instead of based on their location is absolutely wrong.

You wouldn't want your phone to start sending data to Tencent as soon as you cross the border into China, right? And, vice-versa, you would expect a person from China, who thinks Tencent is a great brand, to not want to stop sending their data to Tencent just because they cross the border out of China, right?

lern_too_spel · on Oct 14, 2019

> You wouldn't want your phone to start sending data to Tencent as soon as you cross the border into China, right?

You most likely would. Google's service will be unreachable from within China. If it didn't switch providers, you would have no Safe Browsing protection. The key thing is to obtain consent from the user the first time this happens.

saagarjha · on Oct 14, 2019

What should they do?

lern_too_spel · on Oct 14, 2019

They should send to Tencent based on network location. If you're inside the Great Firewall, Google's safe browsing service will be unreachable. If you're outside the Great Firewall, you really don't want to use services through it if possible. https://arstechnica.com/information-technology/2015/04/ddos-...

qafy · on Oct 13, 2019

I think their point is that changing your device language is not the same thing as changing your region. Changing your language is a simple setting, but changing your region involves re-accepting the ToS for that region. So technically they would have to click Agree on the document linked in the tweet in the OP.

33degrees · on Oct 13, 2019

The code is checking the region part of the locale, which is CN for China. The language code for Chinese is zh.

jakear · on Oct 13, 2019

“en_US” is “American English”, not “English on a phone in America”. The alternative “zh-*” codes are SG, TW, or HK. It’s checking if the user has their region set to “Mainland Chinese”, not that their phone is “Chinese on a phone in China”.

33degrees · on Oct 15, 2019

Actually, American English is "en-US"; "en_US" means English with the region set to the US, at least on iOS. But yes, it is checking that their region is set to mainland China.

https://developer.apple.com/library/archive/documentation/Ma...

larkeith · on Oct 14, 2019

Didn't realize this, thank you for clarification!

alextheparrot · on Oct 13, 2019

The code appears to be used for fraud related purposes, meaning, to my understanding, Apple would likely argue it has a legitimate interest.

There’s a lot of legal language around this exception, but fraud is directly called out as a legitimate interest and means that the group controlling the data would not need to obtain user consent.

For additional reading, I’d recommend the following post: https://www.gdpreu.org/the-regulation/key-concepts/legitimat...

the8472 · on Oct 13, 2019

Is Apple the data controller here since it's all happening on the users' device? And does "legitimate interests" extend beyond the data controller's interests? I.e. if it's only about fraud against Apple then safe browsing (which is supposed to protect the user from fraud) would not necessarily be a legitimate interest of Apple. It might have to be opt-in at least.

alextheparrot · on Oct 13, 2019

Great questions, which I know I'm not equipped to answer authoritatively - prior comment was just my two-cents on how I'd expect Apple to argue the issue (And even that argument may be a losing one).

In opposition to the fraud argument, one could argue that users wouldn't reasonably expect to have their data forwarded to China. The counter-argument to that would likely be along the lines of users who have their localization set to China might have more of an expectation of this. And so the lawyer fees continue to increase in what would be an incredibly interesting case, honestly.

yorwba · on Oct 13, 2019

If it's illegal under the GDPR to send the data of EU citizens with browser language set to Chinese to Tencent, it's also illegal to send the data of EU citizens with browser language set to anything else to Google. Chrome, Firefox, Safari and probably all Chromium-based browsers (unless they disable Safe Browsing by default) use Google's API and would be in violation, too.

Geee · on Oct 13, 2019

That's true, but it's probably covered in the privacy notice. It doesn't make a difference that the data is shared outside the EU, it just has to be communicated to the user.

Also, the data shared here is not personal information, unless it's connected with personal information such as IP address or a tracking cookie.

This is a pretty gray area. Apple isn't necessarily sharing information with Google; it's just a property of Internet traffic that Google / Tencent can collect the IP address from the request. The same happens when websites include resources from other websites (images, scripts, etc.), and these are not typically taken into account in GDPR privacy notices.

sroussey · on Oct 14, 2019

Apple says that they share your IP address and that it “may” be recorded by Tencent and Google.

icebraining · on Oct 13, 2019

That's not necessarily true, since the GDPR imposes extra restrictions on sending data to countries that are neither covered by the GDPR (essentially, outside the EEA) nor deemed by the EU to offer equivalent protection. I don't know where Tencent has these servers, but Google has servers in the EU, managed by an EU-based subsidiary.

leoh · on Oct 13, 2019

Link to code in question https://github.com/Igalia/webkit/blob/9777baa3db09cad7ed5b2c...

saagarjha · on Oct 13, 2019

Would you mind linking to the upstream repository instead? GitHub doesn’t let you search in forks.

fireattack · on Oct 13, 2019

https://github.com/WebKit/webkit/blob/master/Source/WebKit/U...

(For some reason "search in this repo" doesn't work for keyword `malwareDetailsBase` [1], but it's there)

[1] https://github.com/WebKit/webkit/search?q=malwareDetailsBase...

sqs · on Oct 13, 2019

URL to same code search on Sourcegraph (which works): https://sourcegraph.com/search?q=repo%3Awebkit%2Fwebkit+malw...

(Disclaimer: I am the Sourcegraph CEO.)

saagarjha · on Oct 14, 2019

Unrelated, but I like the fact that you support prefers-color-scheme!

fireattack · on Oct 14, 2019

It shows

Search timed out Try narrowing your query, or specifying a longer "timeout:" in your query.

right now.

sqs · on Oct 14, 2019

Sorry about that. There must’ve been a brief blip during a moment of intense load or a redeploy. Is it working for you now?

dhdhebsb · on Oct 14, 2019

It literally says it’s going to send links to Google Safe Browsing and Tencent Safe Browsing in the Safari setting page under “Safari and Privacy”

saagarjha · on Oct 14, 2019

That’s not what it says.

newshorts · on Oct 14, 2019

Which one of these comments is right? You both can’t be.

saagarjha · on Oct 14, 2019

> Before visiting a website, Safari may send information calculated from the website address to Google Safe Browsing and Tencent Safe Browsing to check if the website is fraudulent. These safe browsing providers may also log your IP address.

This is quite different from sending links.

awinter-py · on Oct 13, 2019

every form of software phone-home is sleazy

we should be linting code to say whether it phones home or not, and what it uploads when it does. plain language privacy policies and ever-changing browser settings are leaving huge gaps.

when the US government bought chinese drones they hired a consultant to prove that the drones never call home.

OJFord · on Oct 13, 2019

> we should be linting code to say whether it phones home or not

Is that possible? How do you differentiate it from expected API calls?

(Not convinced black/white-listing strings is any different from code review in this case - it'll just be changed on demand if it prevents adding whatever someone was trying to add.)

awinter-py · on Oct 13, 2019

it's theoretically possible. I don't know of any tools that do it (which could be a comment on my research skills rather than the state of the art).

in theory you can do dataflow analysis on all external inputs to the program (geo, filesystem, text) and monitor where that goes in the program. For something more complicated like a browser, you might want to do the analysis per component (URL bar in this case).

wouldn't be perfect, but it's a starting point.

linting is tougher on closed-source software than open-source, but if a company certified a linter output and was found to be lying I'm comfortable with using the law to resolve that.

UncleMeat · on Oct 13, 2019

Except you'd never have a good enough dataflow analysis to work on arbitrary code without burying people with false positives. Especially in C++ code, where things like function pointers just destroy call graph precision (and therefore taint analysis precision).

Linting doesn't even give you this much. All it'd be able to tell you is "where in the program are calls to networking APIs being made" and maybe determining parameters if they are defined in the same function as the call.
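For a sense of what that lint-level output looks like, here is a rough Python sketch using the standard `ast` module. It is purely illustrative: the `NETWORK_CALLS` list and the `find_network_calls` helper are made up for this example, and it only catches calls spelled out by their dotted name, with arguments reported only when they are literals at the call site.

```python
import ast

# Assumed list of "networking" entry points; a real tool would need far more.
NETWORK_CALLS = {
    "urllib.request.urlopen",
    "requests.get",
    "requests.post",
    "socket.create_connection",
}


def dotted_name(node):
    """Return 'a.b.c' for a Name/Attribute chain, or None for anything else."""
    parts = []
    while isinstance(node, ast.Attribute):
        parts.append(node.attr)
        node = node.value
    if isinstance(node, ast.Name):
        parts.append(node.id)
        return ".".join(reversed(parts))
    return None


def find_network_calls(source):
    """List (line, call name, literal positional args) for known network calls."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            name = dotted_name(node.func)
            if name in NETWORK_CALLS:
                literals = [a.value for a in node.args if isinstance(a, ast.Constant)]
                hits.append((node.lineno, name, literals))
    return sorted(hits)
```

Anything routed through an alias, a wrapper function, or a function pointer is invisible to this kind of check, and a URL built on a different line shows up with no arguments at all.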

awinter-py · on Oct 13, 2019

Trial use case: a small FOSS codebase in a pointer-less language. The goal isn't perfect safety, it's to be safer than we are now.

UncleMeat · on Oct 16, 2019

Feel free to use any of the dozens or hundreds of such tools developed by the academic community and experience the imprecision yourself.

awinter-py · on Oct 19, 2019

examples pls

aussieguy1234 · on Oct 14, 2019

This is where they need to sacrifice some computer security for physical security. By turning this off, a few people who don't follow good security practices might get malware. But no one will be sent to prison or "disappeared".

wtmt · on Oct 14, 2019

Apple has done a lot for privacy in its products and its public statements. But I believe that if it has to have a better impact and be trusted, it needs someone dedicated to privacy who will (ensure that it will) publish details of its products, apps and activities in an honest form in an accessible place (and updated more often than a once-a-year OS upgrade cycle). This kind of commitment to more transparency will help the company be trusted and also held up to questions. Said trust is already eroding with recent events. Apple shouldn’t be complacent and stick to its old ways.

Sadly, Apple also has a history of brushing things away or ignoring uncomfortable questions.

sgz · on Oct 13, 2019

Are those Google/Tencent API requests done only when browsing with Safari, or are they done for any SFSafariViewController? That would imply it’s also inside Brave/Firefox/Chrome...

sekasi · on Oct 14, 2019

Again I feel like I'm reaching out to be educated here.. but if Safari is attempting to validate URLs for safe browsing using the Google API (which it states it will do, quite openly), and Google products are quite clearly blocked in China so it resorts to Tencent's API (which it states it will do, quite openly).. why does this seem to provoke anger?

I mean this in the most equitable way possible, I'm more trying to understand where Apple has done anything wrong here?

brians · on Oct 14, 2019

We can’t tell whether non-China data goes to Tencent—intentionally or by some bug or adversarial problem.

woutr_be · on Oct 14, 2019

The code [1], along with this explanation [2], does seem to show that it only happens for devices with the country code set to CN.

[1]: https://github.com/Igalia/webkit/blob/9777baa3db09cad7ed5b2c... [2]: https://news.ycombinator.com/item?id=21242628

taobility · on Oct 14, 2019

I think the audience on HN is crazy now. Why would you prefer Google to Tencent for an API that serves the same purpose? Should all Chinese users be scared that the iPhone sends all their logs back to California? Should they be scared that Tesla sends all their driving data back to the US? If you don't trust anything from China, would you destroy any electronics made in China, including your smartphones, laptop, TV etc, or even some food?

Jyaif · on Oct 14, 2019

It should be noted that Apple could very well proxy those requests to Google and Tencent to protect their customers' IP addresses, or even implement safe browsing on their own altogether. The fact that they don't means that either they trust Google and Tencent, or that they don't care about privacy.

mulle_nat · on Oct 13, 2019

Wait, Apple is Sending URLs to Google?

innagadadavida · on Oct 13, 2019

Just elaborating on the method Google uses here. The client sends a hash prefix of the URL if there is a match in the local DB. The server then sends back full URL hashes. Other than your IP address, there is not much data that can be collected here.
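A rough Python sketch of that flow, under stated assumptions: SHA-256 hashes with 4-byte prefixes (the common case in the Update API docs), no URL canonicalization, and a `fetch_full_hashes` callback standing in for the server round trip. `check_url` and the callback are hypothetical names, not anything from Safari or Google's client code.

```python
import hashlib

PREFIX_LEN = 4  # bytes; assumed here, the API allows longer prefixes too


def full_hash(url_expression):
    """SHA-256 of a URL expression (canonicalization is skipped in this sketch)."""
    return hashlib.sha256(url_expression.encode("utf-8")).digest()


def check_url(url_expression, local_prefixes, fetch_full_hashes):
    """Return True if the URL is on the unsafe list.

    The network callback is only invoked when the prefix is already in
    the locally stored prefix set; otherwise nothing leaves the device.
    """
    digest = full_hash(url_expression)
    prefix = digest[:PREFIX_LEN]
    if prefix not in local_prefixes:
        return False  # no local match, so no request is made
    # Local match: ask the server for every full hash sharing this prefix.
    candidates = fetch_full_hashes(prefix)
    return digest in candidates
```

In this model the provider only ever sees a prefix (plus whatever the connection itself reveals, such as the IP address), and only for URLs whose prefix was already on the local list.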

slenk · on Oct 13, 2019

I think a lot of browsers do for the "Safe Browsing" checks

teraflop · on Oct 13, 2019

The Safe Browsing API is deliberately designed to avoid leaking the contents of URLs to Google. You can read about how it works here: https://developers.google.com/safe-browsing/v4/update-api

xenophonf · on Oct 13, 2019

It's designed to avoid leaking URLs, but I'd be a lot more comfortable if Safe Browsing worked by downloading a list of hashes to my computer and checking locally. That way, data never leaves my device.

tialaramex · on Oct 13, 2019

That means giving you all the hashes, which is a lot of data, and you'd need to constantly update it because the whole point of Safe Browsing is the dynamism.

Whereas today your browser only needs the prefix list, which is much shorter and so can feasibly be updated more often without awful bandwidth costs. The full hashes in a prefix are only fetched (which is where we get "Apple is sending URLs" by squinting really hard at the facts) if you visit a URL with a hash with a known-bad prefix.

phyzome · on Oct 13, 2019

Firefox downloads a big blob of unsafe URLs and checks against that, last I saw.

sanxiyn · on Oct 14, 2019

No, Firefox uses exactly the same Safe Browsing protocol. In fact, the protocol was co-developed by Google and Mozilla.

phyzome · on Oct 15, 2019

Interesting. I do remember people complaining about a large file in Firefox profiles, in the context of multi-user host admins wanting to be able to have it in a centrally managed location rather than replicated to every profile. I recall it being a sizable sqlite DB for Safe Browsing. I wonder what that was about, then.

kzrdude · on Oct 13, 2019

Default configuration of Chrome sends whatever you type, while you are typing it, in the location bar to Google.

And Firefox can do that too, no idea what their default configuration is.

noisem4ker · on Oct 13, 2019

If I remember right, Firefox asks if you want to enable search suggestions right where they would appear, in the drop-down menu.

icebraining · on Oct 13, 2019

In Firefox, search suggestions are only enabled by default in the Search field, not in the URL bar.

chenzhekl · on Oct 14, 2019

Probably this is the page of Tencent safe browsing: https://urlsec.qq.com/ I don’t understand why you trust Google so much. It’s as untrustworthy as Tencent for me.

zaphirplane · on Oct 13, 2019

The safe browsing seems to work in private mode or am I missing something

thawaway1837 · on Oct 14, 2019

Why is this more controversial than Apple sending URLs to Google?

ycombonator · on Oct 14, 2019

Tim Apple better have an explanation for this one.

ripley12 · on Oct 13, 2019

(edit: the source in question has removed the tweet, so I have too)

marcinzm · on Oct 13, 2019

> China only.

Based on the twitter conversation, it's NOT China only. It's Chinese localization only. Big difference. That means anyone anywhere in the world who set their computer to Chinese has their data sent. Including Europe which is likely a GDPR violation.

innagadadavida · on Oct 13, 2019

The Google servers apparently take a URL hash prefix. Does Tencent do the same? If so, is it still considered a GDPR violation? There is not much info in a URL hash prefix.

rocqua · on Oct 13, 2019

Suppose peeps going to HN are suspect. Then anyone who often produces hash prefixes that match HN is suspect. When you start getting sequences, you could possibly start matching how people navigate a website.

Essentially, a hash-prefix allows you to rule out / semi confirm guesses about browsing behavior.
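A minimal sketch of that inference in Python, assuming the observer sees a 4-byte SHA-256 prefix and has its own list of URLs it cares about. The function names and example URLs are made up for illustration.

```python
import hashlib


def prefix_of(url_expression, length=4):
    """First `length` bytes of the SHA-256 of a URL expression."""
    return hashlib.sha256(url_expression.encode("utf-8")).digest()[:length]


def consistent_guesses(observed_prefix, guesses):
    """Keep only the guessed URLs whose hash starts with the observed prefix."""
    n = len(observed_prefix)
    return [g for g in guesses if prefix_of(g, n) == observed_prefix]


# An observer holding one prefix tests it against a watch list.
observed = prefix_of("news.ycombinator.com/")
watch_list = ["example.com/", "news.ycombinator.com/", "en.wikipedia.org/"]
print(consistent_guesses(observed, watch_list))
```

A match does not prove the visit, since many URLs share any given 4-byte prefix, but a non-match rules the guess out, and repeated matches across a sequence of requests narrow things further.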

BenTheElder · on Oct 13, 2019

Looks like the twitter post you referenced was deleted. That user's only recent post is about super mario maker 2...

kop316 · on Oct 13, 2019

I'm getting a "Page doesn't exist" for your link FYI.

valleyer · on Oct 13, 2019

More details on how this works, for Google at least, here:

https://developers.google.com/safe-browsing/v4/update-api#ch...

user82738 · on Oct 13, 2019

This is only one of the APIs. Full doc at:

https://developers.google.com/safe-browsing/v4

iamspoilt · on Oct 13, 2019

I am getting a "Sorry, that page doesn’t exist!" for the twitter link you shared.

bighi · on Oct 13, 2019

It's better than sending URLs to Google, in my view.

pearjuice · on Oct 13, 2019

Remarkable when people become upset when it is explicitly stated your mobile tracking device sends information to third party servers, but deep down we all know the dangers are in what is not explicitly stated.