Re: The Spirit of Free Software, or the Reality (debian.org)
97 points by programmernews3 on July 15, 2015 | 72 comments



So, from what I gathered, it's just the favicons of the search engines, some Mozilla country stuff (not sure why), and Google Safe Browsing (which you can turn off and is a good feature for casual users). So what's the issue here? All this isn't really a problem, and Safe Browsing is, in my opinion, even good for normal users.


I'm pretty sure that these GET requests don't even fetch with cookies, so fingerprinting is probably impossible here. The most info they can get is "hey, this IP uses Firefox". Harmless in itself. It can be compounded with more info to track someone, but all of this info already contains IP/browser info, so this doesn't help at all.


> Harmless in itself.

If I had to summarize the bad security design that is epidemic in the software industry, this would be a good candidate. Security is about always being vigilant and minimizing potential risks, not the roughly boolean categorization that most people use, where everything is judged "safe" or "unsafe".

TL;DR - Stop using "default allow"!

You would think that programmers (and engineers in general) would understand how small, simple pieces can become incredibly useful when you find a clever way to combine them. Unfortunately, the lesson of "information hiding" - and all of the stuff we derive from it, like {object,aspect}-oriented programming and other encapsulation techniques - ends up being ignored when discussing security.

Applied to web browsers, the burden of proof should be to justify why a request is both necessary and safe. The "necessary" criterion is important, because we do not know how this data could be combined in the future, possibly in damaging ways. A good example of this is how the NSA uses Google's PREF cookies[1] to track sessions. Another example is panopticlick-style browser fingerprinting, where every bit leaked can make the fingerprint more accurate.

The fact that this is about favicons is a perfect demonstration of this "default allow" attitude. Not only is that request not necessary, the icons could be supplied with the browser[2], completely removing the need for any GET request. Minimizing external requests isn't even considered.

[1] https://www.washingtonpost.com/blogs/the-switch/wp/2013/12/1...

[2] copyright and trademark are poor excuses - if the search engine wants to be in the default set of search tools provided by the browser, they can trivially authorize the redistribution of a simple icon.


>copyright and trademark are poor excuses - if the search engine wants to be in the default set of search tools provided by the browser, they can trivially authorize the redistribution of a simple icon

Debian's policy does not allow them to accept "special dispensation" for copyright and trademark permission - either everyone gets it or Debian can't accept it. This is part of why Iceweasel was split from Firefox in the first place:

Per the Debian social contract:

>License Must Not Be Specific to Debian

>_The rights attached to the program must not depend on the program's being part of a Debian system._ If the program is extracted from Debian and used or distributed without Debian but otherwise within the terms of the program's license, all parties to whom the program is redistributed should have the same rights as those that are granted in conjunction with the Debian system.


Why would you assume that such permission would be "special" to Debian? It's an icon. There is zero reason for a search engine to restrict such use (and distribution).


>There is zero reason for a search engine to restrict such use (and distribution).

Because these logos depict their trademarked images, and the licensing that they'd offer would likely not allow use for any purpose. Debian's social policy requires that everything included in their distribution be freely licensed for any use.


> and the licensing that they'd offer would likely not allow use for any purpose

Then they don't want to be in the default set of search providers. Seriously, has the art of negotiating been lost completely? Google/Yahoo/etc need that strategic placement far more than they need to enforce some minor point about their logo's trademark. Debian's social policy doesn't mean they have to let themselves be browbeaten by businesses without even an attempt at negotiating.

Do you really think Google or Yahoo would just say, "No, we don't care about being in Firefox/Iceweasel's default search list."? Would their shareholders be happy about losing market share over want of a trivial licence?


>Do you really think Google or Yahoo would just say, "No, we don't care about being in Firefox/Iceweasel's default search list"?

Given that they haven't freely licensed their images, and this has been an issue for some time, I'm going to have to say the answer is "yes", because Firefox/Iceweasel download the missing images on startup.


> simple pieces can become incredibly useful when you find a clever way to combine them

Here's the thing. There's nothing to combine this info with that doesn't already have this info. IP+User agent. This info is sent with every request and any other gathering of data will contain these anyway.

So it's not just harmless in itself, it's harmless, period.

How is this different from OCSP pings and stuff like that?


I find it difficult to believe you cannot answer that question on your own.

The difference is that OCSP provides a needed service, the very purpose being related to a TLS connection that the user requested. Checking the CRL is an important part of the TLS security process, and it would be stupid to ignore it.

A favicon provides no security benefit, and is entirely optional.

Yes, it may be true that making a GET request for a CRL and for a favicon leaks more or less the same information. That isn't relevant, and it misses the point entirely, which is about minimizing network use. We don't know what data is useful. It is entirely possible that the situation changes in the future and previously harmless data becomes part of something greater.

Minimizing network use to what is necessary is an implementation of the "default deny" policy, and it is the only sane security policy because we cannot predict the future. Enumerating badness[1] is always going to end up playing catch-up.

[1] http://www.ranum.com/security/computer_security/editorials/d...


I was talking about the safe browsing request. Arguably that's even more important to the average user than TLS safety. MITMs are rare; malware sites are plentiful.

I do think that the favicons could just be integrated.

I don't think there's any point minimizing the network in this specific case. "We don't know what data is useful", sure, but any other data this can be compounded with _already contains the same information_.

Though yes, stuff could change in the future. I can't see how it could change in any way to make this useful, but you're right, this isn't something we can predict. That point is valid.


> Harmless in itself.

Since we know the NSA is reading this, it's telling them that you (and they have a good idea who you are if you're connecting from a residential line) just launched your browser. I'd say that's pretty intrusive.


Really? Aren't you going to use the browser in about 5 seconds? I'm sure the NSA cares that I checked my mailbox yesterday as well.


They don't care that you checked your mailbox. The problem is that they save your actions for the future. Everything you do is saved. That doesn't mean they care now, but in case they ever need to care, everything you have done is logged.


In the case of Debian, it tells them you are using a specific version of iceweasel, which probably is quite deterministic in combination with the rest of the dragnet.


Whenever a new version of Iceweasel hits the repos, a bunch of people upgrade, and perform identical requests. You can figure out how frequently someone checks for updates I guess, but other than that what would you learn? The browser hasn't had the chance to acquire any information that would distinguish you from any of the other Debian users yet.


Well, that info in the URL probably means nothing.

The user-agent is sent with all those connections anyway.


I recall reading something on the internet that IP + browser fingerprint is good enough to uniquely identify a large number of people. Has this changed, or is it otherwise untrue?


Getting the type of "browser fingerprint" they're depending on here requires a bunch of Javascript. You can't get that data from just looking at a single request, which is all that they're getting here.


I think IP + useragent + locale (which is sent with every HTTP request) is enough to pinpoint most users.
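
A back-of-the-envelope sketch in Python of why those few headers go so far; the per-attribute entropy figures and the population size are illustrative assumptions, not measurements:

  # Assumed bits of identifying information per attribute visible in a
  # single bare GET request (illustrative values only).
  attributes = {
      "ip_address": 18.0,      # NAT and dynamic ranges blur this a bit
      "user_agent": 10.0,      # browser + version + OS string
      "accept_language": 5.0,  # locale header
  }

  total_bits = sum(attributes.values())  # entropy adds if attributes are independent
  population = 3_000_000_000             # assumed number of web users

  # Expected number of users sharing this exact combination; values below 1
  # mean the combination is usually unique.
  anonymity_set = population / (2 ** total_bits)
  print(f"~{total_bits:.0f} bits -> about {anonymity_set:.1f} users per combination")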


Especially Debian users ("iceweasel" in the user agent string!)...


I made an add-on that tries to help with that: https://github.com/avian2/thawed-weasel


It's not so much 'Debian users' as 'Iceweasel users' — you can install Firefox from source on Debian, and even use Mint's package repo to install it.


A quick click here convinced me that this is still the case: https://panopticlick.eff.org/


This is a handful of GET requests for images. You would need a page to fingerprint. The site could get an IP and likely the user agent. Safe browsing requests are sandboxed from all the other Google cookies.


Geez, did you guys even look at the URLs?

https://safebrowsing.google.com/safebrowsing/downloads?clien...

Isn't the API key your fingerprint? The key is not shown here because the author was showing what the requests look like on first run; subsequent requests would contain your unique ID in them.

Not picking on you specifically, I saw several responses saying the same thing.


> Isn't the API key your fingerprint?

No, it's a per-application key, so in this case it would be IceWeasel's key.


In that case, why would it make a request with "key=no-google-api-key"? Why not use the IceWeasel key right from the start?


No clue, but it's not like we have to guess about what it is. It's all covered in the Safe Browsing docs: https://developers.google.com/safe-browsing/lookup_guide#Get...


You sure those are images? All we know for certain is that those are GET requests.


I'd be willing to bet that Firefox would treat them as such, even if they returned something different. Still, those favicons should be bundled instead of making requests, IMO.


Because we can see the sum of the requests, I think it's safe to say that they are probably just images.

There's nothing to stop, say, ebay from serving PHP scripts with an ICO extension, processing the request as a pageload, and then ultimately returning the icon file, but for anything useful to have been gleaned, it would have generated more requests.

Either it did generate more requests and Mozilla didn't honor them (in which case, yay), or it didn't generate more requests (in which case, yay). The former would be slightly preferred as the latter doesn't prevent ebay from later changing their strategy.


Browser fingerprinting requires at least Javascript, and to do it properly you need the ability to run Java/Flash too. These are HTTP requests, not HTML pages opened in the browser. Since it's not using cookies, these requests only contain the user agent string, which is very far from any useful kind of tracking fingerprint.


I have a few problems with it, actually.

Number one is that it's happening when users don't expect it. As an end user, I don't expect my browser to start making requests until I tell it to.

And it's silly to say the information is "harmless in itself," because most of the information Google, Facebook, Yahoo, etc. collect is harmless in itself. The whole point of those companies is to collect as many tiny pieces of "harmless" information as possible to build up profiles about people.

TBH, I don't trust Google at all on privacy matters any more. The way they try to tie my work Gmail to my personal Gmail and my YouTube viewing, and cram everything into Google+, has really rubbed me the wrong way, and I've been moving away from their services as much as possible. The last thing I want is my browser contacting them without my knowledge.


> TBH, I don't trust Google at all on privacy matters any more

Options -> Security -> deselect "Block reported attack sites" and "Block reported web forgeries". Easy.


After you've already launched the browser and sent info to Google...


The information "hey I have an IP and I'm using this browser, and I have a browser at this time" is going to be send A LOT when using the browser for what it's made for. The problems come later when sending every URL to another party (safe browsing). Also from google "Privacy: API users exchange data with the server using hashed URLs, so the server never knows the actual URLs queried by the clients." So it's possibly safe ?


I'm totally Google-free save for whatever traces of Google tech are in CyanogenMod on my phone, and I can't say my life has been negatively affected by going this route.

There are alternatives for everything out there.


This is what happens when folks with a radically-non-mainstream view of privacy try to use an app built for mainstream folks by folks with slightly more mainstream opinions about privacy.


If Iceweasel is not hardline about freedom and privacy, what does it offer compared to vanilla Firefox?

Edit: Since this is getting upvotes, I was wrong and pigeons is right.


The sole purpose of Iceweasel is to not be subject to any restrictions (mozilla approval of changes) that may come with distributing trademarked "Firefox" software.


I looked into it further, and now I see where my confusion came from.

    [Debian] Iceweasel is a fork [from Firefox] with the following purpose :

    backporting of security fixes to declared Debian stable version.
    no inclusion of trademarked Mozilla artwork (because of #1 above) 

    Beyond that, they will be basically identical. (quoting Roberto C.
    Sanchez post in debian-devel mailing list)
But there was another Iceweasel, GNU Iceweasel. To avoid confusion with Debian Iceweasel, it has been renamed to GNU IceCat.

    GNU IceCat, formerly known as GNU IceWeasel,[3] is a free software
    rebranding of the Mozilla Firefox web browser distributed by the GNU
    Project. It is compatible with Linux, Windows, Android and OS X.[4]

    The GNU Project keeps IceCat in synchronization with upstream development
    of Firefox while removing all trademarked artwork. It also maintains a
    large list of free software plugins. In addition, it features a few
    security features not found in the mainline Firefox browser.
This article is about Debian's Iceweasel, which is why my comment was wrong.

https://wiki.debian.org/Iceweasel https://en.wikipedia.org/wiki/GNU_IceCat


It's not obvious to me that "I want my browser to tell Google about every site I visit" actually is the mainstream view of privacy.


"Privacy

Google maintains the Safe Browsing Lookup API, which has a privacy drawback: "The URLs to be looked up are not hashed so the server knows which URLs the API users have looked up". The Safe Browsing API v2, on the other hand, has the following privacy advantage: "API users exchange data with the server using hashed URLs so the server never knows the actual URLs queried by the clients". The Firefox and Safari browsers use the latter." https://en.wikipedia.org/wiki/Google_Safe_Browsing#Privacy


Heh. The next paragraph after that quote is:

> Safe Browsing also stores a mandatory preferences cookie on the computer[9] which the US National Security Agency allegedly uses to identify individual computers for purposes of exploitation.[10]

That may or may not be true, but must one be a radical to be concerned?


It's true, but slightly misleading.

If you open firefox and browse to a few sites, it will send that cookie. If you then take your computer down to the coffee shop and keep browsing, even if you don't log into anything, it will still send that cookie in the clear.

There are other ways that the NSA can figure out a list of IP addresses you've been using, but this is 1) totally silent, and 2) common to a lot of systems.


[deleted]


Firefox's safebrowsing feature uses a separate cookie jar, so if you are logged into Google those cookies will never be sent via the safebrowsing API.

Also, Firefox hashes the URL and compares the prefix of that hash to a master table downloaded from Google. If the hash matches a prefix in the table, Firefox requests all full hashes that begin with that prefix. Google is never sent the full hashed URL.
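
A minimal sketch of that lookup flow in Python, under a few assumptions: the local prefix table contents and the fetch_full_hashes() helper are hypothetical stand-ins, and a real client also canonicalizes the URL and hashes several host/path combinations. Only the short-prefix idea itself comes from the Safe Browsing docs:

  import hashlib

  PREFIX_LEN = 4  # 4-byte (32-bit) prefixes of the SHA-256 hash

  # Hypothetical local table of prefixes downloaded from Google.
  local_prefixes = {bytes.fromhex("aabbccdd"), bytes.fromhex("00112233")}

  def fetch_full_hashes(prefix):
      # Hypothetical stand-in for the network round trip that downloads
      # every full hash the server knows for this prefix.
      return set()

  def is_flagged(url):
      full_hash = hashlib.sha256(url.encode()).digest()
      prefix = full_hash[:PREFIX_LEN]
      if prefix not in local_prefixes:
          # The common case: no network request is made at all.
          return False
      # Only the short prefix leaves the machine; the final comparison
      # against the full hash happens locally.
      return full_hash in fetch_full_hashes(prefix)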


> "API users exchange data with the server using hashed URLs so the server never knows the actual URLs queried by the clients".

Privacy Theater. No real information is lost, since all you need is a database of domains and boom, the hash is easily reversed.
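
The argument, sketched in Python (the URL list is just an illustrative stand-in; a real adversary would precompute a large crawl): if full URL hashes reached the server, "reversing" them would be a dictionary lookup, not cryptanalysis.

  import hashlib

  # Stand-in list; in practice this would be a crawl of popular URLs/domains.
  known_urls = ["http://example.com/", "http://news.ycombinator.com/"]

  # Precompute hash -> URL once.
  lookup = {hashlib.sha256(u.encode()).hexdigest(): u for u in known_urls}

  def reverse(url_hash):
      # Recovering the URL is just a table lookup for anything already known.
      return lookup.get(url_hash)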


Please tell me you didn't just edit that wikipedia article and cite your own comment.

Not only is that not a valid wikipedia cite, it's not even right. Only hash prefixes are ever sent to Google, and only if it's already tested locally that the hash prefix includes malicious sites.

Humorously this exact same exchange took place in the linked conversation: https://lists.debian.org/debian-devel/2015/07/msg00232.html

edit: wow, it was you. https://en.wikipedia.org/w/index.php?title=Google_Safe_Brows...


> wow, it was you.

Yes, I did that for two reasons:

- I couldn't find a better link for the citation and it seemed like rather important info that should be in the wiki. Maybe someone else would find a better one to replace it with.

- I figured linking to an HN discussion would serve as a great citation, even if I was mistaken about something. Looks like HN didn't disappoint. :)

EDIT: I've updated the wiki text to remove my previous edits and added a mention about the use of hash prefixes.

> Not only is that not a valid wikipedia cite, it's not even right. Only hash prefixes are ever sent to Google, and only if it's already tested locally that the hash prefix includes malicious sites. Humorously this exact same exchange took place in the linked conversation: https://lists.debian.org/debian-devel/2015/07/msg00232.html

Thanks for the link and for pointing that out; I stand corrected. I'm curious to know how big the prefix is. Depending on its size, this either remains privacy theater or it doesn't.


> I'm curious to know how big the prefix is

I haven't looked at it in depth for a little while, but according to this it's the first 32 bits of the 256-bit hash:

https://developers.google.com/safe-browsing/developers_guide...
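
For a rough sense of scale (the corpus size below is an assumption, just to get the order of magnitude): with only 32 bits, many distinct URLs collide on every prefix, so a prefix by itself doesn't pin down which URL was checked.

  # Assumed number of distinct URLs on the web; the exact figure doesn't
  # change the conclusion much.
  total_urls = 30_000_000_000
  prefix_space = 2 ** 32  # number of possible 32-bit prefixes

  urls_per_prefix = total_urls / prefix_space
  print(f"~{urls_per_prefix:.0f} URLs share each 32-bit prefix on average")
  # -> roughly 7, before counting pages no index has ever seen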


You're making a substantially different claim than the article. Where did you get the idea that it's sending every URL to google?


The follow up email.


Is false. Google is big, but even they couldn't handle the load of a large fraction of the world's web browsers sending them a request for every page loaded. That'd be insane.

Google Safe Browsing is based on a Bloom filter. The browser downloads the filter in a series of requests when it first starts up, or when the filter is out of date. It also sends a followup request if it finds a hit, but this is rare unless you're actually about to visit a site that's been flagged as unsafe.


Maybe? They essentially do this with Google Analytics? No?


Not to mention Google Chrome's search suggestions?


> It also sends a followup request if it finds a hit, but this is rare unless you're actually about to visit a site that's been flagged as unsafe

And the followup still doesn't send the URL or even a hash of the URL. It sends just a prefix of the hash to download all URL hashes matching that prefix, and the comparison against the actual current URL happens locally.


Apparently I need to clarify.

I doubt most mainstream users are even aware of safe browsing or how it works. They could very well be ok with it, but that is not obvious to me.

Nor would I say that a desire to disable safe browsing represents a particularly "radical" view.


Define "I". A GET request for a favicon sans cookies doesn't say much about "I", and doesn't leak anything useful.


I thought it was just downloading favicons? Is the POST request supposed to be sending stuff?


The POSTs aren't sending anything important. The POST to google is documented here: https://developers.google.com/safe-browsing/developers_guide...

Quote:

  The request body is used to specify what the client has and wants:

  * The client optionally specifies the maximum size of the download it wants to retrieve.
  * The client specifies which lists it wants to retrieve.
  * For each list, the client specifies the chunk numbers it already has.


At Webconverger I've been working on whittling these leaking issues down by wiresharking Firefox, e.g.

https://github.com/Webconverger/webconverger-addon/issues/42 https://github.com/Webconverger/webconverger-addon/issues/41 https://github.com/Webconverger/webconverger-addon/issues/43

Though with things like https://bugzilla.mozilla.org/show_bug.cgi?id=1100304 and anti-features like http://dustri.org/b/firefox-youre-supposed-to-be-in-my-pocke... you have to wonder if Mozilla has stopped caring about privacy.


It's interesting, especially since Firefox currently uses "privacy" heavily as its selling point:

>Committed to you, your privacy and an open Web [1]

>We’ve always designed Firefox to protect and respect your private information. That’s why we’re proud to be voted the Most Trusted Internet Company for Privacy. [1]

>When it’s personal, choose Firefox. [2]

[1]: https://www.mozilla.org/en-US/firefox/desktop/

[2]: https://www.mozilla.org/en-US/firefox/new/


These browsers are all constantly accessing many sites. Here's[1] a comment I posted a month ago about Firefox. The summary: here's (at minimum) what Firefox accesses when it starts up as a Guest in OS X, and this is after I unchecked a bunch of boxes:

   self-repair.mozilla.org
   snippets.cdn.mozilla.net
   search.yahoo.com
   location.services.mozilla.com
   www.mozilla.org
   tiles.services.mozilla.com
   safebrowsing.google.com
   aus4.mozilla.org
Try it yourself as Guest, but make sure that Parental Controls are on. That way OS X will pop up these sites and ask permission. Firefox is unusable w/o opting in to all of these.

[1] https://news.ycombinator.com/item?id=9743799


The whole shrugging off and downplaying of issues like this is exactly the reason the internet is complete shit when it comes to security.

Why, after all the exploits, insecure software and bad decisions, can people still not see that they can't anticipate everything?

For instance, here's a scenario I can easily envision: the NSA strongarms Ebay into letting them sniff TCP connections to their favicon, combines the TCP fingerprint with the browser user agent to uniquely identify you from perhaps millions of other users, geolocates your IP to determine where you are, and bam, they know all they need to know about you. Tin-foil hat? Of course. Plausible? Totally. Doable? Absolutely. They don't need to be perfect, just good enough.


Wait, we all realize these are grabbing the favicons for the bookmarks, right? This list of .ico's reads to me like a list of default bookmarks.


I also wonder how much work it would be to make a version with these stripped out. Replacing functionality with a NOOP is often not that hard.


The problem is that it's a big responsibility. You can't just do it once and dump it on the internet. It has to be kept up to date with the latest Firefox versions, and it has to have prompt releases (within a day). You also need to be absolutely sure that your changes aren't creating more problems than they're fixing.


   127.0.0.1 www.google-analytics.com
   127.0.0.1 ssl.google-analytics.com
   127.0.0.1 www.hosted-pixel.com # I Swear I'm Not Making This Up
On some but not all operating systems it's better to use 0.0.0.0.

It's better to block them with a firewall, but your aged grandmother doesn't know how to configure one.

iOS and, I expect, Android have hosts files, but you must jailbreak to edit them. On iOS you can do that with iFile from the Cydia store. iFile once cost money, but it's free now.


My Grandmother's a real ace when it comes to editing hosts.


Your grandmother probably doesn't use Iceweasel, and probably doesn't care that it's making requests.


This does absolutely nothing wrt the URLs in the linked email.



