You could either stop using these services or, if (as I suspect) you find them too valuable to dismiss entirely, quarantine them to a VPN/incognito session in less time than it took to type that comment.
I don’t want to single you out personally, but there’s a broad trend on HN of bitter-sounding commentary on the surveillance powers of these companies, coming from people who can easily defeat any tracking that it’s economical for them to even attempt, let alone execute. It reeks of sour grapes that a mediocre employee at one of these places makes 3-20x what anyone makes (as a rank-and-file employee) anywhere else.
Again, you’re not likely part of that group, but seriously who hangs out on HN and can’t configure a VPN?
How do you stop using a service when you have little or no indication that it does something like this beforehand, and afterwards the privacy is already gone?
If I use a site and view my profile page and the URL contains an account ID or username and some Google or Facebook analytics is loaded, or a like button is sitting somewhere, how am I to know that before the page is loaded? What if I'm visiting the site for the first time after it's been added?
It doesn't even matter if I have an account on Google or Facebook, they'll create profiles for me aggregating my data anyway.
> quarantine them to a VPN/incognito interaction
Which does very little. I spent a few hours this morning trying to get a system non-unique on Panopticlick, but the canvas and WebGL hashing is enough to dwarf all the other metrics. There are extensions to help with that, but they were sub-optimal for the purpose I was attempting (and the one that seemed to do time-based salting of the hashes wasn't working right).
So, I don't have any confidence that a VPN and incognito really do much at all.
> How do you stop using a service when you have little or no indication that it does something like this beforehand, and afterwards the privacy is already gone?
It is small comfort for the average user, but the way you do it is to use NoScript. It makes the web awful, sure, but it won't happen to you.
> It doesn't even matter if I have an account on Google or Facebook, they'll create profiles for me aggregating my data anyway.
I sort of wonder what you envision this actually meaning. If I spam your website and you add a DoS filter for my IP, should I complain you made a profile of me? If when a user tries to log in I check the referrer to see if it contains a proper URL, have I violated your privacy?
> I sort of wonder what you envision this actually meaning.
I mean it to respond to the common response people sometimes give in conversations like these, which is "that's why I don't use Facebook" or "that's why I stopped using Google services". For this conversation, whether you use Facebook or not is irrelevant, they still gather your information, and in the same way myriad other advertisers (or however they bill themselves) do through online tracking. Google and Facebook are large, and have a portion that's easily visible, but they are not the whole problem by a long shot.
> If when a user tries to log in I check the referrer to see if it contains a proper URL, have I violated your privacy?
No. Noting which door a customer came into your store seems fine to me. That by default customers come in wearing the logo of the last store they visited is weird, but entirely something they can control. Having people shadowing all your customers while in the store looking and listening for tidbits they can report back on to get more info about those people is pretty creepy. As you suggest, the way to get around most of that is to dress blandly and say nothing.
Here's the thing, we're a market economy. There's a transaction going on, where we're trading away something (our information and privacy) to a company for some product, or possibly the right to view a product we might consider buying. How many people are actually aware of this transaction? If they aren't aware of the transaction, there's a name for that when it's a regular good, and it's theft (or fraud). The difference here is that most of our government systems don't apply any rights of ownership to this information, so our regular rules don't apply. I admit, they may not make sense to apply entirely, but at the same time, it's obvious that something is lost in the transaction, whether the person losing it realizes it at the time, or views it as important enough to make a big deal about when they notice.
> Google and Facebook are large, and have a portion that's easily visible, but they are not the whole problem by a long shot.
I meant more like in a literal sense, but okay. Point taken.
> No. Noting which door a customer came into your store seems fine to me. That by default customers come in wearing the logo of the last store they visited is weird, but entirely something they can control. Having people shadowing all your customers while in the store looking and listening for tidbits they can report back on to get more info about those people is pretty creepy. As you suggest, the way to get around most of that is to dress blandly and say nothing
These human metaphors are powerful, but don't map at all to basic analytics concepts. There is no person watching you. There is no intelligence judging you. There are a series of conditions in a deterministic system provoked by your actions. If we could have done this before now, we would have because it's a whole hell of a lot more ethical.
> Here's the thing, we're a market economy.
I dunno where you are but I'm in the US which is most definitely not "a market economy" without a whole hell of a lot of qualifiers.
> There's a transaction going on, where we're trading away something (our information and privacy) to a company for some product, or possibly the right to view a product we might consider buying. How many people are actually aware of this transaction?
Roughly as many, I imagine, as folks who realized the shopkeeper could see them enter and leave. Most folks know local proprietors can and will kick you out and put up a photo if you act up.
> The difference here is that most of our government systems don't apply any rights of ownership to this information, so our regular rules don't apply.
This is just flatly false. I don't know what you're thinking writing this, but it's clearly neglecting copyright and patents. For what it's worth, I think the latter is a bad system and the former is in desperate need of reform to sharply limit it.
> it's obvious that something is lost in the transaction, whether the person losing it realizes it at the time, or views it as important enough to make a big deal about when they notice.
I am trying to read your comment in the spirit it was intended rather than the literal delivery, so please forgive me if there is a subtle impedance mismatch here but...
Welcome to the future, I guess? The top 50% of earners in the world have access to computers that would once have bankrupted a nation to produce, and the options are still surprisingly good for the next quartile. With that power, the people around you are going to start noticing things and making decisions about them with the information they can now process.
Ideally, this will be a distributed thing, but right now due to the nature of our society, authority of this sort is highly concentrated. But the dam has broken. A total surveillance system for up to a modestly sized city, with realtime tracking and long term data storage, is well within the reach of anyone with $10,000 USD to spend on hardware. They can self-host it. The banality of this cannot be overstated. It's boring to do this now. It's not new ground. So much so that average people can monitor their homes with it, or know if their friends have gone missing with it.
To some extent, there is just no undoing this. Society will have fewer secrets and those secrets will be much more deliberate, and the only response that can work is to change your attitude.
> There is no person watching you. There is no intelligence judging you. There are a series of conditions in a deterministic system provoked by your actions.
I don't think it's creepy because there's a (theoretical) person watching me. I think it's creepy because they're cataloguing all my actions in a systematic way which pierces the veil of perceived privacy (mostly through anonymity).
> I dunno where you are but I'm in the US which is most definitely not "a market economy" without a whole hell of a lot of qualifiers.
I'm not sure how to respond to this without a specific criticism of how you think it's incorrect. That said, it's somewhat tangential to the point, even if it would be an interesting conversation.
> Roughly as many, I imagine, as folks who realized the shopkeeper could see them enter and leave.
I don't know. If every time I entered my local 7-Eleven someone picked up a clipboard, flipped to a specific page, looked back at me, nodded to themselves and then marked something on the page, I might decide to go somewhere else, at least most of the time. If I knew the info was shared with all the other 7-Elevens, and the local grocery chain, and some hardware stores, that would make me want to use all of those places less.
> This is just flatly false. I don't know what you're thinking writing this, but it's clearly neglecting copyright and patents. For what it's worth, I think the latter is a bad system and the former is in desperate need of reform to sharply limit it.
I said "this" to qualify what I was referring to (personal information) and distinguish it from other types of protected information, of the type you reference.
> To some extent, there is just no undoing this. Society will have fewer secrets and those secrets will be much more deliberate, and the only response that can work is to change your attitude.
I don't think that's the only response that can work. It's the only one that works completely, as deciding to not care is always a solution to caring, if you can pull it off.
The alternative is new laws. Are they perfect? No. Will they solve the problem adequately? Likely not. Do they have a chance of making a positive difference across the board for massive amounts of people by empowering them with regard to their own information? I dunno. Maybe? I think it's worth pushing for though. Otherwise, why do we have minimum wage and labor laws? At some point we could have thrown our hands up and said "screw it" about that stuff, but people pushed for it, and while they aren't perfect, I think we're all better off for them.
I don't believe there will be any perfect solution to this ever, or even a good or acceptable solution all that soon. I do think it's still worth raising my voice over, because I think there are some possible futures that are better than others with regard to privacy and personal information, and I think that's worth pushing towards.
You use something that blocks scripts (like uMatrix) with an aggressive ruleset. On some sites you'll need to allow things to make them work. If they are loading trackers from the same servers that they load content from, you can't do much without wasting more time than you want. I'd say it breaks most of the tracking though.
More sites than you'd expect work without js or with first-party js only. It's annoying when you need to read a news site, because those are usually bloated garbage. Not a huge loss.
This was already with uBlock Origin. I also tried combinations of Ghostery and Privacy Badger. All of it made very little difference for Panopticlick, and that's probably a low bar compared to what's common these days.
I don't care if every site that I browse using this VM knows that I'm Mirimir. I don't even try to hide that.
What matters is that my personas using other VMs, through other VPNs or Tor, don't get linked to my meatspace identity, to Mirimir, or to my other personas. And that's doable, I think.
Yes, and you go through quite a lot of effort to achieve that, given your other comment.
My main point is that the amount of effort you have to go through to achieve that is very high, and I wish it was considerably lower. There are technological changes that could help with this, and legal changes that could help with this.
I think a comfortable place would be this: if you visit the same online location using your main browser on one IP, and a private browsing instance of that same browser on another IP (through a VPN, proxy, or just a new public lease), there should be some expectation that they can't immediately tell, with a high degree of certainty, that you are the same individual. For the general populace, this falls on its face.
Tor has quite a few mitigations to help here (e.g. simulated window/screen values), and Firefox has started to adopt some of them, but as mentioned here on HN frequently, Firefox sometimes has problems with CAPTCHAs and certain sites (I haven't had those problems, but I'm also not usually using it through a VPN), and I know Tor is sometimes blocked outright.
The point is that until most of these protections (technological and hopefully some legal) are mainstream, completely protecting yourself is a double-edged sword, since you also ostracize yourself from some sites and services. Tor is the equivalent of walking around in padded, baggy clothes and a ski mask. Sometimes, like in the snow, it may seem fairly normal. Other times, like at the beach, it may preserve your privacy, but it's very uncomfortable and may cause people to avoid you, if not outright shun you and run you off. If everyone starts wearing masks and covering their hair, then by doing the same you probably get a fairly high degree of anonymity and privacy.
In summary, I think Tor is a useful and necessary tool, but nowhere near sufficient for where I think we need to be generally.
> Yes, and you go through quite a lot of effort to achieve that, given your other comment.
That's true. However, it's mostly one-time effort. There are Linux and TrueOS workspace VMs, pfSense VMs as VPN gateways, and Whonix gateway and workspace VMs. All in VirtualBox.
There's ~no configuration required for the Whonix VMs. You just need to point the gateway VM to the pfSense VM that ends the desired nested VPN chain. And if there are multiple Whonix instances, rename the internal network that the gateway and workspace VMs share.
For the Linux and TrueOS workspace VMs, it's just like any OS install. You do have more machines to maintain, but mainly that's just keeping packages up to date. All of the devices are virtual, so you don't have driver issues.
Setting up the pfSense VMs is the hardest part. But once that's done, you can use them for years. pfSense is pretty good about preserving setup for OS upgrades. And there's a webGUI for changing VPN servers. But it's harder than using a custom VPN client.
So yeah, it's not so easy. However, someone could write an app that papered over most of the ugly parts. That even automated VM setup and management.
No, a clean browser and IP, combined with what fonts I have installed, how my video card renders a canvas and WebGL instance (which may be affected not just by the video card you have, but by the driver version used with it), my screen size, and a few other system-level items that come through, may or may not be enough to uniquely identify me. Add linking to a prior profile if you screw up one time (or load a URL that has identifying information they can use), and you're busted.
So, sure, a clean browser and IP and never logging into a site you've previously visited might be enough, but who does that, and doesn't that halfway defeat the purpose?
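To make that concrete, here is a minimal sketch in Python of how a handful of those attributes could be folded into one identifier, and how much each value narrows things down. Everything in it is an assumption for illustration: the attribute names, the toy population, and the functions `fingerprint` and `surprisal_bits` are made up, and a real tracker would collect far more signals than this.

```python
import hashlib
import math
from collections import Counter


def fingerprint(attrs):
    """Hash a dict of browser attributes into one stable identifier."""
    # Sort the keys so the same attributes always produce the same hash.
    canonical = "|".join(f"{key}={attrs[key]}" for key in sorted(attrs))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def surprisal_bits(population, attr, value):
    """Bits of identifying information carried by one attribute value."""
    counts = Counter(visitor[attr] for visitor in population)
    return -math.log2(counts[value] / len(population))


# Hypothetical toy population; every value here is invented.
population = [
    {"screen": "1920x1080", "canvas": "aa11"},
    {"screen": "1920x1080", "canvas": "bb22"},
    {"screen": "1920x1080", "canvas": "cc33"},
    {"screen": "2560x1440", "canvas": "dd44"},
]
```

In this toy population a common screen size carries well under one bit, while a canvas hash that nobody else shares carries the full two bits available among four visitors, which is the sense in which canvas and WebGL hashing can dwarf the other metrics.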
My meatspace identity uses a desktop that hits the Internet directly. It displays no interest in technical matters. Just banking, cards, shopping, general news, etc. It never accesses HN, or any of the other sites that Mirimir uses. Or that any of my other personas use.
Mirimir uses a VM, on a different host machine, and hits the Internet through three VPNs, in a nested chain. Some other personas use different VMs on the same host, connecting through different nested VPN chains. Some are Whonix instances, connecting via Tor, and reaching Tor through nested VPN chains.
So basically, each persona that I want isolated uses a different host machine and/or VM, a different browser, and a different IP address.
I appreciate the information-theoretic validity of your argument, but if you think that one of these firms cares enough about your buying preferences to burn enough compute to find that correlation then you either work for the CIA or are mistaken.
It doesn't take a lot of compute resources to keep multiple profiles and, when evidence of a high assurance level shows up (a referring URL that is known to designate a specific user of a major service), to link that profile with other profiles that also have that designation.
To me, that seems par for the course for any service that's generating profiles of browsing behavior and trying to make any sort of decisions based on it. It reduces cruft and duplicate profiles while also providing more accurate information. Why wouldn't it be done?
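As a rough illustration of how cheap that linking is, here is a small Python sketch that merges profiles whenever two of them have been seen with the same strong identifier. The function name `link_profiles`, the shape of the input, and the identifier strings are all hypothetical; I'm not claiming any particular company does it this way.

```python
def link_profiles(sightings):
    """Group profile ids that share a high-assurance identifier.

    `sightings` is an iterable of (profile_id, identifier) pairs, where
    identifier is None when nothing strong was observed for that visit.
    """
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    first_seen = {}  # identifier -> first profile observed with it
    for profile_id, identifier in sightings:
        find(profile_id)  # register the profile even without an identifier
        if identifier is None:
            continue
        if identifier in first_seen:
            # Same strong identifier seen before: merge the two profiles.
            parent[find(profile_id)] = find(first_seen[identifier])
        else:
            first_seen[identifier] = profile_id

    groups = {}
    for profile_id in parent:
        groups.setdefault(find(profile_id), set()).add(profile_id)
    return sorted(sorted(group) for group in groups.values())
```

The work is roughly linear in the number of sightings, so one slip (a single visit that carries the identifier) is enough to fold an otherwise clean profile into an existing one.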
> the information-theoretic validity of your argument
The portion about canvas, WebGL and AudioContext hashing is not theory at all; it's well-known practice from years ago. Just the other day there was a story here about some advertiser on Stack Overflow trying to use the audio hashing for tracking purposes.
Hell, if you get enough identifiable bits of entropy, you can probably assume weak to strong level matching using a bit-level Levenshtein distance that's low enough.
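A minimal sketch of that kind of fuzzy matching, in Python. Two loud assumptions: I've swapped in Hamming distance, which is what an edit distance reduces to for fixed-length bit strings when only substitutions are allowed, and I'm assuming each bit encodes one observed attribute feature (a cryptographic hash would not work, since it destroys similarity). The thresholds and the names `hamming_distance` and `match_level` are invented for illustration.

```python
def hamming_distance(a, b):
    """Number of differing bits between two non-negative integers."""
    return bin(a ^ b).count("1")


def match_level(a, b, strong_max=1, weak_max=4):
    """Classify two feature bit vectors as a strong, weak or no match.

    strong_max and weak_max are hypothetical thresholds, not tuned values.
    """
    distance = hamming_distance(a, b)
    if distance <= strong_max:
        return "strong"
    if distance <= weak_max:
        return "weak"
    return "none"
```

With enough identifying bits, a profile that differs in only a few positions (say, after a driver update changes one rendering feature) would still land in the weak or strong bucket.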
GitHub is always at your disposal. NV doesn’t sell the consumer cards to enterprises. So on AWS a multi-GPU box will cost you about 12 dollars an hour. If you can disambiguate, let’s just say 85% of profiles absent IP or cookies, well I think you just broke the academic SOTA and I’d love to make some calls.
whose results would have you believe that one's footprint is very unique. I'd be interested in hearing more about why this is hard to implement into an efficient process.
> GitHub is always at your disposal. NV doesn’t sell the consumer cards to enterprises. So on AWS a multi-GPU box will cost you about 12 dollars an hour.
I don’t see how this is related to the claim, since it doesn’t solve the problem. But the advertising company that I let run code on my website will certainly do the job pretty well, I’d say.
I was pointing out that it’s a commercial application of a very strongly worded claim that I know would be expensive to test, because I’m optimizing GPU-intensive code at the moment. I don’t know where in this thread I generated so much ill will for trying to add knowledge to the conversation, but I’m not making shit up.
There are tools that will supposedly do this to a high degree of accuracy. Are you saying that they are fake/don't work as well as they'd want us to believe?
> You could either stop using these services or ...
Are you serious? Have you tried not using their services? Try blocking Google Analytics, Tag Manager, ReCaptcha, fonts, gstatic,... What you will see is that you can no longer access much of the Internet. Want to participate in StackOverflow? Good luck if you block Google.
My beef is not with them trying to find my data when I'm on their site(s). They are, however, everywhere, on almost every site I visit. Coupled with their (impressive) technical prowess, it is beyond creepy, and there is simply no way one can avoid them.
I don't know what the solution is or will be, but as far as I'm concerned, this should be illegal.
Blocking those two doesn't seem to break much, does it? I have uBlock Origin and/or Privacy Badger block them everywhere.
ReCaptcha on the other hand…
Just this week I needed it to complete the booking of an airline ticket and just now buying a high chair for my son. And today I've completed the blasted thing ten times in a row because of a game installer that was failing at a certain point (GTA V's Social Club thing); each attempt to figure out what was wrong meant completing the ReCaptcha again.
Fire hydrants, parking metres, pedestrian crossings, road signs, hills, chimneys, steps, cyclists, buses — that's what the internet looks like in 2019.
The costs of compliance are not too high. Compliance is actually ridiculously easy for new companies: they need to collect only the data they need. That is all there is
Yes. Your point? It’s actually ridiculously easy to be compliant with GDPR.
Edit: That is, ridiculously easy for new companies. Incumbents have been hoarding data for too long and it was actually harder for existing companies to become compliant.
I enjoyed reading what you said as a different perspective on the backend of ad technology vs privacy up until this comment thread.
I didn't build a profitable social consumer business in Europe after compliance, but I was part of a team that implemented compliance for a long-existing company within the US because they had clients, and clients' clients, in Europe. They're profitable. Do you want my term sheet? Or are you weakly attempting to flex while complaining that people's basic right to privacy is preventing you from earning obscene amounts of money?
As I’ve mentioned I think elsewhere in the thread I left that business in no small part because it didn’t feel right to be in anymore. It was at a significant cost. I’m really lost on where in the thread I started to sound like a shill for business practices I (knowledgeably) don’t care for.
This! I hope it costs them dearly. I have never (willingly) given them consent to have my data, yet I know they have loads of it, just because other people I know are careless with data about me.
No you can't. Facebook creates shadow profiles for every single person in the world. If any single one of your friends has WhatsApp, Facebook has your phone number. They have your phone number and the entire address book of your friend, who probably has friends in common. If two of your friends have WhatsApp and they both have your number...
You see where I'm going here? There are pictures of me on Facebook that I did not put there. From friends or friends of friends.
I'm not even scratching the surface of what Google knows with GPS and WiFi connections.
> The question is “If it weren’t FB who would be doing it instead?” [...] “Should cheap digital cameras be illegal?”
Those are a complete non-sequitur.
Facebook (and Google) analyse every single photo that goes through their system with state-of-the-art ML (it's so good that it almost beat humans at matching faces ~5 years ago). This is a scale of surveillance which the human race has never encountered before in our history[+], and is a serious problem that we (as a society) need to make a decision on. In many countries, car license plates are OCR'd and automatically tracked whenever they travel on almost any main public road. Facial recognition in public places and on public transport is becoming a prevalent problem. And wearing masks is illegal in many countries -- meaning there is no way of "opting out" of the pervasive surveillance in the physical world. None of these things were nearly as commonplace (or even technologically plausible) ~30 years ago.
Cheap digital cameras are a completely unrelated topic. And if such large-scale surveillance was made illegal then nobody would be doing it legally, and those doing it would be held accountable for the public health risk they pose. We don't let people build buildings with asbestos any more.
[+] The Stasi and KGB only really had filing cabinets for tracking people and physical surveillance measures. The Gestapo didn't even have that (the Third Reich had census data which was tabulated using IBM machines in order to track who was Jewish within the Third Reich).
I think you overestimate the degree to which SOTA computer vision is applied to a lot of images online, and I think bringing East Germany into it is pretty out of line.
There's a very good reason to consider negative outcomes of the past in discussions such as this. Let's pretend companies like Google and Facebook are totally on the up and up; pretend the company that aims to facilitate a user tracking search engine for China that is doing things including literally blacklisting searches such as "human rights", is on the up and up.
The reason what these amazingly benevolent companies are doing and collecting matters is because the systems we build today are precisely what will power the dystopias of tomorrow. As the GP mentioned, Nazi Germany used census data to select and track their victims, aided by some primitive computational technology built for the Nazis by IBM. In spite of how primitive all of this technology was, it ended up being quite effective at enabling them to achieve their ends.
Now compare this to the systems we're building today. Genuinely bad people do, and will, manage to take power in any system. It's not a question of if, but when. And these systems that we're building will be at their disposal. It's the same reason that in politics if you're considering granting the government more power you shouldn't think about today, but about tomorrow. Not do I want "this" administration to have those powers, but do I want future administrations - whom I will vehemently disagree with, to have those powers?
Most people here can avoid the impact of climate change - do you think we shouldn't talk about that either?
These are societal problems. It's good to care about people beyond yourself, and to talk about the professional ethical responsibilities of software engineers with regards to corporate mass-surveillance.
How about our friends and family? Should we configure a VPN for them too?
Btw the argument you just made applies to any form of surveillance or censorship. Just because you can still find functional VPN services for China, is China's great firewall OK?
And what happens when web services start blocking VPNs?
Netflix does it quite successfully. And I'm sure Cloudflare could provide such a service for free.
Like I said: it’s not an argument, it’s an attack. Plus I’m sure that there’d be many people here able to counter your claim regardless of the compensation number you drop.
A VPN will not help you against advanced behavioral browser fingerprinting like in this new Captcha. Not only do they have lists of VPN servers anyway, but if you inadvertently log into your Google account once from the VPN (e.g. by launching your browser from your normal account), then the VPN IP(s) will be forever associated with your account and normal IPs, and they already know from the Captcha data that you're one and the same person. All the VPN does is add the information that you sometimes use VPN servers of company such-and-such.
We’re on a site premised on entrepreneurship, and you’re pointing out what sounds like a big market gap. I angel invest now and then, if you have a plausible way to make two billion people care about something that we agree could be better my email is in my profile.
Even from the inside I didn’t see a way, but I’ve been wrong before.
Yes, looks like the industry cannot solve that problem alone, just like the electricity and chemical industries somehow didn't achieve clean air and water out of the goodness of their hearts. Another market gap. Or, wait, a case for government regulation.
I appreciate your comments in this thread. But could you please stop baiting people on this point? If there's one thing I've learned from running HN it's that the generalizations about the community that people come up with are invariably wrong. They're overgeneralized from a small sample of what the generalizer happened to notice, and since we're far more likely to notice what rubs us the wrong way, the results always have sharp edges. In other words, people remember most the things they most dislike, then tar the whole with it. To borrow your phrase, the actual TLDR is less interesting.
Thanks for the mild rebuke dang, I think you do a great job meta-moderating this community.
I wish I had stayed out of this from the beginning, I see no merit in arguing about whether HN has some themes. I’ve been watching it daily for a long time as you can tell from the age of the account.
If you want to do something that would be both a good call as a mod and a favor to a longtime user, just whack this whole thread. I was trying to chime in with some knowledge but just wound up pissing everyone off.
I have to say I strongly disagree—I thought your contributions were excellent, and HN lucky to have you contributing on a topic that you know a ton about. If I contributed to your feeling otherwise then I wish I hadn't posted!
One thing I can offer from years here is: never underestimate the silent readership (I'd say silent majority but...associations). The vast majority of readers don't comment and most don't vote either. It doesn't mean they aren't following and getting a lot out of what you wrote. Usually it's only the most-provoked segment of the long tail that is motivated to respond. That's fine, it's the cycle of life on the internet—but it doesn't represent the whole community.
> Again, you’re not likely part of that group, but seriously who hangs out on HN and can’t configure a VPN?
Recaptcha tracks users / devices, not IPs. A VPN won't help, it'll only lower your score. At that point: not allowing them to track you just means you can't use large parts of the web.
"You don't want that GPS tracker installed into your skull? Well, we won't force you, of course, but public transportation, government services and most grocery stores can only be used by GPS-skull-people"
Is it though? I'm somewhat lucky, because my government is generally technologically behind and loves literal paper trails, but yours isn't. Plenty of .gov sites use recaptcha. Sure, you can still visit those sites, it's just that, unless you pass a captcha test, they can't verify that you're actually a person (and not a Russian bot) and can't let you do certain things. If you want to use those government services, you need to allow Google to track you, or maybe they'll add a "sign in with Facebook" option so you have a choice.
With invisible captchas, you can't even sit down and solve a higher number of riddles to prove that you're really human and know what a fire hydrant looks like even though you look kinda strange. If Google doesn't believe that you are human, tough luck. Unless you have a personal connection or a solid Twitter following that can amplify your concerns, nobody at Google cares. Does your government care? It makes their life easier and normal citizens never really had problems with it.
DHL makes me solve a captcha to login and buy postage stamps. There probably are, or will be, public transportation companies that use recaptcha. It helps them to combat voter fraud (crime, abuse, election meddling, fake news, lots of things) if they know where (on the web, for now) you've been in the last 6 months.
You don't like the "implanting" part, because that's unrealistic? Just wait 20 years, and it may not be your head, but an RFID chip in your hand (yeah, those exist already). Until then, carry your gps tracker around and install their software on it, so it can collect data on your behavior to make sure that you're not a criminal.
It is not "wild speculative hyperbole" not to give the benefit of the doubt to companies that have repeatedly demonstrated that they are not entitled to the benefit of the doubt.