How do you stop using a service when you have little or no indication that it does something like this before hand, and afterwards the privacy is already gone?
If I use a site and view my profile page and the url contains aa account id or username and some google or facebook analytics is loaded, or a like button is sitting somewhere, how am I to know that before the page is loaded? What if I'm visiting the site for the first time after it's been added?
It doesn't even matter if I have an account on Google or Facebook, they'll create profiles for me aggregating my data anyway.
> quarantine them to a VPN/incognito interaction
Which does very little. I spent a few hours this morning trying to get a system non-unique on panopticlick, but the canvas and WebGL hashing is enough to dwarf all the other metrics. There are extensions to help with that, but for the purpose I was attempting, were sub-optimal (and the one that seemed to do time-based salting of the hashes wasn't working right).
So, I don't have any confidence that a VPN and incognito really does much at all.
> How do you stop using a service when you have little or no indication that it does something like this before hand, and afterwards the privacy is already gone?
It is small comfort for the average user, but the way you do it is use noscript. It makes the web awful, sure, but it won't happen to you.
> It doesn't even matter if I have an account on Google or Facebook, they'll create profiles for me aggregating my data anyway.
I sort of wonder what you envision this actually meaning. If I spam your website and you add a DoS filter for my IP, should I complain you made a profile of me? If when a user tries to log in I check the referrer to see if it contains a proper URL, have I violated your privacy?
> I sort of wonder what you envision this actually meaning.
I mean it to respond to the common response people sometimes give in conversations like these, which is "that's why I don't use Facebook" or "that's why I stopped using Google services". For this conversation, whether you use Facebook or not is irrelevant, they still gather your information, and in the same way myriad other advertisers (or however they bill themselves) do through online tracking. Google and Facebook are large, and have a portion that's easily visible, but they are not the whole problem by a long shot.
> If when a user tries to log in I check the referrer to see if it contains a proper URL, have I violated your privacy?
No. Noting which door a customer came into your store seems fine to me. That by default customers come in wearing the logo of the last store they visited is weird, but entirely something they can control. Having people shadowing all your customers while in the store looking and listening for tidbits they can report back on to get more info about those people is pretty creepy. As you suggest, the way to get around most of that is to dress blandly and say nothing.
Here's the thing, we're a market economy. There's a transaction going on, where we're trading away something (our information and privacy) to a company for some product, or possibly the right to view a product we might consider buying. How many people are actually aware of this transaction? If they aren't aware of the transaction, there's a name for that when it's a regular good, and it's theft (or fraud). The difference here is that most of our government systems don't apply any rights of ownership to this information, so our regular rules don't apply. I admit, they may not make sense to apply entirely, but at the same time, it's obvious that something is lost in the transaction, whether the person losing it realizes it at the time, or views it as important enough to make a big deal about when they notice.
> Google and Facebook are large, and have a portion that's easily visible, but they are not the whole problem by a long shot.
I meant more like in a literal sense, but okay. Point taken.
> No. Noting which door a customer came into your store seems fine to me. That by default customers come in wearing the logo of the last store they visited is weird, but entirely something they can control. Having people shadowing all your customers while in the store looking and listening for tidbits they can report back on to get more info about those people is pretty creepy. As you suggest, the way to get around most of that is to dress blandly and say nothing
These human metaphors are powerful, but don't map at all to basic analytics concepts. There is no person watching you. There is no intelligence judging you. There are a series of conditions in a deterministic system provoked by your actions. If we could have done this before now, we would have because it's a whole hell of a lot more ethical.
> Here's the thing, we're a market economy.
I dunno where you are but I'm in the US which is most definitely not "a market economy" without a whole hell of a lot of qualifiers.
> There's a transaction going on, where we're trading away something (our information and privacy) to a company for some product, or possibly the right to view a product we might consider buying. How many people are actually aware of this transaction?
Roughly as many, I imagine, as folks who realized the shopkeeper could see them enter and leave. Most folks know local proprietors can and will kick you out and put up a photo if you act up.
> The difference here is that most of our government systems don't apply any rights of ownership to this information, so our regular rules don't apply.
This is just flatly false. I don't know what you're thinking writing this, but it's clearly neglecting copyright and patents. For what it's worth, I think the later is a bad system an the former is in desperate need of reform to sharply limit it.
> it's obvious that something is lost in the transaction, whether the person losing it realizes it at the time, or views it as important enough to make a big deal about when they notice.
I am trying to read your comment in the spirit it was intended rather than the literal delivery, so please forgive me if there is a subtle impedance mismatch here but...
Welcome to the future, I guess? The top 50% earners of the world has access to computers that would have once bankrupted a nation to produce, and the options are still surprisingly good for the next quartile. With that power, it means that the people around you are going to start noticing things and making decisions about them with the information they can now process.
Ideally, this will be a distributed thing, but right now due to the nature of our society, authority of this sort is highly concentrated. But the dam has broken. A total surveillance system for up to a modestly sized city, with realtime tracking and long term data storage, is well within the reach of anyone with $10000USD to spend on hardware. They can self-host it. The banality of this cannot be overstated. It's boring to do this now. It's not new ground. So much so that average people can monitor their homes with it, or know if their friends have gone missing with it.
To some extent, there is just no undoing this. Society will have fewer secrets and those secrets will be much more deliberate, and the only response that can work is to change your attitude.
> There is no person watching you. There is no intelligence judging you. There are a series of conditions in a deterministic system provoked by your actions.
I don't think it's creepy because there's a (theoretical) person watching me, I think it's creepy because they're cataloguing all my actions in a systemic was which pierces the veil of perceived privacy (mostly through anonymity).
> I dunno where you are but I'm in the US which is most definitely not "a market economy" without a whole hell of a lot of qualifiers.
I'm not sure how to respond to this without a specific criticism of how you think it's incorrect. That said, it's somewhat tangential to the point, even if it would be an interesting conversation.
> Roughly as many, I imagine, as folks who realized the shopkeeper could see them enter and leave.
I don't know. If every time I entered my local 7-eleven someone picked up a clipboard, flipped to a specific page, looked back at me, nodded to their self and then marked something on the page, I might decide to go somewhere else, at least most the time. If I knew the info was shared with all the other 7-elevens, and the local grocery chain, and some hardware stores, that makes me want to use all the places less.
> This is just flatly false. I don't know what you're thinking writing this, but it's clearly neglecting copyright and patents. For what it's worth, I think the later is a bad system an the former is in desperate need of reform to sharply limit it.
I said "this" to qualify what I was referring to (personal information) and distinguish it from other types of protected information, of the type you reference.
> To some extent, there is just no undoing this. Society will have fewer secrets and those secrets will be much more deliberate, and the only response that can work is to change your attitude.
I don't think that's the only response that can work. It's the only one that works completely, as deciding to not care is always a solution to caring, if you can pull it off.
The alternative is new laws. Are they perfect? No. Will they solve the problem adequately? Likely not. Do they have a chance of making a positive difference across the board for massive amounts of people by empowering them with regard to their own information? I dunno. Maybe? I think it's worth pushing for though. Otherwise, why do we have minimum wage and labor laws? At some point we could have thrown our hands up and said "screw it" about that stuff, but people pushed for it, and while they aren't perfect, I think we're all better off for them.
I don't believe there will be any perfect solution to this ever, or even a good or acceptable solution all that soon. I do think it's still worth raising my voice over, because I think there are some possible futures that are better than others with regard to privacy and personal information, and I think that's worth pushing towards.
You use something that blocks scripts (like uMatrix) with an aggressive ruleset. On some sites you'll need to allow things to make them work. If they are loading trackers from the same servers that they load content from, you can't do much without wasting more time than you want. I'd say it breaks most of the tracking though.
More sites than you'd expect work without js or with first-party js only. It's annoying when you need to read a news site, because those are usually bloated garbage. Not a huge loss.
This was already with uBlock Origin. Also tried combinations of Ghostery and Privacy badger. All of it made very little difference for panopticlick, and that's probably a low-bar compared to what's common these days.
I don't care if every site that I browse using this VM knows that I'm Mirimir. I don't even try to hide that.
What matter is that my personas using other VMs, through other VPNs or Tor, don't get linked to my meatspace identity, to Mirimir, or to my other personas. And that's doable, I think.
Yes, and you go through quite a lot of effort to achieve that, given your other comment.
My main point is that the amount of effort you have to go through to achieve that is very high, and I wish it was considerable lower. There are technological changes that could help with this, and legal changes that could help with this.
I think a comfortable place would be if you visit the same online location using your main browser using one IP, and a private browsing instance of that same browser on another IP (through a VPN, proxy, or just new public lease), it would be nice if there was some expectation they didn't immediately have a high degree of certainty you were the same individual. For the general populace, this falls on its face.
Tor has quite a few mitigations to help here (e.g. simulated window/screen values), and Firefox has started to adopt some of them, but as mentioned here on HN frequently, Firefox sometimes has problems with CAPTCHAs and certain sites (I haven't had those problems, but I'm also not usually using it through a VPN), and I know Tor is sometimes blocked outright.
The point is that until most these protections (technological and hopefully some legal) are mainstream, completely protecting yourself is a double edged sword, since you also ostracize yourself from some sites and services. Tor is the equivalent of walking around in padded, baggy clothes and a ski-mask. Sometimes, like in the snow, it may seem fairly normal. Other times, like at the beach, it may preserve your privacy, but it's very uncomfortable and may cause people to avoid you, if not outright shun you and run you off. If everyone starts wearing masks and covering their hair, if you do the same you probably have a fairly high degree of anonymity and privacy through it.
In summary, I think Tor is a useful and necessary tool, but nowhere near sufficient for where I think we need to be generally.
> Yes, and you go through quite a lot of effort to achieve that, given your other comment.
That's true. However, it's mostly one-time effort. There are Linux and TrueOS workspace VMs, pfSense VMs as VPN gateways, and Whonix gateway and workspace VMs. All in VirtualBox.
There's ~no configuration required for the Whonix VMs. You just need to point the gateway VM to the pfSense VM that ends the desired nested VPN chain. And if there are multiple Whonix instances, rename the internal network that the gateway and workspace VMs share.
For the Linux and TrueOS workspace VMs, it's just like any OS install. You do have more machines to maintain, but mainly that's just keeping packages up to date. All of the devices are virtual, so you don't have driver issues.
Setting up the pfSense VMs is the hardest part. But once that's done, you can use them for years. pfSense is pretty good about preserving setup for OS upgrades. And there's a webGUI for changing VPN servers. But it's harder than using a custom VPN client.
So yeah, it's not so easy. However, someone could write an app that papered over most of the ugly parts. That even automated VM setup and management.
No, a clean browser and IP with the combination of what fonts I have installed, how my video card renders a canvas and WebGL instance (which may be affected not just by the video card you have, but the driver version used with it), my screen size, and a few other system level items that come through may or may or may not be enough to uniquely identify you. Along with linking to a prior profile if you screw up one time (or load a URL that has identifying information they can use), and you're busted.
So, sure, a clean browser and IP and never logging into a site you're previously visiting might be enough, but who does that, and doesn't that halfway defeat the purpose?
My meatspace identity uses a desktop that hits the Internet directly. It displays no interest in technical matters. Just banking, cards, shopping, general news, etc. It never accesses HN, or any of the other sites that Mirimir uses. Or that any of my other personas use.
Mirimir uses a VM, on a different host machine, and hits the Internet through three VPNs, in a nested chain. Some other personas use different VMs on the same host, connecting through different nested VPN chains. Some are Whonix instances, connecting via Tor, and reaching Tor through nested VPN chains.
So basically, each persona that I want isolated uses a different host machine and/or VM, a different browser, and a different IP address.
I appreciate the information-theoretic validity of your argument, but if you think that one of these firms cares enough about your buying preferences to burn enough compute to find that correlation then you either work for the CIA or are mistaken.
It doesn't take a lot of compute resources to have multiple profiles, and when evidence of a high assurance level (a referring URL that is known to designate a specific user of a major service) to link it with other profiles that also have that designation.
To me, that seems par for the course for any service that's generating profiles of browsing behavior and trying to make any sort of decisions based on it. It reduces cruft and duplicate profiles while also providing more accurate information. Why wouldn't it be done?
> the information-theoretic validity of your argument
The portion about canvas, WebGL and AudtioContext hashing is not theory at all, it's well known practice from years ago. Jest the other day here there was a story about some advertiser on Stack Overflow trying to use the audio hashing to tracking purposes.
Hell, if you get enough identifiable bits of entropy, you can probably assume weak to strong level matching using a bit-level Levenshtein distance that's low enough.
GitHub is always at your disposal. NV doesn’t sell the consumer cards to enterprises. So on AWS a multi-GPU box will cost you about 12 dollars an hour. If you can disambiguate, let’s just say 85% of profiles absent IP or cookies, well I think you just broke the academic SOTA and I’d love to make some calls.
whose results would have you believe that one's footprint is very unique. I'd be interested in hearing more about why this is hard to implement into an efficient process.
> GitHub is always at your disposal. NV doesn’t sell the consumer cards to enterprises. So on AWS a multi-GPU box will cost you about 12 dollars an hour.
I don’t see how this is related to the claim, since it doesn’t solve the problem. But the advertising company that I let run code on my website will certainly do the job pretty well, I’d say.
I was pointing out that it’s a commercially applicable of a very strongly worded claim that I know would be expensive to test because I’m optimizing GPU intensive code at the moment. I don’t know where in this thread I generated so much ill will for trying to add knowledge to the conversation, but I’m not making shit up.
There are tools that will supposedly do this to a high degree of accuracy. Are you saying that they are fake/don't work as well as they'd want us to believe?
How do you stop using a service when you have little or no indication that it does something like this before hand, and afterwards the privacy is already gone?
If I use a site and view my profile page and the url contains aa account id or username and some google or facebook analytics is loaded, or a like button is sitting somewhere, how am I to know that before the page is loaded? What if I'm visiting the site for the first time after it's been added?
It doesn't even matter if I have an account on Google or Facebook, they'll create profiles for me aggregating my data anyway.
> quarantine them to a VPN/incognito interaction
Which does very little. I spent a few hours this morning trying to get a system non-unique on panopticlick, but the canvas and WebGL hashing is enough to dwarf all the other metrics. There are extensions to help with that, but for the purpose I was attempting, were sub-optimal (and the one that seemed to do time-based salting of the hashes wasn't working right).
So, I don't have any confidence that a VPN and incognito really does much at all.