Noiszy – A Chrome plugin that creates meaningless web tracking data (noiszy.com)
410 points by yarapavan on March 31, 2017 | 154 comments



There have been a few of these plugins floating around recently, and really everything that needs to be said about them appears in the comments already. Fake traffic is wasteful, hard to make look authentic, and only serves to create more records of the end user around the web rather than fewer (e.g. if your laptop's IP was generating fake traffic, that probably means you had the lid open and were doing something with it at the time).

I'd much rather see a browser with some kind of built-in distributed cache, something along the lines of FreeNet, but trading perfect anonymity for performance. Given a large chunk of disk space, and a handful of browsers talking to each other in a local area (e.g. same ISP), it should be viable to concoct a scheme where after a handful of browsers request a particular page, the remaining browsers are confident enough that the data cached in their local group is representative of the data sourced from the origin network.

There are a million issues to iron out with a scheme like that, e.g. bad actors injecting crap into the cache, handling staleness, interactions with dynamic content and API endpoints etc., but I think something like this would have a much greater privacy benefit by denying at least some traffic to the origin networks, or simply by keeping some of that traffic within the boundary of the local ISP's network (and if the local ISP is evil, requests between the nodes could be encrypted as in FreeNet).
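
A minimal sketch of the quorum idea, with made-up peer and cache interfaces (none of this is a real API - just the shape of the logic):

    // Serve a page from the local group only once enough peers report the
    // same content hash; otherwise fall back to the origin network.
    interface Peer {
      lookup(url: string): Promise<{ hash: string; body: string } | null>;
    }

    declare function fetchFromOrigin(url: string): Promise<string>; // assumed

    const QUORUM = 3; // peers that must agree before we trust the local copy

    async function fetchViaGroup(url: string, peers: Peer[]): Promise<string> {
      const answers = (await Promise.all(peers.map(p => p.lookup(url))))
        .filter((a): a is { hash: string; body: string } => a !== null);

      // Count how many peers agree on each content hash.
      const votes = new Map<string, number>();
      for (const a of answers) votes.set(a.hash, (votes.get(a.hash) ?? 0) + 1);

      const agreed = answers.find(a => (votes.get(a.hash) ?? 0) >= QUORUM);
      return agreed ? agreed.body : fetchFromOrigin(url);
    }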


That would be a cache of what's popular. That would actually help on the surveillance side, because it would effectively be a filter of the common traffic, allowing them to focus on your unique traffic.


That's an excellent point!


Unless your traffic goes through someone else. Wait, I reinvented Tor!


Not quite what you're describing, but the venerable Squid cache can do distributed caching. Something to play with, at least, with your nerdy friends.


It needs to happen in-browser because of the modern prevalence of SSL. I think only Firefox has APIs powerful enough to intercept stuff at this level, and even those APIs are being deprecated, so I think it really needs to be integrated straight into a browser fork. Chrome / Firefox WebExtensions have request interception, but I'm not sure it's powerful enough to do this kind of thing.

(Also, the mere fact that such a scheme would have to mess with the mechanics of a fundamental browser security mechanism should be enough to indicate how difficult it would be to implement safely!)
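
For illustration, this is roughly what WebExtension request interception looks like (Manifest V2 style, needing the "webRequest" and "webRequestBlocking" permissions; the tracker check is just a placeholder). You can observe, cancel, or redirect a request, but at least in Chrome you can't hand the browser a response body out of a peer cache, which is part of why a browser fork looks necessary:

    // Background script: watch every outgoing request.
    chrome.webRequest.onBeforeRequest.addListener(
      (details) => {
        if (details.url.includes("example-tracker")) {
          return { cancel: true };        // drop the request entirely
        }
        // return { redirectUrl: "..." }; // or send it somewhere else
        return {};                        // otherwise let it through
      },
      { urls: ["<all_urls>"] },
      ["blocking"]
    );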


Or an alternate model / protocol for content retrieval.

Consider that SSL is largely used for connection encryption, though it has the additional side-effect of site authentication.

If you can still validate the site, and rely on a hash of the content to detect changes, then you're starting to get toward a cacheable, secure, authenticated system.

I need to think more about this.


The cache/proxy just needs to install its root certificate into your system store and no changes to the browser would be necessary.


It's not sufficient, due to certificate pinning, and it also completely short-circuits any browser-based security policies keyed on the certificate or on options advertised in the protocol, like SPDY. That would be the worst possible outcome, and it's why any correct solution (i.e. one that doesn't break security or functionality) would be hard to reach.


Local root certificates are prioritized by all browsers over certificate pinning, for exactly such intentional MITM.


Not while SSLKEYLOGFILE is still around. I keep wondering how long it will last, but it's been a while already.


Could you please tell me more about it? How would that work? Do you have the config?


It can work in multiple ways, depending on your goals. Keep in mind, Squid is infrastructure software, not end-user software. There is no one configuration; it depends on what you want to achieve and your environment.

That said, the web site[1] offers great docs with lots of config samples to start from. If you're starting from scratch, take a look at CARP configuration. Squid also speaks ICP and HTCP.

Squid is one of the rarely-sung heroes of web content delivery.

[1] http://www.squid-cache.org/


Distributed caching is challenging enough to get right when you know the segmentation parameters; inferring these accurately from any old website would be incredibly difficult.

Still, it's a neat idea.


I'm hacking at an article repository for that, bloody difficult... Maybe I should try with https://github.com/ipfs/ipfs


I didn't get it for this purpose at the time, but my PrivateInternetAccess (PIA) account seems to be getting more useful because of this.

Unfortunately, a lot of abuse is performed via PIA, so some sites (especially financial sites and forums) block PIA IPs.


Static data caching is solved without p2p. Dynamic (personalized) data caching is unsolvable by p2p.


Distributed spidering for a community-fed search engine index would provide comparable end-user benefits while being considerably more socially useful.



Or a local meetup group for subscribers to a compatible ISP (all DOCSIS 3.x cable modems, e.g.) where you bring in your cable modems, throw them in a pile, and then take home a random one. Do it monthly.

Combine that with a script that takes all non-sensitive browser history and cookies and does the same -- swap out with a random stranger.

Neither is practical and the former would require (at least for my provider) that I go through a sometimes cumbersome process of registering the new device's MAC address.


How would the former affect web trackers?


That is an awesome idea - a browser that is also a node in a search engine; it would then be possible to build an experimental search engine without a £40 million+ investment just to get started.



It's a distributed search engine daemon with a nice search + management web frontend, but not a browser too?


I didn't install it myself, but I thought it was possible to interact with that daemon from a common browser? Also, is it possible to start custom web crawls?


Or maybe even a crawling project like Common Crawl.


What an awesome idea!


I agree


You win buzzword bingo


How else would you describe the system liotier is talking about?


"Littered with buzzwords" and "interesting" are not necessarily mutually exclusive...


"Peer-to-peer web spidering for a crowdsourced search engine would create socially responsible end-user value."


It's kind of funny that their website includes TypeKit, Squarespace, and Google Tag Manager scripts - all of which can be (and probably are) tracking various things. They may not be connected to "you"; they may be anonymized, or may be a unique number representing you.


Hi - Noiszy creator here. You're right about the scripts - we're using GTM and Google Analytics. I think it's 100% legit for sites (like mine) to want to know how much traffic they're getting - this to me is a good use of data. I work in the analytics field and actually I love data - I just don't like AI using data in creepy ways. The thing is, there is a big moral grey area between "count visits" and "creepy targeting based on everything you do". My hope is that by disrupting the data, we can make the conversation happen about the right things to do with algorithms. I don't have the answers but I hope to be a part of the discussion.


> I think it's 100% legit for sites (like mine) to want to know how much traffic they're getting - this to me is a good use of data.

As others have pointed out, it's not about you getting the data, it's about the Analytics services you're using getting the data.

I'll also point out that if all you want is to know how much traffic you're getting, you can do it with far more ease by simply looking at your server logs. Why server-side analytics aren't used more I have no idea.


Your website appears to also be sending data to facebook with App ID 314192535267336. This might be some squarespace thing. It qualifies as "creepy" to me that you're tracking more than you even know, and it's connected back to my facebook identity!


I get it. Everyone, everywhere, is collecting data - whether it's CNN.com or Noiszy.com or Google Analytics. While you may be using Google Analytics for basic tracking of anonymous visitors, Google may be using it for "creepy targeting based on everything you do".

I'm really not opinionated (I think the plugin is interesting and am completely ok with being tracked); I just found it a bit hypocritical to create a product against tracking while simultaneously providing tracking data to Google.


I think of it as a product to deter AI processing of your data, not a product against data itself. I, too, am ok with being tracked - but being ok with data collection =/= blanket approval of all data processing & use.

Thanks for raising this point.


Google Analytics most certainly does feed user data into "an AI" and use it to track users and sell ads targeting them. I'm not sure what you're getting at here.


Does Google Analytics' privacy policy even allow Google to do that?


They are certainly allowed to according to their TOS: https://www.google.com/analytics/terms/us.html

"Google and its wholly owned subsidiaries may retain and use, subject to the terms of its privacy policy (located at https://www.google.com/policies/privacy/), information collected in Your use of the Service"

Also read: https://www.google.com/policies/privacy/partners/


Check out Piwik (https://piwik.org/)! It'll do what you need but the data at least stays on your server.


It's not about your own intentions. You are using third party services which track and collect data for their own purposes.


The problem is, Noiszy corrupts your own web analytics data and the analytics data of every other site it visits while generating fake traffic.

This is all or nothing. You cannot separate the "good" use of data from the "bad" use. It's all the same data. Only the usage is different.


You're right that it's the same data, but I think we absolutely must address how to separate the "good" use of data vs the "bad" use. Especially since, for most people, the data is already out there and there's no way to roll that back. I hope we're going to be talking a lot more about the ethics and morality of algorithm development.


I don't mean to be a curmudgeon, but why put out this kind of privacy-minded plugin for Chrome, one of the browsers I'd probably least trust to respect my privacy?


Google has actually blocked similar "noise generation" apps from the Chrome Store in the past. As I recall, there were a few extensions out there that would 'click' on every ad displayed. Obviously that would have created serious problems for Google, so they nixed it.


Ad Nauseam does this. It was removed from the chrome web store and I had to install it as a developer plugin, so it nags me to disable it as it could "damage my computer" every time I restart Chrome.

It estimates since I installed it that I've cost advertisers $460 in clicks.

https://adnauseam.io/


https://adnauseam.io/

When I first saw it, I wasn't convinced it was very useful and thought it would only take some easy heuristics to filter out, but if Google removed it from the store, it must have been a pain to deal with.

Maybe I'll even end up using it.


This website reminds me of the KenM post where he suggests that he's getting back at Papa Johns by not tipping the pizza delivery driver.

Generating click fraud on your favorite web comic's website or other small websites you may be visiting is pretty sad. AdSense bans for life.


I don't read webcomics, and somehow I don't think people like gwern who write long-form content do it for the ad revenue. And even if there are ads on the website, if they respect the DNT header, AdNauseam won't auto-click them.

(I'm not sure what the tipping anecdote is supposed to mean, but I don't understand why US customers put up with tipping either; the business is responsible for its operating budget, and there is no legitimate reason to bother the customer with it.)


How do you determine if a server is respecting a DNT header?


You don't.


I believe that is GP's point.



Selling your users' privacy is also pretty sad.


"TrackMeNot" from the same ppl who made AdNauseam but instead of clicking on ads it creates random queries on various search engines

https://cs.nyu.edu/trackmenot/


As an AdNauseam user, my beef with it is that certain web pages will peg the CPU at close to 90%. Otherwise it seems to be good at juggling system resources.


Presumably because Chrome has a dominant market share. Meet people where they are.


Different levels of trust. Some might feel that Google has established a baseline level of trust, while companies like AT&T and Verizon have not.


Hi, Noiszy creator here. You're totally right - Firefox is next!


Put it where it will have the most impact / get adopted, then that can cultivate a desire to have it for other browsers, and someone may get motivated to write those versions.


Why would you need it in a browser that you trust to respect your privacy?


Because it's designed to frustrate your ISP. The problem with running it in Chrome is that Google is probably already grabbing everything you do in the browser, so who cares if Comcast is too?


If you have specific privacy complaints about Chrome, spill it. Content-free finger pointing like this isn't constructive.


Chrome is closed-source, so being specific isn't going to be possible. There are several specific privacy complaints about Chromium, though: https://github.com/Eloston/ungoogled-chromium/


Here's one: why isn't there a feature to export settings without using a Google account?


Bookmarks can be exported via the Bookmarks Manager, and all other Chrome-specific data you can just copy from `%LOCALAPPDATA%/Google/Chrome/User Data`.


Having to manually copy a directory from disk is not what I'd call a "feature".


Hello all! As the recent creator of a similar tool that has been getting a lot of buzz, I am here to throw some constructive thoughts out there!

These ideas are useless at a technical level (for all the reasons that have been mentioned already).

Where they are useful is at a social level. People are energized and ready to fight. Many of them didn't know about this issue. Many of them didn't know that there are things they can do as individuals to fight back. Your tool (and mine) are getting attention because they open eyes and tap into pain.

As useless as noise might be, people understand the idea and that makes it accessible. That means people will try it, get it, and share it.

We need to leverage that attention in order to teach those people things they need to understand about privacy. Our tools should be seen as a gateway into impactful approaches like Tor, VPN, HTTPS Everywhere, Privacy Badger, and the EFF at large.

Tooting my own horn: that's what I've been doing with https://slifty.github.io/internet_noise/index.html

In all interviews I make sure to explain that while this is an amusing form of protest, it is not effective, and people who care need to go take the steps outlined on the project page.

A website can do this. A Chrome plugin, however, risks being harmful with minimal benefit. It minimizes the potential for communication with your audience, and it is also harder to access, which means you are reaching a narrower audience.

Here's the good news! The project I linked to is open source -- https://github.com/slifty/internet_noise/ -- you could contribute to it directly and then update your plugin so that instead of generating noise and hijacking their browser information you just direct them to the website version of the concept.


Hey there - I like your project. Sounds like our tools are complementary, and you make an interesting point about the browser-vs-plugin based approach. I do think it's worthwhile to specifically browse news sites - I believe they're the worst "filter bubble" offenders right now, and this can help break that. The more efforts there are in this space, the better. Thanks for sharing your project!


I created something similar yesterday afternoon [0]. Instead of opening a new tab, it just requests pages through a hidden iFrame and drops the headers so the request goes through.

It doesn't click on anything, because it would be awkward if this by chance started sharing things on logged-in Facebook profiles etc. I plan on adding sequential requests over the weekend so the traffic is more realistic.
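
For anyone curious, the general technique can look roughly like this in a WebExtension - a sketch of the idea, not the actual extension source. The response headers that normally block framing (X-Frame-Options, CSP) get stripped for sub-frames so the decoy page will load inside a hidden iframe:

    // Background page: strip frame-blocking headers for sub-frames.
    chrome.webRequest.onHeadersReceived.addListener(
      (details) => ({
        responseHeaders: (details.responseHeaders ?? []).filter(
          (h) => !["x-frame-options", "content-security-policy"]
            .includes(h.name.toLowerCase())
        ),
      }),
      { urls: ["<all_urls>"], types: ["sub_frame"] },
      ["blocking", "responseHeaders"]
    );

    // Load a decoy page the user never sees, then tear it down.
    function loadDecoy(url: string): void {
      const frame = document.createElement("iframe");
      frame.style.display = "none";
      frame.src = url;
      document.body.appendChild(frame);
      setTimeout(() => frame.remove(), 60_000);
    }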

It's open source and on Github, so you can download, install and modify it from there if you wish [1].

[0] https://chrome.google.com/webstore/detail/decoy-requests/aeh...

[1] https://github.com/marvinkennis/Decoy-Requests


Cool! I really think this is a "the more the merrier" space. More tools = good.


Not to sound pessimistic, but as more and more people get forced into metered bandwidth, how is using a plugin that generates extra random traffic "sticking it to the 'man'"?

Edit: Yes, I know this is supposed to mask your actual viewing habits. But security through obscurity has never really panned out for anyone in the end.


This is not security through obscurity, this is obfuscation. Like everything in security, it's not all-or-nothing. It adds valuable noise, it won't hide your tracks.


Adding variance always messes with statistical models and thus anything related to big data analytics, which internet surveillance is all about.


Is 'it uses bandwidth' really a valid criticism of this? I would say no.

It's also not really "security by obscurity", which I believe refers to situations where the techniques used to secure a system are kept quiet in the hopes no one will figure them out. Here we have a system that is already breached, and the point is to defeat analysis. Most analytic techniques I can think of would be defeated by the right kind of "noise", despite the fact that there may be awareness that the noise is in there.


Individually it doesn't do much. The bigger point is to render the data useless.


Will this not be counterproductive, i.e. distribute your personal data to even more websites?

By the way, I once got blocked by Google after installing a plugin that did automatic random searches in the background.


Maybe install it in a different profile?


IIRC, my IP address got blocked for some time.


Yes, Google can still monitor IP behavior, but you don't leak cookies for instance.


> Read and change all your data on the websites you visit

While I applaud the authors for trying to solve a problem, I will not install this plugin unless it's open source. There is no way I'm going to grant the above permission to some random plugin just because they tell a compelling story.


It doesn't imply any particular licensing, but it is trivial to view the source of a chrome extension.

You can do it with curl or similar to download it, but there's a chrome extension that does the work for you: https://chrome.google.com/webstore/detail/chrome-extension-s...


Same here, with a tiny twist: open source, and with a SHA512 mentioned somewhere in the repo that could be verified against whatever comes from the Chrome Web Store.


Would be meaningless because the chrome store auto-updates addons.


Questionable action, but one could freeze extensions via extended file system attributes & permissions, or work around the auto-updates through the hosts file or other means... https://serverfault.com/questions/354606/where-do-i-find-the...

.. Or even a combination of the above.


I have to agree with you, an open source solution would be better, but at least it's a start.


IMO this is taking the wrong approach. Why spread your data across more sites, when the problem is the individual sites you visit - not random ones you don't? This (dead) project of mine sends similar data to the site you are currently on - but instead of noise it is purposely malicious and will ruin the website owner's tracking abilities if used at scale.

https://hello-kill.github.io


I've been thinking about building something similar since the first Snowden leaks. I figured encrypted traffic to various locations would be useful considering that security agencies store everything until they can decrypt it at a later date. Unfortunately I'm not well-versed enough in implementing proper encryption and I'd probably just end up shooting myself in the foot.

Has anyone else ever thought about doing this?




Using this will raise some flags (your online behaviour will be different from most).


Somewhat related:

A long time ago I tried to delete my Facebook account and realized it only deactivates until your next login. You have to specifically request that they permanently delete it, or something. And it's not even clear whether FB still stores your info after that whole process (a little sketchy, to be fair...).

So I came up with this idea that I'd make a service called socialfacewash.com that just completely trashes your digital profile (liking random things, changing your info, and just basically obfuscating what FB thinks they know about you).

I never built it, though I kind of wish I had. I still own the domain if someone wants it.

Trend seems to be that we're not going to have protections over our own digital privacy/data for a very long time. Maybe a service that could at least mask, lie, trash, obscure our footprint for everyone else would be nice to have.


Someone already created that: http://suicidemachine.org/

They even got a cease and desist letter from FB.


Here is the author's guest post providing background for this plugin - https://mathbabe.org/2017/03/31/guest-post-make-your-browsin...


https://github.com/dhowe/AdNauseam does the same but for ad tracking. It hides ads from the web browser but clicks them in the background (it just sends a request to the tracking server saying the user clicked the ad).


I think Google removed it from the Chrome Web Store recently. Bummer.


so at least we know it works!


So, it's going to be noise among a patterned world. How will they not be able to filter this out?

People are creatures of habit. Example: I read HN with my coffee every morning. Interjecting meaningless data doesn't prevent them from finding my patterns.


Reminds me of a comment I once saw on HN where someone suggested they avoided timing attacks by adding a random sleep.


So we need a plugin that creates a pattern. Could we crowdsource patterns too? Let's say I share my pattern with everybody and you share yours.


I bet the ML guys and gals would be able to suss out my pattern from a fake pattern.


Start an arms race, use ML to generate fake traffic.


I'm not sure I understand the point of this -- marketers will just discard this random data that doesn't match a human access pattern.

If this anonymized my tracking data for sites I visited, that would be useful, but sending a bunch of random hits doesn't seem like it will keep anyone from tracking my activity which is what I want to shield.


It generates meaningless tracking data by doing meaningless browsing automatically. So it sucks bandwidth. That's not a good solution.

Returning fake responses to tracking cookies is more efficient. For phones, returning bogus info to app requests for contact lists and location is especially effective.


It does suck some bandwidth - and your point is well taken, that would be a great technique too. It loads about one new page per minute, so it shouldn't be a crazy amount (unless pages contain video, for example). The thing is, anything less than visiting from your own browser on your own machine and doing user-like things (like waiting and then clicking again) is easily filter-out-able for most tracking tools. The idea here is to purposefully associate the "noise" with your online footprint.


It sucks far less bandwidth than advertising traffic or video content


This raises a question: how do ISPs collect this data? I thought it was done by intercepting and analyzing HTTP requests as they come from the cable modem. Are you saying it's instead at the browser level? If so, how do they control that?


I was working on something similar last night. Hack it and make some noise. No strings attached; kill it once you think it's obsolete.

https://github.com/shashanksurya/HaughtyDog


This would probably be better as a daemon, not a browser plugin.


This is exactly what I had in mind when Snowden happened. I never wanted to take this on, for multiple reasons. Very awesome to see this - installed it immediately.


Maybe I am being too naive, but could someone explain to me why just using a VPN would not be good enough to hide your traffic from these companies / governments?

I understand that perhaps they _could_ ask the VPN provider to give logs (depending on the provider, if they keep logs or not). But that would not work for the algorithms used to target you as an individual.

Could someone please explain that to me? :-)


A VPN commonly only hides your IP for the time you're using it. There are many other factors that can be used to track you.

First is cookies. Say you log in to Facebook with your browser from your own IP address, and an hour later you fire up the VPN to browse the web. One of the websites you load has Facebook's tracking pixel and - unless you cleared your cookies - boom: Facebook knows you have visited that site.

Even if you clear your cookies, you may have some long-living stuff (like Flash cookies etc) that can be used to track your browser.

And even if you clear everything or use incognito, sites can use some clever heuristics (CPU power, enabled plugins, timezone, browser version, WebRTC, etc.) to track your browsing and to match that it's you.
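
For example, here are a few of the attributes such heuristics read, all available to any page without cookies (a simplified sketch - real fingerprinting scripts combine many more signals, canvas and WebRTC probes included):

    const fingerprint = [
      navigator.userAgent,                              // browser + version
      navigator.hardwareConcurrency,                    // rough CPU "power"
      Intl.DateTimeFormat().resolvedOptions().timeZone, // timezone
      `${screen.width}x${screen.height}x${screen.colorDepth}`,
      navigator.plugins.length,                         // enabled plugins
      navigator.language,
    ].join("|");
    // Hash this and it stays fairly stable across IP changes and VPNs.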


Well, for the first part, just restrict internet access unless the VPN is on. (For me, I cannot access any website when the VPN is not active.)

The last part of which you wrote is incredibly interesting. I had no idea they could find out the CPU power or the enabled plugins.

Though, to be fair, I think I'd fall into a pretty large pool to match. And either way, that information does not seem very useful (to me) for targeted advertisements.

Thank you for your answer :-)


The EFF lets you see all the fun ways you can be tracked by your browser fingerprint:

https://panopticlick.eff.org/


And yet that is a tiny fraction of fingerprinting tech. Many devs and companies roll their own as well. It is hard to anticipate or defend against all, or even most.


An interesting way to look at this, but wouldn't it be possible to pluck this pattern out from relevant data?


It most likely is, because it is incredibly hard to create realistic fake traffic. The moment you start using such a tool is probably very easy to detect due to a sudden jump in traffic. Then you can use prior data to figure out what you were actually doing - the sites you visit, the times you are active, how you switch between sites, how long you stay on sites, and so on. Randomly following links will show very different distributions.

You have to fake your browsing behavior with pretty high accuracy, but if you do that, you defeat the entire purpose: you have just created a copy of your browsing behavior. Maybe it would work if there is no prior knowledge of your browsing behavior and the fake traffic is human-like enough, but even in that case it seems not too unlikely that one could separate the two traffic sources. If you want to hide your browsing behavior, the fake traffic must be different from yours. But if there is a difference, the difference can potentially be exploited to separate the sources.
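
As a toy illustration of how separable naive noise is: a tool that loads one page per minute produces inter-request gaps with almost no variance, while human browsing is bursty. A deliberately simplistic heuristic (a real classifier would use far more features):

    function looksRobotic(timestampsMs: number[]): boolean {
      if (timestampsMs.length < 5) return false;
      const gaps = timestampsMs.slice(1).map((t, i) => t - timestampsMs[i]);
      const mean = gaps.reduce((a, b) => a + b, 0) / gaps.length;
      const variance =
        gaps.reduce((a, g) => a + (g - mean) ** 2, 0) / gaps.length;
      return Math.sqrt(variance) / mean < 0.2; // suspiciously regular spacing
    }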


If I were writing this, I would try to get it into as many hands as possible, and do some sort of backfeeding of data to generate "real" profiles of user activities, then send back down suitably randomized profiles of other real users that you use for some period of time.

It's important to conceive of this as an arms race, and I think one of the underappreciated things about arms races is that if you successfully anticipate the next three things your opponent will do and counter them, you can end up discouraging them from even fighting in the arms race. I don't know how random the page visits are, but if they are random in, well, almost any sense, eventually they will be something the trackers can filter out. "Well, this person shows a really strong signal for football, and weak signals for anime, robotics, accounting, and a hundred other things. Show them the football ads." Even today that wouldn't necessarily pose much of a challenge.

By contrast, if you get profiles from volunteers that are real, and then start mixing them up so that everybody downloading this extension shows ten or fifty equally strong signals of interest, of which only one is real, that'll scramble the data being collected something fierce, and require the data collectors to jump straight to very sophisticated teasing out of what's really true, which initially won't even be worth it until a lot more people are using this stuff. There's a lot of elaborations you can think of from there (temporal correlations, i.e., college basketball interest should be spiking now, vs. football), etc.

I have no idea what this extension is doing, because I can't seem to find any data about what it is doing, so maybe it's doing some of this stuff, but I expect they'd be talking about at least the data they'll need to collect to make this really work if it was going to work this way, so I assume without proof that it is not this.

In general, it's worth pointing out that a lot of systems can't be fooled by uniformly random data, because all real-world systems already have to be able to filter that out because all real-world systems experience noise, of which at least a significant component is probably more-or-less uniformly random. If you really want to scramble a system you need to be more clever.


This is an awesome (and thorough) response and a great idea. I totally agree about the arms race and how AI + crowdsourced data could be applied to create much more realistic fake viewing patterns. I'm super-glad that this conversation is taking place.

But, wouldn't collecting viewing habits and then using AI to define (and emulate) real-looking behavior immediately put the developer(s) in that moral grey area that so many algorithms occupy? Technically it could be done, and it would be fascinating to work on, but we'd have to start with a huge browsing dataset (creepy) and then process it to figure out the patterns (exactly what this tool is trying to subvert), and then feed that back as output from within the user's browser (probably feeding back indistinguishable-but-AI-driven data and creating a loop). It's a murky space to wade into, and one that needs a lot more conversation.

Instead I decided to just keep it simple. The first page is chosen randomly from a list of (user-approved) sites. A link on that page is chosen randomly from the list of links that open in the same window and point to the same domain, and that's clicked. That's repeated a somewhat-random number of times, usually about 2-7 times, before a new site is chosen from the user-approved list, and the process starts over.
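
In rough TypeScript, the walk looks something like this (a sketch of the logic just described, with made-up names - not the actual extension source; the click count for the current walk would live in extension storage):

    const approvedSites = ["https://example.com"]; // stand-in for the user's list
    const walkLength = 2 + Math.floor(Math.random() * 6); // roughly 2-7 clicks

    function pickRandom<T>(items: T[]): T | undefined {
      return items[Math.floor(Math.random() * items.length)];
    }

    function nextStep(clicksLeft: number): void {
      const sameDomainLinks = Array.from(
        document.querySelectorAll<HTMLAnchorElement>("a[href]")
      ).filter((a) => a.host === location.host && a.target !== "_blank");

      const link = clicksLeft > 0 ? pickRandom(sameDomainLinks) : undefined;
      if (link) {
        link.click();                               // follow a same-domain link
      } else {
        location.href = pickRandom(approvedSites)!; // start over on a new site
      }
    }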

Check out Cathy O'Neil's definition of "Weapons of Math Destruction" (good overview of her book here: http://money.cnn.com/2016/09/06/technology/weapons-of-math-d...) - I'd love to hear your thoughts on that framework for determining the morality of algorithms.


> By contrast, if you get profiles from volunteers that are real, and then start mixing them up so that everybody downloading this extension shows ten or fifty equally strong signals of interest, of which only one is real, that'll scramble the data being collected something fierce [...]

This would also increase your traffic ten or fifty fold, to a first approximation. You would make a single household look like a family with several people sharing the Internet connection. Netflix, for example, can separate several people using the same account; I read about that during the Netflix Prize [1].

But overall I agree with your point: you probably can throw a wrench into the machinery, maybe even to the point that it is no longer worth pursuing the tracking. But it certainly is not easy, and randomly clicking on links will definitely not do the trick.

[1] https://en.wikipedia.org/wiki/Netflix_Prize


"This would also increase your traffic ten or fifty fold in a first approximation."

I suspect if you worked at it you could mathematically prove that will be a necessary condition for any effective data collection scrambling technique, under an assumption that we can't use proxying to other nodes. And I do mean "mathematically" fully literally.

I'm eliminating the possibility of proxy, where you try to set up a situation where you create a P2P network and trade page views around, because I think that only works with a really abstract view of how the tracking works. In practice, as soon as you want to use a site in an authenticated fashion you're getting tracked via that authenticated account, so I think I can argue the only real possibility for scrambling the data is for it to source from the same network location as the real data you are generating.

On that note, it occurs to me this plugin probably ought to be automatically creating authenticated accounts on services you don't care about; the authenticated status of an account creates a shining signal that may be too bright to mask. Considering that a lot of the data you're trying to mask would be coming from Facebook, that would be a problem.

And now that I think of that, use of this plugin really ought to result in your Facebook profile getting pretty badly scrambled, too.

Man, this is a big challenge. There's a part of me that is actually quite sad I can't drop everything and start going to work on this right now for pay. This sounds wicked fun, but way bigger than I could ever dream to take on.


Yes, let's fight the sale of our browsing data to build profiles by... Giving away our browsing data to build profiles.


I was thinking the same thing, real sessions might be separable from fake.

This smells like an adversarial problem, where one AI makes fake traffic and one tries to detect it. Perhaps the fake session creation bot could mingle on the same sites as the real sessions to make it harder to distinguish.


Excuse the ignorance, but are we safe in Europe...? Are we only talking about American ISPs?

What if an American ISP had a branch in Europe - could they sell our data, or must the history have been generated on American soil, or something like that?


How is this supposed to work? If I click `Start` it loads a page, then nothing happens. I have to click `Start` repeatedly to make it visit new pages, even though it says it is `Running`.


I believe random noise can be filtered out easily with AI. To be effective, this is going to be an AI vs. AI arms race, much like AV vs. malware.


No need to bring a grenade to a fist fight.

If the overall history data is aggregated you won't really need to filter out the noise. Random browsing will just disappear under the threshold of noise and the 'real' browsing will stand out.


How about a plugin that routes any request to an ad server through Tor?

That would really screw up things for any location tracking.


Is this plugin available for Firefox?


What exactly is the point? If the FBI suspects you in a case of the international heist of carrots and your search history includes "concealed carrot transportation" and "circumventing carrot museum security", it doesn't matter what else you googled. It only matters that this is included in it.

Edit: it was a 24-carrot job.


I don't think it's about hiding from FBI inquiries but obscuring the value of data that ISPs, Google, Facebook, et al are collecting and selling without your consent.


Doesn't seem like the best idea when it comes to environmental sustainability.


What does one have to do with the other?


Bandwidth has a physical cost


I assume it's de minimis, but do you have a reason to think otherwise?


Very marginal effect, especially if you factor in the impact of Bitcoin and the other crypto currencies.


I had an idea a few years ago to make a chrome plugin that would send encrypted emails (containing a randomly generated message) to all sorts of Muslim country email addresses, helping to reduce the fruitfulness of encryption circumvention and surveillance.


Finally a DDOS solution for the Chrome browser


Radar detector detectors... they are eventually going to happen here. Sadly.


This needs to be open source. This plugin could be very dangerous to install.


Traffic filters can easily remove whatever noise a plugin attempts to make.


Seems like a nice idea.


Turns out it's only for Chrome, and the front page doesn't even mention this. Could we at least change the HN title from "A browser plugin..." to "A Chrome plugin..."?


It's hosted on Chrome Web Store, but otherwise it could very well work just fine in Edge, Opera and even possibly Firefox. They're all converging toward supporting WebExtensions, a mostly-Chrome compatible API.


Could? or does?


Didn't try. It will almost definitely work on Opera, since Opera is based on Chrome and there are ways to install straight from the Chrome Web Store. If the author released a direct download, it'd be easy to test in Firefox and Edge.


OK, we've made that update to the submission title.


@stevespang they don't force you to. There's a button labeled "get the plugin" that takes you to the Chrome plugin store. Then below that there's a form labelled "Connect" which allows you to subscribe to their mailing list.


It's a nice marketing stunt -- not meant to be pejorative; I hope it inspires more innovation in this space. But I find it ironic that a plugin that purports to make it harder to track you has Google Analytics.


Dogfooding, I guess :D.


inb4 it gets removed from Chrome Web Store!


"creates meaningless web tracking data"

or

creates a massive botnet which will eventually be used for some nefarious PPC scam or worse

you decide ...



