Hacker News
Fogging your Google search history with Python and Reddit (howlroundmusic.org)
128 points by marcosvpj on Dec 8, 2016 | 50 comments



Reminds me a bit of Marlinspike's Google Sharing: http://www.techrepublic.com/blog/it-security/googlesharing-a...

The concept of hurling garbage into the all-seeing-eye is a good one and I hope it gains sustained popularity.


Posit that the government wants to round up all the pressure cooker searchers. Now, instead of one searcher, they have a conspiracy of a dozen pressure cooker searchers. Are they going to just shrug their shoulders and say oh well?

The idea that it's possible to overflow the government's secret lists is founded on the idea that the government cares if a list gets too long. I'm not sure that's the case.


We also have to consider the (near future?) implications of machine learning on sifting through the "noise" that is generated with tools like this. In fact, by knowing that one is using such a tool, one could envision a scenario where this data adds more information about the user and not less - in particular about everything that one is not actually searching for or what OS the user is actually using. Consider that just clicking on a random search link with a browser automation engine probably doesn't look like a real user browsing without considerable effort on the developer's part. Here we arrive at counter ML :)


>The idea that it's possible to overflow the government's secret lists is founded on the idea that the government cares if a list gets too long. I'm not sure that's the case.

It's not just about governments having extensive and precise information on us, it's anyone who might use it (i.e. hackers, insiders) for nefarious purposes.

Whether or not they "care" if the data is superfluous is unimportant, the goal is to make reliably narrowing down a target more time consuming.


Why posit this? It's but one of many possible considerations. My activities have nothing to do with "pressure cookers", nor am I exclusively concerned with "secret lists".


>> The concept of hurling garbage into the all-seeing-eye is a good one and I hope it gains sustained popularity.

So this is a genuine question, because I don't get it. How is that good? What does it accomplish exactly that makes someone safer or more private on any realistic level? I understand the point at a high level, but what does that actually do for a person on a practical level? Like... "I hurl all this garbage at Google and now I'm better off because...." I can't complete that sentence in a meaningful way and I'd like to be able to. I'd like to believe this really makes some meaningful difference, so I'm hoping I'm missing something. (I'm not thinking in terms of "if everyone did it" but rather "If I do this")


Speaking as a data engineer: If your personal data is junk, it limits my ability to derive insights from it, and leverage it. I can't speak in terms of ad-tech/user data selling, I'm in an entirely different space, but if it prevents them from being able to both derive value from and further proliferate my identity as a datapoint, that's both a personal and a commons good from where I'm standing.


I work in a space that's orthogonal to ad retargeting, and from what I can tell after talking to people at the big data brokers, reading their white papers, and reviewing their actual data, I've concluded that their ability to retarget is excellent but they otherwise have a really poor understanding of the targets' characteristics/demographics.


For me, I imagine a scenario like the movie GATTACA where in a future world an employer can base their entire hiring decision on what your genome says about you (theoretically, a lot). Imagine a world where business giants judge you by correlating huge amounts of data about you. "You shop for expensive purses and small dog toys, therefore statistically your personality is x, y, z"


I don't know for sure, but if you try to get a job at certain places where they require you to get clearance, THEY WILL read your search history, emails, etc. Information collected via online surveillance tells much more about you than your genes. Even from a genetic standpoint, there is a thing called epigenetics. Environment affects about half the traits of who you are, so knowing what you eat, who you meet, where you go and so on would actually be quite revealing.



>I hurl all this garbage at Google and now I'm better off because....

I'm a more obscure/difficult target for hackers and social engineers.

There is no single IT security solution, and there is no 100% security. But the more layers you have against the many different attack vectors the more secure your identity is. This tool is but one layer.

If you left footprints everywhere you went in meatspace, criminals could use that information to own you.

This is a tool which obfuscates the footprints you leave on the web.


I'll try to explain my own reasoning. While "good" may not be the epitome of eloquence on my part, it could in this case be viewed similarly to the subjective manner in which one might say "That music sounds good", or a proposal that "sounds good" because it achieves something I perceive as in my favor. As a privacy advocate, I resent the collection and exploitation of my personal activities and communications, regardless of TOS, etc. I also don't view people as units of profit or as something to exploit to my advantage through non-cooperative means. I find it very strange that so many are willing to accept from strangers, agencies and corporations what they would aggressively protest from an individual. For myself, the unauthorized collection and use of what I prefer to remain private is, regardless of legalese, intrusive and bothersome. The difficulties involved in avoiding this bother are formidable, and in some ways overwhelming. If I can passively convolute the acquisition of this data, it is "good" because it hinders something hostile to my sense of well being. I should clarify that I don't believe that such data is used exclusively for marketing.

Regarding the sentence you found difficult to complete, I'll try:

"I hurl all this garbage at Google and now I'm better off because...." I am impeding something that I protest. I have no reason to believe that such collected data will be used responsibly, nor am I convinced that if it was, it would be innocuous. Just as I don't want my every thought read aloud, neither do I want my every curiosity, concern or inquiry. To the best of my observations, we are not living in a utopia where altruism and goodwill take precedence over profit and personal interests. Our society scarcely has affordable healthcare. Our society is still a brutal one, where such brutality is often administered based not on one's character, but on their finances. America has a thriving private prison industry with a startling portion of inmates incarcerated for victimless crimes. America freely engages in highly dubious wars that cost many innocent lives. The world is still a work in progress. I believe we all have a lot of work to do on our selves as individuals and a species, and before making careers out of furtively siphoning personal data from our fellow citizens for purposes of exploitation, we ought look in the mirror a bit longer. I don't think that's Google's (nor their partners'/clients') plan, despite the ostensibly "good" things they do. I do not believe humanity has evolved through the virtue of smartphones and petroleum to suddenly within one generation have permanently excoriated the dictators and tyrants of the world from itself. Whether I am in error or not, this data is connected to that which is personal to me, therefore if I can confuse it, it is good.


Well, in fact, if you check a section of your Google profile (Ads Settings) [0] you can turn off Ads based on your interests, which will result in:

  You will still see ads and they may be based on your general location (such as city or state)
  Ads will not be based on data Google has associated with your Google Account, and so may be less relevant
  You will no longer be able to edit your interests
  All the advertising interests associated with your Google Account will be deleted
If you decide to believe them (as you are by using all their other services), that should be enough :)

[0] - https://www.google.com/settings/u/0/ads/authenticated


Note that they never state they aren't still caching and selling your data. They just aren't showing you ads based on it.


Google isn't selling your data. They'd never do that. They make nearly all their money with advertising because they have so much good data to target ads. If they sold this data they'd lose their competitive advantage.

It doesn't matter if Google is good or evil, selling user data to other companies would be a very bad business decision in both cases.


Good point. But selling it or using it in-house, they're still mining it for ad research.


Very useful, but not entirely the same as what the author is demonstrating. What you referenced is just for ads; what he's trying to fog is the order of the actual SERP.


Obligatory Apple Doppelgänger patent link: http://patft.uspto.gov/netacgi/nph-Parser?Sect1=PTO1&Sect2=H...


I think it's pointless.

a) You'd have to keep this up and running forever.

b) You are essentially trying to game a machine learning algorithm, which weights and factors in so many signals, it's impossible to know. If you think you can game this system, SEO should be child's play for you.

c) Google probably has a profile of your personality already (such as the OCEAN model), because it can be derived from few signals such as a bunch of Facebook likes or whatever you type in the search slot. Because personality doesn't really change, your profile is set in stone for the future.

The only slight upside of this is: Google has to use this data about you, reach out to you or sell it to other people who reach out to you. To some extent, an ad blocker goes a long way, because then people can't target you specifically, despite having a lot of information about you. But ultimately, you never know what kind of "content" is brought to you based on your personality and actions. Maybe it'd even be sane to assume that personal interaction with strangers could be motivated by means you don't understand.


a) The requirement of running it forever doesn't make it pointless.

b) 1. Not knowing the outcome doesn't mean it's not worth trying. 2. It's not entirely true - SEO is about getting your phrase as close to the initial position as possible in an ecosystem with many contending players, whereas this is about introducing noise in your personal data.

c) I beg to differ. When the signals change, why shouldn't the profile change?


Google can easily tell how much time you spend on a page, or when you click links, and if they wanted, even how your mouse moves. If the idea is to hide this information from Google itself, it seems pointless; just don't use Google.


Run a network monitor and visit just about any webpage; you'll see Google IPs and a lot of them. Not using Google is reasonable advice and one could start by at least not having an active account, but I'm not sure how effective that would be. Convolution still seems worth the effort, but perhaps ideally, through more sophisticated methods. Some of the metrics you mention could be randomized.


You can block those with an ad blocker, or at your router / firewall if you are really concerned with them. I know that is not something for a normal person, but neither is this search fogging, so seems like there might be better ways to do it.

Not having an account would likely not help at all. All those companies have profiles about you even without an account. You might get lucky and start out as a few separate profiles, but they will eventually link them together.


Yeah, but most people going to this level already block Analytics, AdWords, etc.


A few years ago I devoted some time to screwing with Amazon's recommendation engine. The results were interesting at first. They didn't seem to have much of an idea of the kinds of things I actually wanted to see. After a while I was just presented with dildos as recommendations; perhaps they saw through my schemes. I then stopped the campaign and things returned to "normal".


IIRC Google suspended the accounts of people who did something similar (automated random searches). I think it's against their Terms of Service.


Someone's account was suspended after visiting ruinmysearchhistory, but it was more of an isolated case:

https://news.ycombinator.com/item?id=11880008



Google probably already knows our typing pattern and typing speed. They could probably detect whether you automated it or are really typing. It won't be hard for them. "Google knows everything!"


Just because they have that information doesn't mean they can determine what they need or do it in a scalable and computationally feasible manner.


I agree. I just gave them an idea, in case people are trying to fog their advertising data.


Tweak the automated driver to match your usage patterns, then.
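
A rough sketch of what that tweak could look like: instead of sending a whole query at once, draw a pause per character from a distribution that roughly resembles human typing. The function name and the `mean`, `jitter` and 0.03 s floor values are made up for illustration; you would have to measure your own typing to pick realistic numbers, and nothing here guarantees the result is indistinguishable from a real user.

```python
import random


def keystroke_delays(text, mean=0.18, jitter=0.06, rng=random):
    """Return one pause (in seconds) per character of `text`.

    Pauses come from a normal distribution and are clamped to a small
    minimum so they never go negative. The defaults are guesses, not
    measurements.
    """
    return [max(0.03, rng.gauss(mean, jitter)) for _ in text]


# Example: pauses for one query, using a seeded generator for repeatability.
delays = keystroke_delays("pressure cooker", rng=random.Random(1))
```

The driver would then sleep for each pause before sending the matching character, rather than pasting the query in one go.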


Whilst I don't disagree with increasing privacy, as a general solution for everybody into the future this method seems a bit brute force and possibly disruptive. If you only use Google search, and don't use other Google services, aren't you simply an IP address to Google? If so, I don't see the harm in the knowing of search trends by geolocation.

At the end of the day, we all have a price for convenience - I knowingly use Facebook even though I hate the loss of privacy.


One issue is just that unless you work for Google (and maybe even if you do) -- you can't know exactly what kind of data structure(s) you are to Google (or any other service provider, large and small). You could be just an IP address, or an IP address and a User-agent or all of the above and then some other crazy stuff, like a browser canvas profile and operating system and age/race/income/sex... no one on the outside of a black box knows for sure exactly how it operates.


Does this really make sense to do? Unless you form all your google searches as questions, it will be easy to separate real from ELI5. None of my history looks anything like that, so would be completely easy to tell the difference.

Although I would be interested to see what a clean account doing this would look like to google. What kind of ads do they get served?
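
To show how little effort the separation in my first paragraph might take, here is a deliberately crude sketch. It assumes the injected queries are phrased as questions and the real ones mostly are not; the word list and the sample history are invented for the example, and a real system would obviously use far more signals than this.

```python
# Words that commonly open a question; an arbitrary list for this sketch.
QUESTION_WORDS = {"why", "how", "what", "when", "where", "who",
                  "is", "are", "do", "does", "can"}


def looks_like_eli5(query):
    """Crude guess: flag queries that are phrased as questions."""
    q = query.strip().lower()
    if q.startswith("eli5") or q.endswith("?"):
        return True
    words = q.split()
    return bool(words) and words[0] in QUESTION_WORDS


# Invented sample history mixing keyword searches with question-style noise.
history = [
    "python selenium docs",
    "ELI5: Why is the sky blue?",
    "how do magnets work",
    "cheap flights lisbon",
]
real = [q for q in history if not looks_like_eli5(q)]
```

With this sample, `real` keeps only the two keyword-style searches, which is the whole problem: if the noise has a recognizable shape, it can be filtered out.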


As a simple rule, whenever I want to search something and minimize pollution to my web history, I open the other browser I never use and then go into incognito mode.

As a side note, this reminded me of the "Neo looks for Morpheus" scene (the one just before "Wake up Neo").


If I watch a Youtube video not logged in, in private browsing mode, in a different browser than where I am logged into Google, I end up seeing suggestions that are very much related to what I just watched when I use a logged in Youtube app on another device. And this is on a very shared Internet connection.


In my experience, YT suggests videos related to what I've been seeing with that "session", however I understand how this could be a YMMV situation.


I changed my default search to StartPage, which uses Google under the hood.

But sometimes I can't find what I want in StartPage, so I added an extra button on StartPage to take me to the same search in Google (TamperMonkey script).


Or you could just use DuckDuckGo and an ad blocker.


You need to block Javascript too, and not enable exceptions for Google services like NoScript does. Besides that, Google is probably part of multiple data-sharing alliances that allow Google to track you even when it's not tracking you. Loading "normal" (non-ad) images from a site already leaves your IP footprint on a site, which could share that info with other sites. How much sharing goes on is a mystery. It's possible data collection companies don't know just how much info they really have, but that'd be a good thing for the rest of us, IMO.


> If a story broke, I’d got to Fox news, CBS, Aljazeera, Politico– even fake news sites like The Onion or CNN.

Zing!

> Tkinter- I knew I’d have to enter my gmail username and password, and I didn’t want to hard code it knowing I was going to share my code.

For something like this that's intended to run forever or at an interval via cron or whatever, you're much better off pulling credentials from a config file or environment variables than requiring user interaction, especially via GUI.
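
A minimal sketch of the environment-variable route, so the script can run unattended and the shared code never contains the login. The variable names `FOG_GMAIL_USER` and `FOG_GMAIL_PASS` are hypothetical names I picked for this example; they are not from the article's code.

```python
import os


def load_credentials(env=os.environ):
    """Read the login from environment variables instead of a GUI prompt.

    FOG_GMAIL_USER and FOG_GMAIL_PASS are hypothetical names chosen for
    this sketch. Exits with a message if either one is missing.
    """
    try:
        return env["FOG_GMAIL_USER"], env["FOG_GMAIL_PASS"]
    except KeyError as missing:
        raise SystemExit(f"Missing environment variable: {missing}")
```

Under cron you would set both variables in the job's environment (or in a file only your user can read) and call `load_credentials()` once at startup.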



Or just use something like startpage.com.


This is pretty much what I did to game Bing's search rewards with multiple users but I used Yahoo Answers instead. Worked well actually


See the TrackMeNot Chrome extension which does the same thing (also supports multiple search engines). I don't know how it affects Google ads though.

https://chrome.google.com/webstore/detail/trackmenot/cgllkjm...


There's a similar extension for ads - https://adnauseam.io/

Works with uBlock Origin and sends a click to everything blocked.


That sounds amazing. And great for website owners, they'll get paid! Any experience with it? Don't want to lose my google account because of that.


I immediately thought of TrackMeNot. I wrote a similar but less complex program (https://github.com/torchhound/chaff). It originally used Google's Web Search API but that became deprecated and I switched to DuckDuckGo. I've been thinking about adding Faroo and Google Custom Search.



