It seems like the amount it will hurt Google is directly proportional to the amount it will hurt the owner of the site (assuming they want people to read their message).
I'm sure someone at Google is pretty happy that they don't have to show this page in their search results. Nobody can accuse them of bias against anti-Google pages -- the site owner did it to themselves.
Seems like as perfect an example of "cutting off nose to spite face" as I can imagine. (ok, refusing the vax and dying of COVID to get back at the left might be a better example, but this one is close)
The way to create real change is to come up with viable alternatives. I use DDG as my primary search engine but still find myself using Google on a daily basis using "!g".
Rarely do principles alone keep people using inferior products or technologies.
Did you just compare boycotting Google with not getting vaccinated? I used to vote left, but reading people like you argue like this reminds me never to do so again.
Interesting. That just means I can make you vote any way I want by making the dumbest argument for the other side. You advertise a strategy that guarantees that you will always cast a manipulated vote. Entertaining.
You seem to imply that not voting for the left automatically means voting for the right. May I remind you that there is another option you probably forgot about? Did you ever hear about the concept of abstaining from voting?
That is also a desired outcome. If I desire a green outcome, I only have to make blues stop voting. In multi-party system if I can make reds and yellows also stop voting it’s great. Fortunately, they will all comply even if I tell them I’m doing this. In fact, they (and you) will particularly comply when told for fear of the shame of backing down.
This pride is particularly exploitable and is why you can orchestrate all sorts of outcomes among demographics that have this vulnerability.
In a two-party system (the de facto result of a single-member-district plurality system), failure to vote for a candidate is mathematically identical to a vote for the opposing party.
That's nonsense. By that logic both sides can claim "they didn't vote FOR us, so essentially they voted for the other side", which means that by not voting you voted for both.
Oh hey, I thought I was the only one. lucb1e.com and another site are also not indexed, though in my case I block Google based on the user agent string. That way it doesn't get page data or non-HTML files from my server. I introduced this when they were pulling the AMP thing: https://lucb1e.com/?p=post&id=130 It doesn't impact me personally, but it impacts other people on the internet, and I figured it was the only thing I could do to try and diversify this market (since I myself had already switched to another search engine).
There are zero other restrictions on my site. Use any search engine other than google. Or don't, up to you.
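For anyone who wants to do the same, the user-agent check only takes a few lines. A minimal sketch, assuming a Flask app (my actual setup differs, so treat the names as illustrative):

```python
# Minimal sketch: refuse to serve anything to requests whose User-Agent
# identifies a Google crawler. Assumes Flask; adapt to your own stack.
from flask import Flask, request, abort

app = Flask(__name__)

# Substrings Google documents for its crawlers; extend the list as needed.
BLOCKED_UA_SUBSTRINGS = ("Googlebot", "AdsBot-Google", "Google-InspectionTool")

@app.before_request
def block_google_crawlers():
    ua = request.headers.get("User-Agent", "")
    if any(marker in ua for marker in BLOCKED_UA_SUBSTRINGS):
        abort(403)  # no page data, no non-HTML files, nothing
```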
That's a good idea, but Google sometimes crawls without the Google user agent, so it's not going to be 100 percent foolproof.
You’d be better off just blocking all of the ip addresses that google crawls from. There are lists of those out there.
When I used to cloak website content and only serve certain content to Google, the only reliable way was IP cloaking, because Google also crawls through "partners", such as Comcast IPs.
So if you really want to get your site out of the index, serve up the page with a noindex tag, or noindex in the response header, based on Google IP addresses.
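Roughly like this, as a sketch (Flask is just for illustration, and the range below is a placeholder you'd replace with whatever published list of Google crawler IPs you trust):

```python
# Sketch: add a noindex header only for requests coming from Google crawl ranges.
# The range below is a placeholder; populate it from a current, trusted list.
import ipaddress
from flask import Flask, request

app = Flask(__name__)

GOOGLE_RANGES = [ipaddress.ip_network(r) for r in (
    "66.249.64.0/19",   # commonly seen Googlebot range; verify before relying on it
)]

def is_google_ip(addr: str) -> bool:
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in GOOGLE_RANGES)

@app.after_request
def noindex_for_google(response):
    if request.remote_addr and is_google_ip(request.remote_addr):
        response.headers["X-Robots-Tag"] = "noindex"
    return response
```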
We don't use our hardware located on partner networks to do indexing. Those machines are searching for malware and serving some YouTube videos and Play Store downloads.
Hi, author here. Google stopped supporting robots.txt [edit: as a way to fully remove your site] a few years ago, so these meta tags are now the recommended way of keeping their crawler at bay: https://developers.google.com/search/blog/2019/07/a-note-on-...
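For illustration, the two supported forms look roughly like this; the Flask handler is just a stand-in for whatever actually serves your pages:

```python
# Sketch of the two noindex forms: a robots meta tag in the HTML, and the
# equivalent X-Robots-Tag response header (useful for non-HTML files too).
from flask import Flask, make_response

app = Flask(__name__)

PAGE = """<!doctype html>
<html>
  <head><meta name="robots" content="noindex"></head>
  <body>This page asks search engines not to index it.</body>
</html>"""

@app.route("/")
def home():
    resp = make_response(PAGE)
    resp.headers["X-Robots-Tag"] = "noindex"  # header form of the same directive
    return resp
```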
To be clear, stopped supporting robots.txt noindex a few years ago.
Combined with the fact that Google might list your site based only on third-party links, robots.txt isn't an effective way to remove your site from Google's results.
"If other pages point to your page with descriptive text, Google could still index the URL without visiting the page. If you want to block your page from search results, use another method such as password protection or noindex."
>noindex in robots meta tags: Supported both in the HTTP response headers and in HTML, the noindex directive is the most effective way to remove URLs from the index when crawling is allowed.
“ For those of you who relied on the noindex indexing directive in the robots.txt file, which controls crawling, there are a number of alternative options:”
The first option is the meta tag. It does mention an alternative directive for robots.txt, however.
> You can prevent a page from appearing in Google Search by including a noindex meta tag in the page's HTML code, or by returning a noindex header in the HTTP response.
I mean technically that says that your site won't appear in search results, not that your site won't be used to profile people, determine other site ratings based on your site's content etc.
they won't show your site's content, but that doesn't mean they won't use your site's content.
I thought that (i.e. removing the site from google search) was the goal.
I'd review the other usage on a case by case basis; e.g. determining ratings of other sites seems fair use to me. I'd guess you're allowing others to use your site's content when you're making your site public (TINLA).
Google will still index the URL even if you block them from crawling the page via robots.txt. The URL gets indexed anyway, and it can still rank well; Google just puts up a message in the results saying they're not allowed to crawl the page.
The way to check googlebot (in a way that will be resistant to expansion of Googlebot's IP ranges in the future) is to perform hostname lookup, with dns lookup as well to verify that the rDNS isn't a lie: https://developers.google.com/search/docs/advanced/crawling/...
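A minimal sketch of that double lookup (the hostname suffixes follow Google's documentation; everything else is illustrative):

```python
# Sketch: verify a claimed Googlebot by reverse DNS, then confirm the
# forward lookup of that hostname points back at the same IP.
import socket

def is_verified_googlebot(client_ip: str) -> bool:
    try:
        hostname, _, _ = socket.gethostbyaddr(client_ip)     # reverse lookup
    except socket.herror:
        return False
    if not hostname.endswith((".googlebot.com", ".google.com")):
        return False
    try:
        forward_ips = socket.gethostbyname_ex(hostname)[2]   # forward lookup
    except socket.gaierror:
        return False
    return client_ip in forward_ips  # rDNS record wasn't spoofed
```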
Not a very hard problem; after all, many websites allow full access to Googlebot IP ranges yet show a paywall to everyone else (including competing search engines).
I also happen to ban Google ranges on multiple less-public sites, especially since they completely ignore robots.txt and crawl-delay.
Is that how archive.vn works? I've always wondered how they are able to get the full text of paywalled sites like Wall Street Journal who give 0 free articles per month.
Alternatively just use EFF's privacy badger and duckduckgo to stop feeding the beast?
Those are active steps you can take - I am not convinced a few meta tags will stop Google spidering your site (even if it is invisible in results), and it is of questionable value if you are still using Google search and not blocking their scripts.
A more interesting issue is the opposite -- many large sites have robots.txt rules that Disallow all crawlers except Google. A new search engine can either respect robots.txt 100%, with the result that some major properties are completely unavailable in its index; ignore robots.txt in these special cases where the configuration is unreasonable; or crawl anything that Google is allowed to crawl. None of these options are great.
I once read it's about bandwidth: serving crawlers costs a lot of it, and some sites don't want that cost. They make an exception for Google because they want to be on Google.
Replying to Google bots with errors will keep you out of the index, though they may keep retrying forever in case the page starts working again. A lot of times when I am retiring a site I will look at the traffic logs and it will all be Google or Bing requesting content that is gone, sometimes years after it was taken down. I think it is just greed on their part, forever hoping that content that was there before might appear again. No telling how much bandwidth gets consumed by that sort of traffic each year.

If you are terminating traffic directly, it can be really interesting to see all the connections that get opened but never carry an HTTP request before being closed. A lot of that seems to be broken browser plugins, people scanning for live ports, or clients just checking what certificate is offered in the TLS handshake.
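If the point is to say the content is gone for good, a 410 is one common choice over a plain 404, although nothing forces them to stop retrying. A rough sketch with made-up paths:

```python
# Sketch: answer requests for retired content with 410 Gone instead of 404,
# as a stronger "permanently removed" hint to crawlers. Paths are made up.
from flask import Flask, abort

app = Flask(__name__)

RETIRED_PREFIXES = ("/old-blog/", "/photos-2014/")  # illustrative only

@app.route("/<path:page>")
def serve(page):
    if any(("/" + page).startswith(prefix) for prefix in RETIRED_PREFIXES):
        abort(410)  # Gone: the content was removed deliberately
    return "content for " + page
```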
The first sentence grabbed my attention, and I was looking forward to learning about the "threat that surveillance capitalism poses to democracy and human autonomy". But then the article fell flat: he gave no examples of that threat, and neither did the linked article in The Guardian.
Are there specific examples of this type of harm? The only complaints that he made were that Google makes a lot of money (which I have no problem with), and that Google's conduct feels "creepy" to him (which is merely an emotional reaction).
He did hint at Google "modifying your off-screen behavior", and I was eager to learn about that as well... but then he left that unexplored too, and gave no follow-up or examples of that intriguing scenario.
Thanks, ok I just looked at that one too. Most of the examples on that page don't pertain to Google, and the ones that do pertain to Google are not harmful (e.g. targeted ads, or someone's emotional reaction to data collection).
Since his ire is specifically directed at Google, I'm still left wondering what specific harm he is envisioning from Google's activities.
I use simple HTTP auth with an easy username and password on most of my sites. It is rarely a problem for anyone I invite, except perhaps Instagram's in-app browser, and it keeps crawler traffic out entirely.
For a "general" audience I would use a proof-of-work puzzle that takes about 10 seconds plus a basic-question captcha with human review.
But for a site which is primarily for tech-savvy people, with an emphasis on retro-compatibility (HTTP auth is supported almost universally, even by Mosaic), I can't think of a better option. I'm not that interested in SEO, since the software is my main target.
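For completeness, a Basic-auth gate needs nothing beyond the standard library; the credentials and port below are placeholders:

```python
# Sketch: HTTP Basic auth in front of a small static site using only the
# standard library. Credentials and port are placeholders.
import base64
from http.server import HTTPServer, SimpleHTTPRequestHandler

EXPECTED = "Basic " + base64.b64encode(b"guest:letmein").decode()

class AuthHandler(SimpleHTTPRequestHandler):
    def do_GET(self):
        if self.headers.get("Authorization") != EXPECTED:
            self.send_response(401)
            self.send_header("WWW-Authenticate", 'Basic realm="invite only"')
            self.end_headers()
            return
        super().do_GET()  # serve files from the current directory as usual

if __name__ == "__main__":
    HTTPServer(("", 8000), AuthHandler).serve_forever()
```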
I think OP means they don't mind sharing their information with the search engine (be it Google, another engine that provides better results, or even a better Google in terms of results), _as long as_ OP has control over exactly what is being shared.
As an aside, I do see the trend for some companies to provide this control nowadays. Even Google is doing it (e.g. you can auto delete your information, or turn them off completely): https://myaccount.google.com/data-and-privacy
Of course, whether or not you believe Google is doing what you have configured in the backend is another question... and there is nothing anyone can do to actually make you believe it short of giving you complete access to the entire Google backend. Or is there a way to verify without exposing? Maybe an interesting research topic...
I don’t think Google taking your information and sharing it with advertisers is a great sin. Somewhat annoying but nothing particularly harmful.
I do think the search results are easily manipulated and it can be frustrating trying to find relevant information. Like most people I end up defaulting to Reddit for search queries just to find something that isn’t a blog by someone shilling their product.
But I understand the invasion of privacy would irritate some people, and maybe in the long term it would be a net negative. So if there were a search engine that explicitly asked for certain information and gave you the option to share it, that would probably go a long way.
Only if there is relevant traffic from Google to begin with, which is highly unlikely for a site like this. A high percentage of results in almost every Google search comes from the closed circle of the same top 10,000 sites or so.
This is the beauty of a protest like this: this site does have valuable content, and if enough sites like it joined the protest, it could actually hurt the relevancy of the Google index, because by the time Google figures out the content is valuable, it would no longer be allowed to index it.
Sorry, my statement was both generalized and specific at the same time, and that did not turn out well. How many visits does your blog get daily? And what would it take for you to remove your site from Google's index?
> I don't see why I would want to exclude my site from any index? Being in indexes helps people find my writing, which I like!
It's essentially a form of boycott. If one believes Google is a problematic entity (too many fingers in too many aspects of our lives), it's a way to sever connections with them at some personal cost.
At least, if you care about search traffic - one might argue the assumption that Google-like search is the default way to navigate the web is one worth reconsidering and encouraging alternatives to anyway.
So, first, I don't think Google search is harmful, so I'm not especially interested in boycotting, but I'm happy to grant this for the discussion.
There's already an interesting question about when boycotting is a worthwhile tactic, but in this situation there's the additional complexity that there are two ends at which one could boycott a search engine:
* Producer: don't allow the search engine to include your stuff
* Consumer: use a different search engine
This is not the only place you find this dynamic. For example, if I thought Chrome was harmful, I could choose:
* Producer: make my site incompatible with Chrome and suggest people switch
* Consumer: use a different browser
Or, with email:
* Producer: don't email people with @gmail.com addresses
* Consumer: use a different email provider
Thinking through these cases and similar ones, if you think Gmail / Chrome / Search is harmful then the "consumer" side makes sense: the alternatives are nearly as good so you're not giving up much, and you're helping increase diversity. On the other hand, the "producer" side ones are much less attractive, because they're a much larger sacrifice and the benefit doesn't seem that big.
(Disclosure: I work at Google, but not on Search, Chrome, or Gmail)
> (Disclosure: I work at Google, but not on Search, Chrome, or Gmail)
I appreciate that you mention in your bio that you work in Ads at Google, which seems directly relevant to OP's point about boycotting Google by blocking indexing. If Google can't or otherwise doesn't index your content, Google can't profit from selling the ads it would otherwise show alongside search results for it. If de-indexing from Google became popular among a group of content creators, other search engines might not be similarly blocked, and other ways to find that content would be found or created, all to the detriment of Google Ads placement, which is a profitable - and inseparable - component of Google Search.
I don't work on that kind of ad: I work in display ads. If you go to a newspaper or other publisher and see ads alongside the content, there's a good chance that my team owns the JS that handles requesting those ads and putting the responses on the screen.
The producer side would be the much more effective option if they could get a significant fraction of people to do the same. That might not happen now, but if enough people get fed up with Google maybe it will.
If that's your goal. Personally I host content that people can use or not. I'll link friends if I want them to see it. Visitors don't cost me anything, it doesn't really bring me anything (other than ego?) to have visitors either. Hence I saw fit to also block google (two years ago already apparently, I thought it was much more recent) and it didn't negatively impact my site in any way.
That depends on where your traffic originates. Back when I tracked visitors on my site, I found I got very little from search results; most of it (>95%) came from links on social media and GitHub. On a blog that's heavily about privacy, I wouldn't expect much to come from Google.
Also, so what if the numbers go down? If your reason for writing a blog is to see a number on a screen, then what does that actually give you?
Wikipedia should do this, since Apple and Google are showing Wikipedia results as their own, robbing Wikipedia, IMO, of importance. Wikipedia is large enough that it should have its own search engine, likely with more relevant results.
It does not help Wikipedia to do that. The content on Wikipedia is licensed so that Apple and Google can show the content from Wikipedia (and this is by design, not a loophole). If the users can get access to the encyclopedic content more conveniently, that is still in line with the project's goals, even if that content reaches the user indirectly via a third party.
It's a similar situation to when Facebook was asked not to show previews of news articles because of the ePrivacy directive. Could this fall under the same legislation?
Removing website from Google search is the least of worries. Every meaningful aspect of your life is now being monitored by corporations and governments. It is too fucking late. That fucked up social scoring system being used in China to oppress people is coming here. Only instead of government doing it directly It will be mostly performed by corporations to keep the appearance of "free" society. Corps will collect your data, assign you a rank and act accordingly.
> …the essence of the exploitation here is the rendering of our lives as behavioral data for the sake of other’s improved control of us.
I don't doubt that surveillance capitalism is a problem. I do doubt that the underlying motive here on the part of capitalist concerns is improved control of us. Capitalism wants profit to move towards capital. Improved control might be a way of doing that, but it's only one tool in the arsenal of capital, and it's a fragile one at best.
It seems to me that the main motive of capitalism is to continue to drive high levels of desire-driven spending, and that it is at least as likely that surveillance capitalism is mostly about understanding how best to do that as it about actual control.
Yes, we know that marketing & advertising have demonstrated over the course of at least a century that consumer control is possible and desirable. It's just not clear that capitalism needs to increase or improve the level of control over what's already established.
Depends on how control is defined. Ultimately they want to steer your purchasing behavior, by showing you super focused ads on even more stuff that you would like to buy. That would be some kind of control.
I don't really understand why you would assume that the only alternative to surveillance capitalism is "surveillance socialism".
This sophism seems to be built on two errors:
* that there is nothing outside of pure capitalism and pure socialism
* that adjoining "surveillance" to capitalism means we're talking about an inevitable aspect of our society that is combined to capitalism, rather than a specific subset of the way business is done in this age.
To be honest, this inability to conceive of alternative social systems is concerning. The mangled Churchill quote comes as a cherry on top.
Of course there are an infinite number of alternatives...I just quickly picked something that sounded decent that tried to make my point about my observation lately that Capitalism was getting knocked around everywhere I looked.
In the last Econtalk podcast that I listened to last night, they discussed the loneliness "epidemic" and the author ended up blaming Thatcher-based Capitalism as perhaps the main reason why people are so lonely today!
I thought that was quite the stretch but imagine my chagrin when here was another spurious back-handed attack on it.
> Capitalism [is] getting knocked around everywhere I looked.
That's what happened with every social system in the past, and, while failures certainly have happened, we've always found ways for that criticism to result in improvements to our societies.
It would be very surprising for the current shape of our society or, more generally, capitalism to be an exception to the rule, unless you subscribe to the "end of history" thesis.
No need to believe Fukuyama. The exception-to-the-rule concept can be based on the idea that capitalism is novel in its ability to absorb and transform any protest against it. The divine right of kings, to pick just one example, was unable to pull off this trick, and perished under questioning. So far, capitalism has done remarkably well at incorporating protest and criticism that targets it, in a way that does seem quite novel.
Presumably the "capitalism" in "surveillance capitalism" is to make it clear they're talking about private companies - as distinct from the traditional concerns about government surveillance.
I can’t tell if you’re being willfully obtuse or not. Harvesting people’s data for the express purpose of manipulating them into thinking/buying things that they otherwise wouldn’t is wrong in every sense of the word.
Capitalism has its problems just like everything else. Pretending it doesn’t is just as disingenuous as pretending that socialism would fix everything. If you’re concerned about people attacking capitalism, help fix the problems. Simple as that.
I'm interested in why this comment is getting knocked down so hard.
I asked what I think is a valid question and would like to hear honest reactions from people about my observation.
I feel I have every right to be a cheerleader for Capitalism, as my father escaped communist Cuba in 1959, as Castro was coming to power, and used the US's system and tons of hard work to create an extremely comfortable life for himself, while friends and family members who stayed there lived rather wretched lives.
He never forgot how lucky he was to be able to get out of there just in time and told me time and time again that the US while having flaws, was by far the best place in the world to live, so my original comment comes from this background.
I don't give a flying fuck if the comment gets modded down, but I would like to know just what in it is so offensive to those modding it down, so I can learn something.
I block all Google trash from my servers and the companies I work for; we have a separate domain for all search and a separate service for websites that can be indexed by Google.
I purposely block all their IPs from my servers, plus AWS and Azure as well, and then whitelist any service we want from those providers individually and carefully.
Google spies on everything online; they are the largest collector of exposed data, and if they weren't so big they'd be in jail for theft, imo.