How SEO Ruined the Internet (superhighway98.com)
479 points by midef on April 6, 2020 | 286 comments



It kind of ruined content too.

Why did it ruin content? You are not the only one searching for the answer to that question. Keep reading to learn why SEO ruined content.

Many people think that SEO ruined content. In this post, we are going to explain why SEO ruined content. When you finish reading this post, you will know why SEO ruined content.

In recent years we have observed a growth in the quantity of content created; unfortunately, as we are going to explain in a moment, it has been ruined by SEO.

Is SEO really the reason content was ruined?

Some people argue that SEO is not really the reason content was ruined. We will review all the reasons why SEO could really be ruining content.

Please, click "next" to learn why SEO could be ruining content.


I just searched "SEO ruined content" on DuckDuckGo and this Hacker News page is the first entry.

We are very lucky that, as far as we know, an accumulation of irony doesn't create black holes.

https://duckduckgo.com/?q=SEO+ruined+content&t=canonical&ia=...


HA! People collect significant paychecks for what you just did. I love this so much.


Currently page 2 on Google.

To be fair, "SEO ruined content" is a pretty specific search string and doesn't even show up in the results that out-rank this submission. This comment thread specifically talks about "SEO ruined content" and is, correctly, a strong candidate result.


>Currently page 2 on Google.

And now on first page


5th result.


It's the first one for me now.


Likewise.


Very specific indeed, no data on Ahrefs for this keyword.

Keyword Difficulty < 15

Ahrefs estimates that you'll need backlinks from ~17 websites to rank in the top 10 for this keyword.


First result on Google now. Well done!


Even above the article that we're discussing. Hilarious. And a very good demonstration of the problem.


Ok, now that we agree this is neat (I am impressed!), what’s the solution?

Google or not, what can help me discover the right recipe, the right content, the right answer to my coding error?

Search algorithms that keep “mutating” even randomly? Ranking system? What?


I fear it's incredibly hard to fight fake content with automated systems. You need the ability to understand the content to determine whether the content is actually meaningful. For that, you need far better intelligence than just some fancy text processing.

You need people, basically. If you want to automate it, you need some way to figure out what people think about that content. This has been tried by counting how many people link to it, or rate it highly, but those methods have also been gamed to death.
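The link-counting idea mentioned here is essentially PageRank, and it is easy to see why it is gameable. A toy power-iteration sketch in Python (the link graph, including the two-page "link farm", is entirely made up for illustration):

```python
# Toy PageRank by power iteration. Every page starts with equal rank;
# each round, pages pass a damped share of their rank to the pages they
# link to. The graph below is invented; "spam"/"spam2" form a link farm.
links = {
    "a": ["b", "c"],
    "b": ["c"],
    "c": ["a"],
    "spam": ["spam2"],
    "spam2": ["spam"],
}

def pagerank(links, damping=0.85, iterations=50):
    pages = list(links)
    rank = {p: 1.0 / len(pages) for p in pages}
    for _ in range(iterations):
        new = {p: (1 - damping) / len(pages) for p in pages}
        for page, outlinks in links.items():
            share = rank[page] / len(outlinks)
            for target in outlinks:
                new[target] += damping * share
        rank = new
    return rank

ranks = pagerank(links)
```

The farm pages keep recirculating rank to each other and end up out-ranking a legitimately-linked page, which is exactly the kind of gaming described above; real engines pile countermeasures on top of this.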


Awesome.


Neat!


Yep.

Say I want a recipe. A tried and true delicious recipe. Can I search and just find a recipe? Nope. Through the magic of SEO, I now have to scroll through 15 paragraphs of somebody's life story before being able to examine the time and ingredients.

How much time and energy was wasted on building "tag" systems? All those fun little term link clouds that sites used to have. I know I wasted time on it. I had something that would scan for words and their synonyms and tag articles, a rescan feature for tags that got added after the fact, and various other utilities.
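The kind of word-and-synonym tagger described above might have looked something like this minimal sketch (the synonym table is invented purely for illustration):

```python
# Minimal article tagger: scan words, map them (and hand-picked
# synonyms) to tags. The synonym table is invented for illustration.
SYNONYMS = {
    "recipe": "cooking", "baking": "cooking", "flour": "cooking",
    "python": "programming", "compiler": "programming",
}

def tag_article(text):
    tags = set()
    for word in text.lower().split():
        word = word.strip(".,!?;:\"'()")
        if word in SYNONYMS:
            tags.add(SYNONYMS[word])
    return sorted(tags)

tag_article("My favorite recipe uses flour and sugar.")  # -> ["cooking"]
```

A "rescan" feature for tags added after the fact is then just re-running `tag_article` over the stored articles with the updated table.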


Filler content is the cost of ad-supported content sites.

Pay for a Cook's Illustrated or NYT Cooking subscription and all these problems go away.


And if you want it all in one place, an entire compendium of tried and tested recipes, pick up a copy of one of Cook's Illustrated's New Best Recipe books, even an older one.

I have a copy from probably 5 years ago that seems to be out of print and can be had for $7 used. It's 900+ pages of well-tested recipes where each one has a narrative about how they tested and arrived at the final recipe.


The website is searchable and has videos. :-D

Cook's Illustrated is amazing; the rigor they go through testing everything, and adapting to available ingredients, makes for such incredible reliability!


I have never understood what the filler content on recipe pages is accomplishing though. It somehow makes the page do better in google? How?


Plain and simple, Google won't rank you without it. Google looks for additional filler when ranking. Otherwise you would see different kinds of sites.


Then we have to blame Google, not the authors.


Right, but is there any explanation for why a recipe page with several pages of filler content ranks better on Google though?

I assume (?) it is not google engineers tuning for this particular outcome on purpose...

Not all that plain and simple.


More space for ads.


Or you could buy a cookbook. There's a bunch of them out there. Some are good, some are bad, some are long-lived classics, some are deep dives into a particular cuisine by someone who knows it inside out.

You'll still have to suffer through a certain amount of SEO bullshit, you probably don't wanna just go to Amazon and try buying the best-selling cookbook in whatever cuisine you're interested in because that's got its own Amazon-specific SEO clogging things up, but...


My local libraries always have donated cookbooks for sale for 25 cents to a dollar. What is surprising is their number and variety.


Books? You mean those old-fashioned things made out of paper with black marks on them?

Most are black and white only and don't even have pictures.

Grandpa used to read that.


Yes! Books. Most cookbooks have a lot of full color pictures. They also lack ads, if they have a story to give context to the writer’s relationship with the cuisine it will be a few paragraphs introducing the whole book rather than at the head of every recipe. They do not have any DRM; once you buy one it is yours to keep, give, or loan as you will.

If one gets wet (not at all inconceivable for something used in the kitchen) you merely have to dry it out. It will be slightly wrinkly but it will work fine, unlike a wet phone/tablet/laptop. It’ll have the dish you’re interested in plus a whole lot more, you probably won’t want to make every single one of them but there’s probably gonna be a few that look worth trying.

It will not track you. It is easy to store where it is needed. If you want to make notes - maybe you loved this dish and hated that one, maybe you made a variant that your SO loved - it’s easy to make them with a pen or pencil, and have them visible without any extra interaction. It’s a pretty useful technology!


I find this particular example particularly gross. Recipes are no doubt a heavily searched category. Why does Google allow pure CPM-hacking garbage websites to win the top spots? Does it have to do with the Google ads from top to bottom?


> Why does Google allow pure CPM-hacking garbage websites to win the top spots?

Something bothered me about this question, and I think it's the way it frames Google's role as being a passive participant.

Google doesn't "allow" anything. Google writes the rules and picks the winners.

When you search for "chocolate chip cookie recipe," Google's search algorithm goes "Here's a nice webpage with Grandma Betty's life story and a paragraph about how to make chocolate chip cookies at the bottom. This is what you were looking for."

Recipe sites look like they do because Google forces them to look like that if they want Google to send them any search traffic.

Is there a different algorithm that would give more useful results? Is there a way to rank the sites on how well they present the information you were searching for? Is there a way to factor in whether a site has good recipes or terrible ones? I don't know, but I don't have a giant advertising money fountain and teams of very well paid engineers.

Like you hinted at, I think it's reasonable to suspect that Google has no incentive to fix this. They get their ad money either way, and they probably get more of it from worse sites. As long as it's good enough to keep people from switching to other search engines en masse, they're not losing anything.


There's also manual tweaking. It's why known scam phishing sites and DMCA takedowns don't "win".

It's not just some simple, disinterested mathematical orchestration without any human engagement or other layer on top.


A long time ago, having your site in a high-quality curated directory like DMOZ boosted your search rank a lot.

I was an editor for a few categories at DMOZ. Not only did I allow only good content into our categories, but I checked older approvals from time to time to see if they behaved. I had to delist some websites that thought they could trick us.


Seems like it’d be fairly straightforward to identify a post by genre and then weight for terseness.


Even worse when you do find the recipe it uses some crazy measuring system!

What is one "cup" of flour? It obviously depends on how fine the flour is. Use grams! Use ounces! But above all use some kind of real measurement!


> What is one "cup" of flour?

8 fl oz of flour spooned into the measuring cup or 4.5 oz. by weight.

I'm not at all trying to be snarky here. Like any other specialization, people who cook have organic terminology that is useful to the in-group but confusing to the out-group. A /24 isn't the same as a class C network, but we all know what is being conveyed.

And you'll get much more accurate measurements of small quantities by measuring by volume instead of weight, since kitchen scales aren't that accurate. Knowing that 0.25 tsp is about a gram for just about any granular thing will probably do better than your scale.

It also helps convey the significant figures in a recipe. Very few recipes have a tolerance of 1 g. Even finicky bread doesn't get that accurate, so it's ridiculous when some bloggers write something like 113 g when, without question, the recipe was originally formulated to be 4.0 oz and you just messed up the conveyed tolerances by blindly converting.
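Worked out in code, the figures from this thread (1 US cup of flour taken as 4.5 oz by weight; the avoirdupois ounce is 28.3495 g) show exactly where the over-precise 113 g comes from:

```python
# Flour conversions using the figures from the thread:
# 1 US cup of flour ~ 4.5 oz by weight; 1 oz = 28.3495 g.
GRAMS_PER_OZ = 28.3495

def cups_flour_to_grams(cups, oz_per_cup=4.5):
    return cups * oz_per_cup * GRAMS_PER_OZ

one_cup = cups_flour_to_grams(1)    # about 127.6 g
# Blindly converting a recipe's 4.0 oz yields the spuriously precise 113 g:
blind = round(4.0 * GRAMS_PER_OZ)   # 113
```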


I understand how to make bread, baker's percentages, etc. It's just that when it comes to flour, a cup is a terrible measurement, due to the way it packs and the different volumes involved in various varieties/types of flour.

Most of the time when I'm cooking and I see an American recipe I just google the conversion; a cup of milk is easy to deal with. Just a minor irritation most of the time, but for some things it matters.


Yeah, cups of flour annoy me to no end. The difference between a cup of packed down flour in the bag vs flour that's been sifted is something like 30% by my rough estimate.


Psst, dude, I recently found out there are conversion tables for Freedom Units!


Most Americans don't have a kitchen scale.


I know that different cultures are different, but I always think of Americans as being (kitchen) gadget-obsessed.

Maybe I've watched too many infomercials and soaps. I've reached a point where I know Americans don't have (electric) kettles, but a scale seems like a necessity for anybody who cooks.

(I guess there are a lot of people, American or otherwise, who just don't cook. So I'd understand in that case. But cooking without a scale just seems surprising. Even where I come from, I grew up with a balance scale with brass weights. It was never hugely accurate, but always available.)


It's because Americans mainly cook with volumes ("1 cup flour") and not masses ("100g flour"). Just look at any American recipe book - it will use volumes instead of weights.


Wait, Canadian here. Americans don't have electric kettles?


Apparently they're unusual and not at all common.

People seem to prefer the kind of old-fashioned manual kettles you place on top of a stove.


It lets you know whether the recipe is any good.

You're searching for a bibimbap recipe, their page is split into Hangul and English versions, and this is the recipe their grandma uses? You bet your ass it's going to be good.

You find something on allrecipes.com, how do you know if this is a good recipe? Only if you already know it is.


This is so true; every time I look for a recipe online, it's all I ever see.


Don't look for crafting-ish instructions... the amount of outright false farmed content is astonishing.

"Here is a stock photo of something neat; we're going to teach you how to make it!" {4 unrelated steps} {stock photo again} "Tada!"


And it's surprising how many sites that are half-decent on their own, if you visit them without an ad blocker, still have Taboola ads about how a person in {your location} got rich in a way they don't want you to know.


There's a Chrome extension to fix that: https://github.com/sean-public/RecipeFilter

> This Chrome browser extension helps cut through to the chase when browsing food blogs. It is born out of my frustration in having to scroll through a prolix life story before getting to the recipe card that I really want to check out.
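The extension works by spotting the recipe card in the rendered page, but a related trick is worth noting: many food blogs embed their recipe as schema.org JSON-LD for search-engine rich snippets, and that block can be pulled out directly. A stdlib-only Python sketch (the HTML below is a made-up example, and real pages often nest the data more deeply, e.g. inside `@graph` arrays):

```python
import json
import re

# A made-up page: filler prose plus an embedded schema.org Recipe block,
# the structured data many food blogs include for rich search results.
html = '''<html><body><p>Long life story...</p>
<script type="application/ld+json">
{"@type": "Recipe", "name": "Peanut Butter Cookies",
 "recipeIngredient": ["1 cup peanut butter", "1 cup sugar", "1 egg"]}
</script></body></html>'''

def extract_recipe(page):
    """Return the first JSON-LD block whose @type is Recipe, else None."""
    pattern = r'<script type="application/ld\+json">(.*?)</script>'
    for match in re.findall(pattern, page, re.DOTALL):
        data = json.loads(match)
        if data.get("@type") == "Recipe":
            return data
    return None

recipe = extract_recipe(html)
```

This skips the life story entirely, which is roughly what the extension does client-side with CSS selectors.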


Glad to hear this exists, but is there anything like it for Firefox?


Most Chrome extensions can now be run on Firefox without alterations.

https://addons.mozilla.org/en-US/firefox/addon/recipe-filter...


I tried loading the source as a temporary add-on, it looks like this is not one of them.


Did you see that I linked to the Firefox version of this extension?


I cannot say this enough: buy a recipe book.

Online recipe sites will always be absurdly verbose, because if they just posted a list, Google would either rank it poorly because it is "thin content", or scrape it and post it as a Featured Snippet. There is seemingly no in-between.

The site authors have no choice but to fill it with a bunch of warbling about their childhood, to meet Google's minimum word count, which of course the company swears doesn't exist.


Recipe books have a different problem: they need to fill 100 or so pages of content, and most authors have maybe 3-5 good recipes. So you get a ton of filler as well.


See, there is a double-edged sword here, because outstanding recipes are like thesis papers, explaining the motivation, concepts, prior art and research, history, experiments, methodology, equipment, and then finally, optionally even, the recipe. They should be long and verbose. You should learn something by reading a recipe.

The problem, of course, is that search crawlers can't tell the difference between the two, and so long-but-empty content is treated the same.


Recipes have become hilarious as they have transitioned into storytelling pieces for whoever is trying to promote their lifestyle blog.


This is because of a quirk of copyright; in the US at least, a recipe itself is not considered copyrightable if it is a mere listing of ingredients; however, they can be covered by copyright when they have "substantial literary expression – a description, explanation, or illustration, for example – that accompanies a recipe or formula…”.

If you want some recipes that don't have all that nonsense associated, I'd recommend the BBC's Food site[0].

[0]: https://www.bbc.co.uk/food/recipes


Which for some reason competes with another BBC website, https://www.bbcgoodfood.com/


I skip straight to "print recipe". The whole story is for SEO, and I honestly don't think anyone ever reads it.


My wife recently sent me a recipe that had a link right at the top: "Jump to recipe". It was amazing :)


This reads a lot like Webb and Mitchell's gift shop skit. https://youtu.be/7MFtl2XXnUc


I like your good article about content click here to see how I think about good content also


This, x1000! By pandering to Google's algorithm, a lot of great content has become watered down, softened, or impacted in other ways. Titles have to be worded a certain way, articles have to be a certain length...

hoping someone else chimes in with other thoughts because I used to know more about this, and SEO cheapening quality content was one of the key takeaways I had.


Recipe blogs are the worst! Just get to the goods already!


Years ago when I was a child growing up in the icy woods of Alaska, my parents ...

... And that was when we realized that the kindling was all damp. What to do! Four hours we ...

... it reminded of our travels in Italy ...

... And that brings us to our famous three-ingredient peanut butter cookie recipe:


I once came across a "world's most amazing curry recipe" that had two ingredients; rice, and a jar of storebought curry sauce. It was actually kind of tragic.


Google has transformed any webmaster into a writer.

After a few years of writing recipes, one can transition to screenplays and novels. :)


It's like we've flipped APIs on their heads - humans use tools to obtain the minimal data (just the recipe), while the bots make requests and receive the whole dump of pointless, extraneous fluff in return.


And here comes the business trick: API access is paid and restricted and requires you to enter a relationship with the data provider. Websites are purposefully made machine-unreadable, because otherwise a lot of people would use third-party scripts to avoid viewing ads and filler content.

The Internet is the exact opposite of what it was meant to be. Instead of giving efficient access to high-quality information, it gives inefficient access to watered-down, spread-thin, low-quality content. Site after site, query after query, we all continuously pay a little tax with our time.


> Websites are purposefully made machine-unreadable.

This is a sad result of incentives which ruined RSS, semantic web, and the original purpose of the Internet: to effectively share and communicate information globally.

Advertising (including propaganda and surveillance) and copyright are major forces in that ruination - not to say they don't have legitimate uses and benefits, but their concentrated financial interests conquered the domain. It's a kind of colonialism of public (cyber)space and collective mind.

Tax is an apt analogy, since the power lies in the control of "media" in the most general sense, to force itself in every transaction between things and people.

I believe "dis-intermediation" will help improve the free and direct flow of information. That is, to reduce the middle layers - the gate keepers, the walled gardens, intercepts (lawful or not), filters - by re-decentralizing the net.


This only really applies to the commercially-focused part of the Internet, though. Many Internet sites are still hosting academically-focused, non-commercial or broadly pro-social efforts, and those seem to be among the quickest adopters of Linked Open Data standards which are precisely meant to make content machine-readable, sometimes with standardized and freely available API-like endpoints for querying the underlying datastore.


That is because of the ad revenue model. Not many people would pay for a subscription, and I don't know of other working revenue models.


More people would be willing to pay for subscriptions, directly or indirectly, if the ad revenue model didn't exist. Otherwise, "free with ads" outcompetes all. That's why I'm in favor of banning "free with ads" wholesale.


I am not on high moral ground to talk about ads. I am a web developer, yet I use uBlock Origin.

Now that I have some money, I am willing to pay for subscriptions, because that should mean high-quality content, not ads. However, there were times in my life when I barely had enough to eat and could barely afford my Internet connection, and paying for subscriptions was the last thing I was willing to do.

I suppose many of the 7 billion people living on Earth are poor enough that they cannot afford subscriptions.


A lot of the revenue models suck: if you pay, they just put your name in a database, sell it to everyone, still show you ads, and make it impossible to cancel. I wanted to buy Cook's Illustrated, for example, but after reading about their business practices I chose not to.


Many now have an anchor directly to the recipe. I try to bookmark those so I don't have to read the author pontificating on the changing of the seasons, or on how an ingredient in the dish reminds them of childhood, for the 800th time.


There's a Chrome extension to fix that: https://github.com/sean-public/RecipeFilter

> This Chrome browser extension helps cut through to the chase when browsing food blogs. It is born out of my frustration in having to scroll through a prolix life story before getting to the recipe card that I really want to check out.


If I never have to read another article with "XYZ Is Happening. Here's What It Means" or anything to that effect as a headline, it will be too soon. SEO has created lazy writing, and as a former journalist, I hate what it has done to my one-time profession.

I can also do without sites that think they're being quirky with their newsletter popups. "Click here to subscribe" or "I'm not interested in staying informed/saving money/being in shape". Stop. You're trying too hard.


Thanks, I hate it


Every time I land on a website with such content, I wonder why anybody would want that kind of traffic.

I mean, don't they have server costs? It's like they own an infinite bucket of money that constantly gets refilled somehow.

Why are advertisers even paying them for such impressions? Are they so ridiculously careless with their business model? Could we all hurt advertisers by setting up a website like this?


I know little about the ad business but the insinuation I've heard is that the advertisers don't want those impressions, but struggle to block these sites for one reason or another. In other words, it's basically a scam.


I think most advertisers aren't even aware, the ad platforms are so full of dark patterns it's hard to avoid this stuff.


That traffic is monetized in a few ways:

- Autoplaying video ads

- Affiliate marketing (meal kits, cooking courses, etc)

- Adsense display ads

- Taboola/Outbrain

These don't bring in much money on a per-site basis, but the owners usually own multiple sites, so it can add up.

If you're able to get the site to grow, it becomes the easiest digital property in the world to maintain. The content doesn't need to be fresh, or engaging. People will always search for generic recipes.


It’s all very parasitic and relies on a lot of disconnect. The advertisers often don’t even know what they are buying, as these things go through “exchanges” that are basically computers buying and selling impressions like real estate. The smaller the business, the more likely they are to end up scammed by this, because they have fewer tools to monitor things.


Seems to me like you have it backwards.

The reason you even got a chance to click on it is because they wrote it that way. If they hadn't, it wouldn't be as visible on Google, so nobody would see it. When most websites do this (to survive), folks don't have a choice.

Also, advertisers don't pay for something that's invisible.


Basing your income and your future on something as volatile as Google search seems to me a terrible idea.

What if an algorithm change tanks the whole fleet of poor-content websites you've spent the last few years on?


Then you shut down the servers and move on. You got the ad dollars at that point, what do you care?


That reminds me of an ad network, usually displayed next to webcomics, with nearly all the ads pointing to other webcomics, which all had ads from the same network… I had no idea how that network made any money or paid anything to the websites.


Perhaps you're thinking of Hiveworks? They're less an ad network and more like a publishing company for webcomics. Many of those types of "webcomic ad-rings" are.


We have the lyrics to the sad song you're singing right now.

That is, we will have them, once someone adds them.


Deary me.

I wish I could be around to see what future historians / archaeologists make of this.


This is especially prevalent for simple questions that would normally have simple answers.

I'd like to know when to water my mint plant without reading about the whole history of mint.


Hilarious... reminds me of my favorite Mitchell/Webb parody: https://www.youtube.com/watch?v=7MFtl2XXnUc


The moment when Google turned to the dark side was in 2005-2006, when they stopped sponsoring the "Web Spam Squashing Summit" and started sponsoring SEO conventions.

"There's going to be a Web Spam Squashing Summit next week: Thursday, Feb 24th. (2005). Technorati is organizing the event (thanks guys!) and we're hosting it on-site at Yahoo in Sunnyvale. The main goal to get the tool makers in a room together to talk about web spam, share info, and brainstorm. So far AOL, Google, MSG, Six Apart, Technorati, and Yahoo are on board. I hope we'll also have representation from Feedster, WordPress (hi Matt), and Ask Jeeves and/or Bloglines too."[1]

The next year, in 2006, Eric Schmidt, Google CEO, addressed the Search Engine Strategies conference.[2]

"The search advertising market – a tremendous credit to you and to the organization that built this conference..."

And that's when Google turned evil. From trying to stop search spam, to promoting it.

[1] http://jeremy.zawodny.com/blog/archives/004256.html

[2] https://www.google.com/press/podium/ses2006.html


I can tell you as an in-the-trenches SEO that Google is 100% not in cahoots with SEOs in any way.

Google WILL bend over backwards for paid search agencies that direct their clients' ad spend through AdWords - but this is not SEO. These paid search agencies that funnel millions to AdWords will get invited to Mountain View, get paid lunches, visits from Googlers, etc. - but this is SEM, not SEO.

What does Google do for SEOs? They have a handful of ambassadors that answer questions on a weekly basis. While they are generous with their time (https://twitter.com/JohnMu), they keep things very close to the chest regarding the details of their algorithms. They will send some speakers to some SEO conferences, but will rarely if ever sponsor an SEO conference unless it's a part of a broader paid advertising or marketing tech conference.

Google's Algorithm updates - https://searchengineland.com/library/google/google-algorithm... - rolling out every month or so are notoriously a black box and frustrate SEOs to no end!


You base your judgement on the legal and public communication between Google and SEOs.

Everyone else is basing their judgement on the effect Google has on the Web.


I totally agree. Google has become useless for about half my searches. It gives me only the biggest, most commercial or most popular results. Anything obscure is impossible to find.

I'd like to have a search engine where you get only the most obscure, hard-to-find content. One where you can tweak the kind of content you're looking for, or even switch between different modes: am I just looking for the definition of a common but complex term, am I looking for a specific article that I vaguely remember a phrase from, do I want something I've seen before, or am I in the mood to discover new, unexpected things?

Also, I just don't want to see results from some sites. Let me tweak the importance of some sites, rather than relying on Google's gameable algorithms.
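The per-site weighting asked for here can at least be sketched as a post-processing step over any engine's results. A minimal Python illustration (the domains, weights, and relevance scores below are all made up):

```python
# Re-rank search results with user-chosen per-domain weights.
# Weight < 1 demotes a site; weight > 1 boosts it; unlisted sites
# keep their original score. All values here are invented examples.
site_weight = {"pinterest.example": 0.1, "obscureblog.example": 3.0}

results = [
    ("pinterest.example/pin/123", 0.9),
    ("obscureblog.example/mint-care", 0.5),
    ("bigsite.example/article", 0.7),
]

def rerank(results, weights, default=1.0):
    def adjusted(item):
        url, score = item
        domain = url.split("/")[0]
        return score * weights.get(domain, default)
    return sorted(results, key=adjusted, reverse=True)

ordered = rerank(results, site_weight)
```

The down-weighted aggregator sinks to the bottom and the boosted obscure blog rises to the top, without touching the underlying engine at all; the hard part, of course, is getting honest raw scores in the first place.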


To me it seems the problem with Google goes far deeper than struggling with bad SEO.

- For years it has been next to impossible to get a result that is faithful to the search you actually typed in. This does not depend on SEO spammers at all, only on Google's unwillingness to accept that not every user is equal and some of us mean exactly what we write, especially when we take the time to enclose our queries in double quotes and set the "verbatim" option.

- Ad targeting has been so bad it's ridiculous. Yes, on average it works, but around the edges it is somewhere between tragic and hilarious. For ten years after I met my wife, the most relevant ads Google could think of were dating sites. Not toys, not family holidays, not tech conferences, not magazine subscriptions, not offers from local shops, but dating sites so scammy that I cannot imagine how most people would fall for them. (For a while I wondered if this was a fluke, but I have since confirmed it happens to others in my situation as well.)

- It is becoming ridiculous in other areas too. For example: what is the idea behind aggressively showing me captchas while I'm logged in with two different Google-controlled accounts, one Gmail and one G Suite, both paid?


> "For years it has been next to impossible to get a result that is faithful to the search you actually typed in."

Good lord, yes. If I type two words, I want preference for sites that contain both of them, yet the first results all have either one or the other, because surely I must be more interested in a popular site that uses only one of these, right? Google is sometimes too smart, trying to interpret exact words I type as vaguely related words. Sometimes that's relevant, but often it's not.

> "For ten years after I met my wife the most relevant ads Google could think of was dating sites. Not toys, not family holidays,"

They have a tendency to show you ads for exactly the thing you don't need anymore because you already found it. I don't think AI is in any danger of taking over the world just yet. Except with bad advertising, apparently.


The "AI" that Google's search engine seems to have is definitely feeling more "human" over time, but not in a good way --- it's like a stupid salesperson who has trouble understanding what you're trying to find. An analogy I have is that you go into a pet shop and ask for a black cat, and instead the salesperson shows you black dogs, white cats, and green gerbils (because they're absolutely cool these days and you wouldn't want to miss out on a great deal, no?)


They should have a checkbox to disable AI. :)

Something as simple as Apache Lucene would beat Google's algorithm any day for relevance.

But I guess Google doesn't care for relevance, they just care to show you ads.


> "They have a tendency to show you ads for exactly the thing you don't need anymore because you already found it. I don't think AI is in any danger of taking over the world just yet."

There's an eschatological trait to targeted advertising, as it seems to be all about past sins. So I'm not too sure about your evaluation and AI's own claims…


> I don't think AI is in any danger of taking over the world just yet.

The scary thing about AI is that, even as the algorithms have greater and greater intelligence, we're still not much closer to teaching them to do what we want them to do. They can game the system better than ever, and then the universe is tiled with surgical masks.


So if AI ever takes over the world and kills us all, it will probably be because it failed to understand what we actually wanted.



If "what we actually want" even comes into consideration, we did orders of magnitude better than the current industry standard. (Poor consolation, I know.) Right now, the vast majority of AIs don't even have a concept of "human desire" – probably none of them, to be honest, though some that are good at manipulating their handlers might've come close to a particularly stupid dog's understanding. This is at the core of the Friendly AI problem: https://wiki.lesswrong.com/wiki/Friendly_artificial_intellig...

Just because we created the AI doesn't mean it'll care about us. That's like saying a maths problem will try to make your favourite number its answer, just because you wrote it. No, you are the one who must make its answer the number you want. It won't happen by chance.

Corollary: you can't patch broken FAI designs. Reinforcement learning (underlying basically all of our best AI) is known to be broken; it'll game the system. Even if they were powerful enough to understand our goals, they simply wouldn't care; they'd care less than a dolphin. https://vkrakovna.wordpress.com/2018/04/02/specification-gam...

And there are far too many people in academia who don't understand this, after years of writing papers on the subject.


Or perhaps because it was rotten to the core, as it was initially created for a malicious purpose (most likely advertising).


Maybe this is how AC finally reversed entropy.

https://www.multivax.com/last_question.html


Insufficient data for an answer.


> They have a tendency to show you ads for exactly the thing you don't need anymore because you already found it

Say you searched for a TV a month ago. Now you're seeing lots of ads about TVs. Stupid Google.

But is it? A substantial fraction of those people are returning their TV because something is wrong with it. Now they are looking for another TV set.

Sure, the majority keeps their TV. But it is still profitable to target all those TV buyers, because they have self-selected into the set of people who really want a TV now, and they are willing to pay.

Reaching the fraction of those who need another one is probably[1] very lucrative.

[1] I'm sure Google has run the numbers


Agreeing: This thread of thought comes up semi-regularly here, I've argued similarly to you.

People will rebuy good products, or be stimulated to replace other similar products (bought a new TV for the kitchen, now the lounge TV seems dated; new boots feel awesome, get another pair for when they wear out; new $thing is fun, buy one for a friend's birthday).

There's also a big place for brand reinforcement. Show Sony stuff, to remind someone ['s subconscious!] they bought Sony.

A tertiary effect is what I call the "Starbucks Purposeful Bad Naming effect" - you get ads for the exact TV you bought -- beyond the brand reinforcement, etc., you also get to tell everyone you meet a weird story about how "internet advertising is broken ..." and "yes, my new Sony TV is great thanks, you should get one".

Those ad agencies aren't stupid; they have metrics for their metrics and have tracking that can tell you to the second when your gut bacteria burps ...


> Those ad agencies aren't stupid; they have metrics for their metrics and have tracking that can tell you to the second when your gut bacteria burps ...

Stupid they aren't, but I don't think they're smart in the way you suggest.

Ad attribution is a hard problem. Or, in other words, it's hard to estimate which $ spent on which advertising activities generated how many $ of profit where. That gap is a huge opportunity to scam the product vendor out of their money.

So the ad agency has a metric for their metric, and their reports overflow with numbers and various charts shaped like food or aquatic mammals. But does that mean anything at all? It might not. Statistics is hard, and as long as the vendor isn't better at it than the agency, money can be made. I used to work next desk to a group of content marketers who had no fucking clue about what their numbers mean, but their customers didn't have a clue either, so they happily paid money in exchange for reports that showed the Facebook campaigns "worked".

Now the advertising industry is large, and by definition filled with companies that aren't paragons of virtue and honesty. These companies specialize, providing building blocks and platforms for each other, and they compete internally. It's not like people building tools for lies and manipulation are suddenly honest when dealing with their in-industry customers and competitors. After all, convincing advertisers that your A/B testing package is worth the money requires... well, advertising.

So my personal view on the industry is that it's mostly self-reinforcing bullshit. Doesn't change the fact that it generates stupid amounts of money, though.


I used to work in Ad Operations (literally buying ad space and running campaigns) and can attest to the accuracy of this.

Clients were clueless: they had their metrics and they looked at them often, but from my interactions, deep understanding of those metrics and the realities behind them was lacking. The chain of technologies was patchwork and would rarely support all the required features from ad-serve back up to agency: click and view attribution was especially flaky and inconsistent. The adserving environment we worked in (in app) often had issues with view attribution, and we'd tell clients that, but we knew for a fact that some of our competitors didn't and clients would always ask us why our view attribution numbers were worse.

Combine that with more suspect behaviour from suppliers and competitors than you can poke a stick at (questionable traffic sources and campaigns that were probably outsourced from under you, suspect and plausibly forged numbers, etc.), and most of the metrics are plausibly poisoned with illegitimate data to a degree that is difficult if not impossible to nail down, which more or less makes lots of those metrics worthless.

> So my personal view on the industry is that it's mostly self-reinforcing bullshit. Doesn't change the fact that it generates stupid amounts of money, though.

Couldn't agree more.


> People will rebuy good products,

The big reason advertisers show the ads for products you already purchased is also to reinforce their brand. If you buy some stupid cable, you won't remember the name of the company that made it, but you will if they show you the ad a couple of times in a row, and you will be likely to buy things there again even if it's not the same product.


Agree, that was in my 3rd paragraph above.


> Say you searched for a TV a month ago. Now you're seeing lots of ads about TVs. Stupid Google.

> But is it?

Yes, because Google knows who started searching for TVs again and who didn't.


That's why I miss pre-Google search engines such as AltaVista and AllTheWeb. If you searched for "some obscure string of words" you would only get results that matched that exact string. I really don't like how Google just chooses to vary the spelling of your query when a match isn't found. I often search for electronic components using their part number. I'll type in something like "P204PPX" (a random code I just made up) and, despite there being no match, Google still gives me pages of results that are nowhere near what I was looking for.

And the worst thing is that this is all done to keep those ad dollars flowing. Look at how many companies always have a paid advert associated with their name when a search is made. They are paranoid about losing rank due to Google fiddling their algorithm or someone else doing a better SEO job using their brand.


Google used to respect search operators, and dramatically tone down query optimization for queries that contained operators. As I've written elsewhere, I suspect learn-to-rank is to blame[0], by optimizing ranking for generic sloppy queries despite your query being very focused.

[0] https://news.ycombinator.com/item?id=22747889


If a match isn’t found with your verbatim query, then it falls back to something similar. At least that’s my experience.

Random string of text still matches exact in quotes: https://www.google.com/search?q=%E2%80%9CI%20really%20don%27...


> - Also in other areas it is becoming ridiculous. For example: what is the idea behind aggressively showing me captchas while I'm logged in with two different google controlled accounts, one gmail and one gsuite, both paid?

To intentionally discourage you from using Firefox so you give in and switch to their stalker browser.


You're not their customer, advertisers are, so it's only natural that the ads you see aren't personalized. That's never been the goal.

It is, however, technically a potential benefit that the more exactly advertisers can target you, the more relevant ads you could be seeing, which is a wonderful sales pitch for users who are agnostic anyway, but that's not how advertising works in practice.


> You're not their customer, advertisers are,

1. I'm well aware of this

2. It doesn't contradict anything I wrote

3. I've read it so many times and seen it misapplied so many times it is getting annoying.

> so it's only natural that the ads you see aren't personalized. That's never been the goal.

I doubt it was the intent of the advertisers to waste expensive impressions on people who weren't in the target audience at all, so I'm pretty sure they expected some personalization WRT which customers gets what ads.

I also very much doubt that it was Google's intention to annoy me to the point where I trash them in public forums; I just don't think they're capable of fixing it anymore, as they are way too busy "being Google", e.g. doing cool stuff while not listening to customers (I was planning to add more here, but this single example seems to summarize it well.)

I recognize I might be a bit more direct than usual here, and you aren't responsible for the first 97 times I've seen this meme here, but as an answer to my question it is not applicable as far as I can see, and generally that meme is just noise here at HN now.

(Anyone who is actually among today's lucky 10,000 WRT the "you're not the customer" meme, feel free to prove me wrong.)


Meme? If hearing this uttered bothers you this much, then maybe complaining about poor relevance in Google ads isn't such a good idea.

I'm sorry that I seem to have offended you.


I'll try to explain:

My post may contain a meme but it was directly relevant to the post above.

Mentioning that I'm not Google's customer is significantly less relevant (I think irrelevant) when it is obvious that it should have been in the actual customers' best interest to avoid spamming me with expensive and utterly irrelevant ads.


A meme, you say. Searching, you're doing it wrong.


>You're not their customer, advertisers are

Then it's high time everyone but advertisers stopped using google search and start using anything but.

If advertisers are their customers, so be it; let them have it.


> Yes, on average it works

Maybe we frequent vastly different websites, but this has absolutely not been true for me, even for companies who are supposed to be experts at using their data. I don't think I've ever seen an ad that has actually been relevant, and I'm not even trying to hide my habits or behaviors.

For example, take Amazon. Their ads all over the web frequently recommend stuff I already bought just a month ago, the very same product. Or the products they recommend are way out of my zone, like women's clothing, while I never purchased women's clothing or anything close to it.

So, I'm not sure how the ad market even goes around, and my friends are describing the same behavior from the ads, even from companies that have my entire shopping history already (like Amazon).


> I'm not sure how the ad market even goes around

The ad market's business isn't delivering right ads to you, it's convincing people paying for those ads to part with their money. It doesn't have to work well, as long as it works a bit, and there isn't any better alternative around.


> > Yes, on average it works

> Maybe we frequent vastly different websites...

I think we agree. What I mean is on average it works for Google, not that it works for us. They still make boatloads of cash.

For all I know the targeting is equally bad for you and me and everyone, and they are just convincing advertisers that it is worth paying for despite this.


Right, that's a good point, that it's working on their side with convincing the advertisers. Thanks for clarifying so I could understand!


There are definitely sites whose results I wish I could ban from my results. I won't visit them, so they are just a waste of space. My short list: Thrillist, Collider, Vulture.

Also related to SEO, I think, is how every cooking recipe seems to be 6 to 8 large images and a bunch of unneeded text, followed by the recipe 8 to 12 screens down. AFAICT it's entirely unrelated to me getting to the recipe and is instead either a pattern for SEO or for ads.


A Pinterest ban would greatly improve google results.


I'm surprised Google's own Search team doesn't get frustrated enough by Pinterest results contaminating their own day-to-day searches that they'd consider ranking Pinterest results lower.


They aren't going to de-optimize their careers in order to optimize search just for themselves. Google used to be somewhat optimized for power users, but I suspect that learn-to-rank is over-optimizing search ranking for the median user.[0]

[0] https://news.ycombinator.com/item?id=22747889


There is a Chrome extension for maintaining your own personal block list: https://chrome.google.com/webstore/detail/ublacklist/pncfbmi...


Given the uniformity of recipe site design I'm starting to wonder if there isn't more going on. Like maybe they are all run by the same company or maybe there is a template that every person that wants to run a recipe site is somehow pushed to use.

I mean literally, search for any recipe, click the first 10 links. Screens and screens of large pictures and superfluous text, with the actual recipe way, way down the page.


That used to be available as a Google Labs feature.


I like playing around with "Million Short" - https://millionshort.com. It's a search engine that lets you logarithmically filter out the top websites. It isn't perfect, of course, but it's a fun way to discover things.


I love the idea and have tried to use their service a number of times over the years. I've never been terribly happy with the results.


That is a fantastic idea.

Of course some of those top sites, especially the blogging and self-hosting platforms, can still contain obscure stuff that might be just what I'm looking for.


I see mainly two problems:

- Affiliate spam from douchebags that provide "reviews" of products just to link back to Amazon. Makes it nearly impossible to find actual reviews of products.

- People who type whole sentences in natural language into Google. Ever since I started using the internet, I've tried to search for keywords, omitting as many unnecessary words as possible. Most people (after, I guess, ~2010?) don't. This worsens the results.


Wow! Now people who know how to type using proper grammar and vocabulary are a problem???


Not OP, but let's say you want to find out the protein content of brussels sprouts.

I would type `protein content brussels sprouts` (without quotes) because I fully understand that the information I'm seeking might be in some tabular form, or phrased in a way I don't anticipate.

Most non-technical people however would type in `What is the protein content of brussels sprouts?` literally.

This leads content creators who see these queries in "keyword analysis tools" to dump SEO-optimized crap into millions of blog posts, with completely irrelevant word soups with countless variations of the question, and the actual information buried deep within that gibberish essay, unreadable by humans, only optimized to drive ad traffic.

Google's optimization for the non-technical use case has lowered the overall search result quality immensely.

There was a time when Google didn't simply ignore some of your search words, or when control characters like +, -, and "" were actually respected (~pre 2010), and the introduction of verbatim mode didn't change much IMHO.


Interesting. Why do you think that content creators wouldn't see `protein content brussels sprouts` in "keyword analysis tools"? Why do you think they wouldn't create millions of blog posts containing words `protein content brussels sprouts` or countless variations thereof?


Because Google optimizes for "quality content" since at least the Panda update. They're using NLP tools to assess the writing quality (similar to algorithms telling you at what school grade level your writing is).

This was good, because it cured all the copy&pastable 2000s era "tag cloud" sites which simply dumped tons of search keywords all over the place.

Ideally, it led to a stronger emphasis on high-quality human-written content, but it turns out that this algorithm, again, is easily fooled by feeding it "SEO essays": text that looks like prose but is irrelevant gibberish, written coherently.

That led content creators to expand data that would ideally be presented in tabular form on one page into multi-page "SEO prose" that looks like it's written for humans but is completely indigestible.

That, along with Google's auto-suggestion feature that finishes your sentences after you type in some words, especially on mobile, led to the impression that people actually like to search in full sentences.
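For the curious, the grade-level scoring mentioned above is usually something along the lines of the Flesch-Kincaid formula. A rough sketch (the syllable counter is a crude vowel-group heuristic, and this is certainly not Google's actual implementation):

```python
import re

def count_syllables(word: str) -> int:
    # Crude heuristic: one syllable per run of consecutive vowels.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def fk_grade(text: str) -> float:
    """Flesch-Kincaid grade: 0.39*(words/sentences) + 11.8*(syllables/words) - 15.59."""
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * len(words) / sentences + 11.8 * syllables / len(words) - 15.59
```

By a metric like this, a coherently written "SEO essay" scores just as well as genuinely useful prose, which is exactly the loophole described above.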


They do, but the whole-sentence fraction is probably in the majority. Which leads to degraded search results and optimisation in the wrong direction.


One can easily make them a minority by creating a script that issues keyword-only searches to Google and letting it work 24/7/365 from a few hundred machines.


No, you can't solve it that way.


When it comes to search engines: Yes. They are.


Recent example: I wanted to find out why the directions on those Banquet pot pies say to let it stand for five minutes. The words on the box say it finishes cooking in those five minutes. I want to find out more about that: why didn't it finish in the oven? Is it cover for liability, so people don't get burned?

All Google returns is page after page of recipes and posts about Banquet pot pies that have no connection to my input. Google used to be so good at this kind of thing. I know the answer exists somewhere because I found it once a long time ago searching for answers to the same question. Google found it then.


This could perhaps be illustrated well with a hard boiled egg. If you remove an egg at the 7 minute mark, the internal temperature is still around 100°C, and it will continue to cook. To stop it cooking, dunk it in cold water.

Similarly with the pot pie, the filling retains a ton of heat, so it still is cooking for those few minutes, as the heat dissipates. If you left it in the oven five minutes longer (to 'finish') then put it in cold water (like the egg), you would have a pie in a similar, though wetter, condition.


I thought it might be something like that. You're way better than Google. Thanks.


I have another "Google is shit now" anecdote: a few months ago I wanted to look up some trivia for the movie "Lord of War". I entered this term in Google, and since there was a game that was just released, it responded with "Showing you results for 'God of War'". No results on the front page had anything related to the Nic Cage movie.


Perhaps Google believed that nobody would intentionally look for a Nicolas Cage movie...?


Why? He's an academy award winning actor who is hugely popular or else he'd be out of a job.


Lord of War is a pretty decent movie.


Can you find the exact search term from your search history?


It was "lord of war director's commentary". I thought it was because many more people were googling for the game, but I just tried and Google is still doing this!


If you click the “search instead for...” then the non-video result was exactly what I believe you wanted.


The point is, why is it doing this?

It's not even suggesting both God of War and Lord of War, for me all the results on the first page are about God of War.


It's probably correcting people's typos, as well as saving a crap-ton of money this way (search results are undoubtedly cached).


At the cost of destroying their once in-its-own-league quality.

Today I use DDG mostly, thanks to Google's choices. DDG is equally bad WRT this but

- it is easier to move from DDG to Google (just add !g) than the other way around

- and DDG isn't as invasive


I just searched "Lord of War" and the front page is entirely about the movie. Don't know what to tell you.


So you replied to my response about the exact search term I used by telling me... this?


Yes, you said it was happening currently, so I tried it. The problem you're complaining about doesn't actually appear to exist, at least not objectively.


> so I tried it.

It doesn't appear that you did. The search term in question is "lord of war director's commentary," and you say you're trying "lord of war." And even if the problem didn't exist for you, but did for them, that does not mean it doesn't exist "objectively."


You are using a different query it seems.

Also, to me as a non-native speaker, you appear to be rude.


They aren't being rude, just increasing the sample size and reporting back. Perhaps we're witnessing SEO for different regions, regulations, and aggregate history of the two.


Again, I'm not a native speaker, here are the exact words that krapp used:

- "Don't know what to tell you."

- "The problem you're complaining about doesn't actually appear to exist, at least not objectively."

In my limited experience I don't see any reason to use these exact words in this context except to belittle?

Anyone care to explain? My seatbelt is fastened and I'm ready to learn (and apologize if necessary :-).


I'm a native English speaker. Your feeling that there is a belittling connotation is valid.

The first quote is potentially a dismissal, which is belittling. However, if it stood alone, it could also be interpreted as the person backing off because they lack qualification to interpret the results. But the second quote includes dismissal terms like "complaining" and "not objectively" in a demeaning context.

So, with all that, I'd say your impression is valid. The context highly suggests that these responses were meant to belittle the parent author's contribution to the conversation.

Here is my take on the parent's search results. The God Of War director ( Cory Barlog ) was a big part of the marketing for the game. And he did some in-game commentary for it too. So, this suggests some kind of SEO manipulation. But it could also just be the google spellcheck guessing wrong.


Found the point of confusion. “Lord of war” returns the movie. “Lord of War director's commentary” returns the game.


Intentionally? I'll assume good faith. Under that presumption, my best interpretation of that poster is "socially unaware, likely prone to nitpicky argumentation."

The poster who said he was having difficulty getting good results from Google, was obviously venting about his own personal experience.

Next comes along another poster who says "I tried it. I don't have your problem." Another post down, "... The problem you're complaining about doesn't actually appear to exist, at least not objectively."

Excuse my outburst, but who the fuck says that?

The original poster was venting about a problem he experienced. What good can someone do when he comes in and says that he doesn't experience the same problem, and states that it likely doesn't exist? There is no upside here. There's only a downside: being rude.


>Excuse my outburst, but who the fuck says that?

>There is no upside here. There's an only a downside: being rude.

Says the person who apparently created a green account just to shit on my grammar and cast aspersions on me.

Pot, meet kettle.


I'm not trying to be rude, but I honestly believe a lot of the complaints people make about how useless Google's search results seem overblown. I use Google all the time, sometimes for obscure results, often for technical stuff, and the worst I've ever had to do is look past the first page, but often the first page suffices.

Google showing results for director's commentary of God of War when someone searches "Lord of War director's commentary" is arguably not a failure on Google's part if more people do search for the game than the movie, regardless of the incorrect title.

That said, I completely agree with the theses of TFA. SEO is a cancer.


Google usually shows a link to "search for x instead" when that happens. And let's be fair, most people searching for "Lord of War" are probably really searching for "God of War."

Having to click an extra link or maybe scroll beyond the front page doesn't make Google a shit show.


DDG doesn't work for me either when it comes to more general, not tech-ish stuff. One thing for news I found was https://yetigogo.com


Yeah DDG is hit or miss. Most (70% or more) DDG searches work but a lot of times Google or Bing does a better job.


Half your searches? Isn't that a bit dramatic? You can always check your search history, but I seriously doubt that for 50% of your searches Google cannot find anything (assuming the data is accessible somewhere). Tell us a few of the obscure things that you know exist openly and that Google could not find for you.

I see these claims all the time, but usually with zero examples.


It's a very rough estimate. I'm not going to check every search in my history. But it feels like the chances I'll find what I need are comparable to the chances I won't.

One example: this weekend I was looking for lyrics from the British folk band Why?. I know they exist; my brother has a bunch of their albums. I have quoted lyrics at Google, I've searched for it on Youtube, I've searched for the band name combined with song titles or names of band members, and I found tons of other bands and other random crap, but not the band I was looking for. Eventually I searched for a very specific phrase that was also the title of one of their live albums: "Jig at a Why? Gig", and that finally turned up results.

It's an obscure band, and their name being a common word certainly doesn't help, but surely, combined with song titles, lyrics and band members, it should be pretty clear what I'm looking for? But with Google giving strong preference to the most popular results, Google becomes primarily good at finding things you don't need a search engine to find. I want a search engine that's good at finding things that are lost, rather than in plain sight.


Lyrics are notoriously hard to find on Google. I assume it's in part due to copyright issues, but it's also not how Google works. Searches are based on key words, not exact matches of several words in order.


There was once I time where I could type maybe 5 random words from a song and get it as the top hit.

Now there are times where I can’t remember the song title, but I can type a few lines of lyrics verbatim plus include the musician’s name and get only random, unrelated links.


I think this particular query has a problem with the unfortunate name "Why", which is probably causing the confusion. I don't think search engines did a better job before, nor does this have anything to do with SEO. Replace the "Why" with another obscure band with a distinct name and you would get results. Could Google do better? Sure. Is it worse than before? I really don't think so.


This is how most complaints about "Google these days" play out.

Once pressed to give actual examples, we realize it wasn't a trivial search anyways and certainly not something better Google did long ago on a technical basis. And Bing certainly isn't doing much better.

Of course, there are some things that Google does filter out these days like things that seem like pirated content.


Somewhat unrelated to your point but the phrase "Google these days" made me think the following: Could Google freeze its index and capabilities every one or two years and make it available as a sort of "search the web like it was in 2008" archive? That might also solve the problem of how to prioritize search results over time. The people in 2015 are likely to have been interested in different things than the people of 2040, especially for some search terms (vine for example). I mean they already do this by having different search engines for different countries/languages.


    - Error messages
    - Error codes
    - IC part numbers
Those are the top 3 things that routinely yield absolutely no useful results for me. The first two seem to produce pages upon pages of spammy SEO sites (with titles like "fix errors now") which don't even contain the relevant error message or code, and the latter alternates between no results or, once again, pages of SEO spam.

When I was still in an office with coworkers a while back, we had developed the habit of yelling "fuck you Google!" and showing a middle finger at the monitor whenever a search yielded absolutely WTF or useless results, which was a cue to everyone else around to jump in and help. A stronger tirade of profanity was reserved for when someone managed to trip the bot-detector CAPTCHA hellban. At first only the former happened once or twice a week, but shortly before working from home, we were getting Google-screwed multiple times a day, and tripping the hellban so often that most of us switched to a combination of Bing and Yahoo; while still not ideal, and the results weren't much better, at least we weren't routinely getting banned from them for trying harder to find what we were looking for.


Agreed, I almost always find what I am looking for on my first search. It might be a few search results down the page, but usually what I'm looking for is found on the first search.

Maybe I've just adapted to typing in words and phrases into Google in a way that brings up the results I'm looking for.


https://millionshort.com/about is an implementation of the idea of finding obscure content, but it seems not successful to me.


Google is of course going to return the most popular results to please the largest number of people.

If you're looking for more technical results, it'll take a more technical approach.

You should learn search operators - https://support.google.com/websearch/answer/2466433?hl=en - that allow you to control the results for specific sites or very detailed requests.

Even better, set up a custom search engine - https://support.google.com/customsearch/answer/4513882?hl=en - with 100 of your trusted sites and now you have Google search precisely tuned to your needs.
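A lightweight way to approximate that whitelist without creating a custom search engine is to OR together `site:` operators. A minimal sketch in Python (the domains, and the made-up part number reused from upthread, are just placeholder examples):

```python
# Sketch: build a query restricted to a personal whitelist of sites,
# approximating what a Custom Search Engine does. The domains below
# are placeholder examples, not recommendations.
def build_query(terms: str, sites: list[str]) -> str:
    """Join site: operators with OR and append them to the search terms."""
    site_filter = " OR ".join(f"site:{s}" for s in sites)
    return f"{terms} ({site_filter})"

print(build_query('"P204PPX" datasheet', ["mouser.com", "digikey.com"]))
# → "P204PPX" datasheet (site:mouser.com OR site:digikey.com)
```

The resulting string is pasted straight into the search box; no API access needed.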


We should manifest a website that we can both find right now, agree on the search terms and operators used...

... and then meet again, in say, 3 years, where this content is either only findable on tiktokterest or not at all anymore.

Google has not a search problem. It has a chronological and social one.


Even better, don't use google


I wish I would have seen this coming years ago. I would have built a Google Custom Search Engine, and every time I ran into a good website, added it to the whitelist. By now, it would probably be alright.


Insert allegory about the best time to plant a tree being 20 years ago... Or today.


It is something I've started now.


How are you meant to find the most obscure content? If it's obscure, it probably means it is not relevant to your search. How would a search engine like that even work?

Tweaking importance of some sites seems like a nice idea though, but it could also be a bad thing.


I'm very confused by your definition of obscure. If I search for something rare and not covered very well, using a set of precise search terms, AltaVista and company used to give me almost exclusively relevant results. Modern Google will give me Windows helpdesk questions for a query about a Linux driver using lots of quoted terms, at most one of which will appear in the results. Rare results are _exactly_ the case where precise queries give higher quality hits. Substituting a different word will swamp them. Specifying sites to avoid or prefer is an extra signal to help with that.


The smarter Google has become, the less relevant it is.

It was a lot easier finding things in 2004 than today.

That's because it tries to be clever, using synonyms and searching for terms it deems related instead of the actual search terms.


This is correct. DuckDuckGo is finally better.


My experience is the opposite: while there certainly was a time when I felt DDG actually respected my queries [1] that time is now gone. The results I get often have very little to do with what I typed in the search box. I find myself resorting to !g more often than ever before.

[1] I seem to have thought so in last December: https://news.ycombinator.com/item?id=18665232


This was my experience as of a year ago. Nowadays when I do a search on DDG (my default choice) the results are terrible. Then I add !g and the results are even worse. I’ve had so many recent searches for information on technical subjects result in abject failure, leaving me to throw my hands in the air in frustration.

It’s gotten so bad that I’ve installed BasiliskII and SheepShaver [1] just so I can relive the nostalgia of the days when I had my first Macintosh, before I’d even had access to the Internet for the first time. The help system and the documentation for software back then was so much more exhaustive than it is today!

[1] https://www.emaculation.com/doku.php/mac_emulation


After the last bout of anti-google posts on HN about a month ago, I took the comment advice and switched fully to DDG.

My experience was horrible. I don't tend to search 'popular' subjects, only technical ones, and hobbyist stuff. DDG was just plain useless for this, I had to switch back to google after about a week as I was using the !g prefix almost all the time.


> while there certainly was a time when I felt DDG actually respected my queries [1] that time is now gone.

I sometimes wonder if they've hired some ex-googlers lately because yes, this is my experience as well ;-)


If I had one word to characterize the modern web it would be "shallow". SEO and commercialization have led to a world wide web where I can easily find 100 shallow, keyword-optimized articles on "machine learning for IoT" published on high-ranking websites, but not a single page with actual in-depth information about the topic.

But there still is great content on the web, it just becomes harder to find in all the noise. I think websites like HN and Reddit and - to some degree - sites like Twitter with their human-based curation are really important for this, so I'm glad they're thriving.


Early SEO efforts were actually good for the web. They forced you to make your content easier to find and more accessible.

But then that became the table stakes, and people had to start resorting to dirtier tactics. Or just more annoying ones.

During this shelter in place period, I've been reading a lot more recipes online. Every one of them starts with the person's life story. And I get it, maybe how that recipe came to exist is interesting. But at least put a link right at the top that says "skip to recipe" or something. Sometimes I want to read the story, sometimes not. Make it easy for me to skip!

I put a recipe on my own website, it's literally a .txt file with just the recipe, ingredients right at the top. I posted a picture of the final product on the internet recently, and a friend asked for the recipe, so I sent him the link.

He replied "the format and delivery method of this is almost more satisfying than the recipe itself".

That's how I know we've gone too far in SEO.


It's good ol' Goodhart's Law, all over again.


The article linked here has a good point it is trying to make, but makes a number of false points that undercut its goal.

1998 - 2003 was one of the most difficult times to find what you were looking for, even on Google. Many searches for basic information would return results buried in spam pages, pornography, and scams.

Deleting old content to manage "crawl budget" is a myth and does not work or help your SEO.

The real problems are that Google is directing the bulk of traffic to certain brand name websites. Another real problem is that Google set a simplistic AI with a goal of increasing clickthrough from search results and decreasing bounce rates. This leads to developers building all those top 10 lists where you have to click through each item (harder to bounce that way), and some of the pages that disable the back button in various nefarious ways.

I also agree Google should be showing smaller websites more frequently - perhaps optimize for a different goal than the one listed above. More weight on keyword matching perhaps or maybe following only a few "authoritative" users CTR & bounce rate habits.


> Deleting old content to manage "crawl budget" is a myth and does not work or help your SEO.

Typical SEO cargo cult behavior. This worked at one time, or at least it seemed to work, so we'll just keep doin' it.

I can't really blame SEO people, though. As long as Google keeps its algorithms secret, I'm not sure what else they're supposed to do except publish good content and hope for the best - which, in an ideal world, would be good enough, but…


I agree, it's easy to hate on the current system, and I think the ads are getting way too similar to the real results now. But all those websites with a massive list of names/places at the bottom just to rank higher in search: that was the real worst time. You'd search "Handyman in Leeds" and the top results would be for a company that wasn't even in the right location but was big enough to rank highly and had "Handyman in Leeds" in hidden text at the bottom of the page.


> and some of the pages that disable the back button in various nefarious ways.

When a site is acting up like that I just click and hold, and this brings up an extended list of previous pages. I kinda assume everyone knew about that feature; it seems to work in all major browsers.


Wasn't SEO, it was Google's "Suggest over Search" strategy, followed by the completely predictable bastardization of organic results by internal groups.

Biz ops says they can increase revenue for [random Google bs] by ranking Y over Z in the results, so it happens. M&A says Rotten Tomatoes won't give us all their data, their users, and their firstborns, so they won't show up, even on page 2.

This is literally what antitrust was created for. Companies do this naturally when they get too successful, it's on us to remind them who pays the bills.

By us I mean the US govt so we're basically fucked.


>By us I mean the US govt so we're basically fucked.

I'd imagine Mr Trump tweeting: "Google is broken. I am the only one that can fix it."


I’d argue that having the internet limited to a handful of gatekeepers, all of whom are sustained by ad dollars, is probably far more responsible for ruining the internet.

I find it hard to believe that in a world where Google and Facebook’s users were its customers and not its product, it wouldn’t be able to find a way to combat SEO effectively, especially considering how they are basically hoarding the majority of the smartest people in the world.


I worked in SEO for ages, and it's shady as f. You can buy links from anyone: the BBC, the Guardian, the Times... It just costs money. You can ask/force people to take links down (copyright scare, sue, threaten).

Fake blogs, we used to run a bunch; some became so popular they became actual blogs on that subject. We'd get money from competitor SEO companies for links on them. There are tons of niche subjects with no info on the internet. We'd often put it up for SEO purposes. Wikipedia was started by an SEO company. The rumour was that Wales started it as a cheap way to get high-PageRank links to sites he owned.

But the rewards were huge. Get a struggling car insurance site from position 11 in Google to position 2 or 1 and their profits would be 10x. They would show us numbers from each advertising sector: SEO, radio, TV, newspapers, etc. Super interesting. Break that down by age / gender... Very interesting.

No wonder the internet is a shit show. So much money involved and zero regulation.


A MAJOR problem with Google is its assumption that if you search in English you don't care if the top results are American.

I've noticed that on google.co.uk, unless you add 'uk' at the end of your search query you'll always get US sites first [1]. Google clearly lump all English-based queries into the same geographical bucket - they never used to do this.

---

[1] Yes, I am logged in, and Google knows where in the world I am.


It's even worse when you are trying to search in a language that is not English from a region that is English speaking.

I often look for recipes in French or German and it's impossible to find anything. I try to browse to google.de/fr, I try adding site:.fr, but it will still try and give me the most English results. My Google search settings indicate I am happy with German and French results but it seems to have no effect...


I have the reverse problem: Google would happily return me Polish-language results for everything, where I almost never want that. For the types of searches I do, there's almost never anything worthwhile in my native language. This, + their annoying attempt at detecting which country I'm in (so I would get German results when in Germany, and Chinese results when in China) led me one day to figuring out the magic URL incantation to force English results, and replacing the search in omnibar with that.


Do share, please.


I don't have the original anywhere; I used it exclusively on my work laptop two jobs back, which was the only one that moved internationally somewhat frequently (for the rest of my machines, I clicked around Google settings until I got English-language results and that usually stuck). But I think it was https://google.com/search?hl=en&q=%s.


> Yes, I am logged in, and Google knows where in the world I am.

They never seem to know where you are in a way that would actually be useful to you, but always do for their own creepy tracking reasons.


I am not logged in and I always get UK search results even if I search on google.com.


It doesn't matter which Google TLD you use: .co.uk, .ca, or whatever, it will still give you results tailored to you, not to the TLD. I just did a random search on a random country TLD and still got a result at the top for my city.


Just use a different search engine. Right now DDG is the only viable alternative. Just force yourself to use it, regardless of all the edge cases that suck. When DDG becomes as crap as Google, we can use whatever alternative exists at that time to replace it. The same goes for Instagram, it's slowly but surely turned into an ad infested cesspit (Three consecutive ads between user stories? seriously?). This is how the cycle goes I'm afraid.


I think most spam websites today could be filtered out with very simple algorithms. But that would lead to fewer people ending up on these websites and clicking ads. So if your search engine is also an ad network, filtering out spam websites is not in your interest.
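As a toy illustration of the kind of "very simple algorithm" this comment has in mind: a keyword-stuffed page tends to repeat one term far more than natural prose does. The heuristic and both thresholds below are my own invention for illustration, not anything a real search engine is known to use:

```python
from collections import Counter

def looks_stuffed(text, max_top_ratio=0.08, min_unique_ratio=0.3):
    """Toy heuristic: flag pages where a single token dominates the
    text, or where the vocabulary is suspiciously repetitive."""
    words = [w.lower() for w in text.split() if w.isalpha()]
    if len(words) < 20:
        return False  # too short to judge
    counts = Counter(words)
    top_ratio = counts.most_common(1)[0][1] / len(words)   # share of the most common word
    unique_ratio = len(counts) / len(words)                # vocabulary richness
    return top_ratio > max_top_ratio or unique_ratio < min_unique_ratio
```

Run it on the kind of "keep reading to know why SEO ruined content" filler quoted elsewhere in this thread and the dominant-token check fires immediately; ordinary prose passes.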


This is the real problem.

SEO will always be a game of cat and mouse. The original algorithms were designed to surface useful content relevant to the search query with the limitations of the technology at the time (so they could be gamed).

Nowadays technology has improved and processing power is much cheaper so it should be possible to use machine learning to recognise what’s “good” and what’s SEO spam and thus get ahead of the SEO crowd again.

The problem here is that the spam sites are also the ones with ads (often Google ads), so there is no financial incentive for Google to actually do anything about those.


I think SEO mainly affects content which was already very low quality. Recipe blogs are a prime example - this is the worst way to get recipes. It is inferior to books and apps (which are more useful in the kitchen), is poorly indexed, ephemeral (depending on the wordpress knowledge of the owner), lacks local context, and has none of the rigour and pragmatic advice of something like seriouseats. They deliberately use weird ingredients, I find, perhaps to prevent you from falsifying the quality of the recipe and also to differentiate themselves from tried and trusted recipes (which is what most people want!). The instructions you get on the back of the flour packet are superior to recipe blogs; at least they should work for that type of flour.


The article doesn't present complete facts. Regarding the zero-sum game, this perhaps was true in the old PageRank algorithm. But I'd believe Google's ranking algorithm has advanced beyond simple keyword density and passing links. What I've noticed is it now gives much more emphasis to user experience. (With metrics like bounce rate, where the searcher didn't find what they were looking for and went back to the search results.)

We all like to shit on Google but there's no search engine even remotely close to the quality of results. Of course, there's a lot of spam associated with SEO, hacking attempts, spam comments, e.t.c. There are side effects of its algorithm of course, that are negative to web.


How does Google determine bounce rate? That I click on another search result after I click on the first one?


Google search result links don’t link to the site directly but go through a Google-provided redirect that presumably has a reference to the original search query.

If you were to go back to the same search result page and click on another result within a short timeframe they will assume you “bounced”.

They also have Google Analytics littering the majority of the web, so I’m assuming that gives them a signal as well.
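A sketch of the "returned to the results page quickly" signal described above, assuming the engine has a per-query log of click timestamps. The 30-second window is a made-up threshold; how Google actually computes this, if at all, is not public:

```python
BOUNCE_WINDOW_SECONDS = 30  # invented threshold for illustration

def bounces(click_times):
    """Given the timestamps (in seconds) at which one user clicked
    results for a single query, count clicks that were followed by
    another click within the window -- i.e. the user came back to the
    results page quickly. The final click is never counted as a bounce."""
    count = 0
    for earlier, later in zip(click_times, click_times[1:]):
        if later - earlier <= BOUNCE_WINDOW_SECONDS:
            count += 1
    return count
```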


It's more complicated than this.

Google links directly to search results now, they stopped using the tracking redirect years ago. They probably track clicks and scrolling directly on the SERP with JS.

Google Analytics is absolutely not used for ranking purposes. GA is far too unreliable and gameable to be used for anything like that. It's more likely that Chrome and Google Safe Browsing are used for tracking user hits.


But websites began splitting small articles into 20 pages you have to click through instead of scrolling, and they disable the back button.

I'd rather open a new tab from the search results and close it than click the link and go back. I imagine many people do that.


I really dislike posts like this on HN. It’s essentially whining. The author does not suggest any alternative, does not offer any ideas of their own, and just laments the state of the world.

Instead let’s upvote articles on how to build search engines, how search indexing could be improved, how Google’s search works, etc.


The upvotes on this post say something else. Personally, I value the discussion more than the post itself, so as long as there's a healthy discussion the post checks out.


Why does an article have to do both: raise the problem and provide a fix for it? Do you also expect journalists to catch serial killers?


There's a difference between breaking news and lamenting the state of the world. Journalists don't write articles that say nothing but "Serial killers are terrible. I want to live in a world without serial killers."

We can agree to disagree. I personally do not find any value in people complaining without illuminating the problem they're complaining about, and seeking or proposing a different vision or solution.


This article is not just whining, it is detailing the ways that SEO has impacted the internet, many of them not obvious to most people.


So if my car breaks I am not supposed to say it is broken unless I provide detailed instructions on how to fix it and on how a car should be engineered to avoid being broken?


Google with default settings is useless and using more sophisticated queries quickly walls off the user with increasingly annoying captcha.

The "world's knowledge under you fingertips" motto is still valid and brilliant though. My personal solution is library of OCR-ed PDFs with most established books from various domains, git repository for each domain. Greppable in miliseconds, locally. Hijack this, SEO experts!


Your personal solution sounds interesting. I'd be interested to know more details about it, and how it works for you. If it is how I imagine it, it could also be one of those things that could be built into a tool that could rival Wikipedia or search engines.

Based on your reply to handsomechad, you may think that it's easiest for people to just build one themselves. But there may be a business opportunity in providing a pre-packaged solution for the masses. In the same way that Dropbox provided a tool that was "trivial" to experts, but was difficult for non-experts (see the infamous comment here: https://news.ycombinator.com/item?id=9224), if you have a tool that is essentially a rival to Google Books for reference texts, that is interesting.


There's b-ok.org


How did you acquire the books? Are they in the public domain, or did you have to buy them? In either case is there a place to acquire/buy these books massively or did you do it one by one manually?


Whichever most convenient way to obtain a full restriction-free PDF of a book. Fetched one by one through various channels in my case. Very few are in the public domain, if any. BTW one can do the same with academic publications, device manuals, or whatever else content available in PDF.


do you have a link to this solution


ImageMagick and Tesseract for OCR-ing each page of a PDF into a separate text file (through TIFF image format, disregard the huge TIFFs afterwards), private git repos for hosting, then ag/grep for searching. Not as easy to find the phrase back in PDF as with eg. Google Books, but then GB with copyright related content restrictions is useless most of the time.
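For the searching half, here is a hypothetical Python equivalent of the ag/grep step, assuming a layout where each book gets its own directory containing one OCR-ed text file per page (the `page-<n>.txt` naming is an assumption for illustration, not the commenter's actual scheme); keeping one file per page is what lets a hit be traced back to the right page of the PDF:

```python
from pathlib import Path

def search_library(root, phrase):
    """Case-insensitive grep over a library laid out as
    <root>/<book>/page-<n>.txt. Returns (book, page file, line)
    triples so a match can be located in the original PDF."""
    hits = []
    for txt in sorted(Path(root).glob("*/page-*.txt")):
        for line in txt.read_text(errors="ignore").splitlines():
            if phrase.lower() in line.lower():
                hits.append((txt.parent.name, txt.name, line.strip()))
    return hits
```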


I feel like I know enough about the internet that I almost don't need google anymore. (the search algorithm is great, but not the sources it provides)

At this point, I have built a set of "sources I trust" and use google as a tool to internally search their websites more than anything else.

    site:source.com "search query"
Is pretty much how most of my new searches go. If anything, I have stopped trusting 1st page results on google.

1st page guitar tabs are the most vanilla chord strum patterns. 1st page recipes are some "americanized - SAHM blog" version of the real thing. 1st page news is a sensationalized link to CNN or Fox, that doesn't quote 1st sources. 1st page game reviews are IGN and 1st page movie reviews are Rotten Tomatoes. For anything more niche google results wikihow or quora, when reddit almost certainly has a better answer somewhere.

The 1st page of Google search returns perfectly average results. But, I have stopped expecting any 'perfect' or 'great' results from it.
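The site:-scoped habit described above is easy to automate; a minimal sketch that fans one query out across a personal trusted-sources list (the list itself is whatever you happen to trust, these are just examples):

```python
def scoped_queries(query, sources):
    """Turn one search into per-source 'site:' queries, one for each
    trusted domain, mirroring the manual habit described above."""
    return ['site:{} "{}"'.format(s, query) for s in sources]

trusted = ["seriouseats.com", "reddit.com"]  # example list only
```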


Is there anything I can do as a user of the web about this? Is there a search engine out there that is better for these things than Google?


To a certain extent, any search engine will be "better" for avoiding some SEO tricks than Google since somewhere between most to all SEO people are only concerned with how things rank in Google, thanks to its overwhelming majority usage on the English internet - where their pages rank in the results of Bing or Yandex isn't a matter of concern to them. Granted, many of the tricks to satisfy Google are going to satisfy other engines too, but not all of them, and not in exactly the same ways or degrees. (Getting a different result page from Google and from search engine X can be a feature, not a bug.)

I personally have been using DuckDuckGo for several years at this point and am quite satisfied with their results and their commitment to privacy. You shouldn't use any Google product if you have any serious measure of concern for your privacy (he hypocritically types while a YouTube video plays in the background - hey, at least I'm not logged in).


Well, there are other search engines of course, and sometimes they're slightly better. Not great, though.

I'm really itching to start writing my own search engine now.


This is pertinent and don't think the easy answer is yes, as soon as any other search engine gets big enough the same gaming will occur.


The situation has gotten so dismal that I often have to look past page 1, and sometimes page 2 of search results.

Especially when looking for technical things or reviews on practically anything, the top 5 results are garbage sites (with content written or modified by people with names suggesting an SEO "content" factory in a particular region of the world).


Recently, on page 2 and later, and sometimes even on page 1, I see lots of *.it links that just dump a wall of text that seems to have been scraped from legit resources, with all the tags' content smashed into one text block. Example pages that now return 404:

http:// axlk.bebanni50.it/debian-i915.html

http:// bpnq.circoloambientalepiemonte.it/nfsv4-uid-mapping.html

IOW, if it's not on page 1 it's mostly crap (spam).


> The situation has gotten so dismal that I often have to look past page 1, and sometimes page 2 of search results.

That's hardly dismal. That's not even mildly inconvenient.


The r/juststart crown strikes again.


> I remember when it was easy to find logic, facts, and reason on the web. Then, someone optimized it.

When was that, exactly? In 1998, when only 3% of the world had access to the web and creating content was limited to a small handful of privileged individuals? This author rails against "Directing the narrative" while building a site with a thin sliver of links related to a handful of topics they deem worthy of inclusion. They offer no solutions for an internet that serves 3.5 billion people, choosing instead to whine about how much better the internet was "back in the day."

I don't think Google is blameless, but I think they are more of an inevitable byproduct of this many people coming online than they are a root cause.


SEO combined with Amazon (and other) referral revenue opportunities. Combined with human psychology and ignorance outside of the tech community of the wiles of online marketing. Honestly Hacker News is one of the few sources I trust nowadays, and that is of course not implicit. There is a lot of misinformation and a lot of reporting about reporting. A lot of hyperbolic and misleading headlines. Stay safe out there.


Google has also lost much of its memory. Searches that many years ago turned up results from the early '90s now return nothing.


This is the result of Google trying to compete with Facebook. The search results page is their answer to the news feed.


For many of my searches, I simply type "search term xyz Reddit" to find more relevant results than what Google would throw at me.


Anybody else remember when you could search for a phone number and find legitimate web pages that contained the phone number?

We lost that a long time ago.


I just tried this out, and it works fine for at least one of the local organizations here, both in Google and DuckDuckGo (530-926-4698).

I just tried it with my own number, and my site comes up with both search engines as well.


If I do that, most of the time the first few hundred results are pages which just list all the phone numbers there are, consecutively.

Never understood why there are so many of these pages, or why you would build one...


Ad revenue.


Well yeah, maybe in former times, when ads were paid per view. I don't know anybody who is still doing this. Nowadays (for 10+ years already) ads are only paid per click, because of "conversion".


Content today is written for machines, not for humans.

There is no bigger turnoff than coming across waves and waves of listicles and "alternatives to" articles that provide no real insight, beyond dumping a bunch of links, adding 1,000 words of nothing, titling it "Ultimate Beginner's Guide to X in 2020" and calling it a day.

Sometimes, to find "actual person" content, I'll add "reddit" to the end of my search query, but that won't be enough, as some enterprising content marketer has decided that they need to rank for those searches too, and created posts like "What reddit thinks about X".

SEO is the symptom, not the problem. The problem is that businesses create this kind of low-quality "linkbait" as "inbound marketing", which is getting eyeballs on a page (by whatever means necessary) and upselling their own services. That's why the content feels so soulless.


> that won't be enough, as some enterprising content marketer has decided that they need to rank for those searches too

Would it help your goals to use "site:"?


One might add social media to that list as well. If there's a Dunbar's number in real life, there definitely is one for online communities too. https://en.wikipedia.org/wiki/Dunbar's_number


This article is good, but to go a bit deeper, it seems like part of the problem is ultimately capitalism or the commercial nature of the modern internet. As long as there are these incredible incentives to game the system, people are going to do so. And those same incentives apply to Google, since they are advertising driven too. They used to fight more against this stuff, but I think as they've realized they get their cut either way, they're less inclined to do so. And even if they were so inclined, it's sort of asking them hold back the ocean in my view, because the incentives are so stacked against them.


I think Google is to blame for allowing this.


Not just "allowing", but directly causing. Google were the ones who first monetized links by treating them as a search signal.

As soon as links became a signal to search engines, they stopped being an organic expression of page authors.

What's that old saying about metrics? "You get what you measure."


As I see it, maybe 1% of the web is not junk.

I think we need an alternative search engine based on community vetting.

I.e. you submit your URL to the search engine, but it isn't shown in all search results until a certain number of real people say it's meaningful and useful. You can do that by showing it in only 1% of search results at first and asking people to rate it.

You don't have to index 10 billion websites, you just have to index 1 million of useful websites.
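A rough sketch of the 1%-exposure vetting idea, where `TRIAL_RATE` and `VOTES_NEEDED` are invented parameters and the voting model is deliberately naive (a real system would need to defend against ballot stuffing):

```python
import random

TRIAL_RATE = 0.01    # show an unvetted candidate in ~1% of searches
VOTES_NEEDED = 50    # made-up graduation threshold

def visible_results(vetted, trial_pool, rng=random):
    """Results for one query: all vetted sites, plus -- in roughly 1%
    of searches -- one candidate site that is still being rated."""
    results = list(vetted)
    if trial_pool and rng.random() < TRIAL_RATE:
        results.insert(0, rng.choice(trial_pool))
    return results

def record_vote(site, votes, threshold=VOTES_NEEDED):
    """Count one 'meaningful and useful' rating; returns True once the
    site has enough votes to enter the main index."""
    votes[site] = votes.get(site, 0) + 1
    return votes[site] >= threshold
```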


Agreed. If you'd like, you can read an article where I propose how such a system could be implemented: https://medium.com/@TautviIas/it-is-time-to-create-a-decentr...


I wonder if Medium would get included in the index.


SEO effectively means catering to whatever metrics Google happens to be focusing on at the moment. It's supposed to reward "good" content, but there's really no way of automatically judging what's "good" content so Google relies on all these other methods that are open to abuse.

Whatever way Google rates websites has a direct effect on the web itself. In a way they're a victim of their own success.


I wonder if some real competition in search would help things. It might be harder to create SEO that “fools” a handful of different search engines than just creating quality content. I’m not sure how we’ve come to accept a near total monopoly on search so willingly.


Or rather we are the victims of their success.


Partially true because of how pervasive they've become, they're powerful enough to influence the lawmakers in my country. I don't really see myself as a victim though, I just use different search engines.


It would be good to have a search engine algorithm that searches visible content only and places anything with ads and trackers last.


That would require the search engine itself to not be funded by those same ads.


I hate SEO. Google's ranking algorithm is basically an accretion of scar tissue built from years of SEO bullshit mitigations. If you make a ranking algorithm that does right by the user, and you're successful, it won't be long before SEOs come along and poison the well.


> erasing the past

Wouldn't it make sense to archive the old articles and put a no-crawl rule on them? Deleting just seems extreme no matter how you look at it. That being said, content that is controlled by such people is probably not worth keeping.


So, someone pays a lot of money hiring people to farm content repeating the same meaningless expressions again and again, to get some visitors from Google so they can show them some ads and make money.

Wouldn't it be more productive to hire people to write meaningful and interesting content? That way they wouldn't just have visitors tricked to come from Google but also a constant following.

The only reason I see for junk content is lack of imagination.


Maybe it's time for a search engine developed and run by the community in the same way Wikipedia is.


Just look up a recipe for something and this fact instantly becomes apparent


That's exactly why you're seeing this new generation of startups building high-fidelity content through their own work and crowd-sourcing (like AskFinny for personal finance): you can't trust Google any more; it's full of affiliate-led promotions.


Ironically, the solution is old Yahoo, back in the web directory days.


I'm actually really glad to have seen this and the comments. This has something I've been prattling on about to anyone who will listen for the last couple of years now. Glad to be somewhat vindicated!


I think good content is more than shitty HTML code.


Not only was this entirely predictable, it was actually predicted.



