I’ve been pretty frustrated with Google lately. Maybe it’s me, but they seem to keep skewing their algorithm to ignore precise queries and always show broad matches.
For example I recently searched “US ambassador Germany” on Google News and the top results were all stories about Ukraine and the impeachment hearings. Google completely ignored “Germany.”
Similar things happen to me all the time on Google Search. Perhaps I just search too many long tail queries?
Having been on the fringes of the SEO industry for many years, my theory is that the spammers and algorithm gamers have slowly but surely won out. This has forced Google to de-emphasize keyword relevancy and instead prioritize content source: large, well-known brands with high domain authority have captured an increasingly large share of queries over the years.
Someone at Google will probably disagree emphatically, but it seems to me like this represents a bit of an existential threat to web search. There's so much money in being #1 for popular queries that countless actors will throw enormous resources into getting there.
I don't think it's SEO, but Google adapting to its less technical userbase (people who search for "facebook" and click the first result).
It makes sense for this target audience to give broad results. They seem to be relying more on neural language models to accomplish this.
Google also surfaces more authoritative results than Bing; it thus favors bigger commercial sites.
Google is surprisingly good at vague searches (for instance, describing a singer's appearance and song themes will return the name), but getting worse at "conspiracy"-type searches (you will not find many amateur sites discussing the high-class escort past of the First Lady, because those sites lack authority; all results focus instead on the massive fine for posting the allegations).
Then why not further personalize search? I know that search results already vary according to things like the user’s location, but the entire search algorithm should be tweaked according to the user. Machine learning could potentially do a good job of this.
Google can also allow users to self-select. The way to do this would be to have a survey with multiple sliders that allow the user to demonstrate, on a scale of 1 to 10, how much they care about a certain parameter. There would be default profiles that automatically set the sliders for you, like “academic.”
Even if Google doesn’t take this route for global search, they can add more features to advanced search. Google Advanced Search is underpowered, given Google’s technological capabilities.
Finally, with regards to the Advanced Search approach, Google can allow the user to create several search buttons that always appear on the Google front page. For example, an Arab American researcher may have a default search button, an Academic search button, and an Arabic search button, all of which can be tweaked along multiple parameters, as described above.
There seem to be very few "sliders" in today's web. I suspect all kinds of advanced functionality would go directly against what those companies are all about.
Google and SEO expose a fundamental issue with the web itself. There was a time when Google could simply search for websites, but that's no longer possible in the way it used to be. Google has become so dominant that the web itself changes to accommodate Google.
Google can no longer be an independent observer, as the act of observation changes the results!
That's really unfortunate. If this type of brand-reliant search ranking continues it will only accelerate the centralization of the web toward big entities.
Halfway through your post you moved the goalposts from News to Search. When I search for the query you gave on google.com, I get a knowledge card about Richard Grenell. When I search on News, I do get articles about Ukraine. Are there any recent news articles about Richard Grenell or one of his predecessors though?
John Kornblum, a former US ambassador to Germany, often appears in German media when they need somebody to "explain" the USA or tell stories from the Cold War era. So he can be found in German media; how much of that finds its way back into English-speaking media I haven't checked.
So much this. You have to order your terms from least popular to most popular to get accurate results; there seems to be a weighting penalty based on how far a word is from the start of the query.
If you search "Javascript issues Rshiny" (this is just an example, not the actual query), you get only JavaScript results for four or five pages. When you reverse it to "Rshiny Javascript issues", the results are better.
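The theory above can be sketched as a toy scoring function; this is purely illustrative (the decay factor, the function, and the example pages are all invented, and nothing here reflects how Google actually ranks), but it shows how a position-based penalty would make term order flip which page wins:

```python
from typing import List, Set

def position_weighted_score(query_terms: List[str], doc_terms: Set[str],
                            decay: float = 0.7) -> float:
    """Each query term's contribution shrinks the further it sits
    from the start of the query: 1.0, 0.7, 0.49, ..."""
    score = 0.0
    for i, term in enumerate(query_terms):
        if term.lower() in doc_terms:
            score += decay ** i
    return score

# One page only about JavaScript, one only about RShiny:
js_page = {"javascript", "issues"}
rshiny_page = {"rshiny", "issues"}

q1 = ["Javascript", "issues", "Rshiny"]
q2 = ["Rshiny", "Javascript", "issues"]

# Under q1 the JavaScript-only page outscores the RShiny-only page;
# reversing the term order (q2) flips which page wins.
print(position_weighted_score(q1, js_page), position_weighted_score(q1, rshiny_page))
print(position_weighted_score(q2, js_page), position_weighted_score(q2, rshiny_page))
```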
I have found as of late that using natural language search terms works much better than the old keyword soup you'd have to feed into AltaVista or something.
i.e. "how do I change a ford taurus headlight" vs "taurus service manual headlight"
It's not just you. I find myself increasingly wrapping words in my queries in quotes because Google outright ignores relevant words in my searches. In some cases it has even led to queries where nearly every word is quoted.
I have nothing to point to, but my experience is that enclosing terms in quotes makes no actual difference.
Google search has been getting increasingly worse for me over the last five years or so. In terms of the quality of search results, it has no edge over DDG.
I stand corrected, I wanted to provide some examples but I cannot reproduce this anymore.
From memory, I do recall the +/- and "" operators being overridden if there was a "popular" search query that returned broader results than my specific query.
Specifically, the "" operator would be dropped, so instead of showing only results that included the quoted expression, it would show broader results as if the expression had never been quoted.
There was a time, several years ago now, where that was happening pretty regularly to me. It's gotten better, but since 2009 or so Google search quality has been on the decline for me. It only works well when I use things like site:blah.blah or quoted searches, or something I happened to search for and click on before (but couldn't find in history or had found on another computer). Finding novel things is a pain.
I do this sometimes, and it still ignores the quoted words in favour of giving me whatever results they want. I'm not sure when it was changed to no longer require quoted words in the results, but I usually just give up on the search at that point.
It frequently notifies you that it's ignoring a term in your search to give you a particular result: you see those "Doesn't contain: X" messages associated with a "must contain: X" link to override it... which I think just does the query over again but with that term in quotes.
Yeah but what I meant was that this is becoming increasingly frequent. I mean you would think that Google would try to include pages matching your full query in the top results and then when you keep moving to subpages it could give some less related hits as well. Now I often see these in the top 3 results.
Yes, in general I'd agree with the sentiment that Google trends more toward trying to show you "popular" content and changing your search to fit that, rather than the other way around.
With the amount of stuff that gets shoved into Google search bars worldwide, it's easy to believe that kind of strategy is broadly beneficial, but it does lead to frustration when you're being precise with a query and you're overridden.
That's actually not what's happening in OP's search (at least when I run it): Germany is included in every single snippet I see, it just isn't relevant for most of them. "ambassador to germany" works, though.
It's not just you; there are memes now comparing Google search results and Bing results, with Bing giving more mechanically specific responses.
Google has "overfitted" because their A/B test told them messing up more queries gets them higher "engagement". They are masquerading engagement as satisfaction, clouded by their incentive to have people dwell on ads longer.
I'd agree with this assessment. Searching for anything technical seems to especially have terrible results. I'm getting to the point that "verbatim" and "this year" feel like they should be defaults. Oh, and it really, really needs a "filter out anything that looks even remotely like a top 10" switch.
I've been pretty frustrated with HN lately. Every time Google search is discussed, someone comments with a Google News search that fails for them, despite it being almost completely irrelevant to the conversation.
You're comparing a web search to a news search. If you do the same search on google.com (not news!), you get good results. Including a picture of the US ambassador to Germany.
I'm going to second this for web searches. My results lately for either geopolitical or tech have been oddly broad as well. Even though I try to find specific information on a topic, the website results (I don't do their news searches often, can't speak to that one) always have a "trendy" vibe to them. Even if I'm looking for tutorial or technical info, there's been more search-result attention devoted to puff-piece blog articles roughly related to what I want. Like they are more focused on the political aspect of what I want instead of giving me what I want.
I've been noticing it for the past 3 or 4 weeks. I've been thinking my google-fu has been weakening. Glad to know I'm not the only one.
I have certainly been disappointed recently by Google search results when searching for certain information on microcontrollers and the Forth programming language. Forth gets corrected to "fourth" and/or is found in the idiom "back and forth".
Out of curiosity, do you do your searches with JavaScript disabled or logged out of your Google account?
The engine should generally pick up relatively quickly that when you specifically search for "Forth," you usually mean the programming language and not a typo. Catching these differences in use case is why customized search is a feature.
Google search has moved away from keywords and towards queries.
"US ambassador Germany" makes it hard to comprehend what you intend. Do you get better results for "who is the US ambassador to Germany" or "news about us ambassador to Germany?"
For a few years, I got better results with natural language queries. However, over at least the last two, I've been getting better results by just including key words. The order of the words matters, though.
The worst part is the corporatized results. For example, if you search for a movie, Google returns corporate newspapers' reviews first, so you need to scroll down to see useful aggregators and go to the second or third page to see relevant forum reviews or independent reviews.
Google properties have been skewing to less relevant news and corporate results (search, news, maps, and even YouTube).
It's a shell of its former useful self. Hopefully DDG and other search providers step up their game to put pressure on Google.
Google is just another example of a monopoly gone bad.
In my attempt right now, I do get many articles about Ukraine, but every single one of them includes the word "Germany" in some form.
I believe this is an effect of Google News emphasising recency in its ranking algorithm.
Interestingly, your complaint is almost the opposite of the one often made on HN, namely that people would prefer a somewhat dumber but very literal search interface. It also shows why that wish's fulfilment might end up disappointing people: in the snippets I get, "Germany" gets included in these articles in all sorts of irrelevant ways, including navigational links that really aren't part of the story, etc.
I would hypothesize that "us ambassador germany" is actually a hard query to suss out the meaning of ("Does the user want to know about the US ambassador to Germany? The German ambassador to the US? Some news snippet involving Germany, the US, and any ambassador?")
Perhaps the answer is simply that you have disabled user-tracking (either explicitly in Google's settings or by disabling cookies), so Google doesn't know how to serve you best, and you end up with the best possible answer for the "average" person.
I've actually found that the search results get better when Google can't customize the search results for the particular user. It's still not great, though.
For a few years, I've been blaming the reduction in the quality of search results on this attempt to "know how to serve me best". I still think that's a huge part of the cause, but I'm not so sure it explains everything.
It's completely clear. "Germany" gets dropped as there are overwhelmingly more recent stories about the "US ambassador" testifying at the impeachment hearings.
There are three things going on here, one of which is probably bad and the others pretty normal.
- humans ranking results: these are called evals and Google has been doing them for at least 15 years. It used to be that there were a sort of "golden" set of manually ranked queries and results, and any change in the search algo would run against them to make sure nothing broke. My guess is over time, as the algorithm became less heuristic and more AI-driven, these sets are instead used to train the models.
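A hypothetical sketch of the "golden set" regression check described above: human raters curate an ideal result order for a set of queries, and any candidate ranking function is scored against that order before it ships. All of the queries, sites, and the agreement metric below are invented for illustration; they are not Google's actual eval data or methodology.

```python
# Curated queries with their human-rated ideal result order (invented data).
GOLDEN_SET = {
    "python list comprehension": ["docs.python.org", "realpython.com", "stackoverflow.com"],
    "us ambassador germany":     ["state.gov", "wikipedia.org", "reuters.com"],
}

def eval_ranker(ranker) -> float:
    """Fraction of golden queries where the ranker's top result
    matches the human raters' top result."""
    hits = sum(
        1 for query, ideal in GOLDEN_SET.items()
        if ranker(query, ideal)[0] == ideal[0]
    )
    return hits / len(GOLDEN_SET)

# A ranker that reverses the candidates fails every golden query;
# one that preserves the rated order passes all of them.
print(eval_ranker(lambda q, docs: list(reversed(docs))))  # 0.0
print(eval_ranker(lambda q, docs: list(docs)))            # 1.0
```

The same golden set can double as training data for a learned ranking model, which is the shift the comment above guesses at.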
- blacklisting of misinformation or spam: as a user, I appreciate this. As a citizen, I'd rather not trust Google as the arbiter of truth, but I generally think they've made good choices here (vs other tech companies). Until society has solved this problem (through regulation or otherwise) I'm OK with this.
- giving advice or favors to advertisers: this is messed up. And will be hard to argue that if they do it for advertisers they don't also (anti-competitively) do it for their other internal properties.
This is going to be based on conventional wisdom. The evaluators are a biased representation of the general public. What do most people think are conspiracy theories?
Despite all the policy guidelines, in the end it's not just a legalistic decision but more like a survey to see what people think.
If you're running a search engine, you need to keep track of what people think somehow, or you'll totally miss things like cultural issues.
This (and other things, like not knowing who the manual raters are, or what cultures they come from and bring to the conscious and subconscious table) should be of concern.
Depending on what region of the world the publisher is in, what region his/her readers are in, and where the raters are from, this can cause real issues.
Some years ago I sat in a class at a college for building contractors when the subject of burying electrical lines came up. People scoffed at the cost; then the teacher did a survey and found that 90% of the class thought radon gas was a fairy-tale conspiracy theory, not real, and not something they would ever consider when building.
Half of those in the class had already been in the building professions for years and some cases generations.
"...even though company leadership has said publicly, including to Congress, that the company does not use blacklists or whitelists to influence its results."
Ummm, isn't that providing false testimony and considered perjury? Why is this shit never, ever prosecuted against a large corporation? It's simple: you know who lied under oath, and it's recorded. You know that person knew the truth, because they were a high-ranking person whose job it would be to know stuff like that. So either they committed perjury or they are grossly incompetent. And it's not like they didn't know what was going to be asked of them, nor did they state they weren't sure. Can't stand this crap. The two standard requirements are that the person must have intent to deceive and that the information must be material to the proceeding. Since the discussion entirely surrounded this aspect of search, there logically must have been intent to deceive. Furthermore, being the central topic, it was most certainly of material significance.
Probably not, but those lists are more than just dangerous and deceptive sites. They also include things that individuals at Google manipulate based on their own opinions which can be one-offs or added to the lists.
From the article:
"The engineer "didn't want a piece of human anatomy or the description of a sex act to appear when someone started typing the singer's name," as the paper describes it. The unfiltered search results were "kind of horrible," he added."
That has nothing at all to do with web safety or anything fraudulent or deceptive.
Also, just because people wouldn't have a problem with that limited use (if it actually was limited to just those two things), that still doesn't make the statement not technically perjury. If you were in front of Congress and they asked you something like whether or not your business changed the design of the website the user sees based on metrics/tracking that an AI bot determines might generate more user engagement for that user, and you say "no, we don't do that" but later on you tell everyone that you're only doing it for people that opt-in, that's probably considered perjury. And I'm sure you could see why, especially if all users were set to "opt-in" by default.
I spent some time rating search results for google just to get a feel for how they rate pages. Not much time though because the pay is low and the criteria just silly.
In essence, they give you search term, a url, and about 30 seconds (at most) to make about $10 an hour or less.
Now, I'm pretty savvy when it comes to understanding what to look for, but I found it nearly impossible to provide an accurate rating and still make enough money to make it worth one's while.
The people I knew who were doing this for a living told me flat out that they take a fast glance, check a rating, and move on as quickly as possible, because the pay is for quantity and the quality of the rating isn't a concern for them. So what they're really rating, more than anything, is visual aesthetics.
If the page looked nice and seemed to fit the description of the search term they'd give it a good rating.
Based on that, Hacker News probably wouldn't be rated very high by most of those who rate for google.
Somewhat click-baity. They are not implying runtime/live intervention (which isn't possible anyway). Top bad results are tweaked. That is part of making the "algorithm". Who says that is immoral or illegal?
Not immoral or illegal, but using human raters to seed an algorithm can make the results worse. For example, it could skew to favoring websites where the rater is familiar with the brand. Which might rank mediocre branded content over less well known sites with much higher quality content.
That seems to match what I've seen over the last 10 years with Google.
I worked for Leapforce for a while doing this sort of rating work using Google's guidelines. The guidelines were linked elsewhere in this thread and, at a glance, they are the same ones I used:
If you're doing something that would skew the results to favour a certain brand, you'd be going against Google's guidelines and your contract would be terminated quickly. I found the review process of my results to be fairly stringent; they were reviewed frequently to ensure I was staying within the guidelines, and I was told about even slight deviations.
Examine.com is one, here's the prior HN thread. Had it not been for that HN thread I would never have known that site even existed, despite having frequently searched for queries where it would have been extremely relevant: https://news.ycombinator.com/item?id=20676755
examine.com is the first organic search result I get for "supplemental vitamin d sleep". Maybe the results are hit-or-miss depending on the subtopic (or Google re-tuned the algorithm in the intervening 3 months).
At the moment it's a Frankenstein setup, with a rules/algorithm-based ranking subsequently shuffled around by several separate ML runs with different aims.
The FTC did once unearth a Google attempt to purposefully influence the human ratings: http://graphics.wsj.com/google-ftc-report/ (long read, but worth it). Find this section of the paper:
"Unable to get positive reviews from raters when Google demoted comparison shopping sites, Google changed the raters' criteria to try to get positive results."
I've heard that historically all the major search engines have used a lot of hand-tuned results (ordering, exclusions) across the highest frequency searches. IIRC Microsoft bought Powerset in '08 just before the launch of Bing specifically to shore up their human tuning. That said, I expected humans to play less of a role by 2020.
I was thinking this was going to be about human moderation in search filtering, but it actually is about engineers implementing manual input into the algorithm to force certain results to never appear or certain results to always appear.
I wonder what the consequences of no-filter algorithmic search results in. Google had to tweak its algorithm to avoid disinformation campaigns from spam in the past. Now we're dealing with another sort of disinformation campaign and I don't know a reasonable way for google to tackle it the way it is now.
It should be based on public protocols/networks running as a decentralized system rather than relying entirely on a private for-profit entity. It may be possible to have businesses run on top to provide added value, but the core should provide transparency and auditability.
I feel like some of the objection to "human intervention" comes from the fact that things that society at large used to determine was true or false, black or white, are now less so. There is now so much misinformation that a huge portion of the population now screams "censorship".
I mean, factually, there is 0 evidence vaccines cause autism, and it was shown that the guy who started this conspiracy theory did it basically as a scam to make money. So am I supposed to be upset or something that Google is tweaking their algorithm to not push falsehoods? Same thing as the suicide query. I say this as someone who has and still periodically suffers from deep depression, who has searched for suicide methods in the past: am I supposed to have a problem that Google is "pushing" results that try to keep me from killing myself?
Those seem like they could at least plausibly be representative of actual searches. "Men can..." is probably not a terribly common search outside of sex/gender discussion.
Suggestions are based on what people search for and on phrases that appear in text. People don't search for and/or write down what everyone already knows.
Those suggestions aren't there because Google believes men can get pregnant, it's because that's something confused teenagers search.
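The mechanism the comments above describe can be sketched as a simple frequency lookup over logged queries; the log below is entirely invented, and real autocomplete is far more elaborate, but the point survives in miniature: the engine "believes" nothing, it only surfaces whatever was typed often enough to share the prefix.

```python
from collections import Counter

# Invented query log: counts of how often each full query was typed.
query_log = Counter({
    "can men get pregnant": 50,
    "can men get breast cancer": 120,
    "can men take prenatal vitamins": 10,
})

def suggest(prefix: str, k: int = 2):
    """Return the k most frequently logged queries sharing the prefix."""
    matches = [(q, n) for q, n in query_log.items() if q.startswith(prefix)]
    matches.sort(key=lambda pair: -pair[1])
    return [q for q, _ in matches[:k]]

print(suggest("can men get"))
# Suggestions are ordered by raw popularity, with no notion of
# whether the completed query is factually sensible.
```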
I also don't get why you consider "Men can get pregnant" to be insulting or biased? It's mostly just nonsense, but if you want to find meaning in it, the potential to become pregnant has historically been regarded an almost holy facet of humanity.
Another fun one I just found, and which you might appreciate: "[idiots can]'t get cold".