Hacker News
On the growing, intentional uselessness of Google search results (neosmart.net)
150 points by pzoellner on Feb 25, 2016 | 125 comments



Yes, yes, yes. I have been meaning to write this very blog post for years with examples just like this.

There is not a single day that goes by that I am not searching for something specific and particular in google and am treated to pages and pages of search results that are missing at least one of the terms, thus rendering the results useless.

The worst part is, the strikethrough "missing: search term" identifier does not always appear, and you click through to a page that is useless without knowing it.

My habit has become to immediately ctrl-f on the resulting page and look for my terms so I don't waste my time.
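That ctrl-f habit can even be automated. Here's a rough sketch in Python (standard library only; the URL is just whatever result you clicked) that fetches a page and reports which of your search terms are actually missing:

```python
import re
import urllib.request

def missing_in_text(html: str, terms: list[str]) -> list[str]:
    """Return the query terms that never appear in the page's visible text."""
    # Strip tags crudely so we match visible text, roughly like ctrl-f does.
    text = re.sub(r"<[^>]+>", " ", html).lower()
    return [t for t in terms if t.lower() not in text]

def missing_terms(url: str, terms: list[str]) -> list[str]:
    """Fetch a search-result page and report which terms it is missing."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        html = resp.read().decode("utf-8", errors="replace")
    return missing_in_text(html, terms)

# Hypothetical usage:
# print(missing_terms("https://example.com/some-result", ["rsync", "9.2"]))
```

Nothing fancy, but it does in one call what the strikethrough "missing:" label should have told you up front.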

Further problems:

- "allinsite:" is just a toss-up whether it is respected or not. Who knows why, but it does not fix this problem.

- "quoted strings", such as for programming or naming conventions, are completely ignored and are useless.

- there is no "not" operator, which is desperately needed.[1]

The only function that actually works as advertised is the site: prefix which limits searches to that particular website. I won't be surprised when they break this too, because it's not producing enough search-result-revenue.

I am not a teenaged kid searching for Justin Bieber and perfectly happy with whatever "relevant" or "related" results pop up. I am a professional. I am an engineer. I need tools that work, and google is shit as a search engine.

[1] https://support.google.com/websearch/answer/2466433?hl=en


I thought I was the only one going insane here. It's become legitimately difficult to discover blog posts/articles on programming topics that don't include the asinine "missing term" suffix. There are plenty of great blog posts/guides out there which DO include the term and are relevant, but are ignored for whatever reason. I end up doing "xx yy mdn" with the search range set to 1 year for most of my JS needs now. But god forbid you search for an object property (i.e. Object.assign) - you'll end up with 75% of results having "object", "assign", but not both.


... there is no "not" operator, which is desperately needed

Not true. It can be achieved by simply adding the

  -

operator to a search. For example, to get Google Search results about Tesla, minus all the stuff on the internet about the cars / motor company, try...

  tesla -"tesla motors" -car
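For what it's worth, the mechanics of that operator can be sketched as a toy filter (this is my own illustration, not Google's actual parsing): split the query into required and "-"-excluded terms, then keep only documents containing all of the former and none of the latter:

```python
import shlex

def parse_query(query: str) -> tuple[list[str], list[str]]:
    """Split a query into required terms and excluded ('-'-prefixed) terms.
    shlex keeps quoted phrases like "tesla motors" together as one token."""
    include, exclude = [], []
    for token in shlex.split(query):
        if token.startswith("-") and len(token) > 1:
            exclude.append(token[1:].lower())
        else:
            include.append(token.lower())
    return include, exclude

def matches(doc: str, query: str) -> bool:
    """True if the document contains every required term and no excluded one."""
    text = doc.lower()
    include, exclude = parse_query(query)
    return all(t in text for t in include) and not any(t in text for t in exclude)
```

So for the query above, a page about Nikola Tesla passes, while a page mentioning "Tesla Motors" or "car" is dropped.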


Can you give some examples of searches that don't work for you? I'm also a professional and an engineer and generally find a quick Google search to point me in the right direction every time.


Search for:

rsync version freebsd 9.2

First, note that the first four results (for me, anyway) do not contain the string "9.2" at all.

Second, note that google does not give you the "missing: 9.2" notifier on those in the results page.

Later results (for me, 6 and onward) do contain all of the terms. Google decided that search results that did NOT contain all of my terms were more "relevant" than ones that did.

Bonus laughs: changing the search to:

allinsite:rsync version freebsd 9.2

does change things up a bit, but results 1, 3, 4 and 5 still don't contain the string "9.2".


So, 5 exact matches and 5 related pages is considered poor result quality?


I have no idea why I would ever want "related pages".

I am not waiting in line with my girlfriends and searching for Britney Spears Songs. Related pages do nothing for me.

They are bad results.


Follow-up question: Are you logged in?

Google search will generally learn your preferences and get better at vending you the results you can actually use if you're logged in so it can build that search history for you.


I see what you mean. Yeah, write that blog post.


I think most of your points are wrong. Quoted strings always work for me (symbols are still stripped -- Google is a natural language search engine, not a code search engine), and there is a "not" operator, if I understand what you mean by that. From the article you linked:

When you use a dash before a word or site, it excludes sites with that info from your results. This is useful for words with multiple meanings, like Jaguar the car brand and jaguar the animal. Examples: jaguar speed -car or pandas -site:wikipedia.org


"Quoted strings always work for me (symbols are still stripped..."

That is the definition of not working. The point of the quotes is to allow searching for literal strings.

You have elucidated the entire problem. Unless, of course, you're just searching for movie star names, in which case the rubbery nature of the search results really doesn't matter.

It appears that -negative searching does work, which is amazing - that's nice to know. But it immediately reminds me of a terrible, glaring problem: searching for command line switches. You cannot effectively search for things like:

rsync --ignore-existing

... since they are interpreted as -negative keywords, even if you quote them.


Use quotes and search for rsync "--ignore-existing" or "rsync --ignore-existing". They are two different searches but both work fine and return useful results.

You can also remove the double dash and search for rsync ignore-existing

I appreciate the point of the posted article, but for the most part I don't actually have any trouble using Google as a programmer. You just need to format your searches correctly.


sure you can - just use "rsync ignore-existing". It might be slightly inconvenient, but people who use command line switches should generally be able to figure this stuff out?


> I think most of your points are wrong.

I believe s/he makes a good case.

Google provides no concept of time. Its best-case conception of time seems to be either provided by the site directly, e.g. an "article date" or something to this effect, or t0 = when Google first learned about the page.

Top answer for: "nodejs" + "2016" mongo api

returns a top hit from 2015, and a 2nd hit from 2014.

And I can't give it that context myself:

Quoted strings can't possibly work in general. There is almost certainly an algorithm in the background to provide confidence that they could work before the query is run. Even with best-case optimization, how could they check every page in the index for:

"does google respect quotes?" I mean, even if it finds pages with "google", "respect", "quotes" and then runs the query on all of them, how could it be fast enough for a human?

In the above case it is possible; however, smaller phrases with common words can't ever be respected.

> Google is a natural language search engine

Everyone has a Google search profile. Allowing someone to calibrate theirs would provide them the best results. Further down the thread, I explain why it would be nearly impossible for Google to do this better than you.

Your appraisal of the "not" (i.e. "-") operator runs into the same timing problems. I can't define what percentage of the crawl index is authoritative for me; the "-" operator has to be applied to a larger base than Google normally searches, and this may not be possible in limited time.

One idea I had is that Google could actually take longer and update itself on a rolling basis. An immediate best guess would come up, then the query would continue running in the background and re-rank if you waited another 3 seconds for better results.


You can restrict results to the past day, week, month, year or custom range, using the search tools that appear underneath the search box.


> The worst part is, the strikethrough "missing: search term" identifier does not always appear, and you click through to a page that is useless without knowing it.

I always assumed that such cases mean that the webpage changed since Google crawler's last visit. Did you try ctrl-f'ing cached version of the webpage in such situations?


This happens to me frequently. I've also noticed that sometimes the "Did you mean [this]?" or "Search instead for [this]" don't show up anymore, which can be a big problem. As an example, it's difficult to find information on the company "Amazone" because Google insists on correcting it to Amazon. Usually putting things in quotes fixes it, but it's annoying when I know what I want to search for and Google keeps assuming I want something else.


Except that in 99% of cases, Google is correct. The sheer volume of people mistyping Amazon is good enough reason to autocorrect "Amazone." You know what you want to search for, so tell Google what you want: "Amazone."

I don't see how you can complain when you have the solution to the problem, don't use it, and are an incredibly niche (<1%) case.


For me, the frustration is that quotes usually fix it, but definitely not always. Also, they don't always tell you that they are auto-correcting. I agree that it's niche and, as he's the outlier, he should have to work around the system, but that's only true if the system is, for lack of a better term, "honest", which it isn't always.


Agreed. I should have emphasized that usually quotes fix it. They don't seem to every time for some reason.


I agree, and Amazone probably wasn't the best example. My point is that even when I search for something in quotes Google still autocorrects it (not talking specifically about Amazone here that was just one example I've encountered in the past). This is especially true when I have to find a person whose name isn't spelled normally.


Love to see where you found that 99% figure


Look it up on AdWords if you want more specific data, but this chart shows it: https://www.google.com/trends/explore#q=amazon%2C%20amazone&...


Yeah, this works for me by just clicking "Search Instead for Amazone" - it even remembers it next time for me - I also was easily able to strip out Amazon results by adding -Amazon to the query.


I mean... Amazone should probably change their name.


Is it this Amazone?

https://en.wikipedia.org/wiki/Amazonen-Werke

> the company was founded in 1883

I'm reminded of this:

https://www.youtube.com/watch?v=ADgS_vMGgzY


Yeah that's the one I had in mind.


When I search for Amazone, the only result for Amazon is one for a product called Amazone. All the rest are Amazone domain names or Amazone the movie.


If you are logged in to Google, and search for Amazone, then click the "search instead for Amazone" option, Google should remember your preference the next time you search.


Interesting, I just tried that. It's been a long time since I've searched for Amazone (that was just the first example that came to mind) but I don't remember seeing that in the past.


Alas, it’s 2016 and there’s still no serious competitor to Google.

And there will not be any time soon. Writing an efficient crawler for what we call the "modern" web is not something a small or even medium-size company can pull off. Google enjoys a tremendous competitive advantage: people specifically optimize webpages for what it can and cannot do. So any newcomer to the field will have to replicate tons of technologies Google has had years to perfect (in addition to solving problems like storage, search logic and bandwidth management).


Newcomers might be better off approaching the challenge in a totally different way.


A newcomer can index, store, search and display data differently, but first they still have to get that data. Which means dealing with JavaScript, paywalls and other fun stuff like that.

Besides, regardless of what you do, you would need to have tons of storage, bandwidth, CPU power and a high-availability infrastructure.


The vast majority of the data on the internet is dross or spam. If someone finds a way to index only quality content, the crawling burden could be radically smaller.


I remember DDG did side-by-side comparisons, but did they ever just query Google themselves and then do their own stuff on top? Anyway I'm not sure Google can be beaten at all by any particular newcomer, instead they might suffer death by a thousand cuts as more niche engines like Shodan show up and bring in context-specific searches that are orders of magnitude better than Google's more complicated all-web-they-approve-of-and-find-relevant-to-your-account search.


You are not correct. The Apache Foundation, Electron/Chromium & Elastic are the open-source components of Google. memex-explorer is the paradigm, but it was funded by DARPA/JPL NASA, so work was suspended on it.

A hierarchical crawl index and a DNS rebuild (or a similar model) will lead to a search platform and fix discovery, monopoly and monetization.

Note that Google has solved a HUGE problem, but their task is now impossible. You can't simply use a textbook and some booleans and quotations (which I am not sure they even respect) to deliver results for a billion people.

Individuals and the market will calibrate their own results.


I maintain an Elasticsearch client and have used it for work quite a bit, but let me just say

>elastic

hahahahahahaa

you have not used Elasticsearch I see :)


You are perpetuating a myth for the sake of a monopoly.

A novel solution could be designed and implemented by a small company but no one dares.


If a small company could design a credible alternative to Google, I'm sure it would have been done - the incentives are too great.

Entrenched businesses are RARELY displaced from their top position. Instead, what usually happens is the world changes around them, and the entrenched business is ill suited to compete in the new world. Nobody ever managed to seriously challenge Microsoft for desktop/laptop OS dominance: https://en.wikipedia.org/wiki/Usage_share_of_operating_syste.... The issue is that desktop/laptop OS dominance is no longer as important as it once was.


I started working on a model; I discovered memex-explorer afterward. They had apparently designed a slightly less decentralized model and had already implemented it, but the project was suspended indefinitely.

memex-explorer + blockchain is the model. A company cannot do it. A company can design a platform, and many companies can sell on the optimization & information marketplace to fix the problem.


Search, or the quest for knowledge, is not going to be displaced or marginalized. If a small company builds true AI, Google is done.


Depends on how they arrive at it. If they create true AI with lots of spare capacity then maybe Google is threatened, but what seems more likely to me is that true AI is achieved in some hodgepodge system that wasn't well architected to scale. An example of this would be a true AI that came about as a result of embodiment in a robot. Yes, it is true AI but that doesn't mean it knows how to read, let alone read and understand billions of webpages. Even if it could read a page of general text at the level of an average adult, that doesn't mean it can read it quickly.

[EDIT] And as an addendum to this fun tangent: even if the AI were capable, in principle, of reading billions of webpages fast and well, we don't know what the power requirements for this would be. This hypothetical small company may have stumbled upon the right algorithms and the right training data to produce a true AGI, but they may simply not have the hardware or the engineering know-how to scale it up across multiple processors. Or scaling it up may require too much (i.e., more than the company can afford) data bandwidth if the robot is controlled. Again, it depends on the details of how they arrived at the AGI.


And, more likely, if a small company does come up with a great AI, they're going to get bought by Google, not compete. The reason being, you are still going to need the sheer hardware capacity (a LOT of it) to process so much data. I remember a good article about how YouTube would have likely hit serious trouble had Google not bought them. YouTube was starting to crumble under the exponential growth in load, and Google was perfectly matched to provide infrastructure support.


Maybe. Search is a resource-intensive algorithmic problem. So you need one of two things to beat Google: more resources or a much better algorithm.

You're not going to get the first unless you're Facebook or Amazon or God, but maybe you can build a smarter algorithm. You are up against an army of some of the smartest computer scientists and mathematicians ever assembled -- but what you have going for you is a complete lack of inertia or legacy. You could try crazy things that Google might not, because they won't think it'll work. If you get lucky, one of those blows up. But you have to get very lucky (this is the Innovator's Dilemma in a nutshell).

Anyone want to give it a shot?


You can also niche your space. Hoogle would be an example of that. If you know the searcher cares about Haskell functions only I imagine you can beat Google in that space. That solution probably expands to other interest spheres.


Additionally, you'll still have to find a way to make money. Even if you manage to get on par with Google's search results, it is difficult to replicate their advertising cash cow, and even more difficult to invent a completely new monetization strategy and make it successful.


> ..more resources or a much better algorithm.

Or a limited search space. For myself, it could be HN, SO and Wikipedia.

Content can even be static and downloaded once or regularly.


Use duckduckgo as a first line. Use yahoo and google as backups. Eventually DDG will get enough funding to improve their quality. That's what I do.


I believe the model will be a programmatic hierarchical lookup:

local cache ===> personal cloud cache ===> centralized crawl repo ===> small specialized data cache sold into marketplace ===> fail
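A minimal sketch of that cascade, with plain dicts standing in for the real stores (all tier contents here are hypothetical):

```python
from typing import Optional

def hierarchical_lookup(query: str, tiers: list[dict]) -> Optional[str]:
    """Walk the cascade in order; on a hit in a later tier, backfill the
    cheaper tiers so the next lookup short-circuits earlier."""
    for i, tier in enumerate(tiers):
        if query in tier:
            result = tier[query]
            for earlier in tiers[:i]:
                earlier[query] = result  # populate the cheaper caches
            return result
    return None  # every tier missed: "fail"

# Hypothetical tiers standing in for real stores:
local_cache = {}                                        # on-device
cloud_cache = {"rsync flags": "from personal cloud"}    # personal cloud cache
crawl_repo = {"deluge retina": "from central crawl index"}
tiers = [local_cache, cloud_cache, crawl_repo]
```

The backfill step is what makes it a cache hierarchy rather than just a fallback list: repeated queries get answered closer and closer to the user.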


There is Yacy (a P2P search engine):

http://yacy.net/


Alas, it’s 2016 and there’s still no serious competitor to Google.

Eh. I switched my default to Duck Duck Go and I'm pretty happy with it. Not quite as magical as Google was in its heyday but then neither is Google. Set up keywords for your searches ("g" for Google, "b" for Bing etc.) and they're all just a keystroke away anyhow.

And if you're too lazy to set up your own custom searches or you're using a borrowed machine, Duck Duck Go has some slick built-in "bang" searches: "!imdb aronofsky", "!msdn system.diagnostics" and so on.


In terms of results from crawling, I would consider Bing a "serious" competitor, although still a distant second. They are well-funded, used in ~30% of searches in the US, and don't look like they're going away anytime soon. It would be sad if they did because Google would then be a total monopoly.

Though DuckDuckGo does do its own indexing, AFAIK it is limited and they mostly rely on other search engines (Bing and Yandex, most probably). So it's more of a meta-search engine, not quite in the same category as Google.

Today, I just posted a Show HN for a search engine and feed reader I've been working on. It also has "bangs" except they are called activation codes and start with a ?, like "?jq appendto". It's easy to add your own "?" handler, as it doesn't require any approval to do so. I'm just getting started so I would love to get any feedback.

Link to the Show HN: https://news.ycombinator.com/item?id=11174127
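For anyone curious how such "?" activation codes might be wired up, a minimal dispatcher looks something like this (the handler codes and URL templates are invented for illustration, not the actual ones from the project):

```python
from urllib.parse import quote_plus

# Hypothetical handler registry: code -> URL template.
handlers = {
    "jq": "https://jqueryui.com/search/?q={}",
    "w": "https://en.wikipedia.org/wiki/{}",
}

def dispatch(query: str, default: str = "https://example-engine.test/?q={}") -> str:
    """Route '?code terms' queries to a registered handler, else the default engine."""
    if query.startswith("?"):
        code, _, rest = query[1:].partition(" ")
        if code in handlers:
            return handlers[code].format(quote_plus(rest))
    return default.format(quote_plus(query))
```

Because the registry is just a dict, adding your own "?" handler needs no approval step, which matches the design described above.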


I don't understand this article, particularly this footnote:

>OK, confession time: the article linked to in the fourth result – the one that says “no retina support […] Deluge” actually talks about another app’s lack of retina support on OS X, but just go with it!

That's the result he's using to say the results are worse - that that result should be higher - but he says in the footnote that it's not even a result relevant to what he's looking for. What am I missing? He's saying the first results are worse, but they are better results. They return the actual app he's looking for, where he could presumably find info on retina support, not some completely different app's lack of retina support. All of the links in the second screenshot are wildly irrelevant to what he's looking for.


> All of the links in the second screenshot are wildly irrelevant to what he's looking for

In fact, if you look at the results with "deluge.app" and "retina" in quotes, there are no results that he would be looking for.

I think this article misses its own point: it's not that the results are worse, it's that what he actually wanted was for google to tell him there were no results, not to realize there were no results and so expand his search for him.

There are two choices Google can make in this type of situation: show you that there are no results for your search terms (helpful in this situation, but not when there are obvious synonyms that do have results - the "help" vs "aid" example in the article), or expand your search to include synonyms or drop terms (helpful when there are lots of useful results close to your terms in search space, not so helpful when you're looking for something very specific).

I'd imagine it's incredibly hard to tell the difference between these situations with just a few words as a query, but if I had to bet, I'd guess they probably hit some useful level a large portion of the time. It's not clear the problem is universally solvable, however, and going back to add quotes to terms is annoying to type. We're probably never going to get the beloved + operator back, so how about just making verbatim search much quicker to toggle, google?

(I'd settle for a more consistently triggered "Search instead for [original search term]" at the top of the results)


I'm the original author of the blog post; if you'll allow me to clarify. I apologize for the misleading footnote - I believe it gives the impression that this isn't the correct search result when it is.

"The" post that I wanted to be first place had the perfect summary in Google, discussed deluge on OS X, talked about the lack of retina for a few different apps, and explicitly mentioned a few without retina support but did not outright include deluge in that list of apps without retina support. It was the most-relevant result in that it actually discussed the topics being searched for. It was, for all intents and purposes, the correct result that should have been returned - only pedantically it did not provide a point-blank answer to whether or not deluge itself was retina-ready.

I agree with you 100%, the results in the first image which do include all the search terms are more relevant than the results in the second search. But Google, for some reason, chose to prioritize the results that did not have all the search terms over those that did. Now from the results in the first image, the first of the displayed results that did use all the search terms (i.e. did not say "Missing: deluge") was the most-relevant of all the results that were obtained from either listing (important pedantic note: whether it actually answered my original question or not does not detract from the fact that it was the most relevant. Because the other links neither answered my original question nor were relevant to it.)

I think a comment by "Robert" from the blog post (if I may re-post it here), best summarizes my disappointment:

Imagine if I told you I have someone who might be the perfect soulmate for you, but unfortunately because the pool of candidates for “perfect soulmates” is so small, I’m also including people that are maybe compatible with you or maybe not – a kind and thoughtful act, on my behalf…. And then I proceed to introduce you to these latters while holding back the perfect match until a random time that I saw fit?

Regardless of whether or not the suggestion for potential soulmate ends up working out, the fact remains, you don't say "I have a result for your search query, but let's look at these definitely irrelevant results first"

If you want to over-analyze this, let's look at the "blurbs" returned by Google for the search results:

1) Deluge's main download page; blurb: open-source cross-platform torrent client. Site includes screenshots, FAQ, and community forums. MISSING: RETINA

2) Download - Deluge. Latest release <url here>. Release... <link to ubuntu.png here> Deluge.app. MISSING: RETINA

3) Installing/Mac OS X: A deluge package is available which works on Mac. MISSING: RETINA

4) From Linux to OS X: Meet your new apps: OS X Mountain Lion ships with an app similar to AppX and AppY ..... [sic] It has one notable shortcoming: no retina support .... [sic] There are plenty of great Bittorrent clients on Linux - Deluge, KTorrent, Transmission, etc.

Of these four results, only one specifically talks about Deluge.app and Retina. It's the fourth result. Based off these four blurbs, which do you think is the right page to click on with the highest probability of answering my question? 1) The product main page which I know, thanks to Google, does not have the word "retina" anywhere, 2) the product download page, which I know, thanks to Google despite the completely useless blurb, does not contain the word "retina" anywhere, 3) instructions for installing on Mac, which thanks to Google, I know does not contain the word "retina" anywhere, or 4) a page discussing a variety of apps available on OS X, including explicitly by name, Deluge, which also talks about the retina support of one or more of the aforementioned apps?

I clicked on number 4. A page that talks about Deluge and other torrent clients that are available on OS X and lambasts an (unknown from the blurb) app for not having retina support would ideally be the page that would contain specific information on whether or not Deluge has retina support. It didn't provide the direct answer I was looking for. But it was a hell of a lot more relevant than the first three results, and Google knew it.

Addendum:

Oh, and about deluge.app not being in quotes: that's a lesson learned the hard way. Mac apps unfortunately do not have "unique" names. Pages. Numbers. Deluge. etc. People often append ".app" to clarify their meaning for SEO purposes, and I know that Google indexes "foo.bar" (sans quotes) as "foo bar" (again, sans quotes). Ironically, the only "word" of the original search query that could have been logically dropped is "app". But odds are that a post discussing Mac apps would contain the word "app" or "apps" somewhere. It's not fair to put "deluge.app" in quotes to provide a counterexample, because I knowingly and deliberately did not place it in quotes in the first place, because that's the one term that I do not require to be present verbatim.

Also, this is just the proverbial "straw that broke the camel's back." I run into this problem many times on a daily basis. This is just the concrete example that triggered the post in question, and for which I was able to obtain screenshots of the different variations so that the situation could be properly documented.


"Imagine if I told you I have someone who might be the perfect soulmate for you, but unfortunately because the pool of candidates for “perfect soulmates” is so small, I’m also including people that are maybe compatible with you or maybe not – a kind and thoughtful act, on my behalf…. And then I proceed to introduce you to these latters while holding back the perfect match until a random time that I saw fit?"

Doesn't that basically describe dating? I found my wife because the pool of "perfect soulmates" for me was basically zero, so I figured I'd take a risk and expand my definition of "perfect", and then discovered that I liked what I found.

Back to the topic at hand - it's a bit strange that a page that gives you the wrong answer is the right page because it answered you. It'd be like if you asked "What city is the capital of Kansas?" and I answered "Kansas City" because it had both the words "Kansas" and "City" and both of them are Capitalized, even though the answer is actually Topeka. I'd think a better answer would be "I don't know, but here's a list of state capitals" even though it's missing the words "Kansas" and "City".


That's a pretty good point! I missed that footnote altogether. To me, it sure seems like his issue is more that of SEO than necessarily Google being wrong.


There are lots of problems with Google search results recently, this is one I noticed today:

https://www.google.com/webhp?q=goat#safe=on&q=goat

Yes a crude Urban Dictionary definition was above the Wikipedia definition. Maybe they think they know my sense of humor and pop culture, but I really wanted to learn about actual goats, the animals, because my 2 year old daughter enjoys them so much at the petting zoo. Not about how to arrange my junk in a particular way.


Google's revolutionary approach to indexing the web was that it started to use the number of hyperlinks to a page as a measure of its importance. Since then, they have had a strong "internet bias": if something is more present on the internet than in the "real" world, it appears first.

Searching for "Apple" used to display just pages about the company, not the fruit (even when the company was almost out of business). Searching for "samba" would display just pages about mounting windows filesystems in Linux, not the most important musical genre of a whole country.

Nowadays they have improved their algorithms and this bias isn't so strong, but it is always important to remember that what is important in cyberspace isn't necessarily important for the whole world.


I don't see this case as a particular problem. "People wanting to learn about goats" might actually be in the minority. Most people actually googling that word may in fact be looking for some alternative meaning because someone used it in some other sense and they didn't understand, so they Googled. And then clicked Urban Dictionary, voting it up.

However, there's a particularly interesting case:

https://www.google.com/#q=tsla

I don't know about you (Google personalizes results to some degree) but the first result I see is Yahoo Finance. Google Finance is second. Why is Google promoting Yahoo over their own product?


> I don't see this case as a particular problem. "People wanting to learn about goats" might actually be in the minority.

This is the problem. Search isn't a democracy. I look up the results I need, so not giving me filtering ability makes no sense. An engine that does what Google does is an amazing achievement, but it no longer makes sense as a model, for the exact reason you gave as a defense.

edit: If disagreement could be verbalized it would be super helpful to me. I have been thinking about this issue a lot, and often I see people say the same thing as me:

* single website controls almost 100% of english language results

* limiting of images/video from results

* incorrect/wrong results

* limited respect for boolean and quotation operators

* high ranking sites are not authoritative, e.g. w3schools, wordpress automated blogs.

* no way to filter at all

But then downvote my conclusion:

> searches could be filtered and parameters set by user.

Really would be useful to understand my logic error, or if I have missed something.


Search being a democracy is really just a crude way of creating a better ranking system than just looking at, say, keyword occurrence count. Humans are great at filtering out spammy and useless websites, and the democracy system picks up on that.

As a next step, privacy issues aside, what if they "profiled" you by the types of things you search, and tried to guess what you need based on other people who "think like you"?

For example, I'm a programmer, and if I search "python", I'm probably searching for something different than a biologist who is researching reptiles. This would be fairly obvious to decide based on the other types of things I typically Google for.

I'm sure Google is probably already researching how to do this, though. It sounds difficult to me though because of the sheer number of models you'd have to train and store, and then figure out how to run a distributed index on. It might be more feasible to create some small set (e.g. ~1000ish) profiles of "types of people" and then match you into one of those types. This could also mildly alleviate the privacy issue as the profiling could be done offline on the client.
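As a toy illustration of that last idea, matching a user to one of a small set of precomputed profile types could be as simple as a nearest-profile lookup over term counts (the profiles and numbers below are entirely made up):

```python
import math
from collections import Counter

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[k] * b[k] for k in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Toy profile types, built from what each group typically searches for.
profiles = {
    "programmer": Counter({"python": 10, "stack": 5, "api": 8}),
    "biologist": Counter({"python": 4, "reptile": 9, "species": 7}),
}

def assign_profile(history: Counter) -> str:
    """Pick the profile type whose searches look most like the user's history."""
    return max(profiles, key=lambda p: cosine(history, profiles[p]))
```

With this, a user whose history leans toward "api" lands in the programmer bucket even though both profiles mention "python" - which is exactly the disambiguation the comment above is after. Since the matching is a cheap vector comparison, it could plausibly run client-side, as suggested.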


I made this point below about the inability to provide context. In the other thread link, I think I explained why. I am no machine learning specialist, but I think it's because:

Google can never necessarily know what you want and can never truly know you achieved your goal, so you could not train it properly.

Not only would you need to discover what profession I am in, assuming you had a fully updated, linked profile, etc., you would need to build a comparable universe of like-minded people and calibrate.

Then, you would have to assume what inputs are similar in that they have same/similar parameters and expect similar results.

Then you would have to assume which link I clicked was the answer, for every person who did this same thing.

Then you would have to discount your bias as an engine: because you provide the top results, and (for now) people trust the engine, they typically face a false choice among the first 5-10 things. If those 5-10 things are wrong, the whole model is in error to the extent they are wrong.

Any one of these would introduce error, and the cascade leads to larger disparity. Google IS SO AMAZINGLY GOOD, it has actually managed to make this not a problem for a very long time.


> Google can never necessarily know what you want and can never truly know you achieved your goal, so you could not train it properly.

They do have some amount of confirmation. All of their search results are redirect links, so they're tracking which links you click on. Based on the timing of those clicks, they can tell if you clicked on a result, left that site a few seconds later, and then clicked on another result further down the page, which probably means the first result didn't give you what you want. It's not perfect but it's still potential training data.

If that site has Google Ads or a Google '+1' icon, they can get slightly more information about how you spend your time on that site. I don't know about the legality of this but it's technically feasible.
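That click-timing signal is sometimes called "pogo-sticking." A rough sketch of how it could be mined from a redirect-click log (the log format and 10-second threshold here are assumptions, not anything Google has published):

```python
BOUNCE_SECONDS = 10  # assumed threshold for "came back too quickly"

def bad_results(clicks):
    """clicks: list of (timestamp_seconds, result_rank) in time order.

    A click followed quickly by a click on a result further down the
    page suggests the earlier result didn't satisfy the query.
    """
    bad = []
    for (t1, r1), (t2, r2) in zip(clicks, clicks[1:]):
        if t2 - t1 <= BOUNCE_SECONDS and r2 > r1:
            bad.append(r1)
    return bad

# User clicks rank 1, comes back 4 seconds later and clicks rank 3:
print(bad_results([(0, 1), (4, 3), (120, 5)]))  # -> [1]
```

Noisy, but aggregated over millions of sessions it's plausible training data for "this result didn't answer this query."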


But you are given the ability to filter your results! Countless ways actually!

Qualify your searches and modify them if needed. You were looking for information about goats. You should qualify "goats" with some form of "information about" statement.

"goat" + "animal" makes the wikipedia page for goats the first search result.

"goat" + "facts" gives you countless trivia pages, information, videos, etc

Alternatively:

goat -"greatest of all time" -"goatse" will search for "goat" without including "greatest of all time" or "goatse".
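In effect, those operators describe a simple include/exclude filter over documents. A minimal sketch of the semantics (a naive substring matcher, nothing like a real ranked index):

```python
def search(docs, required=(), excluded=()):
    # Keep documents containing every required term
    # and none of the excluded phrases (case-insensitive).
    hits = []
    for doc in docs:
        text = doc.lower()
        if all(term.lower() in text for term in required) and \
           not any(term.lower() in text for term in excluded):
            hits.append(doc)
    return hits

docs = [
    "The goat is a domesticated animal.",
    "LeBron is the greatest of all time, the GOAT.",
]
# Equivalent of: goat -"greatest of all time"
print(search(docs, required=["goat"], excluded=["greatest of all time"]))
```

The complaint upthread is precisely that Google no longer reliably honors this contract, not that the operators don't exist.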


Correct. However, the problems (as I see them now) are twofold:

No concept of time. The best conception of time seems to be either provided by the site ("article date" or something to this effect), or t0 = when Google first learned about the page.

So yes, "goat" + "animal" will return your results. Try:

https://www.google.com/#q=%22nodejs%22+%2B+%222016%22++mongo...

Top answer for: "nodejs" + "2016" mongo api

returns top hit: 2015, 2nd hit: 2014.

and that I can't give it context myself:

I am on a Mac, but my PC is broken and I'm looking for Windows info. Or: don't count Alexa 1000 links as authoritative. Million Short (I believe) removes the Alexa 1000 sites, but not their link authority.

Also, [neverShowWordpressSite unless traffic > 3 million unique]. Some larger news sites, like Bloomberg, are actually built on WordPress. But the point is that I could delist by technology, filter by time, and tweak my authority parameters.

However, if google let you do this it would exponentially compound the difficulty as the algorithm would exist on both sides of equation.


You can set a time range under "Search Tools". See here: http://i.imgur.com/n5BNkei.png

As for filtering by backend technology, I'm not sure that would always be feasible. While it may be possible to filter default WordPress sites, I'm not so sure about sites whose backend architecture may not be known or publicly available.

E:

As for your computer issue, try searching for "problem/error name" + "solved" rather than "how to fix" + "problem/error name".


Thanks, the "issue" was an illustrative example, however I am a relatively tech savvy person and I use google frequently and I didn't know about this feature. I would be interested to know what percentage of searches leverage it relative to people who type recent year/date into the box for just a quick and dirty statistic about visibility.

I don't actually want to filter backend technology, but I would like to communicate to the search engine that I do not trust (nor want to have returned as a result) any wordpress, blogger or medium website, and I want their rank to be negative.

That is another extreme example, however to discover new things is hard and to find useful information, when communities of bad actors have spent years incentivized to rank higher but not produce quality, it could be easier to simply delist everything and gradually add websites you trust to have authority.


I was going to suggest daterange, but it didn't work for me (haven't used it in a while). However, I think this tool replaced it; you can even set custom ranges with it.


It is though. Search is about providing the highest likelihood of the desired result, not matching words, at least in one value space.

If most people searching for "goat" want to know what the word means in slang, that should be the top result. Possibly you could argue that you want a personal search profile that knows you value Wikipedia higher than other links, but for the default case it feels like optimising for highest odds of success makes sense.


The default case is that everyone has a "search profile" but it is made by google and applied to abstracted parameters.

Everyone wants a "search profile" except they would like to control it and how it is applied as it is, for most people, their most important interaction with a computer, e.g. how they access information.

Currently, in some respects, that is out of a single silo or set of balkanized silos.

This will not be true in the future. One place cannot dictate information flow for the world. Plus, Alphabet has better things to do.


>I don't know about you (Google personalizes results to some degree) but the first result I see is Yahoo Finance. Google Finance is second. Why is Google promoting Yahoo over their own product?

Because that's how lawsuits happen. They let the search engine run itself. If more people are using Yahoo Finance over Google Finance, the search results will reflect that.

I genuinely believe they aren't dumb enough to open themselves up to monopolistic behavior lawsuits. The Microsoft lawsuits weren't that long ago and I'm sure Google is wary enough to make some attempt at avoiding a repeat.

I have little reason to suspect Google purposefully toys with their search results to promote their own products. Given the quality of their products - I'm more inclined to believe that when a Google product is the first result for a name/search, it's probably because people actually use/enjoy that product.

As an example, Google Maps vs other "Maps". While Google is certainly trying to join the ranks, I find other "Maps" to be entirely unusable with terrible UI and am not surprised in the slightest when Google Maps is the first result when looking for directions.

It could also be you use Yahoo Finance more often and thus personalized results had it listed first. Google Finance ranks 4th on the page for me for that search result.


Interesting. But Yahoo has their own search engine. If Yahoo listed Yahoo Finance first, and Google listed Google Finance first, is there any legal issue? Both are working search engines and nobody is forcing you to use one over the other. I don't see a "true" monopoly situation. At least not a Comcastish monopoly in which other players are explicitly blocked from entering the market.

For what it's worth, Bing and Yahoo also list Yahoo Finance first, if searched for "TSLA".


>But Yahoo has their own search engine.

No they don't. They're a front-end for Bing search (at least until 2019 and in everywhere but Japan, which IIRC is a front-end for Google), but that's being nitpicky. :)

FWIW I've always had an issue with companies being punished for simply being better than alternatives. That includes Microsoft's advertising IE, even if IE at the time wasn't the best browser. People were free to use IE to download a better browser, so I never saw the issue with providing IE as a default. Linux was free, they were free to buy a computer, set it up themselves, and install Linux on it. The fact that they weren't choosing to do so should not result in Microsoft being punished.

However my beliefs and precedent set by previous law (even if it isn't legal precedent?) is still enough to have Google play it cautiously. Especially since there has been threat of such lawsuits if they were caught playing with their search results to advertise themselves and "kill off" competitors.


I just went there and the first "result" is a giant stock chart widget showing the 1-day chart with the options of viewing 5-day, month, etc. This chart takes up a full 1/3 of the vertical screen size. At the bottom of the chart there's 3 quick-links to: Google Finance, Yahoo finance, and MSN Money. The next result-ish thing is Google's "In the News" section that has an article about the broken Gigafactory promises. The first real result is yahoo finance, just as it is for you. I suspect it isn't so much that Google is promoting Yahoo over its own product, since that first chart is clearly a Google product, but that it doesn't bias the real search results in their favor. It's a legitimate organic search result. Most people probably just want the stock price from the first/simplest way possible, which is provided or they want the yahoo finance page.


Not here, but maybe because of the "s=TSLA" in Yahoo's URL?


You got unlucky. G.O.A.T. (greatest of all time) is really popular slang at the moment for describing athletes, fashion, actors, etc. and people are undoubtedly searching for that term more than they are looking for the wikipedia page for the animal.


I presume that Rob would have been OK with "greatest of all time", which indeed is the top result once you click on the link. The problem is that the highlighted snippet on the Google page is significantly "racier":

  Urban Dictionary: goat
  www.urbandictionary.com/define.php?term=goat
  Urban Dictionary
  Tucking back your balls and dick, then bending over thus 
  resembling the back of a goat...according to the rules of 
  the game, the person who looks gets 4 kicks in ...


To be fair, when I search I get a huge Knowledge Graph result on the right with a picture of the animal.

Generally, I think that the number of people who want the Urban Dictionary result is higher than the number of people looking for the animal, so it's hard to fault Google for this.

What would be ideal is to have something like DuckDuckGo's disambiguation bar that included the Urban Dictionary definition of goat:

https://duckduckgo.com/?q=goat&ia=meanings

Unfortunately, it's not there right now.


I get Wikipedia as the top result. Maybe Google is making a guess about your preference based on your history? Try the search in incognito/private mode and see.


I just tried with Incognito, and got Urban Dictionary as the top result: www.urbandictionary.com/define.php?term=goat

So maybe it's based on location? Ranked higher based on browser, IP, or something else?


I was using a mobile browser before, and the page layout was different, putting the big summary box full of photos (which excerpts and links to Wikipedia) at the top.

On a desktop, I see what you mean; UrbanDictionary is indeed at the top of the plain-text search results.

Not going to defend Google at all. But for myself, if I'm looking for a Wikipedia-grade introduction to some topic, I usually just search Wikipedia directly. Wikipedia has become so ridiculously useful that it is the top result of a Google search half the time anyway. Wikipedia is a better human-curated web index than any of the old-school 1990s human-curated web indexes ever were.


I've long given up on Google outside of the anglosphere. I travel a lot. Location services are enabled on my phone and laptop yet Google still wants to serve up results from the US. I've had to be very careful with how I use Maps since the name of a dutch city or street just might share the same name as some bar in middle America.

I think someone is buying or gaming search results. When I'm in Paris some results take me back to a tour company in central Paris even if they have nothing to do with what I'm looking for.

This has forced me to go back to search aggregators like DDG. I miss the old Google. DDG can be a little too broad with the results and then I have to load it with filters where old Google would kind of get it right away.


I got quite tired of Google only showing me local (Brazilian) stores when I search for the name of a product in English. Or completely translating my search queries when I get some compiling error on libraries.

Good thing is that with DDG, I can simply use the !g command, and Google won't know I'm making the query from Brazil, and won't be able to localize it at all.


DDG uses other search engines' indexes but I would not describe them as a "search aggregator".


I know it's technical, but in layman's terms they are an aggregator, unless another definition exists.


There are lots of problems that I've noticed these days too. One that I always run into is the utter uselessness of searching for a phone number anymore. There are a million pages that simply list random numbers. That's not useful AT ALL.

Another that I've more recently noticed is conversions on mobile. I went from being able to type any approximation of "ounces" "to/in/-" "pounds" and getting a number right away to having to click on one of the results to get the number. It seems like backward from Google's normal MO so I don't really understand why they'd do it but I definitely have consistently had problems with that especially more recently.


You could never find useful results for a phone number, except for businesses. Searching for a small local business's phone number brought up their page for me.


Yeah this bites me nearly every day. I can only assume that they have some data that this helps 60%-80% of their users in some way, and they are optimizing for the common case rather than the uncommon case.

Not sure that it's the right decision, even if it helps the common case, but it may be. And I say this as somebody who's not particularly fond of Google as a company or its policies.


It could also just be that they [incorrectly] assume their users are dumb, their algorithms are smart, and have forgotten what made them big in the first place. IIRC, besides PageRank, one of their original usability improvements was AND-ing query terms by default instead of OR-ing by default like their competition.
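The AND-vs-OR default is easy to see in miniature. A toy comparison (documents and terms invented for the example):

```python
def and_search(docs, terms):
    # AND by default: only documents containing every term
    return [d for d in docs if all(t in d.lower() for t in terms)]

def or_search(docs, terms):
    # OR by default: any document containing any term
    return [d for d in docs if any(t in d.lower() for t in terms)]

docs = ["object oriented design", "assign a task", "Object.assign polyfill"]
terms = ["object", "assign"]

print(and_search(docs, terms))  # only the doc containing both terms
print(or_search(docs, terms))   # all three docs
```

Dropping a term from an AND query, as Google now does, quietly degrades it toward the OR behavior that used to distinguish the competition.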


In some cases I think it would be useful to see a count of results with one of the terms removed, but there's no way it should override _my_ search terms. That just seems broken.


A friend had to change his product name, because Google wouldn't stop suggesting porn. You googled his product and instead you found porn. It was not really related, and the term was as innocent as possible in itself, but Google thought, and still thinks, you want porn.


Duckduckgo has a safe filter that blocks porn and a few other things from showing up in the search results that works pretty well, you can decide to take it off at any time too. They have the same kind of filter for looking for country specific results.


And their localized search is the epitome of what localization should be. A small always present toggle takes me immediately from globally most popular results to results local to my area, which truth be told I only ever want when I look up something like "legally protected mushrooms". Meanwhile, I typed google.com in the address bar, why does it insist on redirecting me to google.nl?


Google also has a safe filter. That doesn't help if most users have it off because sometimes they want porn.


So you are saying that friend should go on a campaign to convince all of that product's potential users to not use Google and use DuckDuckGo instead?


I don't get it either. Seems like a brutal way to keep his old name.


I've developed the habit of just asking Google questions instead of crafting something to feed how I think the algorithm works. I also don't worry about ranking, just as long as something is in the top 10.

Does deluge bittorrent support retina display?

The 7th link result for that question has the text "Deluge has updated their program to support the Retina Display" in its text summary. That's good enough for me.


I think that some of this reaction is just resistance to 'relearning' how to search. Whereas before you developed techniques to make your queries more 'computer like', now you can revert to something more like natural language.

It would still be nice to have an 'advanced' search mode that was more strict, allowed advanced features, and still took advantage of Google's talent and infrastructure.


I have this problem all the time. I type {movie/book/cd name} illegal free download -price -buy -purchase and click search. Google knows what I want. Everyone knows what I want. Yet time and again the first hundreds of results are for non-free stores where I can purchase the item. Stupid Google!


Google couldn't find what he was looking for because it doesn't exist on the web. Take a look at the search with a +retina term:

https://www.google.com/search?q=deluge.app+%2Bretina

Or in verbatim mode:

https://www.google.com/search?q=deluge.app+retina&tbs=li:1

His article says that the answer is in the 4th result, the first that actually includes both of his search terms. But if you read that article, it's actually saying that Twitter's app has no retina support, and Deluge is mentioned elsewhere on the page, with nothing about retina support.


I don't think that Google accepts +term any longer. They switched several years ago to requiring "term" with quotes. Presumably this was because they wanted to make searches for Google+ names easier. For a while I think they gave a warning, but it seems like they've stopped doing that now.

Edit: Wait, you work for Google, right? Am I the one that's lost? Did they switch back to supporting +term?


Actually, no, I think you're right. I left about 20 months ago, and they disabled support after I left.


Yeah, in my experience the problem is not about Google ignoring search keywords. It is just that their recent updates made it much less apparent that what I'm looking for is not in their index. Their new algorithm seems to drop keywords until it can show me some results that are barely relevant and makes me think that I have to keep tweaking my keywords until I can find what I'm looking for, even if it doesn't exist on the web.


I agree that the lack of a clear "no pages matched your query" is large part of the problem. And as your phrasing indicates, it's also worth distinguishing between "not in their index" and "doesn't exist on the web". Google has a very large corpus, but doesn't include everything.

But the odd part here (and in line with original post) is that there are pages in Google's index that include both terms: https://discussions.apple.com/thread/7169461?start=0&tstart=.... It's just that they don't start until 40 or so results in.

Searching in Verbatim mode without other modifiers seems to create the response that the original post wants: https://www.google.com/search?q=deluge.app+retina&tbs=li:1


Each similar search Google can avoid re-running saves them compute resources and therefore money. Google's as-you-type search suggestions encourage some significant percentage of users performing a query to accept a search suggestion (or suggestion that aggregates multiple searches) that match exactly with searches that were already run. Google can just re-return the results.

The autocorrection and removal of terms accomplishes the same thing.

While I think these measures may be partially to help users, I think they're actually mostly cost-saving measures on Google's part.
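The cost-saving mechanic would look something like a normalized query cache (a speculative illustration of the idea, not anything Google has documented):

```python
from collections import OrderedDict

class QueryCache:
    """LRU cache keyed by a normalized query, so near-duplicate searches
    (accepted suggestions, reordered terms) reuse already-computed results."""

    def __init__(self, capacity=1000):
        self.capacity = capacity
        self.store = OrderedDict()
        self.misses = 0

    def normalize(self, query):
        # Collapse case and term order so variants share one cache key
        return " ".join(sorted(query.lower().split()))

    def get(self, query, run_search):
        key = self.normalize(query)
        if key in self.store:
            self.store.move_to_end(key)  # LRU bookkeeping
            return self.store[key]
        self.misses += 1
        results = run_search(query)      # the expensive part
        self.store[key] = results
        if len(self.store) > self.capacity:
            self.store.popitem(last=False)
        return results

cache = QueryCache()
run = lambda q: ["result for " + q]
cache.get("deluge retina", run)
cache.get("Retina  deluge", run)  # normalizes to the same key: cache hit
print(cache.misses)  # -> 1
```

Every dropped or "corrected" term widens the set of queries that collapse onto one cached key, which fits the theory that it's partly about compute, not just relevance.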


I disagree with you. In the query "deluge retina", "deluge" is definitely the more important bit. After all you're looking for deluge and are hoping to find it with support for retina. Otherwise you would've written something like "torrent client retina".

Now this is where a bit of guesswork comes in, but I'd say Google correctly deduces there is no such thing. Even the best result for both terms just talks about some app not supporting retina, and from the looks of it mostly helps with which other client you might want when switching to OS X from Deluge on Linux. But that's not Deluge; a result like that is useless given your primary intent was finding Deluge. So in the first few results, Google decides that to give you anything relevant at all, it needs to omit "retina" and show results that might still get you what you want (Deluge, despite its lack of retina support) rather than results that are technically relevant but can only disappoint by (correctly) explaining there is no such thing. Your query had no results that would satisfy it exactly, so Google altered the query to see if it could get you something useful anyway.

I think Google trying to give you results that might be hits instead of giving you disappointment in the first place is very sensible behavior.


One of the worst things is that they are removing Wikipedia from the "Knowledge Graph" results and generally replacing it with stuff inferior.


Any speculation on why they'd want to do this? It seems clearly better for the consumer if wikipedia shows up at the top anytime you google a word/phrase that appears in wikipedia. Do they make money on the results that are showing up higher?


Not necessarily.

One of the reasons why Google has looked so smart is that it has leaned on Wikipedia. I have a kid in middle school and definitely one thing you learn there is you can look like an expert on any subject by consulting Wikipedia (i.e. "what calibre ammunition is used by a tommy gun?") To an extent teachers encourage it, but it can tend towards plagiarism.

The Wikipedia page is generally a safe bet for relevance but it may or may not be a quality answer.

I think today they may be like the middle school student who is learning the tricks to not look like a plagiarist.

Remember that Google search and advertising is actually in big trouble -- there is a reason why they renamed the company to Alphabet. The 90%+ market share they have in many countries is unsustainable for many cultural reasons and they have to diversify.


Could it have anything to do with Wikipedia usually being the number one result people are actually searching for for a great number of queries? And they want to distribute that traffic more democratically?


Since you asked for speculation: Wikipedia is not very well written for someone just asking a quick question to Google. Articles are overly long, and written in a boring and neutral tone.

If someone just wants to know where birds go when it rains, an article on bird habits is less relevant to them than an article that directly answers that question.


Here's a thought: maybe the shift in the index to display realtime website updates and factor in social network inputs has ruined their ranking system? Or at the very least, forced it to approximate results rather than be precise about it?

Maybe google has become like robocop receiving 100+ prime directives. All these low quality, noisy inputs are simply driving the engine bonkers?


I find myself using the "verbatim" search option all the time, which is hidden one menu-layer deep in the "search tools" drop-down, behind the "All Results" button. It helps but I've noticed this trend of google search working against the user too.


Alternatively, in many browsers you can create a shortcut that you can use when entering a URL. I have:

  https://www.google.com/search?q=%s&num=100&tbs=li:1&filter=0
aliased to "v" for "verbatim". Here's instructions for adding a custom search engine to Chrome: http://www.swestwood.com/blog/view/fast-searching-chrome
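The same URL can be built programmatically if you'd rather script it than set a browser keyword (tbs=li:1 is the verbatim flag; num and filter widen the result set, per the shortcut above):

```python
from urllib.parse import urlencode

def verbatim_url(query):
    # Builds the verbatim-search URL used in the browser shortcut above
    params = {"q": query, "num": 100, "tbs": "li:1", "filter": 0}
    return "https://www.google.com/search?" + urlencode(params)

print(verbatim_url("deluge retina"))
```

urlencode handles the escaping ("li:1" becomes "li%3A1", spaces become "+"), so the query can contain quotes and operators safely.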


Awesome, thanks!


Try Google Search without a Google account or any Google cookies. It may be more neutral without "search personalization".


I sorta do this, I think. Via setting DuckDuckGo as my default provider, and doing google searches via `!g my search sentence`. You can even go straight to images with `!im`, maps with `!maps`, and whatnot.


I will not belabor this point again; I make it every day to deaf ears.

* Search can never be decoupled from the browser.

* search is worse and discovery is hard.

* Brave Software needs to focus on building security by design which means a search engine AND browser.

The memex-explorer model is the design of the google killer. Likely why it was abandoned.

Browser wars began ~6 months ago.

The next larry page & brin, WILL write a search engine.

Edit: It is impossible for Google to use human language and (questionably respected) boolean operators as the only filtering mechanism for hundreds of billions of pages.

Staggering how well they do with just 1 textbox and image & video tabs, but new players won't have to follow the same roadmap.

============================

Please Explain Downvotes

============================

I am trying to ascertain how I have been thinking about this incorrectly. Often, downvoted but no one explains why. How am I thinking about this incorrectly, what am I missing?


I would like to know your reasoning as to why search cannot be decoupled from the browser; I would like to know why you think that a browser war began about six months ago.


https://news.ycombinator.com/item?id=11134798

Trying to find time to continue work on something a bit more polished but it is not ready. This is my response in another thread, either ctrl+f my un or "-2" and you will see response to similar query.

Idea is "so obviously crazy" that mentioning it publicly for several months and begging people to consider it and work on it has been met with downvotes or been completely ignored. Which means, if you realize it now and start working on it, you will have a ~6 month (I think less now, though) head start as people catch up with it.

edit: Because I don't think this makes sense as a model for search. http://imgur.com/Gz7hXY7


Can you explain the image and what it's trying to show? That the number of text links should be more than the number of image links to well known sites? I'm asking because I do something similar on my site, solveforall.com, but to a much lesser degree of course.


I am going to hack on this and see how it works, but first glance looks cool.

The image (really confusingly) illustrates the searchflow model:

Browser ==> Google

Google ==> results

results ==> Hacker News or another aggregator

aggregator ==> your favorite sub-community (because Google discovery sucks)

subreddit helps you find links you want

in those links you find information

==============================

that is what a manual crawl feels like, and how many people use the web.


Ah, I get it now. Thanks for the explanation. This is the problem with the "deep web" not being indexable by crawlers. I've started looking at how to make this easier for users. Basically, I've got some ways to detect which sites should be searched in response to your queries, based on the category of words in your query and the site. But then I need to do a real-time search of that site, which might require JS to run, extract the links, and present them back to the user. It might be slow, but it's easier for the user to wait for the cloud to do it than to click the links on multiple sites himself.

Thanks for checking out my site. If you have any questions or have any trouble please let me know at help@solveforall.com. And maybe we can explore brainstorming/collaborating since you've clearly put a lot of thought into search.


Search can be decoupled from the browser, it always has been; so starting with an obviously wrong statement puts people off.

You've not explained the relationship between security and search either (unless you mean privacy?)



