
> I don't see this case as a particular problem. "People wanting to learn about goats" might actually be in the minority.

This is the problem. Search isn't a democracy. I look up the results I need, so not giving me the ability to filter makes no sense. An engine that does what Google does is an amazing achievement, but it no longer makes sense as a model, for the exact reason you gave as a defense.

edit: If disagreement could be verbalized it would be super helpful to me. I have been thinking about this issue a lot, and often I see people say the same thing as me:

* single website controls almost 100% of english language results

* limiting of images/video from results

* incorrect/wrong results

* limited respect for boolean and quotation operators

* high ranking sites are not authoritative, e.g. w3schools, wordpress automated blogs.

* no way to filter at all

But then they downvote my conclusion:

> searches could be filtered and parameters set by user.

It really would be useful to understand my logic error, or whether I have missed something.




Search being a democracy is really just a crude way of creating a better ranking system than just looking at, say, keyword occurrence count. Humans are great at filtering out spammy and useless websites, and the democracy system picks up on that.

As a next step, privacy issues aside, what if they "profiled" you by the types of things you search, and tried to guess what you need based on other people who "think like you"?

For example, I'm a programmer, and if I search "python", I'm probably searching for something different than a biologist who is researching reptiles. This would be fairly obvious to decide based on the other types of things I typically Google for.

I'm sure Google is already researching how to do this. It sounds difficult to me, though, because of the sheer number of models you'd have to train and store, and then figure out how to run a distributed index on. It might be more feasible to create some small set (e.g. ~1000) of profiles of "types of people" and then match you to one of those types. This could also mildly alleviate the privacy issue, as the profiling could be done offline on the client.
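The "match you to one of a small set of profiles" idea can be sketched as a nearest-profile lookup. This is only an illustration under stated assumptions: the two profiles, their topic weights, and the function names are all made up here, and a real system would presumably learn the profiles rather than hard-code them.

```python
import math
from collections import Counter

# Hypothetical profiles: each maps a topic word to a weight (assumed values).
PROFILES = {
    "programmer": {"code": 1.0, "compiler": 0.8, "api": 0.7},
    "biologist": {"reptile": 1.0, "species": 0.8, "habitat": 0.7},
}


def cosine(a, b):
    """Cosine similarity between two sparse word-weight mappings."""
    dot = sum(a[k] * b.get(k, 0.0) for k in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def match_profile(history):
    """Pick the profile closest to the words in the user's past queries."""
    counts = Counter(word for query in history for word in query.lower().split())
    return max(PROFILES, key=lambda name: cosine(counts, PROFILES[name]))


print(match_profile(["python api docs", "compiler error code"]))
```

Because the matching only needs the query history and the profile table, it could in principle run on the client, which is the privacy angle mentioned above.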


I made this point below about the inability to provide context. In the other thread link, I think I explained why. I am no machine learning specialist, but I think it is because:

Google can never necessarily know what you want and can never truly know you achieved your goal, so you could not train it properly.

Not only would you need to discover what profession I am in (assuming you had a fully updated linked profile, etc.), you would also need to build a comparable universe of like-minded people and calibrate against it.

Then you would have to assume which inputs are similar, in that they have the same or similar parameters, and expect similar results.

Then you would have to assume which link I clicked was the answer, for every person who did this same thing.

Then you would have to discount your own bias as an engine, because you provide the top results to me and (for now) people trust the engine, so they typically have a false choice among the first 5-10 things. If those 5-10 things are wrong, the whole model is in error to the extent that they are wrong.

Any one of these would introduce error, and the errors cascade into a larger disparity. Google IS SO AMAZINGLY GOOD that it has actually managed to make this not a problem for a very long time.
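One common way to discount that position bias is to weight each click by the inverse of how likely its position was to be looked at in the first place. A rough sketch, assuming made-up examination probabilities per rank (`EXAMINE_PROB` and the function name are hypothetical, and nothing here is claimed to be what Google does):

```python
# Assumed probability that a user even looks at the result at each rank.
EXAMINE_PROB = {1: 0.9, 2: 0.6, 3: 0.4, 4: 0.3, 5: 0.2}


def debiased_click_score(clicks_by_rank):
    """Scale raw click counts by 1 / P(examined) for the rank they were shown at."""
    return {
        rank: clicks / EXAMINE_PROB[rank]
        for rank, clicks in clicks_by_rank.items()
    }


# A result at rank 5 with fewer raw clicks can still outscore rank 1.
scores = debiased_click_score({1: 90, 5: 30})
print(scores[5] > scores[1])
```

The catch is the one described above: the examination probabilities are themselves estimates, so any error in them propagates into everything trained on the corrected clicks.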


> Google can never necessarily know what you want and can never truly know you achieved your goal, so you could not train it properly.

They do have some amount of confirmation. All of their search results are redirect links, so they're tracking which links you click on. Based on the timing of those clicks, they can tell if you clicked on a result, left that site a few seconds later, and then clicked on another result further down the page, which probably means the first result didn't give you what you want. It's not perfect but it's still potential training data.
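As a sketch of that timing heuristic: given the clicks for one query in order, a click followed quickly by another click is labelled as probably unsatisfying. The 10-second threshold, the tuple layout, and the function name are assumptions for illustration only.

```python
def label_clicks(clicks, min_dwell=10.0):
    """Label each click for a single query.

    clicks: list of (result_rank, timestamp_seconds) in click order.
    Returns a list of (result_rank, satisfied), where satisfied is
    False for a quick bounce, True for a long dwell, and None when
    it cannot be inferred (the last click has no follow-up click).
    """
    labels = []
    for i, (rank, clicked_at) in enumerate(clicks):
        if i + 1 < len(clicks):
            dwell = clicks[i + 1][1] - clicked_at
            labels.append((rank, dwell >= min_dwell))
        else:
            labels.append((rank, None))
    return labels


# Clicked result 1, came back after 4 seconds, then clicked result 3.
print(label_clicks([(1, 0.0), (3, 4.0)]))
```

The `None` for the final click is the "not perfect" part: the engine sees that you left, not whether you found what you wanted.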

If that site has Google Ads or a Google '+1' icon, they can get slightly more information about how you spend your time on that site. I don't know about the legality of this but it's technically feasible.


But you are given the ability to filter your results! Countless ways actually!

Qualify your searches and modify them if needed. You were looking for information about goats. You should qualify "goats" with some form of "information about" statement.

"goat" + "animal" makes the wikipedia page for goats the first search result.

"goat" + "facts" gives you countless trivia pages, information, videos, etc

Alternatively:

goat -"greatest of all time" -"goatse" will search for "goat" without including "greatest of all time" or "goatse".
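The same exclusion syntax can be composed programmatically. A minimal sketch, assuming the usual `-term` and `"quoted phrase"` operators and the `https://www.google.com/search?q=` URL form; `build_query` and `search_url` are hypothetical helper names.

```python
from urllib.parse import urlencode


def build_query(terms, exclude=()):
    """Join search terms and prefix each excluded term or phrase with '-'."""
    def quote(phrase):
        # Multi-word phrases need quotes so they are treated as one unit.
        return f'"{phrase}"' if " " in phrase else phrase

    parts = [quote(t) for t in terms] + ["-" + quote(x) for x in exclude]
    return " ".join(parts)


def search_url(query):
    """Percent-encode the query into a search URL."""
    return "https://www.google.com/search?" + urlencode({"q": query})


query = build_query(["goat"], exclude=["greatest of all time", "goatse"])
print(query)
print(search_url(query))
```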


Correct. However, the problems (as I see them now) are twofold:

There is no concept of time. The best-case notion of time seems to be either something provided by the site ("article date" or something to that effect), or t0 = when Google first learned about the page.

So yes, "goat" + "animal" will return your results. Try:

https://www.google.com/#q=%22nodejs%22+%2B+%222016%22++mongo...

Top answers for: "nodejs" + "2016" mongo api

The top hit is from 2015 and the second hit is from 2014.

And I can't give it context myself:

For example, "I am on a Mac, but my PC is broken, so I am looking for Windows info", or "don't treat Alexa1000 links as authoritative". Million Short (I believe) removes the Alexa1000, but not their link authority.

Also, [neverShowWordpressSite unless traffic >3million unique], since some larger news sites, like Bloomberg, are actually built on WordPress. But the point is that I would delist by technology, filter by time, and tweak my authority parameters.

However, if Google let you do this, it would exponentially compound the difficulty, as the algorithm would exist on both sides of the equation.


You can set a time range under "Search Tools". See here: http://i.imgur.com/n5BNkei.png

As for filtering by backend technology, I'm not sure that would always be feasible. While it may be possible to filter default WordPress sites, I'm not so sure about sites whose backend architecture is not known or publicly available.

Edit:

As for your computer issue, try searching for "problem/error name" + "solved" rather than "how to fix" + "problem/error name".


Thanks. The "issue" was an illustrative example; however, I am a relatively tech-savvy person, I use Google frequently, and I didn't know about this feature. As a quick and dirty statistic about its visibility, I would be interested to know what percentage of searches use it relative to people who just type a recent year or date into the box.

I don't actually want to filter backend technology, but I would like to communicate to the search engine that I do not trust (nor want to have returned as a result) any wordpress, blogger or medium website, and I want their rank to be negative.

That is another extreme example. However, discovering new things is hard, and so is finding useful information when communities of bad actors have spent years incentivized to rank higher rather than produce quality. It could be easier to simply delist everything and gradually add back the websites you trust to have authority.
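To make the kind of control I mean concrete, here is a rough sketch of a user-side re-ranker. Everything in it is an assumption for illustration: the `Result` fields, the idea that the engine exposes a relevance score and a publish date, and the rule that a negative trust value pushes a site to the bottom while `None` delists it.

```python
from dataclasses import dataclass
from datetime import date
from urllib.parse import urlparse


@dataclass
class Result:
    url: str
    score: float     # engine relevance score (assumed to be exposed)
    published: date  # publish date (assumed to be known)


def rerank(results, trust, default_trust=1.0, not_before=None):
    """Re-rank results using per-site trust multipliers set by the user.

    trust maps a domain suffix to a multiplier; None delists the site.
    default_trust=None delists every site not listed (allowlist mode).
    """
    ranked = []
    for r in results:
        if not_before and r.published < not_before:
            continue  # user-set time filter
        host = urlparse(r.url).hostname or ""
        weight = next(
            (w for suffix, w in trust.items()
             if host == suffix or host.endswith("." + suffix)),
            default_trust,
        )
        if weight is None:
            continue  # delisted
        ranked.append((r.score * weight, r))
    ranked.sort(key=lambda pair: pair[0], reverse=True)
    return [r for _, r in ranked]
```

Calling `rerank(results, {"wordpress.com": -1.0})` gives the negative-rank behaviour, and `rerank(results, {"python.org": 1.0}, default_trust=None)` gives the "delist everything and add back what you trust" behaviour.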


I was going to suggest daterange, but it didn't work for me (I haven't used it in a while). I think this tool replaced it, and you can even set custom ranges with it.


It is though. Search is about providing the highest likelihood of the desired result, not matching words, at least in one value space.

If most people searching for "goat" want to know what the word means in slang, that should be the top result. Possibly you could argue that you want a personal search profile that knows you value Wikipedia higher than other links, but for the default case it feels like optimising for highest odds of success makes sense.


The default case is that everyone has a "search profile" but it is made by google and applied to abstracted parameters.

Everyone wants a "search profile", except they would like to control it and how it is applied, since for most people it is their most important interaction with a computer, i.e. how they access information.

Currently, in some respects, that comes out of a single silo or a set of balkanized silos.

This will not be true in the future. One place cannot dictate the flow of information for the world. Plus, Alphabet has better things to do.



