
Search being a democracy is really just a crude way of building a better ranking system than, say, just counting keyword occurrences. Humans are great at filtering out spammy and useless websites, and the democratic system picks up on that.
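That "democracy" is essentially PageRank-style link voting. A minimal sketch of the idea (toy graph plus power iteration; obviously not Google's actual implementation):

    import numpy as np

    # Toy link graph: adjacency[i][j] = 1 if page i links to page j.
    adjacency = np.array([
        [0, 1, 1, 0],
        [0, 0, 1, 0],
        [1, 0, 0, 1],
        [0, 0, 1, 0],
    ], dtype=float)

    def pagerank(adj, damping=0.85, iters=50):
        n = adj.shape[0]
        # Each page splits its "vote" evenly among the pages it links to.
        transition = (adj / adj.sum(axis=1, keepdims=True)).T
        rank = np.full(n, 1.0 / n)
        for _ in range(iters):
            rank = (1 - damping) / n + damping * transition @ rank
        return rank

    print(pagerank(adjacency))  # the page with the most incoming "votes" ranks highest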

As a next step, privacy issues aside, what if they "profiled" you by the types of things you search for, and tried to guess what you need based on other people who "think like you"?

For example, I'm a programmer, and if I search "python", I'm probably searching for something different than a biologist who is researching reptiles. This would be fairly easy to figure out from the other types of things I typically Google for.
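A crude sketch of that disambiguation idea: score each sense of the query by its overlap with the user's recent queries. The keyword sets and queries below are made up for illustration.

    # Guess which sense of an ambiguous query a user means, based on
    # overlap between their recent queries and per-sense keyword sets.
    SENSES = {
        "python (language)": {"django", "numpy", "pip", "flask", "debugging"},
        "python (snake)": {"reptile", "habitat", "venom", "species", "zoo"},
    }

    def guess_sense(recent_queries):
        tokens = {word for q in recent_queries for word in q.lower().split()}
        return max(SENSES, key=lambda s: len(tokens & SENSES[s]))

    print(guess_sense(["pip install numpy", "flask routing", "debugging tips"]))
    # -> "python (language)"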

I'm sure Google is probably already researching how to do this. It sounds difficult to me, though, because of the sheer number of models you'd have to train, store, and then figure out how to run a distributed index over. It might be more feasible to create some small set (e.g. ~1000) of "types of people" profiles and then match you into one of those types. This could also mildly alleviate the privacy issue, since the profiling could be done offline on the client.
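A rough sketch of that "small set of profile types" idea, assuming each user can be summarized as a vector of topic affinities (the features and numbers below are invented):

    import numpy as np
    from sklearn.cluster import KMeans

    # Hypothetical: each user is a vector of topic affinities, e.g. the
    # fraction of their queries about programming, biology, sports, ...
    rng = np.random.default_rng(0)
    user_topic_vectors = rng.random((10_000, 50))  # 10k users, 50 topics

    # Learn a fixed set of ~1000 "types of people" offline.
    profiles = KMeans(n_clusters=1000, n_init=1, random_state=0).fit(user_topic_vectors)

    # Matching a new user to a type is just nearest-centroid, which is
    # cheap enough that it could plausibly run client-side on the user's
    # own history, keeping raw queries off the server.
    new_user_vector = rng.random((1, 50))
    profile_id = profiles.predict(new_user_vector)[0]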




I made this point below about the inability to provide context. In the other thread I linked, I think I explained why. I'm no machine learning specialist, but I think it's because:

Google can never necessarily know what you want and can never truly know you achieved your goal, so you could not train it properly.

Not only would you need to discover what profession I'm in (assuming you had a fully updated, linked profile, etc.), you would also need to build a comparable universe of like-minded people and calibrate against it.

Then you would have to assume which inputs are similar, in that they have the same or similar parameters, and expect similar results.

Then you would have to assume which link I clicked was the answer, for every person who did the same thing.

Then you would have to discount your own bias as an engine, because you provide the top results to me and (for now) people trust the engine, so they typically face a false choice among the first 5-10 results. If those 5-10 results are wrong, the whole model is in error to the extent they are wrong.
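That last point is roughly what the click-model literature calls position bias. One standard counter-measure is to weight clicks by the inverse of how likely a user was to even look at that rank. A toy sketch (the examination probabilities are made up; in practice they're estimated from randomized ranking experiments):

    # examination_prob[i] = estimated chance a user even looks at rank i.
    examination_prob = [0.95, 0.80, 0.60, 0.45, 0.30]

    clicks = [
        {"rank": 0, "clicked": 1},
        {"rank": 2, "clicked": 1},
        {"rank": 4, "clicked": 0},
    ]

    # A click at rank 2 counts for more than a click at rank 0, because
    # fewer users ever examined rank 2 in the first place.
    weighted_relevance = [c["clicked"] / examination_prob[c["rank"]] for c in clicks]
    print(weighted_relevance)  # [1.05..., 1.66..., 0.0]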

Any one of these would introduce error, and the cascade leads to a larger disparity. Google IS SO AMAZINGLY GOOD that it has actually managed to make this a non-problem for a very long time.


> Google can never necessarily know what you want and can never truly know you achieved your goal, so you could not train it properly.

They do have some amount of confirmation. All of their search results are redirect links, so they're tracking which links you click on. Based on the timing of those clicks, they can tell if you clicked on a result, left that site a few seconds later, and then clicked on another result further down the page, which probably means the first result didn't give you what you wanted. It's not perfect, but it's still potential training data.
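A rough sketch of how that timing signal could be turned into a training label (the log format and the 10-second threshold are invented):

    # Flag "pogo-sticking" from one user's click log on a results page.
    click_log = [
        {"result_rank": 1, "clicked_at": 0.0},
        {"result_rank": 3, "clicked_at": 6.0},    # back after only 6s
        {"result_rank": 5, "clicked_at": 200.0},  # long dwell on rank 3 first
    ]

    SHORT_DWELL_SECONDS = 10

    def label_results(log):
        labels = {}
        for current, nxt in zip(log, log[1:]):
            dwell = nxt["clicked_at"] - current["clicked_at"]
            # A quick return followed by another click suggests the
            # earlier result didn't satisfy the query.
            labels[current["result_rank"]] = dwell >= SHORT_DWELL_SECONDS
        return labels  # rank -> looks satisfying?

    print(label_results(click_log))  # {1: False, 3: True}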

If that site has Google Ads or a Google '+1' icon, they can get slightly more information about how you spend your time on that site. I don't know about the legality of this but it's technically feasible.



