Google Needs Sex (nytimes.com)
59 points by McKittrick on Jan 11, 2011 | 38 comments



Here we see a search engine in his native habitat, foraging the fields of the world wide web for his sustenance, advertising targets. These beautiful lumbering beasts play an important part in the ecosystem, processing great quantities of flora and excreting them into little packets for the creatures further down the chain.

Master of his niche and with no natural predators, the mighty search engine's greatest threat is the ongoing habitat destruction wrought by Mankind, as slash-and-burn development techniques replace the verdant fields of pages with cheap plastic knockoffs and gaudy littering, which the mighty search engine haplessly consumes, growing fat and slow from gorging on the empty calories. If this practice is not soon ended, the search engines may have to be placed on the endangered species list.

But hark! What's that sound? A female search engine in the throes of heat has entered the clearing! Her mating call reaches deep into the soul of the male search engine... "ACK kwire! ACK kwire!" ...and the male must have her. He unfurls his spectacular page index count to impress the female, and truly this is a virile male, for his index count easily reaches into the trillions! A rare specimen indeed. The female is impressed and approaches the male to exchange algorithmic details. Soon the deed is done and, satiated, the male wanders off in disinterest. Four or five baby search engines will soon be born, each of which, to thrive in the harsh environment of the web, must eventually kill and consume its parents, a difficult feat that no baby search engine has accomplished in many years, but such is the harsh reality of the untamed wilderness.

Tune in next week when we'll follow the adventures of the cutest little baby search engine as he grows, encounters his first clickbot, and acquires his first distracting side business, all before facing his first mortal threat: acquisition by the trophy-seeking megacorporate hunter who wishes to gut the young search engine and turn it into something to hang on his intranet to impress his fellow megacorporates.


Bravo! You, sir, are an artist.

[Edit] What I originally wanted to comment on is that the metaphor this article uses seems invalid. Google is no stationary target. I presume the engineers working there are not just sitting around getting massages all day long.


This is quite possibly the dumbest article about technology I've seen in a mainstream publication.

What Google really needs is a willingness to accept a way higher threshold of false negatives in weeding out content. I'd love to have a "known good" version of Google that risked leaving out some content. Let's start by banning all .info domains along with any that include a hyphen.


> This is quite possibly the dumbest article about technology I've seen in a mainstream publication.

Come on, it's in the NYTimes & by Paul Krugman (an economist) - not bad for that combination. After reading lots of articles by technologists, you need to remember to temper your expectations when you visit the website of a mainstream publication.

That said, his theory isn't entirely false - Google may actually need some "outside" ideas to mix in with their own. Then again, perhaps (my instinct says probably) they've been doing that for years, and the spammers are just evolving faster.


You mean former economist. He's now a columnist and seems to have since abandoned most of his critical thinking skills.


Google has plenty of great ideas. What they lack is execution on most of these projects. For example, the anti-spam effort at blogger.com is a joke.


You and the article are saying the same thing. But the author of the article is presenting the same kinds of ideas in the form of a metaphor that's easy to understand for his intended audience. That's ok, right? It's the same kind of thing discussed in this article from HN the other day http://news.ycombinator.com/item?id=2079473 .


So you just want Google to keep raising and raising the threshold? You know that can't last forever, right?


What the author is saying, as I understand it, is that Google needs a way to mix up their algorithms so they aren't easily "gamed" by scammers and spammers.

The question is not whether Google is doing this actively, but how fast and decisively they do it, and which specific issues they deem priorities to address.

Google's algo, as I understand it, is an array of knobs that are turned up and down to increase or decrease the impact of various factors such as domain age, incoming links, quality of content, etc. Google is always adjusting these knobs and adding/removing knobs. The goals of these adjustments are obviously known only to Googlers, but what is clear to many is that the sole objective of adjusting these knobs is not to make the results pages more relevant. Don't get me wrong... they definitely care about relevancy (or they wouldn't have the trust of millions) but they are a public company with obligations to meet, so they must take many factors into account when making adjustments.
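
To make the knob metaphor concrete, here's a toy sketch. The signal names and weights are invented for illustration; they are not Google's actual factors:

  # Ranking as a weighted sum of signals; each weight is a "knob".
  SIGNAL_WEIGHTS = {
      "domain_age": 0.10,
      "incoming_links": 0.45,
      "content_quality": 0.35,
      "freshness": 0.10,
  }

  def score(page_signals):
      # Combine a page's normalized signal values into one ranking score.
      return sum(weight * page_signals.get(name, 0.0)
                 for name, weight in SIGNAL_WEIGHTS.items())

  # Turning a knob, or adding a new one such as "social_authority",
  # re-ranks every page without touching the pages themselves.
  SIGNAL_WEIGHTS["social_authority"] = 0.20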

The point? The sex (aka variation over time) that the author refers to will likely come from adjusting existing knobs and introducing new ones, such as social authority, locality, etc... but it doesn't necessarily mean that the problems of relevance as we see them will get fixed through these variations. Google will take into account a number of factors when deciding which problems they want to solve and to what extent.

</rambling>


> what is clear to many is that the sole objective of adjusting these knobs is not to make the results pages more relevant.

You're implying there is a sole objective. There isn't. And if there was one, it would most certainly be to maximise relevancy.


Great headline, crappy article. People have been trying to game search engines since a few minutes after they were invented. And other search engines don't appear to be better, or the market share would have shifted by now. It costs nothing to the consumer to switch, and people talk about which websites they like - so if something that is more appealing to Joe Public comes along, it will be noticed. Bing's ascent in the number of searches has not been huge yet. Personally I hope for more than one search superpower in the future.


It costs nothing to you and me to switch search engines. The cost is a lot greater for the kind of people who type URLs into Google's search field to navigate the intertubes.


One could, in theory, manually create a database of a few million (or way more) websites and rate the content of each (including advertisements: the page just as it is shown to the end user). This database could then be used to train algorithms, with genetic fitness based on how close the algorithm's ratings are to the human ratings of value (note that the algorithms would not be aware of the human value ratings).

A second - perhaps slightly smaller - database could then be used to test the performance of the best algorithms in the "real world," or at least on data which they weren't trained on. This would select against algorithms which are adapted solely for the first database. Content ratings generated algorithmically could then be used to modify the ranking of websites in the results, penalizing websites that seem to have bad content.
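
Made concrete, that train-then-validate loop might look like this. The page features and the mutation-only evolution scheme are stand-ins for the sketch, not a claim about how anyone actually does it:

  import random

  # Toy page features a content-rater might look at; entirely made up.
  FEATURES = ["ad_density", "text_length", "duplicate_ratio"]

  def predict(weights, page):
      return sum(w * page[f] for w, f in zip(weights, FEATURES))

  def fitness(weights, rated_pages):
      # Negative mean squared error against the human ratings (higher is better).
      errs = [(predict(weights, p) - p["human_rating"]) ** 2 for p in rated_pages]
      return -sum(errs) / len(errs)

  def evolve(train_set, generations=50, pop_size=20):
      pop = [[random.uniform(-1, 1) for _ in FEATURES] for _ in range(pop_size)]
      for _ in range(generations):
          pop.sort(key=lambda w: fitness(w, train_set), reverse=True)
          parents = pop[:pop_size // 4]  # keep the fittest quarter
          pop = parents + [[w + random.gauss(0, 0.1) for w in random.choice(parents)]
                           for _ in range(pop_size - len(parents))]
      return max(pop, key=lambda w: fitness(w, train_set))

  # best = evolve(first_database)
  # fitness(best, second_database)  # the held-out database catches overfitting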

I'm not sure how I got here from thinking about sex between search engines; I suppose it's because one way to deal with the evolution would be taking the best few algorithms each time and combining them (which I'm sure there are issues with). Of course, people far more intelligent than I have certainly had this idea before, and probably figured out why it wouldn't work (or, alternatively, that it would work, which would suggest that someone is now busy implementing it).


The catch is this database would need to be kept away from spammers, lest they be able to test their site designs against it directly.


Not so much the database as the algorithm; if it's sufficiently understood (as seems to be the case with Google's algorithm, or at least a lot of people claim it is), then spammers can target it directly. Merely having the original data used to train it doesn't give much insight into the algorithm itself.

On second thought, though, being able to identify common characteristics of the least spam-like websites would allow spammers to mimic those characteristics. It would take a lot of effort (figuring out the core bits), but they are clearly willing to put that in. So yes, I suppose that you're right.



Google already has "search engine sex": it's called revision control (although I really doubt this is what Krugman had in mind). To solve Google's spam problem, they either need better business priorities or smarter engineers, depending on which popular explanation of Google's spam problem is accurate.

> And the most persuasive answer, as I understand it, is defense against parasites.

More likely it's the general ability to merge, in one generation, two or more highly advantageous adaptations into one individual, which could include parasite defense but also everything else.


If I were a web search company, I'd allow users to upvote or downvote their search results (this would increase or decrease those results' prominence in subsequent searches). This could be done on a per-site or per-page basis.

Then I'd use one person's preferences to alter how other people receive search results (on an optional basis; if people didn't want their results filtered like this, they wouldn't have to).

But I wouldn't just use an average of all users; it'd be too easy for spammers to create fake accounts to upvote spam. No, a user's search results would only be affected by what their friends upvote and downvote (or possibly their friends of friends as well).
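
Something like this, say. The data layout and the 0.5 friend-of-friend discount are arbitrary choices for the sketch:

  # votes: {user: {url: +1 or -1}}; friends: {user: set of friends}.
  def personalized_boost(url, user, votes, friends):
      direct = friends.get(user, set())
      fofs = set().union(*(friends.get(f, set()) for f in direct)) - direct - {user}
      boost = sum(votes.get(f, {}).get(url, 0) for f in direct)
      boost += 0.5 * sum(votes.get(f, {}).get(url, 0) for f in fofs)
      return boost  # added to the result's base score for this user only

Fake accounts never enter the sum unless a user befriends them, which is the whole point.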

This would make it in a user's interest to link to their friends and have their friends rate websites, since everyone uses web search. So people would want to promote the search engine to their friends.

To give people more of an incentive to proselytise the search engine, I'd add social features such as a Twitter-like service (allowing public, friends-only, and named-recipients-only messages), chat (text, voice, and video), an extended-length messaging service that also allows pictures (you could call it, I don't know, a "blog" or something), and RSS feeds of one's own and one's friends' public entries.

I'd also add a "fan" feature (one-directional, as opposed to friendship, which is mutual). People could create lists rating websites, and others could fan those lists.

Maybe someone like DuckDuckGo might want to implement something like this?

(Incidentally, DDG market themselves as privacy-friendly, which, coupled with the recent subpoena of Wikileaks' Twitter data, suggests there may be an opportunity for a competitor to Twitter that is more privacy-minded.)


"I'd allow users to upvote or downvote their search results" Google already did that. But for some reason they were unable to efficiently use results of that project. I think something is broken in Google's mid-management.


I'd guess that clickstream data can yield conclusions that are nearly as accurate as manual voting, if not more so. It's not worth cluttering the interface for data with such low marginal value.


Using clickstream data to define ranking is a must, but it's not enough. Manual rating adds value to website rankings. Google Toolbar's data adds even more. I'd say all that click data should affect search rankings even more than hyperlinks do.
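
As a trivial sketch of that blend, with weights pulled out of thin air (click signals deliberately outweighing hyperlinks, per the claim above):

  # link_score: from hyperlinks; ctr: clicks / impressions from search logs;
  # toolbar: Google Toolbar visit data. All assumed normalized to [0, 1].
  def blended_rank(link_score, ctr, toolbar,
                   w_links=0.2, w_ctr=0.5, w_toolbar=0.3):
      return w_links * link_score + w_ctr * ctr + w_toolbar * toolbar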


Assuming that people actually use the "down" buttons, I can envision visiting a spammy link and wanting to go back to the search results page and kill a result.


That's because average users don't vote on search results.


Maybe they would if they knew it would result in them getting better results in future. If I could ban certain websites from appearing in my results, I would.


I have been thinking a lot about this problem. If indeed it is a problem.

Google is in the business of serving ads. The vast majority of those spammy sites display Google ads, so there is little incentive for Google to change things just yet. Joe Public is not complaining, yet. It is only the digerati, and a whole bunch of other webmasters who think their sites should be ranking higher because they are just better, who seem to be up in arms over this.

The interesting thing is that half these spammy sites do provide content that is 'just good enough' for what most people are looking for. The quickest exit from one of these sites is via a Google ad, so Google wins, the site wins, and the advertiser targeting a specific niche wins.

The only problem will come when the advertisers stop getting bang for their buck from ads displayed on these sites. Until then, Google has little incentive to change.


Ahh, I forgot to add the 'solution'.

The 'sex' bit will be marrying Google search results with other sources and breeding a search ranking algorithm specific to you.

For example, the Facebook 'like' button could turn out to be very useful. Stuff liked by more people gets ranked higher. Since webmasters are already gaming Facebook, though, stuff liked by people who are one or two degrees of separation from you on Facebook could count more, as in the sketch below. Any social network that allows you to indicate some sort of trust in an individual could work.
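
For instance, a sketch with an arbitrary 1/(1+d) decay by degrees of separation (the graph layout and the two-hop cutoff are assumptions):

  from collections import deque

  def social_distance(graph, src, dst, max_hops=2):
      # BFS hop count between two users, or None if farther than max_hops.
      seen, queue = {src}, deque([(src, 0)])
      while queue:
          node, d = queue.popleft()
          if node == dst:
              return d
          if d < max_hops:
              for nxt in graph.get(node, set()) - seen:
                  seen.add(nxt)
                  queue.append((nxt, d + 1))
      return None

  def weighted_likes(graph, user, likers):
      # A like one hop away counts 0.5, two hops away 0.33, otherwise 0.
      total = 0.0
      for liker in likers:
          d = social_distance(graph, user, liker)
          if d:  # skips d == 0 (the user's own like) and unreachable likers
              total += 1.0 / (1 + d)
      return total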

I bet you, though, that if Google did this, the very same people complaining about the bad results now would be the first to complain about the privacy implications of all this, and about how Google seems to know so much about us... sigh.


The author is dead wrong about the benefits of sex. The key benefit of sex is the ability to preserve a huge gene pool and try out multiple combinations. The battle against parasites is not worth the huge reproductive complexity that sex introduces.

On the other hand, the author is right that Google is inefficient at fighting web spam.


Yeah, especially considering that a huge chunk of the species on this planet reproduce clonally just fine.


I think he's suggesting a genetic algorithm... Google-style organisms competing for the fewest retried searches, the most-different front-page results, etc., that over time merge and split in various ways. That seems pretty similar to some of the tactics Google's already trying, to me.


  I’m not quite sure what search-engine sex would involve. 
  But Google apparently needs some.
Wow, Krugman spouts gibberish when he isn't talking specifically about economics. This post is meaningless.


"I’m not quite sure what search-engine sex would involve. But Google apparently needs some."

Picture a duck (http://duckduckgo.com/).


Eww.


I said to myself, whatever, DuckDuck's the shit. Then I searched a keyword relevant to my industry. The first five results:

niche.gr niche-factory.fr cn-niche.com myspace.com/russianniche niche.kz

Right, then. Guess the problem isn't specific to Google...


Tech journalist to the rescue.


"Why doesn’t nature just engage in cloning?

... If each generation of an organism looks exactly like the last, parasites can steadily evolve to bypass the organism’s defenses."

Why doesn't nature, indeed?

http://en.wikipedia.org/wiki/Asexual_reproduction


This article screams of linkbaiting (I guess it worked; it got onto HN and I checked it out).

The author concludes that "Google needs sex" because sex, unlike cloning, defends against parasites (not strictly true, but whatever), but then signs off with "I don't know what Google sex looks like."

Duh.


Binglehoo


g + m$ = BOOM



