> Currently, the predominant business model for commercial search engines is advertising. The goals of the advertising business model do not always correspond to providing quality search to users. For example, in our prototype search engine one of the top results for cellular phone is "The Effect of Cellular Phone Use Upon Driver Attention", a study which explains in great detail the distractions and risk associated with conversing on a cell phone while driving. This search result came up first because of its high importance as judged by the PageRank algorithm, an approximation of citation importance on the web [Page, 98]. It is clear that a search engine which was taking money for showing cellular phone ads would have difficulty justifying the page that our system returned to its paying advertisers. For this type of reason and historical experience with other media [Bagdikian 83], we expect that advertising funded search engines will be inherently biased towards the advertisers and away from the needs of the consumers.
Google solved the problem by turning it on its head, but now it looks like the spammers have indirectly implemented what Google avoided.
The spamminess of a site is not proportional to its ad click-through rates. Strong web communities have the best click-through rates (though they don't usually generate the most ad revenue). Yahoo Mail has incredible click-through rates right now, and most people wouldn't classify it as spammy.
The real question that Google needs to answer is "Does this site exist to serve ads, or do the ads exist to support the site?" This seems like a great computer science problem to me, but I'm not sure statistics will be the end-all solution. I think Google will need another quasi-social hack, à la PageRank, to really solve it.
Sure -- I guess in general it wouldn't be click-through rates, but click-through rates on pages arrived at via search traffic. Obviously a community where members of a niche frequented a given site and clicked on related links would be different. And you're also right that using the word proportional was sloppy -- what was meant was correlated in presumably meaningful ways.
Really this was the front end of the question: "Could some application of (negative) weighting based on the combination of search + AdSense clicks help remove some of the spamminess that is currently being lamented on the interwebs?"
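If it helps make the idea concrete, a minimal sketch of that negative weighting might look something like this. Everything here is an assumption for illustration: the log fields, the 20% threshold, and applying it as a multiplicative demotion rather than however a real ranker would fold it in.

```python
# Rough sketch of the "search + ad-click" negative weighting idea.
# The log fields ("from_search", "ad_click"), the 20% threshold, and the
# multiplicative demotion are all hypothetical, just to make the idea concrete.

def ad_click_penalty(visits, max_penalty=0.5):
    """Penalty in [0, max_penalty] based on how often visitors arriving
    from search immediately click an ad instead of engaging with content."""
    search_visits = [v for v in visits if v["from_search"]]
    if not search_visits:
        return 0.0
    ad_ctr = sum(1 for v in search_visits if v["ad_click"]) / len(search_visits)
    excess = max(0.0, ad_ctr - 0.20)   # only unusually high ad CTRs count against you
    return min(max_penalty, excess)

def adjusted_score(base_rank_score, visits):
    # Negative weighting only: this can demote a page, never promote it.
    return base_rank_score * (1.0 - ad_click_penalty(visits))
```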
Google showing adverts on google.com/search is one thing, but I think the real metric to look at is usage of the Google advert network on the sites themselves. The more googlesyndication.com or doubleclick.net references on a site, the spammier it is.
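As a back-of-the-envelope version of that metric, you could just count references to known ad-network hostnames in the raw HTML. The host list below is only the two domains mentioned above, and treating raw counts as meaningful is itself an assumption:

```python
# Crude count of ad-network plumbing on a page: how many times known
# ad-serving hostnames show up in the raw HTML. No rendering, no iframes,
# just string counting -- purely illustrative.
AD_HOSTS = ("googlesyndication.com", "doubleclick.net")

def ad_network_references(html: str) -> int:
    return sum(html.count(host) for host in AD_HOSTS)
```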
The problem is that Google can't penalize a site for using these technologies. And not because Google makes money from sites using these adverts, which they do, but because it'd be an arms race. The guys taking advantage of SEO would catch on and change advertisers. Google could penalize other advert networks too (Bing's, say), but then they'd be open to accusations of anti-competitive behavior.
It would be interesting to know how Google's ad revenue splits between ads shown on its own properties (search, but also Gmail, etc.) and ads served on external content through its network.
The reason why spam exists is because people make money from it. If AdSense is the problem, how much would it hurt to just drop it?
Certainly another system would appear, but then Google could discriminate against any website with ads on it without being accused of being unfair to competitors.
Not all spam sites have ads. Many online stores are quite spammy, for example, and they only promote stuff they sell themselves. On the other hand, would you consider Wikipedia's huge Jimbo Wales banners as ads?
Good point, but ExpertsExchange has ads too. It also behaves in a spammy way, burying the answers at the very end of the page. That should be detectable behavior.
I bet there are a lot of things that spam pages have in common with regard to things like page placement.
So, if Google gets better at quantifying "spammy-ness" there will be an initial drop in spam effectiveness, but then spammers would catch on and emulate the behavior of good pages.
Eventually, the spammers' behavior would be indistinguishable from genuine posters.
They wouldn't use such a "metric" because it'd be completely useless, not for some other reasons you just made up. The homepage of the NY Times uses AdSense, for example.
I think you could form some kind of metric out of a) number of AdSense placements b) prominence of AdSense placements. There should be nothing against using AdSense, as long as the ads are secondary to (informative) content.
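A toy version of that metric might combine the two along the lines of the sketch below. Using position in the markup as a stand-in for visual prominence, saturating the count at ten units, and the 50/50 weighting are all invented choices, not anything Google-specific.

```python
# Toy "AdSense-heaviness" score from (a) how many ad units appear and
# (b) how early the first one shows up in the markup. Character offset as
# a proxy for prominence and the 50/50 weights are assumptions.
AD_MARKERS = ("google_ad_client", "adsbygoogle", "googlesyndication.com")

def adsense_heaviness(html: str) -> float:
    positions = []
    for marker in AD_MARKERS:
        start = html.find(marker)
        while start != -1:
            positions.append(start)
            start = html.find(marker, start + len(marker))
    if not positions:
        return 0.0
    count_score = min(1.0, len(positions) / 10.0)      # more units -> higher score
    prominence = 1.0 - min(positions) / len(html)      # earlier -> more prominent
    return 0.5 * count_score + 0.5 * prominence
```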
> the real metric to look at is usage of the Google advert network on the sites themselves
Right, that's what AdSense is. That probably should have been made explicit though since a lot of folks miss the distinction between AdWords and AdSense.
I don't really think that makes sense. Ad placement, site design, ad relevancy, etc. all dominate the reasons why people click ads. I don't think you're going to get much information about content relevancy out of the CTR, since you can't hold those other features constant.
If Google is looking for how useful a page is, they're better off just looking at whether people go back to their search results and click on another site after a short delay.
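That's essentially the "pogo-sticking" signal. A rough sketch of how you might measure it from click logs, with a completely made-up event format and an arbitrary 30-second cutoff:

```python
# Sketch of the "came back to the results page and clicked something else
# within a short delay" signal. The tuple format and the 30-second threshold
# are made up for illustration; a real system would work off its own logs.
def quick_return_rate(clicks, threshold_seconds=30):
    """clicks: list of (result_url, seconds_until_user_returned_to_results)
    tuples, where the second field is None if the user never came back."""
    if not clicks:
        return 0.0
    quick = sum(1 for _, dt in clicks if dt is not None and dt < threshold_seconds)
    return quick / len(clicks)
```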
I think it makes sense in extreme cases. For a 'real' page (e.g. with useful content and ads that are secondary to the main content), CTRs are unlikely to be that high. From experience I would say it's hard to get > 10% CTR on such a site.
However, for 'spam pages' (e.g. those that have little information and ad links in primary positions), it's common to have much higher CTRs. The prime examples are those sites that present AdSense links in with navigational links so it's not clear you are clicking on an ad. CTRs of > 50% are not uncommon.
This is correct. If you don't present the exact information someone is looking for but provide links that look like they will, they will click those links.
Any search engine that capitalizes on a new and so far unused element of the content it indexes (in the case of Google, the links), and that becomes successful because of this, will cause the destruction of the very thing it capitalizes on.
So, for AltaVista, that was meta keywords and on-page text; for Google it is the link structure of the web.
I've been amused by that for a while - in effect, Google is trying to destroy its own business model. With perfect searches, there would be no need for ads.
I wonder if there would be a way to split Google up into multiple companies to deal with the (multiple) conflicts of interest it has vis-à-vis providing both search and buying/selling advertising. I doubt it would be good for Google stockholders, but it would be good for the market in general.
I imagine they'd still make their revenue from advertising - they'd have an AdSense bar. This would put the search part of the business on the same footing as other sites selling advertising space.
I'm imagining a world where there are multiple AdSenses (ad brokers) competing effectively with each other for space, driving out supernormal profits for brokers and returning the bulk of the advertising revenue to the publishers rather than to the middleman (currently Google). With more revenue for the publishers, more publishers come online and drive out supernormal profits for publishers too. That would be a more efficient market.
My knee-jerk response is licensing the search and analysis technology back to the ad company. Abstractly, that's not necessarily a terrible idea: other ad networks could compete using the same base search tech, and the GoogleAds side could switch engines or merge the opinions of more than one. Concretely, I imagine it would take the respective halves of the company a number of years to detach their tech stacks from each other enough that a switch would be practical, let alone cost-effective.
You really don't want folks clicking your ads because they can't find what they want; you want them clicking your ads because they found what they wanted and are now on to a new search.

A good way to check for this is to use a Google search box on your page. If folks are going back to Google for another search, they probably didn't get what they wanted the first time. You can also provide trackable links. So if on my http://facebook-login-help.com page you really wanted facebook login, I provide a link. Then I can track folks that are in the wrong place (but I haven't hooked up the links yet because the traffic was so low). It's in the publisher's best interest to provide lots of targeted information and be clear and honest about what they are doing.

Another good metric is looking at time on site. If people are only there for 5 seconds, you got a bad visit. If they are there 2 minutes, you must be doing something right.
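In analytics terms that last point is just bucketing visits by dwell time, roughly like this (using the 5-second and 2-minute cut-offs from the comment above; a real analytics package would obviously slice it more finely):

```python
# Bucket visits by time on site, using the rough cut-offs from above
# (under ~5 seconds = bad visit, over ~2 minutes = probably a good one).
def classify_visit(seconds_on_site: float) -> str:
    if seconds_on_site <= 5:
        return "bad"
    if seconds_on_site >= 120:
        return "good"
    return "unclear"
```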
An ideal page is one where somebody is looking for X, finds it, then realizes that they need to purchase Y in order to do X. You've provided them the necessary information; the advertiser and ad network are providing the next piece of information on their journey.
If you ask me, the sad fact of the matter is that most folks have no idea what they are searching for. Let me give you an example. I created a site that is related to financial matters. People search for this stuff and come by to visit. I try to provide them useful information.
Now I know from looking at my logs that most of the people who click through the ads are going on to QuickBooks, which is tangentially related to the search term but not really. It's as if people were searching for "muffler installation tips" and then went on to click on an ad for car insurance. It's kinda-sorta related, I guess? But at least I can measure it. I'm really confounded as to the relationship between the site and the ad, but at least it's something.
Metrics are very important. Without an ad click or some other event, I'm left to assume that you have changed your mind or just didn't know what the heck you were doing and were just thrashing around.
Contrast this targeted approach to my blog, which is just a bunch of stuff thrown together. I get folks all the time coming in on random searches -- bikinis, monkeys, ocelots -- things I've mentioned in a blog entry about something else. But they visit, stay a short time, then leave. If you ask me, these are the people who are truly not getting anything useful at all. At least with the targeted content the publisher (presumably) is trying to help the searcher. These are just random combinations of search terms and content.
A couple other things: first, Google limits the number of ads you can put on the page to three. I personally think three is pushing it -- my first sites only had one ad per page. Google was kind enough to email me and tell me that I could put up to three. So lately I've gone ahead and put three, basically because I'm lazy: I don't want to have to go back and update the site later. Plus it's easier to design for three right off the bat than to try to put them in later.
But one ad per page and three ads per page have completely different CTRs, even with the same user, same search, and same types of ads. So unless the publisher is breaking the three-ad limit, I think there's a lot of room for mis-diagnosis here.
Second, the content provider has no control over ad availability or scheduling from the ad network. So if you write a wonderful series of articles on polar bears, and there are no polar bear advertisers, you're going to get a bunch of ads that are Public Service Announcements, or are for polar bear ice cream bars, or something else that's a poor fit -- which also decreases your CTR.
I'm not sure any of that helps, but at least it's a bit more information. I know the intent was to look at this from the searcher's perspective, but I think the only way to work towards a solution is to look at problems from multiple angles, and content providers (at least good ones) should have the same goals as searchers.
Why can't you target your content to specific affiliate marketing promotions?
I've found the AdSense ads on my sites are not contextual, but based on the search history of the user (according to Google).
Quick question: you track how many people actually want the Facebook login page, but what do you do with that info? Specifically, how do you use that knowledge to improve the user's search experience and expectation?
Interesting. Depends on search history? That might explain why it's so hard to fathom from this end.
As far as the login, I'm in a bit of a quandary. For the mis-typings, at some point on the page I tell them they have mistyped their search, show them the correct way to type it, and provide a link to where they really wanted to go. I feel like this is how I would want to be treated if I made the same mistake.
As I see those links being clicked, the idea is to move them higher or lower on the page depending on their use. This should get the user to where they want to go faster. But -- I also think it's important to correct the user's spelling. I don't want them coming back because they keep making simple mistakes. And I'm not happy with a page beginning with "Hey bub! You can't spell!" so there's probably an upper limit to how high I would put the "I really wanted Facebook you moron!" links. Maybe about halfway from the top?
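The reordering part is simple enough to sketch; the only wrinkle is that the page template would still cap how high the block can float (the "about halfway from the top" limit above). The click counts here are assumed to come from whatever link tracking the page ends up using:

```python
# Order the "you probably wanted Facebook" correction links by how often
# they're actually clicked; the template would still render the block no
# higher than roughly the middle of the page, per the comment above.
def order_correction_links(links_with_clicks):
    """links_with_clicks: list of (url, click_count) pairs from whatever
    link tracking the page uses. Returns URLs, most-clicked first."""
    ranked = sorted(links_with_clicks, key=lambda pair: pair[1], reverse=True)
    return [url for url, _ in ranked]
```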