
I recently talked for about a minute about the topic of "too much advertising" that sometimes drowns out the content of a page. It was in this question and answer session that we streamed live on YouTube: http://www.youtube.com/watch?v=R7Yv6DzHBvE#t=19m25s . The short answer is that we have been looking more at this. We care when a page creates a low-quality or frustrating user experience, regardless of whether it's because of affiliate links, advertising, or something else.



Thanks Matt, I took a look at the video and it answers my original question.

Also, is there a preferred monetization model? E.g., does Google think advertisements are more or less harmful to the user experience than affiliate links, sponsored posts, etc.?

Obviously you can't just track the space taken up across different models, so is there some kind of metric that tracks the rate of content dilution via monetization?


I know you were looking for an answer from Matt, but I thought I would offer up my opinion here (as you might have noticed, I love talking about this stuff).

Our models currently suggest that the presence of contextual advertising is a significant predictive factor of webspam.

We use 10-fold bagging and classification trees, so it's not all that easy to generalize. But I pulled one model out at random for fun.
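For the curious, here's the general shape of that setup in scikit-learn terms. This is purely an illustration, not what we actually run; the features, tree parameters, and bagging scheme here are made up.

    # Illustrative only: a bagged ensemble of classification trees
    # (scikit-learn >= 1.2 parameter names).
    from sklearn.ensemble import BaggingClassifier
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.model_selection import cross_val_score

    # X: per-page feature vectors, y: 1 = webspam, 0 = clean (hypothetical)
    clf = BaggingClassifier(
        estimator=DecisionTreeClassifier(),  # one classification tree per bag
        n_estimators=10,                     # ten bootstrap-sampled trees
        bootstrap=True,
    )
    # scores = cross_val_score(clf, X, y, cv=10)  # 10-fold evaluation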

The top predictive factor in this particular model is the probability outcome of the bigrams (word pairs) extracted from the visible text on the page. Here are a few significant bigrams:

relocationcompanies productsproviding productspure qualitybook recruitmentwebsite ticketsour thesetraffic representingclients todayplay tourshigh registryrepair rentproperties weddingportal printingcanvas prhuman privacyprotection providingefficient waytrade printingstationery priceseverything website*daily
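If you're wondering how tokens like those come about: the bigrams are just adjacent word pairs from the visible text, concatenated. A toy version (not our actual tokenizer, just the general idea):

    import re

    def visible_text_bigrams(text):
        """Concatenated adjacent word pairs, like the samples above.

        Assumes the visible text has already been pulled out of the
        HTML; the no-separator joining mirrors the sample bigrams.
        """
        words = re.findall(r"[a-z]+", text.lower())
        return [w1 + w2 for w1, w2 in zip(words, words[1:])]

    # visible_text_bigrams("Relocation companies you can trust")
    # -> ['relocationcompanies', 'companiesyou', 'youcan', 'cantrust']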

Next, this model looks for tokens extracted from the URL and particular meta tags on the page. Similar to the above, but I believe unigrams only. A few examples follow. Please keep in mind that none of these phrases are used individually... they are each weighted and combined with all other known factors on the page:

offer review book more Management into Web Library blog Joomla forums
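Again, a toy sketch of pulling unigrams out of a URL and meta tag contents (the real tag set and tokenizer differ):

    import re
    from urllib.parse import urlparse

    def url_and_meta_tokens(url, meta_values):
        """Hypothetical unigram extraction from a URL plus meta tag
        contents: split everything on non-letter characters."""
        parts = urlparse(url)
        raw = " ".join([parts.netloc, parts.path] + list(meta_values))
        return re.findall(r"[A-Za-z]+", raw)

    # url_and_meta_tokens("http://example.com/blog/joomla-review",
    #                     ["book offers and more"])
    # -> ['example', 'com', 'blog', 'joomla', 'review', 'book', ...]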

The model then looks at the outdegree of the page (the number of unique domains pointed to).

From there, it breaks down by TLD (.biz, .ru, .gov, etc.).
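Both of those are cheap to compute from the raw HTML. A naive stdlib-only sketch (a real crawler would resolve relative links, registrable domains, redirects, and so on):

    from html.parser import HTMLParser
    from urllib.parse import urlparse

    class LinkCollector(HTMLParser):
        """Collect href targets from anchor tags."""
        def __init__(self):
            super().__init__()
            self.hrefs = []

        def handle_starttag(self, tag, attrs):
            if tag == "a":
                for name, value in attrs:
                    if name == "href" and value:
                        self.hrefs.append(value)

    def outdegree_and_tlds(html):
        """Count unique domains pointed to, plus their TLDs."""
        parser = LinkCollector()
        parser.feed(html)
        domains = {urlparse(h).netloc.lower() for h in parser.hrefs
                   if urlparse(h).netloc}
        tlds = {d.rsplit(".", 1)[-1] for d in domains if "." in d}
        return len(domains), tlds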

The file gets pretty hard to decipher at this point (it's a huge XML file), but contextual advertising is used as a predictive variable throughout.

Just from eyeballing it, it appears to be more or less as significant as the precision and recall rate for high-value commercial terms, average word length (Western languages only), and visible text length.
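Those last two text statistics are trivial to compute (the commercial-terms feature would need the term list, which I can't share). A sketch of the trivial ones:

    def text_stats(visible_text):
        """Average word length and visible text length features."""
        words = visible_text.split()
        avg_len = (sum(len(w) for w in words) / len(words)) if words else 0.0
        return {"avg_word_length": avg_len,
                "visible_text_length": len(visible_text)}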

Based on what I'm looking at right now, my answer would be that sponsored posts are going to be far more harmful to the user experience than advertising.

Can't answer the rest of your question, which I assume relates to the number of ad blocks or the amount of space taken up by ads... we don't measure it.

Edit: Just realized that Google will probably delist this page within 24 hours. Should've used a gif for those bigrams. Oh well ;-)


Thanks for your response. Is the data you are viewing publicly available?


No...



