How browsers get to know you in milliseconds (oreilly.com)
128 points by wallflower on Dec 7, 2014 | 52 comments



Shame on O'Reilly for burying the sponsored content notice at the very bottom of the post. While native advertising & sponsored content is a great strategy these days, a disclaimer is usually expected before people start reading.

> This post is part of a collaboration between Aerospike and O’Reilly Media. Read our statement of editorial independence.


Quick poll - how many of you noticed this was a sponsored article brought to you by Aerospike?


I didn't notice it until you mentioned it. I didn't feel cheated though, since I got a good amount of info from the article and was curious about how Aerospike did it.

I wish all ads were like this.


haha, after I read it I went ahead and bookmarked Aerospike (before I read this comment) -- their little tactic worked. :)


When I read articles like these, it always makes me sad to see the amount of incredibly complex technologies dedicated to advertising. I hate to be that guy but I'm sure these resources could have a better usage.


It gets interesting when you think of it in relation to something like our space program (which was largely a war-driven effort) or even the internet itself. I'm sure someone somewhere said "wow, instead of all these eggheads working on better battlefield communications, we should really put those dollars to something meaningful."

Not trying to be snide, but one can rarely predict where something like say...complex ad tech...might lead future technological innovations. It certainly has helped fuel a large tech industry.


Advertising currently provides most of the revenue for many Internet sites. Until another major revenue source comes along a significant effort will still be put towards optimizing advertising.


Yeah I know... It's even worse than that actually, since advertising is not a zero sum game (if you have two companies with 50% market share, they will get less compared to a company with all the market). Any alternative would just significantly decrease the value of advertising, and therefore no alternative can emerge.

That is why there is no option to pay for Google services instead of the tracking. I don't think there is any solution to this, unless the share of AdBlock users would reach a tremendous level.



That's interesting, but it just seems like mining additional money out of people: they still get to keep all the data even if they are not displaying the advertising.


They power a lot of creative ecosystems that enable creators to provide their content without charge.

I love that the Green brothers could quit their jobs & make quality content full time, and make it available to everyone.

I fully disagree.


Do these people have the editorial freedom to criticise their major advertisers? Or potential advertisers?

Is their creativity, in fact, funnelled consciously or unconsciously into selling over-priced shit to people who don't need it?

Would there be more or less creativity in the world if it wasn't focussed on making a few hand-picked, advertiser-friendly people into elite-level, rich straight-white-men?

I don't think global creativity goes hand-in-hand with advertising. In my opinion, advertising just sucks the soul out of it. So, I disagree back.


> Do these people have the editorial freedom to criticise (sic) their major advertisers? Or potential advertisers?

That's actually one of the (uncelebrated) benefits of the exchange landscape. Because advertisers and publishers are so removed from each other, there's 0 risk of editorial freedom being jeopardized.

Honestly, I'm tired of the trite advertising hatred on HN. Advertising is basically the only thing making culture and creative work (from journalism to ballet) available to the general public. A world without advertising would be a much darker one, with entertainment only accessible to the wealthy few who can directly commission creativity.

Advertisers also tend to allow a lot more creative freedom than patrons.


And without a source of income, there will be zero "creativity".


This stuff is always fascinating to me, especially the "capital markets" approach to buying/selling ads. I saw an article a few weeks back on how traders are arbitraging different ad exchanges in the same way high-frequency traders do so... You can buy ads for cheap in 1 place and sell them immediately at a higher price somewhere else. I'd be curious if anyone has a high-level idea of how this works.


nope. no one can: http://www.slate.com/articles/technology/technology/2014/06/... Just another bubble in the making -- just be ready to pull out at the right time.


Nobody has a problem with the title? Yes I know it's directly from the linked article.

My browser already knows everything about me. It's the web server (and all the associated ad servers) that "get to know" visitors in milliseconds.

Also, my browsing is much much faster than what the article mentions. That's because I don't bother with Adblock or Ghostery. Instead, I simply use NoScript on most websites. The (sponsored) article was fully readable w/o JavaScript. So, while the timeline discussed was interesting in the abstract, the flurry of backstage activity didn't occur when I viewed the article. Nor were any ads displayed. Thankfully.


Block ads & third party trackers to spare your browser the useless effort. Thanks, Adblock & Ghostery!


Don't you know that ghostery works for advertisers by providing them profiling info on who blocks which ads, so advertisers can better target them?

« Ghostery makes money by tracking the trackers while blocking them and selling data about third party trackers, which it calls Ghostrank. Disconnect takes a pay-what-you-want approach, but it is still developed by a for-profit company founded by a former Google engineer. Adblock is donation-supported, but many users confuse it with Adblock Plus, which has generated controversy for having advertisers pay to land on a whitelist. » http://finance.yahoo.com/news/not-ad-blockers-same-why-15001...

see also: http://www.businessinsider.com/evidon-sells-ghostery-data-to... https://thetruthiswhere.wordpress.com/2014/11/30/ad-blocker-... http://lifehacker.com/ad-blocking-extension-ghostery-actuall...

You may want to switch to eff's privacy badger: https://www.eff.org/privacybadger

Also have a look at mozilla's Lightbeam: https://www.mozilla.org/en-US/lightbeam/ and request policy: https://www.requestpolicy.com/


> Don't you know that ghostery works for advertisers by providing them profiling info

Don't you know that "Ghostrank is off by default, meaning you can use Ghostery without sharing anything ... if you'd prefer". https://www.ghostery.com/en/faq


Nope, I didn't know that.

I have to admit my knowledge of ghostery is not current as I quit using it not long after the privacy extension got bought by an advertising company. I was shocked that the option to support the extension turned out to be a spying tool for advertisers to design ads that users wouldn't want to block.

I tried installing ghostery just now and ghostrank is indeed opt-in now. It is a good thing they upped their PR and changed their faq less than 6 months ago to make it clear how they make money, but they lost my trust long ago and there's no coming back from that. Then again, I wouldn't trust a company whose business is advertising with protecting my privacy, and I definitely don't want better advertising. I want advertising to disappear from the web altogether, as it was when I first started using it.


privacy badger is annoyingly unspecific

you can't block twitter's CDN on non-twitter sites, while allowing it on twitter itself, for example

- - -

This is all really FUDy regardless

adblock plus's whitelist policy is detailed publicly for obvious reasons. yes, they do take money, but only from companies for which that amount would be a drop in the ocean; other companies get in free. It's not a case of people paying to get in, but of ABP devs making large companies pay a fee to get in, to help fund ABP.

https://adblockplus.org/en/acceptable-ads https://adblockplus.org/en/acceptable-ads-agreements

- - -

"Disconnect takes a pay-what-you-want approach" which is apparently nothing, or a subscription fee. which explains how money appears

"but it is still developed by a for-profit company founded by a former Google engineer" from the site: "[...] and a consumer-and privacy-rights attorney."

https://disconnect.me/privacy

- - -

ghostery is already addressed


Not sure what you mean about privacy badger, whose point is to automatically detect and block third party tracking. It doesn't block first party tracking yet, which seems to be exactly the behaviour you want (it is planned as an option). It is also based on Do Not Track.

For the others, apart from the arguably conflicting business model, as I said the issue is one of privacy, ads are unacceptable because they collect data about me without visitor consent. I use adblockers (and a variety of other tools) to protect privacy, save bandwidth and protect computers from malware.


The goal is for display ads to not suck to the point where Adblock, Adblock Plus, and Ghostery are required.

For better or worse, advertising pays for the salaries of the people who write the stories and build the websites and apps that people use. The goal is not to remove ads altogether but to improve display advertising so that it is a positive improvement to a site rather than the negative effect that it has on user experience today (yes, advertising can be a positive addition to content - ask anyone who reads Vogue magazine).

Google won search advertising because ads were relevant, looked good, and it was simple for advertisers to reach the users they wanted. The major reason for Google's success is because they owned every piece in the advertising stack:

- The content (e.g. google.com)
- The adserver
- The auction engine
- Fraud detection (Google's quality team's success at removing fraud meant that advertisers trusted that their messages were reaching users)
- Creative adserver (e.g. text ads and creatives are served by Google)

In display advertising, each of these pieces is operated by a different party. This means that integration between the pieces is not tight, giving users and advertisers a low quality experience. Here are some of the effects of this:

1. Ads generally cannot be tailored to the website where they run. Publishers do not have the ability to tailor ads to work within their environment like they can with Google Adsense (native advertising is starting to change this though) - so most websites look bad with display ads.

2. There is massive data leakage from ads. Because a 3rd party adserver is serving the ad on a publisher site, it is nearly impossible for publishers to keep their customer data from leaking to everyone in the ad stack, which quickly becomes everybody in the ad industry.

Because ad networks that buy the ad space from a publisher almost never actually serve the ad creative (they just serve an ad tag to another network), there is a massive circle of redirects in which a single ad can be bought and sold over 100 times before it is displayed. This means that the publisher is giving their customer data to 100 companies before they get an ad on a page.

3. There is no industry wide coordinated ad fraud effort that is successful. There are so many ways for ad fraud to occur in many different parts of the ad stack, that a single company that tries to reduce fraud from a single point in the ad stack will only have limited success.

4. Ad creative does not have rigid controls where the publisher can let the advertiser know what can and cannot be done on their site. This means that either an ad seems to 'take over' the publisher page without adding to the user experience, or an ad is reduced to a backup image, which is not a pleasing experience.

A reduced set of advertising features needs to be done much better in order for display advertising budgets to grow from where they are now.


The two times that Google succeeded in getting my family to follow through on ads:

1) My 10 year old, in trying to download iTunes, clicked on a download site ad and installed malware that I had to rescue him from while he was in tears;

2) My wife, while trying to pay for a vehicle license renewal, clicked on a DMV look-alike site that attempted to charge her a $30 handling fee for something that the DMV does for free.

The other cases for their clicking on Google ads are mostly well known websites such as Amazon defensively placing ads to protect those who couldn't distinguish a URL from a search query in the address bar or an ad from an unpaid search result.

So it seems to me the ad model is really built on preying on the technologically naive to subsidize the technologically savvy. While in general the technologically sophisticated should be able to charge for their services to the technologically naive, it would be better if it could be done with informed consent instead of trickery.


Both cases were the results of clicking on ads from direct Google search returns, when the users really intended to go to a known website through the address bar but they didn't know the complete url or couldn't tell the difference between that and a search or a search return from an ad.

I am fine with most display ads even though I find those that hijack my browser functions annoying and use blockers for that reason.


I'm sure if you could make a graph of IQ versus the amount of Google Adsense ads clicked it would be interesting to say the least.


I would say "technological sophistication". I would not call my wife or my son low IQ (they really aren't :-). But people can be good at many things while still blissfully unaware of the shenanigans employed by the technologically sophisticated.


How do you know the first ad was even on the Google Display Network? You sitting there watching him click it and following the redirects seems to be the only way you could know that. In which case you allowed him to click and I'd argue it was not his fault you got malware, but yours.

Your second example also sounds like a situation where you heard a (likely inaccurate) complaint from a second party, and are attributing it to a sketchy ad when in reality it could very well have been a site that ranked well organically and had affiliate links to the look-alike site.

Your statement of "the other cases for their clicking on Google ads" could use some clarification. The digital media space is vast, and while Google has a large market share, they are not the only players, so I'd love to know how you can attribute both of these examples (which sound like you were not present to observe and have info from less savvy second parties) to Google.

Publishers absolutely have done shady things to drive ad revenue. You'd be shocked at how many attempt to arbitrage cheap clicks from low-quality traffic sources straight into high CPM impressions that they can bundle into direct buys. There are many other things that go on.

What frustrates me with these threads is when people try to summarize an entire industry's best and worst practices by a couple anecdotal experiences using language that indicates a lack of knowledge of said industry, and frankly lacks any real credibility based on the information provided. That sort of comment isn't really constructive--it is jumping to poorly informed conclusions.

Counterpoint...if I run ads driving people to awesome video tutorials about a somewhat complex product, and it shows a high CTR and engagement rate, am I evil? I'm not holding a gun to anyone's head, and people's actions speak louder than words.


Scamming via lookalike sites is very common, and Google should be ashamed of itself for its part in conning people out of money for no reason except to line its own pockets. In the UK:

"Google is coming under increasing pressure over taking money to promote copycat websites. Among the more printable comments sent to us this week from readers who have fallen for taxreturngateway.com was: "I believe Google is implicated in this as it is well aware of what's happening and why – and it too should be prosecuted". Others ask why Google couldn't at least move copycat sites to below the official sites on its search results page." http://www.theguardian.com/money/2014/jan/30/tax-return-pass...

I had to deflect my own wife from one of these, because of the deceptive way Google was displaying ads above organic search. I've experienced the second -- the (failed) attempt to install malware -- myself.


So now we get down-voted for poking obvious holes in people's broad sweeping generalizations when they've attacked an entire industry? Mmmk, sounds good.


> The goal is for display ads to not suck to the point where Adblock, Adblock Plus, and Ghostery are required.

Adblock (not plus, nor ghostery who work respectively with and for advertisers) is required not because ads suck (they do) but because ads are a major privacy issue.

> Google won search advertising because ads were relevant, looked good, …

Google won search advertising because google won search and because they refused to outsource their advertising business to third party advertising networks, notorious for their cluelessness and mindlessness.


Ads are certainly obnoxious. It is interesting to think that with a "reduced set" of features advertising would improve a website.

Advertising does more than detract from user experience. It is formative of consumer desire. Targeted ads in Vogue are appealing because I opt in to them. The information used to target me on the internet is collected behind my back. It supports an incomplete feedback loop used by statistics to identify what market I belong in. I think this is what has a negative effect on user experience.


Or use a hosts file to block those domains!
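
For anyone who hasn't done this: a hosts file maps hostnames to addresses, so the trick is to point ad hostnames at an address that goes nowhere. A minimal Python sketch that generates the lines; the hostnames are placeholders I made up, and 0.0.0.0 is just one commonly used sink address:

```python
# Hypothetical ad/tracker hostnames; real block lists are far longer.
AD_HOSTS = ["ads.example.com", "tracker.example.net"]


def hosts_entries(hostnames, sink="0.0.0.0"):
    """Build hosts-file lines that resolve each hostname to the sink address."""
    return [f"{sink} {name}" for name in hostnames]


# Append the resulting lines to your hosts file (e.g. /etc/hosts on Unix-like systems).
entries = hosts_entries(AD_HOSTS)
```

Requests to those hostnames then never reach the ad server, though this only blocks by name and cannot filter by URL path.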


It would be interesting to see the same sort of diagram and discussion in the linked article that includes what happens when end users are using the AdBlock or Hosts block techniques.


It's fairly simple: none of those steps happen, because since the advertiser's scripts and iframes are blocked, they don't even know you've loaded a page with an ad on it.


Based on end-user experience during accessibility testing, I'm pretty sure they do detect the presence of tools such as AdBlock, test to see if certain elements were successfully provided before continuing with a specific set of ads, and also make some assumptions about providing "errors" to end users that only people with something like AdBlock or NoScript _should_ see.


They can't at all because nothing is loaded. Check the network tab of chrome with/without adblock enabled and you will see the difference.


Nonetheless there are websites that give you a notice to say that you have AdBlock enabled, and some even tell you to disable it in order to use the site....


Yes! But if you add the anti-Adblock script to Adblock, they won't work as well. (Doesn't mean the website will still work however!)


I've created code that tests to see if ads are blocked based on failed-to-load scripts. I'm sure others have as well.


Are you a publisher or advertiser?


Neither, just a coder.


Does anyone have any other reading recommendations on this topic, or advertising recommendation in general? How are different websites able to know what content you are interested in? Who is storing this data on people?


> Does anyone have any other reading recommendations on this topic, or advertising recommendation in general?

Sure. What's being discussed in this article is Real Time Bidding: http://en.wikipedia.org/wiki/Real-time_bidding . You can read up about the specifics of different Real Time Bidding exchanges, for example here's Google's AdExchange protocol: https://developers.google.com/ad-exchange/rtb/

> How are different websites able to know what content you are interested- who is storing this data on people?

The websites that host the actual content don't usually care about what ads are being shown to you. They just put code in from an ad exchange and let them handle the rest. The buyers (the ones who run the bidders) are the ones that are interested in figuring out what to show you. For example, many bidders are interested in doing what's called Behavioural Retargeting ( http://en.wikipedia.org/wiki/Behavioral_retargeting ). Say you visit a shoes website. Then later you go on Facebook. The moment you go on Facebook, their ad exchange FBX will send a bid request to all the bidders. If the shoes website has partnered with that ad exchange, when they receive the bid request, they can identify you by your cookie, and they can bid and try to show you an ad to get you to come back to their site. Typically a shoes website wouldn't have the infrastructure to do real time bidding so they get a third party to do the bidding on their behalf, e.g. AdRoll ( https://www.adroll.com/retargeting ).
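
A minimal sketch of that retargeting flow in Python, in case it helps. The cookie store, the field names and the $2 CPM cap are all invented for illustration; real bidders and exchange protocols carry far more detail than this:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BidRequest:
    cookie_id: Optional[str]  # None when no cookie could be matched
    site: str                 # site where the ad slot is being auctioned


# Hypothetical record of visitors the shoe site (or its bidding partner) has seen.
SEEN_ON_SHOE_SITE = {"cookie-123": "running shoes"}


def retargeting_bid(request: BidRequest, max_cpm: float = 2.0) -> Optional[float]:
    """Return a CPM bid only for visitors previously seen on the advertiser's site."""
    if request.cookie_id is None:
        return None  # cannot identify the visitor, so this bidder sits out
    if request.cookie_id in SEEN_ON_SHOE_SITE:
        return max_cpm  # known visitor: bid to bring them back
    return None
```

A request carrying `cookie-123` gets a bid; an unknown or missing cookie gets no bid from this particular bidder.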


Thank you.

And, they can identify you through other methods if you have cookies thoroughly disabled, as mentioned in the article?

If they have no data on you, I assume ad exchanges supply default advertisements, or there are bidders on default ads?


> If they have no data on you, I assume ad exchanges supply default advertisements, or there are bidders on default ads?

To be clear on what happens here: the ad exchange itself isn't usually interested in identifying you per se, it's the bidders. So for example the ad exchange might send the bidders a bid request that has some information like this:

Country, user agent, time zone, website that will be showing the ad, size/location of the ad, etc.

Then it's up to the bidders to try to figure out how much this impression would be worth and then bid appropriately. Even if they can't 'identify' the user as someone they've seen before, they might still bid on the ad because the website showing the ad might be relevant to the product they're trying to sell. If the only information they have about a user is what is passed in the bid request they would probably bid lower.
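
To make that concrete, here is a rough Python sketch of a bidder pricing an impression from the fields listed above. The field names, the sets of known users and relevant sites, and the CPM numbers are assumptions for illustration, not any exchange's actual protocol:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BidRequest:
    country: str
    user_agent: str
    time_zone: str
    site: str                      # website that will show the ad
    ad_size: str                   # e.g. "300x250"
    user_id: Optional[str] = None  # None if the bidder cannot match the user


# Hypothetical bidder-side data.
KNOWN_USERS = {"user-42"}
RELEVANT_SITES = {"runnersblog.example"}


def price_impression(req: BidRequest) -> float:
    """Return a CPM bid in dollars; 0.0 means "do not bid"."""
    bid = 0.0
    if req.site in RELEVANT_SITES:
        bid += 0.5  # context alone is worth a small bid
    if req.user_id in KNOWN_USERS:
        bid += 1.5  # a recognised user is worth more
    return bid
```

With these made-up numbers, an unrecognised visitor on a relevant site gets a 0.5 bid, while a recognised visitor on the same site gets 2.0.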

I don't know what the behaviour is when a bid request doesn't receive any bids, but it would be exchange-dependent, and likely rare on the major exchanges.


I see, thank you for the information!


As an aside, does anyone have any experience using Aerospike?

Any impartial thoughts?



Is it just me or do the times listed in this article seem implausible?


The big question mark is the latency. The lookups are/should be really fast - I guess that these guys either have lots of memory or use SSDs, and a random SSD read should be 1/10th of a millisecond.

The article is an ad for database software, but I'd be interested to know how they keep the latencies low enough when querying third party data. The article mentions that the data is colo'd, but I find it hard to believe that all 150 DSPs are colo'd.
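
Back-of-the-envelope version of why the third-party part worries me, as a small Python sketch. Only the SSD read time is the figure I gave; the 100 ms budget and the 20 ms remote round trip are assumptions picked purely for illustration:

```python
# All figures in microseconds.
SSD_READ_US = 100              # "1/10th of a millisecond" per random read
BUDGET_US = 100_000            # ASSUMED 100 ms end-to-end budget
REMOTE_ROUND_TRIP_US = 20_000  # ASSUMED 20 ms to a third party that is not colo'd


def sequential_calls_in_budget(cost_us: int, budget_us: int = BUDGET_US) -> int:
    """How many back-to-back operations of a given cost fit in the budget."""
    return budget_us // cost_us


local_reads = sequential_calls_in_budget(SSD_READ_US)             # 1000
remote_calls = sequential_calls_in_budget(REMOTE_ROUND_TRIP_US)   # 5
```

Under those assumptions, local lookups are effectively free compared to a handful of sequential remote calls, so the network hops would dominate the timeline rather than the database.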



