Company withdrawing from Facebook as analytics show 80% of ad clicks from bots

reustle · on July 30, 2012

Copying the text just incase FB decides they don't like it

---

Hey everyone, we're going to be deleting our Facebook page in the next couple of weeks, but we wanted to explain why before we do. A couple months ago, when we were preparing to launch the new Limited Run, we started to experiment with Facebook ads. Unfortunately, while testing their ad system, we noticed some very strange things. Facebook was charging us for clicks, yet we could only verify about 20% of them actually showing up on our site. At first, we thought it was our analytics service. We tried signing up for a handful of other big name companies, and still, we couldn't verify more than 15-20% of clicks. So we did what any good developers would do. We built our own analytic software. Here's what we found: on about 80% of the clicks Facebook was charging us for, JavaScript wasn't on. And if the person clicking the ad doesn't have JavaScript, it's very difficult for an analytics service to verify the click. What's important here is that in all of our years of experience, only about 1-2% of people coming to us have JavaScript disabled, not 80% like these clicks coming from Facebook. So we did what any good developers would do. We built a page logger. Any time a page was loaded, we'd keep track of it. You know what we found? The 80% of clicks we were paying for were from bots. That's correct. Bots were loading pages and driving up our advertising costs. So we tried contacting Facebook about this. Unfortunately, they wouldn't reply. Do we know who the bots belong too? No. Are we accusing Facebook of using bots to drive up advertising revenue. No. Is it strange? Yes. But let's move on, because who the bots belong to isn't provable.

While we were testing Facebook ads, we were also trying to get Facebook to let us change our name, because we're not Limited Pressing anymore. We contacted them on many occasions about this. Finally, we got a call from someone at Facebook. They said they would allow us to change our name. NICE! But only if we agreed to spend $2000 or more in advertising a month. That's correct. Facebook was holding our name hostage. So we did what any good hardcore kids would do. We cursed that piece of shit out! Damn we were so pissed. We still are. This is why we need to delete this page and move away from Facebook. They're scumbags and we just don't have the patience for scumbags.

Thanks to everyone who has supported this page and liked our posts. We really appreciate it. If you'd like to follow us on Twitter, where we don't get shaken down, you can do so here: http://twitter.com/limitedrun

eytanlevit · on July 30, 2012

Happend to us too!! we first thought kissmetrics was wrong, so we built a logger that logged server side requests and a client side code that executes on load of the landing page, and we found that 70% of the server side hits had javascript disabled(or that hit the page and left it before the js executed, which seems improbable for such a large number - 70%).

Cushman · on July 30, 2012

> (or that hit the page and left it before the js executed, which seems improbable for such a large number - 70%).

That's actually an interesting proposal-- 100% of the time when I click on an ad, I didn't mean to and I try to close it before it has a chance to annoy me.

Probably doesn't make sense on a large scale, but it's something to consider.

flexie · on July 30, 2012

If that is true and 70-80 percent hit the back button before the js files load, it still says a lot about the quality of the clicks from the perspective of someone considering advertising on FB...

GoodIntentions · on July 31, 2012

so the net result would be the same.

Someone's grandmother miss-clicking and then mashing back is about as useful to the advertiser as a bot. conversions are where it's at. Poor quality traffic / bots / it's all the same. the advertiser quits paying

Cushman · on July 30, 2012

To be fair, that pretty much matches what I've heard about Facebook's click quality.

eytanlevit · on July 30, 2012

I agree that some users might close the page or hit the "back" button before the $(document).ready() happens, but I can't believe that 70% of the people did that(taking into account that our ad was really targeted and highly relevant to the landing page).

alttab · on July 30, 2012

and then we find out drunk or stupid people accidentally click stuff.

sicxu · on July 30, 2012

drunk or stupid people probably won't click 'back' that quick.

tectonic · on July 30, 2012

What user agent was reported by those hits? This is fascinating.

eytanlevit · on July 30, 2012

It was a few months ago and we've already deleted the db table, we are soon relaunching with a new campaign, so once I get the data again I'll post it here.

chmike · on July 30, 2012

70% is a ratio ! The effective number may be small at the same time.

seiji · on July 30, 2012

We built a page logger. Any time a page was loaded, we'd keep track of it.

They reinvented the access log? By hand? Are people really that detached from how servers work?

tlrobinson · on July 30, 2012

I assumed this just meant they built something on top of access logs that told them what percentage of page loads also loaded JavaScript. I don't know about your webserver, but mine doesn't tell me this kind of thing out of the box.

Anyway, it's pretty irrelevant whether they added logging in their app (one line of code?) or enabled their webserver's built in logging.

bendemott · on July 30, 2012

Access logs that write continually from different processes or threads (like PHP) can cause System IO to reach critical proportions. In any non-trivial enterprise application it's completely reasonable to have a stat-logic-layer for recording events... The access log can still do it's own sorta thing - this is more true if you have more than 1 server.

brador · on July 30, 2012

Surprised no ones mentioned the noscript plugin.

Since the user is on facebook they'd have facebook unblocked, but noscript still blocking everything else.

How would you discern a noscript user from bot?

Zak · on July 30, 2012

You wouldn't, but if the vast majority of your ad click visitors have javascript disabled, and only a tiny percentage of your other visitors do, what conclusion would you draw?

takluyver · on July 30, 2012

You don't, but it's very unlikely that there were enough noscript users to match the stats. From the post:

"...in all of our years of experience, only about 1-2% of people coming to us have JavaScript disabled, not 80% like these clicks coming from Facebook."

brador · on July 30, 2012

It's a different marketing block.

For example, target something for the Reddit.com users and you'll see 90%+ block ads.

With facebook granular targetting you could very specifically end up with a target segment with a large proportion of noscript type plugin users. Without more details on their campaign I would not be so quick to place blame purely on bots.

takluyver · on July 30, 2012

But this is about users who click on ads on Facebook. The sort of users who have javascript disabled are going to self-select out of that group pretty strongly.

And even among the most technically literate groups, I'd be amazed if you saw 80% use of noscript. So many sites require javascript that it's easier to browse with it enabled.

tripzilch · on July 30, 2012

> How would you discern a noscript user from bot?

There are a few tricks to do that, described on the old ha.ckers.org weblog, or perhaps on the related sla.ckers.org forum. No idea if they still work, though ..

You could always see if you could trigger NoScript's built-in XSS or frame hijacking protection :) Bots will not notice, and users get a scary warning message and believe your ad is trying to hack them. Not good for your conversion rates, but a pretty sure way to tell them apart.

tlrobinson · on July 30, 2012

Unless you're targeting noscript users there's no way they account for a significant portion of those 80% of clicks.

Maskawanian · on July 31, 2012

What you say is valid, however no-script users are a small minority of internet users. I doubt their numbers would tip the results in either direction.

wpietri · on July 30, 2012

This could be a businessperson's interpretation of what happened.

Or they might have indeed reinvented it. Kids these days and their Heroku! Back in my day we'd process each HTTP request by hand. And we'd do it uphill both ways in the snow.

vampirechicken · on July 30, 2012

using awk.

jlgreco · on July 30, 2012

Hey now, awk is simply lovely. It's ridiculous how popular it is to slight it these days.

dredmorbius · on July 30, 2012

Indeed.

Most of the parsing required for the logfile analysis I describe is written in awk, and goes through roughly 1 million lines of logfile in about 75s on a commodity 2Ghz Intel Xeon.

Pre-caching of host lookups is done via xargs and a suitably high '-j' value. Running on a pre-sorted and de-duped list of IPs, this takes another few minutes. Lookups are read into hash tables for faster processing in the main awk parser (avoids system calls, DNS latency, and especially negative result caching failures).

It's fast and simple. Not a 100% solution, but quick to add bits to over time as new needs arise.

vampirechicken · on July 30, 2012

How does 'using awk' count as a slight?

It was the proper add-on to the previous comment kids today don't learn awk. Becuase they have Perl, Ruby, and Python which are supersets of the unix command line tool set. Awk was the king in the years before Perl.

I never really learned to use cut, so I sitll use awk for one-liners the deal with columns in delimited-data text-files.

jlgreco · on July 30, 2012

Sorry, I read it as a slight because it came right after "uphill both ways in the snow".

vampirechicken · on July 31, 2012

Which usually precedes "Get off my lawn!"

vampirechicken · on July 31, 2012

Anyway, if I want to take gratuitous pokes at a language, I usually choose XML; sometimes java.

vampirechicken · on Aug 1, 2012

Which, of course, I've just been informed is the "preferred" integration language for the project I'm presently working on. Sigh.

tripzilch · on July 30, 2012

You had awk? Ha! Luxuries! Why when I was a young programmer we had to write the code in the snow with our pee, and a compiler was just a word for the pilot of the hovering dirigible that read the instructions and passed them to the ALU, which was another fellow with an abacus. They would wrap the re- sults around a rock, and drop it on my house when the program would exit. We had to walk uphill...

scribblemacher · on July 30, 2012

Using butterflies: http://xkcd.com/378/

jeffbarr · on July 30, 2012

With VB6 you can check IP Addresses.

fsniper · on July 30, 2012

Please guys. This is not Reddit.

Karunamon · on July 30, 2012

Damn straight. Cheap jokes are not allowed.

jamiecurle · on July 30, 2012

And that is how to do a c-c-c-c-c-combo breaker on Hacker News.

mattmanser · on July 30, 2012

Access logs can't record if javascript is enabled. There are other things I would be interested in if I were doing this too, I'd check the browser's TZ, check if flash was enabled, etc. and do it across sessions.

Depends how far they went profiling their visitors, it clearly says in the article they were tracking if js was enabled, so yes, you'd need to roll something or another by hand. GA relies solely on js for example.

cientifico · on July 30, 2012

easy: < script > (new Image()).src = document.location.href + "Hey, i have javascript < / script >

And then compare in the log.

wildtype · on July 31, 2012

genius

hobonumber1 · on July 30, 2012

This is besides the point.

seiji · on July 30, 2012

Is it though? It seems they spent a lot of time throwing more and more javascript at the problem until they realized they couldn't figure out what was happening. If they had real logging from the beginning, it would have unconfused them much sooner.

spaghetti · on July 30, 2012

They were taking orthogonal measurements to see if the pattern would appear over and over. Also it sounds like each successive measurement technique was at a lower level than the previous technique. Sounds pretty thorough. I'd love to see some actual data though.

crygin · on July 30, 2012

How would you use the access log to determine if client-side javascript was being executed?

mapgrep · on July 30, 2012

For the page in question, count total hits in the server access log and subtract total hits reported by the Javascript analytics software.

mike-cardwell · on July 30, 2012

Check if it's even being fetched for a start. Assuming it's not inline.

alttab · on July 30, 2012

Browser-cache my friend. You'll need to poll the server on page load (or even better in this case, after you load the ad)

waivej · on July 30, 2012

Use a javascript function to poll the server for something.

SoftwarePatent · on July 30, 2012

thanks for coining the word "unconfused"

Someone · on July 30, 2012

The way I understand this is that they want to log who accesses their pages on FaceBook's servers.

I am not familiar with how things work on FaceBook, but I would think that FaceBook users do not have access to the logs that FaceBook's servers write.

AznHisoka · on July 30, 2012

you're not alone. I've heard similar stories from other businesses that have tried to advertise on FB. At first I thought they just didn't track properly, but it seems it's a widespread problem.

samstave · on July 30, 2012

That escalated quickly!

And so it seems that Facebook could be proven to be the new groupon.

I would absolutely not be suprised if the hits were FB bots - however I'd expect they wouldn't be that directly stupid so as to have an Ad-bot net in house. Surely this is some service they outsource for plausible deniability.

Something they would have learned from their relationship handing all user data over to the NSA/CIA.

EDIT:

I'd like to make a prediction on how this could potentially play out:

1. FB ignores them. They have nothing to do with this issue and they know the bots are tainting traffic - but they can't/wont do anything about it because it boosts their revenues.

2. FB may or may not be involved in the ad-bots, but, they will drop their $ contingency on ad-rev for the name and payout these guys to shut them up. Their stock was improperly priced and taken a beating. They don't want this coming to light in a large way as it will have a negative affect on the perception of their business model. They hav to be kind of careful of the Barbara Streisand effect here too...

3. Many more people will reveal the same results with empirical testing and it will be revealed that FB earnings are 80% over inflated. Their stock will drop to the predicted real value of <$10... maybe hit that actual projected number of $7.00 - Zynga will BEG FB to purchase them as they are now a penny stock and they can't sustain.

spaghetti · on July 30, 2012

We need to see some real numbers before making these kind of predictions. What was the volume of clicks that OP couldn't detect? 10, 1K, 1M? It makes a big difference.

Also given FB users' aversion to ads this whole issue could be explained by users just clicking the back button. Ad supported sites have trained users to do three things very well: install adblock, habitually ignore ads, quickly click away from any interstitial or unexpected ads like on Hulu.

Another note on the "back button": I recently discovered the awesome power of "three-finger swipe back button" on Macbook trackpad. So it's possible users can "click" back even faster than before. I can and before the swipe I was well-trained in the art of "CMD back arrow" :-)

samstave · on July 30, 2012

Totally agree. However, I wanted to put my thoughts on this down... Surely we need more info and time to see what the actual truth is - but I just do not believe that FB is in any way some altruistic innocent/neutral entity.

stef25 · on July 30, 2012

What user data stored on FB servers could be useful to intelligence agencies of that level? People brag about crimes they committed, IP addresses are logged and people may "like" jihad but criminals targeted by the NSA are surely not at that level of stupidity.

Also, FB may not have much of a choice to comply with those agencies, or find themselves pissing off a lot of high level people by refusing to comply with government agencies. But outsourcing / tolerating a network of bots to vastly boost revenue is probably considered fraud, no?

tripzilch · on July 30, 2012

> What user data stored on FB servers could be useful to intelligence agencies of that level? People brag about crimes they committed, IP addresses are logged and people may "like" jihad but criminals targeted by the NSA are surely not at that level of stupidity.

It's not what people brag about, it's the social graph itself that's useful. All the other things like photo-tagging are extra candy.

More info at http://www.wired.com/threatlevel/2012/03/ff_nsadatacenter/al... and http://nplusonemag.com/leave-your-cellphone-at-home . Sorry it's late out here, or I'd copy/paste the relevant quotes for you--look for the bits about data mining and machine learning, especially in the 2nd article, everything is a signal and they do have the computing power to process it all, apply Bayesian belief nets, algorithms can pull intelligence information about you from the local shape of your social graph that you don't even realize can be inferred from it (humans are not very good at reading complex info from big graphs much further than simple "friend-of-a-friend" relationships).

AznHisoka · on July 30, 2012

All this happening as employees in FB are itching to cash out their options in a few months.

taligent · on July 30, 2012

So without ANY facts your conclusion is that Facebook is fraudulently deceiving advertisers.

And that they are doing this using experience from working with the NSA/CIA in order to cover up the fact that $800 million dollars of Facebook's yearly profits will disappear overnight once people realise the truth. And as such they are willing to pay out the ad-bot companies in order to hide this conspiracy.

Right.

phenom · on July 30, 2012

I think problem is more deep, I do not want to make other Conspiracy Theory but here some story from other huge Russian social network (vk 150m. users). There is one small, not that big group (like fb groups), users made agreement, that no one will post next day (something like honeypot), and those who post would be considered as bot and baned forever. So next day couple or so bots generated comments encouraging others do not go on strike against some political party [0].

I do not know is it applies to FB, but those who have like 100k bots have some power too.

[0] http://imgur.com/Vvkbx screenshot in russian

Edit: spelling Edit2: proper link

samstave · on July 30, 2012

>...without ANY facts...

What about the farking article? Where they proved that 80% of their clicks were from bots. Then several others confirm they ahve the same experience?

What about those facts?

I also speculated on what FB is doing. Them giving user data to the NSA/CIA - that is a supposed fact, sure, but I am not the only person who believes this.

Am I extremely distrustful of Facebook? Of course! Maybe you have not followed their track record or the character of their founder.

heelhook · on July 30, 2012

If you are into conspiracy theories then how about maybe this is Google directing a massive bot net to make facebook ads useless?

> I also speculated on what FB is doing. Them giving user data to the NSA/CIA - that is a supposed fact, sure, but I am not the only person who believes this.

Great argument! A lot of people believe it, so it must be true! Are you for real?!

samstave · on July 30, 2012

It is not my responsibility to educate you on modern internet history. This has been argued here on HN prior; Do you recall the backdoors that AT&T put at 600 Folsom, in SF?

And the backdoors in Cisco's equipment that are a requirement by the federal government?

Maybe you thin this shit is "conspiracy theory" and if you do, you're simply a naive fool.

Look around you at what governments are doing. The NSA has records on everything you do online. They had a system in 2005 which could trace communications between users to 6 degrees, automatically.

dboat · on July 30, 2012

It's odd to me that your posts aren't more downvoted than they are.

samstave · on July 30, 2012

You think I make this shit up? There are some really naive and stupid people on HN that have zero grasp on the history of intelligence agencies.

SwellJoe · on July 30, 2012

So, this is why people don't take you seriously.

I know a couple of guys like you in my local activist community, who take a very hostile "I know the truth and you're all fools!" attitude, despite the fact that their audience is mostly very sympathetic to most of their assertions. We know spying on the Internet takes place; HN is full of cantankerous old Internet geeks who've seen, first-hand, plenty of examples of the state (whatever state you may choose, as it happens all over the world) behaving unethically on the Internet, spying on people and punishing people for things that shouldn't be crimes.

But, your paranoid approach is counter-productive. You might as well be working for the people you claim to be afraid of, for all the good you do (negative good; you're convincing people that the folks who believe the government is spying are all paranoid nutjobs who scream at anyone who has the gall to mention other possible explanations).

So, let's review:

1. Just because spying has taken place, and is currently taking place, and may even have the complicity of facebook in that spying, it does not mean that facebook is running a botnet to steal advertiser dollars. The simplest explanation is that facebook looks the other way while others run the botnets. facebook wins (a lot, as long as most advertisers don't know it's happening), botnet owner wins (a little), and the advertiser loses. But, there are other plausible explanations, including incompetence.

2. When you paint things in a "Either you accept my theory in its entirety, or you're all idiots", you force people to choose a side. Nobody wants to be on the same side as an asshole, so you force them to choose the other side. You make people who may even agree with you (to a greater or lesser degree) to begin to formulate plausible reasons for why you're wrong about the crazier stuff you're spouting...further convincing themselves that you're entirely wrong. The best you can hope for is people ignore you and don't have the chance to be inoculated against your ideas; having you as their first exposure to these concepts guarantees they will be less likely to believe them in the future, even if they come from a more credible source. Humans are funny creatures.

Thus, I would point out that there are some really naive and stupid people on HN that have zero grasp of effective argument, persuasion, and even basic logic.

You might be well-served by reading about non-violent communication: http://en.wikipedia.org/wiki/Nonviolent_Communication

Edit: Removed the word "schizophrenic" as it was an insensitive use of the term, and was counter-productive to making my point.

tripzilch · on July 30, 2012

Hey, calling people "schizophrenic" is not cool. Just sayin'.

SwellJoe · on July 31, 2012

You're right. It was an unproductive method of describing the behavior I was seeing (and was too easy to interpret as saying the person is schizophrenic rather than exhibiting behavior I associate with schizophrenia). It was also insensitive to schizophrenics.

In my defense, schizophrenia runs in my family, and I'm very familiar with it...I don't think of it as an insult. But that's a local custom in my family that I shouldn't think follows in the rest of the world.

Dylan16807 · on July 31, 2012

When describing behavior and not as a slur I don't see any problem.

samstave · on July 30, 2012

>But, your paranoid schizophrenic approach is counter-productive.

Really, now I am a paranoid schizophrenic?

Just because I make claims which are readily confirmed and were completely available in the media - even the EFF filed suit on the AT&T events...

Yet, for some reason, it is my responsibility to educate everyone every single time someone new comes along who hasn't been following these things closely.

Now I am a paranoid schizophrenic?

Also, its a strawman to focus on my tone, rather than content. You're trying really hard to be overly pedantic and, frankly, an asshole.

Thanks for the link on communication.

dboat · on July 30, 2012

What I am witnessing is someone trying to inform people, while effectively censoring himself by delivering in such a way as impedes its delivery.

You obviously have a vested interest in seeing your "message" be received, so stop ignoring what people are telling you about your tone and change it. Figure out what it is that people do listen to, or else you have no one but yourself to blame when they take issue with your tone.

To wit: Your tone is defensive. Pejoratives and cursing undermine your expression. What you are effectively telling people is that your thoughts are not important enough to merit self-restraint. Detailed explanations following assertions are also helpful.

FWIW, I work in the performance marketing industry and, IMO, the most complicity facebook could be said to have in this issue is in not placing a sufficiently high priority on preventing bots from clicking on their ads. Even if it is a sizable project there, they represent such a large target that the difficulty of the job becomes much greater. I see no salacious story here, other than a company found itself unable to optimize a campaign into profitability, which I think says more about them than facebook. Ho hum, find a different traffic source and move on.

samstave · on July 31, 2012

Thanks - Ill take the constructive criticism on my tone.

However, I will point out that with respect to your comment on FB's complicit actions due to the daunting nature of the problem, this does not take into account their other actions of a 24K ransom on the domain.

Everyone can argue in any direction they want - but neither me nor anyone else is really going to know until we get further down this path...

praxulus · on July 30, 2012

He's not arguing with you, he can't be using a strawman. He's just pointing out that other people will be distracted by your tone, making them ignore your (mostly correct) message.

samstave · on July 30, 2012

Thanks. In that case, I will try to be less emotional about this issue.

When I was talking about China hacking lockheed as far back as 2005, everyone said I was nuts! (I freaking worked at lockheed!)

There needs to be a better way to log and track this stuff so that we can point people to some sort of Tyranny Wiki.

briandear · on July 31, 2012

And there are some that actually know what they're talking about in terms if intelligence agencies but choose not to indulge in idle conspiracy hypothesizing. The real operations aren't something that the people with actual knowledge would ever discuss. Those that do claim conspiracies are often ill informed or just have read too many spy novels.

pbhjpbhj · on July 30, 2012

>in all of our years of experience, only about 1-2% of people coming to us have JavaScript disabled //

I use noscript. When I enable it for visits it's usually by enabling the domain itself and leaving third-party scripts disabled. Depending on your area this could be significant.

Also the possibility of blocking of the scripts by other elements not loading - for example if the browser can't parallelise the requests for some reason and a preceding request can't be handled then the page might be displaying well before the script is loaded, especially if it's loaded at the bottom of the markup.

jakejake · on July 30, 2012

I think unless you run a forum for javascript virus injection techniques, you're not going to see 80% visitors with disabled javascript.

thlt · on July 30, 2012

I didn't write an own analytic program but can confirm that 80% of clicks from my Facebook ads don't show up on my GoSquared dashboard.

rio517 · on July 30, 2012

And here is a screenshot: http://i.imgur.com/4hu66.png

xentronium · on July 30, 2012

What happened? Is reddit offline again or something?

eblume · on July 30, 2012

There is (subjectively) a marked deterioration of comment quality on front-page discussions in the past few days, I feel. I have also noticed that HN articles are getting linked on reddit more frequently (I use both networks, but for highly disparate purposes - gotta get those funny .gif's somewhere). No data supporting this but I fell it's a trend and I'm not liking it.

skeletonjelly · on July 30, 2012

Few days? Try since forever. As people increase the SNR gets larger. Even on reddit when it started, it was heavily tech focused, you'll find a comment that says it's going downhill. Hard to say what the numbers on HN are like growth wise in comparison to reddit (which has their subscriber numbers on display).

RobotCaleb · on July 30, 2012

Here's ascii of the screenshot in case they censor your picture of text.

http://pastebin.com/raw.php?i=HHeY5nP1

logn · on July 30, 2012

Here's the MD5 of your ascii version of the screenshot of the blog post just in case: 14e34998a7bc46174bccd981ad5f41f6

rviscomi · on July 30, 2012

I acknowledge receipt of the MD5 of the ASCII of the screenshot of the blog. Please acknowledge receipt of acknowledgement.

koglerjs · on July 30, 2012

I give you a hamburger.

cpeterso · on July 30, 2012

Alert! The MD5 hash of the ASCII screenshot I computed does not match yours:

$ curl -s http://pastebin.com/raw.php?i=HHeY5nP1 | md5sum 7b69269bd0e25daa8e8b171e5d513011

mandeepj · on July 30, 2012

waiting, when screenshot will say "Mark Zuckerberg" likes this :-)

kragen · on July 30, 2012

I don't understand why you would post that here. Can you explain?

_ntka · on July 30, 2012

Very unlikely to have been bots from FB, they're not yet quite that desperate. They would be 90% sure of getting busted, and it would be a huge deal breaker for a lot of their clients. A suicide move.

The more plausible reality is that there are just a lot of bots crawling FB fan pages, poorly programmed bots that will just follow every single link they come across. That's very much in line with my personal experience of FB fan pages.

But FB is certainly to blame for not detecting these as bots, and putting ads on the pages they serve them. Hopefully they'll take care of this issue soon.

bigdubs · on July 30, 2012

In my mind having an in-house bot network and being totally ignorant that you're charging for bot traffic via ads amount to the same thing.

This is very, very bad.

jokermatt999 · on July 30, 2012

How? In house bot network implies active fraudulent charging, while being unaware of bot clicks is simply neglect or ignorance.

ralfn · on July 30, 2012

The distinction between ignorance and intent does not compute.

Yes, we dont need a conspiracy to explain stupidity. The stupidity is the conspiracy.

People arent ignorant, because they are handcapped or something. They are ignorant, because thats the behaviour and excuse we reward. Just be an idiot, dont put in the intellectual effort, and you will be allowed to speak lies and screw every one and every thing.

From a corporate point of view, its the golden ticket. Plausible deniability. We didnt know that our insurance helped finance weapon exports to dictators. We didnt know. We didnt know. Sorry this, sorry that. We're keeping the money though.

I dont think that provable intent should have legal meaning. Because all neglect is intentional, as is ignorance in a world, where every one has the choice to be well informed.

flatline3 · on July 30, 2012

Negligence

stdbrouw · on July 30, 2012

It does not amount to the same thing at all. Keep some perspective.

bigdubs · on July 30, 2012

It does amount to the same thing.

If I am writing a mechanism to charge for ads based on traffic, one of the first things to QA/write test cases for is fraudulent traffic.

The leap here is from facebook being remotely competent (which is likely), and willfully ignoring checks on certain types of traffic because it benefits them financially.

My assertion is that this is the same class of bad behavior as running your own bots. Note the distinction in 'willful' ignorance.

tatsuke95 · on July 30, 2012

>"Hopefully they'll take care of this issue soon."

Much like the multiple account issue, it's beneficial for Facebook to turn a blind eye. That is, unless it escalates into a PR issue.

eranation · on July 30, 2012

First page in HN, for some people this is considered a PR issue :)

saraid216 · on July 30, 2012

HN's opinion of anything is not remotely significant to FB's PR, really.

mattmaroon · on July 30, 2012

These bots, if they do exist, wouldn't be operated by Facebook. It's a market-based ad bidding system. If 80% of clicks are worthless, clicks will be worth 80% less, and advertisers will bid accordingly. Facebook won't make anything in the long run by doing this, and there's clearly risk of getting caught doing something illegal.

If there is a bot, it's likely one of their competitors, though even that seems unlikely. One must be logged into FB to see ads, and I'm positive Facebook has some click fraud detection. They'd almost have to notice a number of accounts clicking the same ads over and over. I guess maybe if Facebook's CAPTCHA is weak enough and a bot was creating an account, setting up the proper interests to see the ad, clicking it, then repeating... Seems unlikely though.

Also none of this explains why they're taking down their Facebook page. One does not need to buy ads to have a Facebook page. You can even have it automatically update from Twitter. They could leave it up there and get free publicity from it and simply cancel their ads.

This whole thing seems like a sham for attention to me.

mojowo11 · on July 30, 2012

> Also none of this explains why they're taking down their Facebook page.

I think it pretty clearly explains why they're taking down their Facebook page. Just because you don't agree with them doesn't mean it's not explained. They're shutting down their Facebook page because (according to them) it's sitting around with outdated branding but they can't change the name unless they spend $2000/mo. Also because their user experience with Facebook has been crappy, and they don't want to have anything to do with them anymore. You know. Principle.

Could be a sham for attention, but the article pretty clearly explains why they're shutting the page down.

mattmaroon · on July 30, 2012

Couldn't they just make another page with the correct name?

mojowo11 · on July 31, 2012

They could, but a) that's a pain in the ass, they'd have to build all the content out from scratch and b) they'd lose a ton of their Likes in the transition.

lilyball · on July 30, 2012

The name on their page is wrong, and Facebook refuses to change it for them unless they buy $2000 worth of advertising. That's pretty scummy, and I think perfectly justifies their decision to remove their FB page entirely.

nhashem · on July 30, 2012

My startup is essentially an advertising aggregator (pooling traffic from a variety of publishers and routing it to advertisers) and dealing with things like bot detection is a HUGE chunk of what we work on, technology-wise. Let me try and give you an idea of how deep the rabbit hole can go.

- Okay, you want to detect bots. Well, "good" bots usually have a user agent string like, "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)." So, let's just block those.

- Wait, what are "those?" There is no normalized way of a user agent saying, "if this flag is set, I'm a bot." You literally would have to substring match on user agent.

- Okay, let's just substring match on the word "bot." Wait, then you miss user agents like, "FeedBurner/1.0 (http://www.FeedBurner.com)." Obviously some sort of FeedBurner bot, but it doesn't have "bot" or "crawler" or "spider" or any other term in there.

- How about we just make a "blacklist" of these known bots, look up every user agent, and compare against the blacklist? So now every single request to your site has to do a substring match against every single term in this list. Depending on your site's implementation, this is probably not trivial to do without taking some sort of performance hit.

- Also, you haven't even addressed the fact that user agents are specified by the clients, so its trivial to make a bot that identifies itself as "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:14.0) Gecko/20100101 Firefox/14.0.1." No blacklist is going to catch that guy.

- Okay, let's use something else to flag bot vs. non-bot. Say, let's see if the client can execute Javascript. If so, let's log information about those that can't execute Javascript, and then built some sort of system that analyzes those clients and finds trends (for example, if they originate from a certain IP range).

- This is smarter than just matching substrings, but this means you may not catch bots until after the fact. So if you have any sort of business where people pay you per click, and they expect those clicks not to be bots, then you need some way to say, "okay, I think I sent you 100 clicks, but let me check if they were all legit, so don't take this number as holy until 24 hours have passed." This is one of the reasons why products like Google AdWords don't have real-time reporting.

- And then when you get successful enough, someone is going to target your site with a a very advanced bot that CAN seem like a legit user in most cases (ie. it can run Javascript, answer CAPTCHAs), and spam-click the shit out of your site, and you're going to have a customer that's on the hook to you for thousands of dollars even though you didn't send them a single legit user. This will cause them to TOTALLY FREAK THE FUCK OUT about this and if you aren't used to handling customers FREAKING THE FUCK OUT, you are going to have a business and technical mess on your hands. You will have a business mess because it will be very easy to conclude you did this maliciously, and you're now one Hacker News post away from having a customer run your name through the mud and for the next several months, 7 out of the top 10 results on any Google search for your company's name will be that post and related ones. And you'll have a technical mess because your system is probably based on, you know, people actually paying you what you think they should, and if you have no concept of "issuing a credit" or "reverting what happened," then get ready for some late nights.

I'm seriously only scratching the surface here. That being said, I'm not saying, "this is a hard problem, cut Facebook some slack." If they're indeed letting in this volume of non-legit traffic, for a company with their resources, there is pretty much no excuse.

Even if you don't have the talent to preemptively flag and invalidate bot traffic, you can still invest in the resources to have a good customer experience and someone that can pick up a phone and say, "yeah, please don't worry about those 50,000 clicks it looks like we sent you, it's going to take us awhile but we'll make sure you don't have to pay that and we'll do everything we can to prevent this from happening again." In my opinion this is Facebook's critical mistake. You can have infallible technology, or you can have a decent customer service experience. Not having either, unfortunately, leads to experiences exactly like what the OP had.

kirubakaran · on July 30, 2012

I understand that it is challenging for a third party like you to identify the bots. But Facebook has more than enough signal on each user clicking on an ad (friends, posts, location, 'non ad-click activity' to 'ad-click activity' ratio) to determine if that user is a bot. Looks like they are choosing to not do anything with this information. "FB is turning a blind eye to keep their revenue" seems to stand up to Occam's Razor.

rapind · on July 30, 2012

Couldn't agree more. It should be almost trivial for them to track down bots and flag bad accounts. I definitely wouldn't classify it as one of their harder problems.

pmarsh · on July 30, 2012

Not saying that's not the case, but you have no data or knowledge to back any of that up. I wouldn't assume a problem is easy or hard until I've got sufficient info on it.

Too often I've heard "how could XYZ not have done this, fixed that?" Then have that same party sign on board to fix this "easy" problem and get themselves in a world of hurt.

rapind · on July 31, 2012

Well I do have at least some data to back that up. I know that they are a walled garden that keeps a ton of user data for every account (even the subset represented in Open Graph is significant). And it's with this knowledge that I make my conclusion.

* Accounts subsisting of an unusual proportion of ad clicking activity compared to their other activities are likely bots.

* New accounts with no history but high ad clicks are likely bots.

Before a click is counted, check the source account for it's human rank (activity-to-clicks). If it's below a certain threshold (tuned over time) than don't bill your customer for it.

The algo's to determine human rank can start fairly simple and become more complicated and accurate over time.

In terms of their current sophistication I would definitely call this an easy problem from a tech standpoint. Scaling to millions of concurrent users? Hard. Attracting and keeping millions of users? Hard.

jiggy2011 · on July 31, 2012

The bot masters then just have to figure out roughly what your activity/ad clicks rate is. They can do this over time by looking at statistics.

Then they make some bundled crapware browser plugin that gets installed by legit facebook users that will click ads at that interval + some random time. Possibly do the ad clicks in background so that the user does not know.

rapind · on July 31, 2012

How exactly are they going to know they're being flagged as bots? When all you're doing is discounting those bot clicks to the advertiser it becomes far more difficult to test your limits as a bot (you would need to be a paying advertiser, just to analyze the human to bot threshold). It's similar to hell-banning, in that the user's generally unaware they've been hell-banned.

Take a minute to think about the effort a bot account has to go to in order to even come close to gaming a system that tracks their behaviour.

Let's assume that in order to be deemed human you need at least a few characteristics like "account older than 1 day", "friends from other non-suspicious accounts", "a photo or at least comments on a friends photos", and a "certain amount of click activity".

Now let's take that criteria and make it difficult for a bot to even know if they're being flagged as a bot where the only indication is a reduced bill at the end of the day / week / month to the advertiser. So bot makers would have to pay to test and would at best receive delayed feedback on their success.

Isn't this starting to seem extremely difficult, and generally unscalable from a bot perspective? It doesn't have to be impossible, there just needs to be lower hanging fruit, of which there are plenty should FB implement a system like this.

And should this fairly simple solution still have a few holes that a very elite few bot coders are able to game, you've still got tons of data and time to use to tune your algos.

It just doesn't make sense that one of their advertiser's is being charged for 80% bot clicks, until you start to question Facebooks motivations.

AznHisoka · on July 31, 2012

I agree. This is an easy problem, considering FB's resources. Only a logged in FB user can click on ads. Not random crawlers. That alone makes this problem 100 times easier than what Google has to face with invalid clicks. No excuses.

jiggy2011 · on July 31, 2012

It depends whether the bots are creating accounts or hijacking existing ones (via browser plugins or whatnot).

The bots will not be able to get accurate statistics without paying, but they might be able to get "good enough" statistics. For example, is the system moving in a direction we like.

If you do need to pay , a good bot herder probably has enough stolen credit cards to use.

Yes, there are certainly mitigation techniques. But like other spam/bot related problems it becomes an arms race.

You only need 1 good bot programmer to release his bot online for thousands of budding script kiddies to take advantage of.

rapind · on July 31, 2012

Hijacking browsers? We're getting pretty sophisticated and far fetched. If you've hijacked someone's browser there are many better ways to make a profit than spiking a competitors ad buys.

So basically we've narrowed down the potentially bot makers to:

1) Has a way to hijack thousands of facebook user's browsers, and is more interested in spiking some companies ad buys than other profitable activities they could be doing.

2) Has access to stolen credit cards so they can spend the money required to debug and test their bot's humanity against the defensive algorithms. Spending money in order to indirectly make money because some company is paying them to spike a competitor's ad buys. Obviously this is going to cut into their margins, so the amount they spend should be less than they're being paid.

3) Has the patience and dedication to analyze the bills when they come in to determine how much of their activity is considered botlike and then tweak their behaviour accordingly.

4) Thousands of budding script kiddies are also being payed by companies to spike their competitor's ad buys.

5) Rides a unicorn?

Don't forget there also needs to be companies willing to pay someone to make these bots and don't mind how incredibly difficult it would be to measure it's success. In fact, if there are companies willing to pay for this, then you're going to have a whole bunch of scammers who simply claim they're able to do it that the companies will have to filter out.

We're incredibly far into the realm of unprofitability and edge-casery don't you think? An alien spaceship is going to land in my backyard before this happens.

jiggy2011 · on July 31, 2012

Not really, this is just off the top of my head and I'm sufficiently unfamiliar with bot programming and FB advertising that I'm sure that plenty of better solutions must exist.

You are forgetting that there are growing numbers of programmers in developing nations for whom doing this sort of thing can be an attractive way to earn a living (or at least they are willing to try it). Look on any "hire a freelancer" type sites and you will find hundreds of postings for tasks that sound awfully like scamware development (usually at a couple of dollars an hour).

Browser hijacking is not so uncommon either, I've installed legit software in the past that has come bundled with some crapware that injects extra adverts into web-pages and other things.

The analysis would not be too difficult either, simply set a "click rate" and adjust it downwards until you are being charged > $0. Do facebook offer any method to get free trail credits? If so this could be abused, heavily.

The economics are roughly the same as with spam, if you do enough of it your "hit rate" doesn't matter so much.

valejo · on July 31, 2012

Where there is an affiliate program there will be bots. Insofar as Facebook creates a way for developers to monetize their apps through a revenue share on ad revenue, developers will have an incentive to falsify app usage and ad clicks.

epivosism · on July 31, 2012

Compare it to google's problem filtering out automatically generated splogs, though - it took them years to make progress against that.

Real human social activity is a lot more random than real informational content in blogs - so fb's problem seems harder.

Also, facebook banning accounts is a much stronger action than google just lowering the pagerank of suspected splogs.

So overall I think fb's problem is harder than a currently google-unsolved problem.

jonknee · on July 31, 2012

Facebook has total control of the users very action since joining Facebook. Much easier than coming across a random HTMl page and deciding on if it's real (whatever real means, YouTube comments are real but are very low signal but a database that runs itself could provide lots of signal).

Foy · on July 31, 2012

He never said to ban the suspicious accounts, just don't charge companies for their clicks.

The detection part aside, it's a simple fix. I just wonder why FB hasn't already. (Is it because it'd hurt their revenue when they need it the most?)

publicfig · on July 31, 2012

But in the same respect, the fact that real human social activity is much more random is a reason why this should be easier. Bots are inherently pattern based, and while there are bots that could work to maintain fake interactions in order to pass a wide-sweep analysis of low-socializing accounts with a high ad response rate, it would do quite a bit to take a first attack on some of the more obvious bots, which are the ones that many of the people who had these problems noticed.

Now of course I'm simplifying this to a nearly absurd degree, but I do think there are reasons for Facebook to want to ignore these issues and that even a minimal step up in bot-prevention could go a long way when it comes to their advertisers.

epivosism · on July 31, 2012

I suppose a naive bot would not be "random" enough, and would obviously not be human.

But on the other hand, since valid human behavior includes much more senseless, random, and repetitive actions (compared to blogging), a well-done bot would be harder to detect.

publicfig · on Aug 1, 2012

Oh, absolutely! I have no doubts that Facebook will always be a step behind the bots, and I don't think anyone feels or even expects otherwise. That's just part of the social networking climate right now, and all major (and even many minor) social networking sites have to deal with it. However, in many cases that were pointed out in both the article and the comments here, these people weren't having their pages liked and ads clicked through by advanced bots, but instead by obviously shill accounts. Now of course there is always the option that these are legitimate accounts with weird use patterns, but regardless there are methods of doing at least a significant sweep and remove some of the more obvious bots or at least offering them a chance of legitimacy confirmation (even something similar to CAPTCHAs for non-standard account usage that messes with advertisers campaigns).

Like I said, I absolutely believe there will be bots that are incredibly sophisticated that might be difficult to detect, but I still think more could be done to stop the less sophisticated ones.

TomAnthony · on July 31, 2012

I come down on the side that it shouldn't be so difficult for Facebook to at least spot some of the 4 bots per human.

However, even if you assume it is a hard problem - thats more reason that they should have listened (and responded!!) when a company identified this problem with some hard data to back it up.

ebtalley · on July 30, 2012

data would be * IP addresses, one can assume a bot would only have a set of addresses they could use, barring botnets. * request patterns, ie: did the bot request css/js, etc * request timeframes * UA strings

Sure, its a big data problem, but I can imagine that Facebook has solved these types of scenarios many times over.

zburt · on July 30, 2012

What if you start a new Amazon EC2 spot instance (netting you a new IP address), start up Chromium in headless mode (say, using Xvfb), navigate to the website of choice, use mouse automation to start clicking around, click the ad, spend 5 minutes clicking around in a semi-choreographed pattern on the advertisee's website, and then shut down the instance -- only to repeat?

Moreover, Amazon is always buying new IP subnets.

jonknee · on July 31, 2012

It sounds like you don't need to go to that much hassle currently, but even that rigmarole is simple enough to combat. The user account should be real, the usage real (comments, photos, messages back and forth) and the friends also real. False positive spam Ids are OK, that will lower your revenue but won't constitute fraud with your customers. Put up a test for uses you think are spamming, the test they already do of identifying photos of your friends would be a good one.

Large numbers of real looking fake accounts should be hard to keep up.

kirubakaran · on July 30, 2012

The user from that new IP won't have any real human like history - photos shared and commented on over time etc.

freeall · on July 31, 2012

But why would you go through all that to click on ads? If I click on an ad for "Some Record Company" how does that make me money?

aidos · on July 31, 2012

It doesn't always need to make you money, sometimes you might just want it to cost your competitor money.

geori · on July 30, 2012

I dont count clicks from any amazonaws or ec2 hostnames on my site.

radicalbyte · on July 31, 2012

Then pay Amazon for a list of their EC2 IPs, or obtain that information from a public source (i.e. RIPE, ARIN).

pegmanm · on July 31, 2012

This is a mostly solved problem though. Or at least there are people and products out there that do this specifically for you, not with a focus on bots alone as that is a bit of a edge case but for mobile devices.

DeviceAtlas and WURFL are both products that I have used that attempt to have complete coverage of all UA strings in use in the wild even, the ones that attempt to spoof legitimate UA strings.

While their focus is on device detection and the properties of these devices for mobile content adaptation, knowing which connections are from bots is just as important if you are going to alter what content you serve up based on UA.

These product actively seek out new UAs and add them to their rule sets in a way that I can only assume will be more accurate over time than a home grown solution.

FWIW I know that FB use at least one of these products for their device detection, so if you want to match what FB thinks (knows!?) this is where you would start.

cookiecaper · on July 31, 2012

It's just as easy to invest in several fake profiles that act just like real users. Pay people to work from home maintaining fake accounts for you, posting stuff that seems real, interacting with your other fake accounts and celebrity/business pages, etc. Then, run bots with these accounts, click around on a subset of good links randomly, and throw in a reasonable ratio of ad clicks per account, then move on to the next shill account.

This definitely takes non-trivial effort, but it's also definitely doable if you have a serious interest in perpetrating click fraud. While Facebook hypothetically could fight this form of click fraud, it'd likely be difficult and the method would be fragile/easily circumvented. Like most things, PPC advertising can only function in a world primarily populated with positive actors.

simula67 · on July 31, 2012

I am unsure if Facebook provides a service like AdSense to motivate this kind of behavior. It would be interesting if to see how they could fight this type of behavior though.

1. Make the split uneven. For example for every dollar advertiser pays, only 30 cents goes to the page owner who received the "Likes"

2. Lobby to make these kinds of activity illegal. They are after all spamming activities

3. These kinds of fake account farms will usually have very few connections to outside world. That is, accounts that do not obey the "Six Degrees of seperation" should be flagged for human review.

4. Once reviewed enlist law enforcement help to track down the owners of the all the accounts in these farms.

Painstaking work, but I guess the point is that advantage can lie with Facebook as opposed to the spammer in this game. I am not sure about the ethical implications though.

Maybe this is why Google wants to succeed badly in the social space : extra spam fighting abilities.

roel_v · on July 31, 2012

You're basically saying 'there is a point where the line between actual user and paid click-frauder becomes blurry'. Which is true, but that point is way beyond what is profitable, which is enough - you 'only' need to make your system hard enough to beat to make it not worth the effort.

cookiecaper · on July 31, 2012

Sure, I agree that your summary is correct. I don't really think the system I described is necessarily untenable, though. If you make 10k+/mo from ads I think it could be plenty profitable.

tantalor · on July 31, 2012

Google seems to handle click fraud pretty well: http://www.google.com/intl/en_ALL/ads/adtrafficquality/index...

I couldn't find any resources on how Facebook handles this.

mkup · on July 31, 2012

It's much easier to guess a weak password than create a fake profile. As for everything else, I totally agree with you.

johnbenwoo · on July 31, 2012

FB turning a blind eye towards shady a advertising practice because it generates revenue? See: Scamville http://techcrunch.com/2009/10/31/scamville-the-social-gaming...

lifeisstillgood · on July 31, 2012

Just as a thought - facebook must be in a unique position to provide almost unbeatable captchas - just put up images of a known friend and four / forty random pictures - the one who chooses correct is likely the person logged in

ignore this is fb already does it

ars · on July 31, 2012

They do in fact do that (or something like it) when you need to recover an account/password.

omellet · on July 31, 2012

They also did this the first time I logged in from a foreign country.

lance0 · on July 31, 2012

I suspect even that signal is not trustworthy. Given that its not 100%, what fraction of the 900+ million Facebook users are actually users? Twitter seemed to be totally plagued with that at one point anyway.

What's the grey area look like? What's the chance of falsely identifying a real user as a fake one?

Rhetorical question. Not sure FB could even answer them accurately.

dredmorbius · on July 30, 2012

We've found that identifying sources of traffic and patterns of usage is superior to simple user-agent detection, particularly as, since you've noted, user-agent spoofing is trivial, and/or some bot traffic is driven through user-based tools (say, script-driven MSIE bots).

Instead of that, we'll watch for patterns of use in which high volumes of traffic come from unrecognized non-end-user network space.

"Unrecognized" means that it's not one of your usual search vendors: Google, Yahoo, Bing, Baidu, Yandex, Bender, AddThis, Facebook, etc.

"Non-end-user network space" means a netblock which is clearly not a residential or commercial/business networking provider.

For this you'll mostly want to eyeball, though when we saw very high volumes of search coming at us from ovh.net (a French low-cost hosting provider), it was pretty obvious what the problem was.

Using an IP to ASN/CIDR lookup (such as the Routeviews project: http://www.routeviews.org/) and a local ASN -> name lookup. One source of this is http://bgp.potaroo.net/cidr/autnums.html

rscale · on July 30, 2012

One as-yet-unmentioned technique is to adjust a bot score by taking an OS fingerprint, and comparing that to the listed user agent.

It's not perfect for a variety of reasons, but I found it to be a useful input for a similar bot detection problem. That said, this was a long time ago so I'd need to re-run some experiments to see if the hypothesis remained valid.

For those not familiar with OS fingerprinting, it's a method of looking at the details of a connection and determining which operating system is most likely to have created packets with those options and flags. In BSD, it's built into pf.

tripzilch · on July 30, 2012

On the packet level ... hm that's pretty clever, definitely not trivial to fake, either.

Just that if your bot runs from, say, Windows 7 and it spoofs an IE8 user-agent header, or even runs by directly automating IE itself, how do you detect it then? Both of those scenarios are not unlikely at all.

rscale · on July 31, 2012

If the bot spoofs a user-agent that is viable for the OS, then you need to rely on other signals. It only works as a component of a defense in depth strategy.

dredmorbius · on July 30, 2012

Interesting. Any Linux equivalents or Java classes which offer similar capabilities (we're using a Java-based application server / webserver).

biot · on July 30, 2012

It works off of checking the SYN packet [0], which happens at the transport layer [1]. Once it gets to the application layer the information is not available unless the transport layer stores the information and provides a method to query a given connection.

[0] http://www.openbsd.org/faq/pf/filter.html#osfp

[1] http://en.wikipedia.org/wiki/OSI_model

dredmorbius · on July 30, 2012

I suspected something like this, but was hoping enough of the signature might survive for a daemon (or Java class acting as one) to take a peek.

rscale · on July 30, 2012

The linux equivalent is p0f (http://freecode.com/projects/p0f)

I can't speak to Java classes, but they'd need to be dealing with raw packets which isn't an option in a typical stack.

ww520 · on July 30, 2012

These are good info. In my past life, I oversaw the web monetization pipeline as part of my job. Click fault detection was part of it. It is a hard problem and there's no 100% foolproof solution.

As in your case, it's nearly impossible to detect click fault real time as the clicks streaming in. We batch-processed the clicks throughout the day and do daily billing settlement at the end of the day. It's easier to detect click fault over a period of time. Obvious bots are filtered out, like the search engine crawlers.

For click fault, it's impossible to distinguish one-off click as a user click or fault click. We tried to catch repeated offenders. we looked for duplicate clicks (same ip, same agent, other client attributes) against certain targets (same product, product categories, merchants, etc) over a period of times (1-minute, 5-minute, 30-minute, hours, etc). We also built historic click stats on the targets so we can check the average click pattern against historic stats (yesterday, week ago, month ago, year ago).

We ran quick checks every 1 minute and 5 minutes to detect unusual click patterns. These were mainly for detecting DOS attack against our site or against certain products. We actually had a realtime reaction mechanism to deal with it: slow down the requests, throw up CAPTCHA, redirect to a generic page, or block ip at routers.

Every couple hours, we batch processed the clicks so far and ran the exhausting checks to flag suspicious clicks. At the end of the day, definite fault clicks were thrown out, moderate suspicious clicks were grouped and reviewed by human. It's a fine balance to flag less suspicious clicks since there's money involved.

The sophisticate click fault I've seen involved a distributed botnet clicking from hundreds of IP all over the world doing slow clicks during the day. We found out because it's targeting a product category of a merchant and causing unusual click stats compared to historic pattern.

rotten · on July 30, 2012

And then there are the "wet bots" - Amazon Turks and overseas humans (lowly) paid to use a regular browser and PC and manually click on stuff for whatever purpose (rig internet voting, drive fake advertising charges, steal data, skew hit counters - whatever).

gojomo · on July 30, 2012

Thank you for introducing me to the term "wet bots"!

alttab · on July 30, 2012

Is an impression from a mechanical turk not a true impression? Its a real person, after all. If the premise of marketing and advertising work due to mere exposure, then intent of the view shouldn't be measured.

What you are getting at however is very important to ad-driven internet business models. You make the empathetical deduction that if you are there for one purpose, you are impervious to advertising.

Now, if you are still with me after that statement, then there are larger impications of online advertising than are actually being measured. For instance, if the intent of the view mattered, would it be worth more? I think Google has answered that question rather clearly.

Measuring "why someone is visiting a URL" (and thus generating an ad-view) is a larger technical problem than could be accomplished with some fancy algorithms. That is, unless the service knew all of your browsing history...

In any case, you raise a good point about the intent of the page-view and how that should be "credited" as advertising. To do this however is to challenge existing 'fundamental' theories.

nhashem · on July 30, 2012

The problem with mechanical turk impressions/clicks is typically the country of origin, since they usually involve paying people in third-world countries a fraction of a cent which is still worth some nominal amount to them. Say you are an advertiser who runs an ecommerce site that only ships within the US. You don't want to pay for impressions/clicks from people in India because you can't actually sell to them, even if they are "real" people and not a script.

Also, there is "brand marketing" and "performance marketing." I'm not sure they have classic definitions, but generally brand marketing is based on "mindshare" and tough to quantify. This is basically any TV commercial -- Budweiser doesn't expect you to jump off your couch the second you see a Bud Light commercial, but maybe a week later when you're in the supermarket, you spot that new brand of beer and recall the commercial with Jay-Z rapping and grinding with some models, and think, "eh, maybe I'll try Bud Light Platinum, I will purchase this and consume it later because I like the idea of rapping with models," even at a subconscious level.

Performance marketing is all about ROI. You spend $100 marketing dollars, and you want to make at least $X back. Typically $X = 100 in this example, although in some cases you may want to adjust profit margins or even run at negative profit (for example, if you want to maximize revenue even at an unprofitable rate). Almost any advertiser on the internet has a performance marketing mentality, mainly because it's much easier to actually measure performance. This is why advertisers like the OP are frustrated with Facebook, because paying for bot traffic makes it extremely hard to hit their performance marketing goals.

You are right though, in that this is a huge gray area. To give two extremes, let's say you run an ecommerce site but you'd also like to monetize with ads. So you reach a deal with a shopping comparison site to display their products on your pages in your footer. If you buy AdWords to your site, and those visitors from AdWords scroll down and click on the shopping comparison ads, they are probably fine with paying you for that traffic, although you are probably paying something like $2/click. If you make a page with JUST the shopping comparison ads, and buy traffic from someone who basically infects computers with trojans and overwrites their Google search results to be your site (say at $0.01 per click), then you'll get clicks on the shopping comparison ads and make lots of money, except the shopping comparison site will realize these clicks don't lead to any actual sales for their merchants (since this is extremely poor quality traffic since all the users that end up there have basically no intent) and will kick you off. Somewhere in between adveritsing for "first page Google AdWords" traffic and "trojan/botnet" traffic is basically every other form of internet advertising, and how gray that area is, is typically completely defined by your business partners.

dredmorbius · on July 30, 2012

It's at best a gamed impression.

Advertising and marketing are generally very intentionally tailored. You're not trying to reach any human, you're trying to reach a human in your target demographic.

Skewing results (and wasting resources) by paying people who are entirely outside your marketing effort to click through links (or take other similar resource-costing / data-skewing actions) is not the desired result.

pherk · on July 30, 2012

All that is really no excuse for not developing a foolproof system. And if you know that you don't have a great system to catch fraud, then you should simply leave some money on the table.

See, the essence here is not to come up with a great algorithm. This is not a programming contest. But to create a system where everyone (publisher, advertiser, ad-network and customer) wins.

In this very early stage, the success of Facebook's Ad network should be measured only by one metric - ROI of Advertisers. And that number has to be consistently better than Google's.

dangrossman · on July 30, 2012

> And that number has to be consistently better than Google's.

Why's that? This isn't a zero sum game. You don't advertise only on Google or only on Facebook; as much as your budget allows, you advertise on every channel you can achieve a positive ROI on.

For any advertiser looking for a direct response, like a signup or purchase, I doubt anything Facebook can do would make the ROI better than advertising on search. No amount of demographic and interest targeting will make advertising to people as effective as advertising to intent. You can pay Google to send you people while they're in the process of looking for your product; Facebook doesn't do that.

tlow · on July 30, 2012

Who is the "you" that your post above mentions?

dangrossman · on July 30, 2012

The advertiser.

johnvclifford · on July 31, 2012

ROI to the advertiser is a key metric. Smart marketers compare the LTV or Life time value of customers that come from specific campaigns. ROI metric does not have to be better than Google for an ad campaign on Facebook or Microsoft Adcenter to be profitable. I have worked with clients to use web analytics and cohort analysis to optimize both quality (higher conversion, better LTV ) and quantity of leads. Sometimes Microsoft Adcenter traffic converts better but does have sufficient quantity compared to Google Adwords. Also different ad platforms have different levels or competition as well as different levels of fraud detection and click quality. I am surprised that Facebook does not detect and suspend obviously spammy user accounts whether that account is bot driven or outsourced to humans overseas.Google and Facebook have financial incentives to not filter out all low quality ad traffic because many advertisers don't have expertise or time to closely examine traffic sources as the company mentioned in this news story.

A combination of tech expertise and marketing savvy are needed to reduce this problem. More analytic tools are available for Google Adwords than are available for Facebook ads.

loopdoend · on July 31, 2012

Another great way of detecting bots is to redirect the visitor through an SSL connection and sniff their cipher suite. By comparing the list of ciphers the bot presents with a list of known ciphers of major browsers, you can accurately detect a majority of bots. Many bot authors use the same few libraries which never change their cipher suite.

mthoms · on July 31, 2012

This seems brilliant. Are there any downsides?

loopdoend · on Aug 1, 2012

There is a slight performance hit, there is overhead added by creating SSL connections. However, using this method you can detect most cURL bots, practically all bots written in Ruby, and myriad other bots that are lying about their User-Agent. Combine it with other techniques for maximum effectiveness.

lambda · on July 31, 2012

The big problem here isn't detecting bots. Sure, in this one company's case bots may be the problem; but in general, it's detecting behavior that you are paying for that doesn't make you any money. There may be real users who click "like" a lot; they may do it because they're paid, or they may do it because they're bored and trying to see how many "likes" they can collect. It doesn't really matter; none of them are very valuable to an advertiser.

Rather than trying to explicitly filter bot traffic, they should try to quantify the value of particular "likes". For instance, a user who demonstrably clicked through and purchased something is much more valuable than one who "likes" 10,000 companies and has no friends, whether or not they're a bot.

And heck; a bot that has thousands of followers who all buy lots of things is valuable too. Maybe someone created a particularly effective meme-generator bot, and people who follow that bot are likely to buy ironic T-shirts. Whether or not a user is a bot isn't all that relevant; it's whether that user's activity correlates with you making sales.

tripzilch · on July 30, 2012

> How about we just make a "blacklist" of these known bots, look up every user agent, and compare against the blacklist? So now every single request to your site has to do a substring match against every single term in this list. Depending on your site's implementation, this is probably not trivial to do without taking some sort of performance hit.

???! Excuse me? Are you programmers? Efficient substring matching is a solved problem. How many entries in that blacklist are you looking at then? Can't be more than a few thousand to catch 99.9% of the ones where a blacklist would work.

If done right it's easily faster than

> see if the client can execute Javascript.

because that requires a whole extra request roundtrip before detection. Also it's no longer true that bots don't use Javascript, the libraries are freely available.

It's also definitely faster than

> some sort of system that analyzes those clients and finds trends (for example, if they originate from a certain IP range)

> This is smarter than just matching substrings

No. It's smart to grab the 99.9% of bots with a blacklist of substrings, most importantly because it has a very very low false positive rate (unlike checking for JS support) because any human user that goes through the trouble of masking their UA as a known bot certainly knows to expect to get blocked here and there.

After that you can use more expensive checks, but at least you can bail out early with on the "properly behaving" bots with a really cheap string matching check (seriously why do you think that's an expensive operation--ever check how many MB/s GNU grep can process? just an example of how fast string matching can be, not suggesting to use grep in your project. Your blacklist is (relatively) fixed, you can use a trie or a bloomfilter it'll be faster than Apache parsing your HTTP headers)

MartinCron · on July 30, 2012

This is entirely honor system, though. All of the user-agent checking has to be layered with the other bot-removal techniques. Malicious users will always fake user agent strings.

tripzilch · on July 30, 2012

Of course. But you also need to make sure you don't accidentally count the non-malicious bots that have known UA strings but just happen to crawl all over the place, including ads.

I mean it's a good (and cheap) first pass to throw out quite a number of true positives.

zzzeek · on July 30, 2012

this is a great use case for bayesean filtering, and I bet you could even hook up an existing bayes engine to this problem space, if not just write one up using common recipes - I've done great things with the one described in "Programming Collective Intelligence" (http://shop.oreilly.com/product/9780596529321.do). Things like "googlebot", "certain IP range", "was observed clicking 100 links in < 1 minute", and "didn't have a referrer" all go into the hopper of observable "features", which are all aggregated to produce a corpus of bot/non bot server hits. run it over a gig of log files to train it and you'll have a decent system which you can tune to favor false negatives vs. false positives.

I'm sure facebook engineers could come up with something way more sophisticated.

th0ma5 · on July 30, 2012

Well yeah, getting a training data set, and then having your algo not be poisoned, seems like it is a challenge all around!

zzzeek · on July 30, 2012

The algo can't be poisoned, if I understand the term correctly, if you only train it manually against datasets you trust.

Each iteration of training that produces improved behavior can then be used to generate a new, more comprehensive training set. When I wrote a bayes engine to classify content, my first training set was based on a corpus that I produced entirely heuristically (not using bayes). I manually tuned it (about 500 pieces of content) and from that produced a new training set of about 5000 content items.

Eyeballing it for a couple of months initially, manually retraining periodically, spot checking daily as well as watching overall stats daily for any anomalous changes in observed bot/non bot ratios will work wonders.

mey · on July 30, 2012

I'm in another field then advertising, but less then 5% of transactions require special handling, but they represent a significant investment in error handling, procedures, accounting and reversal mechanisms. Even then we are still manually dealing with things that represent 0.00001% of activity. Accounting correctly is always a deep rabbit hole, it comes down to how much you plan to right off inside your margins. :) Thanks for writing up this inside perspective of running this type of business.

qntm · on July 31, 2012

> How about we just make a "blacklist" of these known bots, look up every user agent, and compare against the blacklist? So now every single request to your site has to do a substring match against every single term in this list. Depending on your site's implementation, this is probably not trivial to do without taking some sort of performance hit.

Build a finite state machine which only accepts the terms in the blacklist. That should be a one-time operation.

Then feed each request into the FSM and see if you get a match. Execution time is linear in the length of the request, regardless of the number of terms in the blacklist.

bartl · on July 31, 2012

A perl regular expression will test for a match in less than a microsecond. Other scripting languages have a similar speed.

There's no need to program like it's 1985 any more.

qntm · on July 31, 2012

Hey, you show me a performance problem, I show you a solution. Preferably one involving finite state machines.

springheeledjak · on July 31, 2012

> So if you have any sort of business where people pay you per click

This. This is the problem -- pay-per-click is completely and utterly broken, and has been ever since people figured out how to use bots effectively. For one, PPC is simply not compatible with real-time reporting, but there are myriad other problems with it.

And honestly, I don't know if it's even really a problem worth trying to solve, since there are better models out there already. "CPA" ads [1] are by no means perfect, but they obviate almost all of the problems you mention simply by their definition: the advertiser does not pay unless whoever clicked on the ad does something "interesting". Usually, this can mean anything from signing up, to buying something, to simply generating a sales lead, but the important thing is that "interesting" is defined by the advertiser themselves. Not by, say, Facebook.

Full disclosure: I'm an engineer at a startup that does, among other things, CPA-based ad targeting.

[1] For those of us who don't work in ads, http://en.wikipedia.org/wiki/Cost_per_action has a decent explanation.

nikcub · on July 31, 2012

This sounds like a problem that should be addressed with machine learning and AI rather than the wack-a-mole style of matching strings.

Separate out and distinguish possible humans from possible bots using a set of known humans and known bots.

Assign a human likelihood probability to each user and pay out the click based on the probability.

for eg. this user is 90% likely human, therefor the cost of the 20 cent ad click is 18 cents.

This doesn't have to wait on Facebook, somebody could build this as a third-party app and then send refund requests to Facebook on clicks that are likely bots.

wyck · on July 30, 2012

I don't agree with this at all, user-agents...really?

Go create a really, really smart bot, you will instantly know the limitations they have. It's pretty trivial with enough data and honeypots to separate them from actual people.

stef25 · on July 30, 2012

Would it be useful to have a kind of open source database of "trusted" bots, just to get past that hurdle? I do realize there are many more issues apart from declared user agents.

pbreit · on July 30, 2012

Would "white listing" the dozen or so top agents that make up the vast majority of traffic work better?

dredmorbius · on July 31, 2012

You'd want to white-list both based on user-agent string and IP / CIDR origin.

Keeping those lists maintained would be a bit of fairly constant effort, unless you could come up with a self-training mechanism, or a self-validating mechanism.

Killing Google's ability to crawl your site is pretty much as bad or worse than blocking bots. Though you should also be able to ID bad guys by noting which IPs the spoofed user-agents are coming from to, say, some honey-pot links specifically disallowed for that user-agent in your robots.txt file.

lmm · on July 31, 2012

In the general case it's important to let google crawl your site, but facebook would probably be fine without it.

GoodIntentions · on July 31, 2012

Those are exactly the user agent strings the bot would spoof.

don_draper · on July 31, 2012

One quick way to detect bots is to look for a GET request of the robots.txt file

freeall · on July 31, 2012

Sort of explained already, but "bad" bots would never go look for that file. And a good bot probably already identifies itself the the request, so no need to look through robots.txt.

fsniper · on July 31, 2012

That's it. A "bad bot" would not check robots.txt, but a legitimate user would check it. So looking for the software not checking robots.txt combined with user agent matching for good bots, you would have a good matching ratio.

logn · on July 30, 2012

could you just add a mouse-over event to each ad that sets an isHuman flag?

nerdo · on July 30, 2012

The bot could just fire the mouseover event.

ZachPruckowski · on July 30, 2012

That would break screenreaders, some touchscreen mobile devices, and probably other things I'm not thinking of.

slig · on July 30, 2012

IIRC, years ago, Google changed one parameter of it's AdSense click url based on the number of times the user hovered the ad.

joshaidan · on July 30, 2012

I wish their post was a bit more technical. Show us the code you used to detect javascript, how did the analytics work? Did they test their code by using their own browser to click an ad and see if the code detected javascript, or their hit in the analytic software?

More technical info and data would be nice!

tsycho · on July 30, 2012

While that would have been interesting to me as well, I think the post was meant for their actual users, to explain why they are moving away from FB. Technical details would have been an unnecessary distraction for their intended audience.

Maybe if they have any employees who follow HN, someone would share more light on the technicals?

joshaidan · on July 30, 2012

Found this TechCrunch article that goes a bit more indepth. http://techcrunch.com/2012/07/30/startup-claims-80-of-its-fa...

jyu · on July 30, 2012

This whole thing seems fishy to me. Facebook is one of my best sources of traffic. It's cheaper than AdWords, I can do keyword, demo, and geo targeting all built in. And if I have a hit campaign, it's pretty easily scalable. If you tie it into specific performance, like $ / email sign up, or $ / pageview, it blows away adwords. Getting in touch with someone from Facebook can be a pain, but it's drastically easier than AdWords or other ad networks.

This whole thing seems like one big linkbait/PR play. It's a really big claim from a company that doesn't even have an about us page, has only 500 Facebook likes, and basically has 0 traffic.[1]

I'd like to know if they have tried other paid traffic sources, or have concrete evidence. Inexperienced advertisers tend to spend $50-$100, expecting instant results. And when it inevitably doesn't work out, they blame anyone and everyone.

The second point seems way exaggerated. Facebook isn't "holding your name hostage" and being "scumbags". You didn't sign up with the right name from the beginning, and now you're complaining. You could have told the ad rep that you'll do $2,000 budget at some point, just change the url now. You could have also taken the 5 minutes to sign up for a Facebook Page with name@limitedrun.com and get the page you want. With only 500 facebook likes, the switching cost is low.

[1] http://siteanalytics.compete.com/limitedrun.com/ https://www.facebook.com/limitedpressing http://who.is/whois/limitedrun.com/

jyu · on July 30, 2012

This whole thing seems fishy to me. Facebook is one of my best sources of traffic. It's cheaper than AdWords, I can do keyword, demo, and geo targeting all built in. And if I have a hit campaign, it's pretty easily scalable. If you tie it into specific performance, like $ / email sign up, or $ / pageview, it blows away adwords. Getting in touch with someone from Facebook can be a pain, but it's drastically easier than AdWords or other ad networks.

This whole thing seems like one big linkbait/PR play. It's a really big claim from a company that doesn't even have an about us page, has only 500 Facebook likes, and basically has 0 traffic.[1]

I'd like to know if they have tried other paid traffic sources, or have concrete evidence. Inexperienced advertisers tend to spend $50-$100, expecting instant results. And when it inevitably doesn't work out, they blame anyone and everyone.

I don't agree with the second point. Facebook isn't "holding your name hostage" and being "scumbags". You didn't sign up with the right name from the beginning, and now you're complaining. Instead of taking 5 minutes to sign up for a Facebook Page with name@limitedrun.com and get the page you want, you decided to sling mud with no real evidence. With only 500 facebook likes, the switching cost is low.

[1] http://siteanalytics.compete.com/limitedrun.com/ https://www.facebook.com/limitedpressing http://who.is/whois/limitedrun.com/

j45 · on July 30, 2012

Wow, even if this is a little true the implications are big.

Is this similar, or related to other ad space inflation? I'm reminded of Google (and others)(1) sending out free $75 adwords coupons to easily allow the costs of bids to inflate since it's not my own real money (thereby increasing the percentage of real cash being made on the ads)?

(1)Edited and clarified

nolok · on July 30, 2012

How come, everytime some company does (or seems to be doing) something shady, there is always a bunch of people like this saying "not as bad as Y doing something else that might or might not be true or related". This is not middle school anymore, pointing the finger at other kids is not proper discussion ...

j45 · on July 30, 2012

If genuinely curious: I'm drawing a direct comparison of the potential inflation of ad costs in one playground (google) with another (facebook) through both common and unique means. Another common issue to both is click fraud and it's effects on the said cost of ads. My mistake for assuming this was obvious and known.

The relevance of such a comparison? It's not a one off, as uncommon as we think. Has nothing to do with pointing fingers.

Didn't mean to come across as off topic. To me, if someone upstream and in the past was doing this, chances are it might happen later in time downstream too, and there might be something to see in the forest instead of pointing at one tree or another.

aggronn · on July 30, 2012

Raising awareness isn't necessarily bad, even though its off topic.