How to Sniff Out Online Fakers (gigaom.com)
101 points by brandonb on Oct 4, 2012 | 47 comments



Hey, OP here! I work at Sift Science, which provided all the facts for this article.

It's been crazy to see how far fraudsters will go to create fake accounts. We've seen people in the Philippines use Twilio accounts, for example, to fool SMS verification and look like U.S. users. People scripting the creation of thousands of accounts. People distributing malware via Chrome extensions to take over legitimate users' accounts.

Are any of you out there dealing with malicious user behavior -- fraudsters, spammers, account takeover, etc.? I'd be happy to answer questions!


Do you also look at - or have a chance to look at - timezones as reported by the user's browser, or as selected by the user?

One of my services, TweetingMachine, used to have a massive problem with spammers going out of their way to abuse it. However, before a user can schedule a tweet, they have to select their timezone.

Problem solved! There were three or four specific timezones always chosen by the bad guys, and every ten minutes a script ran through the database banning users whose timezones fell into that list.
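
For the curious, the cleanup job was nothing sophisticated; a minimal sketch of that kind of cron job, with a made-up schema and made-up timezone values, might look something like this:

    # Minimal sketch of a periodic "ban by timezone" job. The table, columns and
    # timezone values are illustrative only, not the real TweetingMachine schema.
    # Run from cron every ten minutes, e.g.: */10 * * * * python ban_by_timezone.py
    import sqlite3

    BANNED_TIMEZONES = ("Etc/GMT+12", "Pacific/Niue", "Pacific/Pago_Pago")

    def ban_suspicious_users(db_path="tweetingmachine.db"):
        conn = sqlite3.connect(db_path)
        try:
            placeholders = ",".join("?" for _ in BANNED_TIMEZONES)
            cur = conn.execute(
                f"UPDATE users SET banned = 1 "
                f"WHERE banned = 0 AND timezone IN ({placeholders})",
                BANNED_TIMEZONES,
            )
            conn.commit()
            print(f"Banned {cur.rowcount} accounts this pass")
        finally:
            conn.close()

    if __name__ == "__main__":
        ban_suspicious_users()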

The surprising thing for me was that, given the effort of constantly trying to get past my other detection scripts, the spammers either never worked out what was happening or simply didn't choose a different timezone.

In hindsight, it's quite a cute test, and one that it seems few of the bad guys bother to adjust for (i.e. the timezone their browser reports via JS) or are even aware of.


(Doug here, also from Sift.) Yes indeed -- timezone, as reported by the client, does factor into our predictions. Our observations definitely resonate with your experience with TweetingMachine -- while fraudsters may be able to appear natural in certain ways, they usually miss a way or two :-) Thanks for the info, mootothemax!


Any test like this will work so long as not everyone is doing it. Once enough companies use that as a filter, the bar will be raised.


I used to see this exact same issue (wrong timezone selection) on some phpBB spammers. They all chose GMT-12:00, which contains no permanently inhabited landmasses. Easy block. :)


This is why legitimate Nigerians and Chinese are banned from half the Internet.


It's not ideal, but past a certain point, as a SaaS operator you're caught between a rock and a hard place. The best I've been able to come up with is a "do you think this is a mistake?" button, which spammers tend not to click.

Still not nice to know that I'm treating people like second-class Internet citizens. At the same time, my available time is extremely limited, so lessor or two evils - for me - it is.


Users on HN complain every single day about service companies like Google who remove content or close accounts that haven't broken any rules.

Do you have anything to say to those users, as an operator of a large SaaS provider that deals with fraud/spammers daily?


I think it's really important to have an "appeals process." Many of our customers don't use Sift to block users unilaterally, but rather to ask for further verification. For example, if the Sift score is high, they might call the user to verify their identity. That's a great way to get rid of most of the fraudsters, while giving good users a "way out" for those cases where the algorithms get it wrong.
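
To make that concrete, the routing logic on a customer's side tends to look something like the sketch below; the thresholds and helper functions are invented for illustration and aren't our actual API:

    # Hypothetical three-way routing of a fraud score instead of a hard block.
    # Thresholds and helpers here are illustrative only.

    def approve(user):
        print(f"approved: {user}")

    def request_phone_verification(user):
        print(f"step-up verification requested for: {user}")

    def queue_for_manual_review(user):
        print(f"queued for manual review: {user}")

    def handle_signup(user, fraud_score):
        """fraud_score assumed to be in [0, 1]; higher means riskier."""
        if fraud_score < 0.3:
            approve(user)                     # low risk: straight through
        elif fraud_score < 0.8:
            request_phone_verification(user)  # medium risk: ask for extra proof
        else:
            queue_for_manual_review(user)     # high risk: a human gets the final say

    handle_signup("alice@example.com", 0.12)  # -> approved
    handle_signup("bob@example.com", 0.91)    # -> queued for manual review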


> Do you have anything to say to those users, as an operator of a large SaaS provider that deals with fraud/spammers daily?

A quick point of order: I am definitely not "an operator of a large SaaS provider" ;)

Now, affected users can get in touch directly with me - the owner of the website - and I generally respond in a matter of minutes. That's way better than a lot of the big companies out there, but I recognise it's a result of my small size.

I agree that it sucks for the legitimate users, but for reasons I won't go into, the site in question is never going to scale to anything mass market.

Also, as I said, my time is limited, and I'd much rather be spending my time on stuff that improves services for users and makes a better product than on writing systems to tell the difference between spammers and legitimate users. Remember, my scale is tiny and, frankly, if I lose a few customers because of this, that's money well spent.


That's mostly because Google has zero customer support. They rely purely on their automated systems. False positives by themselves are not the issue.


It sounds like the person I'm asking is doing the exact same thing, but they haven't scaled up to the tens of billions yet. I'm asking how he intends to scale his model.


> I'm asking how he intends to scale his model.

Heh, as I alluded to in my reply to your other comment, for TweetingMachine at least, I have zero intent of scaling; it's a small tool, and will remain a small tool (can't even remember if I've done any work on it in the whole of 2012) whilst I focus my energies elsewhere.

So, being quite so harsh was a means to an end. At the same time, if I experience abuse of any of my other services, it'll definitely be a marker I'll use for more manual investigation.


> lessor or two evils

I'm going to guess this is an autocorrect error.


I'm going to guess that it's two autocorrect errors. :-)


Hehe, thanks, you're right :-)


That's a huge issue, and it's part of our motivation to use machine learning to combine lots of signals. It's not fair to the legitimate users in Nigeria and other countries to be blocked entirely from booking airline tickets, posting blogs, or booking reservations, but that's the status quo. It's hard to blame the site owners; given the number of attacks, and the difficulty of building a good detection system, it's the best they can do given the time and resources available.

Our hope is that by pooling data and technology across all sites on the internet, we can build a better system that keeps the bad users out without causing "collateral damage" and harming perfectly legitimate users.

And in response to the grandparent, we absolutely look at timezone! It's a great signal. We've found 3am is the most popular time to create a fake account. But note that creating an account at 3am, all by itself, is not enough to condemn you. It has to be combined with many other behavioral signals.
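
To make the "combined with many other signals" point concrete, here's a toy version of the idea; the features and weights are invented for this comment (a real system learns them from labeled data rather than hand-tuning):

    # Toy illustration of why one signal (e.g. "account created at 3am") is never
    # enough on its own: each weak signal only nudges a combined risk score.
    import math

    WEIGHTS = {
        "created_at_3am": 0.6,
        "timezone_mismatch": 1.1,      # browser timezone disagrees with IP geolocation
        "disposable_email": 1.4,
        "scripted_typing_speed": 2.0,  # form filled implausibly fast
    }
    BIAS = -3.0  # with no risk signals present, the score stays low

    def risk_score(signals):
        """signals: dict of feature name -> bool. Returns a score in (0, 1)."""
        z = BIAS + sum(w for name, w in WEIGHTS.items() if signals.get(name))
        return 1.0 / (1.0 + math.exp(-z))   # logistic squash

    # A lone 3am signup barely moves the needle...
    print(round(risk_score({"created_at_3am": True}), 2))                        # ~0.08
    # ...but 3am + disposable email + scripted typing looks much worse.
    print(round(risk_score({"created_at_3am": True, "disposable_email": True,
                            "scripted_typing_speed": True}), 2))                 # ~0.73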


Nice to see your work out where it can do some good!

One of the most amazing things about running a search engine was getting a look at what bad people look for (like PHP exploits or out-of-date WordPress themes).

I keep wondering if there is an opportunity for a 'fraudster alert' service like the Realtime Blackhole Lists, where members could share the IP addresses from which fraudulent or hostile traffic has originated. It seems like it would make suppression easier.
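
Mechanically it would presumably work just like the email DNSBLs: reverse the IP's octets, prepend them to the list's zone, and treat a successful lookup as "listed". A sketch against a made-up zone:

    # Sketch of a DNSBL-style lookup against a hypothetical shared "fraudster" zone.
    import socket

    FRAUD_RBL_ZONE = "fraud.example-rbl.org"   # made-up zone, for illustration only

    def is_listed(ip_address, zone=FRAUD_RBL_ZONE):
        reversed_ip = ".".join(reversed(ip_address.split(".")))
        query = f"{reversed_ip}.{zone}"        # e.g. 4.3.2.1.fraud.example-rbl.org for 1.2.3.4
        try:
            socket.gethostbyname(query)        # any A record (typically 127.0.0.x) means "listed"
            return True
        except socket.gaierror:
            return False                       # NXDOMAIN: not on the list

    if is_listed("203.0.113.7"):
        print("flag this signup for review")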


Definitely! In fact, if you look at our site, one of the features we offer is real-time alerts: https://siftscience.com

We combine data across all of our customers, so that if we find a fraudster on one site, we can alert every other site where the person has an account. I think there's a lot to learn by having sites aggregate all their knowledge about bad actors.


I'm interested in how you look at security and information privacy, particularly as an important part of your strategy is to aggregate data between services. When designing the service, what sort of considerations did you make in this regard?

Secondly, as I was reading your privacy policy I noted that your condition on non-collection of data is for users to not use any products or services that "utilize the Service". That is all well and good, but you only "encourage [your] customers to describe their use of the Service and other technologies that collect user information in privacy notices that are displayed to users." [1]

If one of your customers chooses not to disclose the fact that they are using your service, what recourse does a user have? Is there any 'opt-out' option that lets me choose not to be tracked by you (with the understandable restriction that I _can't_ use your customer's services)?

[1] https://siftscience.com/privacy


This is one of the benefits of MaxMind MinFraud, another fraud risk evaluation service. Not only do they tell you the risk of each signup/purchase (on a 1-100 scale), but any other MinFraud customer can flag someone and every other company that recently evaluated the same person gets notified of the increased risk.


Cool, can I use this service to get my enemies' transactions refused?


Theoretically, but there are a few issues:

* You're defrauding a fraud management service, which will catch on pretty quickly when zero-risk transactions get flagged by you as fraud, and close your account for abuse.

* Doing what you asked is illegal in several ways, and tied to your real identity -- MinFraud costs money, and because it's a fraud management service, paying with a prepaid debit card, connecting over a proxy, or verifying your account with a SIP phone will all be detected and flagged -- so getting an account that isn't traceable to you is quite hard.


Not if you can't tell the needle from the haystack. A "legit" company can mask data in many ways. The issue is the broad-brush-stroke generalizations. Some people find them pretty offensive.

This is the same sh!t banks use to deny poor, black people credit and health companies use to deny sick people coverage.

The people running these biz ideas fighting alleged XYZ (~corruption) do so out of the motive/incentive of their own personal greed and self-interest, not the public good.

Don't kid yourself.


Surely by flagging them yourself, you wouldn't be accepting their transactions and so first and foremost hurting your own business?


Of course, that's true only if Utility(profit) > Utility(enemy's pain).

I wonder if services like these count as consumer credit services (and hence are governed by all sorts of fairness laws).


"... create a digital profile of who will likely perpetrate online fraud, ..."

How do you deal with false positives?


Just wanted to say, Brandon is super knowledgeable in this area - great guy and resource.


How do you handle false positives? Or is that the responsibility of the site using your service?


Can we expect a Chrome plug-in that does fake opinion analysis on HN?


If you haven't dealt with an elevated fraud risk yet then you are not yet running a successful business.

Dealing with fraud and abuse takes up a good percentage of the time and other resources at any successful online service. If you don't budget for that and automate it as much as you can, then you may well fail even though the rest of your service is doing fine.


I support having better systems for catching fake accounts and sock puppets, but I am concerned about the potential for new systems/profiles/countermeasures to make online life very difficult for innocent users. We've seen this happen with captchas; the zeal to stop bots has made registering for certain sites or leaving comments nearly impossible for a lot of people. Michael Arrington had this problem recently on Tigers.com (1).

As for the criteria named in the featured article, I work late every night, and I use Yahoo Mail for almost all "casual" account signups to catch spam, unwanted newsletters, and other notifications that I would rather not deal with in my other email inboxes. I would hate to see these be used as an excuse to target me for more intrusive registration steps or deny me access to various sites or services.

1) http://twitter.com/arrington/status/236893640365068288


Fraud detection seems to be a staple of a lot of successful software companies (PayPal and Yelp come to mind immediately). Great to see a SaaS solution for this pain!


How do you plan to deal with false positives? (Guess who is often awake at 3am. :) By that I mean: are you returning a probability to your customers so they can pick their level of comfort/risk, or is it some binary answer?


Great to see machine learning used in this way. I hope this lowers prices on services now that other companies don't have to do this work in house.


I find the statement "Most traffic coming from Nigeria is fraudulent" to be bollocks. As in many countries, criminals comprise a minority of the Nigerian population and, correspondingly, net fraudsters comprise a minority of the Nigerian online population.

Interestingly, the 2011 top 10 countries by number of reported complaints of net fraud are:

1) United States 90.99%
2) Canada 1.44%
3) United Kingdom 0.97%
4) Australia 0.66%
5) India 0.50%
6) Puerto Rico 0.22%
7) South Africa 0.22%
8) France 0.19%
9) Germany 0.19%
10) Russian Federation 0.17%

Source: Internet Crime Complaint Center (FBI) www.ic3.gov/media/2012/120511.aspx


> I find the statement "Most traffic coming from Nigeria is fraudulent" to be bollocks.

I don't.

Criminals are a minority of the Nigerian population, but that minority spends an inordinate amount of its energy trolling online and looking for new venues that regular Nigerians are not likely to visit.

Therefore, if you're a US-based and US-targeted company, the majority of YOUR Nigerian traffic IS likely to be fraud. It isn't most of the fraud you've got to deal with, but it is likely to be so ridiculously obvious and easy to deal with that you'd be a fool not to.

(Of course this is widely enough known that Nigerian scammers use proxies to hide their origin.)


The jury is still out on this one. Unless you can show some numbers, it's hard to see this as anything but a subjective extrapolation of the facts.

Nonetheless, Sift Science's claim is at best poorly worded, because it gives armor to the false and unfair interpretation that the majority of regular Nigerian folks are fraudulent, which itself begs the question.


I no longer have numbers. It was several years ago that I was working for a US-only company that decided to add a customer-to-customer piece, then got targeted by scammers. But we definitely rediscovered the fact that IPs from Nigeria were entirely scammers, and so were messages mentioning Nigeria. Oh, and "Western Union" was not a phrase you wanted to see, and so on.

It was a black hole sink. And since it was at best marginal to that business, that piece eventually got shut down.


Uhm, you know the stats you just cited were for victims? Complainants are the people making the complaint. It says nothing about where the perpetrator behind the complaint comes from.


This is because most online fraudsters over the world target US companies and individuals.


If you're interested in fake registration detection, see also this presentation on a 2-day rush development effort to implement fraudulent account detection at Groupon: http://www.infoq.com/presentations/Bootstrapping-Clojure It's also about the awesomeness of Clojure, but much of the meat is in the fraud detection algorithm itself.


This is some good stuff. I'm definitely looking forward to seeing what can be done with it and perhaps the spam all over certain phpBB forums, etc, can be fixed. Of course, those are just the little fish.


At this point in time, they should probably know it's comcast.net, and not comcast.com, which is for their employees.


My initial feeling is that most of the crud left behind is from professional fakers, aka menial labor from India/the Philippines/etc. That's where one can buy fake reviews en masse.

That explains points 1, 2, 3, and 5, and possibly even point 4 (working-time difference and cheap equipment).

The key to detecting fake accounts is traceability. That's why FB logins are gold. You can look at the account and - most of the time - it is easy to tell if it's real or fake.

The problem is that many sites want volume, not quality. They just let anyone "add content". This is an easily solvable problem that most sites do not actually care to solve.


Easy to solve if someone publishes a credibility guide to websites, based on this analysis.


It's just simple business rules.

Is it really that hard for you to tell a fake FB/twitter account from a real one that you couldn't code the logic?

You don't even have to code it. You just need to link it. If someone is reading the review, they can go back to the person that wrote it and judge for themselves.



