TheSixtyOne.com's Jobs Page 'Voting Rings' Problem

patio11 · on Feb 26, 2010

Ooh, that sounds fun.

In terms of process design, I think you could take a hint or two from the Netflix guys here. Currently you're distributing a single test case. That takes very little work to accomplish: good first step! Consider creating two test cases: one we'll call the probe case, and one we'll call the real case. The real case is kept on your servers and never gets exposed directly to the participants.

Make a really freaking simple way for you to check your work on the probe case automatically, without needing to speak to you guys. For example, you could make a simple REST API or even a web page with an upload/submit button. Then run whatever your evaluation function is against the submitted solution for the probe case, and tell folks how good their submission was. For an extra ten minutes of work you can add a leaderboard. (Add a leaderboard!) You can optionally rate-limit this if you're worried about people overfitting to the probe via taking a lot of measurements.

This has a better user experience and will get you more viral exposure because a) you've now totally decoupled your ability to run code from people's ability to participate in the contest and b) people can now get immediate feedback that their participation is valued and producing results. Including a leaderboard also gives folks an emotional hook both for participating and for spreading word of your contest: "Oh cripes guys, the top submission for this problem was done in Ruby. Let's show them the awesome power of functional programming. For the greater glory of Lisp!"

Of course, you can feel free on putting whatever restrictions you want on code which you individually invite for offline exposure to the real test case.

threepointone · on Feb 26, 2010

Now you have 2 problems :)

carterschonwald · on Feb 26, 2010

Isn't the problem format a teeny bit underspecified, as in shouldn't the input format also indicate who submitted the article?

chronomex · on Feb 26, 2010

It could be, or depending on how the problem is specified, it could be entirely superfluous. If the accounts that submit articles vote according to the general population's general vote distribution, then knowing the submitter of the article does not help finding a voting ring.

JoelPM · on Feb 28, 2010

Did anyone write a solution? I wrote one for fun (not looking for a job) and am interested in comparing findings. Here's the biggest ring I found (assuming there isn't some huge flaw in my code):

114317 117417 116711 104623 55965 61543 117666 111454 106702 109719 59735 98844 79232 99084 56944 42023 116674 113113 106336 106877 114731 107692 106107 116790 116420 98958 111863 87465 102051 59239 36249 61496 108406 93936 117489 102314 114890 40853 43144 112933 61454 93895 99996 107278 100940 106030 98559 28959 36931 8629 65535 61589 113214 108605 100944 99853

Tichy · on Feb 26, 2010

Is that job still open? I toyed with the idea of applying, but then my gf was a bit shocked by the idea of moving to the states. Still, it sounds like fun.

I started coding the exercise, my first approach would have been a clustering algorithm. For various reasons I decided to write my own (though with a standard algorithm) and didn't have enough spare time to debug it properly. Should have just used an existing package... Anyway, if the job is still open, I might give it another shot.

buro9 · on Feb 26, 2010

That's the best job application I've seen in a while and something that I've no doubt interests many of us. I think I'll take a stab at their interview problem just for my own amusement. Shame I won't be moving to California otherwise I'd consider going for the job.

dangrossman · on Feb 26, 2010

If I were looking for work I'd have some fun with this. Machine learning is a fun area.

DanielBMarkham · on Feb 26, 2010

Would you do it that way?

I'd just brute force it, assuming the entries would support it.

Of course, I always like using a shotgun where a fly-swatter might work :)

jules · on Feb 26, 2010

What do you mean by brute force? How would you even determine whether a solution is good?

DanielBMarkham · on Feb 26, 2010

This does sound like a neat problem -- and one which I'll be solving soon in my application.

Don't want to give away too much, but looks like a basic statistics text might be a good place to start....

Perceval · on Feb 26, 2010

So we'd have to set up the null hypothesis that the voting patterns are simply random. What probability distribution are we going to use? Gaussian distribution is probably the wrong one to use, since online voting and online popularity usually conforms better to a power law distribution. So bootstrap a distribution from the data, and then test for correlations? How do we control for the deliberately random behavior of the voting circle sockpuppets?

kf · on Feb 26, 2010

A Bayesian approach probably works much better

karanbhangui · on Feb 26, 2010

On first pass, I thought you were being satirical by just using big terms. Now I see that it's a legitimate question. Carry on.

evgen · on Feb 26, 2010

The solution is so readily apparent it does not even require you to know math anymore... Let's reframe the question to see if anyone can figure this one out:

"A search engine called Goggle wants to help give you a better view of the web. They use links between web pages to decide which pages will be the top results for a given query; pages with a high authority rank will be features at the top of the search results and will get more traffic from people searching for information. Goggles suspect that spammers have found a way to manipulate the system by creating fake web pages that link to their pages and give them an artificially inflated rank, hence forming 'link farms.' You have been hired to identify suspected farms by looking at link data.

Your task may not be as straightforward as it seems however. The caveat is that the spammers may have fashioned their fake pages to look like normal web pages. To create some misdirection, fake users may sometimes remove their links to targeted articles or even link on articles that they have no association with.

Sergiy Brin and Larry Page, the creators of Goggles, request that you write a program in Python or C to identify the top five suspected unique link farm rings consisting of at least five pages for each ring."

Sound more familiar? It is not quite the same problem, but you will find a lot of good solutions in this well-explored space that also apply to the original problem.

rwhitman · on Feb 26, 2010

Wouldn't this be a problem that was better solved without software?

I mean if its actual people doing the voting, rather than just bots, shouldn't this be handled in a less software-focused way?

What I'd say is treat it like police detective work, have someone go undercover, act as if it was like any other human 'crime' syndicate...

Perceval · on Feb 26, 2010

1) There is the problem of identification, which seems to be the object of this exercise. TheSixtyOne wants to have a program that can red flag certain correlated behaviors that they might not otherwise notice or be able to quantify. You can't really go undercover if you can't spot the criminals in the first place.

2) There is the question of labor. I doubt TheSixtyOne has a large enough staff to devote one or more employees to the task of infiltration. Infiltration is not the easiest skill to just pick up. Nor are voting rings actually made up of multiple people. As the test prompt states, it could just be one person with multiple sockpuppets. How do you infiltrate a one-person voting ring?

dasil003 · on Feb 26, 2010

How do you infiltrate a one-person voting ring?

This should go on a list of startup koans somewhere.

axomhacker · on Feb 26, 2010

How would you decide which of the hundreds/thousands of "rings" would you pursue?

trafficlight · on Feb 26, 2010

James took the simple way out with this one. He just took down all of the voting and commenting features. This drives away all of his users and now he doesn't have to worry about voting rings.

That's a bingo!