What Real-Time Gambling Data Reveals About Sports (gambletron2000.com)
163 points by lil_tee on March 13, 2014 | 64 comments



ESPN Magazine recently had a "conspiracy theories" issue, in which it explored (among other things) the long-held, popular theory that basketball is fixed. College basketball, in particular. IIRC, the preconditions for a fixed game tended to be:

- Non-tournament, regular season play (b/c not as many bettors and media would be paying attention)

- The favorite team is favored by 11 or more points

- The favorite team is dominated by one or two very strong players

If one player controls his team's play, and he's favored by 11+ points, he has the incentive, the ability, and the margin to shave points without risking losing the game. With a smaller point spread, on the other hand, it's too risky. For reasons I can't recall, an 11-point spread was the magic number. It provided just enough cushion to cover shaving, without jeopardizing the nominal win.

When analysts looked at the history of games that met these criteria, they found consistently abnormal distributions of outcomes in favor of the winning team, but just south of the spread. They estimated that about 3-4% of games in the study sample are quite likely to have been fixed.

At any rate, it would be interesting to see bigger data sets plied for this sort of thing.
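
One way to run this kind of check on a larger data set is to bucket each heavily favored game by how the final margin compares with the spread. This is only a sketch: the function names, the `(spread, margin)` tuple layout, and the sample games in the usage note are assumptions for illustration, not anything taken from the ESPN analysis.

```python
from collections import Counter


def classify_outcome(spread, margin):
    """Classify one game from the favorite's point of view.

    spread: points the favorite was favored by (positive number).
    margin: favorite's final score minus the underdog's final score.
    """
    if margin <= 0:
        return "favorite_lost"
    if margin < spread:
        return "won_not_covered"  # the bucket a point shaver would aim for
    if margin == spread:
        return "push"
    return "covered"


def outcome_shares(games, min_spread=11):
    """Share of each outcome among games with a spread of at least min_spread.

    games: iterable of (spread, margin) tuples.
    """
    heavy = [(s, m) for s, m in games if s >= min_spread]
    if not heavy:
        return {}
    counts = Counter(classify_outcome(s, m) for s, m in heavy)
    return {label: n / len(heavy) for label, n in counts.items()}
```

With made-up games such as `[(12, 8), (12, 15), (11, -3), (5, 2), (14, 14)]`, the 5-point game is ignored and the other four land in one bucket each. An unusually large "won_not_covered" share, compared with games at smaller spreads, is the pattern described above, though it cannot tell deliberate shaving apart from coasting.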


An alternative explanation is simply that a lead of 11 points tends to dwindle because the good players relax and play at less than maximum effort once the win is in hand. That would be statistically indistinguishable from intentional point shaving. (In fact, it IS emergent point shaving, just without any sinister motive.) This also explains the postseason discrepancy: players wouldn't let up in a tournament game.

The correct resolution would be for the pregame betting line to account for that tendency, taking into account the favorite's likelihood to coast during garbage time.


It's certainly plausible, but in theory, the point spread is supposed to account for that effect (among all other natural factors involved in play). As you suggest in your second paragraph.

Intellectually, of course, I tend to assume the Occam's Razor solution (i.e., natural variance and other effects over deliberate fixing) in most cases. I'm not one for conspiracy theories as a first line of thinking. And I think a bigger sample size is needed to begin with.

Another thing to keep in mind is that -- at least in theory -- players aren't supposed to be cognizant of a point spread when playing. If we wanted to detect deliberate point shaving, we'd need to look at instances where the player behaved as if he was very aware of the spread and was actively trying to manage it. For example, he actively takes possession of the ball and runs out the clock, misses easy shots, passes into heavy coverage, and then reverses course to correct when/if he goes too far. Basically, the behavior of someone who's trying hard in either direction, as opposed to someone who just tries hard to secure the win and then lets up.


Definitely good points. Down by 11 points with little time left, a losing team takes risks, such as fouling in shooting situations, that the winning team doesn't take. The winning team tends to stall.

Of course, it may also just be hard to pick an 11 point spread. I wonder how many won by more than the spread?



It was that issue, but a different article, about an alleged point-shaving operation involving the University of San Diego.

EDIT: Correction noted.


Small correction, but it was actually University of San Diego.

The article about Brandon Johnson and the point shaving is here:

http://espn.go.com/mens-college-basketball/story/_/id/105453...


I wonder if such a small effect size is really statistically significant, especially if you correct for the multiple hypotheses that are implicit in that specific a choice of subgroup...
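
To make the multiple-hypotheses worry concrete, here is a minimal sketch of a Bonferroni correction, which is one common (and conservative) adjustment. The count of 20 candidate subgroups and the p-values in the usage note are invented for illustration; nothing in the thread says how many subgroups the analysts actually tried.

```python
def bonferroni_threshold(alpha, num_hypotheses):
    """Per-test significance threshold after a Bonferroni correction."""
    return alpha / num_hypotheses


def survives_correction(p_value, alpha, num_hypotheses):
    """True if a single p-value is still significant after the correction."""
    return p_value < bonferroni_threshold(alpha, num_hypotheses)
```

If 20 subgroup definitions (spread cutoffs, season types, roster shapes) were implicitly on the table, a result with p = 0.01 would no longer clear a 0.05 level, because the corrected threshold is 0.05 / 20 = 0.0025.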


"On the other hand, all the leagues have significantly lower average hotness in the first half compared to the second half, so maybe it’s not just the NBA that has a boring first half problem."

This seems to be a misconception. It stands to reason that the chart will grow jumpier as you near the end of the game. In options lingo, the implied volatility will be more stable the further you are from expiry (the end of the game). Theta decay and all that. The market is basically giving you an integrated forecast from each point in time until the final buzzer, and as that window shrinks you expect the odds to be jumping around more.


Actually, I'm not sure whether it's a misconception or not. What you say about implied volatility is true, but if your excitement is proportional to effect on the odds of the outcome then that is still what you're measuring. I have no idea whether that is actually the relationship, though (but hey, you could measure it!).


Heh, true. I guess what I was saying is that if you freeze a basketball score at 20-18 in the second quarter, and at that point the odds are 52-48 in favour of the team leading by 2, and then neither team scores for the rest of the match, then the odds will asymptotically approach 100-0 as you approach the final buzzer. Even though no one is scoring. By the variance measure used here, the second half will appear more exciting, even though nobody scores. Reminiscent of football (soccer) perhaps. The fact that first halves have less variance is just evidence that they are further from conclusion, not necessarily that they are less exciting, but then maybe the two are linked.
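
The frozen-score argument can be sketched with a toy model in which the remaining scoring margin is a normal random walk. The model, the `sigma_per_minute` value, and the function names are all assumptions made for illustration; real markets price in far more than this.

```python
import math


def win_probability(lead, minutes_left, sigma_per_minute=1.5):
    """Chance the leading team wins under a toy random-walk model.

    The remaining margin change is assumed normal with mean 0 and
    standard deviation sigma_per_minute * sqrt(minutes_left).
    """
    if minutes_left <= 0:
        if lead > 0:
            return 1.0
        return 0.5 if lead == 0 else 0.0
    z = lead / (sigma_per_minute * math.sqrt(minutes_left))
    # Standard normal CDF written with the error function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def frozen_score_path(lead, minutes_checkpoints):
    """Win probability at each checkpoint while the score never changes."""
    return [win_probability(lead, m) for m in minutes_checkpoints]
```

Calling `frozen_score_path(2, [30, 20, 10, 5, 1, 0])` gives a sequence that climbs toward 1.0 even though the lead stays at 2, and the climb is steepest near the end. Any excitement measure built on changes in the odds will therefore register late-game movement with no scoring at all.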


I'd really like to do that actual study - monitor people watching the games, see how heart rate and perspiration match up with odds-based measures of "excitement".


Emotions get the better of people as a deadline approaches. The market will become an exaggeration of the actual excitement factor of the game.

Hell, emotions get the better of people without a deadline.


It strikes me as a shame that people can be addicted to gambling and we see it as a moral problem. For example, gambling data around elections seems highly likely to be more useful than opinion polls are. After all, this is simply another application of the ideas behind how we expect markets to function.

Gambling is one of the few ways you can incentivise someone to be honest with you about their opinion, and for that reason I think it's actually a mass of untapped potential.


One reason it's problematic is that people can get addicted to it, they can spend all their (and their family's) money, and you won't know until it's too late. If one were an alcoholic, there's physically a limit to how much you can drink in one day (because you'll eventually pass out), and people will know (since you'll be drunk all the time). But one could pop into a bookie at lunch time, and lose €10,000 on a horse in one bet, and no-one, like your family, will know until the bank comes to take your house.


I don't see it as a moral problem per se, but as with other forms of addiction it's too easy to put your personal or family's security at risk, plus there's a moral hazard in that it encourages entry to the market of people who prey on the addiction of others.

I'm not sure what to do about the latter. It's like drugs; I think drugs should generally be legal but it's a fact that many drug dealers are predatory individuals.


This is great analysis. My opposition to using drugs follows that logic: by using them, you are implicitly supporting the predatory system.

Your mention of "market" in that way makes me think of viaticals[1], where the life insurance policies of people near death are bought for a profit. In a way, the people who now own the policies have an interest in the death of the other.

1. http://en.wikipedia.org/wiki/Viatical_settlement. A great book on this subject and how the prevalence of markets has degraded norms is What Money Can't Buy by Michael Sandel (http://www.amazon.com/What-Money-Cant-Buy-Markets/dp/0374203...).


The most destructive types of gambling are things like slot machines and casino games that offer no wisdom-of-crowds external benefit, and are -EV no matter what you do. The type of gambling you are talking about happens in things that are usually called prediction markets. Lots of effort has been made to harvest knowledge this way, and it works well. And, fairly, I don't think prediction markets are stigmatized the same way casinos are.


This, this this, x1000. The bulk of gambling problems in the UK (where gambling is completely legal in almost all its forms) primarily centre around Fixed Odds Betting machines in shops, rather than market based betting. The latter can still cause problems, but they're not as immediate as FOB machines, which are a blight on communities, tbh. I have 8 bookies within a half mile radius of my house, and I can tell you - basically nobody is gambling on the horses.


Interestingly at the last US presidential election there was a consistent and fairly massive arbitrage opportunity between US and European gambling exchanges.

US exchanges gave Romney a much higher implied winning percentage than the European exchanges did. The gap was so large that, even taking into account currency conversion fees and the like, there were several percentage points of guaranteed profit with deep liquidity, available for weeks.
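
For anyone unfamiliar with how such a gap turns into a locked-in profit, here is a rough sketch. It assumes each exchange sells a contract that pays 1.0 if its candidate wins, and that you buy each side wherever it is cheaper. The function name, the flat `fee_rate`, and the prices in the usage note are hypothetical, not actual quotes from either exchange.

```python
def arbitrage_return(price_a_yes, price_b_no, fee_rate=0.0):
    """Return on capital from buying both sides of a two-outcome event.

    price_a_yes: cost on exchange A of a contract paying 1.0 if the event happens.
    price_b_no:  cost on exchange B of a contract paying 1.0 if it does not.
    fee_rate:    assumed proportional cost (fees, currency conversion) on the stake.
    """
    cost = (price_a_yes + price_b_no) * (1.0 + fee_rate)
    # Exactly one of the two contracts pays out 1.0 whatever happens.
    return (1.0 - cost) / cost
```

With invented prices of 0.65 for one candidate on one exchange and 0.25 for the other candidate on the other exchange, the pair costs 0.90 and pays 1.0, a return of about 11% before fees. Whenever the two prices sum to more than 1.0 the return is negative and there is nothing to capture.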


Interesting if true. Do you know where I would look to find the numbers for this?

Also, do you have a sense of how such an arbitrage opportunity managed to stay open?


If you do a search for something like 'intrade betfair obama arbitrage' then you'll find plenty of sites that were discussing it. The size of the arb varied, but it was persistently there for months, ranging from a couple of percentage points (which didn't make it worthwhile considering currency risk) to over 20% at one point.


Second question needs an answer. If there really was an arbitrage opportunity, it couldn't have lasted more than a few seconds. I doubt the "arbitrage opportunity" is taking into account all relevant factors.


If I remember correctly, this was a few months before the election, and since Intrade would hold the money in escrow, the amount you would make off of interest was fairly close to the arbitrage opportunity.


So effectively there was no arbitrage opportunity.
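
The escrow point can be put in numbers with a small sketch that compares the locked-in return against the interest the same money would have earned elsewhere. Simple (non-compounding) interest, the function names, and the figures in the usage note are assumptions for illustration only.

```python
def simple_interest(annual_rate, days):
    """Interest earned at a simple annual rate over a number of days."""
    return annual_rate * days / 365.0


def excess_return(arb_return, annual_rate, days_locked):
    """Arbitrage return minus the interest forgone while funds sit in escrow."""
    return arb_return - simple_interest(annual_rate, days_locked)
```

For instance, a hypothetical 2% gap held for half a year while 4% a year was available elsewhere nets out to roughly zero, which is the situation described above. A 20% gap over the same period would still leave a large excess.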


I don't care if it makes people honest. If it's that easy to lose control and bet away your life savings, we shouldn't encourage it.


Eh, the same could be said about cars (easy to lose control and kill a bunch of people) but their benefit is so great that we applied controls rather than outlawing them. Over a century or so we've applied various braking technologies, crumple zones, seat belts, air bags, etc... Gambling could be treated the same way, and with the advent of big data the benefits may outweigh the harms.


That's what I'm saying. The benefits won't outweigh the harm. Do we really want to encourage people to gamble their hard earned money so some advertisers can make slightly better market forecasts?


The same kind of arguments are used to argue against the Second Amendment: "I don't care if guns make people safer in their homes. If it's that easy to lose control and shoot your spouse, we shouldn't encourage it."

The fact that immoral men exist and occasionally do terrible things should not bias us against the vast majority of good men for whom such acts are unthinkable.


Comparing gambling to guns-in-USA is not a great example, because many other countries do not recognise the right of the people to guns.


Not every country loves freedom ;)


If it's that likely that you're going to shoot your spouse, then you're not safer in your home, are you? Well maybe you are, but your spouse isn't.

Also, the argument of the original commenter is "We should encourage people to gamble in the name of slightly more accurate polling". Now, many people have concluded that owning a gun is worth the risk of accidentally shooting your spouse. But who here wants to throw away money to make election polling slightly better?


Can't you say the same thing about stores and credit cards? I don't think gambling is really much worse.


I don't know what "much worse" is, but there's gotta be a line somewhere. That line is currently somewhere between gambling and credit cards.


The (main) problem with gambling is that it's a way to enable corruption, as the casino can act as a one-way function on the money, laundering it.


This is pretty interesting and fun data. The article talks a lot about what games are exciting, and it bases this on wild fluctuations in a team's chances of winning. This makes perfect sense, but there is more to it than that.

If there are two teams playing each other and the score is 43-36 with plenty of ups and downs along the way, is it an exciting game? Sure sounds like it. What if those two teams are the Browns and Dolphins playing in a meaningless game in December with two backup quarterbacks? Is that game still exciting?

These things are hard to quantify because the algorithm needs to put things into context that it may not be able to understand.


What data exactly is this site using?


at a guess - Betfair.


Link to the site being explained: http://www.gambletron2000.com/

and the non-RapGenius about URL: http://www.gambletron2000.com/about

This is unbelievably cool. I am blown away.


Blatant blogspam? It's a single paragraph that links to http://news.rapgenius.com/Atodd-what-real-time-gambling-data...

Edit: I guess it's cross-promotion, which is fine just not what I expected.


The article itself mentions the gambletron website several times, even right at the start:

> Introducing Gambletron2000.com, a tool that uses live in-game gambling data to quantify excitement in sports, ...

So I think it's just that the two websites are related, not blogspam.


There's a script on the page that embeds the Rap Genius article with annotations (a new feature they just debuted), which I'm assuming you're not seeing because you have javascript turned off. The content is on Rap Genius so anyone can add annotations to it. Notice that the linked website (gambletron2000.com) is also a side project of the Rap Genius Engineering team.

All of that aside, I'm not really a hundred percent sure you can call a website's "about" page blogspam...


Ah, I feel silly now! After the first paragraph (which is all that loads), there's a friendly link that says "Read this on NewsGenius." So I really thought that's all there was to the page. I'm using RequestPolicy so it blocked the content from the rapgenius domain from loading, but usually I'm better at noticing when that happens.


How do these betting sites handle real-time events? I believe there was a case recently where a man at the Australian Open tennis in Melbourne was arrested for transmitting point information outside the stadium before it could be broadcast on television (there is always broadcast delay), which obviously could give you a big advantage. I think European football ones go into a vol auction (pause betting) when a goal is scored? But you could have a guy sitting in the stadium, wired up with a buzzer to press when a striker goes one on one with a goalie, or a penalty is awarded, and then just go all-in on the market? The whole thing is a can of worms.


They handle real-time events exactly as your latter example stated: an external (licensed) company provides live data, and orders that were placed during a dangerous situation (e.g. one-on-one) are subsequently voided if a goal is scored.


Why did you edit away your claim that it is to the benefit of the customers? Being a market-maker and being able to void any trades that occur during volatile periods sounds like a dream to me. No need to worry about adverse selection eh? Just printing cash with fat bid-ask spreads and if anyone dares to get the jump on you, they get voided. I wish I could do that in the real markets.


I can't speak for all companies, but some companies I worked with honour those bets (placed just a second before the goal). They try to find a pattern and ban the users after they find them doing this (and it is getting harder since live feeds are faster and faster), but I never saw a company voiding a bet because of this.


Great stuff. Very entertaining read.

This said, I am a bit skeptical about the assessment of "game hotness". Of course games that are tied or close near the end exhibit significant agitation at that stage (and "boring 1st halves") from a betting standpoint.

This might sound obvious, but great games are not just about the outcome. Think of something like soccer, where few points are scored in a given match. It would be very interesting to see what the data looks like for those, as there are fewer data points.


He computes "hotness" for soccer (football, to me) matches too: http://www.gambletron2000.com/?sport=epl%2Cchampions

As you'd imagine, each goal brings about a massive spike in implied win/loss probability.


Wow, very cool. The jumps in the odds are so clear you could probably cross that with a live video feed (or a Twitter stream) to automatically summarize the milestones in a game (goals, penalty kicks, injuries).

PS: Football in Argentina as well :)
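
A first pass at pulling milestones out of the odds could be as simple as flagging large moves between consecutive readings. This is a hedged sketch: the `(timestamp, win_probability)` layout, the 0.15 threshold, and the function name are assumptions, and telling a goal from a red card would need the video or Twitter cross-reference suggested above.

```python
def detect_jumps(series, threshold=0.15):
    """Find large moves in a win-probability time series.

    series: list of (timestamp, win_probability) pairs sorted by time.
    Returns (timestamp, change) for each step whose absolute change
    is at least threshold.
    """
    events = []
    for (_, p_before), (t_after, p_after) in zip(series, series[1:]):
        change = p_after - p_before
        if abs(change) >= threshold:
            events.append((t_after, change))
    return events
```

On a made-up series like `[(0, 0.40), (10, 0.42), (11, 0.70), (50, 0.68), (51, 0.35)]` this flags minute 11 (a swing towards the team) and minute 51 (a swing against it), while ignoring the small drifts in between.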


The hotness factor is only based on the outcome of the ongoing game. It should also consider the live impact on classification/awards.

Have a look at yesterday's CL game between Paris & Leverkusen: http://www.gambletron2000.com/events/2276/paris-st-g-v-lever..., which was the second leg of a two-legged tie, Paris having won the first leg 4-0, which meant a 99.9% chance of qualifying. It has a _mildly hot_ 762 score, where it should have been between 0 and 10. As the game was almost meaningless, the fact that Leverkusen scored first before finally losing the game didn't bring any excitement.


Yeah, but that score is based on in-match betting, rather than "to qualify", no? So what I find most interesting there is that the odds continued to rise for PSG after the goal, even though people (like me) put money on when they went behind (better odds, and cheers for the £20). The crucial thing that's at issue with this system is that it's not just the weight of money that's at work here, but also the bookies adjusting the odds to ensure their book is still green, and to encourage/discourage people putting further weight on particular outcomes.

Tl;dr - the method is really interesting, but imperfect because it's not a pure market, I think. Betfair odds or similar would be interesting, but their API is horrendous.


"Maybe there’s a slight tendency for teams in the 10-20% range to win at a slightly higher rate than expected (and consequently teams in the 80-90% range to lose more than expected), but the difference is pretty small, and given the number of observations and parameters, it would not be surprising if this deviation occurred completely randomly."

This is a well-known phenomenon [1] that manifests in almost all prediction markets. People tend to overestimate the likelihood of likely events, and underestimate the chance of rare events. If you're patient (and, importantly, trading fees are low enough) then it is usually possible to profit from these "sure thing" positions over many events.

1. Just one of the many links you find in google on the subject: http://journal.sjdm.org/9729b/jdm9729b.html
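
Checking for this kind of bias comes down to a calibration table: group observations by the market's implied probability and compare with how often the outcome actually happened. The sketch below assumes `(implied_probability, won)` pairs and ten equal-width bins; both the layout and the names are illustrative, not the article's method.

```python
def calibration_table(observations, num_bins=10):
    """Compare implied probabilities with observed win rates, bin by bin.

    observations: iterable of (implied_probability, won) pairs, with
    implied_probability in [0, 1] and won a bool.
    Returns {bin_index: {"n", "mean_implied", "win_rate"}} for non-empty bins.
    """
    totals = [0] * num_bins
    wins = [0] * num_bins
    implied_sums = [0.0] * num_bins
    for p, won in observations:
        idx = min(int(p * num_bins), num_bins - 1)  # p == 1.0 goes in the top bin
        totals[idx] += 1
        wins[idx] += 1 if won else 0
        implied_sums[idx] += p
    return {
        i: {
            "n": totals[i],
            "mean_implied": implied_sums[i] / totals[i],
            "win_rate": wins[i] / totals[i],
        }
        for i in range(num_bins)
        if totals[i]
    }
```

In a well-calibrated market `win_rate` tracks `mean_implied` in every bin. A systematic gap in the 10-20% and 80-90% bins is the pattern under discussion, though with few observations per bin the gap can easily be noise, as the quoted passage points out.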


I think the Recap functionality might be understated. If they can add more color data about event times, actual scores, player names, then I think they could be on to something in automating recaps of games.


Shameless plug for my very work-in-progress (read: lots of broken links) side-project: http://recappd.com

Any tips for taking the information there and turning it into a natural language sounding recap would be appreciated.

Also, one of the leaders in this field: http://automatedinsights.com/


It will be interesting to compare the data and graphs presented here to historic win probability charts provided by Advanced NFL Stats[1] for the NFL and Fangraphs[2] for MLB. See how Vegas stacks up against models based on historical data.

[1] http://live.advancednflstats.com/ [2] http://www.fangraphs.com/wins.aspx?date=2013-10-30&team=Red%...


Pretty cool data, but it doesn't reveal anything too insightful about people's gambling behaviors regarding sports. People are prone to a game's intangible momentum, who knew?


I wonder where they get their data. I feel that the community could come up with a better ranking algorithm than the square distance formula that they give.
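
As a baseline to improve on, here is a guess at what a "square distance" score could look like: the sum of squared changes between consecutive win-probability readings. The site's actual formula is not reproduced in this thread, so treat the definition and the function name as assumptions.

```python
def hotness(win_probabilities):
    """Sum of squared changes between consecutive win-probability readings.

    This is an assumed reading of the "square distance" idea, not the
    site's published formula.
    """
    return sum(
        (after - before) ** 2
        for before, after in zip(win_probabilities, win_probabilities[1:])
    )
```

A flat series scores 0, while `[0.5, 0.6, 0.4]` scores 0.01 + 0.04 = 0.05. Squaring rewards a few big swings over many small ones, which is one of the design choices an alternative ranking could revisit.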


I'm wondering the same. They mention TradeSports, which is Intrade's prediction market for sports, but it hasn't launched yet afaik (I've been on their e-mail list since before Intrade shut down in March 2013, and I haven't received anything).

EDIT: nm. the reference to TradeSports was to 2007 (before it shut down in 2008). http://en.wikipedia.org/wiki/TradeSports


Betfair let you download historical market data at http://data.betfair.com/ for free (but you need an account with them)


I wonder how much better the prediction market does compared to a predictive model based on the score and time remaining. Given data on the games, it would be pretty easy to develop a model on the P(winning|scores,time_remaining). Would that do as well as humans in aggregate? Obviously it doesn't take into account momentum, how well the teams are playing, etc.
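
The simplest version of such a model is just a lookup table of historical win frequencies keyed by lead and time bucket. The sketch below assumes snapshots of the form `(lead, minutes_left, team_won)` and five-minute buckets; the names and layout are hypothetical, and a real model would smooth across neighbouring cells rather than fall back to 0.5.

```python
from collections import defaultdict


def fit_win_table(snapshots, minute_bucket=5):
    """Estimate P(win | lead, time bucket) from historical game snapshots.

    snapshots: iterable of (lead, minutes_left, team_won) tuples.
    """
    counts = defaultdict(lambda: [0, 0])  # key -> [wins, total]
    for lead, minutes_left, won in snapshots:
        key = (lead, int(minutes_left // minute_bucket))
        counts[key][0] += 1 if won else 0
        counts[key][1] += 1
    return {key: w / n for key, (w, n) in counts.items()}


def predict(table, lead, minutes_left, minute_bucket=5, default=0.5):
    """Look up the estimated win probability, or default for unseen states."""
    return table.get((lead, int(minutes_left // minute_bucket)), default)
```

Comparing these table probabilities with the market's implied probabilities at the same moments, for example with a Brier score, would show how much the crowd adds beyond score and clock.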


Another thought: Some indication of a "trade volume" equivalent might be useful as well (and probably a good input for a "hotness" score, as more popular games will probably get more bets). I don't know if that information is available though.


It definitely is available with Betfair - it allows you to filter a huge amount of garbage data, because people are always setting 1.01 / 1000.0 positions, and if you don't exclude them they skew actual implied probabilities.
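
A sketch of that filtering step, assuming orders arrive as `(decimal_odds, size)` pairs: decimal odds convert to an implied probability as 1 / odds, and anything parked at the 1.01 or 1000.0 extremes is dropped before averaging. The function names and the size-weighted average are illustrative choices, not Betfair's API.

```python
def implied_probability(decimal_odds):
    """Implied probability of an outcome priced at the given decimal odds."""
    return 1.0 / decimal_odds


def filter_placeholder_orders(orders, min_odds=1.01, max_odds=1000.0):
    """Drop orders sitting at the extreme prices.

    orders: list of (decimal_odds, size) pairs.
    """
    return [(odds, size) for odds, size in orders if min_odds < odds < max_odds]


def size_weighted_probability(orders):
    """Size-weighted implied probability over the non-placeholder orders."""
    kept = filter_placeholder_orders(orders)
    total_size = sum(size for _, size in kept)
    if total_size == 0:
        return None
    return sum(implied_probability(odds) * size for odds, size in kept) / total_size
```

With invented orders `[(1.01, 500.0), (2.0, 10.0), (1000.0, 500.0), (4.0, 10.0)]`, only the 2.0 and 4.0 orders survive and the weighted probability is 0.375. Leaving the two large placeholder orders in would drag the figure towards whichever extreme carries more size.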


I wonder how this applies to other markets and domains. Could it also be used to find irregularities and cheating?



