How to create NBA shot charts in Python

flashman · on July 31, 2015

Statmuse.com has a really nice implementation of this in their natural-language sport statistics engine. You can see an image here http://i.imgur.com/ukganMx.png

...or if you're in the beta, here: https://www.statmuse.com/nba/search?q=michael%20jordan%20sho...

tylerpachal · on July 31, 2015

Just watched their video... I can't believe I have never heard of these guys before!

Looks like they're translating natural-language into some sort of query language. Are there any well-known methods for doing this kind of thing?

__john · on July 31, 2015

When I first heard of statmuse I had the same question. The best I could come up with is "Constructing an Interactive Natural Language Interface for Relational Databases"[1]. If anyone knows more about this, I'd be curious to hear about it.

[1] http://www.vldb.org/pvldb/vol8/p73-li.pdf

Edit:

Since this got brought up again I've been searching around a little more and found c-phrase, which looks similar.

https://code.google.com/p/c-phrase/

https://www.youtube.com/watch?v=fWio8bHq4wQ

savvas_tj · on July 31, 2015

That's incredible.

nealrs · on July 31, 2015

just, awesome.

jacobolus · on July 31, 2015

The plots showing the full dataset as a scatter plot on the court are great, and the most useful plots in this post. I also like the outlines of the court (though if I were making these charts I’d draw the court lines in gray or pale orange or something so the data would stand out more).

The heat maps are much less useful IMO, because the colors are poorly chosen and the data generalization/binning methods seem kind of arbitrary. Until the shot count gets up into the tens of thousands or more, just show all the data. (If e.g. aggregating all the shots from the whole league, then some kind of binning would become necessary.)

The marginal histograms showing density by x/y coordinates on the court are essentially useless in my opinion. Dramatically more interesting would be marginal histograms related to angle and distance in terms of polar coordinates centered at the basket (it might be necessary to ignore the angles for positions very close to the basket, where angle is kind of irrelevant). To make them even more informative, since the 3 point line isn’t a perfect semicircle, make marginal distributions (in terms of angle and distance) of separate categories of 2 point and 3 point shots, and stack them. Or even three categories showing dunks/layups, 2 point jump shots, and 3 point jump shots.
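The polar-coordinate idea above can be sketched roughly as follows. This is only an illustration, not code from the post: it assumes each shot is an (x, y) pair in feet with the basket at the origin and the +y axis pointing straight out toward half court, and the names `to_polar`, `polar_histograms`, and the bin sizes are made up for the example.

```python
import math
from collections import Counter


def to_polar(x, y):
    """Convert court coordinates (basket at the origin) to (distance, angle).

    The angle is in degrees, measured from the line straight out from the
    basket (the +y axis): 0 is dead centre, positive is toward +x.
    """
    distance = math.hypot(x, y)
    angle = math.degrees(math.atan2(x, y))
    return distance, angle


def polar_histograms(shots, dist_bin=2.0, angle_bin=15.0, min_dist=4.0):
    """Count shots per distance bin and per angle bin.

    Shots closer than `min_dist` are left out of the angle histogram,
    since the angle means little right at the basket.
    """
    dist_counts = Counter()
    angle_counts = Counter()
    for x, y in shots:
        d, a = to_polar(x, y)
        dist_counts[int(d // dist_bin)] += 1
        if d >= min_dist:
            # Floor division keeps negative angles in negative bins.
            angle_counts[int(a // angle_bin)] += 1
    return dist_counts, angle_counts


# Made-up shots, purely for illustration.
shots = [(5, 20), (20, 5), (-20, 5), (1, 1)]
dist_counts, angle_counts = polar_histograms(shots)
```

To get the stacked version, one could run `polar_histograms` once per shot category (for example dunks/layups, 2 point jump shots, 3 point jump shots) and stack the resulting counts when plotting.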

savvas_tj · on July 31, 2015

I'm the author, thanks for suggestions on improving the charts, especially regarding the marginal histograms.

Just a couple of questions: What color maps would you use for those kde plots?

I know seaborn by default uses the Freedman-Diaconis rule to create bins for the hexbin plot. But, what suggestions do you have for binning?
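For readers unfamiliar with it, the Freedman-Diaconis rule sets the bin width to 2 * IQR / n^(1/3). Below is a minimal standard-library sketch of that textbook rule; it is not seaborn's own code, the function name is made up, and the fallback for a zero IQR is an assumption chosen just to keep the example from dividing by zero.

```python
import math
import statistics


def freedman_diaconis_bins(values):
    """Return a bin count from the Freedman-Diaconis rule.

    Bin width = 2 * IQR / n ** (1/3); the bin count is the data range
    divided by that width, rounded up.
    """
    n = len(values)
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    width = 2 * (q3 - q1) / n ** (1 / 3)
    if width == 0:
        # Assumed fallback for degenerate data (all quartiles equal).
        return max(1, int(math.sqrt(n)))
    return math.ceil((max(values) - min(values)) / width)


# 100 evenly spaced values, purely for illustration.
bins = freedman_diaconis_bins(list(range(100)))
```

Because the width depends on the IQR rather than the standard deviation, a handful of extreme values (say, a few half-court heaves) should not change the bin count much.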

hvs · on July 31, 2015

We include shot charts on Basketball-Reference.com and allow for side-by-side comparisons as well.

http://www.basketball-reference.com/players/h/hardeja01/shoo...

http://www.basketball-reference.com/play-index/plus/shooting...

pitt1980 · on July 31, 2015

Do you work with the Sports-Reference.com sites?

if so, would you mind if I asked you a somewhat technical question?

Is there some sort of advanced query function?

-------------------------------

Here's kind of what I'd like to be able to do

http://fivethirtyeight.com/features/no-team-can-beat-the-dra...

the first chart in there sort of fits a line from draft position to career AV (7 paragraphs in)

I'm curious if you can control for certain factors and see how that line moves around

specifically I'm curious if the line changes if you adjust for winning percentage of the player's college team

i.e. if you fit a separate curve for players from schools with winning percentages > .750, .500 - .750, .250 - .500, and 0 - .250, would that produce 4 noticeably different curves?
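A rough sketch of that comparison in plain Python, assuming the data has already been exported as (draft pick, career AV, college winning percentage) rows. The function names and the sample rows are made up, and a straight least-squares line stands in for whatever curve the article actually fits:

```python
from collections import defaultdict


def bucket(win_pct):
    """Assign a college winning percentage to one of four buckets."""
    if win_pct > 0.75:
        return ">.750"
    if win_pct > 0.5:
        return ".500-.750"
    if win_pct > 0.25:
        return ".250-.500"
    return "0-.250"


def fit_line(points):
    """Ordinary least-squares fit of y on x; returns (slope, intercept)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def curves_by_bucket(players):
    """Fit one line of career AV on draft pick per winning-percentage bucket."""
    groups = defaultdict(list)
    for pick, av, win_pct in players:
        groups[bucket(win_pct)].append((pick, av))
    # A line needs at least two distinct draft picks in the bucket.
    return {
        name: fit_line(pts)
        for name, pts in groups.items()
        if len({x for x, _ in pts}) >= 2
    }


# Made-up rows: (draft pick, career AV, college winning percentage).
players = [(1, 50, 0.8), (11, 30, 0.8), (1, 40, 0.3), (11, 20, 0.3)]
curves = curves_by_bucket(players)
```

If the four fitted lines differ noticeably in slope or intercept, that would suggest the college team's record carries information beyond draft position alone.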

------------------------------------

it appears that all the data necessary to do that is contained between NFL-reference and CFB-reference

the way I would think to do that is to scrape the data off the site and put it into an excel sheet to work with

is that the best way to do that?

or is there a function within the site that I can work with such that I don't have to scrape the data?

-------------------------

thanks, sorry if this sort of question was outside the bounds of this message board

hvs · on July 31, 2015

you can download CSVs from the draft finder on pro-football-reference.com. that includes career AV.

http://www.pro-football-reference.com/play-index/draft-finde...

pitt1980 · on July 31, 2015

thanks

prawn · on July 31, 2015

Always wanted to redesign Basketball Reference. Nothing fancy (I can appreciate that the data is king), just clean it up a bit.

hvs · on July 31, 2015

we are in the process of doing a slight graphical upgrade to the sites in the near future to clean them up and make them more mobile friendly. stay tuned!

prawn · on Aug 3, 2015

Good to hear. Great resource either way. I use the site all the time.

markovbling · on July 31, 2015

Very cool! I came across Peter Beshai's work a couple of weeks ago - he has a great d3.js visualisation of basketball shots that might be of interest :)

http://peterbeshai.com/buckets

wallerj77 · on July 31, 2015

Krossover built an interactive version of this that plays clips of each shot taken. This link jumps to a brief overview of it: https://youtu.be/3MZF_u-6OT4?t=1m5s

raverbashing · on July 31, 2015

I wonder how the NBA generates this data (X, Y positions of shot attempts), whether it's manually entered from video or something smarter

mjn · on July 31, 2015

It's automatically extracted from video, using six cameras and a commercial system called SportVU: http://stats.nba.com/tracking/

TheAlchemist · on July 31, 2015

There is a nice TED talk also on the topic: http://www.ted.com/talks/rajiv_maheswaran_the_math_behind_ba...

lghh · on July 31, 2015

I have a growing interest in statistics, partially coming from a lifelong love of basketball. I am comfortable with Python and have done a few small projects in it. How can I use my Python skills with statistics for more projects like this? Can anyone point me in the right direction? It does not have to be basketball related at all.

skadamat · on July 31, 2015

Lots of resources out there!

www.datasciencemasters.org -- comprehensive list of things to learn / explore

www.dataquest.io -- (disclaimer: I'm involved with the company), teaches data science in the browser via projects

lghh · on July 31, 2015

Thanks! I will look into both of those tonight.

giancarlostoro · on July 31, 2015

Interesting. One of my favourite professors uses Processing for the Introduction to Programming course at my school, and he had students make plots from an array he pulled from NBA top 10 player data (or at least 10 or so players). I'm definitely forwarding this one to him; maybe it could be turned into a challenge for students. I wonder how this would look in Processing with Python (it's supported as an add-in).

_spoonman · on July 31, 2015

For the life of me, I can't figure out how to navigate to this page: http://stats.nba.com/player/#!/202322/tracking/shotslogs/ I don't see /shotslogs/ anywhere on stats.nba.com

_spoonman · on July 31, 2015

Ignore my lack of caffeine please. Found it.

ajstarks · on Aug 4, 2015

A version in Go: https://twitter.com/ajstarks/status/628262271026876416

ajstarks · on Aug 7, 2015

Now part of the SVGo package: https://github.com/ajstarks/svgo/blob/master/shotchart/shotc... (get data from the network or local files)

joeleet · on July 31, 2015

Goldsberry, Jr!