How to create NBA shot charts in Python (savvastjortjoglou.com)
253 points by ryanb on July 31, 2015 | 28 comments



Statmuse.com has a really nice implementation of this in their natural-language sports statistics engine. You can see an image here: http://i.imgur.com/ukganMx.png

...or if you're in the beta, here: https://www.statmuse.com/nba/search?q=michael%20jordan%20sho...


Just watched their video... I can't believe I have never heard of these guys before!

Looks like they're translating natural language into some sort of query language. Are there any well-known methods for doing this kind of thing?


When I first heard of Statmuse I had the same question; the best I could come up with is "Constructing an Interactive Natural Language Interface for Relational Databases"[1]. If anyone knows more about this, I'd be curious to hear about it.

[1]http://www.vldb.org/pvldb/vol8/p73-li.pdf

Edit:

Since this came up again, I've searched around a little more and found c-phrase, which looks similar.

https://code.google.com/p/c-phrase/ https://www.youtube.com/watch?v=fWio8bHq4wQ


That's incredible.


Just awesome.


The plots showing the full dataset as a scatter plot on the court are great, and the most useful plots in this post. I also like the outlines of the court (though if I were making these charts I’d draw the court lines in gray or pale orange or something so the data would stand out more).

The heat maps are much less useful IMO, because the colors are poorly chosen and the data generalization/binning methods seem kind of arbitrary. Until the shot count gets up into the tens of thousands or more, just show all the data. (If e.g. aggregating all the shots from the whole league, then some kind of binning would become necessary.)

The marginal histograms showing density by x/y coordinates on the court are essentially useless in my opinion. Dramatically more interesting would be marginal histograms of angle and distance in polar coordinates centered at the basket (it might be necessary to ignore the angles for positions very close to the basket, where angle is largely irrelevant). To make them even more informative, since the 3-point line isn't a perfect semicircle, make separate marginal distributions (of angle and distance) for 2-point and 3-point shots, and stack them. Or even use three categories: dunks/layups, 2-point jump shots, and 3-point jump shots.
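For anyone wanting to try the polar-coordinate idea above, here's a minimal stdlib-only sketch of the binning step (plotting the stacked bars is left out). It assumes shot locations are in tenths of a foot with the basket at (0, 0) and positive y pointing toward half court; the 2 ft / 15 degree bin sizes, the 4 ft cutoff for ignoring angles, and the "2PT"/"3PT" labels are all my own illustrative choices.

```python
import math
from collections import Counter


def to_polar(loc_x, loc_y):
    """Return (distance in feet, angle in degrees) of a shot relative to the basket.

    Assumes loc_x/loc_y are in tenths of a foot with the basket at (0, 0) and
    positive y pointing from the basket toward half court.
    """
    distance_ft = math.hypot(loc_x, loc_y) / 10.0
    # atan2(x, y) measures the angle from the straight-on line (positive y axis):
    # 0 deg is straight on, -90/+90 deg run along the baseline on either side.
    angle_deg = math.degrees(math.atan2(loc_x, loc_y))
    return distance_ft, angle_deg


def marginal_counts(shots, dist_bin_ft=2.0, angle_bin_deg=15.0, min_angle_dist_ft=4.0):
    """Count shots per distance bin and per angle bin, separately per category.

    shots: iterable of (loc_x, loc_y, category) tuples, where category is a
    label such as "2PT" or "3PT". Angles are skipped for shots closer than
    min_angle_dist_ft, where the angle says little about the shot.
    Returns two dicts mapping category -> Counter of bin index -> count.
    """
    dist_hist, angle_hist = {}, {}
    for loc_x, loc_y, category in shots:
        dist, angle = to_polar(loc_x, loc_y)
        dist_hist.setdefault(category, Counter())[int(dist // dist_bin_ft)] += 1
        if dist >= min_angle_dist_ft:
            angle_hist.setdefault(category, Counter())[int(angle // angle_bin_deg)] += 1
    return dist_hist, angle_hist
```

Keeping one Counter per category is what makes the stacking easy: each category becomes one layer of the stacked histogram, and bin index i covers [i * bin size, (i + 1) * bin size).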


I'm the author. Thanks for the suggestions on improving the charts, especially regarding the marginal histograms.

Just a couple of questions: What color maps would you use for those KDE plots?

I know seaborn by default uses the Freedman-Diaconis rule to create bins for the hexbin plot. But what suggestions do you have for binning?
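For reference, the Freedman-Diaconis rule mentioned above sets the bin width from the interquartile range: h = 2 * IQR / n^(1/3). Below is a stdlib sketch of the rule itself (not seaborn's internal code); the single-bin fallback when the IQR is zero is my own choice. Applied to the x and y coordinates separately, it gives one bin count per axis.

```python
import math
import statistics


def freedman_diaconis_bins(values):
    """Suggested number of histogram bins under the Freedman-Diaconis rule.

    Bin width h = 2 * IQR / n ** (1/3); the count is the data range divided by h,
    rounded up. Falls back to a single bin when the IQR is zero.
    """
    values = list(values)
    n = len(values)
    if n < 2:
        return 1
    # "inclusive" interpolates between data points like numpy's default percentile.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    if iqr == 0:
        return 1
    width = 2 * iqr / n ** (1 / 3)
    return max(1, math.ceil((max(values) - min(values)) / width))
```

Because the width depends on the IQR rather than the range, a few extreme shots (e.g. heaves from half court) inflate the bin count instead of widening every bin.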


We include shot charts on Basketball-Reference.com and allow for side-by-side comparisons as well.

http://www.basketball-reference.com/players/h/hardeja01/shoo...

http://www.basketball-reference.com/play-index/plus/shooting...


Do you work with the Sports-Reference.com sites?

If so, would you mind if I asked you a somewhat technical question?

Is there some sort of advanced query function?

-------------------------------

Here's roughly what I'd like to be able to do:

http://fivethirtyeight.com/features/no-team-can-beat-the-dra...

The first chart there (seven paragraphs in) fits a line from draft position to career AV (Approximate Value).

I'm curious whether you can control for certain factors and see how that line moves.

Specifically, I'm curious whether the line changes if you adjust for the winning percentage of the player's college team.

I.e., if you fit separate curves for players from schools with winning percentages of >.750, .750-.500, .500-.250, and .250-0, would that produce four noticeably different curves?
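A minimal sketch of the bucketed comparison described above, assuming you already have (draft pick, career AV, college win %) for each player. Fitting career AV against ln(pick) by ordinary least squares is my own modeling assumption, not necessarily the form FiveThirtyEight used; the bucket boundaries are the ones from the question.

```python
import math
from collections import defaultdict

# College win-% buckets from the question; a pct exactly on a boundary
# goes into the lower bucket.
BUCKETS = [(0.750, ">.750"), (0.500, ".750-.500"), (0.250, ".500-.250"), (0.0, ".250-0")]


def bucket_for(win_pct):
    """Return the label of the win-% bucket that win_pct falls into."""
    for lower, label in BUCKETS:
        if win_pct > lower:
            return label
    return BUCKETS[-1][1]


def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b * x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    return mean_y - b * mean_x, b


def curves_by_bucket(players):
    """Fit career_av = a + b * ln(pick) separately for each win-% bucket.

    players: iterable of (pick, career_av, college_win_pct) tuples.
    Buckets with fewer than two distinct picks are skipped (no line to fit).
    """
    groups = defaultdict(lambda: ([], []))
    for pick, av, win_pct in players:
        xs, ys = groups[bucket_for(win_pct)]
        xs.append(math.log(pick))
        ys.append(av)
    return {label: fit_line(xs, ys) for label, (xs, ys) in groups.items() if len(set(xs)) > 1}
```

If the four (a, b) pairs come out close together, college win % probably isn't shifting the curve much; a formal test would need a regression with the bucket as a covariate.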

------------------------------------

It appears that all the data needed for that is split between NFL-reference and CFB-reference.

The way I'd approach it is to scrape the data off the sites and put it into an Excel sheet to work with.

Is that the best way to do it?

Or is there a feature on the site that would let me get the data without scraping?

-------------------------

Thanks, and sorry if this sort of question is outside the bounds of this message board.


You can download CSVs from the draft finder on pro-football-reference.com; they include career AV.
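Once you have the CSV export, a few lines of Python can join it with college winning percentages instead of doing it by hand in Excel. This is a sketch only: the column names ("Pick", "College", "CarAV", "WinPct") are hypothetical placeholders, the college win-% file is assumed to be one you assemble yourself (e.g. from CFB-reference), and both files are assumed to have a single header row.

```python
import csv
import io

# Hypothetical column names: check the headers in the actual exported files.
PICK, COLLEGE, CAREER_AV, WIN_PCT = "Pick", "College", "CarAV", "WinPct"


def load_draft_rows(draft_csv, college_csv):
    """Join a draft-finder CSV export with a CSV of college winning percentages.

    Both arguments are open text files (or file-like objects). Returns a list
    of (pick, career_av, college_win_pct) tuples.
    """
    win_pct = {row[COLLEGE]: float(row[WIN_PCT]) for row in csv.DictReader(college_csv)}
    rows = []
    for row in csv.DictReader(draft_csv):
        college = row[COLLEGE]
        if college not in win_pct or not row[CAREER_AV]:
            continue  # skip players with no matching college record or a blank AV
        rows.append((int(row[PICK]), float(row[CAREER_AV]), win_pct[college]))
    return rows


# Made-up sample rows, only to show the expected shape of the inputs.
draft = io.StringIO("Pick,Player,College,CarAV\n1,Player A,State U,80\n40,Player B,Tech,\n")
colleges = io.StringIO("College,WinPct\nState U,0.812\nTech,0.455\n")
rows = load_draft_rows(draft, colleges)
```

With real files you'd pass `open("draft.csv", newline="")` (file names hypothetical) instead of the `StringIO` samples. College names will likely need some normalization before they match across the two sites.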

http://www.pro-football-reference.com/play-index/draft-finde...


Thanks.


Always wanted to redesign Basketball Reference. Nothing fancy (I can appreciate that the data is king), just clean it up a bit.


We're working on a slight graphical upgrade to the sites to clean them up and make them more mobile-friendly. Stay tuned!


Good to hear. Great resource either way. I use the site all the time.


Very cool! I came across Peter Beshai's work a couple of weeks ago - he has a great d3.js visualisation of basketball shots that might be of interest :)

http://peterbeshai.com/buckets


Krossover built an interactive version of this that plays a clip of each shot taken. This link jumps to a brief overview of it: https://youtu.be/3MZF_u-6OT4?t=1m5s


I wonder how the NBA generates this data (X/Y positions and shot attempts): is it entered manually from video, or is it something smarter?


It's automatically extracted from video, using six cameras and a commercial system called SportVU: http://stats.nba.com/tracking/


There is a nice TED talk also on the topic: http://www.ted.com/talks/rajiv_maheswaran_the_math_behind_ba...


I have a growing interest in statistics, partially coming from a lifelong love of basketball. I am comfortable with Python and have done a few small projects in it. How can I use my Python skills with statistics for more projects like this? Can anyone point me in the right direction? It doesn't have to be basketball-related at all.


Lots of resources out there!

www.datasciencemasters.org -- comprehensive list of things to learn / explore

www.dataquest.io -- (disclaimer: I'm involved with the company), teaches data science in the browser via projects


Thanks! I will look into both of those tonight.


Interesting! One of my favourite professors uses Processing for the Introduction to Programming course at my school, and he had students make plots from an array he pulled from data on the NBA's top 10 players (or at least 10 or so players). I'm definitely forwarding this one to him; maybe it could be turned into a challenge for students. I wonder how this would look in Processing with Python (it's supported as an add-in).


For the life of me, I can't figure out how to navigate to this page: http://stats.nba.com/player/#!/202322/tracking/shotslogs/ I don't see /shotslogs/ anywhere on stats.nba.com.


Ignore my lack of caffeine please. Found it.



Now part of the SVGo package: https://github.com/ajstarks/svgo/blob/master/shotchart/shotc... (gets data from the network or local files)


Goldsberry, Jr!



