Statmuse.com has a really nice implementation of this in their natural-language sport statistics engine. You can see an image here http://i.imgur.com/ukganMx.png
When I first heard of Statmuse I had the same question; the best I could come up with is "Constructing an Interactive Natural Language Interface for Relational Databases"[1]. If anyone knows more about this I'd be curious to hear about it.
The plots showing the full dataset as a scatter plot on the court are great, and the most useful plots in this post. I also like the outlines of the court (though if I were making these charts I’d draw the court lines in gray or pale orange or something so the data would stand out more).
The heat maps are much less useful IMO, because the colors are poorly chosen and the data generalization/binning methods seem kind of arbitrary. Until the shot count gets up into the tens of thousands or more, just show all the data. (If e.g. aggregating all the shots from the whole league, then some kind of binning would become necessary.)
The marginal histograms showing density by x/y coordinates on the court are essentially useless in my opinion. Dramatically more interesting would be marginal histograms related to angle and distance in terms of polar coordinates centered at the basket (it might be necessary to ignore the angles for positions very close to the basket, where angle is kind of irrelevant). To make them even more informative, since the 3 point line isn’t a perfect semicircle, make marginal distributions (in terms of angle and distance) of separate categories of 2 point and 3 point shots, and stack them. Or even three categories showing dunks/layups, 2 point jump shots, and 3 point jump shots.
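To make the polar-coordinate suggestion concrete, here is a minimal sketch using only the standard library. It assumes each shot is an (x, y, category) tuple with the basket at the origin and +y pointing straight out toward mid-court. The bin widths, the cutoff below which angle is ignored, and the category labels are all placeholder assumptions, not anything taken from the original post.

```python
import math
from collections import Counter


def to_polar(x, y):
    """Return (distance, angle_in_degrees) relative to a basket at (0, 0).

    The angle is 0 for a shot straight out from the basket (along +y),
    negative toward -x and positive toward +x.
    """
    distance = math.hypot(x, y)
    angle = math.degrees(math.atan2(x, y))
    return distance, angle


def polar_histograms(shots, dist_bin=2.0, angle_bin=15.0, min_dist_for_angle=4.0):
    """Bin shots by distance and by angle, separately for each category.

    shots: iterable of (x, y, category) tuples.
    Returns (dist_counts, angle_counts), each a dict mapping
    category -> Counter keyed by the lower edge of the bin.
    Shots closer than min_dist_for_angle (an assumed cutoff) are left
    out of the angle histogram, since angle means little near the rim.
    """
    dist_counts = {}
    angle_counts = {}
    for x, y, category in shots:
        d, a = to_polar(x, y)
        d_edge = math.floor(d / dist_bin) * dist_bin
        dist_counts.setdefault(category, Counter())[d_edge] += 1
        if d >= min_dist_for_angle:
            a_edge = math.floor(a / angle_bin) * angle_bin
            angle_counts.setdefault(category, Counter())[a_edge] += 1
    return dist_counts, angle_counts
```

The per-category counters can then be drawn as stacked bars, one stack per bin, for example with matplotlib's `bar` and its `bottom` argument.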
the first chart in there sort of fits a line from draft position to career AV (7 paragraphs in)
I'm curious if you can hold for certain factors and move that line around
specifically I'm curious if the line changes if you adjust for winning percentage of the player's college team
i.e. if you fit a separate curve for players from schools w/ winning percentages >.750, .750 - .500, .500 - .250, .250 - 0, would that produce 4 noticeably different curves?
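a rough sketch of what I mean, assuming the data is already in hand as (draft pick, career AV, college winning pct) rows. the straight-line least-squares fit is only a stand-in for whatever curve the chart actually uses, and the function names and band labels are made up for illustration:

```python
from collections import defaultdict


def win_pct_band(win_pct):
    """Assign a college winning percentage to one of the four bands."""
    if win_pct > 0.75:
        return ">.750"
    if win_pct > 0.5:
        return ".750-.500"
    if win_pct > 0.25:
        return ".500-.250"
    return ".250-0"


def fit_line(points):
    """Ordinary least-squares line through (x, y) points.

    Returns (slope, intercept). Needs at least two distinct x values.
    """
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def lines_by_band(players):
    """players: iterable of (draft_pick, career_av, college_win_pct).

    Returns {band: (slope, intercept)} for every band that has at
    least two distinct draft positions to fit through.
    """
    groups = defaultdict(list)
    for pick, av, win_pct in players:
        groups[win_pct_band(win_pct)].append((pick, av))
    return {
        band: fit_line(pts)
        for band, pts in groups.items()
        if len({x for x, _ in pts}) >= 2
    }
```

if the four (slope, intercept) pairs come out clearly different, that would be the "4 noticeably different curves" answer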
------------------------------------
it appears that all the data necessary to do that is contained between NFL-reference and CFB-reference
the way I would think to do that is to scrape the data off the site and put it into an excel sheet to work with
is that the best way to do that?
or is there a function within the site that I can work with such that I don't have to scrape the data?
-------------------------
thanks, sorry if this sort of question was outside the bounds of this message board
we are planning a slight graphical upgrade to the sites in the near future to clean them up and make them more mobile-friendly. stay tuned!
Very cool! I came across Peter Beshai's work a couple of weeks ago - he has a great d3.js visualisation of basketball shots that might be of interest :)
Krossover built an interactive version of this that plays clips of each shot taken. Link here jumps to a brief overview of it https://youtu.be/3MZF_u-6OT4?t=1m5s
I have a growing interest in statistics, partially coming from a lifelong love of basketball. I am comfortable with Python and have done a few small projects in it. How can I use my Python skill with statistics for more projects like this? Can anyone point me the right way? Does not have to be basketball related at all.
Interesting, one of my favourite professors uses Processing for the Introduction to Programming course at my school and decided to make students plot based on an array he pulled from NBA top 10 player data (or at least 10 or so players). Definitely forwarding this one to him, maybe it could be turned into a challenge for students. I wonder how this would look in Processing with Python (it's supported as an addin).
...or if you're in the beta, here: https://www.statmuse.com/nba/search?q=michael%20jordan%20sho...