Thursday, January 24, 2019

Analysis of Spatial NBA Shot Data in a Polar Coordinate System


Introduction:
With the rise of player tracking data, spatial data analysis is likely the future of sports analytics. NBA “shot charts” may yield valuable information about not only the distance but also the angle of each attempt. It was hypothesized that, after adjusting for distance, the difficulty of a three-point shot attempt increases with the angle relative to the perpendicular bisector of the baseline. This analysis was undertaken with an eye to further optimizing one of the most efficient shots in professional basketball.

Methods:
A web-crawling script written in Python 3.4 with the Beautiful Soup 4 package [1] was used to mine shot location data from the box score of every NBA game in the last two regular seasons (2016/2017 and 2017/2018) on www.basketball-reference.com. Pixel coordinates were converted to feet from the cylinder (the hoop) and validated against the shot distance reported by Basketball Reference; the correlation between computed radial distance and reported shot distance was r = 0.9994. With the hoop as the origin, polar coordinates were used to test and quantify the effect of the absolute value of the shooting angle on shot success via a distance-adjusted logistic regression model, fit to the 2016/2017 season and validated on 2017/2018 data.
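
The pixel-to-polar conversion described above might look roughly like the following sketch. The calibration constants (PIXELS_PER_FOOT, HOOP_PIXEL) and the function name pixel_to_polar are hypothetical placeholders, not values from the original script; the real numbers depend on the shot-chart image being scraped. The angle is measured from the perpendicular bisector of the baseline, so 0 degrees is a straight-on shot and 90 degrees is a shot along the baseline.

```python
import math

# Hypothetical calibration values (assumptions, not from the original script).
PIXELS_PER_FOOT = 10.0
HOOP_PIXEL = (250.0, 52.5)  # (x, y) pixel location of the centre of the hoop


def pixel_to_polar(px, py, hoop=HOOP_PIXEL, scale=PIXELS_PER_FOOT):
    """Return (distance_ft, abs_angle_deg) of a shot relative to the hoop.

    x is assumed to run along the baseline and y toward midcourt.
    """
    dx = (px - hoop[0]) / scale  # feet left/right of the hoop
    dy = (py - hoop[1]) / scale  # feet out from the hoop toward midcourt
    distance = math.hypot(dx, dy)
    # atan2(dx, dy) is the signed angle off the perpendicular bisector.
    angle = math.degrees(math.atan2(dx, dy))
    return distance, abs(angle)
```

Taking the absolute value treats the left and right sides of the court as equivalent, which matches the use of the absolute shooting angle in the model.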

Results:
Exploratory analysis in the training set suggests that the distance and angle of field goal attempts interact. Sub-analysis was conducted on short (3-8 ft), middle (8-16 ft), and long (16-24 ft) range two-pointers as well as three-pointers. Short-range shots still showed an interaction between angle and distance (p=0.022). Only the angle was found to affect middle-range shots (OR = 0.878, 95% CI = 0.836-0.922), suggesting that the odds of success decrease as shooters move off the perpendicular bisector of the baseline (PBB). In long-range shots, the angle was no longer statistically significant (p=0.838), but additional distance lowered the odds of success (OR = 0.977, 95% CI = 0.963-0.991). Three-point attempts also showed an angle-distance interaction (p=0.002), so analysis was restricted to traditional 23’9” attempts to rule out the confounding “sweet spot” type of three-pointer. Findings were similar to those for the long-range two-pointers, with only increased distance compounding the difficulty of the shot (OR = 0.920, 95% CI = 0.896-0.945). Similar results regarding significance were obtained in the test set.
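
For readers less familiar with logistic regression output, the odds ratios and intervals above relate to model coefficients as in the sketch below. It assumes the reported intervals are standard Wald intervals (symmetric on the log-odds scale, z = 1.96); the abstract does not state how they were computed, and the function names are illustrative only.

```python
import math


def odds_ratio_ci(beta, se, z=1.96):
    """Convert a logistic coefficient and its standard error to (OR, lower, upper)."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


def coef_from_or(odds_ratio, lower, upper, z=1.96):
    """Recover (beta, se) from a reported OR and its CI, assuming a Wald interval."""
    beta = math.log(odds_ratio)
    se = (math.log(upper) - math.log(lower)) / (2 * z)
    return beta, se


# Long-range two-pointers, distance effect as reported above.
beta, se = coef_from_or(0.977, 0.963, 0.991)
print(odds_ratio_ci(beta, se))
```

An odds ratio below 1 corresponds to a negative coefficient, meaning each additional unit of the predictor multiplies the odds of a made shot by that ratio.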

References:
1. https://www.crummy.com/software/BeautifulSoup/bs4/doc/
2. https://www.basketball-reference.com/play-index/tgl_finder.cgi
3. Hosmer, D. & Lemeshow, S. (2000). Applied Logistic Regression (Second Edition). New York: John Wiley & Sons, Inc.
4. Long, J. S. (1997). Regression Models for Categorical and Limited Dependent Variables. Thousand Oaks, CA: Sage Publications.