Thursday, September 4, 2014

The Home Court Advantage

As the title suggests, this post is about basketball: specifically, men's Division I college basketball.  This effort was part of my master's report, which ultimately was an exercise in data mining rather than statistical analysis (though some straightforward statistical modeling was employed).  I essentially wrote a simple web crawler to download all the play-by-play data from the website of the Big 12 Sports Conference.  The data was then parsed and analyzed using a combination of Excel and SAS.  The culmination of the paper (though you should read it in its entirety, for a nominal fee: http://ijr.cgpublisher.com/product/pub.191/prod.123) rests in the following model:


Admittedly, the model is useless for prediction (sorry, gamblers), as it depends on the post-hoc conference rankings at the end of the season.  Sure, it may be a bit naive to assume that these rankings follow a linear pattern, but the error terms did appear normally distributed for the model.  Still, I believe the coefficients of the model are useful for inferring the approximate point value of a rebound, assist, blocked shot, and steal.  Possession-gaining events like steals and defensive rebounds are worth approximately 1 additional point, while possession-ending events like turnovers cost about a point.  This seems reasonable since the typical shooting percentage is around 50%, so gaining or losing a possession can be expected to net about one more or one fewer point.
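
The back-of-the-envelope reasoning above can be sketched in a few lines of Python.  This is only a minimal sketch: it assumes every possession ends in a single two-point attempt (ignoring three-pointers, free throws, and offensive rebounds), and the function name is an illustrative choice, not something from the paper's model.

```python
def expected_points_per_possession(fg_pct, points_per_make=2):
    """Expected points from one possession that ends in a single shot.

    Assumption (not from the paper): every possession ends in one
    field-goal attempt worth `points_per_make` points.
    """
    if not 0.0 <= fg_pct <= 1.0:
        raise ValueError("fg_pct must be between 0 and 1")
    return fg_pct * points_per_make


# A 50% shooter taking two-point shots averages about one point per possession,
# which is why gaining or losing a possession is worth roughly one point.
print(expected_points_per_possession(0.5))  # 1.0
```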

In terms of application, I think this model would be useful for a Division I coach who wants to rate each player's contribution to the total margin of victory using only the non-scoring events in the box score.  Adjusting for total minutes played (i.e., tabulating the point differential per minute on the court) might allow the coach to compare players of the same position with respect to these non-scoring statistics.
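
As a rough illustration of the per-minute adjustment described above, here is a small Python sketch.  The point values are the rounded approximations mentioned in this post (about +1 for a steal or defensive rebound, about -1 for a turnover), not the fitted coefficients from the paper, and the box-score lines and field names are made up for the example.

```python
# Approximate point values taken from the post's rounded description.
# These are NOT the fitted coefficients from the paper.
APPROX_POINT_VALUES = {"steals": 1.0, "def_rebounds": 1.0, "turnovers": -1.0}


def non_scoring_points(box_line):
    """Sum the approximate point value of a player's non-scoring events."""
    return sum(
        value * box_line.get(stat, 0)
        for stat, value in APPROX_POINT_VALUES.items()
    )


def points_per_40_minutes(box_line):
    """Scale the non-scoring contribution to a 40-minute regulation game."""
    minutes = box_line["minutes"]
    if minutes <= 0:
        raise ValueError("minutes played must be positive")
    return 40.0 * non_scoring_points(box_line) / minutes


# Hypothetical box-score lines for two guards (made-up numbers).
guard_a = {"minutes": 32, "steals": 3, "def_rebounds": 4, "turnovers": 2}
guard_b = {"minutes": 18, "steals": 2, "def_rebounds": 1, "turnovers": 0}

for name, line in [("guard_a", guard_a), ("guard_b", guard_b)]:
    print(name, round(points_per_40_minutes(line), 2))
```

In this made-up example the bench player (guard_b) contributes fewer raw points than the starter but slightly more per 40 minutes, which is the kind of comparison the per-minute adjustment is meant to surface.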

Tuesday, September 2, 2014

Carrots and Caveats

The Carrot:
To inspire you to read more of my posts, let me tell you a little about myself.  The power of prediction is what lured me into pursuing a Master of Science in Statistics after earning an undergraduate degree in Pure Mathematics.  If mathematics is the mother of all sciences, then surely statistics is her firstborn son.  Few people realize that statistics is an entire field of study, rather than some numbers tacked onto a report at work or some poll in the newspaper.  In fact, whenever I admit to being a statistician, the majority of people respond, "Oh, so you're a numbers guy."  While that may be true, statistics is SO much more than numbers: it's the systematic study of random variation and the production of meaningful models to understand an academic, financial, or competitive endeavor.

The Caveat:
Since completing my graduate work in 2012, I've been applying my science to clinical research.  Some days I doubt that statistics is powerful enough to meaningfully predict outcomes that depend so heavily on fickle human nature.  Still, as George Box said, "all models are wrong; some models are useful."  Since no model is perfect and good statisticians do "precision guesswork", please don't send me angry emails saying, "I applied the model you reported in such-and-such post and we lost a boatload of money."  There is always a chance that the model prediction was wrong (due to false assumptions in the data or the modeling) or that you got unlucky.  Also, as you read these posts, I'd encourage you to approach all analysis with an open mind tempered by common sense, as statistics should always be used for illumination, rather than in support of a preconceived notion.