Monday, December 21, 2015

Why (N-1) degrees of freedom?

This post is a diversion from my previous work in sports statistics to delve into educational matters, which have been piquing my interest more and more these days.

[Image: the sample standard deviation formula, s = sqrt( Σ (x_i - x̄)² / (n - 1) )]

When I tell people I'm a statistician, most tell me that they had the (dis)pleasure of taking a required statistics course at some point in their academic career. Since the standard intro course is anything but user-friendly, I usually get comments to the effect of "I did alright in the class, but I don't remember anything." Occasionally, however, one gets a very good question like the following: "Why is (n-1) the degrees of freedom in the standard deviation formula rather than n?"

This question shows some understanding of the concept of a degree of freedom: if we have n observations (or dimensions) that may change, why aren't there n degrees of freedom? The technical answer (which is of little use in conversation) is that the sample average (x-bar above) is mathematically related to these n observations (it is a linear combination of them). The n deviations from the sample average must therefore sum to zero, so only n-1 of them can vary freely, and this linear dependence among the quantities in the formula costs us one degree of freedom. However, if we didn't need to estimate the population mean (perhaps it was known, or we exhaustively sampled a small population to find it), then we would divide by n, as intuition dictates.
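To make the lost degree of freedom concrete, here is a minimal sketch in Python. It assumes a made-up toy population of four values and enumerates every ordered sample of size 2 drawn with replacement; the population, the sample size, and all variable names are illustrative choices of mine, not anything from a particular textbook. Exact fractions are used so that no rounding muddies the comparison.

```python
from fractions import Fraction
from itertools import product

# Hypothetical toy population (an assumption for illustration only).
population = [Fraction(v) for v in (1, 2, 3, 4)]
n = 2  # sample size


def mean(values):
    """Arithmetic mean of a non-empty sequence."""
    return sum(values) / len(values)


def sum_sq(sample, center):
    """Sum of squared deviations of the sample from a given center."""
    return sum((x - center) ** 2 for x in sample)


mu = mean(population)                                # population mean
sigma2 = mean([(x - mu) ** 2 for x in population])   # population variance (divide by N)

# Every ordered sample of size n, drawn with replacement.
samples = list(product(population, repeat=n))

# Deviations from the sample mean always sum to zero: the linear dependence.
residual_sums = {sum(x - mean(s) for x in s) for s in samples}

# Average each estimator over all possible samples.
avg_div_n = mean([sum_sq(s, mean(s)) / n for s in samples])
avg_div_n_minus_1 = mean([sum_sq(s, mean(s)) / (n - 1) for s in samples])
avg_known_mean = mean([sum_sq(s, mu) / n for s in samples])

print("population variance:          ", sigma2)             # 5/4
print("divide by n, sample mean:     ", avg_div_n)          # 5/8
print("divide by n-1, sample mean:   ", avg_div_n_minus_1)  # 5/4
print("divide by n, population mean: ", avg_known_mean)     # 5/4
```

In this toy setting, dividing by n around the sample mean underestimates the population variance on average, dividing by n-1 recovers it, and dividing by n is fine once the true population mean is used in place of the sample average.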

Some statisticians argue that using n-1 is a bit pedantic, as they usually deal in large samples and the ratio of n-1 to n tends to 1 as n grows. However, it is instructive (and makes for better conversation) to consider what would happen with a small sample. In particular, what if we sampled a single object? Why, we would know nothing of the variation! If we attempted to compute the standard deviation, we would arrive at a division by zero error using the above formula, consistent with the conclusion that the variation is unknowable when sampling only a single object. However, if we instead had a population of one member, the population mean would be none other than the value of that single observation. In such a (boring) population, there is no variation at all, so the variance and standard deviation should be zero. Indeed, if we compute the variance/SD using the divide-by-n version of the formula above, we divide 0 (the squared difference between the observation and the population mean) by 1 (the degrees of freedom).
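The single-observation extreme can be tried out directly. The sketch below uses Python's standard statistics module, where stdev divides by n-1 and pstdev divides by n; the value 7.0 is an arbitrary observation chosen for illustration. Rather than returning a number, stdev refuses to work with fewer than two data points, which is that library's way of reporting the division-by-zero problem.

```python
import statistics

# A hypothetical "sample" consisting of a single observation.
single = [7.0]

# Sample standard deviation (divide by n-1): undefined for one data point.
try:
    sample_sd = statistics.stdev(single)
except statistics.StatisticsError as err:
    sample_sd = None
    print("sample SD is undefined:", err)

# Population standard deviation (divide by n): a population of one has no spread.
population_sd = statistics.pstdev(single)
print("population SD:", population_sd)
```

The sample version fails and the population version gives zero, matching the two readings of a lone observation described above.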

Thus a complex situation can be better understood by reducing it to an extreme, a technique that is common in both logic and mathematics.