When I tell people I'm a statistician, most tell me that they had the (dis)pleasure of taking a required statistics course at some point in their academic career. Since the standard intro course is anything but user-friendly, I usually get comments to the effect of "I did alright in the class, but I don't remember anything." Occasionally, however, I get a very good question like the following: "Why is (n-1) the degrees of freedom in the standard deviation formula rather than n?"
Some statisticians argue that insisting on n-1 is a bit pedantic, since they usually deal in large samples and the ratio (n-1)/n tends to 1 as n grows. However, it is instructive (and makes for better conversation) to consider what would happen with a small sample. In particular, what if we sampled a single object? Why, we would know nothing of the variation! If we attempted to compute the standard deviation with the n-1 formula, we would arrive at a division by zero error, consistent with the conclusion that the variation is unknowable when sampling only a single object. However, if we instead had a population of one member, the population mean would be none other than the value of that single observation. In such a (boring) population, there is no variation at all, so the variance and standard deviation should be zero. Indeed, if we compute the variance/SD using the divide-by-n version of the formula, we divide 0 (the squared difference between the observation and the population mean) by 1 (the degree of freedom).
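For readers who like to see the arithmetic run, here is a minimal Python sketch of the single-observation case. The helper name `variance`, its `ddof` argument (the number subtracted from n in the divisor), and the value 7.0 are my own choices for illustration, not anything standard.

```python
def variance(values, ddof):
    """Sum of squared deviations from the mean, divided by (n - ddof).

    ddof=0 gives the divide-by-n (population) version,
    ddof=1 gives the divide-by-(n-1) (sample) version.
    """
    n = len(values)
    mean = sum(values) / n
    squared_deviations = sum((x - mean) ** 2 for x in values)
    return squared_deviations / (n - ddof)


single = [7.0]  # a "sample" or "population" of exactly one observation

# Population of one: 0 divided by 1, so no variation at all.
population_variance = variance(single, ddof=0)

# Sample of one: 0 divided by 0, so the variation is unknowable.
try:
    sample_variance = variance(single, ddof=1)
except ZeroDivisionError:
    sample_variance = None

print(population_variance, sample_variance)
```

The divide-by-n call returns zero, while the n-1 call fails with a division by zero, which is exactly the distinction drawn above.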
Thus a complex situation can be better understood by reducing it to an extreme, a technique that is common in both logic and mathematics.