Wednesday, November 16, 2016

Why Cy Young, why?

The recent public outcry over the election of Rick Porcello as the American League Cy Young Award winner has brought to my attention the unfairness of the system used to combine ranked votes.

The ballots of all the voting sportswriters can be seen on the BBWAA website, but here's a visualization of the vote breakdown for the top 5 pitchers.


The human brain is quite adept at pattern recognition, and my attention is immediately drawn to the mode, or peak, of each pitcher's distribution (the number of votes at that rank is in parentheses):

Verlander - 1st (14)
Porcello - 2nd (18)
Kluber - 3rd (12)
Britton - 4th (9)
Sale - 5th (10)
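Reading off a mode like the ones above only needs each pitcher's vote counts by rank. This is a minimal sketch, assuming the counts are stored as a list of five numbers (votes at 1st through 5th); the function name `modal_rank` and the counts in the usage line are hypothetical, not the actual 2016 breakdown.

```python
from typing import Sequence, Tuple

ORDINALS = ["1st", "2nd", "3rd", "4th", "5th"]

def modal_rank(counts: Sequence[int]) -> Tuple[str, int]:
    """Return the most common rank and how many votes it received.

    counts[i] is the number of ballots placing the pitcher at rank i + 1.
    max() keeps the first maximum it sees, so ties go to the better rank.
    """
    best = max(range(len(counts)), key=lambda i: counts[i])
    return ORDINALS[best], counts[best]

# Hypothetical vote counts, for illustration only.
print(modal_rank([3, 12, 7, 4, 2]))  # ('2nd', 12)
```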

However, this ordering does not match the final ranking produced by the Cy Young formula:

Points = 7 * (# 1st) + 4 * (# 2nd) + 3 * (# 3rd) + 2 * (# 4th) + 1 * (# 5th)

Porcello = 137 points
Verlander = 132 points
Kluber = 98 points
Britton = 72 points
Sale = 40 points
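The formula is just a weighted sum of a pitcher's vote counts. A minimal sketch, assuming votes are stored as counts at 1st through 5th place; `CY_YOUNG_WEIGHTS`, `cy_young_points`, and the example counts are hypothetical names and data for illustration, not the BBWAA's actual ballots.

```python
from typing import Sequence

# Weights for 1st through 5th place, as in the formula above.
CY_YOUNG_WEIGHTS = (7, 4, 3, 2, 1)

def cy_young_points(counts: Sequence[int],
                    weights: Sequence[int] = CY_YOUNG_WEIGHTS) -> int:
    """Weighted sum of vote counts; counts[i] is the number of votes at rank i + 1."""
    if len(counts) != len(weights):
        raise ValueError("need one count per ranked position")
    return sum(w * c for w, c in zip(weights, counts))

# Hypothetical counts: 2 firsts, 1 second, 0 thirds, 3 fourths, 1 fifth.
print(cy_young_points([2, 1, 0, 3, 1]))  # 2*7 + 1*4 + 0*3 + 3*2 + 1*1 = 25
```

Passing a different `weights` tuple makes it easy to try alternative schemes against the same ballots.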

Admittedly, I have no idea how these weights were set, but one should be wary of any (weighted) average of ranks. The most glaring reason is that the gap between first and second is likely to be larger than the gap between second and third, and so on down the line.

This formula does capture that larger gap in its weights for first and second (7 versus 4), but it still treats 2nd, 3rd, 4th, and 5th place as equidistant from one another (4, 3, 2, 1), which seems like a major failing.
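Differencing the weights makes the equal spacing explicit, and taking ratios of consecutive weights shows how the relative value of adjacent places shifts down the ballot. A small sketch; the variable names are illustrative, and `Fraction` keeps the ratios exact.

```python
from fractions import Fraction

weights = [7, 4, 3, 2, 1]  # 1st through 5th place

# Absolute gap between consecutive places.
gaps = [a - b for a, b in zip(weights, weights[1:])]

# Relative size of consecutive places, kept exact.
ratios = [Fraction(a, b) for a, b in zip(weights, weights[1:])]

print(gaps)                      # [3, 1, 1, 1]
print([str(r) for r in ratios])  # ['7/4', '4/3', '3/2', '2']
```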

In fact, the formula already values a 4th-place vote at twice a 5th-place vote. If first and second place were weighted in that same 2:1 proportion (8 and 4, respectively), Verlander would have won, 146 points to 145: raising the first-place weight from 7 to 8 adds one point per first-place vote, so Verlander's 14 first-place votes lift him from 132 to 146, while Porcello's 8 lift him from 137 to 145. Just a little food for thought about the mathematical underpinnings of taking a linear combination of ordinal variables.
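Since only the first-place weight changes, the published totals plus first-place vote counts are enough to recompute the result without the full ballots. A quick sketch of that check; the first-place counts (14 for Verlander, 8 for Porcello) are the 2016 figures, and the function name `rescore_first_place` is illustrative.

```python
# 2016 totals under the official 7-4-3-2-1 weights, with first-place vote counts.
official = {"Verlander": (132, 14), "Porcello": (137, 8)}

def rescore_first_place(points: int, first_place_votes: int,
                        old_w: int = 7, new_w: int = 8) -> int:
    """Adjust a total when only the first-place weight changes; other ranks are untouched."""
    return points + (new_w - old_w) * first_place_votes

rescored = {name: rescore_first_place(p, n) for name, (p, n) in official.items()}
print(rescored)  # {'Verlander': 146, 'Porcello': 145}
```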