Performance Explanation

How rankings are evaluated

After the games on each Sunday, seven rankings are computed:

  1. AASM+HCA: Ranks on the basis of score margin, strength of schedule, and home court advantage.
  2. AASM: Ranks on the basis of score margin and strength of schedule.
  3. ASM: Rank is the average score margin, where overtime games are treated like ties.
  4. Colley: Ranks on the basis of wins & losses and strength of schedule. This is my reproduction of Colley's published formula*, except that overtime games are treated like ties (no increment to number of wins or losses).
  5. LRMC0: Ranks on the basis of wins & losses and strength of schedule. This is my reproduction of Sokol's LRMC0 ranking*, which is the same as the random walker algorithm described by Callaghan, et al. applied to basketball instead of football. I use a transition probability p = 0.5 for overtime games to treat them as ties, and use p = 0.9999 (trivially smaller than 1) for all other games to ensure connectedness even if there are undefeated teams.
  6. APS: Rank is the average points scored per game.
  7. APA: Rank is the average points allowed per game.
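
As an illustration of the simplest of these, the ASM calculation might be sketched as follows. This is a minimal sketch rather than the code behind the published rankings: the game record layout and the function name are hypothetical, and it assumes that "treated like ties" means an overtime game contributes a margin of zero to both teams.

```python
from collections import defaultdict

def average_score_margin(games):
    """Return {team: average score margin}, counting overtime games as ties.

    `games` is a list of (team_a, score_a, team_b, score_b, overtime) tuples.
    This record layout is hypothetical, chosen only for the example.
    """
    total = defaultdict(float)
    played = defaultdict(int)
    for team_a, score_a, team_b, score_b, overtime in games:
        # Assumption: an overtime game contributes a margin of 0 to both teams.
        margin = 0 if overtime else score_a - score_b
        total[team_a] += margin
        total[team_b] -= margin
        played[team_a] += 1
        played[team_b] += 1
    return {team: total[team] / played[team] for team in played}

# Made-up games between three placeholder teams.
games = [
    ("A", 80, "B", 70, False),
    ("B", 75, "C", 73, True),   # overtime: counted as a tie
    ("A", 66, "C", 60, False),
]
asm = average_score_margin(games)
```

The adjusted rankings (AASM, AASM+HCA) additionally account for strength of schedule and home court advantage, which this sketch does not attempt.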

These rankings are used to predict games during the next seven days, with the number of such games given in the second column. The rankings after the last game of the regular season are used to predict the entire postseason (NIT and NCAA tournaments). For game winner prediction, the team with the better rank is predicted to win. For score margin prediction, the squared correlation coefficient (commonly denoted r²) between the difference in rankings and the actual score margin is reported. For all but the Colley matrix and LRMC0, the ranking difference is expected to equal the actual score margin. For the Colley matrix, the two are expected to be linearly correlated. For LRMC0, the ranking difference may or may not be linearly correlated with the score margin. Scatter plots are shown for predicted versus actual score margin, combining predictions from all weeks. The density of points is shown by blue to yellow to red colors.
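
The two evaluation measures described above can be sketched as follows. This is only an illustration under stated assumptions: each ranking is represented as a numeric rating where higher is better, both numbers in a pair are taken from the same team's perspective, and the function names are hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def evaluate(predictions):
    """Return (winner accuracy, r squared) for a list of predictions.

    Each prediction is (rating_diff, actual_margin), both measured as
    "team 1 minus team 2". A rating difference of exactly zero makes no
    pick and is counted as incorrect here (an assumption of this sketch).
    """
    correct = sum(1 for diff, margin in predictions if diff * margin > 0)
    accuracy = correct / len(predictions)
    r = pearson_r([d for d, _ in predictions], [m for _, m in predictions])
    return accuracy, r * r

# Made-up numbers: two of the four winners are picked correctly.
accuracy, r_squared = evaluate([(5, 10), (-3, 4), (2, -1), (-6, -8)])
```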

* This is not an attempt to discredit the Colley Matrix or the LRMC. Rather, I appreciate openness and transparency in computer rankings precisely because it allows scientific evaluation by peers. I have also published my formula and encourage others to improve it.


Generally, for either game winner or score margin predictions, performance from best to worst is AASM+HCA > AASM > ASM > APS > APA. Since AASM is the average score margin adjusted for strength of schedule and AASM+HCA is AASM adjusted for home court advantage, this demonstrates that score margin, strength of schedule, and home court advantage independently improve prediction accuracy.

Prediction accuracy seems to improve hand-in-hand with the ability to accurately summarize the past. The latter can be measured using the standard deviation of the difference between predicted and actual score margins for games included in the rankings. The lower the standard deviation, the better the past-summarizing ability. The ASM rankings usually have a standard deviation of ~9.1 pts, which drops to about 8.8 pts in the AASM rankings, and drops even lower to 8.6 pts in the AASM+HCA rankings. This suggests that strength of schedule and home court advantage are both important in summarizing the past. Nonetheless, the residual variation is quite large and suggests a limit to predictive power unless additional important factors can be identified and incorporated.
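
The retrospective standard deviation could be computed along these lines. This is a sketch under stated assumptions: the record layout, the function name, and the `hca` parameter are hypothetical, the ratings are taken to be in points, and the population (rather than sample) standard deviation is used.

```python
import math

def retrospective_sd(games, rating, hca=0.0):
    """Standard deviation of (predicted - actual) home score margins.

    `games` is a list of (home, away, actual_home_margin) tuples and
    `rating` maps team -> rating in points. `hca` is the home court
    advantage in points (0 for rankings that ignore it).
    """
    residuals = [
        rating[home] - rating[away] + hca - margin
        for home, away, margin in games
    ]
    mean = sum(residuals) / len(residuals)
    # Assumption: population standard deviation (divide by n, not n - 1).
    variance = sum((r - mean) ** 2 for r in residuals) / len(residuals)
    return math.sqrt(variance)

# Made-up ratings and results for three placeholder teams.
rating = {"A": 5.0, "B": 0.0, "C": -5.0}
games = [("A", "B", 8), ("B", "C", 2), ("C", "A", -12)]
sd = retrospective_sd(games, rating)
```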

Note that, intuitively, a ranking based on score margin should be more accurate than a ranking based solely on win-loss record. Consistent with this idea, the Colley matrix and LRMC0 underperform both the ASM and AASM.

Performance of other rankings

I have tested the weekly performance of many other rankings but they are not shown for simplicity. The results are summarized as follows:

Is there a limit to performance?

If a team's day-by-day performance is random but normally distributed about some average (and retrospectively it is), then we can estimate the probability that a prediction will be right using the formula Φ(Δr/σ) where Φ is the cumulative distribution function of the standard normal, Δr is the ranking difference, and σ is the retrospective standard deviation. The calculation can be done with the ASM, AASM, and AASM+HCA rankings and the results are shown in the performance pages in the row called "Expected". The expected and actual results are very close and suggest that (1) the underlying statistical model is sound and (2) performance of the existing rankings will only improve by random chance.
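
The Φ(Δr/σ) calculation can be sketched with Python's standard library. The function names are hypothetical, and averaging the per-game probabilities to get an overall expected accuracy is an assumption about how the "Expected" row might be aggregated, not something stated above.

```python
from statistics import NormalDist

def win_probability(rating_diff, sigma):
    """Probability that the higher-rated team wins: Phi(delta_r / sigma)."""
    return NormalDist().cdf(rating_diff / sigma)

def expected_accuracy(rating_diffs, sigma):
    """Average probability that the favored team wins over a set of games.

    The favored team is always the predicted winner, so the absolute
    rating difference is used for each game.
    """
    probs = [win_probability(abs(d), sigma) for d in rating_diffs]
    return sum(probs) / len(probs)

# Example: a 5-point rating difference with sigma = 8.6 pts
# (the AASM+HCA figure quoted earlier).
p = win_probability(5.0, 8.6)
```

With a rating difference of zero the formula gives exactly 0.5, and the probability approaches 1 only when the difference is large compared with σ.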

In order to systematically raise performance, additional information must be incorporated into the rankings. One possibility is to consider the temporal order of games; many other ranking systems weight recent games more heavily to reflect recent performance. I examined the AASM+HCA ranking of each team each week and found essentially no serial correlation in improvements or declines. In other words, a team whose ranking rose (or fell) in consecutive weeks, meaning it played better (or worse) than expected, was no more or less likely to see its ranking rise (or fall) again the next week. This seems counterintuitive at first, because it is probably true that teams improve with time (in the sense that a team could beat the past version of itself). However, AASM+HCA rankings are relative, and the relative ability of teams can remain unchanged if they improve at identical rates. The lack of serial correlation suggests that teams improve at roughly even rates and that incorporating time into the rankings will not substantially improve performance.
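
One way to check for this kind of serial correlation is to correlate each week-over-week rating change with the change that follows it. The sketch below is an assumption about how such a test could be run, not a description of the analysis actually performed; the data layout and function name are hypothetical.

```python
import math

def lag1_change_correlation(weekly_ratings):
    """Correlation between consecutive week-over-week rating changes.

    `weekly_ratings` maps team -> list of ratings, one per week.
    Values near 0 indicate no serial correlation; positive values mean
    that a rise tends to be followed by another rise.
    """
    prev_changes, next_changes = [], []
    for ratings in weekly_ratings.values():
        changes = [b - a for a, b in zip(ratings, ratings[1:])]
        prev_changes.extend(changes[:-1])
        next_changes.extend(changes[1:])
    n = len(prev_changes)
    if n < 2:
        raise ValueError("need at least two pairs of consecutive changes")
    mean_x = sum(prev_changes) / n
    mean_y = sum(next_changes) / n
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(prev_changes, next_changes))
    sxx = sum((x - mean_x) ** 2 for x in prev_changes)
    syy = sum((y - mean_y) ** 2 for y in next_changes)
    if sxx == 0 or syy == 0:
        raise ValueError("changes have zero variance")
    return sxy / math.sqrt(sxx * syy)

# Made-up example: a rating that alternates up and down every week
# is perfectly negatively serially correlated.
r = lag1_change_correlation({"A": [0, 1, 0, 1, 0]})
```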

Better performance may require more data than just date, final score, and location. For example, fouling at the end of close games or letting up at the end of blowout games may create artifactual score margins. Perhaps a better measure of ability is the halftime score margin, especially since scores are not serially correlated in time until the end of the game (at least in professional basketball, see Hal Stern's lecture). More detailed team and player statistics may also be helpful but each addition will need to be carefully justified to avoid overfitting.