Basic explanation
What the numbers mean
I offer several rankings that can be used to predict the outcome of men's college basketball games. They are based on mathematical formulas that incorporate three factors:
- Average score margin: it is highly correlated to winning percentage.
- Strength of schedule: a team is better than another team if they have the same average score margin but the former played tougher opponents.
- Home court advantage: teams perform better than expected at home, by about 3.5 points on average.
The numbers that I report are:
- W-L: Stands for win-loss, counting only games between Division I teams.
- AASM+HCA: Stands for adjusted average score margin plus home court advantage. This ranking incorporates all three factors and is usually the best predictor. The difference in values for two teams is the expected score margin if the teams play.
- SOS: Stands for strength of schedule and the value shown is the average AASM+HCA of a team's opponents, with rank in parentheses. Note that this is not how SOS is incorporated into the AASM+HCA ranking (see mathematical explanation below).
- Last week: The AASM+HCA value and rank, computed without the games of the last 7 days, are shown.
- AASM: Stands for adjusted average score margin, a ranking that incorporates the first two factors and has good predictive ability. The AASM value and rank are shown, and the difference in values for two teams is the expected score margin if the teams play.
- ASM: Stands for average score margin, and is the first factor only. It has moderate predictive ability. The ASM value and rank are shown, and the difference in values for two teams is the expected score margin if the teams play.
- APS: Stands for average points scored (offense) per game and has poor predictive ability. The APS value and rank are shown.
- APA: Stands for average points against (defense) per game and has poor predictive ability. The APA value and rank are shown.
- σ: The standard deviation of the discrepancy between the predicted and actual score margin is shown. Better rankings tend to have lower values because more of the variation in score margin is captured by the formula.
- HCA: Stands for home court advantage. Add this number to the home team's ranking when predicting game outcomes.
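As a rough illustration of how the reported numbers combine, the sketch below predicts a score margin from two ratings plus the HCA and computes σ as the standard deviation of the prediction errors. The function names, the tuple layout `(home, away, home_pts, away_pts)`, the sample ratings, and the use of a population standard deviation are all assumptions made for this example; the default of 3.5 points is only the typical value quoted above, and the reported HCA should be used in practice.

```python
import statistics

def predict_margin(home_rating, away_rating, hca=3.5, neutral_site=False):
    """Expected home-minus-away score margin from two ratings."""
    bonus = 0.0 if neutral_site else hca  # no bonus on a neutral court
    return home_rating - away_rating + bonus

def prediction_sigma(games, ratings, hca=3.5):
    """Standard deviation of (actual - predicted) margin over played games.

    games: iterable of (home, away, home_pts, away_pts) tuples.
    Uses the population standard deviation (an assumption of this sketch).
    """
    errors = [
        (home_pts - away_pts) - predict_margin(ratings[home], ratings[away], hca)
        for home, away, home_pts, away_pts in games
    ]
    return statistics.pstdev(errors)

# Hypothetical ratings, for illustration only.
ratings = {"A": 12.0, "B": 4.5}
print(predict_margin(ratings["A"], ratings["B"]))  # A hosting B
print(predict_margin(ratings["A"], ratings["B"], neutral_site=True))
```

A positive result favors the home team (or the first team at a neutral site), and a negative result favors the other team.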
Rankings design
The key feature of my rankings is that the weighting of each factor in the formulas is not arbitrarily chosen. Instead, the rankings, strength of schedule, and home court advantage are computed simultaneously using equations that have rational underpinnings. Thus, the rankings are completely objective and unbiased.
Another key feature of the rankings is that they are designed to predict game outcomes. I make no attempt to maximize the fit to past games or match consensus. Instead, each ranking is intended to predict future games and I have prospectively verified their performance. Extensive performance data is provided elsewhere in this site so that my rankings can be used as benchmarks for other predictive rankings.
Finally, the formulas are published below in detail. This allows others to see that the math is not mysterious and others may verify my rankings and improve upon them. For example, the rankings were designed for NCAA men's division I basketball games, but can be easily adapted to the women's game or other sports if data is available. Basketball happens to be a good sport because of the large number of games and its finely and normally distributed scores. My initial efforts in college football are promising and may appear at this site later. Also, the rankings do not consider the effect of time. Some improvement may be possible by emphasizing more recent games. I would be happy to hear about such efforts in this regard.
The rankings are updated after the games each Sunday. From week to week, rankings will tend to rise for teams that performed better than expected, including losing by a smaller margin than expected. The opposite is also true.
Disclaimer: These rankings are for entertainment purposes only. Raymond Cheong is not responsible for misuse of the data in this website.
Mathematical explanation
Average score margin
Average score margin (ASM) is highly correlated to a team's winning percentage. This makes intuitive sense, because the more potential a team has to outscore an opponent, the more likely the former will win. A naive ranking system that uses differences in ASM to predict the actual score margin does surprisingly well.
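The naive ASM ranking can be sketched in a few lines. The game list and the tuple layout `(home, away, home_pts, away_pts)` are hypothetical and only serve to show the arithmetic.

```python
from collections import defaultdict

def average_score_margin(games):
    """Return {team: ASM} from (home, away, home_pts, away_pts) tuples."""
    total = defaultdict(float)  # summed score margin per team
    count = defaultdict(int)    # games played per team
    for home, away, home_pts, away_pts in games:
        total[home] += home_pts - away_pts
        total[away] += away_pts - home_pts
        count[home] += 1
        count[away] += 1
    return {team: total[team] / count[team] for team in total}

# Made-up results, for illustration only.
games = [("A", "B", 80, 70), ("B", "C", 65, 60), ("A", "C", 75, 60)]
asm = average_score_margin(games)

# Naive prediction: difference in ASM is the expected margin of A over B.
naive_prediction = asm["A"] - asm["B"]
print(asm, naive_prediction)
```

Note that this naive estimate ignores who each team played, which is the gap that the strength-of-schedule adjustment is meant to close.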
Many ranking systems that rely on score margins de-emphasize large score margins in the formula. After all, the goal of the game is to win and not to greatly outscore opponents in an unsportsmanlike manner. However, I choose not to cap score margins in my rankings for several reasons. First, there are many games each season with very large score margins and they occur at a predictable Gaussian frequency. Second, if teams ease up when the lead gets large then the actual score margin should be inflated--not capped--in order to reflect the real difference between the teams. Third, it is unclear how to objectively implement a cap and its use might not improve predictions.
Strength of schedule
A team's ASM depends on the strength of its opponents. Suppose we had a measure of each team's strength, r (i.e., the ranking), such that $r_i - r_j$ is the expected score margin when team i plays team j. For r to be consistent with the available data, the average discrepancy between the expected and actual score margins should be zero. In other words, for each team i, r ought to satisfy the equation:

$$\sum_{j \in \mathrm{Opp}(i)} \left[ (r_i - r_j) - (S_i - S_j) \right] = 0$$

where Opp(i) are the opponents of team i
and $S_i$ and $S_j$ are the scores of team i and j when they play
This equation incorporates both the ASM (the second term of the sum) and strength of schedule (the first term of the sum). Furthermore, there is one such equation for each team and together they form a linear system that can be easily solved with matrices. Note that an additional constraint is needed to ensure a unique solution, because the equations are still satisfied if a constant is added to each r. A natural constraint is to require the r's to add to zero. This ranking is called the adjusted average score margin (AASM), in the sense that the ASM has been adjusted for strength of schedule.
Note that one can recast the AASM formula as the solution to a least-squares regression model where the actual score margins are a linear function of ranking differences.
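The least-squares reading can be made concrete with a short derivation sketch. The symbol E, for the sum of squared prediction errors over all games, is introduced here for illustration and is not part of the original notation.

```latex
% E is an assumed name for the sum of squared errors over all games,
% where each game is between some team i and some team j.
E(r) = \sum_{\text{games } (i,j)} \left[ (S_i - S_j) - (r_i - r_j) \right]^2

% Setting the derivative with respect to each r_i to zero:
\frac{\partial E}{\partial r_i}
  = -2 \sum_{j \in \mathrm{Opp}(i)} \left[ (S_i - S_j) - (r_i - r_j) \right] = 0
\quad\Longrightarrow\quad
\sum_{j \in \mathrm{Opp}(i)} \left[ (r_i - r_j) - (S_i - S_j) \right] = 0
```

The final condition says that, for each team i, the discrepancies between expected and actual score margins sum to zero, which is the consistency requirement described above.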
Home court advantage
The home court advantage, if it exists, results in a points bonus for teams playing at home. Thus, in the AASM equation, the HCA must be subtracted from the actual score margin if team i is playing at home, and added if team i is playing away from home. The new equation becomes:
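
$$\sum_{j \in \mathrm{Opp}(i)} \left[ (r_i - r_j) - (S_i - S_j) \right] + D_i \, \mathrm{HCA} = 0$$

where HCA is the home court advantage
and $D_i$ is the number of home games minus the number of away games for team i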
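The actual value of the HCA can be computed simultaneously. The HCA is the average discrepancy between the actual and expected score margins over all home-away games:

$$\mathrm{HCA} = \frac{1}{N_{HA}} \sum \left[ (S_{\mathrm{home}} - S_{\mathrm{away}}) - (r_{\mathrm{home}} - r_{\mathrm{away}}) \right]$$

where the sum is over all home-away games
and $N_{HA}$ is the number of such games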
Since this equation is linear in terms of the unknown rankings, the AASM+HCA equations are still linear overall. It also has a unique solution with the added constraint of r's summing to zero. This system is called AASM+HCA because it gives the average score margin adjusted for both strength of schedule and the home court advantage.
Note that one can recast the AASM+HCA formula as the solution to a least-squares regression model where the actual score margins are a linear function of ranking differences and the home court advantage.
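To see why the combined system stays linear, the HCA can be eliminated by substitution. The sketch below is reconstructed from the verbal description in this section rather than copied from the original formulas, and T is an assumed shorthand for the total home-minus-away score margin over all home-away games.

```latex
% Team equations with the home court advantage included
% (HCA is subtracted from the actual margin at home, added when away):
\sum_{j \in \mathrm{Opp}(i)} \left[ (r_i - r_j) - (S_i - S_j) \right] + D_i \,\mathrm{HCA} = 0

% Each team k appears as the home side in its home games and as the away
% side in its away games, so \sum (r_{home} - r_{away}) = \sum_k D_k r_k.
% With T = \sum (S_{home} - S_{away}) over all home-away games:
\mathrm{HCA} = \frac{1}{N_{HA}} \Bigl[ T - \sum_k D_k r_k \Bigr]

% Substituting HCA into the team equations gives one linear equation per team:
\sum_{j \in \mathrm{Opp}(i)} (r_i - r_j) - \frac{D_i}{N_{HA}} \sum_k D_k r_k
  = \sum_{j \in \mathrm{Opp}(i)} (S_i - S_j) - \frac{D_i}{N_{HA}} \, T
```

Every term on the left is linear in the unknown rankings and the right side depends only on the game data, so the system can still be solved with matrices.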
Implementation
Solving the AASM and AASM+HCA equations for r is a simple procedure. The equations can be rewritten into matrix form as follows. For the AASM rankings, construct a game-connectivity matrix G such that:
$$G_{ii} = \text{number of games played by team } i$$
$$G_{ij} = -(\text{number of games between team } i \text{ and team } j), \quad i \ne j$$

(My G matrix turns out to be identical to the G matrix used in the explanation of the Colley Matrix.) Then construct a vector b representing the total score margin relative to each team:

$$b_i = \sum_{j \in \mathrm{Opp}(i)} (S_i - S_j)$$

Replace any row of G with a row of 1's and replace the corresponding element of b with 0. The solution is $r = G^{-1} b$. Note that there is a unique solution if and only if the modified G is invertible (i.e. the teams are fully connected) and this typically occurs about 4 weeks into the season.
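The matrix procedure can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the Matlab or Mathematica code used for the published rankings: the helper `solve_linear`, the function `aasm`, the tuple layout `(home, away, home_pts, away_pts)`, and the sample games are all hypothetical, and the row replaced by 1's is arbitrarily the first one.

```python
def solve_linear(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[k]] for k, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: teams are not fully connected")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[k][n] / m[k][k] for k in range(n)]

def aasm(games):
    """Return {team: AASM rating} from (home, away, home_pts, away_pts)."""
    teams = sorted({t for home, away, _, _ in games for t in (home, away)})
    idx = {t: k for k, t in enumerate(teams)}
    n = len(teams)
    G = [[0.0] * n for _ in range(n)]
    b = [0.0] * n
    for home, away, home_pts, away_pts in games:
        i, j = idx[home], idx[away]
        G[i][i] += 1          # games played by each team on the diagonal
        G[j][j] += 1
        G[i][j] -= 1          # minus the number of games between i and j
        G[j][i] -= 1
        b[i] += home_pts - away_pts   # total score margin for each team
        b[j] += away_pts - home_pts
    G[0] = [1.0] * n          # constraint row: ratings sum to zero
    b[0] = 0.0
    return dict(zip(teams, solve_linear(G, b)))

# Made-up results, for illustration only.
games = [("A", "B", 80, 70), ("B", "C", 65, 60), ("A", "C", 75, 60)]
ratings = aasm(games)
print(ratings)
```

In this small example the margins are mutually consistent, so the rating differences reproduce them exactly (A over B by 10, B over C by 5). With real data the differences only match the margins on average.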
For the AASM+HCA rankings, construct the home-away vector D such that $D_i$ is the number of home games minus the number of away games for team i. From this define the home-away matrix H and vector h such that

$$H_{ij} = \frac{D_i D_j}{N_{HA}}, \qquad h_i = \frac{D_i}{N_{HA}} \sum \left( S_{\mathrm{home}} - S_{\mathrm{away}} \right)$$

where i may equal j
and the sum is taken over all home-away games

The solution is $r = (G-H)^{-1}(b-h)$ where any row of G-H is replaced with a row of 1's and the corresponding element of b-h is replaced by 0.
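A hedged sketch of the AASM+HCA solve follows. It takes $H_{ij} = D_i D_j / N_{HA}$ and $h_i = D_i T / N_{HA}$, with T the total home-minus-away margin over home-away games, which is the form obtained by substituting the HCA equation into the team equations; treat that form as an assumption of this sketch. The function names, tuple layout, and sample games are hypothetical, and every game is assumed to have a home team (no neutral sites).

```python
def solve_linear(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[k]] for k, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[k][n] / m[k][k] for k in range(n)]

def aasm_hca(games):
    """Return ({team: rating}, hca) from (home, away, home_pts, away_pts)."""
    teams = sorted({t for home, away, _, _ in games for t in (home, away)})
    idx = {t: k for k, t in enumerate(teams)}
    n = len(teams)
    G = [[0.0] * n for _ in range(n)]
    b = [0.0] * n
    D = [0.0] * n              # home games minus away games per team
    total_home_margin = 0.0    # T: summed home-minus-away margin
    for home, away, home_pts, away_pts in games:
        i, j = idx[home], idx[away]
        margin = home_pts - away_pts
        G[i][i] += 1
        G[j][j] += 1
        G[i][j] -= 1
        G[j][i] -= 1
        b[i] += margin
        b[j] -= margin
        D[i] += 1
        D[j] -= 1
        total_home_margin += margin
    n_ha = len(games)          # all games are home-away in this sketch
    A = [[G[i][j] - D[i] * D[j] / n_ha for j in range(n)] for i in range(n)]
    rhs = [b[i] - D[i] * total_home_margin / n_ha for i in range(n)]
    A[0] = [1.0] * n           # constraint row: ratings sum to zero
    rhs[0] = 0.0
    r = solve_linear(A, rhs)
    hca = (total_home_margin - sum(D[i] * r[i] for i in range(n))) / n_ha
    return dict(zip(teams, r)), hca

# Made-up games built from ratings A=5, B=0, C=-5 and a 3-point HCA.
games = [("A", "B", 78, 70), ("B", "A", 70, 72),
         ("B", "C", 68, 60), ("C", "A", 60, 67)]
ratings, hca = aasm_hca(games)
print(ratings, hca)
```

Because the sample margins were constructed without noise, the solve should recover the ratings and the HCA they were built from. The system can be singular for degenerate schedules, for example if every team plays only at home or only away, in which case the HCA cannot be separated from the ratings.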
Details
Only games between Division I schools are considered. Also, in order to control for the length of each game, overtime games are treated like ties. That is, the score margin in overtime games is set to zero in the above formulas. Since I do not have access to the score at the end of regulation, this may cause a slight inconsistency: APS minus APA may be slightly different from ASM.
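The overtime rule and the resulting inconsistency can be illustrated with a small sketch. The function name, the tuple layout `(points_for, points_against, overtime)`, and the two sample results are hypothetical.

```python
def regulation_margin(points_for, points_against, overtime):
    """Score margin used in the formulas; overtime games count as ties."""
    return 0 if overtime else points_for - points_against

# Two made-up results for one team: a regulation win and an overtime win.
results = [(80, 70, False), (90, 88, True)]

n = len(results)
aps = sum(pf for pf, _, _ in results) / n   # average points scored
apa = sum(pa for _, pa, _ in results) / n   # average points against
asm = sum(regulation_margin(pf, pa, ot) for pf, pa, ot in results) / n

print(aps - apa, asm)  # the two differ because the overtime margin is zeroed
```

Here APS and APA use the final scores, including overtime points, while ASM treats the overtime game as a tie, so the two quantities disagree.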
History
I began dabbling in sports rankings in 2000. Patrick Stevens, a dormmate at the University of Maryland (Go Terps!) and now a sports writer for the Washington Times, was organizing a basketball pool and commented that those who knew least about sports often do best in pools. Certainly I fit into the category of sports ignorance so I gave it a try. To choose my bracket, I created a simple algorithm based on seed values and a random number generator ("quality factor"), inspired by the algorithm of Robert Rohde, a close friend and dormmate. Indeed, I ended up in the lead after the regional finals, but Michigan State prevailed and I did not. Nonetheless, this sparked an interest in tournament prediction and I tried some variations of the quality factor algorithm in the years following.
Later, Robert created a serious ranking system (it uses the quality factor terminology but it is quite different!). This again inspired me to try to create a real ranking and, in 2003, I gave it a shot. I analyzed some score data and made the simple observation that score margin is a strong predictor of a team's performance, and this led to the AASM system. The algorithm has been incredibly successful and often achieves 90th percentile or better in the ESPN tournament challenge. In 2005 it correctly predicted the Final Four and national champion. The algorithm has not changed substantially except in 2007 when a home court advantage was incorporated.
Notes
2007-2008
- AASM matrix method plus home court advantage, overtime games count as ties
- Implemented in Matlab 7, data courtesy Ken Pomeroy
- 8th annual algorithmic tourney winner: Bobby's "Evil Plan 100" (one of the permutations of all #1 seeds in the Final Four, ESPN 99.5%ile)
- Notable entry: Jason's "Team Victory" (cluster-based method, best "player" was ESPN 99.1%ile)
2006-2007
- AASM matrix method plus home court advantage, overtime games count as ties
- Implemented in Matlab 7, data courtesy Ken Pomeroy
- 7th annual algorithmic tourney winner: Massey's Consensus, entered by Shawn (ESPN 95.2%ile)
- Notable entries: Nina's "Little sisters are better" bracket, Danny's "Average Height" bracket
2005-2006
- AASM matrix method, no home court advantage, overtime games count as ties
- Implemented in Mathematica 5.0, data courtesy Ken Pomeroy
- 6th annual algorithmic tourney winner: Raymond's average score margin bracket (ESPN 89.9%ile)
- Notable entry: Bobby's "Ray must be defeated" bracket
2004-2005
- AASM matrix method, no home court advantage, overtime games count as ties
- Implemented in Mathematica 5.0, data courtesy Ken Pomeroy
- 5th annual algorithmic tourney winner: Raymond's straight-rank bracket (ESPN 98.8%ile)
- Notable entry: Jason Ernst's Vegas Odds bracket
2003-2004
- AASM matrix method, no home court advantage
- Implemented in Mathematica 5.0, data courtesy Ken Pomeroy
- 4th annual algorithmic tourney winner: Raymond's straight-rank bracket (ESPN 99.0%ile)
2002-2003
- AASM matrix method, home court advantage included
- Implemented in Matlab, data courtesy CollegeRPI.com
- 3rd annual algorithmic tourney winner: Raymond's straight-rank bracket
- Notable entry: Jason Ernst's GoogleRank bracket
2001-2002
- Quality Factors based on ESPN Power Rankings
- Implemented in Matlab
- 2nd annual algorithmic tourney winner: Shawn's custom quality-factor bracket
2000-2001
- Quality Factors based on ESPN Power Rankings
- Implemented in Matlab
- 1st annual algorithmic tourney winner: not recorded
1999-2000
- Quality Factors based on tournament seed
- Implemented manually on TI-85 calculator