# Rosenthal: A statistical ranking of NCAA basketball teams

Jeffrey Rosenthal, Special to TSN and TSN.ca
3/18/2013 2:57:14 PM

I was asked by TSN to make predictions for the 2013 NCAA Men's Basketball "March Madness" tournament bracket based solely on a statistical analysis, without using any specific knowledge of NCAA teams (which is just as well since, although I like sports and watch them sometimes and even play a bit of neighbourhood pick-up basketball myself, I haven't closely followed any spectator sports in years).

So I proceeded by:

a) Gathering lots of different data variables for each team, for each of the past four regular seasons.

b) Separately gathering the results of each game of each of the past three years' March Madness tournaments.

c) Combining all of that data together for my computer programs to read (which turned out to be very time-consuming, since different data are available on different web sites in different formats with different team name abbreviations, so I had to "teach" my computer to match them all up).

d) Exploring different "non-negative linear combinations" of the data, i.e. formulas which use the data from a given regular season, to give an overall score to each team (I use the phrase "regular season" to include all games from that season prior to the NCAA March Madness tournament, including conference tournament games).

e) Developing computer programs to "fit" the formula based on previous seasons, i.e. to do an extensive search to figure out which of those formulas did the best job of predicting the winners for each game in that year's tournament, using data from the corresponding regular season.

f) Eventually coming up with a single best formula for this, which I call the "Rosenthal Fit."

g) Then, filling in the actual bracket simply by picking, for each game, whichever team has a larger value of their Rosenthal Fit.
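To give a flavour of step (c), here is a minimal sketch of how team names from different sites might be matched up. The alias table and the function names `normalize` and `canonical` are made up for illustration; they are not my actual programs, and the spellings shown are only examples of the kind of mismatch involved.

```python
import re

# Hypothetical alias table: these spellings are illustrative examples only,
# not the actual abbreviations used by any particular web site.
ALIASES = {
    "ohio st": "Ohio State",
    "ohio state": "Ohio State",
    "miami fl": "Miami (FL)",
    "miami florida": "Miami (FL)",
}

def normalize(name: str) -> str:
    """Lower-case the name, replace punctuation with spaces, collapse whitespace."""
    cleaned = re.sub(r"[^a-z0-9 ]", " ", name.lower())
    return " ".join(cleaned.split())

def canonical(name: str) -> str:
    """Map a site-specific spelling to a single canonical team name."""
    key = normalize(name)
    if key not in ALIASES:
        # Fail loudly so that unmatched names get added to the table by hand.
        raise KeyError(f"no alias recorded for {name!r}")
    return ALIASES[key]

print(canonical("Ohio St."))    # Ohio State
print(canonical("Miami (FL)"))  # Miami (FL)
```

Raising an error on an unknown name, rather than guessing, is a deliberate choice in this sketch: a silently mismatched team would corrupt every later step.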

The formula for the Rosenthal Fit, plus an evaluation of how well it performed when applied to data from the previous three years' tournaments, is provided below. Corresponding values for all teams for the 2012-2013 regular season (to be used to predict the 2013 tournament bracket) are listed at the end of this article.

General Observations:

The NCAA tournament is inherently hard to predict. Indeed, the total number of different ways of filling in your bracket predictions is 2^63 (i.e., 63 different 2's all multiplied together), which works out to about 9 x 10 to the 18th, i.e. a nine followed by 18 zeros, which equals nine billion billion, or nine million million million. That's a lot of possibilities!
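The count is easy to check directly. The short snippet below simply multiplies out the 63 twos; the variable names are arbitrary.

```python
# Each of the 63 games has two possible winners, so a full bracket
# can be filled in 2**63 different ways.
n_games = 63
n_brackets = 2 ** n_games

print(n_brackets)           # 9223372036854775808
print(f"{n_brackets:.2e}")  # 9.22e+18
```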

In fact, even the experts find it challenging. For example, in past tournament games, the higher-seeded team only won about 70 per cent of the games. This means that even when many of the most knowledgeable people get together to seed the teams, they can still only correctly predict the winner about 70 per cent of the time. Individual expert basketball predictors (e.g. Ken Pomeroy at KenPom.com) tend to perform similarly, accurately predicting the winner in only about 70 per cent of the tournament games. Part of the reason is that each matchup is a single-elimination game, rather than e.g. a seven-game series, so there is lots of inherent day-to-day randomness, and it is quite possible for a weaker team to beat a "better" team in any one game, making predictions that much more difficult.

So, despite my extensive computer programming and statistical modeling, I do not expect to do better than calling about 70 per cent of the games correctly.

Indeed, I would say that anyone who does much better than 70 per cent would have to get fairly lucky (in addition to perhaps having a good predictive model and/or good knowledge of the basketball teams).
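One rough way to see how much luck is involved is to pretend that each of the 63 games is called correctly with probability 0.7, independently of the others. That independence is a simplifying assumption made only for this sketch (in a real bracket, an early wrong pick affects later games), and the function name `prob_at_least` is invented here.

```python
from math import comb

def prob_at_least(k: int, n: int = 63, p: float = 0.7) -> float:
    """Probability of at least k correct calls out of n independent games,
    each called correctly with probability p (binomial upper tail)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Compare an "about 70 per cent" bracket (44 of 63) with a noticeably
# better one (50 of 63, roughly 79 per cent).
for k in (44, 50):
    print(k, round(prob_at_least(k), 4))
```

Under these assumptions, the chance of reaching 50 correct calls is much smaller than the chance of reaching 44, which is the sense in which a much-better-than-70-per-cent bracket needs some luck.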

Statistical Data Considered:

To perform my statistical analysis, I downloaded and considered lots of different statistics, including the following (listed with sources):

- WinFrac: The team's overall game-winning fraction for the entire regular (pre-March Madness) season. (teamrankings.com)

- WinFrac3: The team's game-winning fraction in their final three regular season games. (teamrankings.com)

- CWinFrac: The team's game-winning fraction for games within their own conference. (realtimerpi.com)

- NCWinFrac: The team's game-winning fraction for games outside of their own conference. (realtimerpi.com)

- OffEff: The team's unadjusted offensive efficiency rating. (teamrankings.com)

- DefEff: The team's unadjusted defensive efficiency rating. (teamrankings.com)

- SOS: The team's "Strength of Schedule", a measure of the average strength of the opponents they played. (realtimerpi.com)

- RPI: The team's "Ratings Percentage Index". (realtimerpi.com)

- PntPG: The team's average number of points scored per game. (teamrankings.com)

- OpPnt: The team's average number of points scored against them per game. (teamrankings.com)

- I also examined the team statistics provided at ncaa.com and at espn.go.com, but they largely overlapped with the above statistics, so in the end I did not need to use them directly.

Finally, and most importantly, the "outcome" measure was:

- TourRes: The game-by-game, line-by-line win/loss results for each game of each of the past three March Madness tournaments. (kusports.com)

Statistical Modeling Approach Taken:

My approach was to try to figure out which linear combination of (i.e., formula using) the above-listed regular-season statistical values would do the best job of ranking the teams from highest to lowest, in terms of who won which games in the corresponding year's tournament. I computed this using regular-season statistical values, and corresponding tournament game results, for each of the three seasons 2009-2010, 2010-2011, and 2011-2012.

To perform this computation, I wrote computer programs in C and in R, which used such techniques as "linear regression", "constrained linear regression," and finally a "Monte Carlo (randomised) search algorithm," to find an optimal formula.

Although my computer programs considered all of the above variables, they ultimately selected just a few of those variables as being most relevant for prediction, namely: WinFrac, WinFrac3, OffEff, DefEff, SOS, and NCWinFrac.
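For readers curious about what a randomised search of this kind can look like, here is a minimal sketch in Python. It is not my actual C or R code: the perturb-and-keep rule, the step size, the function names, and the four-team toy data set are all assumptions made up for illustration. The only features taken from the description above are that the weights stay non-negative and that a formula is judged by how many tournament games it calls correctly.

```python
import random

def accuracy(weights, stats, games):
    """Fraction of games in which the winner had the strictly larger score."""
    score = {team: sum(w * x for w, x in zip(weights, values))
             for team, values in stats.items()}
    return sum(score[winner] > score[loser] for winner, loser in games) / len(games)

def random_search(stats, games, n_vars, iters=2000, step=0.5, seed=0):
    """Randomly perturb the weights, keeping any candidate that does at least as well."""
    rng = random.Random(seed)
    best_w = [1.0] * n_vars
    best_acc = accuracy(best_w, stats, games)
    for _ in range(iters):
        # Clip at zero so the combination stays non-negative.
        cand = [max(0.0, w + rng.gauss(0.0, step)) for w in best_w]
        acc = accuracy(cand, stats, games)
        if acc >= best_acc:
            best_w, best_acc = cand, acc
    return best_w, best_acc

# Toy data (entirely made up): two statistics per team, and a list of
# (winner, loser) pairs standing in for tournament results.
toy_stats = {"A": (0.9, 0.2), "B": (0.5, 0.9), "C": (0.7, 0.4), "D": (0.3, 0.6)}
toy_games = [("A", "B"), ("C", "D"), ("A", "C")]

weights, acc = random_search(toy_stats, toy_games, n_vars=2)
print(weights, acc)
```

Because the accuracy only counts games, many different weight vectors tie for the best score, so a search like this finds one good formula rather than a unique one.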

Final Formula:

Using the above statistical analysis, the resulting best linear combination turned out to be:

Rosenthal Fit = 6.2337 x WinFrac + 1.7180 x WinFrac3 + 1.1179 x OffEff + 1.9189 x DefEff + 11.9846 x SOS + 7.3712 x NCWinFrac
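In code, the formula is just a weighted sum. The sketch below reads the six coefficients as decimal numbers (6.2337 and so on). The input values in `example` are hypothetical, since the units and scaling of each statistic are not spelled out in this article, and the function name `rosenthal_fit` is likewise just a label chosen here.

```python
# Coefficients of the formula above, read as decimal numbers.
COEFFS = {
    "WinFrac": 6.2337,
    "WinFrac3": 1.7180,
    "OffEff": 1.1179,
    "DefEff": 1.9189,
    "SOS": 11.9846,
    "NCWinFrac": 7.3712,
}

def rosenthal_fit(stats: dict) -> float:
    """Weighted sum of a team's regular-season statistics."""
    return sum(COEFFS[name] * stats[name] for name in COEFFS)

# Hypothetical statistics for an imaginary team (not real data).
example = {
    "WinFrac": 0.80, "WinFrac3": 0.667, "OffEff": 1.10,
    "DefEff": 0.95, "SOS": 0.58, "NCWinFrac": 0.90,
}
print(round(rosenthal_fit(example), 4))
```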

I then applied this linear combination formula to the regular-season statistics for the current (2012-2013) season. This provided an overall numerical rating for each team this year, based on their regular-season statistics. These ratings are listed, in order from highest to lowest, at the end of this article.

Then, to fill out this year's tournament bracket using this Rosenthal Fit, simply choose, for each game, whichever team has a higher value of the Rosenthal Fit.
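As a small illustration of this rule, the sketch below plays out a four-team mini-bracket using four values from the list of Rosenthal Fit values. The pairing of the teams is hypothetical (it is not the actual 2013 bracket), and the function names are invented for the example.

```python
# Four values taken from the list of Rosenthal Fit values.
FIT = {"Duke": 24.1150, "Louisville": 23.7559, "Kansas": 23.6584, "Gonzaga": 23.4355}

def pick(a: str, b: str, fit=FIT) -> str:
    """Pick whichever team has the larger fit value (b on an exact tie)."""
    return a if fit[a] > fit[b] else b

def play_bracket(teams, fit=FIT):
    """Teams are listed in bracket order; returns the winners of each round."""
    rounds = []
    while len(teams) > 1:
        teams = [pick(teams[i], teams[i + 1], fit) for i in range(0, len(teams), 2)]
        rounds.append(teams)
    return rounds

# Hypothetical pairing: Gonzaga v. Kansas and Louisville v. Duke.
print(play_bracket(["Gonzaga", "Kansas", "Louisville", "Duke"]))
# [['Kansas', 'Duke'], ['Duke']]
```

One consequence of the rule is visible even in this toy case: the team with the highest value in any part of the bracket is always predicted to win that part, so the method never predicts an "upset" relative to its own ranking.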

Note: The above rating system is based purely on statistical analysis, without taking any other factors into account. Certain late-breaking events (e.g. Kentucky Wildcats superstar Nerlens Noel's major injury on February 12) could potentially have a large impact on a team's tournament performance despite making only small changes to their regular-season statistics, which could throw off my model's predictions. I did consider making a few post-hoc adjustments to account for such developments, but in the end I decided not to - thus keeping the Rosenthal Fit as a purely statistical measure.

Comparison to Other Predictors:

The following table shows how the Rosenthal Fit (RF), the tournament seedings, and the RPI (Ratings Percentage Index) itself would each have done at predicting tournament games in each of the past three tournaments. (In two of the tournaments, there was one game between two equally-seeded teams; those two games are excluded from the evaluation of the tournament seedings.)

| Season | Seedings | RPI | RF |
|---|---|---|---|
| 2009-2010 | 42/62 (67.74%) | 44/63 (69.84%) | 48/63 (76.19%) |
| 2010-2011 | 43/63 (68.25%) | 38/63 (60.32%) | 43/63 (68.25%) |
| 2011-2012 | 46/62 (74.19%) | 44/63 (69.84%) | 45/63 (71.43%) |
| Total | 131/187 (70.05%) | 126/189 (66.67%) | 136/189 (71.96%) |

This table shows that the Rosenthal Fit compares favourably with RPI and with the tournament seedings. This should not be taken as evidence of any particular superiority, since the Rosenthal Fit was developed precisely to try to maximise these predictions. Still, it does suggest that the Rosenthal Fit is at least roughly comparable in predictive power to these expert measures.
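The totals in the table can be reproduced from the per-season counts. The snippet below is only a bookkeeping check; the dictionary layout and the function name `total` are choices made for this example.

```python
# (correct calls, games evaluated) for 2009-2010, 2010-2011, 2011-2012,
# copied from the table above.
results = {
    "Seedings": [(42, 62), (43, 63), (46, 62)],
    "RPI": [(44, 63), (38, 63), (44, 63)],
    "RF": [(48, 63), (43, 63), (45, 63)],
}

def total(pairs):
    """Return (total correct, total games, percentage correct)."""
    correct = sum(c for c, _ in pairs)
    games = sum(g for _, g in pairs)
    return correct, games, 100 * correct / games

for name, pairs in results.items():
    correct, games, pct = total(pairs)
    print(f"{name}: {correct}/{games} ({pct:.2f}%)")
# Seedings: 131/187 (70.05%)
# RPI: 126/189 (66.67%)
# RF: 136/189 (71.96%)
```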

In a few weeks, we will know how well it performed this year.

Jeffrey Rosenthal is a professor in the Department of Statistics at the University of Toronto, and the author of the bestseller Struck by Lightning: The Curious World of Probabilities. His analysis can be seen during TSN's coverage of the 2013 NCAA Men's Basketball tournament.

List of Rosenthal Fit Values:

```
Duke    24.1150
Louisville    23.7559
Kansas    23.6584
New Mexico    23.5325
Gonzaga    23.4355
Arizona    23.2148
Indiana    23.0785
Michigan    22.6300
Ohio St.    22.6260
Georgetown    22.5934
Syracuse    22.5526
Creighton    22.5324
Miami (FL)    22.3322
Notre Dame    22.2744
Pittsburgh    22.1597
Memphis    22.1042
Wichita St.    22.0946
Saint Louis    22.0907
Florida    22.0731
Michigan St.    22.0105
Butler    21.9748
Kansas St.    21.9461
Oregon    21.9407
Mississippi    21.8169
UNLV    21.7975
Cincinnati    21.7373
N.C. State    21.7080
VCU    21.6183
Bucknell    21.5939
Oklahoma St.    21.5885
St. Mary's    21.5479
Illinois    21.3910
Maryland    21.3721
Belmont    21.3090
UCLA    21.3080
Marquette    21.2605
Temple    21.2184
North Carolina    21.1325
Wyoming    21.0634
Wisconsin    20.9743
Missouri    20.8896
Charlotte    20.8322
Minnesota    20.8182
Middle Tenn.St.    20.8046
IowaSt.    20.8036
Valparaiso    20.7961
San Diego St.    20.6748
Connecticut    20.6519
Iowa    20.6125
Boise State    20.5151
Albany    20.3990
Utah St.    20.3426
Akron    20.3190
Southern Miss    20.2688
LaSalle    20.1715
Arizona St.    20.0918
Oklahoma    19.9951
Rutgers    19.8699
LSU    19.7374
Tennessee    19.5588
Villanova    19.5010
Houston    19.4979
Virginia    19.4679
Stanford    19.4496
Santa Clara    19.4331
Kentucky    19.3383
Brigham Young    19.3114
Lehigh    19.2614
Seton Hall    19.2364
Texas A&M    19.2074
California    19.1917
Stony Brook    19.1861
Georgia Tech    19.0646
Ohio    19.0342
New Mexico St.    18.9641
Florida St.    18.8859
S Dakota St.    18.8602
Arkansas    18.8197
Davidson    18.7817
Baylor    18.7774
Alabama    18.7748
Dayton    18.7484
Fla Gulf Cst    18.7107
Tulane    18.6753
Loyola (MD)    18.6450
Texas    18.6347
Murray St.    18.6279
Richmond    18.6116
Rob. Morris    18.5161
Providence    18.4669
AirForce    18.4451
Iona    18.4391
Illinois St.    18.3915
Vermont    18.3840
Oregon St.    18.3567
South Florida    18.3112
Indiana St.    18.3080
Washington    18.2090
Evansville    18.2070
Harvard    18.1508
Bryant    17.9622
Denver    17.8817
TX El Paso    17.8263
Xavier    17.7947
W. Kentucky    17.7828
Utah    17.7690
St. John's    17.7554
Canisius    17.6712
Wagner    17.6241
Fairfield    17.5919
Tulsa    17.5297
Montana    17.4721
Pacific    17.4308
Vanderbilt    17.3922
Arkansas St.    17.3845
Penn St.    17.3180
Northern Iowa    17.3111
Northwestern    17.2556
Long Island    17.2556
Detroit    17.2379
George Mason    17.2111
Loyola (IL)    17.0722
Elon    17.0680
St. Bonaventure    17.0655
Mercer    17.0336
Drake    17.0289
NW State    17.0187
Wake Forest    17.0182
Niagara    16.9581
Purdue    16.9563
Hartford    16.9487
Texas Tech    16.9233
Boston U    16.8685
Rider    16.8067
Clemson    16.7166
De Paul    16.6454
Princeton    16.5938
UAB    16.5054
UC Irvine    16.5046
Delaware    16.4777
Towson    16.4171
Georgia    16.3679
Lafayette    16.3253
West Virginia    16.2019
San Diego    16.1158
NC A&T    16.1027
Southern    16.0950
Toledo    16.0701
Hawaii    16.0292
Cal Poly    15.8982
Idaho    15.8592
Cleveland St.    15.7620
IPFW    15.7000
Savannah St.    15.6405
Fresno St.    15.6242
Pepperdine    15.6083
Norfolk St.    15.5815
Holy Cross    15.5070
Marshall    15.4374
Army    15.3794
Oral Roberts    15.3730
USC    15.3022
Sam Houston St.    15.2898
Yale    15.1663
Winthrop    15.1356
Brown    15.0842
Drexel    15.0668
TX San Antonio    15.0024
Oakland    14.9904
McNeese St.    14.9467
Quinnipiac    14.9358
North Texas    14.8990
Duquesne    14.8985
Troy    14.8513
Morgan St.    14.7504
Georgia St.    14.7192
LA Lafayette    14.7140
Lipscomb    14.7121
Long Beach St.    14.7059
Manhattan    14.6780
UC Davis    14.5437
Columbia    14.5091
St. Peter's    14.4304
High Point    14.3977
Auburn    14.3659
Marist    14.3493
Wofford    14.3461
San Jose St.    14.3070
Cornell    14.2636
Buffalo    14.2271
Rhode Island    14.1902
Liberty    14.0328
Portland    13.9293
Delaware St.    13.7218
Miami (OH)    13.6686
South Dakota    13.6241
Stetson    13.5838
Fordham    13.5698
N.C. Asheville    13.5688
UCSB    13.5529
Campbell    13.4454
Colgate    13.4360
North Dakota    13.4358
Monmouth    13.3985
Chattanooga    13.3883
Dartmouth    13.2551
Maine    13.1639
Seattle    13.0385
Montana St.    12.8383
Jacksonville    12.8043
Siena    12.7232
Hampton    12.7056
Navy    12.4556
Chicago St.    12.3891
SE Louisiana    12.2742
Jackson St.    12.1361
Austin Peay    12.0914
Rice    11.8819
E. Tenn. St.    11.8395
Old Dominion    11.7348
Nicholls St.    11.6002
IUPUI    11.5430
LA Monroe    11.2691
Samford    11.2131
Portland St.    11.1429
Howard    11.0323
Hofstra    11.0204
Alabama St.    10.9835
Longwood    10.7365
Furman    10.6795
Presbyterian    10.5587
New Orleans    10.4705
Lamar    10.2693
Florida A&M    10.0584
UC Riverside     9.9920
Kennesaw St.     9.7770
Binghamton     9.6115
Ste F Austin     9.1443
Weber State     9.1435
Col Charlestn     8.5979
N Dakota St.     8.4912
W Illinois     8.4275
UMass     8.2533
NC Central     8.1872
E Kentucky     8.1479
TX Southern     8.1355
W Michigan     7.9282
Kent State     7.8832
Ark Pine Bl     7.8599
Wright State     7.8351
LA Tech     7.8212
Gard-Webb     7.7709
Mt St.Mary's     7.7613
Jksnville St.     7.7428
Charl South     7.6965
E Carolina     7.6424
TX-Arlington     7.6413
Northeastrn     7.6071
Florida Intl     7.5889
TN State     7.5248
Central FL     7.4869
WI-GrnBay     7.4113
Boston Col     7.3077
SE Missouri     7.3018
St Josephs     7.2322
AR Lit Rock     7.2321
Ball State     7.2072
CS Bakersfld     7.0927
S Alabama     7.0577
San Fransco     7.0325
App State     7.0239
SC Upstate     7.0083
S Illinois     6.8708
VA Military     6.7742
TX-PanAm     6.7519
Fla Atlantic     6.7064
Central Ark     6.6817
Wash State     6.6623
IL-Chicago     6.6607
N Kentucky     6.6580
W Carolina     6.6478
Youngs St.     6.6009
E Michigan     6.5514
TN Tech     6.4862
Beth-Cook     6.4780
E Illinois     6.4418
N JIT     6.4057
Central Conn     6.3871
Prairie View     6.3793
Sac State     6.3764
Houston Bap     6.3659
S Methodist     6.3245
Wm & Mary     6.3052
S Carolina     6.2965
Cal St Nrdge     6.2900
Texas State     6.2669
St Fran (NY)     6.1695
Coastal Car     6.1570
Geo Wshgtn     6.1219
Loyola Mymt     6.0941
N Florida     6.0594
Missouri St.     6.0032
Neb Omaha     5.9981
GA Southern     5.9894
Miss State     5.9503
Utah Val St.     5.8729
Central Mich     5.8298
Bowling Grn     5.7696
CS Fullerton     5.6973
E Washingtn     5.6854
VA Tech     5.6753
Maryland BC     5.5945
TX Christian     5.4742
Alab A&M     5.4483
Coppin State     5.4384
U Penn     5.2858
TN Martin     5.2321
N Arizona     5.2213
N Hampshire     5.1888
NC-Grnsboro     5.1397
American     5.1245
Alcorn State     5.0936
Sacred Hrt     5.0929
UMKC     5.0428
NC-Wilmgton     5.0005
S Utah     4.9083
WI-Milwkee     4.8219
St Fran (PA)     4.7744
TX A&M-CC     4.7630
SIU Edward     4.7194
Idaho State     4.6572
Miss Val St.     4.6007
F Dickinson     4.5634
S Car State     4.4436
N Illinois     3.6664
Maryland ES     3.3431
Grambling St.     3.0220
```

