Wednesday, May 20, 2015

Math Formulas for Predicting the Super-Regionals

For the first time, I'm going to apply some established sports-prediction formulas to the NCAA softball super-regionals, which begin Thursday.

The starting point is what's called each team's "Pythagorean" expected winning percentage for the season, based on its total runs scored and total runs allowed. Bill James originally developed the Pythagorean formula for Major League Baseball, and researchers at Fastpitch Analytics have modified it for college softball. As shown in the chart below, for example, Florida would have been expected to win roughly 85% of its regular-season games, based on how many runs the Gators scored and allowed.
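As a rough illustration, the Pythagorean expectation can be sketched in a few lines of Python. This is a minimal sketch, not Fastpitch Analytics' softball version: it uses the general form W% = RS^k / (RS^k + RA^k) with Bill James's original baseball exponent k = 2 as a stand-in, because the softball-tuned exponent isn't given here. The function name `pythagorean_win_pct` is hypothetical.

```python
def pythagorean_win_pct(runs_scored: float, runs_allowed: float, exponent: float = 2.0) -> float:
    """Expected win fraction from total runs scored (RS) and allowed (RA).

    Formula: W% = RS**k / (RS**k + RA**k).
    ASSUMPTION: k = 2 is Bill James's original baseball exponent; the
    softball-specific exponent used by Fastpitch Analytics is not known here.
    """
    if runs_scored < 0 or runs_allowed < 0 or runs_scored + runs_allowed == 0:
        raise ValueError("need non-negative run totals with a positive sum")
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)


# Auburn's regular-season totals quoted in this post: 460 scored, 199 allowed.
auburn = pythagorean_win_pct(460, 199)
print(f"Auburn expected win %: {auburn:.1%}")
```

Because the exponent is an assumption, the output won't necessarily match the softball-adjusted values in the chart; the point is the shape of the calculation, in which a team's run ratio, not its raw run totals, drives the result.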

The next step is to apply a formula known as "Log5," which gives the probability of one team beating another head-to-head, based on each team's Pythagorean value. The favored team and its Log5 win probability are shown in the last two columns of the following chart.
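The Log5 step can be sketched as follows. This uses Bill James's standard Log5 formula, P(A beats B) = (pA - pA*pB) / (pA + pB - 2*pA*pB), where pA and pB are each team's win fraction against an average opponent (here, the Pythagorean values). The function name and the input values in the usage lines are hypothetical, not taken from the chart.

```python
def log5(p_a: float, p_b: float) -> float:
    """Probability that team A beats team B head-to-head.

    p_a, p_b: each team's expected win fraction against an average
    opponent (e.g. its Pythagorean value).
    Log5: P = (pA - pA*pB) / (pA + pB - 2*pA*pB)
    """
    if not (0.0 <= p_a <= 1.0 and 0.0 <= p_b <= 1.0):
        raise ValueError("win fractions must lie in [0, 1]")
    denom = p_a + p_b - 2.0 * p_a * p_b
    if denom == 0.0:
        # Only happens when both teams are 0.0 or both are 1.0;
        # the formula is undefined there, so treat it as a coin flip.
        return 0.5
    return (p_a - p_a * p_b) / denom


# HYPOTHETICAL Pythagorean values, not the ones from the post's chart.
home, road = 0.85, 0.70
p_home = log5(home, road)
favorite = "home" if p_home >= 0.5 else "road"
print(f"Favorite: {favorite} team, win probability {max(p_home, 1 - p_home):.1%}")
```

Two properties make Log5 a sensible choice here: the two teams' probabilities always sum to 1, and a team facing an exactly average (.500) opponent wins with its own Pythagorean probability.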

Home Team | Road Team | Favorite | Favorite's Win Probability (Log5)
Florida (1) | Kentucky | Florida |
Oregon (2) | NC State | |
Michigan (3) | Georgia (14) | |
Auburn (4) | LA-Lafayette (13) | LA-Lafayette | 68%
LSU (5) | Arizona (12) | |
Alabama (6) | Oklahoma (11) | Oklahoma | 56%
UCLA (7) | Missouri (10) | |
Tennessee (8) | Florida State (9) | Florida State | about 50%

Not surprisingly, No. 1 seed Florida has the highest Log5 win probability of any favorite, against Kentucky, which emerged from the regional hosted by No. 16 seed Notre Dame.

More interesting is that the Pythagorean and Log5 formulas predict two upsets: No. 13 Louisiana-Lafayette over No. 4 Auburn (68% likelihood) and No. 11 Oklahoma over No. 6 Alabama (56% likelihood). And although No. 9 Florida State has a slightly higher Pythagorean value than No. 8 Tennessee, that series is essentially a 50/50 proposition.

What appears to be holding Auburn back in the Pythagorean calculation is its total runs allowed, not its total runs scored. The Tigers scored 460 runs in the regular season, second only to Oklahoma's 462 among ranked teams. However, Auburn allowed 199 runs, a mediocre total considering that many top teams gave up 115 or fewer.
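To see why runs allowed matter so much, here is a quick what-if in Python. It is a sketch under the same assumption as a plain baseball-style Pythagorean formula with exponent 2 (not the softball-tuned version), comparing Auburn's actual 460 scored / 199 allowed with a hypothetical season in which the Tigers allowed only 115, the level many top teams reached.

```python
def pythagorean_win_pct(runs_scored: float, runs_allowed: float, exponent: float = 2.0) -> float:
    # ASSUMPTION: exponent 2 is Bill James's baseball value, used here as a
    # stand-in for the softball-adjusted exponent, which is not given.
    rs, ra = runs_scored ** exponent, runs_allowed ** exponent
    return rs / (rs + ra)


actual = pythagorean_win_pct(460, 199)   # Auburn's real regular-season totals
what_if = pythagorean_win_pct(460, 115)  # hypothetical: same offense, stingier defense
print(f"460 scored / 199 allowed: {actual:.1%}")
print(f"460 scored / 115 allowed: {what_if:.1%}")
```

Since the formula depends only on the ratio of runs scored to runs allowed, cutting runs allowed raises the expected win percentage just as effectively as scoring more; a 4-to-1 run ratio (460 to 115) works out to 16/17, or about 94%.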

I don't know how useful these formulas will be for this year's super-regionals, as they don't differentiate the series that much. In five of the eight series, the calculations say the favored team has roughly a two-thirds probability of winning.

For readers interested in such "bread and butter" measures as hitting, pitching, and defense, Fastpitch Analytics evaluates the super-regional teams on additional statistical metrics.