College basketball is back, and it feels like every single season there’s a new set of innovations in the public predictive modeling space. Just in the past few weeks, I’ve seen Evan Miyakawa roll out a new set of individualized player metrics (and presumably an associated update to BPR) and Bart Torvik extend his site to include NCAA women’s basketball ratings. My own NCAAW ratings site should begin to populate soon as well, once I clean up some code odds & ends. This post is a pre-emptive complaint thrown into the wind in November with hopes it becomes relevant come March.
That complaint is about the inevitable whining we’ll see once the NCAA Tournament field is selected and some team ranked 32nd in KenPom is left out of the field in favor of an at-large team ranked 58th (rankings approximate). That whining is misplaced. The NCAA Tournament field should descriptively reflect past results rather than predictively select for the best teams in the sport. Thankfully, the best teams tend to shine through whether you’re using descriptive or predictive rankings to choose your tournament field. However, on the margins there will always be circumstances where Team A has a better resume than Team B, but Team B would be favored on a neutral court.
How should we predict the outcome of a game between Team A and Team B?
1. Just use the Vegas spread. This incorporates the most possible information, blending what is certainly a performant proprietary model with outside information – injuries, perhaps – which other predictive models don’t see.
2. Use predictive forecasts from KenPom, BartTorvik, EvanMiya, etc. These models are all reasonably good at projecting college basketball results.
3. Look at a “semi-descriptive” metric like WAB or NET. Both of these metrics use only information about past results, incorporating no prior information, though they do so in an efficiency-rating fashion which is a nod towards improving the model’s predictive power, hence the “semi-descriptive” moniker. The NET blends a “Team Value Index,” which sounds like it’s just strength of record, with a predictive efficiency-margin calculation; WAB compares a team’s observed results to the expected results of an imaginary replacement-level bubble team facing the same schedule (a rough sketch of that calculation follows this list).
4. Use a purely descriptive metric. Who’s got the better win percentage? I’m riding with them.
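To make the WAB idea from method 3 concrete, here’s a rough sketch: for every game on a team’s schedule, estimate the probability that a generic bubble-level team would have won that game, then credit the team with the difference between its actual result and that expectation. The bubble rating of 0, the 3.5-point home edge, the 11-point logistic scale, and the margin-to-win-probability conversion below are all illustrative assumptions, not the published WAB formula.

```python
import math

def bubble_win_prob(opp_rating, venue, bubble_rating=0.0, home_edge=3.5, scale=11.0):
    """Probability that a replacement-level bubble team beats this opponent at this venue.

    Ratings are on a hypothetical points-above-bubble scale; home_edge and scale
    are illustrative constants, not anyone's published parameters.
    """
    margin = bubble_rating - opp_rating
    if venue == "home":
        margin += home_edge
    elif venue == "away":
        margin -= home_edge
    # Convert an expected scoring margin into a win probability with a logistic curve
    return 1.0 / (1.0 + math.exp(-margin / scale))

def wins_above_bubble(schedule):
    """schedule: list of (opp_rating, venue, won) tuples for one team's season."""
    return sum(int(won) - bubble_win_prob(opp_rating, venue)
               for opp_rating, venue, won in schedule)

# A 2-0 team that beat two good opponents banks far more WAB
# than a 2-0 team that beat two cupcakes at home.
print(wins_above_bubble([(8.0, "home", True), (-5.0, "away", True)]))    # ~ +1.1
print(wins_above_bubble([(-15.0, "home", True), (-20.0, "home", True)])) # ~ +0.3
```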
Intuitively, I think most college basketball fans would prefer method 1 to 2, method 2 to 3, and method 3 to 4 by a pretty wide margin. But what happens if we flip the question on its head and want to select the team with the strongest resume this season?
Well, we clearly can’t roll with an unadjusted descriptive metric, right? Obviously, we’d need to adjust something like average scoring margin or team win percentage for strength of schedule. So we’ll move up the predictive ladder to the semi-descriptive metrics. These metrics implicitly use efficiency-margin calculations – which you can think of as point-differential-based projections adjusted for strength of schedule – to formulate values that describe a team’s season to date. WAB is the gold standard: it puts every team on the same scale, is far more transparent than the NET, avoids any dependence on the nonsensical quadrant system, and offers very intuitive values. If a team’s WAB is above 0, that’s a tournament team!
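For intuition on what an efficiency-margin calculation is doing, here is a minimal sketch of schedule-adjusted ratings: model each game’s margin as home rating minus away rating plus a home-court edge, then solve for all the ratings at once by least squares, so beating good teams moves your rating more than beating bad ones. Real systems work per-possession, account for pace, and regularize early-season estimates; the flat 3.5-point home edge and the plain least-squares fit here are simplifying assumptions.

```python
import numpy as np

def adjusted_ratings(games, teams, home_edge=3.5):
    """Schedule-adjusted ratings from game margins via least squares.

    games: list of (home_team, away_team, home_margin_of_victory) tuples.
    The flat home_edge and the unregularized fit are simplifications.
    """
    idx = {team: i for i, team in enumerate(teams)}
    X = np.zeros((len(games), len(teams)))
    y = np.zeros(len(games))
    for g, (home, away, margin) in enumerate(games):
        X[g, idx[home]] = 1.0   # home team's rating adds to the expected margin
        X[g, idx[away]] = -1.0  # away team's rating subtracts from it
        y[g] = margin - home_edge
    # Soft constraint pinning the ratings to sum to zero (average team = 0)
    X = np.vstack([X, np.ones(len(teams))])
    y = np.append(y, 0.0)
    ratings, *_ = np.linalg.lstsq(X, y, rcond=None)
    return dict(zip(teams, ratings))

teams = ["A", "B", "C"]
games = [("A", "B", 10), ("B", "C", 2), ("C", "A", -5)]
print(adjusted_ratings(games, teams))
```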
If we wanted to get the best teams in the field, we’d keep moving up the spectrum. But we don’t want the best teams. We want the most deserving teams. And there’s actually an insidious consequence to trusting the explicitly predictive metrics over values such as WAB when choosing tournament seeding. Descriptive models incorporate no prior information on a team entering a season: after controlling for observed results, they don’t care about conference or coach or previous-season outcomes.1 Squeezing predictive juice out of those external factors is how predictive models beat the descriptive models in the first place! Especially early in the season, when our descriptive metrics don’t have much data to work with, the predictive models will make far better forecasts. But they’re doing that precisely by looking at data from previous seasons.
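Footnote 1 below frames this in Bayesian terms, and a toy normal-normal update shows the difference. A huge prior variance approximates the flat prior a descriptive metric effectively uses; a tight prior centered on last season’s rating approximates the informative prior a predictive model bakes in. The margins, variances, and last-season rating below are all made up for illustration.

```python
def posterior_rating(margins, prior_mean=0.0, prior_var=1e6, obs_var=120.0):
    """Normal-normal update of one team's rating from its schedule-adjusted game margins.

    prior_var=1e6 mimics a flat prior (descriptive); a small prior_var around last
    season's rating mimics an informative prior (predictive). Numbers are illustrative.
    """
    if not margins:
        return prior_mean
    sample_mean = sum(margins) / len(margins)
    prior_precision = 1.0 / prior_var
    data_precision = len(margins) / obs_var
    return (prior_precision * prior_mean + data_precision * sample_mean) / (
        prior_precision + data_precision
    )

# Three early-season games from a team that rated +12 last year but has played like +2 so far
early_games = [3.0, -1.0, 4.0]
print(posterior_rating(early_games))                                   # ~ +2.0: trusts only the results
print(posterior_rating(early_games, prior_mean=12.0, prior_var=20.0))  # ~ +8.7: pulled toward last season
```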
How might this affect our rankings? Naturally, incorporating this prior information will reward teams that have been good in previous seasons and teams that play in strong conferences. From a predictive modeling standpoint, this is absolutely the right choice. But descriptively? Greg Sankey would love nothing more than to seed based on predictive metrics which are “biased”2 in favor of teams that have been successful in previous seasons & teams that play in strong conferences.
So, want more mid-majors in the NCAA Tournament field? Ignore KenPom when you’re seeding, and put WAB at the top of the team sheet. But when you’re picking your bracket, switch right back over to that KenPom tab.
1. It might be useful to think, in a Bayesian sense, of models like WAB as starting the season with flat priors on each team’s rating, while predictive models like KenPom, Torvik, etc. include informative priors which then update based on observed games.
2. I use the word “bias” here in a purely statistical sense, and will reiterate that for the purposes of the predictive models – forecasting future games – this is the correct thing to do.