Bracketology is a relatively saturated internet market, but Andy Cox has carved out his own niche. Cox has created a computer system that uses statistical machine learning techniques to predict how the selection committee will select and seed teams for the tournament. The system has improved for five consecutive years, most recently picking 33 of 34 at-large teams correctly, and it provides a different and more educational perspective on tournament selection.
Because Andy studies the past behavior of the selection committee, we like to pick his brain come March and see what he thinks about the current bubble situation. The most recent Crashing the Dance projections (as of Tuesday morning) have Michigan listed as the third-to-last team in. Here’s what Andy had to say about Michigan, the bubble, and more. (Photo: Freep)
What are the basic strengths and weaknesses of Michigan’s resume?
Strengths:
- Good road wins, including Michigan State (likely in) and Clemson (bubble)
- Nine wins over RPI top 100 teams, and not the RPI-fooling kind of top 100 wins. Eight of them are against the top 67. That’s pretty good among bubble teams.
- Four good road wins against teams ranked 67 or better.
Weaknesses:
- No “signature” wins. Their best RPI win (Harvard – RPI #35) was at home over a team that may not make the field.
- Only one game over .500 against the RPI top 200.
- Most bubble teams have at least one loss to a team outside the top 100, but Michigan’s loss at Indiana (RPI #176) is lower than you’d like to see.
Then the real question: what does Michigan still have to do to make the Dance? Is a win over Illinois necessary in the 4-5 game of the Big Ten Tournament?
As usual, the answer is that it depends. Obviously, it depends on what other bubble teams do this week. The bottom of the bubble is so fluid that every little bit you can do to set you apart from the mire will help.
I wouldn’t say a loss would necessarily be the end. I don’t know that committee members believe in “good losses,” but just about every team on the bubble will lose this week. For some on the committee, it could come down to the quality of this loss, which for most will be on a neutral court.
Another thing to think about is that any committee members who consider Illinois and Michigan relatively even could use this game as a tiebreaker.
I understand that the committee doesn’t look at specific conferences, but how would you compare the resumes of the four Big Ten teams hovering around the bubble: Illinois, Michigan, Michigan State, and Penn State?
Illinois probably has the strongest non-conference profile of the four, though it’s much closer than I expected. Michigan went 5-1 against the other three, the best record in the four-way head-to-head. I think our selection model is slightly overvaluing Penn State. In my opinion, they have more work to do than the other three.
Photo: Michigan Daily
The common refrain among Michigan fans is “if Michigan State makes the tournament, Michigan has to make the tournament because they swept the Spartans”. My impression is that this isn’t quite true. What are your thoughts?
While every committee member has their own opinion about head-to-head results, my general sense is that they’re mostly used as tiebreakers when the teams under consideration are very similar. The fact is those were just two games in a 30-plus game profile.
That said, the profiles of Michigan and Michigan State are similar, so it would be difficult for the committee to ignore the head-to-head results.
You spend a lot of time studying the selection committee and what they’ve done in the past. Obviously this all goes into your model, and I’m not sure if it is proprietary and confidential information, but what would you say are some of the most important factors of an NCAA tournament resume?
One theme that comes out every year during the CBS post-selection interviews with the selection committee chair is to schedule as many good non-conference teams as possible and win as many of those games as possible. I have noticed some inconsistency from year to year about whether winning more games against a good-but-not-great schedule is more important than merely being competent against a great schedule. That is to be expected with subjective judgement by a constantly changing group of people, but it’s frustrating for everyone trying to forecast this stuff, and I’m sure it’s frustrating for the coaches who are left out.
As far as specific attributes, I don’t know that it can be narrowed down to just a few. When this started as a grad school project in 2005, I used this great article by Luke Winn to help select the attributes I included in the model. After that, the model does all the work crunching the numbers. Each year I review the teams the model handled poorly (and there always are some) and look for trends. I’ve tweaked the attributes I feed to the model and it seems to have helped. (See this page and its links for details if you’re interested.)
Can you think of examples of how the committee has handled teams with similar resumes to Michigan in the past?
One thing I’d like to add to the site in the future is a tool to search for most similar profiles in the 10-year archive I’ve built up. Until then, the best I can do is try to identify some meaningful criteria and look at the history. It can be arbitrary at times, but it’s better than nothing. For example, I looked at all teams that:
- Were ranked between 25th and 50th in the Sagarin ratings
- Had no RPI top 25 wins
- Had 3 or fewer RPI top 50 wins
There were 63 matches over the last 11 seasons, and 37 of these (59%) were at-large selections or would have been had they not won their conference. Matching teams last year ranged from California (#8 seed) to Virginia Tech (out). Is that the best comparison to use? Probably not, but if you narrow the criteria too much, you end up with only a handful of matching teams that don’t provide a meaningful comparison.
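To make that kind of lookup concrete, here is a minimal Python sketch of filtering an archive of team profiles by the three criteria above and computing the at-large rate. The `Profile` fields, the function names, and the five placeholder teams are all invented for illustration. This is not the Crashing the Dance code or its data.

```python
from dataclasses import dataclass


@dataclass
class Profile:
    # Hypothetical record layout; the real archive's fields are not described.
    team: str
    sagarin_rank: int
    rpi_top25_wins: int
    rpi_top50_wins: int
    at_large: bool  # selected at-large, or would have been without an automatic bid


def is_similar(p: Profile) -> bool:
    """Apply the three criteria listed above."""
    return (
        25 <= p.sagarin_rank <= 50
        and p.rpi_top25_wins == 0
        and p.rpi_top50_wins <= 3
    )


def at_large_rate(archive):
    """Return (number of matches, number selected, selection rate or None)."""
    matches = [p for p in archive if is_similar(p)]
    selected = sum(p.at_large for p in matches)
    rate = selected / len(matches) if matches else None
    return len(matches), selected, rate


# Placeholder teams with made-up numbers, only to exercise the filter.
archive = [
    Profile("Team A", 31, 0, 2, True),
    Profile("Team B", 44, 0, 3, False),
    Profile("Team C", 18, 1, 4, True),  # fails: Sagarin rank and top 25 wins
    Profile("Team D", 38, 0, 5, True),  # fails: too many top 50 wins
    Profile("Team E", 49, 0, 1, True),
]

print(at_large_rate(archive))

# The rate quoted above: 37 selections out of 63 matching teams.
print(f"{37 / 63:.0%}")  # 59%
```

Each added criterion shrinks the list of matches, which is the trade-off described above: tighter filters give closer comparisons but fewer of them.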
Michigan lacks the top 50 wins – just Michigan State (2x) and Harvard – but the Wolverines have beaten a number of teams on and around the bubble. How many “good” top 100-ish wins does a team need to make up for the lack of signature wins?
When I added having 7 or more RPI top 100 wins to the above criteria, the result was 24 total teams, 18 of which were at-large selections. The problem with binning win/loss totals into 1-25, 26-50, etc. is that it loses some of the subtlety that the committee certainly investigates, especially with the last few at-large teams. A win over RPI #55 goes into the same bin as a win over #95, but it’s obviously a better win.
That also explains some of the odd movement you’ll see in the model. This actually happened from Sunday to Monday with Michigan. Michigan State was just inside the top 50, so Michigan’s two wins over the Spartans counted as top 50 wins. On Monday, the Spartans slipped just a few spots outside the top 50, which cost Michigan two of its three top 50 wins. For bubble teams, that could make all the difference.
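The binning effect is easy to reproduce. The Python sketch below assumes bins of 1-25, 26-50, and 51-100 and uses invented RPI ranks for Michigan State (49 on Sunday, 53 on Monday), since the exact ranks are not given here. Harvard’s #35 comes from the resume above. The function names are hypothetical and this is not the model’s actual code.

```python
# Assumed bin edges, following the "1-25, 26-50, etc." description.
BINS = [(1, 25), (26, 50), (51, 100)]


def rpi_bin(rank: int) -> str:
    """Return the label of the bin an opponent's RPI rank falls into."""
    for low, high in BINS:
        if low <= rank <= high:
            return f"{low}-{high}"
    return "101+"


def count_top_n_wins(opponent_ranks, n: int) -> int:
    """Count wins over opponents ranked n or better."""
    return sum(1 for rank in opponent_ranks if rank <= n)


# A win over #55 and a win over #95 land in the same bin.
print(rpi_bin(55), rpi_bin(95))

# Michigan's best wins: Michigan State twice and Harvard.
# The Michigan State ranks are assumptions for illustration only.
sunday = [49, 49, 35]
monday = [53, 53, 35]

print(count_top_n_wins(sunday, 50))  # 3
print(count_top_n_wins(monday, 50))  # 1
```

Nothing about Michigan’s results changed between the two days in this sketch. The count moves only because one opponent crossed a bin boundary.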