It’s become a bit of an annual tradition for Andy Cox (@crashthedance) to answer questions about the NCAA tournament bracketing process. Cox runs Crashing the Dance, a website and artificial intelligence model designed to predict and seed the NCAA tournament field. In recent years, those questions have generally centered on Michigan’s chances of making the NCAA tournament as a bubble team. This year, the conversation is a bit different as we discuss the intricacies of seeding and bracketing and how they could affect whom Michigan will play next week, and where.
As a refresher, tell us about your site, your project and statistical model. How successful has it been in predicting the NCAA field?
Crashing the Dance grew out of a grad school project I did at Georgia Tech. We know about a lot of the profile attributes the committee uses, but we know very little about which attributes are more important than others. (That’s actually changing a bit this year with the release of the full 1-68 seed list during an hour-long discussion show following the selection show. I’m not too optimistic about learning many specifics though.)
The selection and seeding models on the site compare each team this year to a database of past teams (up to 12 years of data now) to estimate (1) how well its profile matches that of a typical at-large selection and (2) what seed is most likely.
Both the selection and seeding models have done well in recent years. They have outperformed the average web bracketologist (according to the Bracket Project) in each of the last three years, and have also outperformed some of the big names over the last three years combined. Part of that is my having a better understanding of what kinds of profiles give the models trouble, and part is each year providing more data to learn from. Some of it may well be luck too.
Michigan is a 3-seed in your most recent model. How confident are you in that prediction and what do you think would have to happen for the Wolverines to move up to No. 2 or down to No. 4?
The 2 seed line may be hard to crack, though with three other Big Ten teams ahead of them and the co-regular season title on their resume, a tournament championship that runs through Ohio State and Michigan State might do the trick in the committee’s eyes.
The tricky thing about hypothetical seed questions based on “if the season ended today” projections is that the answer depends on what the teams ahead of and behind them do. We have them as the last 3 seed right now, and there is little separation between the other 3s and some of the 4s. A loss to someone other than their two co-champs would probably knock them down, depending on how bad a loss it is, but if they make it to the weekend it’s hard to see too much downward movement.
Talk to me about geography. You have this nifty chart detailing distances from all of the sites to representative schools. Break down the basic pod geography rules and how they might affect Michigan. Any early projection of where Michigan ends up?
I’m working on a hypothetical bracketing of the top 4 seed lines based on my latest seed list, but the basic process at the top of the bracket goes like this:
- After the committee ranks all teams in the field from 1-68 (they call this each team’s “true seed”), they place each of the top 4 seed lines in a region in order of their true seed.
- Then, each team on the top 4 lines is assigned, again in order of their true seed, to one of the eight pod sites.
Obviously, the higher you are on the seed list, the more likely you’ll get a preferred pod site. There are no conference restrictions on pod sites as there are on region sites, and the two pods at each site don’t have to feed into the same regional. For example, Duke and North Carolina won’t be in the same region, but will most likely be together in the Greensboro pod as they were last year in Charlotte.
From the distance chart you mentioned, Michigan’s preferred pod sites (considering geography only) are roughly in the order: Columbus, Pittsburgh, Louisville, Nashville, Greensboro, Omaha, Albuquerque, and Portland. Michigan is 12th on my latest seed list, and when I got to them to assign their pod, the first three were already filled and they ended up in Nashville.
One helpful site restriction from Michigan’s perspective is that Ohio State cannot play in the Columbus pod because they are the host institution. If the Buckeyes end up ahead of Michigan on the seed list, that might result in a better pod for Michigan depending on the rest of the seed list ahead of them.
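As a rough illustration, the pod-assignment step described above can be sketched in Python. This is a minimal sketch under stated assumptions, not the committee's actual procedure or the Crashing the Dance code: the function name `assign_pods`, the two-protected-teams-per-site capacity (16 teams across eight sites), and the toy team and site names are all hypothetical. Each team, taken in true-seed order, simply gets the closest site that still has room and that it does not host.

```python
# Hypothetical sketch: greedy pod assignment in true-seed order.
# Assumption: each pod site can take two teams from the top 4 seed lines.
POD_CAPACITY = 2


def assign_pods(seed_list, preferences, host_of=None, capacity=POD_CAPACITY):
    """Assign each team to its closest still-available pod site.

    seed_list:   team names in true-seed order, best team first.
    preferences: dict mapping team -> list of sites, closest first.
    host_of:     dict mapping site -> host institution barred from that site.
    """
    host_of = host_of or {}
    # Track how many open slots each site has left.
    remaining = {}
    for sites in preferences.values():
        for site in sites:
            remaining.setdefault(site, capacity)

    assignment = {}
    for team in seed_list:
        for site in preferences[team]:
            if remaining[site] > 0 and host_of.get(site) != team:
                assignment[team] = site
                remaining[site] -= 1
                break
        # A team with no open site is left unassigned in this sketch.
    return assignment


# Toy example with made-up teams and sites (not real projections).
toy_prefs = {team: ["X", "Y", "Z"] for team in ["Host", "A", "B", "C"]}
toy_result = assign_pods(["Host", "A", "B", "C"], toy_prefs, host_of={"X": "Host"})
print(toy_result)
```

In the toy example, the top-ranked team hosts site X and is pushed to its second choice, which is the kind of knock-on effect that can open up a closer site for a team lower on the seed list.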
As a follow-up, could Michigan play in Columbus? Could a 3 seed be sent to Nashville to face a No. 6 seeded Vanderbilt? Where is the line drawn?
The three teams most likely to steal a Columbus spot from Michigan are Michigan State, Marquette, and Indiana. Passing the Spartans on the seed list is unlikely, so it could come down to finishing ahead of Marquette and Indiana.
Vanderbilt is not the host of the Nashville pod (the Ohio Valley Conference is doing the honors), nor did they play any games at the Bridgestone Arena, so there is no restriction on them staying in town. However, the committee will assign them to a region, and the pod will follow indirectly from that. For example, if Vanderbilt is the number 6 seed in the West, they will be in the same pod as the number 3 seed in the West, which would already be assigned as described above.
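The way a lower seed's pod follows from its region can be sketched in a few lines of Python. This is only an illustration: the helper names `protected_seed_for` and `pod_site_for` are hypothetical, and the site in the usage example is a made-up input, not a projection. The sketch relies on the standard bracket layout, in which each pod holds two first-round games (1 vs. 16 with 8 vs. 9, 2 vs. 15 with 7 vs. 10, 3 vs. 14 with 6 vs. 11, and 4 vs. 13 with 5 vs. 12).

```python
# Seeds that share a pod within a region; the first entry is the top-4 seed.
POD_GROUPS = [
    (1, 16, 8, 9),
    (2, 15, 7, 10),
    (3, 14, 6, 11),
    (4, 13, 5, 12),
]


def protected_seed_for(seed):
    """Return the top-4 seed whose pod the given seed (1-16) shares."""
    for group in POD_GROUPS:
        if seed in group:
            return group[0]
    raise ValueError(f"seed must be between 1 and 16, got {seed}")


def pod_site_for(region, seed, protected_sites):
    """Look up a team's pod site from the site already given to the
    top-4 seed in the same region.

    protected_sites: dict mapping (region, top-4 seed) -> pod site.
    """
    return protected_sites[(region, protected_seed_for(seed))]


# Hypothetical input: suppose the West's 3 seed was assigned to Nashville.
sites = {("West", 3): "Nashville"}
print(pod_site_for("West", 6, sites))
```

Under that hypothetical input, the 6 seed in the West lands in Nashville purely because of where the region's 3 seed was sent.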
The committee will not put teams on the top five seed lines – the so-called “protected seeds” – at a home-court disadvantage in the round of 64. After that, they’re on their own.
The Big Ten has five teams that could be on the top four seed lines. What sort of interesting twists does that add to the bracketing process?
The main rules that will come into play are (1) the first three teams from a conference, as ranked by the seed list, must be placed in different regions, (2) no more than one team from a conference may be seeded in the same grouping of four in lines 1-4, and (3) conference teams shall not meet prior to the regional final, unless a ninth team is selected from a conference.
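Rule (1) is simple enough to check mechanically. The Python sketch below is an illustration only: the function name `first_three_rule_violations`, the tuple layout, and the toy conferences and teams are assumptions made for the example, and it does not attempt rules (2) or (3).

```python
from collections import defaultdict


def first_three_rule_violations(bracket):
    """Find conferences whose first three teams share a region.

    bracket: list of (true_seed, team, conference, region) tuples.
    Returns a dict mapping conference -> its first three (team, region)
    pairs, for each conference that breaks the rule.
    """
    by_conference = defaultdict(list)
    # Sorting the tuples orders teams by true seed, best first.
    for true_seed, team, conference, region in sorted(bracket):
        by_conference[conference].append((team, region))

    violations = {}
    for conference, teams in by_conference.items():
        top_three = teams[:3]
        regions = [region for _, region in top_three]
        if len(set(regions)) < len(regions):
            violations[conference] = top_three
    return violations


# Toy bracket with made-up conferences and teams.
toy_bracket = [
    (1, "A1", "Conf A", "East"),
    (2, "B1", "Conf B", "West"),
    (5, "A2", "Conf A", "West"),
    (7, "B2", "Conf B", "South"),
    (9, "A3", "Conf A", "East"),
]
print(first_three_rule_violations(toy_bracket))
```

In the toy bracket, Conf A is flagged because its first and third teams are both in the East; moving one of them to another region would clear the violation.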
It’s relatively straightforward to put teams into regions based on geographic preference and the bracketing rules. Much of the trickiness comes when they need to move teams around to maintain competitive balance. Having five teams in the top four lines leaves fewer options for moving teams around, especially if most of those teams are on the 2 and 3 lines based on their true seeds. A 2 seed and a 3 seed in the same region would meet in the regional semifinals, which is normally prohibited for two teams from the same conference unless that conference has more than eight teams in the field. However, the committee can waive this if necessary “after exhausting all reasonable options.” They can also move teams up or down a seed line if necessary, even in the top four lines.