A statistical model to more accurately predict March Madness

As an economics undergraduate, I learned a decent amount about statistics. In my econometrics class, my most rigorous course in statistics, I had to do a research project. Unsurprisingly, I chose a sports topic. I looked at data for every team that made the NCAA men’s basketball tournament over a period of four years to see if any factors helped predict a team’s success in the tournament. Most of my findings weren’t earthshaking, but I did find a few interesting nuggets.

-The strongest finding was that the number of a team’s regular season wins matters, so don’t go picking that 22-10 team to run the table. There’s a reason they lost 10 times in the regular season.

-A team’s conference matters; all else being equal, teams from the “power conferences” do better. The Butlers, George Masons, and VCUs are going to make a run to the Final Four once in a while, but don’t you dare try to predict it. This research was based on the years 2006-2009, so I’m curious how the shakeup of conferences has affected this result. Additionally, the general consensus is that the “mid-major” conferences have gotten a lot stronger in the past couple of years. But if you want a winning bracket, you should still lean toward taking the team from the bigger conference.

-A team’s points, rebounds, and assists per game are mostly irrelevant. But a team’s turnovers per game can be a decent predictor of success. Teams that limit turnovers and make the most of each possession are more likely to do well in the tournament. That turnovers per game predicted success better than points, rebounds, or assists was the most interesting thing I learned from the study.

-The correlation is weak, but a team’s point distribution matters. A team with many decent scorers tends to fare better than a team that relies on one scorer. Teams with forwards as their leading scorers tend to do better than teams with guards as their leading scorers.
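
For anyone who wants to play with these ideas, the findings above can be turned into a rough scoring rule. The Python sketch below is only an illustration and not the regression from the study: the weights, the `Team` fields, and the two sample teams are all made up, chosen just to reflect the ordering of the findings (wins matter most, then conference, then turnovers). The weaker scoring-distribution findings are left out to keep it short.

```python
from dataclasses import dataclass

# Hypothetical weights. They only mirror the ordering of the findings above
# and are NOT coefficients estimated in the original study.
WIN_WEIGHT = 1.0        # credit per regular season win
POWER_CONF_BONUS = 2.0  # flat bonus for a "power conference" team
TURNOVER_PENALTY = 0.5  # cost per turnover per game


@dataclass
class Team:
    name: str
    wins: int                  # regular season wins
    power_conference: bool     # True if the team plays in a power conference
    turnovers_per_game: float


def strength(team: Team) -> float:
    """Combine the three factors into one illustrative score."""
    score = WIN_WEIGHT * team.wins
    if team.power_conference:
        score += POWER_CONF_BONUS
    score -= TURNOVER_PENALTY * team.turnovers_per_game
    return score


def pick_winner(a: Team, b: Team) -> Team:
    """Pick the team with the higher score; ties go to the first team."""
    return a if strength(a) >= strength(b) else b


# Made-up teams, for demonstration only.
team_a = Team("Team A", wins=28, power_conference=True, turnovers_per_game=11.0)
team_b = Team("Team B", wins=22, power_conference=False, turnovers_per_game=14.0)
print(pick_winner(team_a, team_b).name)
```

Filling out a bracket with a rule like this just means calling `pick_winner` on each matchup and advancing the result. A real model would estimate the weights from past tournament data instead of guessing them.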

For those of you who are curious, I used this model to make a bracket for the 2010 tournament, and the bracket scored in the 70th percentile. In other words, using these principles should give you a bracket that’s a little better than average.