Everyone knows that correlation does not equal causation - you didn't even have to get a B in Statistics for Engineers like yours truly to understand and utilize this particular truth (to be fair, it was a two-credit class that was easy like Emerson sororities, and I'm pretty sure I actually skipped class during a test). I deal with correlative stats on a daily basis, though, and even I learned an important lesson while participating in perfectly legal NCAA pools this year: know the difference between "indicative" and "predictive" and how to apply correlations to sports wagering.
I found myself in an interesting quandary while filling out my brackets for this season's NCAA tournament - in every one, no matter how I did the analysis (and I did them in a variety of ways, from using basic stats to using advanced metrics to using who has the hottest cheerleaders), everything pointed toward one conclusion: the four #1 seeds in the Final Four.
This was a problem, because everyone knows that all four #1 seeds had never reached the same Final Four prior to this year. Because it is such common knowledge, it's become a mantra at this point (just like picking a 12-seed over a 5 in the first round) - you never take the four #1 seeds into the Final Four, because it's never happened before. I know this happens because, even though I am basically a rational, stats-minded, in-depth gambler, I intentionally changed my bracket picks to exclude a #1 seed in this tournament (generally Texas over Memphis, although in one bracket I tried to expose a perceived inefficiency by taking Louisville over UNC - puke).
I'm not being results-oriented in claiming this was Corky Thatcher-level retarded, although that may be what it seems at this point - there is simply no viable reason for excluding the possibility of all four #1 seeds in the Final Four. Let's go through the perception, and see the issues.
First, there is really no way to support the assertion that the NCAA Tournament Selection Committee shows any real inefficiency in selecting the field, or more specifically in choosing the top four teams to become the #1 seeds in any given tournament. At every point in the tournament, from the first round on, the higher seed shows a higher winning percentage over lower seeds - all the data shows that the selection committee, as a whole, gets it right.
But wouldn't that mean it would be incredibly unlikely to go this long without all four #1 seeds reaching the Final Four? To be blunt, no - it's not that unlikely, really. Given the modern 6-round structure of the tournament, a team has to win four straight games just to reach the Final Four. Even if a given #1 seed were a 3:1 favorite over every team it played (which seems like a fairly impossible situation), that team would only have a 31.6% chance (0.75 to the fourth power) to get there - or approximately a 1-in-3 chance - and the chance of all four #1 seeds pulling it off in the same year would be only around 1%. The variance is huge in a single-elimination structure.
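For anyone who wants to check the arithmetic, here is a minimal Python sketch. It assumes every game is independent and that the favorite has the same 75% chance in each one, which is a simplification for illustration and not a model of real matchups; the function name is hypothetical.

```python
def streak_probability(win_prob: float, games: int) -> float:
    """Chance of winning `games` straight games, assuming each game is
    independent and has the same win probability (a simplification)."""
    return win_prob ** games

# A 3:1 favorite wins 3 times out of every 4.
p = 3 / (3 + 1)

reach_final_four = streak_probability(p, 4)  # four wins to reach the Final Four
reach_title_game = streak_probability(p, 5)  # five wins to reach the final
win_it_all = streak_probability(p, 6)        # six wins to cut down the nets

# All four #1 seeds reaching the Final Four, treating regions as independent.
all_four_top_seeds = reach_final_four ** 4

print(f"Reach Final Four:     {reach_final_four:.1%}")
print(f"Reach title game:     {reach_title_game:.1%}")
print(f"Win the tournament:   {win_it_all:.1%}")
print(f"All four #1s advance: {all_four_top_seeds:.1%}")
```

Under these assumptions, four straight wins come out to about 31.6%, five to about 23.7%, and all four #1 seeds advancing together to roughly 1%. Even a wildly generous 75% edge per game leaves the all-chalk Final Four as a rare event, which is the point: rare is not the same as impossible.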
That's really what the problem becomes, then - the fact that the Final Four had never been made up solely of #1 seeds in the past should not be treated as predictive - rather, it is simply indicative of the high variance involved in the tournament itself. This means you should recognize that even the better team will often lose over the course of a given six-game, high-pressure stretch, and that the tournament only gives the best team over that stretch, not over the course of the season. This is not an abstract lesson, by the way - you can actually guide your selections using this information.
For example, Louisville was underrated by most predictions and most analysis systems because Padgett was hurt for approximately the first third of the season - Louisville's true talent level was closer to their stats over the last half of the season, which showed them to be closer to the level of Texas than that of Pitt or Xavier. Wisconsin was underrated by most - their pace numbers and stifling defense play a low-variance game, one that is a) well suited to tournament play and b) subject to being derailed by a hot-shooting team. Wisconsin's matchup against Davidson was thus terrible, while they should likely have been picked over Georgetown - that's the kind of brief analysis that can lead to much better tourney results.
At the end of the day, I shaped my Final Four picks around some flawed assumptions - namely, that I "had to" leave a #1 seed out, even when everything told me that Memphis was simply the best team in that bracket, and that they matched up well with both Texas and Stanford. Had I not, I would be in slightly better shape in my pools.
However, all is not lost - if Kansas wins, I win two pools outright (one a winner-take-all pool of the degenerates from the big opening-weekend bacchanalia) and finish either first or second in the last pool, with first place coming if the final is KU over UCLA. Why the reliance on Kansas? Well, according to some stats, KU was the best team in the nation and had the highest probability of winning each of its six games (thus, the highest EV) . . . according to others, this was not the best pick. UNC was going to benefit too much from its status as the #1 overall team in the nation, as they become a "trendy" pick among people scared of screwing up their pools and losing to people who "know more" (these fears are primarily unfounded, by the way), so I felt like there just wasn't the value in picking UNC that KU carried.
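The "value" idea above can be sketched in a few lines of Python. Everything here is hypothetical: the function name, the team labels, and especially the numbers, which are invented purely to show the shape of the calculation and are not real title odds or real pick percentages from any pool.

```python
def pick_value(title_prob: float, pick_share: float) -> float:
    """Ratio of a team's chance to win the title to the share of the pool
    picking it. Above 1.0 means the team wins more often than it is picked."""
    return title_prob / pick_share

# Invented numbers for illustration only (not real odds or pool data).
candidates = {
    "trendy favorite":      {"title_prob": 0.22, "pick_share": 0.40},
    "overlooked contender": {"title_prob": 0.20, "pick_share": 0.10},
}

for name, c in candidates.items():
    value = pick_value(c["title_prob"], c["pick_share"])
    print(f"{name}: value ratio {value:.2f}")
```

With these made-up inputs the overlooked contender carries far more value despite being slightly less likely to win, because a title splits the payoff among fewer entries. That is the rough logic behind fading a trendy champion in a champion-heavy scoring system.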
It turns out I was right - of the biggest/highest-payout pools I'm in, I'm the highest ranked player picking KU in all of them, and the champion-heavy scoring used by CBS and Yahoo! means that I'll win, even from 5th or 7th place, should KU pull it out. Now, this is definitely a flaw, but one I'm more than happy to exploit. However, I would have put a little more space between me and them had I stuck to both my gut and the stats, and stayed on with all four #1 seeds. Lesson learned.