Friday, March 2, 2012

Deep Thoughts for Dorks: What are we Doing Here?

[ Disclaimer: This post is not meant to be a definitive solution to how to rank and compare teams, but rather to spark a discussion about how we evaluate offenses and defenses. ]

I recently finished reading "Sports for Dorks", the Mike Leach-compiled collection of essays, research, and analysis of issues surrounding college football. Two chapters were of particular interest to me: "The Missing Ingredient" -- an attempt to come up with a better computer model for the BCS -- and "The No-Huddle Offense" -- an analysis of teams that use the no-huddle offense and whether or not it's effective. I enjoyed both chapters, but I feel there was something at the heart of both chapters that made it difficult for me to agree with the analysis they put forth. They were both symptoms of a larger issue that I feel permeates most analysis of college football.

This post will take a quick look at these two chapters, discuss where I feel they went astray, and describe a thought experiment on how we should be going about identifying good teams.

The Missing Ingredient
By Michael Nemeth

Mr. Nemeth is the author of the 2009 book, "Cinderella's Slipper", a proposal to find the fairest college football team in the land. In this chapter he walks us through the history of the BCS and how we got to where we are today. He discusses the possibility of a four- or eight-team playoff, but notes that for even an eight-team playoff the BCS needs a more credible way to rank teams. The current system, he notes, is effectively GIGO: garbage in, garbage out. The limited set of statistics the BCS makes available to the computer models -- per-game dates, locations, and scores, per-team records, and opponent records -- is very restrictive and not necessarily the ideal set of numbers a serious stats geek would choose.

So why this set of data and no more? Because over the years the BCS has repeatedly handcuffed the computers in an attempt to make the computer rankings "fit" with how the humans would rank the teams. In some cases the computers do produce counter-intuitive results; case in point, the 2000 season in which the BCS chose Florida State for the title game in place of Miami, despite the Hurricanes having defeated the Seminoles earlier in the year. "Illogical! Unfair!" the humans screamed. Never mind that the college football universe has two simple yet conflicting mantras each year: "full body of work" and "settle it on the field." In this case the Seminoles had a superior body of work compared to Miami, despite the fact that this matchup had been "settled on the field" earlier in the year. It was counter-intuitive to the humans, so the computer must be neutered, er, "fixed."

What, then, is the solution?

Mr. Nemeth explored five supposedly-key statistics:
  • time of possession;
  • number of first downs;
  • penalty yardage;
  • total yards; and
  • number of turnovers.
He then (correctly) noted that in many cases it is possible to win a game without being dominant in these categories, or lose a game despite coming out ahead. Yet there was something amiss here (more on that in a minute).

Mr. Nemeth then starts to formulate his proposal for determining the "right" statistics to use when identifying good teams. He looks at the last four years' worth of data and identifies a total of 96 "good" teams. He then sets about attempting to distinguish what these teams did right as compared to their opponents. On its face, this seems reasonable: examine teams that had a proven track record of winning, and then attempt to identify what they did right versus teams that got inferior results. He ultimately concludes that metrics used to distinguish good teams from bad include (among others):
  • long field proficiency;
  • short field proficiency;
  • big plays;
  • turnovers;
  • major penalties;
  • number of 3-and-out possessions; and
  • special teams big plays.
I assert, however, that this is fundamentally the wrong approach for two reasons.

First, it examines good teams and then attempts to work backwards to identify relevant statistics: "These teams did X, Y, and Z, therefore all teams must do X, Y, and Z in order to succeed." Second, it ignores the pace of the game. For example, Mr. Nemeth explicitly cites Texas A&M as the team with the highest number of "self-inflicted failures" on long field possessions; this overlooks the fact that the Aggies are consistently one of the fastest teams in FBS and therefore have more opportunities to fail (or succeed). In the 2011 season, A&M averaged 183.3 opponent-adjusted plays per game (OAPPG); by comparison, Alabama averaged 149.2 OAPPG, a mere 81.4% of the plays in an A&M game. Therefore if Alabama had even 82% as many 3-and-outs as A&M, you could argue that the Crimson Tide was inferior to A&M by that metric.
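To make the pace argument concrete, here is a minimal Python sketch built on the two plays-per-game figures above. The 3-and-out counts are hypothetical placeholders chosen only to illustrate the comparison (they are not real data), and `rate_per_100_plays` is a helper name invented for this example.

```python
# Plays-per-game figures quoted above (2011 season, opponent-adjusted).
TAMU_PLAYS_PER_GAME = 183.3
ALABAMA_PLAYS_PER_GAME = 149.2


def rate_per_100_plays(events_per_game: float, plays_per_game: float) -> float:
    """Convert a raw per-game count into a rate per 100 plays."""
    return 100.0 * events_per_game / plays_per_game


# An Alabama game contains this fraction of the plays in an A&M game.
pace_ratio = ALABAMA_PLAYS_PER_GAME / TAMU_PLAYS_PER_GAME
print(f"Alabama pace relative to A&M: {pace_ratio:.1%}")

# HYPOTHETICAL counts, for illustration only: suppose A&M averages
# 5.0 three-and-outs per game and Alabama has 82% as many.
tamu_three_and_outs = 5.0
alabama_three_and_outs = 0.82 * tamu_three_and_outs

tamu_rate = rate_per_100_plays(tamu_three_and_outs, TAMU_PLAYS_PER_GAME)
alabama_rate = rate_per_100_plays(alabama_three_and_outs, ALABAMA_PLAYS_PER_GAME)

print(f"A&M:     {tamu_rate:.2f} three-and-outs per 100 plays")
print(f"Alabama: {alabama_rate:.2f} three-and-outs per 100 plays")
```

With these made-up counts Alabama has fewer 3-and-outs per game but a slightly higher rate per 100 plays, which is the whole point: the raw count rewards the slower team.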

Let's now move to the second chapter of interest.

The No-Huddle Offense
By Coach G. Mark McElroy

This chapter explores the proliferation of the no-huddle offense in FBS over the last several years. The pro-no-huddle arguments are that a no-huddle team can wear out the opposing defense and not allow for substitutions, thereby creating the opportunity for confusion and mismatches that favor the prepared no-huddle offense. Coach McElroy points to the Oregon-Auburn title game in 2010 as an example of the results gained by this approach.

The analysis done by Coach McElroy notes the staggering numbers put up by no-huddle offenses. Teams routinely score well over 50 points in a game, and can average close to 500 yards of offense per game through the season. He also notes that this can have a negative effect on the defense, though, as a no-huddle offense gives its teammates on the defensive side of the ball less time to rest and gives the opposing offense more bites at the apple.

Readers of this blog, though, will know that we (politely) disagree with the standard characterization of "more points/more yards equals a superior offense." Measuring an offense (or defense) requires one to examine the pace-and-opponent-adjusted efficiency. For example, Coach McElroy identifies these teams as the top ten offenses in FBS in 2010 as measured by yards per game:

Rank  Team            YPG
1     Oregon          530.7
2     Boise State     521.3
3     Oklahoma State  520.2
4     Nevada          519.1
5     Tulsa           505.6
6     Hawaii          500.6
7     Auburn          499.2
8     Michigan        488.7
9     Arkansas        482.5
10    Oklahoma        482.1

Yet what happens if we look at the 2010 season from a tempo-and-opponent-adjusted perspective? What if we look at the teams as ranked by adjusted offense (AdjO), measured in points per hundred plays (PPH), and include the number of plays per game (PPG)?

Team            AdjO (PPH)  Offense Rank  PPG    Pace Rank
Stanford        33.0        1             159.2  95
Wisconsin       32.6        2             155.2  116
Boise State     31.8        3             162.8  73
Auburn          31.6        4             166.5  47
Alabama         30.8        5             154.5  118
TCU             30.3        6             159.1  96
Virginia Tech   30.3        7             155.3  115
Arkansas        29.9        8             168.1  36
Nevada          29.2        9             165.7  54
Oregon          28.6        10            181.8  3
Oklahoma State  26.4        18            179.6  5
Oklahoma        25.3        21            183.3  1
Hawaii          23.8        27            163.3  65
Tulsa           23.4        29            175.7  11
Michigan        21.5        45            175.2  14

Note that of the teams in the top ten for raw yardage, only five are in the top ten for pace-and-opponent-adjusted scoring. In other words, Oklahoma State, Oklahoma, Tulsa, and (especially) Michigan could be described as "going nowhere fast." Hawaii totaled over 500 yards of offense per game, yet couldn't crack the top 25 in our adjusted statistics. They didn't play fast, which means they instead piled on the yards against inferior competition.

So who is at the top of our adjusted offense, cracking the 30 PPH barrier? Seven teams, five of whom are in the bottom quarter of the league for pace. In other words, these teams are deliberate, slow, and ruthlessly efficient.
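The counts in the last two paragraphs can be checked directly against the two tables. The sketch below is a rough cross-check rather than part of our ranking model; it assumes 120 FBS teams in 2010 (a figure not stated in this post) to define the "bottom quarter" for pace.

```python
# Top ten by raw yards per game, as listed by Coach McElroy.
top_ten_ypg = [
    "Oregon", "Boise State", "Oklahoma State", "Nevada", "Tulsa",
    "Hawaii", "Auburn", "Michigan", "Arkansas", "Oklahoma",
]

# team -> (AdjO in points per 100 plays, offense rank, plays per game, pace rank)
adjusted = {
    "Stanford": (33.0, 1, 159.2, 95),
    "Wisconsin": (32.6, 2, 155.2, 116),
    "Boise State": (31.8, 3, 162.8, 73),
    "Auburn": (31.6, 4, 166.5, 47),
    "Alabama": (30.8, 5, 154.5, 118),
    "TCU": (30.3, 6, 159.1, 96),
    "Virginia Tech": (30.3, 7, 155.3, 115),
    "Arkansas": (29.9, 8, 168.1, 36),
    "Nevada": (29.2, 9, 165.7, 54),
    "Oregon": (28.6, 10, 181.8, 3),
    "Oklahoma State": (26.4, 18, 179.6, 5),
    "Oklahoma": (25.3, 21, 183.3, 1),
    "Hawaii": (23.8, 27, 163.3, 65),
    "Tulsa": (23.4, 29, 175.7, 11),
    "Michigan": (21.5, 45, 175.2, 14),
}

NUM_FBS_TEAMS = 120  # ASSUMPTION: FBS size in 2010; not stated in the post.

# Raw-yardage leaders that also rank in the adjusted top ten.
in_both = [team for team in top_ten_ypg if adjusted[team][1] <= 10]

# Offenses at or above 30 points per 100 plays.
over_30 = [team for team, stats in adjusted.items() if stats[0] >= 30.0]

# "Bottom quarter for pace" means a pace rank worse than 75% of the league.
slow_cutoff = 0.75 * NUM_FBS_TEAMS
slow_and_efficient = [team for team in over_30 if adjusted[team][3] > slow_cutoff]

print(len(in_both), "of the YPG top ten are in the adjusted top ten:", in_both)
print(len(over_30), "teams at 30+ PPH;", len(slow_and_efficient), "of them are slow:",
      slow_and_efficient)
```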

What's going on here?

The Thought Experiment

As I mentioned earlier, too often we attempt to analyze backwards by identifying the characteristics of a good team, and assuming that a team that is superior in those statistical categories is in fact superior overall. But as Mr. Nemeth pointed out there are many cases where this simply isn't true. So how do we do this "the right way"?

I propose a thought experiment in which we envision a hypothetical "most dominant team ever created" to pit against a merely mortal opponent. How would such a game play out, and how would that be reflected in the statistics? How could our hapless opponent counter or delay the inevitable, and how would those counter-measures be reflected in the statistics? From there we can identify the statistics that are truly indicative of a dominant team and compare how teams from the past few years would measure up.

The Game Begins

Hapless U is kicking off to the Fightin' Deities. HU boots a long, arcing kick that is caught 9 yards deep in the end zone. Unfortunately this is utterly irrelevant to our invincible squad, as it simply runs the ball 109 yards for a touchdown. A two-point conversion later, and the FDs are up 8-0.

It's Hapless U's turn to receive. The ball arcs its way across the field and is caught at their own 1. The Deities charge downfield, dislodge the ball, and pick up the turnover for another touchdown. Yet another two-point conversion later and the score is 16-0.

On the next kickoff, HU is a bit smarter about its special teams play: the returner simply runs the ball until the FDs are within a few yards of him, then takes a knee. HU is (let's say) at its own 13 yard line. Unfortunately for them, the FDs are still stronger and better coached than HU's offense, and the ball is punched out of their running back's hands and returned for a touchdown. Seconds later it is 24-0.

Employing their "stop, drop, and roll" return method, HU once again makes it to their own 13. This time, using special Stick-em gloves, the HU backs have ensured that there will be no more turnovers. But the HU offense is still no match for the FDs, and a quick 3-and-out later means it's time to punt.

We'll assume for the moment that HU actually gets the punt off, but again that's irrelevant as FD once again takes the return back for a touchdown. 32-0.

Another kickoff. Another 1st-and-10 from their own 13. Another punt. But something strange happens this time: the FD returner's shoelaces are untied and he trips on the return. Deity ball at the HU 21. Another play, another touchdown, right? No, another shoelace issue. 1st-and-10 from the HU 7. On this next attempt, though, there are no mistakes and it's another TD. 40-0.


Let's pause the action and make a few key observations:
  • The speed at which the Deities and HU each play has been irrelevant. The Deities could be rushing back to the line of scrimmage or they could be sauntering. It doesn't matter because ....
  • On most of their plays, the Deities are scoring.
  • On offensive plays where the Deities don't score, they get a first down.
  • On defensive plays where the Deities don't score, they don't allow a first down.
Based on these observations, what statistics are meaningful? Primarily we care about the:
  • percentage of plays in which a team scores;
  • percentage of non-scoring plays in which a team gets a first down; and
  • percentage of non-scoring plays in which a team allows a first down.
Our current game stats are:

Team              Scoring %  1st Down %
Fightin' Deities  29.4       100.0
Hapless U         0.0        0.0

The Deities have scored touchdowns on five of the 17 plays so far. As Hapless U adapted to FD's dominance, the FD scoring percent went from 100% (2-2) to 75% (3-4) to 44% (4-9) and finally 29% (5-17). Once again, note that pace doesn't enter in this. That's because no-huddle is a technique, not a sign of a superior team in and of itself.
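The running percentages above are just scores divided by plays at each checkpoint. As a small sketch, using the (scores, plays) tallies exactly as stated in the paragraph above and a helper name (`scoring_pct`) made up for this example:

```python
# (scoring plays, total plays) for the Deities at each checkpoint,
# copied from the tallies quoted in the text.
tallies = [(2, 2), (3, 4), (4, 9), (5, 17)]


def scoring_pct(scores: int, plays: int) -> float:
    """Percentage of plays on which the team scored."""
    return 100.0 * scores / plays


running = [round(scoring_pct(scores, plays), 1) for scores, plays in tallies]

for (scores, plays), pct in zip(tallies, running):
    print(f"{scores} scores on {plays} plays -> {pct}%")
```

Nothing in the calculation refers to the clock, which is why pace never enters into it.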

Also note that many of the stats mentioned by Mr. Nemeth as signals denoting a good team can be simplified through this lens, and the anomalies he noted in his analysis make more sense. The ability to sustain long drives is a side-effect of a high first down percentage, as is a low number of 3-and-outs. Similarly, big plays on special teams or offense lead to scores or first downs, increasing the ratio of scores or first downs to plays in the game.

Back to the Game

Let's say that Hapless U has started to gain some serious traction. They've figured out a weak spot in the Deities' defense that allows HU to gain a steady 3.5 yards per play each and every time. HU takes the kickoff and starts on their own 16 yard line. With 84 yards to go and 3.5 yards per play, HU picks up 10.5 yards every 3 plays (and the associated first down). HU employs a hurry-up offense that only uses 18 seconds of clock each play; 24 plays and 7:12 later they're in the end zone. And since the two-point conversion is only from 3 yards out, HU is finally on the board with 8 points.

HU kicks off and is finally able to kick the ball out the back of the end zone, forcing a touchback. The Deities, though, can get 20 yards per play. At their leisurely 35 seconds per play, four plays and 2:20 later they're in the end zone. Another two-point conversion, another 8 points.
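The arithmetic behind both drives is simple enough to sketch in a few lines of Python. The `drive` helper is hypothetical and assumes every play gains exactly the same yardage and burns exactly the same amount of clock, as in the scenario above.

```python
import math


def drive(yards_to_go: int, yards_per_play: float, seconds_per_play: int):
    """Return (plays needed, elapsed clock as M:SS) for a constant-gain drive."""
    plays = math.ceil(yards_to_go / yards_per_play)
    seconds = plays * seconds_per_play
    return plays, f"{seconds // 60}:{seconds % 60:02d}"


# Hapless U: 84 yards to go, 3.5 yards per play, 18 seconds per play.
hapless_u = drive(84, 3.5, 18)

# Deities: 80 yards after the touchback, 20 yards per play, 35 seconds per play.
deities = drive(80, 20, 35)

print("Hapless U:", hapless_u)
print("Deities:  ", deities)
```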

Drive Comparison

What would a traditional box score look like for these two drives?

Stat           Hapless U  Deities
Points         8          8
Yards          84         80
Plays          24         4
1st Downs      7          3
Time of Poss.  7:12       2:20

By this conventional box score, both teams have the same score, but HU has more yards, more time of possession and more first downs. That means they're better, right?

Of course not. We know better. Let's look at this through the lens of our thought experiment:

Stat                    Hapless U  Deities
Points/100              33.3       200.0
% Scoring Plays         4.2        25.0
% non-scoring 1st down  30.4       100.0
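For anyone who wants to reproduce these numbers, the following is a minimal sketch of the three calculations applied to the two drives. The `Drive` class and its method names are invented for this example and are not taken from our actual ranking code; each touchdown drive is assumed to contain exactly one scoring play.

```python
from dataclasses import dataclass


@dataclass
class Drive:
    points: int
    plays: int
    scoring_plays: int
    first_downs: int  # first downs earned on non-scoring plays

    def points_per_100(self) -> float:
        """Points scored per 100 plays."""
        return 100.0 * self.points / self.plays

    def scoring_play_pct(self) -> float:
        """Percentage of plays that produced a score."""
        return 100.0 * self.scoring_plays / self.plays

    def first_down_pct(self) -> float:
        """Percentage of non-scoring plays that produced a first down."""
        non_scoring = self.plays - self.scoring_plays
        return 100.0 * self.first_downs / non_scoring if non_scoring else 0.0


hapless_u = Drive(points=8, plays=24, scoring_plays=1, first_downs=7)
deities = Drive(points=8, plays=4, scoring_plays=1, first_downs=3)

for name, d in (("Hapless U", hapless_u), ("Deities", deities)):
    print(f"{name}: {d.points_per_100():.1f} pts/100, "
          f"{d.scoring_play_pct():.1f}% scoring plays, "
          f"{d.first_down_pct():.1f}% non-scoring 1st downs")
```

Time of possession and total yards never appear as inputs; only plays, scores, and first downs do.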

On the topic of no-huddle offense and its benefits, I would love to see a study that actually puts the various claims to the test. Do no-huddle offenses become more efficient as a drive progresses (due to exhaustion and mismatches)? Do no-huddle offenses become more efficient as the game progresses? Are no-huddle offenses more likely to exploit mismatches for big plays? Unfortunately we at the Tempo-Free Gridiron lack detailed play-by-play data in an easily digestible form, although we are working on being able to parse a potential data source. Putting a no-huddle offense to the true test and evaluating measurable claims would be of great interest to many people.

As I mentioned in the disclaimer, this post is not meant to be a definitive solution to how to rank and compare teams, but rather to spark a discussion about how we fundamentally evaluate offenses and defenses. I'm almost positive that there are teams or analysts out there using this exact methodology, but given how infrequently we read/see/hear the terms "efficiency" or "first down percentage" there's still value in having this discussion.

In fact one of the best possible responses I could get to this post is "you're just reinventing the methodology put forth by John Doe in his landmark 1973 paper 'Everyone Is Doing It Wrong, You Morons' and a simple Google search would have saved you a lot of typing." Until then, we welcome your comments and feedback.
