Monday, August 23, 2010

RBA undergoes significant modifications

RBA entered the TFG rotation near the middle of last season and lacked polish.  Justin's TFG algorithm has been going strong for about three years, so it's no real surprise that it kicked my tail near the end of the season.  However, RBA is coming back with a series of improvements for 2010.

  1. Cross-season historical data:  Last season, RBA did not consider history from previous seasons.  In 2010, RBA will maintain history across seasons to better identify historically strong and historically weak programs.
  2. Weighted least squares interpolation:  Last season, RBA used an ordinary least squares algorithm to estimate offensive and defensive efficiency as a function of team strength.  With the incorporation of historical data, RBA now uses a weighted least squares algorithm.  Each game's weight is computed as w^k, where 0 < w < 1 and k is the number of weeks since the game took place.  Because the weight shrinks geometrically with age, RBA naturally discounts older games in favor of newer ones.
  3. Dropped penalties and turnovers:  Last year's algorithm included penalties and turnovers as a deliberate departure from Justin's TFG algorithm, because I wanted to explore different approaches to predicting football games.  Analysis indicated that these metrics didn't really add a lot to the accuracy.  In fact, across more than 3500 games, they improved RBA's predictions by a whopping three picks.  The leading hypothesis is that penalties and turnovers effectively "double count" efficiency.
  4. Drastically improved run-time:  RBA now runs six times faster than the 2009 version.  This doesn't change the output, but it allows for significantly faster iteration.  Indirectly, this improves RBA's accuracy because it lets me experiment much more quickly and tune the algorithm more easily.
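
For the curious, here is a rough sketch of the weighting idea from item 2.  It is not RBA's actual code: the function names, the straight-line model, the value w = 0.9, and the sample numbers are all assumptions made up for illustration.  It simply fits a line through (team strength, efficiency) points where each game counts for w^k.

```python
def decay_weights(weeks_ago, w=0.9):
    """Return w**k for each game played k weeks ago (hypothetical helper)."""
    if not 0.0 < w < 1.0:
        raise ValueError("w must satisfy 0 < w < 1")
    return [w ** k for k in weeks_ago]


def weighted_linear_fit(x, y, weights):
    """Fit y ~ intercept + slope * x by minimizing
    sum(weights[i] * (y[i] - intercept - slope * x[i]) ** 2).

    A straight line is an assumption for this sketch; the post does not
    say what functional form RBA fits.
    """
    total = sum(weights)
    x_mean = sum(wi * xi for wi, xi in zip(weights, x)) / total
    y_mean = sum(wi * yi for wi, yi in zip(weights, y)) / total
    sxx = sum(wi * (xi - x_mean) ** 2 for wi, xi in zip(weights, x))
    sxy = sum(wi * (xi - x_mean) * (yi - y_mean)
              for wi, xi, yi in zip(weights, x, y))
    if sxx == 0:
        raise ValueError("need at least two distinct x values")
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return intercept, slope


# Made-up sample: opponent strength, points per play, and age of each game.
opponent_strength = [0.2, 0.5, 0.7, 0.9]
efficiency = [0.45, 0.38, 0.33, 0.25]
weeks_ago = [60, 55, 2, 1]  # two games from last season, two recent ones

weights = decay_weights(weeks_ago, w=0.9)
intercept, slope = weighted_linear_fit(opponent_strength, efficiency, weights)
print(intercept, slope)
```

With w = 0.9, a game from 60 weeks ago still counts, just far less than last week's game, which is the behavior item 2 describes.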
All these changes add up to an improvement of nearly two percentage points.  Measured over games since 2003, RBA's accuracy is now up from 70.2% to 71.9%.  It's not quite Justin's 74%, but it's getting there.