And some websites just like to churn out Elo rating systems because they say "Elo" around the office the same way Kristian Nairn says "Hodor" at work.
This is where Elo comes in.
Elo is well suited for these situations for a handful of reasons. Elo rating systems
- only need to know who won and who lost;
- can provide win probabilities prior to a game; and
- work well in a (relatively) closed system in which there is a fixed number of teams or players.
Formula One
In this regard, Elo ratings are a good framework for approaching Formula One. The question is how to structure the system and choose certain parameters to handle the nature of Formula One. Elo rating systems have a few parameters to suss out, including
- the number of initial points each participant gets;
- the magnitude of points transferred from losers to winner (the K-factor); and
- the potential separation of teams or players into divisions.
The specific application of Elo to Formula One is this: every race is a round-robin tournament in which you compete with every other driver. Therefore, if you finish third in a field of 18 cars, you are said to have tallied 15 wins (the cars behind you) and two losses (the cars in front of you). Furthermore, because the data set does not indicate whether DNFs are due to crashes (presumably partially the driver's fault) or mechanical failures (presumably the car's fault), drivers who do not finish the race are treated as if they did not compete at all.
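As a minimal sketch of that round-robin bookkeeping (the function name and the driver labels are made up for illustration, and the input is assumed to already exclude drivers who did not finish):

```python
def round_robin_scores(finishing_order):
    """Map each classified finisher to (wins, losses) against the rest of the field.

    finishing_order lists drivers from first place to last place. Drivers who
    did not finish are assumed to have been removed beforehand.
    """
    n = len(finishing_order)
    return {
        driver: (n - 1 - position, position)  # cars behind, cars in front
        for position, driver in enumerate(finishing_order)
    }


field = [f"driver_{i}" for i in range(1, 19)]  # 18 hypothetical finishers
scores = round_robin_scores(field)
print(scores["driver_3"])  # third place: (15, 2)
```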
K-Factors
In the Elo system, the K-factor controls the magnitude of the swing between winners and losers. A large K-factor incorporates new information rapidly, while a small K-factor resists overreacting to new results. The World Chess Federation uses a range of factors depending on the experience of the players; new players use a factor of 40, while experienced players can use a factor as small as 10. The NBA Elo model at FiveThirtyEight uses a fixed K-factor of 20, which incorporates new results at a relatively high rate.
Ultimately we settled on a modified version of a K-factor scale formerly used by the U.S. Chess Federation:
K = AdjType * max((800 / Ne), 16)
In this formulation, Ne is the total number of races in which a driver has participated. The max sets an effective floor of 16 on the K-factor, which 800 / Ne reaches once a driver has 50 races. In practice this means that during the first two-and-a-half seasons of a driver's Formula One career they will have a relatively large K-factor.
On top of the experience adjustment, there is a multiplier based on the type of result: was this a qualifying session, or a full race? The race adjustment is simply 1.0, whereas the multiplier for qualifying is 0.1. Qualifying involves only a handful of laps, but has the advantage of reliably including every car in the field. This is especially important when we go back more than twenty years, when anywhere from 20-40% of the cars would fail to finish due to mechanical failures.
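The K-factor formula and the two result-type multipliers can be sketched roughly as follows. The constant and function names are invented for this example; only the numbers (800, 16, 1.0, 0.1) come from the text.

```python
RACE_ADJ = 1.0        # multiplier for a full race
QUALIFYING_ADJ = 0.1  # multiplier for a qualifying session


def k_factor(races_entered, adj_type=RACE_ADJ):
    """K = AdjType * max(800 / Ne, 16), where Ne is the driver's race count."""
    if races_entered < 1:
        raise ValueError("races_entered must be at least 1")
    return adj_type * max(800 / races_entered, 16)


print(k_factor(25))   # 32.0, still in the high-swing early career
print(k_factor(50))   # 16.0, the point where the floor takes over
print(k_factor(200))  # 16.0
```

A qualifying result would use `k_factor(n, QUALIFYING_ADJ)`, which scales the same value down to a tenth.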
Adjustments
Once we have a K-factor for each driver, we can adjust the ratings of each driver based on the results of a race or full qualifying session. For each pair of drivers A and B we can calculate the expected probability that A would defeat (or finish in front of) B. (The details of this model can be found on the Elo wiki page.) For each driver we can calculate these odds against every other driver and then sum them up. This gives us the expected score (Ea), while the result gives us the actual score (Sa). A driver's rating (Ra) is adjusted as follows:
R'a = Ra + K(Sa - Ea)
Let's say that a driver with a rating of 1200 races against drivers with ratings of 1000, 800, and 600. The sum of the individual win probabilities would be 2.638. If that driver has a K-factor of 16 and finishes first (i.e., wins 3.0 contests), their new rating would be
1200 + 16(3.0 - 2.638) = ~1206
However, if the driver with a rating of 600 (and similar experience) wins, their new score would be
600 + 16(3.0 - 0.362) = ~642
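The two worked examples can be checked with a short script. It assumes the standard logistic Elo curve with a 400-point scale, which is an assumption on my part but does reproduce the 2.638 figure above; the function names are illustrative.

```python
def win_probability(rating_a, rating_b):
    """Chance that A finishes ahead of B under the standard Elo curve (assumed)."""
    return 1 / (1 + 10 ** ((rating_b - rating_a) / 400))


def expected_score(rating, opponent_ratings):
    """Sum of pairwise win probabilities against every other driver."""
    return sum(win_probability(rating, r) for r in opponent_ratings)


def updated_rating(rating, k, actual, expected):
    """R'a = Ra + K(Sa - Ea)."""
    return rating + k * (actual - expected)


favorite_expected = expected_score(1200, [1000, 800, 600])
underdog_expected = expected_score(600, [1200, 1000, 800])

print(round(favorite_expected, 3))                             # 2.638
print(round(updated_rating(1200, 16, 3.0, favorite_expected)))  # 1206
print(round(updated_rating(600, 16, 3.0, underdog_expected)))   # 642
```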
Sharp-eyed readers may have noticed that this rebalancing is only point-neutral if all drivers have the same K-factor. However, this is effectively never the case, as there will always be drivers in the point-swing-heavy early portions of their careers. In order to get everything to balance out, we calculate an adjustment ratio for each race. That is, we take the sum of the total points of all drivers before the race and divide it by the sum of the total points of all drivers after the race. We then multiply each driver's post-race rating by this ratio, preserving the total number of points in the pool after each race.
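A small sketch of that rebalancing step, using made-up ratings for two drivers whose different K-factors left the pool 20 points heavier after the race:

```python
def rebalance(pre_race, post_race):
    """Scale post-race ratings so the pool total matches the pre-race total."""
    ratio = sum(pre_race.values()) / sum(post_race.values())
    return {driver: rating * ratio for driver, rating in post_race.items()}


# Hypothetical numbers: the rookie gained 30 points, the veteran lost only 10.
pre = {"veteran": 1200.0, "rookie": 1000.0}
post = {"veteran": 1190.0, "rookie": 1030.0}

balanced = rebalance(pre, post)
print(round(sum(balanced.values()), 6))  # 2200.0, same as before the race
```

Because every rating is multiplied by the same ratio, the finishing order of the ratings is unchanged; only the total is pulled back to its pre-race value.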
Elo Inflation
However, we still run into the situation where young drivers are brought into Formula One for a season (or even a few races), fail miserably, dump all their points into the pool, and exit. There are two main ways we fight inflation:
- Revert each driver back to the mean slightly at the end of each year; and
- Use a smaller, fixed K-factor for new drivers for a number of races.
The first approach I blatantly
The second approach limits the rate at which new drivers can bleed points into the pool until they've been around for a little while. Currently both the K-factor and the minimum number of races are set at 16, meaning that for a driver's first season they're somewhat capped in how many points they can lose. Note that because of how Elo works, though, excellent rookie drivers such as Villeneuve or Hamilton can still climb the ranks at the same rate as experienced drivers.
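Both anti-inflation measures can be sketched in a few lines. This is one possible reading rather than the actual implementation: the text does not say how the rookie cap interacts with the experience scale, whether the sixteenth race is inside the cap, or how strong the year-end reversion is, so the `<=` boundary and the `fraction` parameter below are assumptions.

```python
ROOKIE_RACES = 16  # minimum number of races before the experience scale applies
ROOKIE_K = 16      # fixed K-factor used during that window


def capped_k_factor(races_entered):
    """Fixed K for new drivers, then max(800 / Ne, 16) afterwards (assumed order)."""
    if races_entered <= ROOKIE_RACES:
        return ROOKIE_K
    return max(800 / races_entered, 16)


def revert_to_mean(rating, mean, fraction):
    """Pull a rating part of the way back to the mean at the end of a season.

    fraction is a hypothetical tuning knob; the text only says "slightly".
    """
    return rating + fraction * (mean - rating)


print(capped_k_factor(5))                # 16
print(capped_k_factor(25))               # 32.0
print(revert_to_mean(1300, 1000, 0.25))  # 1225.0
```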
Putting it all together
In the end, after crunching 36 years of Formula One data ... we'll get to some real fun stuff tomorrow. But for now here are some numbers you can use as a reference point when looking at long-term Elo scores:
With that in mind, let the ratings begin!