The mathematics of random walks: skill ratings in Overwatch

UPDATE: Added a new condition below

Overwatch is a very fun team-based shooter that has an interesting skill rating (SR) system. Every player has an SR, and a matchmaking system tries to create fair games by assembling teams with similar SR. The winners get an SR increase and losers get a decrease, but it can be different amounts for each player. A hidden algorithm picks the actual value based on both the expected outcome of a match and individual performance.

For this post, I wanted to look at the contributions of skill vs. chance in the way SR can change over time. After all, players should theoretically only be able to win if they’re more skilled than their opponents, but it’s possible for the matchmaking system to place someone with very good teammates, leading to a win even though that person’s skill is low. The opposite is possible too, where a person should win a game but doesn’t due to their teammates. These two scenarios should cancel out in the long run, but they still give rise to terms like “Elo hell” (named for the Elo rating system), which refers to the supposed impossibility of climbing out of the lower ranks to one’s “true” SR. But does Elo hell actually exist?

Random walks

The way in which SR climbs and falls can be described by the concept of a random walk. For a simple example, say you’re standing on a sidewalk and flip a coin. For heads, you take a step forwards, and for tails, you take a step back. Where do you end up after 100 coin flips? 1000? This concept of a random walk can be used to describe things like stock prices and the physics of diffusion.
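As a minimal sketch of the sidewalk example (the function name `coin_flip_walk` and the `seed` parameter are just illustrative choices, not anything from the simulations in this post), a coin-flip walk can be written in a few lines of Python:

```python
import random

def coin_flip_walk(n_flips, seed=None):
    """Return the position after each flip: +1 for heads, -1 for tails."""
    rng = random.Random(seed)
    position = 0
    path = [position]
    for _ in range(n_flips):
        position += 1 if rng.random() < 0.5 else -1
        path.append(position)
    return path

# Where do you end up after 100 flips? After 1000?
for n in (100, 1000):
    print(n, "flips -> final position", coin_flip_walk(n, seed=0)[-1])
```

Changing the seed gives a different walk each time, which is the whole point: the average position stays at zero, but any single walk can wander a long way from it.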

To put an example in the context of Overwatch, say your SR is right in the middle at 2500 out of a maximum 5000. If you have a 50% chance to win every game and play 300 games over the course of the 90-day season, what is your final SR? How likely is it that you climb 500 SR, which is the amount needed for the next rank? (Overwatch has ranks at different tiers called gold, platinum, diamond, etc.) I tried to answer these questions with Monte Carlo simulations, meaning I simulated a large number of season SR paths and calculated probabilities from the distribution of results.

To demonstrate, here are some simulated SR paths where the probability of winning was a constant 50%, starting from 2500 SR and assigning +25 SR for wins and -25 SR for losses. Each path has a different color in this plot, and the maximum season SR over 300 games is marked with a dot.

You can see that each of these SR paths peaked at a different time during the season. Since the next rank up (diamond) starts at 3000 SR, only the blue path made it to diamond in this simulation.
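For readers who want to reproduce paths like these, here is a rough sketch under the same settings (2500 SR start, ±25 SR per game, 300 games, constant 50% win probability). The names `flat_sr_path`, `START_SR`, `STEP`, and `GAMES` are hypothetical, and this is not the code behind the plot above.

```python
import random

START_SR = 2500
STEP = 25      # SR gained per win and lost per loss
GAMES = 300    # games in one simulated season

def flat_sr_path(n_games=GAMES, start=START_SR, step=STEP, rng=random):
    """One season of SR values with a constant 50% win probability."""
    sr = start
    path = [sr]
    for _ in range(n_games):
        sr += step if rng.random() < 0.5 else -step
        path.append(sr)
    return path

rng = random.Random(1)
for i in range(5):
    path = flat_sr_path(rng=rng)
    peak = max(path)
    # path.index(peak) is the first game at which the season peak was reached
    print(f"path {i}: final {path[-1]}, peak {peak} at game {path.index(peak)}")
```

The peak of each path is the quantity that matters for the rest of the analysis, since a player counts as having climbed if the rank threshold was reached at any point in the season.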

If I repeat this process many times, I can take the maximum SR of each path and get a distribution of season peak SRs. Below is a histogram of this distribution for 100,000 random walks when the win probability was always 50%.

You can see from this distribution that most paths had a peak below 3000 SR, i.e. they did not climb to diamond. But a significant fraction still did (0.25 or 25% in this case). This fraction is what I use below to estimate the probability that a player can climb a complete rank through pure chance. That’s the power of Monte Carlo simulations.
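The flat ±25 SR case is simple enough to check without simulation. The sketch below compares a small Monte Carlo estimate against the reflection principle for a symmetric random walk, which says P(max ≥ k) = P(S_n ≥ k) + P(S_n > k), where S_n is the net number of wins after n games. I am assuming that "climbing" means reaching 3000 SR or more, i.e. being 20 net wins up at some point. The function names are illustrative, and the path count is kept far below 100,000 so the pure-Python loop finishes quickly.

```python
import random
from math import comb

def peak_reaches(target_steps, n_games, rng):
    """True if a +/-1 walk is ever at least target_steps above its start."""
    position = 0
    for _ in range(n_games):
        position += 1 if rng.random() < 0.5 else -1
        if position >= target_steps:
            return True
    return False

def monte_carlo_fraction(target_steps, n_games, n_paths, seed=0):
    """Fraction of simulated paths whose peak reaches target_steps."""
    rng = random.Random(seed)
    hits = sum(peak_reaches(target_steps, n_games, rng) for _ in range(n_paths))
    return hits / n_paths

def exact_fraction(target_steps, n_games):
    """Reflection principle: P(max >= k) = P(S_n >= k) + P(S_n > k)."""
    def prob_position(s):
        # P(S_n = s) needs (n + s) / 2 wins, so n + s must be even
        if (n_games + s) % 2 or abs(s) > n_games:
            return 0.0
        return comb(n_games, (n_games + s) // 2) / 2 ** n_games
    at_least = sum(prob_position(s) for s in range(target_steps, n_games + 1))
    return at_least + (at_least - prob_position(target_steps))

k = 500 // 25   # 20 net wins lift 2500 SR to 3000 SR
mc = monte_carlo_fraction(k, 300, 10_000)
exact = exact_fraction(k, 300)
print(f"Monte Carlo: {mc:.3f}   reflection principle: {exact:.3f}")
```

Both numbers should land close to the 0.25 read off the histogram. The closed form only works for this idealized case, though; once the win probability depends on SR, simulation is the practical route.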

Making the simulations more accurate

One thing to notice in the simulations above is the tendency for the paths to diverge, i.e. if a path shoots up after a string of wins, there’s nothing to force the SR back to 2500. In reality, if a player gets many consecutive wins, they’ll likely find themselves playing with much more skilled players, which should eventually decrease their rank back to the “true” value. The difficult thing to know is whether a climb is due to a true increase in skill or merely chance. In a simulation, though, these are easy factors to separate.

To see the contribution of chance to climbing, I compared three win probability conditions:

  1. Flat – The win probability was always 50%, i.e. a purely random walk.
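  2. Shallow – The win probability decreased linearly by 5 percentage points for every 500 SR above the true SR (and increased by the same amount below it).
  3. Steep – The win probability decreased linearly by 10 percentage points for every 500 SR above the true SR (and increased by the same amount below it).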

So for the shallow and steep conditions, there will be a normalizing force that will tend to drive the SR paths back to 2500 once they start to diverge.

Here’s what these win probabilities look like as a function of SR. The lines are centered at 2500, since that was the assumed “true” rank for these simulations.
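In code, the three conditions amount to a single linear function of SR. This is a sketch of how I would write it; the names `SLOPES` and `win_probability` are made up for illustration, and clamping the result to the range 0 to 1 is my own assumption (it never matters for the SR values reached here).

```python
# Drop in win probability per 500 SR above the true SR, for each condition
SLOPES = {"flat": 0.00, "shallow": 0.05, "steep": 0.10}

def win_probability(sr, condition, true_sr=2500):
    """Linear win probability that equals 50% when sr == true_sr."""
    p = 0.5 - SLOPES[condition] * (sr - true_sr) / 500
    return min(1.0, max(0.0, p))   # clamp: an assumption, not from the post

for sr in (2000, 2500, 3000):
    print(sr, {name: round(win_probability(sr, name), 2) for name in SLOPES})
```

So a player whose true SR is 2500 but who has drifted up to 3000 wins 45% of games in the shallow condition and 40% in the steep one, which is what pulls the path back down.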

Other technical details

  • Points per game – In the simulations above I used a constant 25 SR per game, but for the full set below, I drew this value at random for each game. Specifically, it was a normal random number with μ = 23 SR and σ = 4 SR, i.e. about 95% of the values fell between 15 and 31 SR. These parameter estimates come from a personal data set of about 120 games that I recorded during season 2. I think these estimates are still accurate for more recent seasons.
  • Games per season – Here I used another three conditions: 100, 300, and 500 games per season. Since each season is ~90 days, I think these values span the range from casual to avid Overwatch players. If anything, 500 games might be an underestimate, since I’ve seen streamers with >1000 games in a season, so >11 games per day. That’s a lot of Overwatch.
  • Draws – I decided to leave out draws for the time being, because not only are they rare, but Blizzard is continually making changes in order to make draws as rare as possible. In my season 2 data set, <8% of my games were draws, and this was before Blizzard made their changes.
  • Streaks – Blizzard has a streak system in place to allow players very far away from their true SR to climb quickly. However, the details of this system are unknown, and it was also recently changed to become less significant. Therefore, I decided not to consider streaks.
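To make the points-per-game choice concrete, here is a small sketch that draws the SR change for a game from a normal distribution with μ = 23 and σ = 4 and checks the "15 to 31 SR" range. Using the same distribution for wins and losses, and leaving the values unrounded, are assumptions on my part; `sr_change` is a hypothetical helper name.

```python
import random

MEAN_GAIN, SD_GAIN = 23, 4   # SR per game, from the season 2 estimates above

def sr_change(won, rng):
    """Signed SR change for one game; the magnitude is drawn from N(23, 4)."""
    magnitude = rng.gauss(MEAN_GAIN, SD_GAIN)
    return magnitude if won else -magnitude

rng = random.Random(0)
draws = [rng.gauss(MEAN_GAIN, SD_GAIN) for _ in range(100_000)]
inside = sum(15 <= d <= 31 for d in draws) / len(draws)
print(f"share of draws between 15 and 31 SR: {inside:.3f}")
```

The range 15 to 31 is μ ± 2σ, so the printed share should sit near 95%.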

Results

To demonstrate the different win probability conditions, here is a sample of 50 paths from the Flat and Steep conditions over 300 games. It’s impossible to distinguish individual paths with so many on the same plot, but you can get a sense of the distribution. The steep paths rarely go above 3000 SR, whereas a few of the flat paths do manage to climb ranks.

To estimate the probability that a path climbs a full rank (500 SR), I generated 100,000 paths for each condition and measured the fraction of maximum season SRs above 3000 SR. The starting point of 2500 SR was arbitrary. These results can be applied just as easily to a climb from silver to gold (1500 to 2000 SR) or diamond to master (3000 to 3500 SR). The only place I wouldn’t trust these results is at the very edges, e.g. a climb from 4000 to 4500 SR, since there are so few players at these high ranks that matchmaking becomes a bit of a mess.
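Putting the pieces together, the whole experiment can be sketched as one loop over conditions and season lengths. This is a reconstruction under stated assumptions rather than the original code: the names are hypothetical, SR changes are drawn from N(23, 4) for wins and losses alike, a path counts as a climb once it reaches 3000 SR or more, and only 1,000 paths are used per cell (instead of 100,000) so that plain Python stays fast. Expect noisier numbers than the ones reported in this post.

```python
import random

SLOPES = {"flat": 0.00, "shallow": 0.05, "steep": 0.10}  # per 500 SR

def climbs_rank(slope, n_games, rng, start=2500, true_sr=2500, target=3000):
    """Simulate one season; True if SR reaches the target at any point."""
    sr = start
    for _ in range(n_games):
        p_win = min(1.0, max(0.0, 0.5 - slope * (sr - true_sr) / 500))
        gain = rng.gauss(23, 4)          # SR at stake this game
        sr += gain if rng.random() < p_win else -gain
        if sr >= target:
            return True
    return False

def climb_fraction(slope, n_games, n_paths=1000, seed=0):
    """Monte Carlo estimate of the probability of climbing a full rank."""
    rng = random.Random(seed)
    return sum(climbs_rank(slope, n_games, rng) for _ in range(n_paths)) / n_paths

results = {
    name: {n: climb_fraction(slope, n) for n in (100, 300, 500)}
    for name, slope in SLOPES.items()
}
for name, row in results.items():
    print(name, row)
```

Raising `n_paths` tightens the estimates at the cost of run time; a vectorized version would be the natural next step for the full 100,000 paths.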

Anyway, here are the results. The numbers reported are fractions, e.g. a value of 0.21 means there is a 21% chance of climbing a full 500 SR through chance alone.

There are two main trends. (1) Playing more games always increases the chances of climbing. It seems very unlikely to climb by playing only 100 games. (2) As the slope of the win probability curve increases (i.e. a stronger normalizing force), it becomes more and more difficult to climb through chance alone.

I was surprised by how large some of these probabilities were. Considering that tens of millions of people have played Overwatch, it certainly seems possible that some people have managed to climb ranks without any significant increase in actual skill.

But what if skill actually increases?

I next wanted to consider the cases when the “true” SR does increase, which should theoretically happen over time with more games played. The easiest way to mimic this in my simulations was to shift the win probability curves to a higher SR, so the point at which the win rate equals 50% will increase over time.

I repeated the previous set of simulations for two more conditions.

  1. 500 SR increase – Over the course of 500 games, I linearly increased the true SR by 500 points. This means that for the 100 game season, the true SR only increased by 100 points.
  2. 1000 SR increase – I linearly increased the true SR by 1000 points over 500 games, so by the end of a 500 game season, the true SR will have climbed two complete ranks.
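One way to express the improving-skill conditions is to let the center of the win probability line move with the number of games played. The sketch below assumes the true SR starts at 2500 and rises linearly over a 500 game ramp; `true_sr_at`, `ramp_games`, and the other names are illustrative, and the 1,000 paths per condition are far fewer than the 100,000 used for the reported numbers.

```python
import random

def true_sr_at(game, total_increase, start_true=2500, ramp_games=500):
    """True SR after `game` games, rising linearly over ramp_games."""
    return start_true + total_increase * game / ramp_games

def climbs_with_improvement(slope, n_games, total_increase, rng,
                            start=2500, target=3000):
    """One season with a moving true SR; True if SR ever reaches target."""
    sr = start
    for game in range(n_games):
        center = true_sr_at(game, total_increase)   # where p_win equals 50%
        p_win = min(1.0, max(0.0, 0.5 - slope * (sr - center) / 500))
        gain = rng.gauss(23, 4)
        sr += gain if rng.random() < p_win else -gain
        if sr >= target:
            return True
    return False

rng = random.Random(0)
n_paths = 1000
fractions = {}
for increase in (500, 1000):
    hits = sum(climbs_with_improvement(0.10, 500, increase, rng)
               for _ in range(n_paths))
    fractions[increase] = hits / n_paths
    print(f"steep, 500 games, +{increase} true SR: {fractions[increase]:.2f}")
```

Note that `true_sr_at(100, 500)` is 2600, which matches the statement that a 100 game season only sees a 100 point increase in the first condition.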

Just to see what these SR paths look like, here is a sample using the steep win probability curve, this time with 5 paths for each of the rank increase conditions. There’s still a significant amount of individual variation, but overall there’s a gradual increase in SR over time, with a larger increase in the 1000 SR condition.

I next recalculated the probabilities of climbing a single rank with 100,000 simulated SR paths. Here are the probabilities for the 500 true SR increase.

The flat probabilities remained the same, as they should, since shifting a flat line doesn’t change anything. The probabilities for 100 games also did not change much, since 100 games was not enough time for the true SR increase to kick in. However, there are significant increases in the 500 game probabilities, e.g. an increase from 5% to 62% for the steep condition. Interestingly, the steep probability actually surpassed the shallow case (62% vs. 47%). This is counter-intuitive since the steep case allows less variation, but the steep case actually benefits from a true SR increase, since the force driving the SR to the true value is stronger.
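A back-of-the-envelope calculation shows how much stronger that force is. If the SR sits `gap` points below the true SR, the win probability is p = 0.5 + slope × gap / 500, so the expected SR change per game is gain × (2p − 1) = 2 × gain × slope × gap / 500. The sketch below is my own approximation, not part of the simulations: it uses the mean gain of 23 SR, ignores randomness, treats the true SR as fixed, and the helper name `pull_per_game` is hypothetical.

```python
MEAN_GAIN = 23   # average SR per game, from the estimates above

def pull_per_game(slope, gap, mean_gain=MEAN_GAIN):
    """Expected SR change per game when SR sits `gap` below the true SR.

    With p = 0.5 + slope * gap / 500, the expected change is
    mean_gain * (2p - 1) = 2 * mean_gain * slope * gap / 500.
    """
    return 2 * mean_gain * slope * gap / 500

for name, slope in {"shallow": 0.05, "steep": 0.10}.items():
    # Number of games over which the average gap shrinks by a factor of e
    games_scale = 500 / (2 * MEAN_GAIN * slope)
    print(f"{name}: pull at a 200 SR gap = {pull_per_game(slope, 200):.2f} SR/game, "
          f"gap shrinks by a factor of e in about {games_scale:.0f} games")
```

Under these assumptions the steep condition closes a gap in roughly 100 games, while the shallow condition needs roughly 200. That is consistent with both observations above: 100 games is too short for the increase to show, and over 500 games the steep paths track a rising true SR more closely.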

Lastly, here are the probabilities for a 1000 SR increase. Note that the probabilities are still for climbing a single rank, even though the true SR increased by two ranks.

Again, the chances for climbing in only 100 games are practically zero, even with this relatively quick increase in skill. I think this supports the claim that it’s quite difficult to climb as a casual player. On the other hand, chances of climbing with more games are much higher, and practically certain for 500 games. I think the most extreme examples of this are the Twitch streamers that do a “Bronze to Grandmaster” stream. These players climb very quickly, but their true SR is much, much higher than their opponents, which makes winning almost certain.

Conclusions

So does “Elo hell” actually exist? It’s hard to conclude anything certain from this data, since this is all essentially a glorified thought experiment. But there are a few points about which I’m fairly confident.

  • Due to the nature of random walks, it’s possible (but unlikely) to climb a full rank through pure chance.
  • Climbing through chance alone is highly sensitive to how the win probability depends on SR. Even a 10 percentage point drop in win probability at the next rank up can make it virtually impossible to climb with pure chance.
  • If your actual skill increases over time, climbing is much easier, but still not guaranteed. It’s only certain after many games or large differences in SR, e.g. it’s easy to climb to platinum if you actually deserve diamond.
  • Playing more games always increases your chances of climbing. Said another way: it is extremely difficult to climb when only playing casually, even if you get better over time.

Now onto more speculative conclusions: does Elo hell exist? I think for some people, it does. Of the millions of people playing Overwatch, some unlucky players might have legitimately increased in skill but not climbed due to chance. However, I think this is extremely rare, to the point that nearly every person complaining of Elo hell is actually at their true SR and just venting. For those very few unlucky souls where it’s actually true, hopefully they just keep playing, because they will inevitably climb after more games.

These conclusions probably won’t surprise anyone who plays a lot of competitive Overwatch, but I still think it’s useful to consider how big a role chance can play in a skill-based game. I was certainly surprised at how high some of these probabilities were. For another perspective on this skill vs. chance trade-off, check out this super interesting video from Vox on professional sports. Or just play more Overwatch. After all, you can’t git gud if you don’t play.

UPDATE:

After getting some feedback, I decided to run one additional set of simulations. Specifically, what happens when you start the season with a true SR at a higher rank, i.e. your true rank is 3000 but you’re trying to climb from 2500. I kept the true SR at 3000 for the entire season and recalculated the probabilities of reaching 3000 with the actual SR.

I also added another win probability condition, called Very Steep, which has twice the slope of the Steep condition, i.e. the win probability decreased by 20% linearly for a 500 change in SR. I think the true shape of the win-probability curve is more complicated than these simple lines, but they do illustrate a spectrum of normalizing forces. In reality, the curve is likely shallow for nearby SR, steep for larger differences, and then very steep near the edges.

Here are the probabilities for climbing when the true SR starts 500 points above the actual SR and remains constant throughout the season.
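This scenario only changes two things relative to the earlier setup: the win probability line is centered at 3000 SR for the whole season while the path starts at 2500 SR, and a fourth slope of 20 percentage points per 500 SR is added. A rough sketch follows; the names are hypothetical, N(23, 4) gains and the clamp to the range 0 to 1 are my assumptions, and 1,000 paths per cell stand in for the 100,000 used for the reported values.

```python
import random

# Drop in win probability per 500 SR above the true SR
SLOPES = {"flat": 0.00, "shallow": 0.05, "steep": 0.10, "very steep": 0.20}

def reaches_true_rank(slope, n_games, rng, start=2500, true_sr=3000):
    """Start 500 SR below a fixed true SR; True if SR ever reaches it."""
    sr = start
    for _ in range(n_games):
        p_win = min(1.0, max(0.0, 0.5 - slope * (sr - true_sr) / 500))
        gain = rng.gauss(23, 4)
        sr += gain if rng.random() < p_win else -gain
        if sr >= true_sr:
            return True
    return False

rng = random.Random(0)
n_paths = 1000
update_results = {}
for name, slope in SLOPES.items():
    update_results[name] = {
        n: sum(reaches_true_rank(slope, n, rng) for _ in range(n_paths)) / n_paths
        for n in (100, 300, 500)
    }
    print(name, update_results[name])
```

In the very steep condition the player starts the season with a 70% win probability, which drops back toward 50% as the SR approaches 3000.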

Interestingly, even with the very steep condition, the chances of climbing after 100 games are still <50%. I believe this very steep condition overestimates the normalizing force (especially near one’s true SR), so keep that in mind.

The story changes quickly with more games, and the chances of climbing are much higher after 300 games and especially after 500 games. This supports the conclusion I made earlier: if you’re getting better at the game but still can’t climb, it just takes more time.
