Monday, February 10, 2014

How likely are you to get 12 wins in the Hearthstone Arena, given your skill level?

Blizzard recently released Hearthstone, a trading card game (TCG) style video game similar to Magic: The Gathering. Hearthstone has a play mode called the Arena, where the player assembles a deck out of random cards and uses it to play against other randomly selected players. The player keeps playing until they lose 3 games or reach the maximum of 12 wins, and is rewarded according to the number of wins they get.

A few weeks ago on reddit, there was a post titled "How hard is the Arena? The answer, with Math (TM)", which showed how likely the different arena outcomes would be if each game were decided randomly. You should read the full post -- there are some great points about how unlikely 8+ wins are, and how outcomes that some players view as devastating, like 1-3, are actually common.

However, there is something important missing from that analysis -- some decks are better than others! Here, I'm going to extend that analysis by incorporating the strength of each deck and the player's skill.

Let's represent each player's power using a number between 0 and 1. A player's power is the fraction of players that player is stronger than. The weakest player has power 0, the strongest player has power 1, the average (median) player has power 0.5, and so on. Note that this power incorporates both the player's skill and the strength of their deck. Therefore, whatever your skill level, your power will be different each time you run the arena.
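To make the definition concrete, here is a small sketch of how a power value could be assigned by rank. The raw `strengths` scores are purely hypothetical (the analysis never needs them, only the resulting powers), and the function assumes at least two players with no ties.

```python
def power_levels(strengths):
    """Map hypothetical raw strength scores to powers in [0, 1] by rank.

    Assumes at least two players and no tied scores.
    """
    n = len(strengths)
    # Indices of players ordered from weakest to strongest.
    order = sorted(range(n), key=lambda i: strengths[i])
    powers = [0.0] * n
    for rank, i in enumerate(order):
        # rank players are weaker than player i, out of n - 1 other players.
        powers[i] = rank / (n - 1)
    return powers
```

With three players, the weakest gets 0, the middle one 0.5, and the strongest 1, whatever the raw scores were.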

We’re going to make three assumptions about the game:

1) In the arena, you always play someone with the same win-loss record as you.  Blizzard has said they try to perform matchmaking to make this the case, although it is not true all the time.

2) The advantage a player has is proportional to the difference in their power values.  The best player (power 1) has the same advantage over the average player (power 0.5) as the average player has over the worst player (power 0).

3) Prob[A beats B] = Logistic( X * (Pow(A) - Pow(B)) ). The Logistic function, Logistic(Y) = 1 / (1 + e^(-Y)), converts any number into a probability: Logistic(0) = 0.5, Logistic(Y) gets closer to 1 the larger Y is, and closer to 0 the more negative Y is (see plot below). Note that Logistic is symmetric, Logistic(-Y) = 1 - Logistic(Y), so Prob[A beats B] = 1 - Prob[B beats A], as we would expect.

The value X determines how much power matters to the outcome of a game. If we think Hearthstone is totally luck-based (like the card game “War”), we would set X to 0, meaning that the outcome of every game is 50-50, regardless of the players’ skills. If we think Hearthstone is very skill-based (like Chess, say), we would set X to a large number, so that if A is even a slightly stronger player than B, A has a very high chance of winning. My intuition is that X=5 is a reasonable value, and the results below use it. However, I computed all the arena outcome probabilities for values of X between 0 and 100. Here is a table of win probabilities for different power differences at X=5:

Power difference   Win probability
0                  50.00%
0.01               51.25%
0.1                62.25%
0.25               77.73%
0.5                92.41%
0.75               97.70%
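The win probabilities in this table follow directly from assumption 3. As a minimal sketch, assuming the standard logistic function 1 / (1 + e^(-Y)) and using a function name (`win_probability`) chosen just for this illustration:

```python
import math


def win_probability(power_a, power_b, x=5.0):
    """Chance that A beats B: Logistic(X * (Pow(A) - Pow(B)))."""
    return 1.0 / (1.0 + math.exp(-x * (power_a - power_b)))


# Print the win chance for each power difference, at the default X=5.
for diff in (0, 0.01, 0.1, 0.25, 0.5, 0.75):
    print(f"{diff:<5} {win_probability(diff, 0.0):.2%}")
```

Passing `x=0` makes every matchup 50-50, and a larger `x` pushes even small power differences toward a near-certain win for the stronger player.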




Given just these assumptions, we can compute exactly how likely each arena outcome is for a deck of a particular power. We start with all the players at 0-0, with equal numbers of players at every power level. (In my calculations, I group the players into 1000 bins.) Then, for players of a given power, we compute the chance of meeting an opponent of each other power and the chance of beating that opponent. That gives us the fraction of players at that power who move to 1-0 and the fraction who move to 0-1. We repeat the process to calculate how many get to 2-0, 1-1 and 0-2, and so on, all the way up to 12-2.
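The procedure can be sketched in a few dozen lines of pure Python. This is an illustration of the steps just described under the three assumptions above, not necessarily how the actual calculation was implemented. The names (`arena_outcomes`, `mass`, `bins`) are made up for this sketch, and it defaults to 200 bins rather than 1000 only to keep the pure-Python run time short.

```python
import math

MAX_WINS = 12
MAX_LOSSES = 3


def logistic(y):
    return 1.0 / (1.0 + math.exp(-y))


def arena_outcomes(x=5.0, bins=200):
    """Return {(wins, losses): per-bin fraction of players ending there}."""
    # Bin i represents players whose power is near (i + 0.5) / bins.
    powers = [(i + 0.5) / bins for i in range(bins)]
    # win_prob[i][j]: chance that a bin-i player beats a bin-j player.
    win_prob = [[logistic(x * (pi - pj)) for pj in powers] for pi in powers]

    # mass[(w, l)][i]: fraction of bin-i players who reach the record w-l.
    mass = {(0, 0): [1.0] * bins}

    # Visit records in order of games played, so every record has received
    # all of its incoming players before it is processed.
    for games in range(MAX_WINS + MAX_LOSSES - 1):
        for losses in range(MAX_LOSSES):
            wins = games - losses
            if not 0 <= wins < MAX_WINS:
                continue
            here = mass[(wins, losses)]
            total = sum(here)
            won = mass.setdefault((wins + 1, losses), [0.0] * bins)
            lost = mass.setdefault((wins, losses + 1), [0.0] * bins)
            for i in range(bins):
                # Assumption 1: opponents hold the same record, so they are
                # drawn in proportion to the players currently at this record.
                p_win = sum(m * w for m, w in zip(here, win_prob[i])) / total
                won[i] += here[i] * p_win
                lost[i] += here[i] * (1.0 - p_win)

    # Keep only finished runs: 3 losses, or 12 wins.
    return {
        record: fractions
        for record, fractions in mass.items()
        if record[0] == MAX_WINS or record[1] == MAX_LOSSES
    }
```

To read off the outcomes for one power level, pick the matching bin: with the default 200 bins, `arena_outcomes(5.0)[(3, 3)][100]` is the chance that a player of roughly median power finishes 3-3. Setting `x=0` reproduces the purely random case, where every game is a coin flip.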

From that calculation, we can see how likely the different arena outcomes are for players of different power levels. First let's look at the outcomes for the average player (power 0.5):

Record   Probability
1-3      16.69%
2-3      28.24%
3-3      27.17%
4-3      15.52%
5-3      5.63%
6-3      1.42%
7-3      0.28%
8-3      0.04%
9-3      0.01%
10-3     0.00%
11-3     0.00%
12-2     0.00%
12-1     0.00%
12-0     0.00%

As we can see, the average player usually gets between 2 and 4 wins. It's worth noting that, unlike the case where all games are decided randomly, the average player is very unlikely to get 0 wins, and it is virtually impossible for them to get 12 wins. This is because a player who performs poorly (or well) in the first few games gets paired with weaker (or stronger) opponents, which pushes their final result closer to the average.

Now let's look at the chances of each outcome for a strong player (power 0.9):

Record   Probability
0-3      0.14%
1-3      0.96%
2-3      3.62%
3-3      8.83%
4-3      14.72%
5-3      17.92%
6-3      17.10%
7-3      13.62%
8-3      9.49%
9-3      6.00%
10-3     3.53%
11-3     1.96%
12-2     1.62%
12-1     0.42%
12-0     0.06%

The strong player generally gets between 4 and 9 wins. However, even strong players rarely reach 12 wins. This is because virtually all of the decks that get to 8+ wins are also very strong.

It's worth noting that extreme outcomes (0-3 or 12 wins) are probably somewhat more common in the real game than this analysis suggests, because you aren't always matched with someone who has an identical arena record. This probably doesn't make much difference for common records (like 1-1), but it could make a big difference for rare records like 10-0. In those cases, you're likely to be matched against a deck with a worse record than yours, so you have a higher chance of winning and going on to 12 wins. For the same reason, common outcomes (e.g. 3-3) are very slightly less likely.

You can view all the results in this spreadsheet. The spreadsheet shows the full outcome probabilities for many different skill levels, and what skill levels you're likely to encounter at different arena records, for several different values of X.

You can see the Python code I used to do the calculations on github.

What do you think of these statistics?  Do they influence the way you play the Arena, or the way you feel about your results?  Leave a comment below.  Also, see more comments on reddit.