September 24

A/B Testing


Games today aren’t a single, self-contained release but live services that evolve, adapt and grow over time. After launch, a Producer or Product Manager (PM; often several work on one title) is responsible for optimising game performance by improving KPIs (Key Performance Indicators) such as monetisation, retention, conversion and virality.

A/B testing is one of the most effective methods available for optimising a live game.

A/B Testing – The General Idea

With A/B testing, a PM makes one or more changes in an attempt to improve an aspect of their product. For example, they might raise the price of troops in their game to see whether more expensive troops generate more revenue. A successful A/B test follows this methodology:

Identify a Problem – Use dashboards and metrics to isolate weaknesses in a live game, e.g. a low 1-day retention rate.
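As a rough illustration of the metric in this step, the sketch below computes a 1-day retention rate as the share of players who come back on the calendar day after they installed. The data shapes (an `installs` dictionary and a `logins` set) and the function name `one_day_retention` are hypothetical, and studios define retention windows differently, so treat this as one possible definition rather than a standard.

```python
from datetime import date, timedelta


def one_day_retention(installs, logins):
    """Share of installers who are active again on the calendar day after installing.

    installs: dict mapping player_id -> install date (hypothetical schema)
    logins:   set of (player_id, date) pairs, one per day a player was active
    """
    if not installs:
        return 0.0
    returned = sum(
        1
        for player_id, installed_on in installs.items()
        if (player_id, installed_on + timedelta(days=1)) in logins
    )
    return returned / len(installs)


# Invented example data: only p1 returns exactly one day after installing.
installs = {"p1": date(2024, 9, 1), "p2": date(2024, 9, 1), "p3": date(2024, 9, 2)}
logins = {("p1", date(2024, 9, 2)), ("p3", date(2024, 9, 4))}
print(one_day_retention(installs, logins))  # 1 of 3 installers returned the next day
```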

Hypothesis – Design an experiment that should improve the weakness while changing as little of the game as possible. For example: the tutorial is too long and complicated, so if it is shortened, more players will come back to play the day after installing.

Experiment – Split players into two groups: a control group, which keeps the current version of the game, and a test group, which receives the version with the changes.
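One common way to split players, shown here as a minimal sketch rather than how any particular studio does it, is to hash each player ID together with an experiment name. The same player always lands in the same group, and different experiments get independent splits. The function `assign_group`, the experiment name `short_tutorial` and the 50/50 default are illustrative assumptions.

```python
import hashlib


def assign_group(player_id: str, experiment: str, test_share: float = 0.5) -> str:
    """Deterministically assign a player to 'control' or 'test' (hypothetical helper)."""
    # Salting with the experiment name gives each experiment its own independent split.
    digest = hashlib.sha256(f"{experiment}:{player_id}".encode("utf-8")).digest()
    # Map the first 4 bytes of the hash to a number in [0, 1).
    bucket = int.from_bytes(digest[:4], "big") / 2**32
    return "test" if bucket < test_share else "control"


groups = [assign_group(f"player{i}", "short_tutorial") for i in range(10_000)]
print(groups.count("test"), groups.count("control"))  # roughly 50/50
```

Hashing rather than picking a random group at each login means no assignment table has to be stored, and a player cannot drift between groups across sessions.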

Run the Experiment for a Predefined Period – How long the experiment should run depends on the type of experiment. In this example we are testing the 1-day retention rate, so a testing period of 1-2 weeks is more than adequate.

Analyse Results – Compare the two groups against each other and against the game’s overall performance, and check that any difference is statistically significant rather than noise. In this example, we find that the 1-day retention rate improved by 5% in the test group.
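Before trusting a difference like this, it helps to check that it is unlikely to be chance. The sketch below runs a standard two-sided two-proportion z-test using only the Python standard library. The counts are made up for illustration (40% vs 42% retention, which is a 5% relative lift but only 2 percentage points), and `compare_retention` is a hypothetical helper, not part of any analytics tool.

```python
from math import sqrt
from statistics import NormalDist


def compare_retention(returned_control, n_control, returned_test, n_test):
    """Two-sided two-proportion z-test on retention rates (control vs test)."""
    p_control = returned_control / n_control
    p_test = returned_test / n_test
    # Pooled rate under the null hypothesis that both groups retain equally well.
    pooled = (returned_control + returned_test) / (n_control + n_test)
    std_err = sqrt(pooled * (1 - pooled) * (1 / n_control + 1 / n_test))
    z = (p_test - p_control) / std_err
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return {
        "absolute_lift": p_test - p_control,      # difference in percentage points
        "relative_lift": p_test / p_control - 1,  # 0.05 means +5% relative
        "z": z,
        "p_value": p_value,
    }


# Illustrative numbers only: 10,000 players per group, 40% vs 42% 1-day retention.
result = compare_retention(4_000, 10_000, 4_200, 10_000)
print(result)
```

With these invented counts the z-score is about 2.9 and the p-value is well below the common 0.05 threshold; a p-value above the chosen threshold would mean the result is inconclusive.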

Conclusion – Keep the change, drop it, or refine it further. In this example, the 1-day retention rate increased by 5% and monetisation also improved by $0.01 ARPDAU (Average Revenue Per Daily Active User). We can conclude that shortening the tutorial meant more people kept playing, and because more people played, more of them spent.
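ARPDAU is revenue divided by daily active users. The sketch below pools revenue and active user-days across the test window for each group, which is one reasonable choice among several (averaging each day’s ARPDAU is another). The function name `arpdau` and the daily figures are invented for illustration.

```python
def arpdau(daily_revenue, daily_active_users):
    """Pooled ARPDAU: total revenue / total active user-days over the window."""
    if len(daily_revenue) != len(daily_active_users):
        raise ValueError("need one DAU figure per revenue figure")
    user_days = sum(daily_active_users)
    if user_days == 0:
        raise ValueError("no active users in the window")
    return sum(daily_revenue) / user_days


# Invented three-day figures for each group (revenue in dollars).
arpdau_control = arpdau([1500, 1450, 1550], [10_000, 9_800, 10_200])  # 4500 / 30000
arpdau_test = arpdau([1600, 1560, 1640], [10_000, 9_900, 10_100])     # 4800 / 30000
print(f"control {arpdau_control:.2f}, test {arpdau_test:.2f}, "
      f"uplift {arpdau_test - arpdau_control:.2f}")
# prints: control 0.15, test 0.16, uplift 0.01
```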

Caveats

• If possible, use at least 10K DAU (Daily Active Users) so that results can reach statistical significance.
• How long to run the experiment depends on the type of test: 1-day retention (1DRR) can be measured over a week, whereas 30-day retention (30DRR) might take a few months.
• Be careful when drawing conclusions. For example, if you run a Hard Currency sale, more people may spend. However, how do you know that the people who spent money would not have spent it anyway without the sale? Don’t jump to conclusions: refine, re-test and analyse.
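The 10K DAU rule of thumb can be sanity-checked with the standard sample-size formula for comparing two proportions. The sketch below assumes a two-sided test at a 5% significance level with 80% power; `sample_size_per_group`, `days_to_run`, the example rates and the 5,000 new players per day are illustrative assumptions, not part of the original method. For a retention test, the relevant count is the number of players who enter each group during the test.

```python
from math import ceil, sqrt
from statistics import NormalDist


def sample_size_per_group(p_baseline, p_target, alpha=0.05, power=0.8):
    """Players needed in EACH group to detect p_baseline -> p_target (two-sided test)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # about 0.84 for 80% power
    p_bar = (p_baseline + p_target) / 2            # average rate, equal group sizes
    numerator = (
        z_alpha * sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * sqrt(p_baseline * (1 - p_baseline) + p_target * (1 - p_target))
    ) ** 2
    return ceil(numerator / (p_target - p_baseline) ** 2)


def days_to_run(n_per_group, new_players_per_day, groups=2):
    """Days of enrolment needed to fill every group (hypothetical helper)."""
    return ceil(n_per_group * groups / new_players_per_day)


n = sample_size_per_group(0.40, 0.42)  # detect a 40% -> 42% 1-day retention lift
print(n, days_to_run(n, new_players_per_day=5_000))
```

Under these assumptions, detecting a 2-point lift from a 40% baseline needs roughly 9,500 players per group, and halving the lift roughly quadruples the requirement, which is why small changes need large audiences or longer tests.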