In this chapter, we’ll learn the basics of A/B testing – the experiments that will win the game of growth for you. Remember: the more A/B tests and experiments you run, the closer you are to success.
A/B testing is crucial for organizations: it helps them avoid big failures and losses, and instead grow through smaller but more frequent wins.
In a way, the process behind A/B testing mirrors growth marketing itself – smaller but more frequent experiments adding up to massive optimization and growth. This process helps you make better, more trustworthy decisions.
In this chapter, we are going to study planning and executing A/B testing as discussed in the course by Ton Wesseling (A/B testing expert and instructor at CXL Institute).
🛡️ You are reading “A/B Testing: A Game of Growth (Chapter 4)” – a series of articles on growth marketing. To read the first chapter of this series, go here 👉 Fundamentals of Growth Marketing: A Game of Growth (Chapter 1)
What is A/B Testing
A/B testing is the process of comparing two versions of the same webpage, email, or other digital asset – a control (A) and a challenger (B) – to determine which one performs better on a pre-defined goal, such as sign-ups, open rates, purchases, or clicks.
Please note that challenger B can include more than one change. Most people think you can change only one thing per A/B experiment so you can attribute any improvement to it, but we’ll see later why it’s okay to have more than one change.
Planning A/B Testing
Planning your A/B tests is as important as executing them, if not more. Before you begin, keep in mind these 4 questions, which are critical to the success of your A/B testing program.
- Do you have enough data to conduct A/B tests?
- Which KPIs to follow?
- How to form hypotheses?
- How to prioritise A/B tests?
1. Do you have enough data to conduct A/B tests?
One of the biggest mistakes digital marketers make with A/B testing is declaring a false positive as a winner. Most often, this happens either because they don’t have enough data to find a statistically significant winner, or because they run their tests without any hypothesis.
To answer this question, let us first understand the ROAR model, represented by this graph which has Conversions per month on its y-axis and time span on its x-axis.
This model divides a company’s growth into four phases i.e. Risk, Optimization, Automation and Re-think (ROAR) based on their conversions per month. The graph represents the growth of the business. A conversion can be leads, purchases, clicks, transactions, downloads, etc.
If you’re a new business with fewer than 1,000 conversions per month, you are in the Risk phase. At this stage, you cannot run A/B tests: conversions are so low that it will be difficult to find a statistically significant winner.
Between 1,000 and 10,000 conversions, you are in the Optimization phase. This is where you should be A/B testing more and more, especially as you get closer to the 10K conversions line.
At around 1,000 conversions per month, your challenger typically has to beat the control by about 15% for the result to be statistically significant. That means if the control has 100 conversions, the challenger needs 115. An uplift that large happens very rarely.
Whereas if you are in the 10K conversions range, you only need an uplift of about 5% to declare a winner, which is far more achievable. Use this calculator https://abtestguide.com/abtestsize/ to find out how much uplift in conversions you need to declare a winner.
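To see where numbers like these come from, here is a minimal Python sketch of the standard sample-size formula for comparing two proportions (normal approximation, 95% confidence, 80% power). The base rate and uplifts are invented for illustration; use the calculator above for real planning.

```python
# A minimal sketch of the two-proportion sample-size formula
# (normal approximation). The rates below are illustrative only.
import math
from statistics import NormalDist

def sample_size_per_variant(base_rate, relative_uplift,
                            alpha=0.05, power=0.80):
    """Visitors needed in each variant to detect a relative uplift."""
    p1 = base_rate
    p2 = base_rate * (1 + relative_uplift)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # ~1.96 for 95% confidence
    z_beta = NormalDist().inv_cdf(power)           # ~0.84 for 80% power
    p_bar = (p1 + p2) / 2
    num = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
           + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return math.ceil(num / (p2 - p1) ** 2)

# Detecting a 15% uplift on a 2% base rate takes far fewer visitors
# per variant than detecting a 5% uplift:
print(sample_size_per_variant(0.02, 0.15))
print(sample_size_per_variant(0.02, 0.05))
```

Notice how much the required sample grows as the uplift you want to detect shrinks – this is exactly why low-traffic sites can only detect big effects.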
In the Risk phase, focus on research: run growth experiments by coming up with hypotheses and following basic guidelines, and raise your conversions to a level where you can start running A/B tests.
Also, during the Optimization phase, a single person can run these A/B tests, but once you cross the 10K mark and reach the Automation phase, you will need a full team to manage them.
2. Which KPIs to follow for A/B testing?
There are a number of KPIs you could optimize your tests for, mainly:
- Clicks: Increasing the number of clicks on a button is easy to measure, but a significant uplift in clicks doesn’t necessarily translate into an uplift in revenue or business growth.
- Behaviour: You can optimize for shifting behaviour, especially if your transaction numbers are low. Reducing the sales cycle by adding a free-trial offer is an example. This is better than clicks, but again, not all behaviours can easily be associated with higher revenue.
- Transactions: This is a good KPI to follow – whether it is leads or purchases. It’s easy to measure and translates into business growth. However, there is a caveat: you can always increase the number of leads or transactions by lowering your price or giving products away for free, so this KPI should not be relied on alone.
- Revenue per user: A higher-level KPI than transactions, it takes into account the revenue earned per user, so you’re not optimizing for money-losing transactions. However, it can be difficult to measure with A/B testing.
- Potential Lifetime Value: Another higher-level KPI you could optimize for. It is harder to measure, but it’s what you should be striving for. It is not, however, an immediate goal that can be optimized directly.
The best way to define metrics in a large organization is as a weighted sum of smaller key metrics that gives you an overall evaluation of business growth. This is called the Overall Evaluation Criterion (OEC).
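As an illustration, an OEC can be as simple as a weighted sum over normalized metrics. The metric names and weights below are hypothetical, not from the course:

```python
# Hypothetical OEC: a weighted sum of normalized key metrics.
# Metric names and weights are invented for illustration.
def oec(metrics, weights):
    """Combine normalized metrics (each scaled to 0..1) into one score."""
    assert set(metrics) == set(weights), "every metric needs a weight"
    return sum(metrics[name] * weights[name] for name in metrics)

score = oec(
    metrics={"conversion_rate": 0.6, "revenue_per_user": 0.4, "retention": 0.7},
    weights={"conversion_rate": 0.3, "revenue_per_user": 0.5, "retention": 0.2},
)
print(round(score, 2))  # 0.52
```

The weights encode what the business values most; the point is that one number summarizes the trade-offs between metrics that individual tests might move in opposite directions.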
3. How to form a hypothesis?
Building your hypothesis is very important in order to bring everyone on the same page in terms of the experiments you’re conducting. A hypothesis answers the question: Why are you running this experiment?
There are three parts to a hypothesis:
- Proposed solution
- Predicted outcome
- Rationale (why you expect the change to work)
A hypothesis looks like this:
If (I apply this), then (this behavioural change) will happen among (this user group) because of (this reason).
An example of a hypothesis is:
If (I add real estate case studies on the homepage), then (the conversions will increase) among (the real estate agents) because (they will see the results we brought in for other real estate agents who are their peers.)
A hypothesis helps you stay focused on why you are running an experiment and, once you have the results, on what actions to take.
4. How to prioritise A/B tests?
Two of the most popular frameworks to prioritise tests are:
i) PIE – Potential x Importance x Ease
ii) ICE – Impact x Confidence x Ease
Both are similar models, and most growth marketers and A/B testers use them to simplify test prioritization.
But there’s one thing missing from these models: the location of the tests – where to apply a test so that it has a higher impact. To solve that, I want to introduce you to another model, called the PIPE model.
P (Potential): What is the chance that the hypothesis is true?
I (Impact): Where would this hypothesis have a bigger impact?
P (Power): What are the chances of finding a significant outcome?
E (Ease): How easy is it to test and implement?
In the PIPE model, you evaluate the potential of a hypothesis based on various locations and the 6-V model, which we are going to learn about in another post.
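Whichever framework you use, the scoring is mechanical once the ratings are in. Here is a toy sketch that scores a backlog on the four PIPE factors; the 1–10 scale, the hypotheses, and the ratings are all made up for illustration:

```python
# Toy PIPE scoring: rate each factor on a 1-10 scale and rank the backlog.
# The hypotheses and ratings are invented for illustration.
def pipe_score(potential, impact, power, ease):
    """Multiply the four PIPE factors into a single priority score."""
    return potential * impact * power * ease

backlog = {
    "real estate case studies on homepage": pipe_score(7, 8, 6, 5),
    "shorter signup form": pipe_score(6, 5, 8, 9),
}
# Highest score first:
for name, score in sorted(backlog.items(), key=lambda kv: kv[1], reverse=True):
    print(name, score)
```

Multiplying (rather than adding) the factors means one very weak factor, such as an experiment location with no traffic, drags the whole score down, which is usually what you want.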
Executing A/B Testing
Executing an A/B test is not just dropping your page’s code into a tool and letting it run for a set time. Every A/B test, especially if you work for a large organisation or have a high-traffic website, must go through these 4 steps before you start the experiment:
i) Design
ii) Development
iii) Quality assurance
iv) Configuration
1. Design A/B tests
- Design just 1 challenger
- Come up with as many hypotheses as you can. But be aware of the implementation costs.
- The change should be visible, scannable, and usable. Don’t bother changing things that do not meet these criteria.
- Follow the usability guidelines and best practices to do the minimum, and then test on top of that. For example, if you’re adding reviews on your website, use an Amazon style review color scheme. The majority of the users are used to reading and interacting with reviews on Amazon. But once you do that, test if you can improve the conversions of that control item.
- For optimization, it’s okay to make more than one change in an element, and if you want to move fast, it’s okay to make bigger changes across several elements. Only when you want to prove that a particular hypothesis is right or wrong should you stick to a single change. Otherwise, it’s fine to combine several hypotheses into one experiment; if you see no change or a negative change, investigate further by testing those hypotheses individually.
- Always be sure that the design change is aligned with the hypothesis. Sometimes, you design a new change but it is not exactly what the hypothesis stated. That’s a big mistake.
- Consider the Minimum Detectable Effect (MDE): how big the impact must be to have a high enough chance of being detected as real. With higher conversion volume, smaller changes (smaller risks) can still reach significance; with lower volume, you need a greater impact, and thus more risk.
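The MDE is the sample-size question turned around: given the visitors you can realistically collect per variant, how small a relative uplift can you still detect? A rough sketch, using a simplified normal approximation and made-up traffic numbers:

```python
# Rough MDE sketch (normal approximation): given the visitors you can
# collect per variant, how small a relative uplift can you still detect?
# Numbers are illustrative; treat the output as a ballpark estimate.
import math
from statistics import NormalDist

def minimum_detectable_effect(base_rate, n_per_variant,
                              alpha=0.05, power=0.80):
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    se = math.sqrt(2 * base_rate * (1 - base_rate) / n_per_variant)
    return (z_alpha + z_beta) * se / base_rate  # as a relative uplift

# More traffic means a smaller effect can be detected reliably:
print(minimum_detectable_effect(0.02, 5_000))   # larger MDE
print(minimum_detectable_effect(0.02, 50_000))  # smaller MDE
```

If the MDE for your traffic is, say, 30%, only a bold design change has a realistic chance of producing a detectable winner – which is the practical link between MDE and how much risk to take per test.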
2. Developing your A/B tests
- Use WYSIWYG editors ONLY if you’re changing small things. For bigger changes, you’ll want someone who can make the changes in code.
- If the code is too lengthy and you can’t build it within the time limit, propose changes to the design instead.
- Consider injecting code into the client-side default (control) variant as well, so that both variants have the same user experience. This way you’re sure the challenger doesn’t lose out just because it loaded more slowly than the default.
- Add analytics events to your code.
3. QA your A/B tests
- Test the major devices and browsers to confirm your experiment runs properly in the main browser/device combinations. You don’t need to spend time testing every combination unless you have very high conversion volume.
- Test if your experiment still functions if you remove one element from the page. Since most variations are built on top of the existing elements on the client-side, sometimes if you take out an element, the experiment breaks. Example: if you have 3 images and you remove 2 of them, does it still function or does it break?
4. Configure A/B tests in your tools
Now, it’s time to configure your A/B tests in the tool of your choice. There are several tools available in the market.
Most popular ones are:
- AB Tasty
- Google Optimize
Which tool you should choose is a discussion for another time, but Google Optimize is free, easy to use, and the most popular. It does most of the job, though there is a slight learning curve.
Configuring the experiments is not that hard, and is almost the same in most of the tools. Here’s how it’s done in Google Optimize.
Step 1: Name your experiment, choose the URL and the experiment type.
Please note that you should keep a separate file to record experiment information such as who created the experiment, where it is implemented, and what changes were made. This way you can easily tell what each test is about.
Step 2: Create 2 variants of the test – Default and Challenger
Note that Google Optimize creates a default variant for you from the start, so you will actually see 3 variants with 33% weight each – the Original, the Default you just created, and the Challenger you created.
The reason for this is that you don’t control the code library of the Original variant, which might have extra code or something that can provide a different user experience. When you create a new default and challenger, you control the code library of both the variants. With the same code, you can provide the same experience to your users.
Step 3: Set the weights for each of the variants
Since you have 3 variants with 33% weight each, you should set the Original to 0%, the Default to 50%, and the Challenger to 50%.
To do this, click on the text “33% weight” which should open the side window to edit the weights. In this window, choose “Custom Percentages” as the option which will allow you to edit the weights.
Set the Original to 0% and you’re good to go.
Step 4: Add the experiment codes in both the variants and conduct your tests.
Simply add the code that you developed in your default and challenger variants, and start your experiment.
How long should you run A/B tests?
Always run your tests for whole weeks: 1, 2, 3, or 4. This avoids variation caused by the day of the week.
If you run tests for, say, 4, 15, or 24 days, your numbers can be skewed by differences in traffic and behaviour on weekends. If your site gets more traffic on certain days, that can also skew the results.
Your tests shouldn’t run for more than 4 weeks, because then you’re taking too long to find a winner – at that point you’re slowing down your test frequency, or maybe you simply don’t have enough conversions.
Don’t stop after just 1 week either, because then you can’t be sure the winner is statistically significant and a true winner.
To be sure your winner is not a false positive, you must reach a significance level of 95%.
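To make the 95% bar concrete, here is a quick two-sided, two-proportion z-test (normal approximation). The conversion counts are invented for illustration; A/B testing tools run a check like this for you.

```python
# A quick two-sided, two-proportion z-test (normal approximation)
# to check whether a result clears the 95% significance bar.
# Conversion counts below are invented for illustration.
import math
from statistics import NormalDist

def is_significant(conv_a, n_a, conv_b, n_b, confidence=0.95):
    """Return (significant?, p-value) for control A vs challenger B."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return p_value < (1 - confidence), p_value

# 2.0% vs 2.2% conversion on 50,000 visitors per variant:
print(is_significant(1000, 50_000, 1100, 50_000))
```

The same absolute difference on a tenth of the traffic would not clear the bar, which is why the run-time and sample-size rules above matter.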
Use this calculator to find out how long your tests should run: https://abtestguide.com/abtestsize/
Now that you have learnt how to plan and execute your tests, you should get started and create one simple test on Google Optimize using what you’ve learnt in this article.
The only thing left to study is what to do after the tests are done, which we will write in another blog post.
If you want to learn more about A/B Testing, join the A/B Testing Mastery Course from CXL Institute. This course is more than 5 hours long and goes in-depth into the ins and outs of A/B testing.
You will learn things like the maths behind identifying a winner, how to shorten the length of a test, how to monitor your tests and report the results, etc.
This is the most detailed and comprehensive course on A/B testing I have come across. It’s easy to follow, despite A/B testing being a complicated subject. My recommendation is to watch it multiple times; each time, you will grasp something you missed earlier.
If you have questions related to this article or the A/B testing master course by CXL, please leave a comment below or reach out to me.
🛡️ You just finished reading A/B Testing: A Game of Growth (Chapter 4) – a series of articles on growth marketing. To read the next article, click here 👉 Questions to Ask While Using Google Analytics: A Game of Growth (Chapter 5)