In this post we will talk about the consequences of ending your A/B test too soon. The idea came to me after seeing several A/B and MVT test case studies that were declared winners by the testing tools after running for only a few days. Then I got 'lucky' and came across this very good example myself, when one of the tests we were running turned out to be a winner the very next day. Mind you, this was a site with a high volume of traffic.
Let's have a look at this actual example. The day after we launched the test, our testing tool declared a winner: according to the tool, we had improved our conversion rate by a respectable 87.25% at a 100% confidence level. Great! Well, not really. What's the issue?
Technically, if you input the data (conversions and visits) into any statistical tool, it will show that this test was statistically valid, so there seems to be no issue here. The real issue is that the test didn't run for long enough.
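To make that concrete, here is a minimal sketch of the two-proportion z-test that sits behind most testing tools' 'statistically valid' verdict. The counts are made up for illustration (the raw numbers from our test aren't shown in this post), but they demonstrate how a single day of data can clear the significance bar:

```python
import math

# Hypothetical one-day counts -- NOT the real data from our test.
control_visits, control_conversions = 1000, 20
variant_visits, variant_conversions = 1000, 37

p1 = control_conversions / control_visits
p2 = variant_conversions / variant_visits

# Pooled conversion rate under the null hypothesis of no difference.
pooled = (control_conversions + variant_conversions) / (control_visits + variant_visits)
se = math.sqrt(pooled * (1 - pooled) * (1 / control_visits + 1 / variant_visits))

z = (p2 - p1) / se
# Two-sided p-value from the standard normal distribution.
p_value = math.erfc(abs(z) / math.sqrt(2))

print(f"lift: {(p2 - p1) / p1:.1%}, z = {z:.2f}, p = {p_value:.4f}")
```

With these invented numbers the variation shows a huge lift and p < 0.05 after one day, so any calculator would call it 'significant'. That is exactly the trap: significance says nothing about whether the day you measured is representative.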
If we had stopped the test then and patted each other on the back for how great we were, we would have made a very big mistake. The reason is simple: we hadn't yet tested our variation on Friday, Monday, or weekend traffic. But because we knew it was too early and let the test keep running, the actual result turned out to look very different.
The actual test result after 4 weeks of running was a 10.49% improvement at a 99% confidence level. The final result differs from the initial 'winning' result by -731.74%. How is this possible? Every day you receive different traffic to your website, and each day's traffic behaves differently too.
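As a quick sanity check on that figure, the -731.74% is simply the relative gap between the two reported lifts, measured against the final 4-week result:

```python
initial_lift = 87.25  # % improvement reported after one day
final_lift = 10.49    # % improvement after 4 weeks

# Relative difference, measured against the final (trustworthy) result.
relative_gap = (final_lift - initial_lift) / final_lift * 100
print(f"{relative_gap:.2f}%")  # -731.74%
```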
Now, back to the consequences if we had stopped this test then. Let's say you were running this test in checkout, and the following day you tell your boss, "Hey boss, we just increased our site revenue by 87.25%." If I were your boss, you would make me extremely happy, and I would probably increase your salary too. So we start celebrating, but at the end of the month, instead of having 87% more money in our bank account, we see the same money we had last month.
To avoid this type of blunder, always be patient and run your tests for a minimum of 2 weeks, with a recommended maximum of 6 weeks and a confidence level of no less than 95%. Also, once your testing tool declares a winning variation, don't stop your test immediately. Run it for another week to see if the result is solid. A solid winning variation should hold its winning status during this 'control' week. If it doesn't, then you haven't found your winning version.
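If you want a rough sense of why 2 weeks is a sensible floor, a standard sample-size approximation for comparing two proportions shows how much traffic a realistic lift actually requires. This is a sketch with made-up traffic and conversion numbers, using the usual z-values for 95% confidence and 80% power:

```python
import math

# Hypothetical inputs -- adjust these to your own site.
baseline_rate = 0.02          # current conversion rate (2%)
expected_lift = 0.10          # smallest relative lift worth detecting (10%)
daily_visitors_per_arm = 5000 # traffic each variation receives per day

p1 = baseline_rate
p2 = baseline_rate * (1 + expected_lift)

z_alpha = 1.96  # two-sided 95% confidence
z_beta = 0.84   # 80% power

# Classic approximation for the required sample size per variation.
n_per_arm = math.ceil(
    (z_alpha + z_beta) ** 2 * (p1 * (1 - p1) + p2 * (1 - p2)) / (p2 - p1) ** 2
)
days_needed = math.ceil(n_per_arm / daily_visitors_per_arm)

print(f"~{n_per_arm:,} visitors per arm, i.e. about {days_needed} days")
```

Even on a fairly busy site, detecting a modest 10% lift at these rates takes well over two weeks of traffic, which is another reason to treat one-day 'winners' like the 87.25% above with suspicion.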
If you test like this, you will keep bringing sustainable, solid improvements to the site and results you can rely on.