Multivariate and User testing – What they don’t tell you!

We’ve recently undergone some rather intensive user testing and are following it up with a similarly intensive multivariate test. For anyone new to the field, multivariate testing is a method through which you can test out various configurations of your website (different images, different text, different colours etc.) in order to see which combination works best. This is not to be confused with A/B testing, which is where you have two or more completely separate pages and test those against each other: here we are talking about one page, with various sections which get swapped in and out.

There are various tools available to implement this testing – some of which are free. By far the best free tool I’ve been exposed to (and the one I’m using for my current testing) is Google Website Optimizer (GWO). If your company is anything like mine, they’ll basically entrust all of the consultation and advice-giving to an external marketing company who will sit there stating the obvious for days on end before letting you do any actual work. Assuming that part is out of the way, there are many things that you can really only learn by actually doing multivariate and user testing.

This is therefore a guide from a developer’s perspective, including the gotchas and some of the bitter truths you’ll need to face when you start a test. What this article will not do is shower you in little “secret tips” like many guides of this type profess to do. Without the aid of clairvoyance, it is impossible to know anything before you test. If it were, there would be no need for any of this. By all means go and read the psychic ramblings of some tight-jeaned hipster, but when you’re done, come back here for the truth 🙂

There will be loud voices

Multivariate testing is very, very interesting, and for this reason it will attract the attention of every marketing person who is involved. It is a very simple topic to grasp in principle, but is very deceptive in that respect. The bits that look hard, such as actually implementing the test itself in code, are actually very easy. The bits that look easy, such as deciding what to test and commenting on the results, are actually incredibly difficult. Herein lies the problem!

If you take the time to try and explain this difference, you’ll probably find that a queue of people forms, all more than willing to “take on” that mantle, naively believing that they possess some kind of inner knowledge that nobody else has which allows them to instinctively know about strategies and analytics. I’m not in any way suggesting you go to the mattresses with your marketing team, but anyone who steps up and immediately starts making suggestions based on hunches and feelings basically needs to be subdued asap!

The important thing to take away from this is that these strategies are all very well documented; even Google’s site itself is an absolute goldmine for research. I’m not going to go into all that here, because the information is out there already. All I will say is that you should use these resources and make sure anyone involved uses them too.

Multivariate testing can take place over multiple pages

GWO is capable of running tests over multiple pages. Let’s say, for example, you wanted to test the logo on your site, and you had four different ideas for a design. The logo appears on every single page on your site, but during the GWO test setup wizard it will all be explained in the context of this happening on one page.

All you need to do in this case is pick one page on which the logo (or whatever it is you’re testing) appears and do the setup on that page. As long as the JavaScript tags surround that section across your entire site, it’ll make it into the test. The tracking script should also appear on every page. The only one you don’t want to duplicate is the conversion script, which should only appear on your conversion page.

Exclude some IP ranges

In our organization, we have a call centre who take orders over the phone. When an order comes through, they place the order using our website. At all costs, you need to exclude these people from the tests! Let’s say you have one really good call centre employee (for the purposes of this demonstration, we’ll call him “Matthew”). Matthew is so good, he converts every single call he gets into a sale. Upon the first visit he makes to your new “multivariate-test enabled” site, Matthew is given a cookie indicating one of your test configurations.

Over time, this configuration appears to be winning – but it’s actually not. The customer on the end of the phone cannot see what Matthew sees, and Matthew is going to make the sale whatever the site looks like. Your data is essentially contaminated. Equally, you’ll be contaminating it yourself every time you visit the site for other development work (your data entry people, for example, will be given configurations which they may never use to book but which are still contributing to your statistics).

The way to avoid this is to exclude the tracking and conversion scripts for your internal people. How you do this depends on your own organization’s infrastructure. I managed to achieve it by excluding the tracking and conversion scripts using a Placeholder control which showed/hid them based on IP address. Remember to exclude your “home” IP address as well, otherwise you’ll be contaminating the data from your development machine too.
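As a rough illustration of the idea (not the Placeholder control described above), here is a minimal Python sketch of the decision itself: given a visitor’s IP address, should the tracking and conversion scripts be rendered at all? The network ranges and the function name `should_render_test_scripts` are made up for the example; you would substitute your own call centre, office and home addresses.

```python
import ipaddress

# Hypothetical internal ranges, purely for illustration. Replace them with
# your own call centre, office and "home" addresses.
INTERNAL_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("192.168.0.0/16"),
    ipaddress.ip_network("203.0.113.0/24"),  # stand-in for an office's public range
]


def should_render_test_scripts(client_ip: str) -> bool:
    """Return True when the visitor is external and should count in the test."""
    try:
        address = ipaddress.ip_address(client_ip)
    except ValueError:
        # An address we cannot parse is left out of the test to be safe.
        return False
    # Internal visitors never get the tracking or conversion scripts.
    return not any(address in network for network in INTERNAL_NETWORKS)
```

Whatever templating system you use, the point is the same: the check wraps both the tracking script and the conversion script, so an internal visitor never enters the statistics at either end.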

If you do this, you’ll lose (internally) the ability to preview your combinations in GWO. A small price to pay for clean data.

Don’t run more than one test at a time

GWO will let you run multiple tests at once. Don’t do it! Imagine one test was to check out various logo designs, while another was to test out various basket pages. You have no way of knowing, from the results, how one of the tests has interacted with the other. For example, you might end up completely serendipitously with every customer who sees the good logo ending up on the crap basket page, or vice versa, which will completely skew your results. You don’t want this!

Multivariate testing can take a long time

You can test multiple sections on your page, and each section can have multiple variations. You therefore end up with a number of “combinations” for the page (blue logo with yellow button, yellow logo with blue button…). Even if you keep it as simple as a test with six sections, each with one variation alongside the original, you are left with a whopping 64 separate combinations (two options in each of six sections, so 2 × 2 × 2 × 2 × 2 × 2 = 64)!
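To see where that number comes from, here is a small Python sketch that counts and lists the combinations for a set of sections. The section names are invented for the example; each section has the original plus one variation.

```python
from itertools import product
from math import prod


def count_combinations(options_per_section):
    """Multiply together the number of options in each section."""
    return prod(len(options) for options in options_per_section.values())


def list_combinations(options_per_section):
    """Build every combination as a dict of section name -> chosen option."""
    names = list(options_per_section)
    return [
        dict(zip(names, choice))
        for choice in product(*options_per_section.values())
    ]


# Six hypothetical sections, each with the original and one variation.
section_names = ["logo", "headline", "hero_image", "button", "copy", "footer"]
sections = {name: ["original", "variation"] for name in section_names}

print(count_combinations(sections))  # 64
```

Add a second variation to just one of those sections and the count goes from 64 to 96, which is why it pays to be stingy with what you test.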

How long it will be before you see results will very much depend on how much traffic you are getting. GWO itself includes an intelligent recommendation system and my advice would essentially be to follow it to the letter. It can be very tempting to make assumptions early on – after a week it can look like you have a clear winner. LEAVE IT.

Tests can take months before any conclusive trends emerge. The process can be sped up through various methods. GWO, for example, includes a big-brother style eviction facility, where poorly performing combinations can basically be excluded from the test if their conversion rates become obviously worse than the rest. The main thing to take away is that if you are looking for “quick wins” then you shouldn’t be using multivariate tests to identify them – it takes time and if you stop them early you’ll gain nothing.
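For a feel of why the wait is so long, here is a back-of-the-envelope Python sketch. To be clear, this is not how GWO arrives at its recommendation; it is the standard textbook sample-size approximation for comparing two conversion rates, and the function names, the 5% significance level and the 80% power are all assumptions made for the example. It also ignores the extra caution you need when comparing many combinations at once, so treat the answer as optimistic.

```python
from math import ceil
from statistics import NormalDist


def visitors_per_combination(baseline_rate, relative_uplift, alpha=0.05, power=0.8):
    """Approximate visitors needed per combination to detect the given uplift."""
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_uplift)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance
    z_beta = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)


def days_to_run(combinations, daily_visitors, baseline_rate, relative_uplift):
    """Rough test duration if traffic is split evenly across combinations."""
    needed = combinations * visitors_per_combination(baseline_rate, relative_uplift)
    return ceil(needed / daily_visitors)
```

With a 5% baseline conversion rate and a hoped-for 10% relative uplift, `visitors_per_combination(0.05, 0.10)` comes out on the order of 30,000 visitors for every single combination. Multiply that by 64 combinations and divide by your daily traffic, and “months” starts to look generous.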

One piece of advice I was given a while back is “test little, test often”. I wouldn’t say that was a hard and fast rule, but if you do test lots, don’t test often.

It might not just be the design

Marketing people are often very, very quick to blame low conversion rates on the look/design of a website. While this can make a difference, it is not the only avenue worth exploring. If you’re finding that every design test you do results in little or no improvement, try testing things like images or, most importantly, copy.

The marketing people where I work are quite big fans of flowery language and coupling up every word with an adjective. They are also fans of waxing lyrical about how much of a saving you will make if you buy OUR product, and how WE want you to be satisfied – it’s all very self-obsessed and needlessly long. I managed to convince them to run a test to change to purely factual copy, which tells the customer just what they’re getting and doesn’t pad it out with extra words. This version is currently showing a 15% uplift – I would therefore hypothesize that people ARE reading the copy and they JUST want to know what they’re getting.

Be prepared to be shocked

It’s a bitter pill to swallow, but sometimes you just need to accept that the “rubbish version” is the version you need to use. You don’t go into e-commerce to gain the admiration of your peers or to embarrass your competitors: you go in to sell more kit than they do. Along with the multivariate testing we undertook some user testing – basically this included a set of users coming in and using our site. Unbeknownst to them, we were sitting just up the hall watching their every move, and frankly we couldn’t believe some of the crap and abuse they were coming out with.

You are a web-person. You aren’t most people – most people are idiots. This forces us to the terrible conclusion that all of the beloved best practices and modern techniques that you spend many hours learning may need to be thrown in the bin. If a big ugly border, heavy images and comic sans sells more kit, you basically need to use them. Ignoring this fact is like pissing in dark trousers – you get a nice warm feeling but nobody notices, and you smell afterwards.

There will also be times when you just need to accept that what you changed (the button colour, the size of the image etc.) simply makes absolutely no difference at all. Multivariate tests do take a long time to yield definitive results, but I have experienced on many occasions that sometimes, the customer just doesn’t care.

Google might not be able to see your site

We had an issue with one of our sites where basically the GWO wizard went crazy and said it couldn’t see the site. If you get this, just proceed on the assumption that it can. The way we got around it was basically to upload dummy Notepad files containing the scripts to the Google server (although, if you do this, be sure you’ve installed all the scripts okay for real). In fact, in the case of most conversion pages, you’ll want to do this anyway as generally you can’t just “visit” the thank you page.

Doing it this way around has no negative impact on the test at all.

Don’t expect miracles

Multivariate testing is an invaluable resource in improving your conversion rate – however it does require a LOT of patience. You will hear many folk stories about web developers who increased their conversions by 20 or 30 percent from changing the colour of a button. In my experience, this is not normally the case. More realistically you’ll be looking at a 1 or 2 percent increase, which you then implement and move on to the next test. Innovation can come over time; it doesn’t necessarily need to be an instant “boom” moment.

This is why it’s also important not to “peek” at the results. Make no mistake – when you first run the test, the first week will show massive improvements with a clear winner. These statistics should be disregarded at all costs – there exists an element of luck in which users get which combinations and some will work for some and not for others. In a nutshell, those combinations that appear to be streaking into the lead initially could just as easily end up being the absolute losers. This brings me back to the earlier point about not making any drastic decisions too early – you need to let the test settle down.

That’s not to say your 20/30 percent increase won’t happen, but if you think it has happened you need to be damn sure about it before taking action.
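If you want a number to put on “damn sure”, one common option is a two-proportion z-test. The Python sketch below is a generic statistics-textbook check, not GWO’s own reporting, and the function name and the figures in the usage note are made up for illustration. It returns a p-value: roughly, how likely a gap at least this big would be if the two combinations actually converted equally well.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_p_value(conv_a, visitors_a, conv_b, visitors_b):
    """Two-sided p-value for the difference between two conversion rates."""
    rate_a = conv_a / visitors_a
    rate_b = conv_b / visitors_b
    # Pooled rate under the assumption that both combinations perform the same.
    pooled = (conv_a + conv_b) / (visitors_a + visitors_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / visitors_a + 1 / visitors_b))
    if std_err == 0:
        # No conversions at all (or nothing but conversions): nothing to compare.
        return 1.0
    z = (rate_b - rate_a) / std_err
    return 2 * (1 - NormalDist().cdf(abs(z)))
```

Take a combination showing a 30% uplift in its first week, say 13 sales from 200 visitors against 10 from 200. `two_proportion_p_value(10, 200, 13, 200)` is well above the usual 0.05 threshold, so that “clear winner” is entirely consistent with luck. The same 30% gap at 1,300 sales from 20,000 visitors against 1,000 from 20,000 gives a p-value far below 0.001. Note that running this check every day and stopping the moment it dips under the threshold is just peeking with extra steps.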

Be prepared for inconclusive results

While the online world of “influential authorities” seems to be falling over itself to tell you about the importance of testing, in reality you will probably find that most of your tests actually end up being inconclusive (or showing very little difference). One of the tests we performed recently was a copy test on a product page which was barely English – it was re-written to very high standards and a test was performed. After 3 months, there was very little between the variations, with the original ever so slightly in the lead. There are two conclusions we could draw from this:

  • Illegible, grammatically inaccurate copy doesn’t matter at all
  • Illegible, grammatically inaccurate copy doesn’t matter to our customers

For whatever reason, be it lack of traffic volumes or just that the customer is only focused on price, there will be cases where no-brainer tests (which marketing people will force you to implement all the time, by the way) will fail to perform.


I’m probably going to re-visit this article over time as I learn more, but hopefully this should assist anyone who is undertaking multivariate testing in a practical environment.

