Experiment Driven Web Publishing
Do users find big headlines more relevant? Does using long text lead to more, or less, visitor engagement? Is that latest change to the shopping cart going to make things worse? Are your links just the right shade of blue?
If you want to put an end to tiresome subjective arguments about page length, or the merits of your clients latest idea, which is to turn their website pink, then adopting an experimental process for web publishing can be a good option.
If you don’t currently use an experiment-driven publishing approach, then this article is for you. We’ll look at ways to bake experiments into your web site, the myriad of opportunities testing creates, how it can help your SEO, and ways to mitigate cultural problems.
Controlled Experiments
The merits of any change should be derived from the results of the change under a controlled test. This process is common in PPC, however many SEO’s will no doubt wonder how such an approach will affect their SEO.
Well, Google encourages it.
We’ve gotten several questions recently about whether website testing—such as A/B or multivariate testing—affects a site’s performance in search results. We’re glad you’re asking, because we’re glad you’re testing! A/B and multivariate testing are great ways of making sure that what you’re offering really appeals to your users
Post-panda, being more relevant to visitors, not just machines, is important. User engagement is more important. If you don’t closely align your site with user expectations and optimize for engagement, then it will likely suffer.
The new SEO, at least as far as Panda is concerned, is about pushing your best quality stuff and the complete removal of low-quality or overhead pages from the indexes. Which means it’s not as easy anymore to compete by simply producing pages at scale, unless they’re created with quality in mind. Which means for some sites, SEO just got a whole lot harder.
Experiments can help us achieve greater relevance.
If It ‘Aint Broke, Fix It
One reason for resisting experiment-driven decisions is to not mess with success. However, I’m sure we all suspect most pages and processes can be made better.
If we implement data-driven experiments, we’re more likely to spot the winners and losers in the first place. What pages lead to the most sales? Why? What keywords are leading to the best outcomes? We identify these pages, and we nurture them. Perhaps you already experiment in some areas on your site, but what would happen if you treated most aspects of your site as controlled experiments?
We also need to cut losers.
If pages aren’t getting much engagement, we need to identify them, improve them, or cut them. The Panda update was about levels of engagement, and too many poorly performing pages will drag your site down. Run with the winners, cut the losers, and have a methodology in place that enables you to spot them, optimize them, and cut them if they aren’t performing.
Testing Methodology For Marketers
Tests are based on the same principles used to conduct scientific experiments. The process involves data gathering, designing experiments, running experiments, analyzing the results, and making changes.
1. Set A Goal
A goal should be simple i.e. “increase the signup rate of the newsletter”.
We could fail in this goal (decreased signups), succeed (increased signups), or stay the same. The goal should also deliver genuine business value.
There can be often multiple goals. For example, “increase email signups AND Facebook likes OR ensure signups don’t decrease by more than 5%”. However, if you can get it down to one goal, you’ll make life easier, especially when starting out. You can always break down multiple goals into separate experiments.
2. Create A Hypothesis
What do you suspect will happen as a result of your test? i.e. “if we strip all other distractions from the email sign up page, then sign-ups will increase”.
The hypothesis can be stated as an improvement, or preventing a negative, or finding something that is wrong. Mostly, we’re concerned with improving things - extracting more positive performance out of the same pages, or set of pages.
“Will the new video on the email sign-up page result in more email signups?” Only one way to find out. And once you have found out, you can run with it or replace it safe in the knowledge it's not just someone's opinion. The question will move from “just how cool is this video!” (subjective) to “does this video result in more email sign-ups?”. A strategy based on experiments eliminates most subjective questions, or shifts them to areas that don’t really affect the business case.
The video sales page significantly increased the number of visitors who clicked to the price/guarantee page by 46.15%....Video converts! It did so when mentioned in a “call to action” (a 14.18% increase) and also when used to sell (35% and 46.15% increases in two different tests)
When crafting a hypothesis, you should keep business value clearly in mind. If the hypothesis suggests a change that doesn’t add real value, then testing it is likely a waste of time and money. It creates an opportunity cost for other tests that do matter.
When selecting areas to test, you should start by looking at the areas which matter most to the business, and the majority of users. For example, an e-commerce site would likely focus on product search, product descriptions, and the shopping cart. The About Page - not so much.
Order areas to test in terms of importance and go for the low hanging fruit first. If you can demonstrate significant gains early on, then it will boost your confidence and validate your approach. As experimental testing becomes part of your process, you can move on more granular testing. Ideally, you want to end up with a culture whereby most site changes have some sort of test associated with them, even if it’s just to compare performance against the previous version.
Look through your stats to find pages or paths with high abandonment rates or high bounce rates. If these pages are important in terms of business value, then prioritize these for testing. It’s important to order these pages in terms of business value, because high abandonment rates or bounce rates on pages that don’t deliver value isn’t a significant issue. It’s probably more a case of “should these pages exist at all”?
3. Run An A/B or Multivariate Test
Two of the most common testing methodologies in direct response marketing are A/B testing and multivariate testing.
A/B Testing, otherwise known as split testing, is when you compare one version of a page against another. You collect data how each page performs, relative to the other.
Version A is typically the current, or favored version of a page, whilst page B differs slightly, and is used as a test against page A. Any aspect of the page can be tested, from headline, to copy, to images, to color, all with the aim of improving a desired outcome. The data regarding performance of each page is tested, the winner is adopted, and the loser rejected.
Multivariate testing is more complicated. Multivariate testing is when more than one element is tested at any one time. It’s like performing multiple A/B tests on the same page, at the same time. Multivariate testing can test the effectiveness of many different combinations of elements.
Which method should you use?
In most cases, in my experience, A/B testing is sufficient, but it depends. In the interest of time, value and sanity, it’s more important and productive to select the right things to test i.e. the changes that lead to the most business value.
As your test culture develops, you can go more and more granular. The slightly different shade of blue might be important to Google, but it’s probably not that important to sites with less traffic. But, keep in mind, assumptions should be tested ;) Your mileage may vary.
There are various tools available to help you run these test. I have no association with any of these, but here’s a few to check out:
4. Ensure Statistical Significance
Tests need to show statistical significance. What does statistically significant mean?
For those who are comfortable with statistics:
Statistical significance is used to refer to two separate notions: the p-value, the probability that observations as extreme as the data would occur by chance in a given single null hypothesis; or the Type I error rate α (false positive rate) of a statistical hypothesis test, the probability of incorrectly rejecting a given null hypothesis in favor of a second alternative hypothesis
For those of you, like me, who prefer a more straightforward explanation. Here’s also a good explanation in relation to PPC, and a video explaining statistical significance in reference in A/B test.
In short, you need enough visitors taking an action to decide it is not likely to have occurred randomly, but is most likely attributable to a specific cause i.e. the change you made.
5. Run With The Winners
Run with the winners, cut the losers, rinse and repeat. Keep in mind that you may need to retest at different times, as the audience can change, or their motivations change, depending on underlying changes in your industry. Testing, like great SEO, is best seen as an ongoing process.
Make the most of every visitor who arrives on your site, because they’re only ever going to get more expensive.
Here’s an interesting seminar where the results of hundreds of experiments were reduced down to three fundamental lessons:
- a) How can I increase specify? Use quantifiable, specific information as it relates to the value proposition
- b) How can I increase continuity? Always carry across the key message using repetition
- c) How can I increase relevance? Use metrics to ask “why”
Tests Fail
Often, tests will fail.
Changing content can sometimes make little, if any, difference. Other times, the difference will be significant. But even when tests fail to show a difference, it still gives you information you can use. These might be areas in which designers, and other vested interests, can stretch their wings, and you know that it won’t necessarily affect business value in terms of conversion.
Sometimes, the test itself wasn't designed well. It might not have been given enough time to run. It might not have been linked to a business case. Tests tend to get better as we gain more experience, but having a process in place is the important thing.
You might also find that your existing page works just great and doesn’t need changing. Again, it’s good to know. You can then try replicating this successes in areas where the site isn’t performing so well.
Enjoy Failing
“Fail fast, early and fail often”.
Failure and mistakes are inevitable. Knowing this, we put mechanisms in place to spot failures and mistakes early, rather than later. Structured failure is a badge of honor!
Thomas Edison performed 9,000 experiments before coming up with a successful version of the light bulb. Students of entrepreneurship talk about the J-curve of returns: the failures come early and often and the successes take time. America has proved to be more entrepreneurial than Europe in large part because it has embraced a culture of “failing forward” as a common tech-industry phrase puts it: in Germany bankruptcy can end your business career whereas in Silicon Valley it is almost a badge of honour
Silicon Valley even comes up with euphemisms, like “pivot”, which weaves failure into the fabric of success.
Or perhaps it’s because some of the best ideas in tech today have come from those that weren’t so good. (Remember, Apple's first tablet devices was called the Newton.)
There’s a word used to describe this get-over-it mentality that I heard over and over on my trip through Silicon Valley and San Francisco this week: “Pivot“
Experimentation, and measuring results, will highlight failure. This can be a hard thing to take, and especially hard to take when our beloved, pet theories turn out to be more myth than reality. In this respect, testing can seem harsh and unkind. But failure should be seen for what it is - one step in a process leading towards success. It’s about trying stuff out in the knowledge some of it isn’t going to work, and some of it will, but we can’t be expected to know which until we try it.
In The Lean Startup, Eric Ries talks about the benefits of using lean methodologies to take a product from not-so-good to great, using systematic testing”
If your first product sucks, at least not too many people will know about it. But that is the best time to make mistakes, as long as you learn from them to make the product better. “It is inevitable that the first product is going to be bad in some ways,” he says. The Lean Startup methodology is a way to systematically test a company’s product ideas.
Fail early and fail often. “Our goal is to learn as quickly as possible,” he says
Given testing can be incremental, we don’t have to fail big. Swapping one graphic position for another could barely be considered a failure, and that’s what a testing process is about. It’s incremental, and iterative, and one failure or success doesn’t matter much, so long as it’s all heading in the direction of achieving a business goal.
It’s about turning the dogs into winners, and making the winners even bigger winners.
Feel Vs Experimentation
Web publishing decisions are often based on intuition, historical precedence - “we’ve always done it this way” - or by copying the competition. Graphic designers know about colour psychology, typography and layout. There is plenty of room for conflict.
Douglas Bowden, a graphic designer at Google, left Google because he felt the company relied too much on data-driven decisions, and not enough on the opinions of designers:
Yes, it’s true that a team at Google couldn’t decide between two blues, so they’retesting 41 shades between each blue to see which one performs better. I had a recent debate over whether a border should be 3, 4 or 5 pixels wide, and was asked to prove my case. I can’t operate in an environment like that. I’ve grown tired of debating such minuscule design decisions. There are more exciting design problems in this world to tackle.
That probably doesn’t come as a surprise to any Google watchers. Google is driven by engineers. In Google’s defense, they have such a massive user base that minor changes can have significant impact, so their approach is understandable.
Integrate Design
Putting emotion, and habit, aside is not easy.
However, experimentation doesn’t need to exclude visual designers. Visual design is valuable. It helps visitors identify and remember brands. It can convey professionalism and status. It helps people make positive associations.
But being relevant is also design.
Adopting an experimentation methodology means designers can work on a number of different designs and get to see how the public really does react to their work. Design X converted better than design Y, layout Q works best for form design, buttons A, B and C work better than buttons J, K and L, and so on. It’s a further opportunity to validate creative ideas.
Cultural Shift
Part of getting experimentation right has to do with an organizations culture. Obviously, it’s much easier if everyone is working towards a common goal i.e. “all work, and all decisions made, should serve a business goal, as opposed to serving personal ego”.
All aspects of web publishing can be tested, although asking the right questions about what,to test is important. Some aspects may not make a measurable difference in terms of conversion. A logo, for example. A visual designer could focus on that page element, whilst the conversion process might rely heavily on the layout of the form. Both the conversion expert and the design expert get to win, yet not stamp on each others toes.
One of the great aspects of data-driven decision making is that common, long-held assumptions get challenged, often with surprising results. How long does it take to film a fight scene? The movie industry says 30 days.
Mark Walberg challenged that assumption and did it in three:
Experts go with what they know. And they’ll often insist something needs to take a long time. But when you don’t have tons of resources, you need to ask if there’s a simpler, judo way to get the impact you desire. Sometimes there’s a better way than the “best” way. I thought of this while watching “The Fighter” over the weekend. There’s a making of extra on the DVD where Mark Wahlberg, who starred in and produced the film, talks about how all the fight scenes were filmed with an actual HBO fight crew. He mentions that going this route allowed them to shoot these scenes in a fraction of the time it usually takes
How many aspects of your site are based on assumption? Could those assumptions be masking opportunities or failure?
Winning Experiments
Some experiments, if poorly designed, don’t lead to more business success. If an experiment isn’t focused on improving a business case, then it’s probably just wasted time. That time could have been better spent devising and running better experiments.
In Agile software design methodologies, the question is always asked “how does this change/feature provide value to the customer”. The underlying motive is “how does this change/feature provide value to the business”. This is a good way to prioritize test cases. Those that potentially provide the most value, such as landing page optimization on PPC campaigns, are likely to have a higher priority than, say, features available to forum users.
Further Reading
I hope this article has given you some food for thought and that you'll consider adopting some experiment-based processes to your mix. Here's some of the sources used in this article, and further reading:
Comments
Cheers! Your most recent blog post, authored by Peter Da Vanzo, entitled "Experiment Driven Web Publishing" surpasses the new reader's expectations of what an SEO blog post should be. Importantly, we learned to experiment with our blogs from your well written content. We were reminded to produce quality over quantity. In essence you have told us that the scientific method works here, just like it did in 9th grade physics. Come up with a hypothesis, and test it out, in some controlled way. Turn what might seem only subjective, to something that is objective. Again, cheers to you and your SeoBook team, Mr. Wall. http://youtu.be/BZCq1hFbFT8
Manual comment approval> Auto-approve in the SEO biz.
...comment spam, in part because we require account registration to post comments & in part because we are generally pretty quick at cleaning most of it up.
The above comments that were deleted were only live for a fraction of a day.
Have you looked at KayJay's actual comment? Not sure of his game plan with it. It feels* automated......but there are moments of non-automation and a weird video to go?
...it looked a bit weird to me too, but is somewhere in that plausible gray area. There were 3 or 4 other people (bots?) who posted overt link drops that I deleted already.
Add new comment