I spent a lot of money on Google in the RV industry (to the tune of $500,000/month for a certain Fortune 500 company) & there are a lot of issues with this post:
First up, his stats across the board aren't even close. His impressions vary by a factor of 4 across the different ads. From that alone, I bet he forgot to turn off Google's "optimize my ads" setting and switch on "rotate evenly".
The "optimize my ads" feature sounds like a good thing to leave checked, but like most defaults it's designed to make Google more money. It lets Google favor whichever ad pulls ahead early, so the ads never get an even shot and you can't figure out which one is actually superior. The data is tainted from the start.
Also,
> The .co domains have a performance penalty versus their .com brethren but the differences should be proportional.
Where do you get this from? In PPC I haven't seen any studies on this at all. I could understand .com leakage on .co domains for people typing it in directly out of habit, but Google doesn't penalize ads based on their TLD.
Not to take away from the strategy (it's a great idea & this is exactly how Tim Ferriss picked the name for his Four Hour Work Week back in 2007[1]), but there are a lot of flaws with this specific example.
What's it like being in a position to choose how $500k per month is spent? Wow. That's equivalent to about forty engineers' salaries per year.
EDIT: I visited your website and am not sure what to think. http://joelrunyon.com/ Is it true that those sorts of marketing tactics will convince people to entrust you with how to spend $500k/mo? (Personal branding is rather important. I was just a little surprised that those particular methods would be effective.)
I run a few businesses. My agency site [1] is where my marketing services are listed. I don't sell through the site - it's all word of mouth / client referrals.
I handled the account while I was still at a desk in agency-world (I've since ejected).
As for what it was like: when I stepped back to think about it, it was a little surreal. But after a while they're all just numbers, only with more digits. If you hit your main KPIs and keep things under control, it's fun to be able to scale things up with a client where you don't necessarily have to worry about cash flow.
Not the parent, but I managed and eventually helped lead the paid search group at one of the top search agencies. We had clients with 7 figure budgets all the way up to 9 figure budgets (or essentially unlimited as long as we were within certain parameters).
It's a bit scary and heart-attack-inducing if something goes majorly wrong (like going dark for a day because someone forgot to check the flight dates in the account after you took it over from the previous agency). But like the OP said, it just becomes numbers.
The cool thing is you not only get much better support and escalation paths at Google, but you can test things at a very large scale and get answers VERY quickly. Little things that might not make a huge difference on smaller accounts suddenly equate to tens or hundreds of thousands of dollars of difference in total revenue. But again, everything is relative, so the percentages might not change much.
Exactly - the fact that Google actually pays attention to you is a nice plus. If I had an issue with a campaign, I had a direct phone number for someone who could fix things.
That said, at that point support had already started to go downhill, and we found the "reps" usually knew very little (often less than we did) about the campaigns. Interestingly enough, we found that Bing reps were much more helpful, as they were eagerly trying to make up market share.
[EDIT]: You also get access to betas & other Google "tests" they're running with AdWords that most normal advertisers don't. Most of them are designed to make Google more money, but every once in a while they'd come up with some cool stuff that was really helpful.
Also, without diving into how your campaigns were structured (number of keywords, match types, etc.), it's tough to say that one domain will perform better long term, since your quality score at the ad and landing-page level could change depending on how well the keyword, ad, and landing page correlate.
The ads started on January 2 and took 6 days to hit 98% statistical confidence.
No! You have to pick the duration first, wait that amount of time, and only then analyze how much confidence you achieved. With random variation you'll get runs of positive or negative results of any length, so if you stop whenever the numbers look good, you can just wait until you get the result you want. Sorry, but tons of people make this mistake with A/B tests.
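To make the problem concrete, here's a quick back-of-the-envelope simulation of "peek every day and stop when it looks significant." The conversion rate, traffic, and test length are made-up numbers, not anything from the article; the point is that even when the two ads are identical, optional stopping will "find" a winner far more often than the 5% a fixed-horizon 95% test is supposed to allow.

    import math
    import random

    TRUE_RATE = 0.05        # assume both ads convert at exactly the same rate
    VISITORS_PER_DAY = 500  # per ad, per day (illustrative)
    DAYS = 20               # number of daily "peeks"
    Z_CUTOFF = 1.96         # ~95% two-sided significance

    def peeking_trial(rng):
        """Run one fake A/B test, stopping the moment it looks significant."""
        conv_a = conv_b = n = 0
        for _ in range(DAYS):
            n += VISITORS_PER_DAY
            conv_a += sum(rng.random() < TRUE_RATE for _ in range(VISITORS_PER_DAY))
            conv_b += sum(rng.random() < TRUE_RATE for _ in range(VISITORS_PER_DAY))
            p_a, p_b = conv_a / n, conv_b / n
            pooled = (conv_a + conv_b) / (2 * n)
            se = math.sqrt(pooled * (1 - pooled) * 2 / n)
            if se > 0 and abs(p_a - p_b) / se > Z_CUTOFF:
                return True   # declared a "winner" -- a false positive by construction
        return False          # never significant, which is the right answer here

    rng = random.Random(42)
    trials = 2000
    hits = sum(peeking_trial(rng) for _ in range(trials))
    print(f"false positive rate with daily peeking: {hits / trials:.1%}")
    # typically lands well above 5%, even though the two "ads" are identical

If you instead tested exactly once, at the end of the 20 days, the same simulation comes in at roughly the 5% you'd expect. That's the whole point of fixing the horizon up front.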
So since Optimizely basically lets the test run until it "reaches statistical significance" with their Chance to Beat Baseline number, would you say they are doing it wrong?
I've often run tests that seemed to be at a 90%+ chance to beat baseline, but the graphs just didn't look finished or like they had enough data, so I'd let them run a bit longer. Sure enough, the percent lift would swing around, the confidence would drop back below 90% (sometimes MUCH lower), and occasionally it would come back up.
Weird... I always thought that if you run a split test in parallel (all branches at the same time), you can figure out the number of samples needed to compare the branches with statistical confidence. It makes sense to me: as the number of samples in each branch increases, the distribution shifts from a binomial to a Gaussian by the central limit theorem, and that happens around 1,000 samples with a reasonable conversion rate.

Then you're just comparing Gaussians centered around the mean conversion value, with a width that shrinks as the sample size grows (roughly 1/sqrt(n)). Taking the difference between the two Gaussians gives you the "chance to be different". Standard practice is to wait until one branch has a 95% chance of being better and then declare it the winner. This controls for false positives, which is usually what you're worried about; false negatives don't matter that much when it comes to things like picking a name.
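To put rough numbers on that, here's what the "chance to beat baseline" style comparison looks like under the normal approximation. The conversion counts below are hypothetical, just to show the mechanics.

    from math import sqrt
    from statistics import NormalDist

    def chance_to_beat(conv_b, n_b, conv_a, n_a):
        """Approximate P(variant B's true rate > variant A's) using
        independent normal approximations to the two binomials."""
        p_a, p_b = conv_a / n_a, conv_b / n_b
        var_a = p_a * (1 - p_a) / n_a
        var_b = p_b * (1 - p_b) / n_b
        diff = NormalDist(mu=p_b - p_a, sigma=sqrt(var_a + var_b))
        return 1 - diff.cdf(0)   # probability the difference in rates is positive

    # hypothetical counts: 60/1000 conversions for B vs 45/1000 for A
    print(f"{chance_to_beat(60, 1000, 45, 1000):.1%}")   # about 93%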
Thanks for the link to the blog post. It raised an important point worthy of inspection. I ran some numbers and "peeking" after the first 1000 trials does change the outcome. The chance that the outcome will reverse from declaring branch A the winner with 95% confidence to declaring branch B the winner with 95% confidence is rather small, less than 10%. However, if you lower your requirements to 80% confidence then the chance of the winner swapping increases to over 50%! For reference, I used the Wilson approximation for binomial distributions. I'm sure the Wald approximation fares worse.
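For anyone curious, the Wilson score interval I mentioned is just a better-behaved confidence interval for a binomial proportion than the plain Wald interval, especially at small counts. A minimal version (the example counts are made up):

    from math import sqrt

    def wilson_interval(successes, trials, z=1.96):
        """Wilson score interval for a binomial proportion (z=1.96 -> ~95%)."""
        p_hat = successes / trials
        denom = 1 + z * z / trials
        centre = (p_hat + z * z / (2 * trials)) / denom
        margin = (z / denom) * sqrt(p_hat * (1 - p_hat) / trials
                                    + z * z / (4 * trials * trials))
        return centre - margin, centre + margin

    # hypothetical: 40 conversions out of 1,000 visitors
    print(wilson_interval(40, 1000))   # roughly (0.029, 0.054)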
Interesting, this is a weakness of significance testing, in particular of its parametric model. Using Bayesian inference you would be able to look early without messing up your results.
Interesting. As a thought experiment, what if I had run a sample size estimator beforehand, concluded that I needed 4,000 data points, and estimated it would take 6 days?
In this alternate reality, I began the experiment on January 2 and ended it on January 8. Would that render the invalid experiment valid?
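For what it's worth, here's the kind of up-front sample size estimate I mean, using a standard two-proportion power calculation. The baseline rate, lift, and power below are assumptions for illustration, not numbers from the original test.

    from math import sqrt, ceil
    from statistics import NormalDist

    def sample_size_per_variant(p_base, p_variant, alpha=0.05, power=0.80):
        """Visitors needed per variant for a two-sided test of two proportions."""
        z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
        z_beta = NormalDist().inv_cdf(power)
        p_bar = (p_base + p_variant) / 2
        num = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
               + z_beta * sqrt(p_base * (1 - p_base) + p_variant * (1 - p_variant))) ** 2
        return ceil(num / (p_base - p_variant) ** 2)

    # e.g. detecting a jump from a 4% to a 6% conversion rate (hypothetical)
    print(sample_size_per_variant(0.04, 0.06), "visitors per variant")   # 1,863

The exact formula matters less than committing to the sample size before peeking at any results.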
The numbers were crystal clear though: bounce rate was the same for all the domains, indicating that users were getting what they expected after clicking through.
Depending on the keywords you targeted in the initial campaign, you may run into issues scaling your campaigns out to phrase and BMM match keywords, as "rv recipes" and RV menu creation seem to be things people are searching for quite a bit.
Also, conversion rate is a better overall metric than bounce rate (though it's harder to test from the start).
I'm going to restart the tests using three domains and let them run over an extended period so I can see whether I am, in fact, getting users who are searching for meals vs. rentals.
Why wouldn't you get the domain with the most impressions and then work to improve the click-thru rate? (other than Joel's great point about Google controlling impressions)
Because you have no reason to assume that the one with the most impressions is the top-performing ad.
Because of the way impression share is distributed (and because we don't know what ad rotation setting the OP had on), the ad with the most impressions might only have them because Google decided to show it more, not because it's actually superior on any real performance metric.
So, were these ads actually appearing evenly across the same sites or search results, and in the same positions? If rvmenu.com appeared more frequently in one position than the others, that would skew the entire set of results.
I'm not convinced by the data you posted. The sample is small, and there are a lot of unknowns in the testing parameters. I wouldn't advise anyone to take this approach when selecting a domain.
Perhaps ... "rv menu" appears to perform well because people looking for food menus for their RV trips are happy to just _look_ at RVs? Did they mainly click the expensive ones? [That's what I'd do: "ooh, fancy, I'd like one of those."]
Interest, and even clicks, don't mean revenue will follow unless they turn into actual sales.
Now, it could also be that you can sell RV hire to people researching an RV trip via "RV menu" ...
[1] http://boingboing.net/2010/10/25/howto-use-google-adw.html