
Would it be safe to say that A/B testing should only be used for things that can "afford to fail/be wrong"? I'm not saying don't test in other cases, but that this particular technique is not a good fit for situations where the consequences of a mistake are high. Some other kind of validation (rigorous or not) is necessary in those instances.



IMO it's only safe to A/B test things where the product vision doesn't matter one bit. Stuff like where to place the button on the screen to drive conversions on a landing page or what copy to use to target X market segment.

Edit: A better way to put that might be - only use A/B testing to make tactical decisions, but never use it to decide strategy or logistics.


What's the threshold for "affording to be wrong"?

A/B testing is a tool. It gives you data, not answers.

Would I make a color change off of 100 data points and a p-value of 0.05? Well, multiply the risk of being wrong by the cost of being wrong. A 1-in-20 chance of being wrong isn't terrible when the cost of being wrong is low.

Would I make sweeping changes to my sales funnel at n=100 and p=0.05? Maybe not. At n=10000 and p=0.001? Slam dunk, yes. (And you probably should have halted the test earlier... imagine all the potential customers who got the "bad" leg and bounced.)
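
To make that concrete, here's my own back-of-the-envelope sketch (made-up numbers, plain Python, and loosely treating the p-value as the "risk of being wrong," which is the hand-wavy framing above rather than a strict interpretation):

    import math

    def two_proportion_p_value(conv_a, n_a, conv_b, n_b):
        """One-sided p-value that B's conversion rate beats A's (pooled z-test)."""
        p_a, p_b = conv_a / n_a, conv_b / n_b
        pooled = (conv_a + conv_b) / (n_a + n_b)
        se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
        z = (p_b - p_a) / se
        return 0.5 * math.erfc(z / math.sqrt(2))  # upper normal tail, P(Z > z)

    p = two_proportion_p_value(conv_a=40, n_a=500, conv_b=60, n_b=500)

    # "Multiply the risk of being wrong by the cost of being wrong."
    cost_if_wrong = 200.0  # invented dollar cost of shipping a change that's actually worse
    print(f"p = {p:.3f}, expected downside ~ ${p * cost_if_wrong:.0f}")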

If results are currently at n=5000 and p=0.06001, and it's going to take 6 more months to hit n=10000, and you've only got 5 months of runway... Well then, give up on that test and do something else. Opportunity cost is a cost too!

At the end of the day, it's up to you to make an informed business decision using your best judgement. "Bounded rationality." All decisions are made with imperfect information and under time pressure. Like in TFA, if your chargeback rate spikes after a change, is it because the change was bad, or because you're expanding out of early adopters and into choosier (more annoying, more normal) customers? Or is it just a random Thursday, and your sales volume is so low that every day is spiky and weird?

As @joelrunyon notes above, in the very earliest days, when your total sales number in the double digits, no A/B test in the world is going to be powered enough, short of comparing a normal landing page against a 404 error. You just have to be good at your job, and lucky.
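
If you want to see why, here's a rough sample-size calculation (my illustration, standard normal-approximation formula, with two-sided alpha=0.05 and 80% power baked in as z values):

    import math

    def n_per_arm(p_base, p_variant, z_alpha=1.96, z_power=0.84):
        """Approximate visitors needed per arm for a two-proportion test."""
        var = p_base * (1 - p_base) + p_variant * (1 - p_variant)
        return math.ceil((z_alpha + z_power) ** 2 * var / (p_variant - p_base) ** 2)

    # Detecting a 2% -> 2.5% conversion lift needs north of ten thousand
    # visitors per arm -- hopeless when total sales are in the double digits.
    print(n_per_arm(0.02, 0.025))  # ~13791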


A/B tests are statistical tests, so you should pre-register an appropriate confidence level and stick to it. The width of the resulting confidence interval should be commensurate with the risks you are taking.
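
Something like this, to sketch the idea (my example, a plain Wald interval, not anything from the parent): fix z for your chosen confidence level before the test starts, then act on what the interval for the difference actually rules out:

    import math

    def diff_confidence_interval(conv_a, n_a, conv_b, n_b, z=1.96):
        """Wald interval for (rate_B - rate_A); z is chosen before looking at the data."""
        p_a, p_b = conv_a / n_a, conv_b / n_b
        se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
        diff = p_b - p_a
        return diff - z * se, diff + z * se

    lo, hi = diff_confidence_interval(conv_a=400, n_a=5000, conv_b=460, n_b=5000)
    print(f"lift somewhere between {lo:+.2%} and {hi:+.2%}")  # a wide interval means weak evidence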


Implicit in your question is the assumption that AB testing is used to the exclusion of other decision-making tools. That's definitely not the way to go.

The right starting point for thinking about AB testing is that every variant you test should be something that feels safe to put into production. That is what you're doing, after all, but something about it being a "test" makes it not feel real to people. Maybe that threshold is low for some call-to-action copy; maybe it's much higher for something more critical. Use whatever tools you otherwise would to get to that point. Then you can AB test to see if it actually works.

The second part is that in theory AB testing doesn't specifically drive to local optimizations (you can test and iterate on big changes) but in practice it sure does. It also tends to encourage short term thinking, maybe comically so, because those things are faster to test. It's local optimization on the time axis. "Changing the expected ship date for custom work from 90 days to 30 days increased sales 4% with no increase in support tickets!" and they ran the test for 4 whole weeks, so how could it be wrong? Maybe someone should tell manufacturing about this test and how great it went...


>Would it be safe to say that A/B testing should only be used for things that can "afford to fail/be wrong"?

No. Definitely not. I worked at a company that lent money for car loans. We A/B tested the HELL out of our lending strategies. Testing, data collection, dashboards, and monitoring were a HUGE part of our strategic advantage.


You've gotta A/B test the right thing. They A/B tested to increase sign-ups, but the outcome they really cared about is getting paid. If they had designed the test around that, the experiment would have failed due to the higher chargebacks/non-payments.
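
A toy illustration with invented numbers: score each arm on "paid and not charged back" rather than raw sign-ups, and the apparent winner can flip:

    def arm_summary(visitors, signups, paid, chargebacks):
        """Sign-up rate vs. the metric that matters: paid and not charged back."""
        return signups / visitors, (paid - chargebacks) / visitors

    a_signups, a_net = arm_summary(visitors=10_000, signups=500, paid=300, chargebacks=10)
    b_signups, b_net = arm_summary(visitors=10_000, signups=650, paid=310, chargebacks=60)
    print(f"A: sign-ups {a_signups:.1%}, net paid {a_net:.2%}")  # A: sign-ups 5.0%, net paid 2.90%
    print(f"B: sign-ups {b_signups:.1%}, net paid {b_net:.2%}")  # B: sign-ups 6.5%, net paid 2.50%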



