" they (developers) tried a glossy gradient on a button, and they noticed the traffic went down that day, or conversions sank. so they assume it happens on a grand scale."
This is really not how A/B testing works. There is this small branch of mathematics called statistics, which helps us avoid such "one-off" errors and is central to A/B testing.
Now A/B testing can (and should) be subject to scrutiny. But know whereof ye speak.
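To make the statistics point concrete, here is a minimal sketch of a two-proportion z-test, which is one common way (certainly not the only one) to check whether a difference in conversion rates is distinguishable from noise. The function name and the visitor counts are made up for illustration; they are not from any real test.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference in conversion rates.

    conv_a / conv_b: number of conversions in variants A and B (hypothetical).
    n_a / n_b: number of visitors shown each variant (hypothetical).
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both variants convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("standard error is zero; rates are all 0% or 100%")
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal: 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# One day of made-up traffic: 5.0% vs 4.0% conversion on 1,000 visitors each.
z_small, p_small = two_proportion_z_test(50, 1000, 40, 1000)

# The same rates on 100,000 visitors each.
z_large, p_large = two_proportion_z_test(5000, 100000, 4000, 100000)

print(f"small sample: z={z_small:.2f}, p={p_small:.3f}")
print(f"large sample: z={z_large:.2f}, p={p_large:.3g}")
```

With the small sample, the drop from 5% to 4% is not significant at the usual 5% level, which is exactly the "conversions sank that day" situation. With the larger sample, the same gap is very unlikely to be noise.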
EDIT: The parent entry was deleted so this may not make sense any more.