"You can't just say "Amazon is ugly and it works.""
And I'm not. I'm saying "Amazon is ugly and it works. We have teams of Web-2.0 guys doing redesigns on every facet of the site day in, day out, testing with live customers constantly, and this is still currently the best design - and we are still iterating."
A bit of a mouthful, though.
"I'm sure they could measurably prove that customer satisfaction on the web site and perceived brand value would both increase."
That's just it, though. Where I work, we have proven numerically that this is a false assumption. "I'm sure they could prove" is a far cry from "We have proven." This is a problem I think is prevalent throughout the UX community: a dogmatic worship of a handful of principles without ever sanity-testing the assumptions against large-scale metrics, focusing instead on ephemeral and unreliable things like anecdotal user stories.
"It looks like it works better" and "it works better" are entirely two different beasts. One thing you can say off-hand, the other requires backup.
For what it's worth, I was on your side at one time. I hated "dirty" 90s-looking websites like eBay, Amazon, et al, and I loved the new-age Web-2.0-y stuff.
Then I got this job and got a sneak peek at what the user data actually says. Some things defy common logic - or at the very least, the common logic of user experience design.
(I had to look at your profile to see if you worked where I do. I don't think you do, but it's hard to say.)
It's absolutely amazing what wins in A/B testing sometimes. Like the parent poster, we do split-run testing continuously and consistently. I often have my own preconceived notions of what will win or lose big, and I am often proven wrong by the people who matter: the ones out on the interwebs buying our products.
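For anyone who hasn't done this: "split-run testing" isn't anything fancy. At its core it's deterministic bucketing plus counting conversions per variant. Here's a minimal sketch of the idea (the experiment name, user IDs, and log are made up for illustration, not our actual setup):

    import hashlib

    def bucket(user_id, experiment):
        # Deterministic split: the same user always sees the same variant.
        digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
        return "A" if int(digest, 16) % 2 == 0 else "B"

    # Hypothetical log of (user_id, purchased) pairs for one experiment.
    log = [("u1", True), ("u2", False), ("u3", True), ("u4", False), ("u5", True)]

    counts = {"A": [0, 0], "B": [0, 0]}  # [visits, conversions] per variant
    for user_id, purchased in log:
        variant = bucket(user_id, "homepage-hero-test")
        counts[variant][0] += 1
        counts[variant][1] += int(purchased)

    for variant, (visits, conversions) in counts.items():
        rate = conversions / visits if visits else 0.0
        print(f"{variant}: {rate:.0%} conversion over {visits} visits")

The real systems obviously log far more than a boolean purchase flag, but the A-versus-B comparison ultimately comes down to counts like these.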
Side, but related, note: watching customers interact with your site in a facilitated session is also very informative (bordering on mind-blowing sometimes). In our new building we built a dedicated lab for this, but we used to do it via closed-circuit TV, and you could easily do the same. We've found that what works best is having one facilitator in the room with the customer (a past customer, or someone in our market who isn't familiar with the site) and the rest of the observers out of sight (but disclosed to the participant beforehand).
Many people think computers are magic devices, following no discernible rules or patterns. We've had users try to drag this "thing" over there for 3+ minutes (an eternity when you are watching them struggle), and when it finally works they are giddy with a sense of accomplishment and report, "I don't know why it finally worked; it's magic!" - but they aren't pissed off in any way. (Obviously, we work to improve this experience, but my point is: what you, as a competent, accomplished computer user, expect, prefer, want, or will tolerate doesn't trump what Joe Main Street wants and expects, if he's more representative of your target market than the readers of HN are.)
Take a spin around our website and you may very well see 10 design WTFs, 7 of which likely have statistically significant test results backing them, 2 of which are in test right now, and 1 of which we don't know about or aren't yet testing.
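And "statistically significant" there isn't anything exotic either; for a conversion-rate test it usually boils down to something like a two-proportion z-test on the A and B counts. A rough sketch with made-up numbers (in practice you'd lean on a real stats library rather than hand-rolling it):

    from math import sqrt, erf

    def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
        # z statistic and two-sided p-value for the difference between two conversion rates.
        p_a, p_b = conv_a / n_a, conv_b / n_b
        pooled = (conv_a + conv_b) / (n_a + n_b)
        se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
        z = (p_b - p_a) / se
        p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))  # normal approximation
        return z, p_value

    # Made-up example: 20,000 visitors per variant, 3.0% vs 3.45% conversion.
    z, p = two_proportion_z_test(conv_a=600, n_a=20000, conv_b=690, n_b=20000)
    print(f"z = {z:.2f}, p = {p:.4f}")  # p below 0.05 -> we'd call the lift significant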
Not as scientific as A/B testing actual conversions, but here's what some people think of the two designs. Still collecting answers, but so far there's a slight preference for the original Zappos site.
Skimming through the user comments, I see a lot of people pointing out that the original design showed more shoes than the redesign. This is a key point that the redesign author missed completely: Zappos sells shoes; you can't make a shoe-selling website and then spend more time talking about company culture than about shoes.
Also interesting are the users who point out that the original site has fewer "ads". People don't see testimonials and "how Zappos rocks" blurbs as insightful; they see them as obtrusive advertisements.
A/B testing works fantastically for small, iterative changes. It is not the correct tool for assessing sweeping design changes, however. Some sort of focus group or large survey probably makes more sense for evaluating something this big (which may cause substantial lateral shifts in user behaviour).