Wouldn't the B variant show a higher session count? If your A/B testing tool doesn't detect imbalances in cohort size, I would imagine you have a bigger problem, since it's easy to accidentally measure the A and B groups differently.
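For what it's worth, the cohort-size check is cheap to do by hand. Here is a minimal sketch, assuming a two-variant test with an intended 50/50 split; the function name `srm_p_value` and the session counts are made up for illustration, not taken from any particular tool. It runs a chi-square goodness-of-fit test with one degree of freedom on the recorded session counts:

```python
import math

def srm_p_value(sessions_a, sessions_b, expected_share_a=0.5):
    """Chi-square test (1 degree of freedom) for a cohort-size imbalance.

    Returns the p-value for the hypothesis that the recorded session
    counts match the intended traffic split. expected_share_a is the
    share of traffic that was supposed to go to variant A.
    """
    total = sessions_a + sessions_b
    expected_a = total * expected_share_a
    expected_b = total - expected_a
    chi2 = ((sessions_a - expected_a) ** 2 / expected_a
            + (sessions_b - expected_b) ** 2 / expected_b)
    # For 1 degree of freedom, the chi-square survival function
    # reduces to erfc(sqrt(chi2 / 2)).
    return math.erfc(math.sqrt(chi2 / 2))

# Hypothetical counts: the slow variant A recorded far fewer sessions.
p = srm_p_value(4000, 5000)
if p < 0.001:
    print("Cohort sizes are imbalanced; check how sessions are recorded.")
```

A very small p-value doesn't say which variant is better. It says the two groups are probably not being counted the same way, so the other metrics shouldn't be trusted until that is explained.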
The important takeaway is that they rewrote their website from scratch before even having A/B testing in place. It's probably one of the best arguments I've ever seen against writing bloated website code. We now know that software can hit a point where it's so slow that it produces false metrics about its own slowness. Imagine how many people are still out there who didn't write a website B and are thinking, "Oh, my bounce rate is fine, I don't need to invest money in performance, since the numbers are telling me people don't care." Folks who take numbers at face value don't always know what they don't know.
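To make the "false metrics" point concrete, here is a toy calculation. It assumes one possible mechanism, namely that some visitors leave before the analytics script has loaded and so never show up as sessions at all. The function names and every number are hypothetical, chosen only to show the arithmetic:

```python
def measured_bounce_rate(visitors, lost_before_tracking, recorded_bounces):
    """Bounce rate as the dashboard reports it: only recorded sessions count."""
    recorded_sessions = visitors - lost_before_tracking
    return recorded_bounces / recorded_sessions

def true_bounce_rate(visitors, lost_before_tracking, recorded_bounces):
    """Bounce rate if visitors who left before tracking loaded were counted too."""
    return (lost_before_tracking + recorded_bounces) / visitors

# Hypothetical slow site: 1000 visitors arrive, 300 give up before the
# analytics script loads, and 210 of the 700 recorded sessions bounce.
print(measured_bounce_rate(1000, 300, 210))
print(true_bounce_rate(1000, 300, 210))
```

With these made-up numbers the dashboard would report a 30% bounce rate while 51% of visitors actually left, and the gap would grow as the page gets slower.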
> We explicitly only changed the infra which served our landing pages, and kept the content - the HTML/CSS/JS - identical. Once the new infra was shown to work, we would begin to experiment with the website itself.