It would still basically result in anecdotes. Maybe team A picked a library that made life difficult for them, or misinterpreted one of the requirements? Small things like that can completely throw the results off.
Funny you mention that. The only "methodology" I feel I have enough anecdata to comfortably vouch for is YAGNI. In my experience, whichever team fucks themselves with bad library choices is going to lose this experiment 100% of the time, agile be damned.
The first team does A in "Agile" and B in "Waterfall".
The second one does A in "Waterfall" and B in "Agile".
I bet working like this would pull out at least some interesting stuff.