"Moderate evidence" is still evidence.

TDD forces your code to be tightly focused. It's very hard to write a test for reams of functionality before that functionality exists, so you end up writing small tests and, in turn, small pieces of code that are easier to refactor, maintain, and understand. I don't see why that part, at least, needs empirical evidence. A lot of what counts as "value" is subjective, even in programming: you know good code when you see it. Why not try TDD and form your own opinion?
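To make the "tightly focused" point concrete, here is a minimal sketch of one test-first cycle in Python. The `slugify` function and its test are hypothetical, invented purely for illustration; the point is only that the test is written first and the code does just enough to pass it.

```python
# Step 1: write the test first. It pins down one small, specific behavior.
def test_slugify():
    assert slugify("Hello World") == "hello-world"
    assert slugify("  TDD  rocks ") == "tdd-rocks"


# Step 2: write only enough code to make that test pass.
# (Hypothetical helper, not taken from any library.)
def slugify(title: str) -> str:
    # Lowercase, split on any run of whitespace, join the words with hyphens.
    return "-".join(title.lower().split())


# Step 3: run the test; with it passing, refactoring is safe.
test_slugify()
```

Because the test had to exist before the function, there was never an opportunity to pile unrelated behavior into `slugify`.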