> even the smartest people make hundreds of tiny experiments
This is the most important point, and why DeepSeek’s cheaper training matters.
And if you check the R1 paper, they have a section for “things that didn’t work”. Each of those would normally be a paper of its own, but because their training was so cheap and streamlined, they could try a bunch of things.