The disconnect between benchmarks and practical use is interesting. I guess it's hard to build accurate benchmarks.

I find GPT-4 and GPT-3 worlds apart. Perhaps GPT-4 just crossed a tipping point that made it useful for _me_, and the next 10x investment in training wouldn't have the same effect for me (but perhaps it would for somebody else?).