To me it's not even about performance (speed). It's that the quality gap between cloud LLM services and local LLMs is still quite large, and seems to be widening. Local LLMs have gotten better in the past year, but cloud LLMs have improved even more. This is partly because large companies can afford to keep throwing more compute at the problem, while quality at smaller-scale deployments isn't improving at the same pace.

I have a couple of 3090s and have tested most of the popular local LLMs (Llama3, DeepSeek, Qwen, etc.) at the highest settings I can comfortably run (~30B@q8 or ~70B@q4), and they can't keep up with something like Claude 3.5 Sonnet. So I find myself just using Sonnet most of the time instead of fighting with hallucinated output. Sonnet still hallucinates and gets things wrong a lot, but not as often as local LLMs do.

Maybe if I had more hardware I could run larger models at higher quants, but frankly, I'm not sure it would make a difference. At the end of the day, I want these tools to be as helpful as possible and not waste my time, and local LLMs are just not there yet.
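For a rough sense of why those are the ceilings on 2x3090 (~48 GB of VRAM combined): weights take about params x bits/8 bytes, plus some headroom for KV cache and activations. The ~20% overhead factor below is just my assumption, but the back-of-envelope math works out:

    # rough VRAM estimate: weights plus ~20% overhead for KV cache/activations
    def vram_gb(params_billions, bits, overhead=1.2):
        return params_billions * bits / 8 * overhead

    print(vram_gb(30, 8))  # ~36 GB -> fits in 48 GB
    print(vram_gb(70, 4))  # ~42 GB -> barely fits
    print(vram_gb(70, 8))  # ~84 GB -> needs more hardware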

This is the right way of looking at it. Unfortunately (or fortunately for us?) most people don't realize how precious their time is.