Ever heard of Federated Learning? That's the direction this is going. Also, I do run training with no matrix multiplication: just 3-bit weights and addition in log space. There's slight accuracy degradation, but CPU-only training is much faster.
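A rough sketch of what "addition in log space" can look like, since the exact scheme isn't spelled out here. It assumes a hypothetical 3-bit encoding (1 sign bit plus a 2-bit exponent, so every weight is ±2^-e with e in 0..3); `quantize_weight` and `log_dot` are made-up names for illustration, not anyone's actual training code. It only shows the forward dot product, not training:

```python
import math

def quantize_weight(w):
    """Map a float weight to a hypothetical 3-bit code (sign_bit, e), meaning +/- 2**(-e).

    Assumption: e is a 2-bit exponent in 0..3, so zero cannot be represented
    and small weights are clamped to 2**-3.
    """
    sign_bit = 1 if w < 0 else 0
    mag = max(abs(w), 2.0 ** -3)
    e = min(3, max(0, round(-math.log2(mag))))
    return sign_bit, e

def log_dot(xs, codes):
    """Dot product where each x * w is done as an addition of log2 magnitudes."""
    total = 0.0
    for x, (sign_bit, e) in zip(xs, codes):
        if x == 0.0:
            continue  # log2(0) is undefined; a zero input contributes nothing
        # log2|x * w| = log2|x| + log2|w| = log2|x| - e  (no multiplication)
        log_prod = math.log2(abs(x)) - e
        # The product is negative when exactly one operand is negative.
        prod_sign = -1.0 if (x < 0) != (sign_bit == 1) else 1.0
        # Accumulation happens back in the linear domain.
        total += prod_sign * 2.0 ** log_prod
    return total

weights = [0.5, -0.25, 1.0]
codes = [quantize_weight(w) for w in weights]
xs = [1.0, 2.0, -4.0]
result = log_dot(xs, codes)
```

With power-of-two weights like these, the result matches the ordinary dot product; for other weights the quantization rounds to the nearest power of two, which is one place accuracy loss could come from.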
Okay, but I meant generating results, not training. If you're running Stable Diffusion, the weights are given, but it's not going to run on a random PC.