Depends how you do it. LLMs are huge, but their input and output are minuscule bits of text. If you find a way to put "narrow paths" in the hidden layers, basically subdividing a model into smaller interconnected models, then the bandwidth between the parts will be similarly massively reduced.
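A toy numpy sketch of the idea (all sizes are made up for illustration): two wide "sub-models" that only talk to each other through a narrow bottleneck, so only a handful of numbers cross the boundary per step.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: wide sub-models, narrow path between them.
WIDE, NARROW = 4096, 32

sub_a = rng.standard_normal((WIDE, WIDE)) * 0.01       # sub-model A's internal weights
bottleneck = rng.standard_normal((WIDE, NARROW)) * 0.01  # the narrow path
sub_b = rng.standard_normal((NARROW, WIDE)) * 0.01       # sub-model B's input projection

x = rng.standard_normal(WIDE)
h_a = np.tanh(x @ sub_a)        # 4096 activations inside sub-model A
msg = h_a @ bottleneck          # only 32 values cross between the sub-models
h_b = np.tanh(msg @ sub_b)      # sub-model B expands them back to a wide state

# Traffic across the boundary is NARROW values instead of WIDE:
print(h_a.size, msg.size)  # 4096 vs 32, a 128x reduction
```

If the sub-models live on different machines, that bottleneck is exactly the inter-node bandwidth you'd need, instead of shipping full wide activations around.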
This is not without precedent; look up how your brain hemispheres, and the regions within them, are connected.