Seeing the performance of implementations like FlexGen [1], I don't think it would be entirely unreasonable to run a 13B model on a single GPU for personal use. You are not going to run a public service off it, but it would probably be good enough to run your own ChatGPT or Copilot locally.
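As a rough illustration of the idea (offloading weights that don't fit in VRAM to CPU RAM and disk), here is a minimal sketch using Hugging Face Transformers with Accelerate's offloading rather than FlexGen itself; the model name and memory budgets are assumptions for a single consumer GPU, not values from the FlexGen paper:

```python
# Sketch: run a ~13B causal LM on one GPU by offloading to CPU RAM and disk.
# This uses Transformers + Accelerate offloading, not FlexGen's own runtime.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "facebook/opt-13b"  # assumption: any similarly sized model works

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",                         # split layers across GPU/CPU/disk
    max_memory={0: "12GiB", "cpu": "48GiB"},   # assumed budgets for a consumer box
    offload_folder="offload",                  # spill whatever remains to disk
    torch_dtype="auto",
)

prompt = "Write a Python function that reverses a string."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

Throughput with this kind of naive offloading is far below what FlexGen reports, since FlexGen schedules I/O and batches much more aggressively, but it shows that a 13B model fitting on one GPU is a memory-management problem, not a hard limit.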