> I'm not sure why they didn't go larger -- having a more efficient model that fits on hardware the size of the RPi could be a gamechanger (IIRC TinyLlama 7B does run, barely).
I'm not sure that the RPi is the right target for the next step of local LLMs, and I think it's worth considering web deployment via engines like WebLLM [1].
A 7B model may "run fine" on a Raspberry Pi, but I've personally found 7B models too large to ask users to download and run through a web-based interface.
However, a solid 125M model is the sort of thing I can run on a webpage, and neither the time it takes to download into the user's browser nor my bandwidth costs are exorbitant.
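To make that concrete, here's a rough sketch of what loading a small model with WebLLM looks like in the browser, using its OpenAI-style engine API. The model ID is illustrative (it would need to match an entry in WebLLM's prebuilt model list), and this assumes a module context with WebGPU available:

    import { CreateMLCEngine } from "@mlc-ai/web-llm";

    // Model ID is illustrative -- it must match an entry in WebLLM's
    // prebuilt model list. A ~125M-class model keeps the download small.
    const engine = await CreateMLCEngine("SmolLM2-135M-Instruct-q0f16-MLC", {
      // Surface download/compile progress so users see why the page is busy.
      initProgressCallback: (report) => console.log(report.text),
    });

    // OpenAI-style chat completion, running entirely in the user's browser.
    const reply = await engine.chat.completions.create({
      messages: [{ role: "user", content: "Summarize this page in one sentence." }],
    });
    console.log(reply.choices[0].message.content);

As far as I know, WebLLM caches the fetched weights locally, so repeat visits skip the download entirely; with a 125M model the first-visit cost is small enough to be tolerable too.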
[1] https://github.com/mlc-ai/web-llm