> I'm not sure why they didn't go larger -- having a more efficient model that fits on hardware the size of the RPi could be a gamechanger (IIRC TinyLlama 7B does run, barely).
I'm not sure that the RPi is the right target for the next step of local LLMs, and I think it's worth considering web deployment via engines like WebLLM [1].
A 7B model may "run fine" on a Raspberry Pi, but I've personally found 7B models too large to ask users to download and run through a web-based interface.
However, a solid 125M model is the sort of thing I can run on a webpage, and neither the time it takes to download into the user's browser nor my bandwidth costs are exorbitant.
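To make that concrete, here's a rough sketch of what loading a small model with WebLLM looks like in the browser, using its OpenAI-style engine API. The model ID is illustrative (it would need to match an entry in WebLLM's prebuilt model list), and this assumes a module context with WebGPU available:

    import { CreateMLCEngine } from "@mlc-ai/web-llm";

    // Model ID is illustrative -- it must match an entry in WebLLM's
    // prebuilt model list. A ~125M-class model keeps the download small.
    const engine = await CreateMLCEngine("SmolLM2-135M-Instruct-q0f16-MLC", {
      // Surface download/compile progress so users see why the page is busy.
      initProgressCallback: (report) => console.log(report.text),
    });

    // OpenAI-style chat completion, running entirely in the user's browser.
    const reply = await engine.chat.completions.create({
      messages: [{ role: "user", content: "Summarize this page in one sentence." }],
    });
    console.log(reply.choices[0].message.content);

As far as I know, WebLLM caches the fetched weights locally, so repeat visits skip the download entirely; with a 125M model the first-visit cost is small enough to be tolerable too.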
[1] https://github.com/mlc-ai/web-llm