
What kinds of use cases do these sub-10B-parameter models serve? Are they mostly useful for code completion?



You can run them as-is for general-purpose inference, or you can fine-tune them for improved performance on specific use cases.
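For a sense of what the fine-tuning path involves in practice, here is a minimal sketch using Hugging Face transformers with a LoRA adapter from the peft library. The base model name, dataset file, and hyperparameters are placeholders, not recommendations.

    from datasets import load_dataset
    from peft import LoraConfig, get_peft_model
    from transformers import (AutoModelForCausalLM, AutoTokenizer,
                              DataCollatorForLanguageModeling, Trainer, TrainingArguments)

    # Placeholder base model; any sub-10B causal LM follows the same pattern.
    model_name = "meta-llama/Llama-3.2-3B"
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    tokenizer.pad_token = tokenizer.eos_token
    model = AutoModelForCausalLM.from_pretrained(model_name)

    # Attach a small LoRA adapter so only a few million parameters are
    # trained, which keeps memory requirements modest.
    model = get_peft_model(model, LoraConfig(
        r=16, lora_alpha=32, target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM"))

    # Hypothetical domain-specific dataset: one JSON object per line with a "text" field.
    dataset = load_dataset("json", data_files="my_domain_data.jsonl")["train"]
    dataset = dataset.map(lambda b: tokenizer(b["text"], truncation=True, max_length=512),
                          batched=True, remove_columns=dataset.column_names)

    trainer = Trainer(
        model=model,
        args=TrainingArguments(output_dir="out", per_device_train_batch_size=1,
                               num_train_epochs=1, learning_rate=2e-4),
        train_dataset=dataset,
        data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
    )
    trainer.train()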

It's safe to assume they're worse than larger models at every task, so I wouldn't frame their use cases in terms of what they can do relative to larger models.

But what's good about them is that they're smaller, so they can run on smaller, cheaper hardware. An example would be fine-tuning one and then running it on some sort of local user device rather than in the cloud. This might become more practical as consumer hardware improves.
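As a rough sketch of that on-device angle, this is roughly what local inference looks like with llama-cpp-python and a quantized GGUF checkpoint (the file path is a placeholder): a 4-bit quantized 7B model fits in a few GB of RAM and runs on a laptop CPU.

    from llama_cpp import Llama

    # Hypothetical 4-bit quantized checkpoint of a sub-10B model.
    llm = Llama(model_path="./models/small-model-q4_k_m.gguf", n_ctx=2048)

    out = llm("Summarize this support ticket in one sentence:\n...", max_tokens=64)
    print(out["choices"][0]["text"])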


Yeah, my point is more: are smaller models ever "smart" enough to perform useful tasks?

Perhaps for basic code completion and simple writing tasks?


Say you had very vertically trained models: something like 1,000 separate LLMs, each trained on specialized data, plus other LLMs trained to predict which specialist is most likely to have the data you need. Sort of like the way Wikipedia is interlinked, or hierarchical, essentially a DB index over nested LLMs. Performance would scale higher with many more highly focused models, at least that's my understanding of what's possible here.
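One concrete way to sketch that routing layer, under the assumption that each specialist publishes a short description of its domain: embed the incoming query, compare it against the domain descriptions, and dispatch to the closest specialist. The model names, descriptions, and embedding model below are hypothetical, just to show the shape of the idea.

    import numpy as np
    from sentence_transformers import SentenceTransformer

    # Hypothetical registry of narrow, vertically trained models, each
    # described by the kind of queries it was fine-tuned on.
    SPECIALISTS = {
        "sql-expert-3b":  "database schemas, SQL queries, query optimization",
        "legal-draft-3b": "contracts, clauses, legal terminology",
        "py-helper-3b":   "python code completion, debugging, libraries",
    }

    embedder = SentenceTransformer("all-MiniLM-L6-v2")
    domain_vecs = embedder.encode(list(SPECIALISTS.values()), normalize_embeddings=True)

    def route(query: str) -> str:
        """Return the specialist most likely to cover the query."""
        q = embedder.encode([query], normalize_embeddings=True)[0]
        scores = domain_vecs @ q  # cosine similarity, since vectors are normalized
        return list(SPECIALISTS)[int(np.argmax(scores))]

    print(route("why is my JOIN so slow on a 10M row table?"))  # -> sql-expert-3b

The same pattern can nest: a top-level router points to a sub-router for a broad area, which in turn picks among its own specialists, which is the DB-index analogy.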



