Hacker News

I'm pretty ignorant about which self-hosted LLM is best for such a task, or how to fine-tune it. Do you know of any resources on how to set that up?

It seems like Llama 2 is the biggest name on HN when it comes to self-hosting, but I have no idea how it actually performs.




You could just try it out if you have the hardware at home.

Grab KoboldCPP and a GGML model from TheBloke that fits in your RAM/VRAM, and try it.

Make sure you follow the prompt structure listed on TheBloke's download page for the model (very important).

KoboldCPP: https://github.com/LostRuins/koboldcpp

TheBloke: https://huggingface.co/TheBloke

I would start with a 7B or 13B model quantized to 4 bits, just to get the hang of it. Some generic or storytelling model.

Just make sure you follow the prompt structure that the model card lists.
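For instance, many instruction-tuned models on those pages use the Alpaca template. A minimal sketch of wrapping a user instruction in that format (the exact template here is an assumption; always defer to what the model card shows):

```python
def build_alpaca_prompt(instruction: str) -> str:
    """Wrap an instruction in the Alpaca-style template used by many
    instruction-tuned models. Check the model card for the exact format:
    getting the template wrong badly degrades output quality."""
    return (
        "Below is an instruction that describes a task. "
        "Write a response that appropriately completes the request.\n\n"
        f"### Instruction:\n{instruction}\n\n"
        "### Response:\n"
    )
```

The model's completion is then expected after the final "### Response:" line.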

KoboldCPP is very easy to use: you just drag the model file onto the executable, wait until it loads, then open the web interface.
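Besides the web UI, KoboldCPP exposes a KoboldAI-compatible HTTP API on localhost once the model is loaded. A rough sketch of hitting it from Python (the port and the `max_length` value are assumptions; check your KoboldCPP console output for the actual address):

```python
import json
import urllib.request

# KoboldCPP's default local endpoint; adjust the port if you
# launched it with different settings.
API_URL = "http://localhost:5001/api/v1/generate"

def build_payload(prompt: str, max_length: int = 80) -> bytes:
    """Encode the JSON body the KoboldAI-style generate endpoint expects."""
    return json.dumps({"prompt": prompt, "max_length": max_length}).encode()

def generate(prompt: str) -> str:
    """Send a prompt to a locally running KoboldCPP and return the completion."""
    req = urllib.request.Request(
        API_URL,
        data=build_payload(prompt),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["results"][0]["text"]
```

Remember to run the raw prompt through the model's expected template (see the model card) before sending it.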



