I've been playing around with
https://github.com/imartinez/privateGPT and wanted to create a simple Python package that made it easier to run ChatGPT-like LLMs on your own machine, use them with non-public data, and integrate them into practical GPU-accelerated applications.
This resulted in a Python package I call OnPrem.LLM.
The documentation includes examples of using it for information extraction, text generation, retrieval-augmented generation (i.e., chatting with documents on your computer), and text-to-code generation: https://amaiya.github.io/onprem/
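For example, a first session looks something like this (a minimal sketch based on the quickstart in the docs; the default model is downloaded automatically on first use, and defaults may change between versions):

    # pip install onprem
    from onprem import LLM

    # instantiating LLM() downloads a default model on first use
    llm = LLM()

    # simple prompting
    llm.prompt("List three cute names for a cat.")

    # retrieval-augmented generation: index local documents, then ask questions about them
    llm.ingest("./sample_data")
    result = llm.ask("What do these documents cover?")
    print(result["answer"])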
Enjoy!
And... if you'd like a more hands-on approach, here is a manual way to get a Llama model running locally. A little script like the one below will get it running swimmingly, though fair warning: expect to spend a few hours digging through flags. NOTE: I'm new at this stuff, feedback welcome.
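Here is a minimal sketch using the llama-cpp-python bindings; the model path is a placeholder for whichever GGUF model file you download (e.g., from Hugging Face), and the flags shown are just starting points:

    # pip install llama-cpp-python
    from llama_cpp import Llama

    # point this at a locally downloaded model file (placeholder path);
    # n_gpu_layers offloads layers to the GPU if the library was built with GPU support
    llm = Llama(
        model_path="./models/llama-7b.Q4_K_M.gguf",
        n_ctx=2048,
        n_gpu_layers=0,
    )

    output = llm(
        "Q: Name three planets in the solar system. A:",
        max_tokens=64,
        stop=["Q:"],  # stop generating when the model starts a new question
    )
    print(output["choices"][0]["text"])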