I think the llm utility[0] (the one from Simon Willison, not Google) is probably the best quickstart experience you can find. It gives you the option to connect to services via API or to install and run local models.
As simple as
pip install llm
# add the local plugin
llm install llm-gpt4all
# Download and run a prompt against the Orca Mini 3B model
llm -m orca-mini-3b-gguf2-q4_0 'What is the capital of France?'
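The API route is just as short. A sketch based on the llm docs; the key name is llm's convention, and the model alias depends on which llm version and plugins you have installed:

```shell
# store your OpenAI API key once (llm remembers it)
llm keys set openai
# then prompt a hosted model by alias
llm -m gpt-4o-mini 'What is the capital of France?'
```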
Alternatively, you could use llamafile[1], a tiny binary runner packaged on top of the multi-gigabyte model weights. Download a llamafile and you can launch it from your terminal or interact with it through a web browser.
From the llamafile page, after you download the file, you can just launch it as
./mistral-7b-instruct-v0.2.Q5_K_M.llamafile -ngl 9999 --temp 0.7 -p '[INST]Write a story about llamas[/INST]'
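For the browser route: per the llamafile README, on Linux and macOS you first mark the file executable, and launching it without a prompt starts a local chat UI (the port shown is its default):

```shell
# make the downloaded file executable (Linux/macOS)
chmod +x mistral-7b-instruct-v0.2.Q5_K_M.llamafile
# with no -p flag it starts a local web server, by default at http://localhost:8080
./mistral-7b-instruct-v0.2.Q5_K_M.llamafile
```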
[0] https://llm.datasette.io/en/stable/index.html
[1] https://github.com/Mozilla-Ocho/llamafile
Edit: added llm quickstart from the intro page