
I think the llm utility[0] (the one from Simon Willison, not Google) is probably the best quickstart experience you can find. It gives you the option to connect to hosted services via API key or to install and run local models.
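If you'd rather start with a hosted model, llm can store an API key for you and prompt against it directly. A minimal sketch from the llm docs (the prompt text is just an example):

  # store your OpenAI key once; llm remembers it
  llm keys set openai
  # prompt the default model
  llm 'What is the capital of France?'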

As simple as

  pip install llm
  # add the local plugin
  llm install llm-gpt4all
  # Download and run a prompt against the Orca Mini 3B model
  llm -m orca-mini-3b-gguf2-q4_0 'What is the capital of France?'
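Once a plugin is installed you can list everything available and, if you want a back-and-forth session instead of one-shot prompts, use chat mode (both commands are from the llm docs; the model ID here is the same one as above):

  # show every model llm currently knows about
  llm models
  # open an interactive session with the local model
  llm chat -m orca-mini-3b-gguf2-q4_0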
Alternatively, you could use llamafile[1], which packages a small executable runner together with the multi-gigabyte model weights in a single file. Download the llamafile, launch it from your terminal, and it serves a chat UI you can open in a web browser.

From the llamafile page: after you download the file (and mark it executable with chmod +x on macOS/Linux), you can launch it as

  ./mistral-7b-instruct-v0.2.Q5_K_M.llamafile -ngl 9999 --temp 0.7 -p '[INST]Write a story about llamas[/INST]'
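Run without -p and llamafile starts a local web server (port 8080 by default) that also exposes an OpenAI-compatible API, per the llamafile README. A sketch of hitting that endpoint, assuming the server is up on the default port:

  curl http://localhost:8080/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
      "model": "mistral",
      "messages": [{"role": "user", "content": "Write a story about llamas"}]
    }'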
[0] https://llm.datasette.io/en/stable/index.html

[1] https://github.com/Mozilla-Ocho/llamafile

Edit: added llm quickstart from the intro page



