If you have sufficiently good hardware, the 34B Code Llama model [1] (hint: pick the quantised model based on the "Max RAM required" column, e.g. q5/q6) running on llama.cpp [2] can answer many generic Python and Flask questions, but it's not quite good enough to generate entire code blocks for you the way GPT-4 can.
It's probably as good as you can get at the moment, though. And trying it out costs you nothing but the time it takes to download llama.cpp, run "make", and point it at the q6 model file (sketched below).
So if it's no good, you've wasted maybe 30 minutes giving it a try.
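For reference, the whole flow looks roughly like this (the model filename and the prompt are just examples; pick whichever q5/q6 file from [1] actually fits your RAM):

    # build llama.cpp (needs git and a C/C++ toolchain)
    git clone https://github.com/ggerganov/llama.cpp
    cd llama.cpp
    make

    # download a quantised model file from [1], then run it;
    # -m is the model path, -p the prompt, -n the max tokens to generate
    ./main -m ./codellama-34b-instruct.Q6_K.gguf \
      -p "Write a Flask route that returns JSON" -n 256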
[1] - https://huggingface.co/TheBloke/CodeLlama-34B-Instruct-GGUF
[2] - https://github.com/ggerganov/llama.cpp