My issue so far with the various code assistants isn't necessarily the quality, but their ability to draw in context from the rest of the code base without breaking the bank or providing so much info that the middle gets ignored. Are there any systems doing that well these days?
If I'm not mistaken, this is not on the models themselves, but rather on the implementation of the addon.
I haven't found an open source VSCode or WebStorm addon yet that lets me use a local model and implements code completion and commands as well as GitHub Copilot does.
They're either missing a chat feature, inline actions / code completion, or fill-in-the-middle models. And even when they have those, they don't provide the context as intelligently (an assumption on my part!) as GH's Copilot does.
One alternative I liked was Supermaven: it's really, really fast and has a huge context window, so it knows almost your whole project. That was nice! But the reason I ultimately stopped using it: it doesn't support chat or inline commands (Ctrl+I in VSCode's GH Copilot).
I feel like a really good Copilot alternative is definitely still missing.
But: regarding your question, I think GitHub Copilot's VSCode extension is the best as of now. The WebStorm extension is sadly not as good; it's missing the "inline command" function, which IMHO is a must.
Continue.dev allows for this. You can even mix hosted chat options like GPT-4 (via API) with local completion. I typically use a smaller model for faster tab completion and a larger model (with a bigger context window) for chat.
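Roughly, that split lives in Continue's config file (~/.continue/config.json). This is a sketch from memory, not a verbatim config: field names may differ between Continue versions, and the specific models (GPT-4 via the OpenAI API for chat, a small StarCoder2 served locally through Ollama for completion) are just illustrative choices:

```json
{
  "models": [
    {
      "title": "GPT-4 (chat)",
      "provider": "openai",
      "model": "gpt-4",
      "apiKey": "YOUR_API_KEY"
    }
  ],
  "tabAutocompleteModel": {
    "title": "Local completion (fast, small)",
    "provider": "ollama",
    "model": "starcoder2:3b"
  }
}
```

The point is that chat and tab completion are configured independently, so latency-sensitive autocomplete can stay local while the heavyweight reasoning goes to a hosted model.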
Seconding Supermaven here; it's from the guy who made Tabnine.
Supermaven has a 300k-token context. It doesn't seem to have a ton of intelligence -- maybe comparable to Copilot, maybe a bit less -- but it's much better at picking up data structures and code patterns from your code, and usually what I want is help autocompleting that sort of thing rather than writing an algorithm for me (which LLMs often get wrong anyway).
You can also pair it with a GPT-4 / Opus chat in Cursor, so you get your slower but more intelligent chat alongside the simpler but very fast, high-context autocomplete.
Yeah. This is how I imagined these things should work, but it's tricky. The system needs to pattern-match on the types you've been using, if possible, which means vector searching your own code. Then you need to vector search the actual dependency source as well. It's not that simple, but it would be the ultimate solution.
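For the curious, here's a minimal sketch of the first step (retrieving similar code from your own repo by embedding similarity to stuff into the prompt). The embedding model and the naive fixed-size chunking are assumptions for illustration; real assistants chunk on AST/function boundaries and would index dependency sources the same way:

```python
# Sketch: embed repo chunks, then retrieve the ones nearest to the
# code around the cursor and prepend them as completion context.
import numpy as np
from sentence_transformers import SentenceTransformer

# Assumption: any code-capable embedding model works here.
model = SentenceTransformer("all-MiniLM-L6-v2")

def chunk(source: str, max_lines: int = 40) -> list[str]:
    """Naive fixed-size chunking; real tools split on function/class boundaries."""
    lines = source.splitlines()
    return ["\n".join(lines[i:i + max_lines]) for i in range(0, len(lines), max_lines)]

def build_index(files: dict[str, str]) -> tuple[list[str], np.ndarray]:
    """Embed every chunk of every file once, up front."""
    chunks = [c for src in files.values() for c in chunk(src)]
    vectors = model.encode(chunks, normalize_embeddings=True)
    return chunks, np.asarray(vectors)

def retrieve(query: str, chunks: list[str], vectors: np.ndarray, k: int = 3) -> list[str]:
    """Return the k chunks most similar to the query text."""
    q = model.encode([query], normalize_embeddings=True)[0]
    scores = vectors @ q  # cosine similarity, since vectors are normalized
    return [chunks[i] for i in np.argsort(scores)[::-1][:k]]

# Usage: retrieved chunks become extra context in the completion prompt.
# files = {path: open(path).read() for path in repo_paths}
# chunks, vectors = build_index(files)
# context = "\n\n".join(retrieve(code_around_cursor, chunks, vectors))
```

The hard parts the parent alludes to are all upstream of this: knowing when to re-index, ranking type definitions above mere textual matches, and doing the same retrieval over your dependencies' source, all within a token budget.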