
Has anyone put together a good and regularly updated decision tree for what model to use in different circumstances (VRAM limitations, relative strengths, licensing, etc.)? Given the enormous zoo of models in circulation, there must be certain models that are totally obsolete.
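To make the question concrete, the kind of decision tree being asked about could be sketched as a small function. Everything here is an illustrative assumption (the VRAM thresholds, quantization levels, and size classes are made up for the example), not an actual recommendation:

```python
# Hypothetical sketch of a model-selection decision tree.
# Thresholds and size classes are illustrative assumptions only.

def pick_model(vram_gb: float, need_permissive_license: bool) -> str:
    """Return an illustrative local-model suggestion for a VRAM budget."""
    if vram_gb < 6:
        return "a small ~3B model at 4-bit quantization"
    if vram_gb < 12:
        return "a ~7-8B model at 4-bit quantization"
    if need_permissive_license:
        return "a permissively licensed mid-size model"
    return "a ~13B+ model, quantized to fit"

print(pick_model(8, False))  # → "a ~7-8B model at 4-bit quantization"
```

The hard part, as the replies below note, is not the structure but keeping the leaves of such a tree current.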





People keep making these, but they become outdated so fast and nobody keeps them up to date. If your definition of "great" changes in 6 months because a new model shatters your perception of "great," it's hard to rescore legacy models.

I'd say keeping up with the reddit LocalLLama community is the "easiest" way and it's by no means easy.


> I'd say keeping up with the reddit LocalLLama community is the "easiest" way and it's by no means easy.

The subreddit is… not great. It’s a decent way of keeping up, but don’t read the posts too much (and even then, there is a heavy social aspect, and the models that are discussed there are a very specific subset of what’s available). There is a lot of groupthink, the discussions are never rigorous. Most of the posts are along the lines of “I tested a benchmark and it is 0.5 points ahead of Llama-whatever on that one benchmark I made up, therefore it’s the dog’s and everything else is shite”. The Zuckerberg worshiping is also disconcerting. Returns diminish quickly as you spend more time on that subreddit.


Someone should use an LLM to continuously maintain this decision tree. The tree itself will decide which LLM is used for maintenance.

I tend to choose a recent model available for Ollama, and usually stick with a general-purpose local model for a month or so, then re-evaluate. An exception to sticking with one local model at a time might be needing a larger context window.
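For the larger-context exception, Ollama lets you derive a variant of a model with a bigger context window via a Modelfile rather than switching models entirely. A minimal sketch, assuming the base model is tagged `llama3` locally and that the hypothetical variant name `llama3-longctx` is free:

```
# Modelfile: derive a variant with a larger context window
FROM llama3
PARAMETER num_ctx 8192
```

Then build and run it with `ollama create llama3-longctx -f Modelfile` and `ollama run llama3-longctx`. Note that a larger `num_ctx` raises memory use, so this can push a model past a given VRAM budget.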

This is definitely a problem. I mostly look at the various leaderboards, but the proliferation of fine-tuned models makes exploring the model space incredibly daunting. Add to that that they're often not immediately available in turn-key tools like Ollama, which increases the friction further. And all this without even considering things like licenses, what data was used for fine-tuning, quantization, merges, and multimodal capabilities.

I would love a curated list.
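A curated list would really just be a catalog you can filter along the dimensions mentioned above. As a minimal sketch, assuming a hand-maintained list of records with hypothetical field names (`license`, `quant`, `multimodal`) and made-up model names:

```python
# Hypothetical local catalog; every entry here is illustrative.
CATALOG = [
    {"name": "model-a", "license": "apache-2.0", "quant": "q4", "multimodal": False},
    {"name": "model-b", "license": "research-only", "quant": "q8", "multimodal": True},
    {"name": "model-c", "license": "apache-2.0", "quant": "q8", "multimodal": True},
]

def shortlist(catalog, license=None, multimodal=None):
    """Keep entries matching the given constraints (None = don't care)."""
    return [
        m for m in catalog
        if (license is None or m["license"] == license)
        and (multimodal is None or m["multimodal"] == multimodal)
    ]

print([m["name"] for m in shortlist(CATALOG, license="apache-2.0", multimodal=True)])
# → ['model-c']
```

The maintenance burden, of course, is exactly the rescoring problem raised upthread: the catalog rots faster than anyone updates it.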



