Maybe the key to a good universal LLM is having multiple fine-tuned models for various domains. The user thinks they're querying a single model, but really some mechanism selects the best model for their query out of, say, 300 different possibilities.
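Here's a minimal sketch of what that selection mechanism might look like, with a toy keyword matcher standing in for a real router (every model name and function here is invented for illustration):

```python
# Hypothetical router: pick a domain-tuned model per query.
# MODELS and classify_domain are illustrative stand-ins, not a real API.

MODELS = {
    "physics": "llm-physics-tuned",
    "art": "llm-art-tuned",
    "general": "llm-generalist",  # fallback when no domain fits
}

def classify_domain(query: str) -> str:
    """Toy keyword match; a real router might use embeddings or a small
    classifier trained over the ~300 domains."""
    q = query.lower()
    if any(w in q for w in ("quantum", "physics", "relativity")):
        return "physics"
    if any(w in q for w in ("art", "painting", "sculpture")):
        return "art"
    return "general"

def route(query: str) -> str:
    """Return the model that should handle this query."""
    return MODELS[classify_domain(query)]

print(route("What is quantum entanglement?"))  # llm-physics-tuned
```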
This also helps distribute traffic as a side effect.
I guess the problem is how the conversation would flow. If the user switches topics from, say, art to quantum physics, and then asks a question that spans both quantum physics and art, I'm not sure what the algorithm should do.
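One possible behavior, sketched under the same invented setup: score every domain, and if zero domains match or more than one does, hand the query to a generalist model instead of guessing:

```python
# Hypothetical fallback routing: ambiguous queries go to a generalist.
# score_domains is an invented stand-in for real classifier scores.

def score_domains(query: str) -> dict[str, int]:
    """Toy scoring by keyword hits; a real system might use per-domain
    classifier probabilities or embedding similarity."""
    q = query.lower()
    return {
        "physics": sum(w in q for w in ("quantum", "physics", "relativity")),
        "art": sum(w in q for w in ("art", "painting", "sculpture")),
    }

def route_with_fallback(query: str) -> str:
    hits = [d for d, s in score_domains(query).items() if s > 0]
    if len(hits) == 1:
        return f"llm-{hits[0]}-tuned"
    # Zero hits (off-topic) or multiple hits (question spans subjects):
    # don't guess, hand it to the generalist.
    return "llm-generalist"

print(route_with_fallback("How does quantum physics influence modern art?"))
# -> llm-generalist (both domains match, so neither owns the question)
```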
Take two users: one is talking about physics, the other about art. Two different models serve them, so the load is divided across the two models. Load balancing comes for free, with the split falling along subject lines. Of course, this assumes each model owns its own set of GPUs.
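Under that assumption, the hardware split is just a static mapping from model to GPU pool, something like this (device IDs invented for illustration):

```python
# Hypothetical static GPU assignment: each domain model owns its devices,
# so traffic for different subjects never contends for the same hardware.
GPU_POOLS = {
    "llm-physics-tuned": [0, 1, 2, 3],
    "llm-art-tuned": [4, 5, 6, 7],
    "llm-generalist": [8, 9, 10, 11],
}

def gpus_for(model: str) -> list[int]:
    return GPU_POOLS[model]
```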