
Maybe the key to a good universal LLM is having multiple fine-tuned models for various domains. The user thinks they're querying a single model, but really some mechanism selects the best model for their query out of, say, 300 different possibilities.

This also helps distribute traffic as a side effect.

I guess the problem is how the conversation would flow. If the user changes topics from, say, art to quantum physics, then asks a question that spans both quantum physics and art, I'm not sure what the routing algorithm should do.
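
A minimal sketch of this classify-then-dispatch routing, assuming hypothetical model names and keyword lists; the last query illustrates the mixed-topic problem just described:

    import re

    # Sketch: route each query to one of several domain-specific models.
    # Keyword lists and model names are made up for illustration, not a real API.
    DOMAIN_KEYWORDS = {
        "physics-model": {"quantum", "entanglement", "relativity", "particle"},
        "art-model": {"painting", "paintings", "impressionism", "impressionist", "canvas"},
    }

    def route(query: str) -> str:
        words = set(re.findall(r"[a-z]+", query.lower()))
        scores = {m: len(words & kw) for m, kw in DOMAIN_KEYWORDS.items()}
        best = max(scores, key=scores.get)
        return best if scores[best] > 0 else "general-model"  # fallback

    print(route("Explain quantum entanglement"))              # physics-model
    print(route("Who pioneered impressionism?"))               # art-model
    # The mixed-topic case: both domains score, but a hard
    # dispatch has to pick just one model.
    print(route("How do impressionist paintings evoke quantum physics?"))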




That is actually the same idea as the (now) popular "Mixture of Experts" approach.
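
For reference, MoE as used in transformer LLMs routes per token inside a single network via a learned gate, rather than dispatching whole conversations between separate fine-tuned models. A toy top-k gating layer in numpy, with made-up sizes and random weights standing in for learned ones:

    import numpy as np

    rng = np.random.default_rng(0)
    d, n_experts, k = 8, 4, 2          # hidden size, expert count, top-k

    W_gate = rng.normal(size=(d, n_experts))           # gating weights (learned in practice)
    experts = [rng.normal(size=(d, d)) for _ in range(n_experts)]

    def moe_layer(x):
        logits = x @ W_gate                            # score each expert for this token
        top = np.argsort(logits)[-k:]                  # keep only the k best experts
        weights = np.exp(logits[top])
        weights /= weights.sum()                       # softmax over the chosen experts
        return sum(w * (x @ experts[i]) for w, i in zip(weights, top))

    token = rng.normal(size=d)
    print(moe_layer(token).shape)                      # (8,)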


> This also helps distribute traffic as a side effect.

I'm not sure it's "distributing" traffic so much as amplifying it.


Two users: one is talking about physics, the other about art, so two different models are used.

Load is divided across the two models. Load balancing comes for free, with the division falling along subject lines. Of course, this assumes each model owns its own set of GPUs.


Ah okay, I misunderstood. I thought you meant running inference on all the different models and letting them judge to what extent they're the best fit.

What you're suggesting is simply intent classification with a specific model per intent. That's what everyone did _before_ LLMs.
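
A minimal sketch of that pre-LLM pipeline, assuming scikit-learn is available; the training queries and intent labels are toy data for illustration:

    # Classic intent classification: train a lightweight classifier,
    # then dispatch each query to a per-intent model.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    queries = [
        "what is quantum entanglement", "explain the uncertainty principle",
        "who painted the starry night", "compare impressionism and cubism",
    ]
    intents = ["physics", "physics", "art", "art"]

    clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
    clf.fit(queries, intents)

    print(clf.predict(["explain quantum uncertainty"]))  # ['physics'] with this toy data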


Given the existence of specialized LLMs, people can still do this.


You could use the same GPUs to run two instances of one model.


The user could talk to an "expert opinion aggregator" model, which in turn makes a bunch of queries to specialized models.
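
A minimal sketch of that aggregator pattern; call_model is a stand-in for whatever inference API is actually available, stubbed out here so the example runs:

    # Fan a question out to specialist models, then ask a synthesis
    # model to combine their answers into one response.
    def call_model(model: str, prompt: str) -> str:
        return f"[{model}'s answer to: {prompt}]"      # stub; replace with a real API call

    SPECIALISTS = ["physics-model", "art-model"]       # hypothetical model names

    def aggregate(question: str) -> str:
        opinions = [call_model(m, question) for m in SPECIALISTS]
        synthesis_prompt = (
            f"Question: {question}\n"
            + "\n".join(f"Expert {i + 1}: {o}" for i, o in enumerate(opinions))
            + "\nCombine these into one answer."
        )
        return call_model("aggregator-model", synthesis_prompt)

    print(aggregate("How did quantum physics influence abstract art?"))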



