If you try to use LLMs as a Google replacement you're going to run into problems pretty quickly.
LLMs are better thought of as "calculators for words" - retrieval of facts is a by-product of how they are trained, but it's not their core competence at all.
LLaMA at 4-bit on my laptop is around 3.9GB. There's no way you could compress all of human knowledge into less than 4GB of space. Even ChatGPT / GPT-4, though much bigger, couldn't possibly contain all of the information that you might want them to contain.
But... it turns out you don't actually need a single LLM that contains all knowledge. What's much more interesting is a smaller LLM that has the ability to run tools - such as executing searches against larger indexes of data. That's what Bing and Google Bard do already, and it's a pattern we can implement ourselves pretty easily: https://til.simonwillison.net/llms/python-react-pattern
The thing that excites me is the idea of having a 4GB (or 8GB or 16GB even) model on my own computer that has enough capabilities that it can operate as a personal agent, running searches, executing calculations and generally doing really useful stuff despite not containing a great deal of detailed knowledge about the world at all.
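The ReAct pattern linked above is simpler than it sounds: the model emits "Action" lines, your wrapper runs the tool and feeds an "Observation" back in, and the loop continues until the model emits an answer. Here's a minimal sketch of that loop; the tool names, the output format, and the canned stand-in model are all my own assumptions, not anything from the linked TIL:

```python
import re

# Toy tools standing in for real search / calculator backends (hypothetical).
TOOLS = {
    "calculate": lambda expr: str(eval(expr)),  # demo only; never eval untrusted input
    "search": lambda q: f"(stub search result for {q!r})",
}

def run_agent(question, llm, max_turns=5):
    """Minimal ReAct-style loop: the model returns either an
    'Action: tool: input' line or an 'Answer: ...' line; we run the
    named tool and paste the observation back into the prompt."""
    prompt = question
    for _ in range(max_turns):
        response = llm(prompt)
        answer = re.search(r"Answer: (.*)", response)
        if answer:
            return answer.group(1)
        action = re.search(r"Action: (\w+): (.*)", response)
        if action:
            tool, arg = action.groups()
            observation = TOOLS[tool](arg)
            prompt += f"\n{response}\nObservation: {observation}"
    raise RuntimeError("agent did not produce an answer")

# Canned 'model' so the sketch runs without any LLM at all.
def fake_llm(prompt):
    if "Observation:" not in prompt:
        return "Thought: I need to do math\nAction: calculate: 17 * 3"
    return "Answer: 51"

print(run_agent("What is 17 * 3?", fake_llm))  # -> 51
```

The point is that the wrapper, not the model, is what touches the outside world, which is exactly why a small local model with little baked-in knowledge can still be useful.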
Depending on the max tokens, I think you can pretty easily fine-tune a model to return answers with the actions required, then wrap your prompt app to react to those, paste the answers back in, and "re-ask/re-prompt" the same question...
Similar stuff is being researched under the "LangChain" term.
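That re-ask/re-prompt idea can be sketched as a loop distinct from a running transcript: you ask the same question each round, with any tool results pasted in as extra context, until the model stops asking for anything. The NEED(...) marker format and the tool set here are made up for illustration, not from any real fine-tune:

```python
import re

def answer_with_tools(question, llm, tools, max_rounds=3):
    """Re-ask the same question each round, pasting accumulated tool
    results in as 'Fact:' lines, until the model no longer emits a
    NEED(tool, input) marker (a hypothetical fine-tuned format)."""
    facts = []
    reply = llm(question)
    for _ in range(max_rounds):
        prompt = question + "".join(f"\nFact: {f}" for f in facts)
        reply = llm(prompt)
        needs = re.findall(r"NEED\((\w+), ([^)]*)\)", reply)
        if not needs:
            return reply
        for tool, arg in needs:
            facts.append(f"{tool}({arg}) = {tools[tool](arg)}")
    return reply

# Stand-in model: asks for a calculation once, then answers from the pasted fact.
def fake_llm(prompt):
    if "Fact:" not in prompt:
        return "NEED(calc, 2+2)"
    return "The answer is 4."

tools = {"calc": lambda e: str(eval(e))}  # demo only; never eval untrusted input
print(answer_with_tools("What is 2+2?", fake_llm, tools))  # -> The answer is 4.
```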
Ted Chiang's "ChatGPT Is a Blurry JPEG of the Web" (https://www.newyorker.com/tech/annals-of-technology/chatgpt-...) is a neat way of thinking about that compression point.