I've come to the conclusion that gpt and gemini and all the others are nothing but conversational search engines. They can give me ideas or point me in the right direction but so do regular search engines.
I like the conversation ability but, in the end, I cannot trust their results and still have to research further to decide for myself if their results are valid.
I’m a local-LLM elitist who stopped using chat mode altogether.
I just go into the notebook tab (with an empty textarea) and start writing about a topic I’m interested in, then hit generate. It’s not a conversation, just an article in a passive form. The “chat” is just a transcript in the form of an article, with a system prompt at the top and “AI: …\nUser: …\n” turns afterwards, all wrapped in a chat UI.
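To make that concrete, here is a minimal sketch (in Python, with made-up example text) of what the two modes look like as raw strings fed to the model. The point is just that both are plain text the model continues; the chat UI only hides the formatting.

```python
# A "chat" is really one long text document: a system prompt on top and
# alternating turns below. The model simply continues the string.
chat_as_text = (
    "You are a helpful assistant.\n"   # system prompt at the top
    "User: How do tides work?\n"
    "AI: Tides are caused by the gravitational pull of the Moon...\n"
    "User: And why are there two per day?\n"
    "AI: "                             # generation picks up right here
)

# Notebook mode drops the turn structure entirely: you hand the model the
# start of an article and let it keep writing.
article_as_text = (
    "Tides: An Overview\n\n"
    "Tides are the periodic rise and fall of sea levels caused by..."
)
```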
As long as the article stays interesting, I just keep reading (it generates forever). When it goes sideways, I stop it and edit the text to fit my needs, at the most recent point or maybe earlier, and then hit generate again.
I find this mode superior to complaining to a bot, since wrong info or a wrong direction doesn’t spoil the content. Also, you don’t have to wait or interrupt; it’s just a single coherent flow that you can edit when necessary. Sometimes I stop it at “it’s important to remember …” and replace it with a short disclaimer like “We talked about safety already. Anyway, back to <topic>” and hit generate.
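A rough sketch of that edit-and-continue loop, assuming a local llama.cpp-style completion server on localhost:8080 (the endpoint name, fields, and example text here are assumptions, not a fixed API):

```python
import requests

# The whole "conversation" is just this string, which you are free to edit
# anywhere before asking for more.
text = "A practical introduction to rsync\n\nrsync copies files by..."

def continue_text(prompt: str, n_tokens: int = 200) -> str:
    # Ask the local model to keep writing the article from where it stands.
    resp = requests.post(
        "http://localhost:8080/completion",
        json={"prompt": prompt, "n_predict": n_tokens},
        timeout=120,
    )
    return resp.json()["content"]

# Read the continuation; when it goes sideways, edit `text` by hand (at the
# end or earlier) and generate again from the edited version.
text += continue_text(text)
```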
Fundamentally, LLMs generate texts, not conversations. Conversations just happen to be texts. It’s something people forget / aren’t aware of behind these stupid chat interfaces.
Gemini. I asked it to remember my name. It said it'd remember. My next question was asking it what my name was. It responded that it can't connect to my workspace account. It did this twice.
I asked it what was in a picture. It was a blue stuffed animal. It described it as such. I asked it what kind of animal it thought it was supposed to be. It responded with "a clown fish because it has a black and white checkerboard pattern". It was an octopus (at least it got a sea creature?).
I asked it for directions to the closest gas station. It wanted to take me to one over a mile away when there was one across the street. I asked why it didn't suggest the one nearest to me. It responded with "I assumed proximity was the primary criteria" and then apologized for calling me names (it didn't).
One amusing way to put this is that LLMs' energy requirements aren't self-contained, since they use the energy of the human prompter to both prompt and verify the output.
Reminds me of a similar argument about correctly pricing renewable power: since it isn't always-on (etc.), it requires a variety of alternative systems to augment it which aren't priced in. I.e., converting entirely to renewables isn't possible at the advertised price.
In this sense, we cannot "convert entirely to LLMs" for our tasks, since there's still vast amounts of labour in prompt/verify/use/etc.
I can ask ChatGPT extremely specific programming questions and get working code solving it. This is not something I can do with a search engine.
Another thing a search engine cannot do that I use ChatGPT for on a daily basis is taking unstructured text and converting it into a specified JSON format.
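For what it's worth, here's a minimal sketch of that use case with the OpenAI Python SDK; the model name, the example text, and the exact fields extracted are my assumptions, not a prescribed setup:

```python
import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

raw_text = "Order #4412 from Jane Doe, 3x widgets, ship to 12 Main St, Oslo."

completion = client.chat.completions.create(
    model="gpt-4o-mini",
    response_format={"type": "json_object"},  # constrain output to valid JSON
    messages=[
        {"role": "system",
         "content": "Extract order_id, customer, items, and address as JSON."},
        {"role": "user", "content": raw_text},
    ],
)

order = json.loads(completion.choices[0].message.content)
print(order)
```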
From my perspective, it's not useful to dwell on the fact that LLMs are often confidently wrong, or didn't nail a particular niche or edge-case question on the first try, and to discount the entire model class on that basis. That's expecting too much. Of course LLMs frequently fail to solve a given problem. The same is true for any other problem-solving approach.
The useful comparison is between how one would try to solve a problem before versus after the availability of LLM-powered tools. And in my experience, these tools represent a very effective alternative to sifting through docs or manually googling quote-enclosed phrases with site:stackoverflow.com, one that improves my ability to solve problems I care about.
I wish someone could explain Bing to me. If you search on Bing, the first result appears BELOW the ChatGPT auto-generated message, and that message takes 10 seconds to be "typed" out.
I can click the first result 1 billion times faster.
I do agree that I rarely use Google now; I ask a chat for a summary instead, and this saves a lot of aggregating across different sites.
The same goes for Stack Overflow; there's no point in it if I find the answer quicker this way.
It’s exactly that for me: a conversational search engine. And the article explains it right: it’s just words organized in very specific ways so they can be retrieved with statistical accuracy, and the transformer is the cherry on top that makes it coherent.
Replace "gpt and gemini and all the others" with "people" and funny enough your statement is still perfectly accurate.
You have a rough mathematical approximation of what's already a famously unreliable system. Expecting complete accuracy instead of about-rightness from it seems mad to me. And there are tons of applications where that's fine, otherwise our civilization wouldn't be here today at all.
These anthropomorphizations are increasingly absurd. There's a difference between a human making a mistake, and an AI arbitrarily and completely confidently creating entirely new code APIs, legal cases, or whatever that have absolutely no basis in reality whatsoever, beyond being what it thinks would be an appropriate next token based on what you're searching for. These error modes are simply in no way, whatsoever, comparable.
And then you tell it such an API/case/etc. doesn't exist. And it'll immediately acknowledge its mistake and assure you it will work to avoid such in the future. And then, literally the next sentence in the conversation, it's back to inventing the same nonsense again. This is not like a human, because even with the most idiotic human there's at least a general trend to move forward - LLMs are just coasting back and forth on their preexisting training with absolutely zero ability to move forward until somebody gives them a new training set to coast back and forth on, and repeat.
I mean I can definitely remember lots of cases for myself, in school especially, when I made the same mistake again repeatedly despite being corrected every time. I'm sure today's language models pale in comparison to your flawless genius, but you seriously underestimate the average person's idiocy.
Agreed that the lack of some mid-tier memory is definitely a huge problem, and the current solutions that try to address it are very lacking. I highly doubt we won't find one in the coming years, though.
It's not just this. LLMs can do nothing but predict the next token based on their training and current context window. You can try to do things like add 'fact databases' or whatever to stop them from saying so many absurd things, but the fact remains that the comparisons to human intelligence/learning remain completely inappropriate.
I think the most interesting thought experiment is to imagine an LLM trained on state-of-the-art knowledge and technology at the dawn of humanity. Language didn't yet exist, "slash 'em with the sharp part" was cutting-edge tech, and there was no entirely clear path forward. Yet we somehow went from that to putting a man on the Moon in what was basically the blink of an eye.
Yet the LLM? It's going to be stuck there basically unable to do anything, forever, until somebody gives it some new tokens to let it mix and match. Even if you tokenize the world to give it some sort of senses, it's going to be the exact same. Because no matter how much it tries to mix and match those tokens it's not going to be able to e.g. discover gravity.
It's the same reason why there are almost undoubtedly endless revolutionary and existence-altering discoveries ahead of us. Yet LLMs trained on essentially the entire written corpus of human knowledge? All they can do is provide basic mixing and matching of everything we already know, leaving it essentially frozen in time. Like we are as well currently, but we will break out. While the LLM will only move forward once we tell it what the next set of tokens to mix and match are.
Sure, it happens. How often it happens really depends on so many factors though.
For example, I have this setup where a model has some actions defined in its system prompt that it can output when appropriate to trigger actions. The interesting bit is that initially I was using openhermes-mistral, which is famous for its extreme attention to the system prompt, and it almost never made any mistakes when invoking those definitions. Later I swapped in llama-3, which is way smarter but isn't tuned to be nearly as attentive, and it far more often likes to make up alternative names that don't get fuzzy-matched properly. Someone anthropomorphizing it might say it lacks discipline.
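A rough sketch of that kind of setup, assuming the action names, prompt wording, and fuzzy-matching choice (difflib) are all illustrative rather than what the commenter actually runs:

```python
from difflib import get_close_matches

# Actions the model is allowed to emit, as listed in its system prompt.
ACTIONS = ["set_timer", "play_music", "lights_on", "lights_off"]

SYSTEM_PROMPT = (
    "When the user asks for something actionable, reply with exactly one of: "
    + ", ".join(ACTIONS)
)

def resolve_action(model_output: str) -> str | None:
    # An attentive model emits "lights_on" verbatim; a less disciplined one
    # might invent "switch_lights_on", which a loose cutoff may still catch.
    candidate = model_output.strip().lower()
    matches = get_close_matches(candidate, ACTIONS, n=1, cutoff=0.6)
    return matches[0] if matches else None

print(resolve_action("lights_on"))           # -> "lights_on"
print(resolve_action("switch_lights_on"))    # -> "lights_on" (fuzzy match)
print(resolve_action("make me a sandwich"))  # -> None
```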