
It’s fundamentally the same technology with the same limitations. You can throw a lot of money at it to try to fix the shortcomings, but the fixes will just be band-aids.

LLMs are extremely amazing autocomplete mechanisms. But that’s all they are at the core - there’s no intelligence involved.




They are transformers, that are just as good at transforming English into German as they are at turning "halp a bad man noked me dn an ran away, how do i su him" into a coherent legal suggestion (which, for this example, should be ~ "Call the police (and if necessary an ambulance), then get an actual lawyer").

When it comes to complex step-by-step reasoning, sure, they're stupid; when it comes to linguistic comprehension, they're better at it than the average human — and GPT-4 beats the average law student taking at least one bar exam.


That's a pretty rich statement considering they perform pretty well on tests which we have designed to measure intelligence. I don't see how you can say there is no intelligence.


IMO the word intelligence doesn't seem like a good description for the thing LLMs possess, even though they perform pretty well on some tests measuring it. They're also sometimes wrong in ways intelligent beings (humans) never are. Like this[0] riddle about boats and farmers I stumbled over recently:

> A farmer stands on the side of a river with a sheep. There is a boat on the riverbank that has room for exactly one person and one sheep. How can the farmer get across with the sheep in the fewest number of trips?

It's obviously riffing on the classic wolf sheep lettuce riddle, but I don't think that's gonna fool any humans into answering anything but the obvious. ChatGPT-4o on the other hand thinks it'll take three trips.

They perform a good approximation of intelligence most of the time, but the fact that their error pattern is so distinct from humans' in some ways suggests that we probably shouldn't attribute intelligence to them, at least in the human sense of the word.


I just tried it with chatgpt and got what seems like the right answer:

The farmer can get across the river with the sheep in one trip. Here's how:

1. The farmer and the sheep get into the boat. 2. They both cross the river together.

Since the boat can hold one person and one sheep, they can make the journey in a single trip. The fewest number of trips required is just one.


That's fair. I get the wrong answer from the GPT-3 and GPT-4o models, but there's always some uncertainty involved in these gaps. When I appended "Consider your answer carefully." to my prompt, it answered as though the goal was for only the sheep to get across, and that the farmer had to get back to the original shore.


They are indeed. For every test, even every informal measure, that I saw before about a decade ago, there's now some AI which can do it; but I think this reveals that what used to be a distinction without a difference in humans is now suddenly very important indeed.

I used to see people getting criticised for being "book-smart" and lacking practical experience… but someone who was able to learn from books can quickly learn from real life, too.

AIs need a lot of examples to learn from, and they make up for this by running on hardware that beats biological neurones by the same degree to which marathon runners beat continental drift, so they can go through those examples much faster, leading to fantastic performance.

But the shape of that performance graph is very un-human — you never see a human who's approximately 80-90% accurate at every level of mathematics, from basic algebra to helping Terence Tao: https://pandaily.com/mathematician-terence-tao-comments-on-c...


Every computing task that was previously the domain of humans, even simple arithmetic back when that was astonishingly rare for a machine, could be described as intelligence if you want to anthropomorphize it or as something less if you don't.


I'm simply applying "we use this test to measure intelligence in humans; what does this AI do on it?" We had established, a priori, before the AI existed, that the test measures intelligence. Now the AI scores high on that measurement. There is nothing else to conclude but that the AI is intelligent.


You’re not understanding the tests. They were designed for humans. They were also designed to be taken once. LLMs have been trained on these tests numerous times. LLMs are also not humans. We can’t conceivably compare the two.


Just saying these tests were designed for humans doesn't mean anything. You have to specify why exactly they don't work for an AI.

Or rather, let me pose this question: what intellectual test do you envision that would prove an AI is intelligent, and that any non-disabled human can easily pass? I'm willing to bet 500 USD an AI will clear that hurdle in the coming 10 years, if you are willing to put your money where your mouth is.


To test intelligence by humans or AI, one needs a question where the answer hasn't been memorized (or answered by someone in its training set).

Indeed, you can see something like ChatGPT fall down by simply asking a modified form of a real IQ test question.

For example, ChatGPT answers a sample Stanford binet question "Counting from 1 to 100, how many 6s will you encounter?" correctly, but if you slightly modify it and ask how many 7s instead, it will only count 19.
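
For reference, a quick sanity check (my own illustrative Python, assuming the question asks how many times the digit appears rather than how many numbers contain it):

    # Count digit occurrences in 1..100; note 66 and 77 each contribute two.
    count_6 = sum(str(n).count('6') for n in range(1, 101))
    count_7 = sum(str(n).count('7') for n in range(1, 101))
    print(count_6, count_7)  # 20 20

Both digits appear 20 times, 10 in the units place and 10 in the tens place; 19 would be the count of numbers that merely contain the digit.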

Having written this out however, I've now invalidated the question since they use webcrawls to train.


Yes there is. I could conclude that the test wasn't actually measuring intelligence, but just one component that, when summed with other components, displays intelligence. That is, if the test was purported to measure intelligence on its own, that was a flawed assumption.

We used to measure intelligence with IQ tests, those are now known to largely be bunk. What's to say our other intelligence tests aren't similarly flawed?


Huh? IQ tests are by far the best measure of intelligence we have. Where did you read it is bunk?


Well, they were trained on those tests. What LLMs literally do is map inputs to outputs. That’s why they can’t do much outside of training data.


I don't see evidence that humans do better here.


Most humans are aware that adding Elmer's glue to pizza cheese to "get the cheese to stick" is a humorous statement that would not work in reality, despite the fact that "glue" makes things "sticky." This should provide ample evidence that humans do better than LLMs here.


I used Claude 3 Sonnet against the cheese sliding off prompt and it gave sensible responses such as "let it cool, don't put so many toppings on it" and no hint of glue. Then again, I'm finding Anthropic to be a better steward of LLM hype than most other companies, which may be why I use Claude 3 in my company.


"Glue makes things sticky but it's evident it can't be put on a pizza" is the new "Chloroquine kills viruses but it's evident it can't be ingested". Still, someone died because he trusted a prominent idiot politician suggestion like he was an expert in medicine.

Some people will trust AI as much as they trust the strong man in power, no matter how obtuse that man is, and one day someone will die or be seriously harmed because of wrong advice from an AI; Google should turn off that nonsense for good before that happens.


Humans are trained that glue is not to be eaten. It's even a meme that young children eat glue. The example you give is exactly something humans are trained on.


It's a meme because it's unusual, eating glue is a shorthand for having low intelligence because most humans don't eat glue. Beyond that, most humans understand that given the way glue works and given the way cheese works the premise of using glue to make cheese stick to a pizza doesn't even make sense, thus the statement is immediately understood as a pun, not a credible attempt at a recipe.

If humans were no better than LLMs in the way that this input were processed, there would be no meme, nor would there be a thread about how ridiculous Google's LLM is. Humans would simply accept as fact that glue can be added to pizza to make the cheese stickier, because the words make syntactic sense. Yet here we are.


I expect that humans would accept that glue is acceptable to add to pizza if we were not taught otherwise. Look at smoking, arguably worse than eating some types of glue, yet for a large part of human history this was normal and not even seen as unhealthy.

And yet, of course, now people think it's obvious that inhaling the burnt remains of some plant might not be so healthy.


>I expect that humans would accept that glue is acceptable to add to pizza if we were not taught otherwise.

That's doubtful, since pizza isn't improved upon by the addition of glue, and (again) because the premise that glue can make pizza cheese "stick" is absurd on its face. Humans don't simply add random ingredients to their food for no reason, or because no one taught them to do otherwise. There is process, aesthetic, culture and art behind the way food is designed and prepared. It needs to at least taste good. Glue-covered pizza wouldn't taste good.

>Look at smoking, arguably worse than eating some types of glue, yet for a large part of human history this was normal and not even seen as unhealthy.

Again, the relative health benefits of glue or lack thereof is not the reason people don't use glue on pizza, nor is it why people consider the LLM's statement of a joke presented as fact to be absurd or exceptional.

>And yet, of course, now people think it's obvious that inhaling the burnt remains of some plant might not be so healthy.

And yet, there are also plenty of people who don't.

You just keep proving my point. There are layers of complexity and nuance to the human interpretation of all of this that simply don't exist with LLMs. The fact that we're here discussing it at all is evidence that a distinct difference exists between human cognition and LLMs.

I can see that you're deeply invested in the narrative that LLMs are functionally equivalent to humans, a lot of people seem to be. I don't know why. It isn't necessary, even with a maximalist stance on AI. But if you literally believe something as absurd as "humans would accept that glue is acceptable to add to pizza if we were not taught otherwise" and that, therefore, there is nothing wrong with an LLM presenting that as a fact, because humans and LLMs process information in exactly the same way, then I don't know what to tell you. You live in a completely different reality than I do, and I'm not going to waste any more of my time trying to explain color to the blind.


Interpretation has nothing to do with intelligence. Intelligence without context is often exactly what AI is.


The meme started as eating paste, back when that meant wheatpaste (made from flour and water). In that context it's less surprising that kids might try to eat it!

I wonder if there's been a bit of a conflation with the other meme, about sniffing glue, which has also lost much of its context considering that rubber cement and other similar types of glue which contain volatile solvents are also less widely used than they once were.


Sounds like a problem with the tests being administered. There is a lot of woo around LLMs, a lot of people have a vested interest in hyping and selling AI; heck, even referring to LLMs as AI is a form of branding.


So turn it around then: when is an AI intelligent? Performing at human standard on intelligence tests apparently isn't enough for you.


https://arstechnica.com/science/2023/07/a-jargon-free-explan...

There's no intelligence involved in "artificial intelligence" as it currently stands. It's all marketing hype around a really fancy statistical completion engine. Intelligence would require thought and reasoning, which current "AI" does not do, no matter how convincingly it fakes it.



