This entire piece is based on one massive, unsupported assertion, which is that LLM progress will cease. Or, as the author puts it, "we are at the tail end of the first wave of large language model-based AI... [it] ends somewhere in the next year or two with the kinds of limits people are running up against." I want to know only one thing, which is what gives him the confidence necessary to say that. If that one statement is untrue, the thesis of the piece completely fails, and I do not know how any one person alive today can be certain that that statement is true. Has Paul Kedrosky or Eric Norlin spent $150M training a 2T parameter model that no one's heard about? Do they have access to classified OpenAI docs which state that GPT-5 exists already, and it's no better than GPT-4? Without this sort of information, "LLMs are not going to get smart enough for widespread practical use" is an unsubstantiated bet, and nothing more.
There has actually been research finding strong diminishing returns, at least from expanding parameter counts. While I think there are still breakthroughs to be made in context window sizes and workarounds like Mixture of Experts, I'm not sure how much farther we will get in the long term in terms of raw performance of the LLM itself. FWIW, Sam Altman agrees and has made a surprisingly similar point: "I think we’re at the end of the era where it’s gonna be these giant models, and we’ll make them better in other ways": https://techcrunch.com/2023/04/14/sam-altman-size-of-llms-wo...
The open source community has had breakthrough after breakthrough lately. It's absolutely stunning how fast it's advancing.
I can run art models and LLMs on CPU/GPU now. I've tested out open-source models with quality better than ChatGPT 3.5 Turbo, and I can even fine-tune them on my notes and books for better results. It's all so easy with so many one-click installers now too!
My husband's D&D group uses some AI for their games now (koboldcpp?).
> I can run art models and LLMs on CPU/GPU now. I've tested out open-source models with quality better than ChatGPT 3.5 Turbo, and I can even fine-tune them on my notes and books for better results. It's all so easy with so many one-click installers now too!
Do you have a good resource to find this stuff? I’m behind the times.
(Also, to be a pedant, most of that is inference and not training. But I can’t say much about fine tuning so I’m not really trying to argue against your point.)
Honestly, no single good resource. I spend 8 hours a week just taking notes from Hacker News, Discord, Hackernoon, Slack AI groups, Reddit, and GitHub issues (probably the most useful) on various open source AI libraries.
How can research predict this wave is coming to an end, when research didn't predict this wave would happen in the first place? It seems like there are always people saying 'it can't be done'. Then it happens. If there were a way to predict the future, wouldn't that research need to know how something would be implemented in order to know it can't be?
For the first time with GPT-4, OpenAI has been able to predict model progress with accuracy:
> A large focus of the GPT-4 project has been building a deep learning stack that scales predictably. The primary reason is that, for very large training runs like GPT-4, it is not feasible to do extensive model-specific tuning. We developed infrastructure and optimization that have very predictable behavior across multiple scales. To verify this scalability, we accurately predicted in advance GPT-4’s final loss on our internal codebase (not part of the training set) by extrapolating from models trained using the same methodology but using 10,000x less compute:
> Now that we can accurately predict the metric we optimize during training (loss), we’re starting to develop methodology to predict more interpretable metrics. For example, we successfully predicted the pass rate on a subset of the HumanEval dataset, extrapolating from models with 1,000x less compute:
> We believe that accurately predicting future machine learning capabilities is an important part of safety that doesn’t get nearly enough attention relative to its potential impact (though we’ve been encouraged by efforts across several institutions). We are scaling up our efforts to develop methods that provide society with better guidance about what to expect from future systems, and we hope this becomes a common goal in the field.
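For anyone who hasn't seen how this kind of prediction works, here's a minimal sketch of the idea: fit a power law of loss versus training compute to small runs, then extrapolate. All the numbers below are invented for illustration; OpenAI hasn't published the actual data or fit.

```python
# Sketch: fit loss = irreducible + a * compute^-b to small runs, extrapolate to big ones.
import numpy as np
from scipy.optimize import curve_fit

def scaling_law(compute, irreducible, a, b):
    # Power-law decay of loss with training compute, plus an irreducible floor.
    return irreducible + a * compute ** -b

# Hypothetical (compute, final loss) pairs from small runs; compute in arbitrary units.
compute = np.array([1e-2, 1e-1, 1e0, 1e1, 1e2])
loss = np.array([6.8, 4.5, 3.3, 2.6, 2.2])

params, _ = curve_fit(scaling_law, compute, loss, p0=(1.5, 3.0, 0.2), maxfev=10000)
print(scaling_law(1e4, *params))  # predicted loss for a run far beyond the small ones
```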
Isn't this all based on self-attestation? There is no comprehensive audit of their research data and finances that I am aware of. If I were OpenAI and blew millions of dollars training models that showed exponentially worse performance for incrementally more resources expended, my next step would not be to publish about it.
I am a mild LLM skeptic. But I find the response of "oh, it's all just post-crypto scamming" really weird. Crypto was a total scam. There was never a concrete, non-criminal (important caveat) application where crypto was easier than just using PayPal or whatever. LLMs are very imperfect and still have a lot of work to do, but they do actually do some job-related tasks today. If I have a JS snippet and I wish it were in Go or vice versa, I can ask an LLM and get a good translation. If I have a document, I can ask the LLM if there's something I should add or simplify. Are these world-changing capabilities? No, not yet. But they do exist and they are real and they are new. So yes, of course investors are interested in a new technology whose future limits we don't know yet.
The LLM industry valuation would be way smaller if it were not laundering behavior that would be illegal if a human did it. If "AI" were required to practice clean-room design (https://en.wikipedia.org/wiki/Clean_room_design) to avoid infringing copyright, we would laugh at the ineptitude. If people believed the FTC-CFPB-DOJ-EEOC joint statement was going to lead to successful prosecutions, the industry valuation would collapse. https://www.ftc.gov/system/files/ftc_gov/pdf/EEOC-CRT-FTC-CF...
If you spent weeks drilling flash cards on copyrighted code, then produced pages of near-verbatim copies with the copyright stripped, any court would find you to have violated the copyright. A lot of people right now are banking on "it's not illegal when AI does it", and part of that strategy is to make "AI" out to be something more than it is. That strategy has many parallels to cryptocurrency hyping.
As someone who has been very anti-crypto for a long time, I'd say it wasn't always a complete scam.
During the first wave of the crypto boom, before anyone who wasn't a programmer had even heard of it, there was a lot of real work being done that very much mirrors current AI work. Lots of very sharp developers learning about blockchain, figuring out how to implement things, experimenting with ideas. Back then everyone owned their own wallet and you would meet at coffee shops to exchange cash for BTC.
Most of the serious engineers that were really into crypto during the first crypto boom of 2012 left in disgust when the second boom came around.
Having worked in AI/ML for a long time, I myself can start to see how they felt. We do have some really cool technology in front of us, I think it has a lot of potential, but so many of the loudest voices in this space are entirely out of touch with what's possible, and far more interested in hype and making money than the underlying technology.
> There was never a concrete, non-criminal (important caveat) application where crypto was easier than just using PayPal or whatever.
That isn't true.
You can use it anywhere that irreversibility matters. Suppose you're going to commit significant resources to the customer's request, so you charge them, commit the resources, deliver the goods, and then discover that they gave you a stolen credit card and you get a chargeback. Cryptocurrency avoids that.
You can use it to accept payments from all over the world. Someone in Asia or Africa may not be able to open a US bank account or get a US credit card, but if they can find a Bitcoin ATM to put their local currency into, they can pay you, or vice versa.
It allows you to pay for something over the internet without giving your name. There are situations where this is important.
The main impediment to using it is, ironically, regulatory. The IRS decided that it's an investment and not money so every time you want to use it for what it's actually supposed to be for, they treat it like a securities transaction where you have to fill out paperwork, even if you're just buying a pack of gum. Which makes it much less convenient for ordinary people to use than cash or credit cards which don't require this -- presumably on purpose in order to destroy its utility in the US.
But it can still be useful for people in countries that don't do this, or in the US if a less explicitly antagonistic regulatory environment could be established.
I apologize, because HN does tend to automatically go to crypto when discussing blockchain.
I was not talking about crypto. Blockchain “solutions” in enterprise were spinning up all over the place for non-crypto applications. In particular, if you work in finance, supply chain, government, or random startups, you probably heard "blockchain" a lot in non-crypto contexts.
There was incredible hype. That doesn't mean there was 'nothing to get hyped about'. Just that the hype might have overshot, not that GPT isn't amazing.
It doesn't follow that when new tech arrives, everything goes back to the status quo once the hype fades. Typically, post-hype, the new tech just grows or gets absorbed a little more quietly, in unforeseen ways, and does end up having a big impact. It's just that when the impact is stretched out a little, people stop noticing.
Like replacing drive-through ordering with GPT-like tools. Kind of under the radar, not fancy, not flashy. At some point you'll notice that the drive-through you are talking to isn't a human, and go 'huh, that's interesting'. But it's a big impact on jobs, so nobody is hyping it.
People and companies are already using AI to replace or augment graphic artists, coding, etc. That is happening; it's not just a PowerPoint from a middle manager.
"Research" is not some monolithic single concept. One might also ask, "How can research produce ChatGPT when for decades research failed to produce ChatGPT?"
Exactly. So why are we now trusting 'research' that is trying to predict the future of other 'research'? The linked article is just some estimates of the error built into the current LLM approach.
How can we extrapolate that to "well, gosh darn, these LLMs are already played out, guess we're all done"?
The current crop of LLMs is like Paleozoic megafauna, or like Egyptian Pyramids. It takes a few relatively simple approaches and stretches them wildly, using colossal computing resources.
Live systems in nature seem to solve similar problems with way less compute available. There should be better architectures.
Also, as somebody said, every exponential growth curve is a lower part of a sigmoid. LLMs will plateau at some level. That level may be impressively high though.
>Live systems in nature seem to solve similar problems with way less compute available
Do they really? They're certainly more energy-efficient in business-as-usual mode, but a human brain has 86 billion neurons, 600+ trillion synapses(!), and each instance takes 15-20+ years to train to do complex logical tasks. Even if the per-cell work is tiny (and, is it? cells are amazingly complex), 86 billion (or 600+ trillion) times 20 years is a lot of computation.
> You are ignoring the fact that a toddler, once their musculature develops, is able to learn to walk after several tries.
It's definitely not several tries. I've actually witnessed that very recently with my daughter. It takes a child literally several thousand attempts and falls (around a hundred falls per day) and 1,000 hours of dedicated practice, before they can make their first step without holding anything [1]. And a lot more until they can walk reliably.
At that point, the toddler's still run years of self-reinforced learning (as well as learning supported by parents, and observing other humans solving the problem) on controlling their muscles... and again, think of the computational power expended over that time (by a computational approach that's been highly-optimised over millions of years of trial-and-error recursive descent style). Our intellect and capabilities are the result of a truly vast amount of computation in the grand scheme of things.
Don't get me wrong, I'm not claiming that Machine Learning is as generally capable as animals/humans, and I don't entirely disagree with the OP (i.e. I don't know if our current approach has a chance of scaling to human-level capabilities). I just don't think computation-wise it compares that badly to animals, considering that it's the result of a few decades of work, largely on repurposed silicon.
>> Our intellect and capabilities are the result of a truly vast amount of computation in the grand scheme of things.
This is an interesting paradox in machine learning: the data that people use to (ostensibly!) simulate the human mind is simultaneously way too much, and not nearly enough. It's way too much in that humans can, e.g., learn to generate and recognise natural language without having to train on the entire World Wide Web of a corpus. And it's not nearly enough because humans can do that only after billions of years of evolution, which amount to training on all the data in the real world, not a mere few petabytes of data on the internet [1].
This paradox has to be resolved (i.e. understood) before we can really compare human minds to artificial systems. Unfortunately, I don't see anybody trying. In machine learning so far, people are happy to just plug in the data and turn the crank; something will come out eventually.
Humans are not the most efficient learners of walking. See other animals, e.g. deer or elephants, which walk right after being born. This is relevant since the discussion is not about human learning but animal learning in nature.
Human babies are born at a stage that would be considered premature for most mammals. Because of the limitations on the size of the head, they can't afford to develop enough in utero, like deer or even dolphins can.
Nope, puppies are walking within the first month of their life, a stage where human babies have barely any voluntary limb control and no apparent desire to move.
The puppies are benefitting from a different type of "learning" process, of course: genetic instincts tuned by many generations of ancestors whose wiring of motor capabilities and instincts was sufficiently well attuned to their environments to survive. Humans have weaker priors and reach physical and mental maturity much more slowly, but ultimately achieve greater mastery of their environment by learning it from first principles or through communication with other humans.
(there may be a tradeoff, and of course humans are so good at protecting their young over years of infancy the early instincts aren't that important to retain)
To be fair, a toddler's brain isn't "offline" while its musculature develops; they kick and move their limbs constantly, and then later crawl and even stand, before they finally take their first walking steps. It's just objectively wrong to say a toddler goes from zero to walking in "several tries".
Sure, not several. Not several trillions either though.
As mentioned in a sibling comment, HN anthropomorphizes AI too much. And is too optimistic about it. I just don't see the results and the value; people trip all over themselves to congratulate themselves and the researchers, yet 99.99% of the problems in the world persist.
I am one of those a-holes who wants to see results when money is invested. That still comes as a shock to some, apparently.
I mean, if we're taking the random walk that nature took to get human level intelligence from the first multicellular organisms you'll have to wait 600 million years or so.
AI takes a shortcut around having a trillion different versions of history optimizing the learning algorithm and keeping only the successful ones, but at the cost of trillions of operations to teach it.
TL;DR, the problem space is huge and evidently must be iterated.
Natural evolution has no direction itself, huge amounts of it are negative. It just happens those with positive traits get to hang around and breed some more.
You're missing all of the subskills that are developed along the way. They don't just grow, braindead, and twitch a couple times until they get the hang of walking.
It's a joy to watch a child grow up, but also it's super interesting watching them figure out the most basic shit. Would highly recommend it if you get the opportunity.
Well, that now is a proper technological revolution. From where I'm standing, the rest mostly seems to justify swallowing 9-digit sums from investors, without the beneficiaries being able to show much for it.
A toddler capable of walking is still miles away from the topic at hand, an LLM capable of processing relatively complex text. It is well known that humans have a particularly long training period compared to other animals. So I don't understand why you are bringing up walking training. It seems quite unrelated.
Because humans learn from much smaller sample sizes than "AI". The overly optimistic view of "AI" is over-represented on HN, and that makes the discussion unbalanced and puts nearly everyone in it in a filter bubble.
Well, let me tell you a story. I have early beta access to GPT-4 with the ability to add pictures to the chat. This is rolled out as part of BeMyEyes, a volunteer-based app which gives volunteers access to the camera to help blind people. Now, BeMyAI lets me take a picture, send it to the AI, and have it describe the picture to me, including OCR and text translation. And I can ask questions, interrogating the AI about the current picture. I can also add more pictures to the "chat", and the AI can see all of them, and even compare differences. I am 100% blind. After a few days of testing, I already know the AI is better than most humans I'd randomly ask to help me with pictures or scene descriptions. And it is more capable, as it can translate between every language I've encountered so far. If you ask me, the AI-will-kill-the-world crowd has an overly pessimistic view when it comes to AI, or they have been caught up in the recent alarmism movement, or something. GPT-4 with picture integration is such a powerful feature, I'd say we haven't had anything equivalent in terms of flexibility in the assistive technology market in, what, 30 years? This feature will liberate a few million blind people from the barriers put in front of them by society. And guess what, society didn't really solve them so far; OpenAI solved them as a side effect. But IT activists still insist on seeing evil and demise everywhere.
Again, as per my previous response, you're missing the optimization timeline and how long it took nature to do it.
"Walking" at least on dry land has been around 420 million years or so. The locomotion part of the learning algorithms has been around a very, very long time, and it's going to be very hard to duplicate the efficiency that nature brute forced.
Nature has not hyper-optimized human-level intelligence. Really, human intelligence is at the limit of what birth allows: our heads can't be any larger at birth or we'd kill the mother too much of the time. Then the human body has a power budget that's been the defining factor of human/animal survival for just about forever. Humans were already 'smart' before we invented agriculture and excess fuel storage.
AI takes that a step further. We're not using reversible computing so the heat generated is pretty high. We've not been doing it very long and our algorithms are not optimal in any sense. But what we do have that nature does not is nearly unlimited energy, nearly unlimited cooling capacity, and a lot of computer systems working on the problem at once. In an evolutionary sense we're doing closer to the bacteria method, we have a lot of different experiments going on at once and at least a few of them are likely to be fruitful.
Personally, I hope AI development is not insanely rapid. Rapid and powerful changes in society can cause as many problems as they fix, and the pace of societal change is not that fast.
I can't disagree with almost anything you said, except maybe one thing: in physical nature there are objective, physical criteria for whether a certain organism will procreate and thus propagate its (presumably) successful genes.
Not sure if the AI area, represented by human supervisors, does so in such an objective and ruthless manner as nature does.
If there's no fitness function and natural selection then the AI area might be doomed to meander and go in circles for centuries.
Indeed, the human brain has many more logical parts. They all run at sub-MHz speeds though, and don't seem to be using matrix multiplication (which is still worse than quadratic with the best algorithms we have).
Also, human brains do a lot more than language processing.
LLMs are definitely touching something very important about intelligence, much like counting sticks touches something very important about numbers. But the real power of numbers is unleashed with the invention of digits and positional notation, a different representation that makes things literally exponentially easier.
I hope there is a transition step from what we do now using billion-parameter models, to some better representations, more compact and thus more powerful.
Also I hope that ready-made logic and efficient numerical computation can be connected to the learning systems more directly, not taught painstakingly from first principles, much like vision is relatively directly wired into the human brain.
> 86 billion (or 600+ trillion) times 20 years is a lot of computation
What is that in kilowatt hours? Human brains are remarkably energy-efficient with their compute. We should get credit for that. Comparing one of us to an AI being trained at a data centre with the energy budget of a small city isn’t really fair, is it?
Seems that ~100W is regularly quoted as the average human energy output, so that'd be something like 18 MWh for 20 years (assuming 100% efficiency of food input). I suppose there's also the energy cost of all the training infrastructure (daycare, school teachers, homes, transportation, etc.), and also all the energy consumption of all humans that have come before and built those cities and knowledge infrastructure (although LLMs sit atop that anyway), and then the millions (billions?) of generations of creatures going back to the earliest organisms, who have been optimising how cells work to lower their energy cost.
For that setup cost, you get a single-threaded human, starting to specialise in a single field. They will work in that field for around 7.5 hours every weekday, with ~20 days of holiday a year, for around 45 years, and then retire.
Llama v2 70B cost 1,720,320 GPU-hours[1] at 400W, so 688 MWh. Once trained, it can be run 24/7, and you can spin up as many instances as you want on much lower-spec hardware. That model produces output faster than a human while consuming around ~30W on my MacBook Pro.
Now, I know we're in pretty shaky spherical cow territory here, comparing a human (albeit a highly educated one) to Llama v2 in logical reasoning... but consider that this is the state of the art in generally-available LLMs after a few decades of research into machine learning & we're using repurposed silicon to compute vs the amazing complexity and physics-leveraging approach of human neurons... and the training cost is only 1 order of magnitude off humans.
Again, I'm not disagreeing with the general point of the OP. I'm not saying that the models we're using right now are the best/right ones (or that the hardware they're running on is the most efficient way of executing them), but I don't think the energy efficiency gap is actually all that big, considering.
Edit: If you look at the total footprint (considering not just the efficiency of the neurons, but the whole animal) the figures are very close - the llama model card indicates that v2 70B caused the emission of ~300T of CO2, and an average human in the US emits 16T of CO2 a year, so a human would emit ~320T of CO2 in 20 years. I assume children don't have as high a CO2 output, but even so it seems like it's the same order of magnitude.
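As a sanity check, here's the arithmetic spelled out, using only the rough numbers quoted in this thread (so treat the outputs as order-of-magnitude at best):

```python
# Back-of-envelope energy comparison from the figures quoted above.
HOURS_PER_YEAR = 24 * 365

human_w = 100                 # whole-body power draw quoted above
human_mwh = human_w * HOURS_PER_YEAR * 20 / 1e6          # ~17.5 MWh over 20 years

llama_gpu_hours = 1_720_320   # Llama 2 70B figure quoted above
gpu_w = 400
llama_mwh = llama_gpu_hours * gpu_w / 1e6                # ~688 MWh

print(f"human, 20 years:   {human_mwh:.1f} MWh")
print(f"Llama 2 70B train: {llama_mwh:.0f} MWh")
print(f"ratio: ~{llama_mwh / human_mwh:.0f}x")           # roughly 40x, i.e. one to two orders of magnitude
```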
I don't think it's fair to include the billion years of human evolution as a cost on the human side of the chart but not include it in the AI side. AIs didn't evolve themselves out of the primordial ooze, they were built by humans and required herculean human effort to develop and improve. They stand on our shoulders, yet they're still in infancy when it comes to capability. I have yet to see any evidence of an LLM being able to replace an accountant, or an HR person, never mind a trash collector.
The best use of LLMs that I've seen so far is as a boilerplate-producing autocomplete system. Considering that we have better ways to automate this (better programming languages that can abstract away the boilerplate), this is not very high praise.
> Edit: If you look at the total footprint (considering not just the efficiency of the neurons, but the whole animal) the figures are very close - the llama model card indicates that v2 70B caused the emission of ~300T of CO2, and an average human in the US emits 16T of CO2 a year, so a human would emit ~320T of CO2 in 20 years. I assume children don't have as high a CO2 output, but even so it seems like it's the same order of magnitude.
If you're going to include the whole animal on the human side, you need to include the whole supply chain on the LLM side. The cost of building all the fabs and doing all the R&D to develop and manufacture model training-specific computers (matrix multiplier hardware). Just like with crypto, these resources had to be diverted away from other things (e.g. causing the price of gamers' graphics cards to skyrocket). It's only fair to interrogate the ROI.
>The best use of LLMs that I've seen so far is as a boilerplate-producing autocomplete system. Considering that we have better ways to automate this (better programming languages that can abstract away the boilerplate), this is not very high praise.
I think it's in our nature as software people to look at their ability to work with code, but they're quite good when applied to general language tasks. I've been using them for summarisation and reading comprehension and they're quite effective. I've also been working with a teacher friend on seeing if they can generate well-scoring responses to high-school English essay prompts (as always, the problem is prompting and context).
On code, GPT-3.5 (moreso GPT-4) seems to have a good ability to generate and translate smaller scale code problems (hundreds of lines in low-boilerplate languages) but yes, they're like an eternally junior engineer whose work you're constantly having to oversee for subtle bugs, and I don't know that it actually saves time.
I'm pretty sure people are working on different approaches to applying them to code, with better prompting+context from larger codebases, and multi-step processing (i.e. rather than just a single prompt->response, letting the model iterate through a few steps independently, possibly guided by other adversarial/supervisor agent instances, testcase generators, etc.)
>If you're going to include the whole animal on the human side, you need to include the whole supply chain on the LLM side. The cost of building all the fabs and doing all the R&D to develop and manufacture model training-specific computers (matrix multiplier hardware). Just like with crypto, these resources had to be diverted away from other things (e.g. causing the price of gamers' graphics cards to skyrocket). It's only fair to interrogate the ROI.
That's fair, although it gets complicated to work out numbers because we don't train many LLMs, whereas we're constantly training humans, each of whom cost the planet tons of CO2 emissions every year... and, of course, your point that LLMs just aren't very good yet. I fear that they're good enough (or appear to be to the layperson) that execs will replace customer support staff with them, even if the outcomes overall aren't as good.
> That's fair, although it gets complicated to work out numbers because we don't train many LLMs, whereas we're constantly training humans, each of whom cost the planet tons of CO2 emissions every year... and, of course, your point that LLMs just aren't very good yet. I fear that they're good enough (or appear to be to the layperson) that execs will replace customer support staff with them, even if the outcomes overall aren't as good.
I'm not as fearful. When customer support gets too expensive companies already outsource it to India. Indian customer support workers cost far less than Westerners in terms of energy and CO2 emissions, both in terms of training and ongoing costs. India is about 2T/person in CO2 emissions, compared to 15T for North Americans. And that number is averaged over the whole country. I would imagine the poorer areas of the country have much lower emissions and the bulk of CO2 comes from the wealthier big cities.
I'm not convinced we're going to get much further than this, at least without a sea change in how this stuff is done. We saw pretty sharply diminishing returns with GPT-4, which cost 2 orders of magnitude more money to train than 3.5. Is the performance more than 2 orders of magnitude better than 3.5?
How much compute is too much to be spending on this bullshit? 30% of humanity's total computing resources? 70%?
GPT was a big innovation in parallelizing the training process. I think the optimism that we're at the beginning of a sigmoid curve here ought to be questioned.
The performance doesn't have to scale with the training cost. It's ok if we have diminishing returns. The only question is if we can keep getting improvements. If we could get a 10x improvement over GPT-4 (while maintaining safety/alignment), the potential benefits to productivity across the entire world economy are so great that it justifies nearly unlimited investment.
I work in this space and think that it's far more rational to accept the axiom that LLM progress will not be significant than to bank product work on assuming it will increase drastically.
I do think we still have yet to squeeze the most value out of current LLMs, but most people's radical AI dreams are completely out-of-touch with reality for anyone working closely on these problems.
My biggest fear in this space is that disappointment in the inability of these tools to live up to the hype will cause people to irrationally abandon exploring the spaces where they do work.
I think people assume that the current LLM release cycle is similar to other software releases which is to say, MVP followed by full product. ChatGPT is an actual product though, it's not the beta version of the technology. I think the dramatic increases in the product have likely mostly happened before launch but that you're right and we will see minor improvements.
It may be rational and it may be irrational. All I'm saying is I want to see an argument. My impression of the AI space generally is that there are many, many obvious ideas which are simply waiting to be picked up off the ground and tested, and that it's not clear to anyone how much better (what are we even quantifying this with -- log loss?) the base LLM capabilities have to be before they're suitable for making tools that automate large quantities of work. Even if you assume that 240B parameters is the limit, the number of engineers who have the ability to fine-tune GPT-4 or whatever Google is about to come out with is vanishingly small compared to the number of engineers who are participating in the open-source ML community, and the number of engineers in the open-source ML community is vanishingly small compared to the number of engineers in the long tail of app developers who will ultimately adapt LLMs to their use-cases. Even assuming that GPT-4 is the best an LLM can possibly be (which, again, I've seen no argument for), the widening of LLM availability and the building of practical tooling is a strong reason to believe that the utility of LLMs to concrete products will dramatically increase in the next 4-5 years.
> This entire piece is based on one massive, unsupported assertion, which is that LLM progress will cease.
Which is countered by...the assertion that it won't?
LLMs won't get intelligent. That's a fact based on their MO. They are sequence completion engines. They can be fine tuned to specific tasks, but at their core, they remain stochastic parrots.
> I want to know only one thing, which is what gives him the confidence necessary to say that.
I want to know only one thing, what gives the confidence to say otherwise?
>Which is countered by...the assertion that it won't?
No it's countered by principled restraint in not making an affirmative claim one way or the other.
I've heard this referred to as the overconfident pessimism problem: normal, well-founded scientific discipline and evidence-based restraint go out of the window when people declare, without evidence, that they know certain advances won't happen.
Because people get mentally trapped in this framing of either having to declare that it will happen or that it won't, seeming to forget that you can just adopt the position of modesty and say the dust hasn't yet settled.
AI has been "around the corner" since the 1950s; this is the historical evidence for the pessimistic stance against over-optimistic predictions.
LLMs are a huge stride forward, but AI does not progress like Moore's law. LLMs have revealed a new wall. Combining multiple agents is not working out as hoped.
Perhaps without intending to, you've cited a pretty appropriate example of overconfident pessimism. Philosopher Hubert Dreyfus is most responsible for this portrayal of AI research in the '50s and '60s. He made a career of insisting that advances in AI would never come to pass, famously predicting that computers couldn't become good at chess because it required "insight", and routinely listing off what he believed were uniquely human qualities that couldn't be embodied in computers, always in the form of underdefined notions such as intuition, insight, and the like.
Many of the things AI does now are exactly the type of things that doomsayers explicitly predicted would never happen, because they extrapolated from limited progress in the short term to absolute declarations over the infinite timeline of the future.
There's a difference between the outer limits of theoretical possibility on the one hand, and instant results over the span of a couple of news cycles on the other, and it's unfortunate that these get conflated.
This is exactly the point. The pessimists get the pedantic thrill of pointing out that, 60 years ago, some proponents were overconfident. But they neglect to notice the larger picture, which is one of extraordinary progress. They'll be sitting in the passenger seat of a self-driving car, arranging their travel itinerary with a chatbot fluent in English and 60 other languages, and smugly commenting on HackerNews about how "AI pessimists got it right in the 60s."
Lots of progress in some areas, very little in others. Neither the pessimists nor the optimists got it right.
(Btw., a really hardened pessimist might even say your example is mostly things that were doable when there were not many computers around at all... taxi driver to drive you, a secretary to book a flight, meet in a club to discuss things, ...)
To me, if artificial intelligence means anything, it means automating mundane tasks -- most of which humans can accomplish already. If someone tells me they're an AI pessimist because they think automating tasks like "converse intelligently about nearly any subject at length" and "drive my car anywhere while I sleep in the back" isn't an impressive AI achievement, then I think our disagreement pertains to our ambitions for the field, and lies outside the realm of the technical.
Right, I believe this is the more normal framing. I think Hubert Dreyfus spent his life constantly retelling the story of the 50s and 60s with a pessimistic narrative, and that is what made it stick, to the extent that it has. But it is a bizarre framing that, even if it once looked like a guiding light on how to set AI expectations, I think talking about the 50s and 60s over and over is not a helpful way of engaging with what has unfolded over the past 2-3 years.
1. AI has not been "around the corner since the 1950s," and 2. if you think there hasn't been forward progress in AI since the 1950s, you have no valid opinions on this subject.
I think that depends on what "it" refers to. If I claim that "it" is the end of the world, and that it will happen no later than 2024, I suspect that most scientifically minded folks would have no problem considering my claim as having a low probability of coming true.
First of all because so many predictions of the end of the world have been made and we tend to have a hunch about the kind of person who makes them. Which is stereotyping, sure, but at least it's a heuristic, not a straight-up abandonment of evidence-based thinking.
I agree there's uncertainty about the future of AI development, but it's true that we have no idea how to create AI, right now, so the uncertainty is about whether it will happen, not whether it won't. If that makes sense.
The evidence for this, albeit empirical, is the history of AI development itself.
AI doesn't show continuous development over a long period of time. It has always developed in steps: a new architecture or method is discovered and is able to solve some previously hard or unsolvable problems.
Then this solution slowly develops in capability, mostly thanks to better and cheaper hardware, while its quality plateaus.
It may be that LLMs will not follow that pattern. I don't think so, for reasons outlined above. But until it can be shown that they don't, that this really is the long-sought-after AI architecture that just gets better and better over time, I think a healthy dose of pessimism is warranted based on history.
> LLMs won't get intelligent. That's a fact based on their MO.
I kind of agree. However, I see a real possibility that in the near future LLM behaviour will be practically indistinguishable from intelligent/sentient behavior. And at that point we (or at least I) are facing some really interesting/difficult questions, namely: how do you know an intelligent-looking thing actually is intelligent (or sentient)? How do you prove to me that you/an LLM are/aren't a philosophical zombie?
How are we supposed to treat very intelligent/sentient-looking things when we are not sure whether they are sentient/intelligent or not? Let's face it, lots of people are dumb as rocks (too often very much me included). Why should we be able to treat something badly just because we think we know it can't be intelligent, even if it walks, looks, and quacks like an intelligent duck?
I personally have started to think that the behavior of humans should be judged by the behaviour, not the target. If you want to behave like an asshole towards a teddy bear, then you most likely are an asshole.
Quite easy for that to happen if, like many of us, you were taught British English, and then you remember that you're on a US forum, and everybody around you uses the US spelling for things (and you get to read the US variants all the time in other comments).
It seems to me that the real question here is what is true human intelligence. AI has made it plain to see, by being able to replicate it so convincingly, that much of what we have considered intelligence has been pattern matching or acting as complex parrots.
There is much more to the abilities of human body-mind-emotional-experiential being, but it is only slowly becoming mainstream.
(Edit: Of course there are also many analytical skills that AI cannot match at this point. My point is that we shouldn’t overlook any area of human capacity.)
One salient question in this is: will we reach a level of intelligence where we become beings capable of actual collaboration that doesn’t waste so much effort in conflicts, or one that is capable of living in harmony within its environment?
What capabilities of awareness, trauma work, emotional maturity and self reflection does this require? What resources hidden inside humanity that we have forgotten do we need to wield?
Does AI have something to contribute to this process happening?
> It seems to me that the real question here is what is true human intelligence.
IMHO the main weakness with LLMs is they can’t really reason. They can statistically guess their way to an answer - and they do so surprisingly well I will have to admit - but they can’t really “check” themselves to ensure what they are outputting makes any sense like humans do (most of the time) - hence the hallucinations.
Does it do that because it can check its own reasoning? Or is it just doing so because OpenAI programmed it to not show alternative answers if the probability of the current answer being right is significantly higher than the alternatives?
I don't know. I don't think anyone is directly programming GPT-4 to behave in any way, they're just training it to give the responses they want, and it learns. Something inside it seems to be figuring out some way of representing confidence in its own answers, and reacting in the appropriate way, or perhaps it is checking its own reasoning. I don't think anyone really knows at this point.
As the other poster said, they can check themselves but this requires an iterative process where the output is fed back in as input. Think of LLMs as the output of a human's stream of consciousness: it is intelligent, but has a high chance of being riddled with errors. That's why we iterate on our first thoughts to refine them.
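To make that concrete, here's a minimal sketch of such a draft-critique-revise loop. The `complete` callable is just a placeholder for whatever LLM call you have available (prompt in, text out), not any particular vendor's API.

```python
# Minimal sketch of feeding an LLM's output back in as input to refine it.
from typing import Callable

def refine(question: str, complete: Callable[[str], str], rounds: int = 2) -> str:
    draft = complete(f"Answer the following question:\n{question}")
    for _ in range(rounds):
        critique = complete(
            f"Question:\n{question}\n\nDraft answer:\n{draft}\n\n"
            "List any factual errors or unsupported claims in the draft."
        )
        draft = complete(
            f"Question:\n{question}\n\nDraft answer:\n{draft}\n\n"
            f"Critique:\n{critique}\n\nRewrite the answer, fixing the issues above."
        )
    return draft
```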
I'm getting suspicious that there is a bit of a blind spot in understanding the world and the usefulness of what we call intelligence. As Yuval Noah Harari says, intelligence is overrated. Look at what we've done to the planet and our environment: we have fucked it up properly, yet we consider ourselves to be intelligent?
Could it be that intelligence is overrated and the discovery of new ideas/things is underrated? Our egos tell us it's intelligence that makes us special and creative and awesome, but maybe most of the special stuff is already there for us to find, and we conflate discovery with extrapolation. Maybe knowledge and experience are the "important bits" of intellect.
Example: Einstein didn't really invent anything; he discovered things about the world that blew our minds and changed our lives. Yes, he was a great thinker and a courageous soul who went against the grain, and he had the balls to be open-minded enough to discover new things. We obviously believe Einstein to be intelligent, but was he just a great explorer?
I have a similar attitude towards technological progress: yes, we've done amazing things, but fundamentally the air we breathe, the water we drink, and the beauty we are exposed to when looking at a sunset are taken for granted while we stare at our phones.
What we've done to the planet is perhaps less a consequence of intelligence and more a consequence of multi-polar traps. Though it's in our collective self-interest to protect the planet, it's in our individual self-interest to ignore the problem / prepare so that our own children can "weather the storm".
It kinda depends on how much you care about people across the world and future generations.
But yeah, if humans were more intelligent, we probably would have sorted all this out by coming up with better coordination mechanisms, and by overcoming our tribal tendencies more effectively.
But, if we were more intelligent, we might have raped the earth much faster and much harder too?
I see this too often: more intelligence = positive outcomes. But no, some of the smartest people ever have put their intellect towards stupid causes, such as oil exploration and AI to capture people's attention.
>One salient question in this is: will we reach a level of intelligence where we become beings capable of actual collaboration that doesn’t waste so much effort in conflicts, or one that is capable of living in harmony within its environment?
That's more about ethics and a wish for moral behavior and conflict aversion than about intelligence.
Intelligence (human and AI) could just as well opt for conflict and evil, if this helps it get the upper hand for its own private goals and interests.
Simply put, the interests of the collective, are not necessarily the interests of the individual intelligence.
(Even assuming there was a single, easy to agree upon, "interest of the collective" for most problems).
>Not sure how being able to reason about good behaviour in an effective manner that's collectively beneficial isn't in the domain of intelligence.
That's neither here nor there.
Having the intelligence "to reason about good behaviour in an effective manner that's collectively beneficial" doesn't mean you're constrained to reason and act only on that, and not also able to reason and act on behavior which is beneficial to you to the detriment of others and the collective.
And it's perfectly intelligent to follow the latter if you can get away with it, and if the benefit to you is more than your share of the collective benefit the alternatives would bring.
>We have moral philosophy as an academic discipline, after all.
And how has that been working out for us?
(Not to mention, keyword: academic).
>Human brains develop in interrelation. Much, if not all of our intelligence gets developed in relation to other humans and beings.
Yes, and a lot of it is devoted to duping and getting the upper hand of those other humans and beings. So?
>LLMs won't get intelligent. That's a fact based on their MO. They are sequence completion engines. They can be fine tuned to specific tasks, but at their core, they remain stochastic parrots.
This is absolutely wrong. There is nothing about their MO that stops them from being intelligent. Suppose I build a human LLM as follows:
A random human expert is picked and he is shown the current context window. He is given 1 week to deliberate and then may choose the next word/token/character.
Then you hook this human LLM into an auto-GPT style loop.
There is no reason it couldn't operate with high intelligence on text data.
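To make the thought experiment concrete: the sequence-completion loop itself is agnostic about what chooses the next token. A purely illustrative sketch, where `next_token` stands in for the hypothetical committee of human experts or any model:

```python
# The outer autoregressive loop: pick a token, append it, repeat.
from typing import Callable, List

def generate(next_token: Callable[[List[str]], str],
             prompt: List[str],
             max_tokens: int = 100,
             stop: str = "<end>") -> List[str]:
    context = list(prompt)
    for _ in range(max_tokens):
        token = next_token(context)  # human expert or model; the loop doesn't care
        if token == stop:
            break
        context.append(token)
    return context
```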
Note also that LLMs are not really about language at all anymore; the architectures can be used on any sequence data.
Right now we are compute limited. If compute were 100x cheaper we could have GPT-6, being 100x bigger; we could have really large and complex agents using GPT-4-level models; or we could train on coupled text-video data from subtitled videos. Given the world model LLMs manage to learn out of text data, I am 100% certain that a sufficiently large transformer can learn a decent world model from text-video data. Then our agents could also have a good physical understanding.
Humans will never be intelligent. They're optimized for producing offspring, not reasoning. Humans may appear to be intelligent from a distance, but talk to one for any length of time and you'll find they make basic errors of reasoning that no truly thinking being would fall for. /s
Take out the /s tag and you are right on the money. Humans can not be trusted with anything because they are trivially fallible. Humans are terribly stupid, destroy their own societies and refuse to see reason. They also hallucinate when their destructive tendencies start catching up to them.
If the most intelligent machines ever observed in the universe do not count as "intelligent," then we have a semantic, and not a substantive difference of opinion.
Geoffrey Hinton, Andrew Ng, and quite a few other top AI researchers believe that current LLMs (and incoming waves of multimodal LFMs) learn world models; they are not simply 'stochastic parrots'.
If one feeds GPT-4 a novel problem that does not require multi-step reasoning or very high precision to solve, it can often solve it.
Anyone who has worked a bit with a top LLM thinks that they learn world models. Otherwise, what they are doing would be impossible. I've used them for things that are definitely not on the web, because they are brand new research. They are definitely able to apply what they've learnt in novel ways.
What really resonated with me is the following observation from a fellow HNer (I forgot who):
In many cases, we humans have structured our language such that it encapsulates reality very closely. For these cases, when an LLM learns the language it will by construction appear to have a model of the world. Because we humans already spent thousands of years and billions of actually intelligent minds building the language to be the world model.
But in a sense when YOU learned language YOU also learned a world model. For instance when your teacher explains to you the difference between the tenses (had, have, will have) you realize that time is a thing that you need to think about. Even if you already had some sense of this, you now have it made explicit.
Why should we say the LLM hasn't learned a world model when it's done what a kid has done, and everyone agrees the kid understands things?
From what I see, there are some things it hasn't learned correctly. Notably with limbs, it doesn't know how fingers and elbows work, for some reason. But it does know something about what they should look like, and so we get these hilarious images. But I also don't see why it shouldn't overcome this eventually, since it's come pretty far as it is.
The reason why the LLM's apparent world model should not be considered the same as a human's world model is the modality of learning. The world model we learn as we learn a language includes the world model embedded in language. But the human world model also includes models embedded in flailing limbs about, the permanence of objects, and the sounds and smells associated with walking through the world.

Now, all those senses and interactions obviously aren't required for a robust world model. But I would be willing to make a large wager that training on more than "valid sequences of words" definitely is required. That's why hallucinations, confident wrongness, and bizarre misunderstandings are endemic to the failings of LLMs.

Don't get me wrong. LLMs are a technological breakthrough in AI for language processing. They are extremely useful in themselves. However, they are not and will not become AGI through larger models. Lessons learned from LLMs will transfer to other modes of interaction. I believe multi-modal learning and transfer learning are the most interesting fields in AI right now.
That makes sense, but isn't this a matter of presenting it with more models? Maybe a physical model discovered via video or something like that? Then it will be similar to what babies are trained with, images and sound. Tactile and olfactory would be similar.
By doing this you'd glue the words to sights, sounds, smells, etc.
But it also seems like this is something someone has already thought of and is being explored.
You are correct, there is active research on this. And words and pictures are associated in models like stable diffusion. There has been some success combining GANs and LLMs, but it is far from a solved problem. And as the training data gets more complex the required training resources increase too. Currently it's more like a confusing barrier than a happy extension of LLMs.
Do LLMs trained on languages that treat any double (or more) negatives as one have a slightly different world model than those that treat negatives like separate logical elements, like English? I wonder if that'd be one way to demonstrate what you're saying.
This statement about "learning world models" lies somewhere between overhyping, nitpicking, and wishful thinking. There are many different ways we represent world knowledge, and LLMs are great at problems that relate to some of them and horrible at others. For example, they are really bad with anything that has to do with spatial relations, and with logical problems where a graphical approach helps. There are problems that grade school children can easily solve with a graphical schema but that the most advanced LLMs struggle with.
You can very easily give "evidence" of GPT-4 being anywhere between an emerging super-intelligence and a naked emperor depending on what you ask it to solve. They do not learn models of the world; they learn models of some class of our models of the world, which are very specific and already very restricted in how they represent the world.
> For example, they are really bad with anything that has to do with spatial relations, and with logical problems where a graphical approach helps
Of course they are, they haven't been trained on anything spatial, they've only been trained on text that only vaguely describes spatial relations. A world model built from an anemic description of the world will be anemic.
If they learn world models, those world models are incredibly poor, i.e., there is no consistency of thought in those world models.
In my experience, things outside coding quickly devolve into something more like "technobabble" (and in coding there is always a lot of made-up stuff that doesn't exist, in terms of functions etc.).
It's like if a squirrel started playing chess and instead of "holy shit this squirrel can play chess!" most people responded with "But his elo rating sucks"
There are many reasons. Failing at extrapolating exponentials. Uncertain thresholds for how much compute and data each individual task requires. Moravec's paradox, and relatedly people expecting formalizable/scientific problems to be solved first before arts. There are still some non-materialists. And a fairly basic reason: Not following the developments in the field.
I see them more as creative artists who have very good intuition, but are poor logicians. Their world model is not a strict database of consistent facts, it is more like a set of various beliefs, and of course those can be highly contradictory.
That may be sufficient for advertising, marketing, some shallow storytelling, etc., but it is way too dangerous for anything in the physical sciences, legal, medicine, ...
On their own, yes. But if you have an application where you can check the correctness of what they come up with, you are golden. Which is often the case in the hard sciences.
It's almost like we need our AI's to have two brain parts. A fast one, for intuition, and a slow one, for correctness. ;-)
Unclear to me. The economics might not be so great, as (i) you might need expensive people, (ii) there could be a lot to check for correctness, and (iii) checking could involve expensive things beyond people. Net productivity might not go up much then.
For some industries where I understand the cost stack of lower- and higher-skilled workers, I'd say it only takes out the "cheap" part and thereby doesn't take out a large chunk of costs (more like 10% cost out prior to paying for the AI). That is still a lot of cost reduction, but also something that will potentially be "arbitraged away" relatively quickly, i.e., will bleed into lower prices.
My interpretation of the parent post is not that LLMs' output should be checked by humans, or that they are used in domains where physical verification is expensive; no, what they're suggesting is using a secondary non-stochastic AI system/verification solution to check the LLM's results and act as a source of truth.
An example that exists today would be the combination of ChatGPT and Wolfram [1], in which ChatGPT can provide the method and Wolfram can provide the execution. This approach can be used with other systems for other domains, and we've only just started scratching the surface.
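A minimal sketch of that propose-then-verify pattern (the function names here are placeholders passed in by the caller, not any real API): the LLM proposes, and a deterministic checker such as a computer algebra system, test runner, or proof checker decides.

```python
# The LLM proposes; an external, deterministic verifier is the source of truth.
from typing import Callable, Optional, Tuple

def solve(problem: str,
          propose: Callable[[str, str], str],              # (problem, feedback) -> candidate
          verify: Callable[[str, str], Tuple[bool, str]],  # (problem, candidate) -> (ok, feedback)
          max_attempts: int = 5) -> Optional[str]:
    feedback = ""
    for _ in range(max_attempts):
        candidate = propose(problem, feedback)
        ok, feedback = verify(problem, candidate)
        if ok:
            return candidate   # accepted by the checker, not by the LLM's say-so
    return None
```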
Yes, your interpretation is correct. I think the killer app here is mathematical proof. You often need intuition and creativity to come up with a proof, and I expect AI to become really good at that. Checking the proof then is completely reliable, and can be done by machine as well.
Once we have AI's running around with the creativity of artists, and the precision of logicians, ... Well, time to read some Iain M. Banks novels.
> But if you have an application where you can check the correctness of what they come up with, you are golden.
You're glossing over a shocking amount of information here. The problems we'd like to use AI for are hard to find correct answers for. If we knew how to do this, we wouldn't need the AI.
Not sure that matters much as they are only for low risk stuff without skilled supervision, so back to advertising, marketing, cheap customer support, etc.
I would love to see examples. In my attempts to get something original on a not that challenging field (finance), with lots of guidance and hand holding on my end, I was getting a very bad version of what would be a consultant's marketing piece in a second rate industry publication. I am still surprised in other respects, e.g. performance in coding but not in terms originality and novel application.
A typical parrot repeats after you've said something. A parrot that could predict your words before you said them, and could impersonate you in a phone call, would be quite scary (calling Hollywood, sounds like an interesting movie idea). A parrot that could listen to you talking for hours, and then provide you a short summary, would probably also be called intelligent.
Our parrot does not simply repeat - he associates sounds and intent with what we are doing.
At night when he is awake (he sleeps in our room in a covered cage) he knows not to vocalize anything more than "Dear" when my wife gets up - he says nothing when I get up, as he is not bonded to me.
When I sit at my computer and put on my headset he switches to using English words and starts having his own Teams meetings.
When the garage door opens or we walk out the back door he starts saying "Goodbye - Seeya later" and then does the sound of the creaky outside gate.
Just to further this, it's not just 'big names' that feel this way. Read this paper from a team at Microsoft Research: https://arxiv.org/abs/2303.12712 . These folks spent months studying properties of GPT-4, that paper is ~150 pages of examples probing the boundaries of the model's world understanding. There is obviously some emergent complexity arising from the training procedure.
That paper makes some pretty strong claims in the abstract that are not all really supported by the body of the paper. For example, there isn't much on the law or medicine claims in the paper.
If something stays in motion and has been so for some time, it's more important to explain why it would not continue than to default to the assumption that it will stop instantaneously. Show me a curve of diminishing returns and I'll believe you. If an object is in motion you'd need to show me that there is deceleration, or that there is a wall just up ahead.
But the fact is that the loss goes down predictably with increased compute budget, data, and model size (see the Chinchilla scaling law). We've also seen that decreased loss suddenly results in new capabilities in discontinuous jumps. There is every reason to believe there is still some juice left in this scaling; exactly how far it can be taken is difficult to tell.
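For reference, the parametric loss fit from the Chinchilla paper (Hoffmann et al., 2022) has the form below, where N is the parameter count, D the number of training tokens, and E, A, B, alpha, beta are constants fitted empirically (E being the irreducible loss):

    L(N, D) = E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}}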
> LLMs won't get intelligent. That's a fact based on their MO. They are sequence completion engines.
A system that could perfectly predict what I would do in response to any particular stimuli, as a continuing sequence, would be exactly as intelligent as me.
> They can be fine tuned to specific tasks, but at their core, they remain stochastic parrot
Othello GPT was an attempt at answering this exact question, it's a simplified setup and appears to learn a world model: https://thegradient.pub/othello/
> A system that could perfectly predict what I would do in response to any particular stimuli, as a continuing sequence, would be exactly as intelligent as me.
That's certainly interesting, but it's not a depiction of an LLM, is it? LLMs are not deterministic, and (perhaps) neither are we, so two non-deterministic systems can only occasionally align (or so I assume). Intuition says they may get "close enough", whatever that might be, and close enough is good enough in this case, but I think you are making a giant assumption along the lines of: since we can speed matter up to 1000 km/h, then IF we sped it up to light speed then ...[something]...
LLMs are deterministic if the temperature parameter is set to 0. Randomness is artificially injected into their outputs otherwise in order to make them more interesting, but they're just a series of math operations.
To elucidate on this: the LLM can be viewed as a function that takes the context as input and produces a probability distribution over all known tokens for the next token.
Most inference samples from that distribution using a composition of sampling rules or such, but there's nothing stopping you from just always taking the most probable token (temperature = 0) and being fully deterministic. The results are quite bland, but it's perfect for extraction tasks.
(Note: GPT-4 is not fully deterministic; there are no details on this, but the running theory is that it is a mixture-of-experts model and that the expert-routing algorithm is not deterministic / depends on the resources available.)
I'd argue everything about an LLM is artificial; there is no natural process involved, is there? Since its design is to mimic us (at face value, though I don't know how fair a description this is), randomness is essential, I think.
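A minimal sketch of what "temperature" does at the sampling step (plain numpy, not any particular vendor's API; the logits are made up):

    import numpy as np

    def sample_next_token(logits, temperature=1.0, rng=None):
        """Turn next-token logits into a token index.

        temperature == 0 degenerates to argmax, which is fully deterministic;
        any temperature > 0 samples from the softmax distribution instead.
        """
        logits = np.asarray(logits, dtype=np.float64)
        if temperature == 0:
            return int(np.argmax(logits))            # greedy: same input -> same output
        probs = np.exp(logits / temperature)
        probs /= probs.sum()
        rng = rng or np.random.default_rng()
        return int(rng.choice(len(probs), p=probs))  # stochastic: randomness is injected here

    logits = [2.0, 1.0, 0.5]                 # same logits, two behaviours:
    print(sample_next_token(logits, 0))      # always 0
    print(sample_next_token(logits, 1.0))    # varies run to run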
> but I think you are making a giant assumption to the likes of since we can speed up matter to 1000km/h then IF we sped it up to light speed then ...[something]...
This is an odd comparison.
The point here is that a sequence prediction system can be as intelligent as the system it's predicting, unless you invoke woo. That doesn't make LLMs intelligent, but it means the argument that they just predict the next thing isn't enough to say they can't be.
I think this sentence doesn't mean much unless we have a strict definition of what intelligence means.
Just today ChatGPT helped me solve a DNS issue that I would not have been able to solve on my own in one day, let alone an hour. I'd consider it already more intelligent than myself when it comes to DNS.
It's seen more DNS content than you or anybody else will see in an entire lifetime, and it is able to regurgitate what it read because it has far faster memory access than you do.
A dictionary contains knowledge but no intelligence.
And LLMs are the opposite of a dictionary, actually. They suck at storing facts. They excel at extracting patterns from noise and learning them. It's not obvious to me that this isn't intelligence; on the contrary, I feel it's very much a core component of it.
Yes, I have noticed that a lot of extreme AI cynics have been arguing that any and every example of reasoning or thinking that an LLM displays is just some variant of memorisation.
The biggest evidence that LLMs can't reason is hallucinations. If they could reason, they would reject fictional generated output that makes no sense.
> The biggest evidence that LLMs can’t reason is hallucinations.
If I asked you a question and you had to respond with a stream of consciousness reply, no time to reflect on the question and think about your reply, how inaccurate would your response be? The "hallucinations" aren't a problem with the LLM per se, but how we use them. Papers have shown that feeding the output back into the input, as happens when humans iterate on their own initial thoughts, helps tremendously with accuracy.
Maybe it’s more accurate to say that LLMs lack (self-)awareness. Because when you point out things that make no sense, they do have some limited ability to produce reasoning about that. But I agree that this lack of awareness is a serious and maybe fundamental deficit.
It’s more likely it’s just, once again, generating the most probable answer - and if you shake the magic 8 ball enough you will get the answer you were expecting.
Yeah, but the thing is, this seems exactly what people are doing too, at the boundary of conscious and unconscious, with the "inner voice" being most directly comparable to LLMs. It too generates language that feels like best completion, regardless of whether the output is logically correct or not.
In comparing to human minds, LLMs are better understood as the "inner voice" part, not the mind. From that perspective, it's eerie how similar the two are in success and failure modes alike.
Yes, I'm saying here that peoples' inner voices are hallucinating in very similar fashion; "rejecting fictional generated output that makes no sense" is a process that's consciously observable and involves looping the inner voice on itself.
Find me a dictionary that will turn to the correct page when I say or type "what's a word that means (thing)" and yes, I'd consider it intelligent.
But still, you don't have to agree with me about what intelligence means. But it is important in these discussions to understand that not every participant shares the same definition of the term intelligence.
Intelligence, to me, has to be based on initiative. In that sense, a dog or a cat has more intelligence than GPT.
I actually very much look forward to silicon (or other non-biotic material) attaining intelligence; I consider that the only way Earth civilization can colonize space. But this ain't it.
> Just today ChatGPT helped me solve a DNS issue that I would not have been able to solve on my own in one day, let alone an hour. I'd consider it already more intelligent than myself when it comes to DNS.
Would you consider a search engine, or a book, to be as intelligent?
I'm put in mind of the OpenAI DoTA bot that was winning 99% of its games and some people refused to admit that it knew how to play DoTA based on some esoteric interpretation of the word "play".
We're going to see exponential increases in processing power of the best GPU clusters and human brains are a stationary target. And there is precious little evidence that the average human is much more than an LLM. LLMs are already more likely to understand a topic to a high standard than a given human.
They're going to progress and if they aren't intelligent then intelligence is overrated and I'd rather have whatever they have.
We found that extrapolating the performance from a few data points with smaller models is actually very accurate. That's how they determined hyperparameters: by tuning them on multiple smaller-scale models and then extrapolating. So far, all those predictions were quite good.
Together with a bigger model, we also need more data to get better performance. If we add video and audio to the text data, we still have a lot more data we can use, so this is also not really a problem.
It would be very unexpected that those scaling laws are suddenly not true anymore for the next order of magnitude in model and data size.
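As a toy illustration of that kind of extrapolation (made-up numbers, and a pure power law that ignores the irreducible-loss term real scaling laws include):

    import numpy as np

    # Hypothetical (compute, loss) points from small training runs.
    compute = np.array([1e18, 1e19, 1e20, 1e21])
    loss    = np.array([3.10, 2.72, 2.40, 2.13])

    # A power law loss ~ a * compute^b is a straight line in log-log space.
    b, log_a = np.polyfit(np.log(compute), np.log(loss), 1)

    # Extrapolate to a run with 10,000x the largest small-scale budget.
    target = 1e21 * 1e4
    predicted = np.exp(log_a) * target ** b
    print(f"predicted loss at {target:.0e} FLOPs: {predicted:.2f}")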
Scaling laws apply to a single model. The best single model right now is supposedly a 8x mixture of experts, so not even really a single model in the purist sense.
I still expect the final solution will be more along the lines of picking the best model(s) from a sea of possible models, switching them in and out as needed, and then automatically reiterating as needed.
> LLMs won't get intelligent. That's a fact based on their MO. They are sequence completion engines. They can be fine tuned to specific tasks, but at their core, they remain stochastic parrots.
> stochastic parrots
I mean, let's be honest, so is a large enough share of humanity's bell curve - so LLMs don't need to be amazing. They need to chain together communication that makes them seem sentient (as now) and then be exposed to smaller data sets with specialized, higher-level knowledge. This is how humans are... and the reason some are smarter than others.
Even assuming that is true: LLMs aren't all that exists in AI research and just like LLMs are amazing in terms of language it's possible similar breakthroughs could be made in more abstracted areas that could use LLMs for IO.
If you think ChatGPT is nice, wait for ChatGPT as frontend for another AI that doesn't have to spend a single CPU cycle on language.
The next AI wave hasn't even started. Imagine an LLM the size of GPT-4 but it's trained on nothing but gene sequence completion.
All the models being used in academia are basically toys, none of those guys are running hardware at a scale that can even remotely touch Azure, Meta, etc, and right now there is a massive global shortage of GPU compute that's eventually going to clear up. We know models get A LOT better when they are scaled up and are fed more data, so why wouldn't the same be true for other problems besides text completion?
Frankly, I'm a bit worried about all the rest now that LLMs have proved to be so successful. We might exploit them and arrive at a dead end. In the meantime, other potentially crucial developments in AI might get less attention and funding.
It's countered by not making the assertion and not being able to make conclusions. You only need a lack of confidence for that. It's orders of magnitude easier to not have knowledge compared to having it.
Sorry, I didn’t want to talk about what I have personally done to test this. I felt it would be better to refer to other people.
In the past months I have used Gen AI to create multiple proofs of concept, including labelling and summarization tools. In addition, to make sure I took a project to conclusion, I built a website from scratch, without any prior knowledge - using Gen AI as extensively as I could.
I am being pretty conscientious with my homework. The results of those experiments are why I am confident in this position. Not just because of the articles.
I am also pointing out that it's not the tech, it's the expectations in the market.
People expect ChatGPT to be oracular, which it just can't be - and the breathless claims from proofs of concept fan the flames.
I leave it to you to recall the results and blame, when unrealistic expectations were not met.
The bigger problem is that an accurate LLM is such a massive speed up in coding (an order of magnitude, hypothetically at least), that there is zero incentive to share it.
All American programming tech has relied on a time-and-knowledge gap to keep big companies in power.
Using visual studio and c++ to create programs is trivial or speedy if you have a team of programmers and know what pitfalls to avoid. If you're a public pleb/peasant who doesn't know the pitfalls, you're going to waste thousands of hours hitting pointless errors, conceptual problems and scaling issues.
Hallucinating LLMs are marketable to the public. Accurate LLMs are a weapon best kept private.
I am always intrigued by the people who say LLMs provide a massive benefit to their programming and never ever provide examples...
That’s an established technique with papers written on the topic and everything.
Anecdotally, I tested this by having GPT-4 translate Akkadian cuneiform — which it can just barely do. I had it do this four times and it returned four gibberish answers. I then prompted it with the source plus the four attempts and asked for a merged result.
It did it better than the human archeologists did! More readable and consistent. I compared it with the human version and it matched the meaning.
Makes sense, it's too obvious to not have already been studied :)
Do you know what I might search for to find info on it?
>Expensive now… soon to be standard?
With translation, better safe than sorry? It's a very important field and preserves human history so, why not?
It's trivial to test, just use ChatGPT yourself and ask it to solve the same problem several times in new sessions. Then paste in all attempts and ask for a combined result.
The main issue is context length: if you use 4 attempts you have to fit in the original question, four temporary answers, and the final answer. So that's 6 roughly equal-sized chunks of text. With GPT-4's 8K limit that's just ~1300 tokens per chunk, or about 900 words. That's not a lot!
The LLMs with longer context windows are not as intelligent, and tend to miss details or they don't follow instructions as accurately.
Right now this is just a gimmick that demonstrates that more intelligence can be squeezed out of even existing LLMs...
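A minimal sketch of that generate-then-merge loop, assuming the 2023-era (pre-1.0) openai Python client; the model name and prompts are illustrative:

    import openai  # pre-1.0 client; adapt for newer versions

    QUESTION = "Translate this passage: ..."  # illustrative placeholder

    def ask(prompt, n=1, temperature=0.7):
        resp = openai.ChatCompletion.create(
            model="gpt-4",
            messages=[{"role": "user", "content": prompt}],
            n=n,
            temperature=temperature,
        )
        return [choice["message"]["content"] for choice in resp["choices"]]

    # 1) Several independently sampled attempts.
    attempts = ask(QUESTION, n=4)

    # 2) Feed the question plus all attempts back in and ask for a merge.
    merge_prompt = (
        QUESTION
        + "\n\nHere are four independent attempts:\n\n"
        + "\n\n---\n\n".join(attempts)
        + "\n\nCombine them into a single, most consistent answer."
    )
    print(ask(merge_prompt, n=1, temperature=0)[0])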
That only works if the generated outputs are completely independent and not correlated. I'd be interested in research that shows whether multigen actually reduces hallucination rates.
True, I'm just throwing multigen out there as a wild ass solution
However, you could do multigen across different models, e.g. GPT/Claude/LLaMA, whose errors should not be entirely correlated.
Every deep learning technology has had an exponential growth phase, followed by a slowdown, followed by a plateau that nothing could break until a fundamentally new architecture came along. People get excited about the first part, project it onto the second, and start companies by the time we're well into the third.
Yes, but how do they know where we are in that growth phase? If you confidently tell me that "GPT-4 is the plateau," I want to know how you know that, specifically. "Well, because all deep learning technologies eventually slow down" is not a good argument. You need to show me the diminishing returns, or give me a good theoretical argument for why we're reaching them.
The more compute you need to get state of the art performance the closer to the plateau you are. If you didn't need the compute researchers would be getting better results with smarter training. Given that the gpt family of models need more energy to train than Nevada needs to keep the lights on they are very much on the flat part of the logistic growth curve.
This isn't true. In fact, it's the reverse of true. If you think a bit more carefully about your argument, you'll realize that you've asserted that the single most revolutionary advance of modern deep networks (i.e. network architectures whose performance scales neatly with their parameter counts & training epochs) automatically portends "the plateau of forward progress."
It hasn't even begun to get good: we only got decent local code-generation models yesterday, and we haven't even begun on the fine-tuning and tooling for using them.
To be frank, between stock art and photos, pre-AI template-based tools, and the massive oversupply of graphic design and photography work, the field was already massively redundant and kind of out of a job to begin with...
I was thinking of people who... well, take commissions to make lewd images of various themes (e.g. furry stuff) or new specific characters or such... not my cup of tea, but I know from researching what is possible that you only need a few images of new-season anime girl blue hair edition #9237191 from the Japanese sites to train a LoRA on the booru models and then make mostly anything you want. And you can pose people with ControlNet, I think; that's also how those super cool QR code images are made.
Those guys apparently used to make pretty good cash from Twitter, usually using pen names so they wouldn't be associated with their regular work.
The scary bit here is you can also "clone" a person to make any image you want of them. Obviously there's a lot of problems coming from that in the future, but also neat applications, e.g. some guy made selfie pictures of himself in the past with this for internet dating.
This is totally wrong. Fine-tuning techniques were a prerequisite for turning academically impressive but boring and useless models into useful, compelling user products like ChatGPT. Fine-tuning complements fundamental model improvements -- it doesn't substitute for them.
I asked ChatGPT how to add a JSDoc type to a Vue 2 prop and it gave me a wrong answer. There have been several times where I’ve asked it questions and it sprinkles in well disguised misinformation. These tools are impressive but they definitely have limitations.
You're asking a model which is just a big, unfocused, and highly alignment-taxed mush of the entire internet across all languages, natural or computer, then (maybe) sliced across 8 experts.
It can also simulate a Zizek vs Wittgenstein argument over Russian literature. The fact that it can usually write computer code that executes is nothing short of magic to me.
What one would want is something like LLaMA2-Coding-Vue2, maybe with a LoRA for the library or concept.
I wrote a big word salad about all the other things you could do, but suffice it to say there's no reason anyone should be limited to a single pass on a monolithic model, without automatic context-window augmentation, automatic code checking and regeneration, or model escalation (e.g. querying out to a 200B coding model or something).
I think this concept will be the future of coding. You'll have something like CodeLLaMA-javascript trained specifically for the language, then you'll have LoRA-like files distributed with/for each library. I can imagine something like the DefinitelyTyped project for TypeScript types, but instead of doing `yarn add @types/someLibrary` you'll be able to do `yarn add @loramodels/someLibrary`.
I'm javascript centric these days, but the same concept should work for most languages with a package manager.
Some of these features could even be integrated into, say, yarn/npm directly. You currently have a devDependencies section in package.json; I could imagine something along the lines of an "llmDependencies" section to define which main model and version to use and which "library" models to use.
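Purely as a hypothetical sketch of what such a package.json section could look like (none of these fields or packages exist today; names are made up):

    {
      "dependencies": {
        "someLibrary": "^2.1.0"
      },
      "llmDependencies": {
        "baseModel": "codellama-javascript@1.0",
        "libraries": {
          "@loramodels/someLibrary": "^2.1.0"
        }
      }
    }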
Everything has limitations. The linked piece is very much a statement about limited potential, not the current state. In this context, this stance is akin to saying "I asked a 6yo this math question and it gave me a wrong answer, there's definitely limitations". Sure – but who would use that observation to assess where the future potential cap of the kid is?
GPT-4 gets things wrong as well, especially as soon as you are off the well-beaten path. I tried writing code with my brain off and GPT-4 on: the Terraform code was mostly right but didn't work, the Python imports for recent libraries (llama-cpp-smth) were a complete fabrication even when I gave the AI the documentation beforehand, and we went in circles around a problem for which it kept giving me the same solution that resulted in the same error (around Python multiprocessing, which is very picky around nested parallelism and method imports).
Well. You would definitely have to very carefully select a very, VERY narrow slice of society, to get a piece where Qanon supporters make up a significant percentage of people.
But hey, if you are really looking to convince yourself of something, I have no doubt that it can be done.
The point is that people keep repeating it because it's true. Why are you being so obnoxious about it? Anyone who has used GPT-3.5 vs GPT-4 knows this and it's just ridiculous to claim otherwise.
My guess is most people into AI don't even remember that they are paying $20/month for this.
We do a lot of experiments involving gpt3.5, 4, claude-v2, titan-large, and palm2, and for what it's worth, on our real production workloads gpt4 shines. We can make Palm2 produce decent results with a lot of extra effort, and claude-v2 is passable but gpt4 does not disappoint. This is low-grade knowledge management stuff, and we are not using it as a information-retrieval system - but for basic 'cognitive' tasks where all the information needed is provided in the prompt. I'd not rely on it for info retrieval tasks such as the examples quoted above - its knowledge base is highly compressed, after all.
How many tools do you know of that work 100% of the time? As someone that's worked in software for 20 years now, I don't think there is anything, from Linux to Emacs to Visual Studio to Java, which doesn't have bugs, issues, crashes, shitty stuff - and yet we expect a revolutionary and amazing technology to generate code perfectly?
It's funny how the LLM haters keep raising the bar to a level that no other software can reach. ChatGPT is a tool like any other and, just like a hammer, it can be misused, or it can be incredibly useful if used well. I personally find ChatGPT mind-boggling and astounding and use it every day, multiple times, for both coding and non-coding purposes. But it's totally normal and reasonable to me to expect bugs. Do you really expect Visual Studio to run on large solutions and never have crashes or memory issues or slowness? If so, you're going to be disappointed.
Right now one generation is expensive on these big mainframe models.
Query many times on the right model(s) for the question and the correct answer will be there 99.999% of the time as the other hallucinations will be thrown out.
Curious, how does it do when you give it examples / use playground.openai?
I find that no examples often leads to the same result as a befuddled junior, but with examples often it gains confidence. Also I find playground to give me much better code snippets than chat sometimes
You've identified the problem inherent in these models right now, though.
There's a lot of "extraneous" information and details that appear useless and unrelated, but that long-time developers have tucked away in their brains (or really anyone who has done something at a high level for a long time), that turns out to be incredibly useful; generally these people don't require examples - they just know what the right answer is, because they've been exposed to a variety of problems over a long career or lifespan.
That's where LLMs need to be to be truly "useful". I think if we get them to that point, we'll really have something useful on our hands.
Very insightful point - do you think that OpenAI and Microsoft are aiming at developers so that we may spill this knowledge by N-shotting the chat query? I.e if the AI doesn't understand some fundamental knowledge, often i'll teach it so that it can be more effective at helping me, inadvertently I spill some of this tucked knowledge.
I don't think they have any desire to use this knowledge for their own gain, since they explicitly say that your data will not be used to train their future models - OpenAI says that about its fine-tunable GPT-3.5 Turbo model, and Azure OpenAI Service was clear on that as well.
I think the future is extremely well-trained base models that are then fine-tuned on specific domain knowledge. I'm already seeing that with Meta's Llama 2 personally at home. My company has access to Azure OpenAI Service's GPT-4 trainable model that we've been working with on all our documentation, and the results look very promising so far.
But, this idea of the All-Knowing Oracle that can answer any question you have is all well and good, but it's so far beyond our current processing power as to be a pointless endeavor at the moment. Will we get there eventually? Yeah, sure. According to Jim Keller as of around 2019/2020 when he was speaking on Lex Fridman's podcast, he said we still have room to go 1,000,000 times smaller - meaning chips. I think we probably need two more orders of magnitude in processing power of the strongest high-end GPUs before we're there. NVIDIA's H100 is a great achievement, but we need something about 100x as fast, and I think we'll still need dozens of them working in parallel to train the model that can answer all these questions.
All of this though is a moot point, because it's the lawyers that have already slowed down progress. OpenAI is terrified of being sued. Altman couches it behind terms like, "AI Safety", "AI Alignment", etc., but it's fear. It's all stemming from fear. And it's all stemming from people just not "getting it".
We're entering a new age of upheaval, and there's going to be rogue AIs that tell people to go kill themselves to reduce climate change. You know why? Because there are humans that tell people to go kill themselves to reduce climate change. These models are language models. We taught them how to think, and we taught them how to think like us, so it's no surprise to me whatsoever that they behave like us - meaning they occasionally lie and they occasionally go off the rails and go a little crazy.
We have become gods and we have made a creation in our own image. Most of the time it's awesome, sometimes it's a little wild and wacky.
I don't encounter a lot of "small", one-copy-paste-size problems in my daily work that I couldn't quickly solve myself, so I haven't found a lot of use for ChatGPT while coding, yet. (I reckon this is changing though.)
However a few times there have been some mechanical refactoring-style grunt work I've delighted to have been able to let ChatGPT do. However, the rate ChatGPT is giving me subtly wrong results is just high enough that I end up cross-checking everything, and then it takes me a bit more time than it would've otherwise taken. Give it a year or two, maybe?
Based on the current front page of HackerNews, it's clear that quantitative improvement is happening all the time. I think that enough incremental progress will eventual feel like qualitative improvement. And it doesn't even need to be smarter models; improving prompts, IO systems etc. around the model might help. For example, I said I don't feel the need for help for copy-paste-sized problems, but inputing the whole repo seems to be achievable with some clever scaffolding with the IO around the model.
I’d agree with the sentiment that we are not at the end of the LLM development, simply because looking at other similar things like Stable Diffusion, the community has made so many crazy enhancements in such a short amount of time. With more time to refine these enhancements I think we can close a lot of gaps.
There's a second assumption: that "worker displacement" remains an unsolved problem. That assumes a society that chooses to do nothing in the face of a reduced need for labour, turning something that could be a substantial boon into a downside.
In other words: It assumes Marx was right about the central failure-mode of capitalism (an inability or unwillingness to find ways to distribute work and proceeds of work in ways that prevents productivity gains from eventually causing social upheaval).
There's no reason what they describe as "so-so" tech can not also be a significant societal advantage, but it requires structural change to ensure it does not deprive a significant number of people of a livelihood.
By the time AIs stop hallucinating, it will be effectively super human (i.e. much better) because humans hallucinate all the time. And some people are barely coherent. We're holding AIs to a much higher standard than ourselves. And we move the goal posts all the time as well.
Let's deconstruct that title. "AI isn't good enough". Good enough for what? Great example of moving the goal posts. Because anytime it fails to do whatever, it's not good enough. But what about all the stuff it is good enough for that people are actually using it for already? It's passing tests at a level most humans only manage very briefly after lots of intensive preparation. The knowledge slips away quickly after that.
The way I see it, there's a long, rapidly growing list of stuff that AIs have nailed already and a list of things where it is clearly struggling. That list is shrinking.
Self-driving gets cited a lot as something where AI is failing. That's hard to say in a world where multiple companies are now operating self-driving vehicles commercially in several cities across the US, China, and soon Europe. Are they perfect? No. Do they drive better than the average jerk on the road? Definitely, and it's not even close. Most people simply suck at driving; that's why traffic deaths are so common. Most of those deaths are caused by somebody with a driver's license proving that they shouldn't have received one. Not quite a solved problem, but at this point these things appearing on the roads in my city is more a matter of when than if. It already works in other cities, and I don't see what's so special about mine that it couldn't be replicated here. In other words, I expect the number of cities with autonomous vehicles to start exploding over the next few years. And who knows, Tesla might hit the good-enough mark at some point as well. Again, the barrier is very low because humans aren't held to the same standards; they'll give driver's licenses to just about anyone.
"Guy calls humans flawed while claiming the same flawed species will build superhuman intelligence...", pretty good for a bunch of dumb LLMs in "meat bags"...
Humans have repeatedly built things that are beyond their own physical and intellectual capabilities. A calculator can do math problems much more quickly than any human being.
Not well! But that's in no way relevant to the point, which is that we are demonstrably capable of creating machines that can perform tasks of intelligence that we cannot.
Sure, though that really began with the abacus. A skilled abacus user can perform calculations faster than most people can with a calculator. They practice until it's all muscle memory. I think this demonstrates that there's actually very little intelligence involved in arithmetic.
>>> We're holding AIs to a much higher standard than ourselves. And we move the goal posts all the time as well. [...] The way I see it, there's a long, rapidly growing list of stuff that AIs have nailed already and a list of things where it is clearly struggling. That list is shrinking.
>> Humans have repeatedly built things that are beyond their own physical and intellectual capabilities. A calculator can do math problems much more quickly than any human being.
> We've yet to build a single machine that is intellectually capable beyond our own understanding.
Can you unpack this, please? I'll give some examples for you to respond to:
A. Any of our market-making mechanisms (NASDAQ, predictive markets, etc) synthesize information in ways that is faster and broader than any one human could understand. Humans understand the mechanism at work, but cannot really grasp all the information in motion.
B. Weather prediction. While humans understand the mechanics of satellite imagery and simulation, the combined effect of predicting the weather is superhuman.
C. How are large language models (LLMs) as capable as they are? In many cases, we don't seem to know. This isn't really new; last I studied it, the field's conceptual understanding of neural network capacity and architecture had a long way to go. In any case, LLM performance is often superhuman.
Are you saying that enough people, suitably arranged, could do the above tasks? Perhaps they could, but I doubt they could do the tasks reliably and efficiently. We aren't that kind of machine. :)
If you are saying that the fundamentals of intelligence are determined by the structure rather than the substrate, I agree, but I don't think this is salient.
You refer to a machine that is "intellectually capable beyond our own understanding." Above, I've asked you to define what you mean. But perhaps more importantly, why is your threshold important? We already know that machines of many kinds, including sub-human level, are useful. They don't have to exceed our understanding to be useful or dangerous.
My point is this: we've already built machines we can't practically comprehend. We seem to only be able to audit them by brute force. Alignment seems computationally beyond reach.
You're pointing out the best that humanity is capable of. He's pointing out that people judge LLMs by their worst performances. You're not disagreeing with the parent comment.
GPT-3 is so good it could write code, or it could emulate an entire TV show to the point of lawsuit, or simulate an eternal debate between Herzog and Zizek. You could translate Bulgarian to Sindarin for heaven's sake. And all of this comes at a 50% tax rate or so because of the alignment tax that OAI opted for. (which doesn't even work because anyone who puts effort in can jailbreak it scarily well)
The real solution is not these monstrosities of the internet mushed together and taxed like a Belgian billionaire. It's models and frameworks specific to what you want in a given query.
It was less than 24 hours ago that we finally got a local model that can do code generation well, and we also know that Meta has a far better one that they are holding back.
There is no reason that for any query you should be restricted to a single inference run on a single fixed model, and there is no reason that we shouldn't perform a bunch of non-LLM processing on the output before the user sees it.
Switch between multiple models, fine-tune, or use LoRAs based on the query -- Python plots? Load llama-coding-python-plots. You like Plotly? Add -Plotly. Run the code in a sandbox (with hard kills on resources) and regenerate it if it does not meet standards.
You're working in linguistics? Switch to a chat model that's fine-tuned in the literature and codebase of that field.
There are a trillion ways to improve things; we're basically at the cavemen-banging-rocks-together point.
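A minimal sketch of the "run it in a sandbox, regenerate or escalate if it fails" loop described above (the model call is a stub and the model names are made up; a real sandbox would also cap memory and network, not just wall time):

    import subprocess, sys, tempfile, textwrap

    def generate_code(prompt: str, model: str) -> str:
        """Hypothetical stand-in for whichever local/remote model the query was routed to."""
        return textwrap.dedent("""
            print(sum(range(10)))
        """)

    def run_sandboxed(code: str, timeout: float = 5.0):
        """Run generated code in a separate interpreter with a hard timeout."""
        with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
            f.write(code)
            path = f.name
        return subprocess.run([sys.executable, path],
                              capture_output=True, text=True, timeout=timeout)

    def answer(prompt, models=("small-local-coder", "bigger-remote-coder")):
        for model in models:                      # escalate until something runs cleanly
            code = generate_code(prompt, model)
            try:
                result = run_sandboxed(code)
            except subprocess.TimeoutExpired:
                continue
            if result.returncode == 0:
                return result.stdout
        return "all models failed"

    print(answer("plot something"))               # -> "45" with the stubbed model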
Hell, I'm a sleep-deprived ESL speaker and I just invoked the monstrosity to fix my grammar and spelling for this post. What a bloody waste of CO2.
If for nothing else but to keep the Earth afloat, we shouldn't be using these huge closed-source SaaS things unless we need them. Repurpose them for SETI@Home or processing human genetics for healthcare, or something else that benefits mankind.
"we are already at the tail end of the current wave of AI."
Is that really true? Generative AI spending this year just so far is 4x greater than last year, and the results of that spending haven't been released yet. How can they conclude this?
Most "predictions" about the future I've seen assume that ChatGPT will stay the same and not improve for the next 10 years, or only marginally. Why they make this assumption, I don't know, it seems to go against every past trend we've seen with machine learning scaling. There are even people who say LLMs have reached their limit because we've exhausted all the text data that exists, assuming that the only way for LLMs to improve is to blindly feed it more static data and that no other sources of improvements exist.
I think they mean tail end of LLM innovation? As in, all the discovery has been done and so all that is left is just computing power and tweaks rather than breakthroughs.
I believe the OpenAI folks have already said something to that effect a while ago as well.
For end users, the results of tweaks/computational power will seem like breakthroughs, but fundamentally they're not.
Rather than asserting that current LLMs are at their tail end, or that AI isn't good enough, it is much more instructive to ask what the bottlenecks or constraints to further progress are, and what would help remove them.
They can largely be divided into 3 buckets:
1) Compute constraint - Currently, large companies using expensive Nvidia chips do most of the heavy lifting of training good models. Although chips will improve over time, and competition like Intel/AMD will bring down prices, this is a slow process. A faster breakthrough could be training via distributed computing over millions of consumer GPUs. There are already efforts in that direction (e.g. Petals / swarm parallelism for finetuning and full training), but the Eastern European/Russian folks developing them don't seem to have enough resources.
2) Data constraint - If you just rely on human-generated text data, you will soon exhaust this resource (maybe GPT-4 already has). But the TinyStories dataset generated from GPT-4 shows that if we can have SOTA models generate more data (especially on niche topics that appear less frequently in human-generated data), and use deterministic/AI filters to separate the good from the bad quality data thus generated, data quantity would not be an issue any longer (a toy sketch of that generate-and-filter loop follows this list). Also, multimodal data is expected (with the right model architectures) to be more efficient at training world-grokking SOTA models than single-modal data, and here we have massive amounts of online video data to tap into.
3) Architectural knowledge constraint - This may be the most difficult of all: figuring out what the next big scalable architecture after Transformers is. Either we keep trying newer ideas (like the Stanford Hazy Research group does) and hope something sticks, or we get SOTA models a few years down the line to do this ideation part for us.
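The generate-and-filter loop mentioned under the data constraint, as a toy sketch (the generator is a stub standing in for a SOTA model; the filter here is only a cheap deterministic check, with an AI grader as an obvious second stage):

    import json

    def generate_synthetic_examples(topic: str, n: int) -> list[str]:
        """Hypothetical stand-in for prompting a SOTA model for short texts on a niche topic."""
        return [f"[model-written passage #{i} about {topic}]" for i in range(n)]

    def passes_filters(text: str) -> bool:
        """Cheap deterministic checks first; an AI grader could be a second stage."""
        return len(text.split()) >= 4 and "error" not in text.lower()

    dataset = []
    for topic in ["tidal locking", "Byzantine fault tolerance", "Akkadian grammar"]:
        for example in generate_synthetic_examples(topic, n=3):
            if passes_filters(example):
                dataset.append({"topic": topic, "text": example})

    print(json.dumps(dataset[:2], indent=2))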
As impressive as AI is and what it can do. I've felt the same way in my last job. It was almost there; it was almost good enough such that we could almost ship the project but we didn't. GPT3.5 was not good enough, GPT4 seemed like it was good enough in terms of accuracy but it was too slow and we were getting throttled like crazy. I lost my job over it. When I was worried about AI taking my job, I didn't picture it like this.
Our use case was that we needed to consider information from thousands of pages of documents to answer customer questions. I let GPT decide where to look based on the documents' tables of contents (thankfully each section was no more than ~500 words) and only fetched information from the relevant sections to let GPT answer the user's question. GPT-3.5 was fast enough to be usable but often inaccurate; it would ignore some information or get confused by simple mathematical comparisons in the user's questions, like "greater than versus less than". GPT-4 was good and potentially fast enough, but the API throttling was too restrictive, and it was not possible to feed it less information and still get a good, nuanced answer every time.
Actually, the trick of getting GPT to find the information based on the table of contents worked really well in narrowing down the input for the second-stage summary (to answer the user's question), but our use case nevertheless required a lot of information (often from multiple sections) with a lot of grey areas and nuance, and the limited amount of throughput provided by GPT-4 didn't cut it.
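A minimal sketch of that kind of two-stage lookup (the ask_llm call is a hypothetical, stubbed stand-in for whatever chat API is in use; the document and section names are illustrative):

    # Hypothetical document: section title -> section text (each section is short).
    DOCUMENT = {
        "Fees and charges": "The monthly fee is 5 EUR. Transfers above 1000 EUR cost 0.5%.",
        "Eligibility": "Customers must be 18 or older and resident in the EU.",
        "Termination": "Either party may terminate with 30 days notice.",
    }

    def ask_llm(prompt: str) -> str:
        """Hypothetical stand-in for a chat-completion call (stubbed so the sketch runs)."""
        if "Table of contents" in prompt:
            return "Fees and charges"
        return "A 2000 EUR transfer costs 10 EUR (0.5%)."

    def answer(question: str) -> str:
        # Stage 1: let the model pick relevant sections from the table of contents only.
        toc = "\n".join(DOCUMENT)
        picked = ask_llm(f"Question: {question}\nTable of contents:\n{toc}\n"
                         "Reply with the relevant section titles, one per line.")
        sections = [s for s in picked.splitlines() if s in DOCUMENT]

        # Stage 2: answer using only the text of the picked sections.
        context = "\n\n".join(f"{s}:\n{DOCUMENT[s]}" for s in sections)
        return ask_llm(f"Using only this context:\n{context}\n\nAnswer: {question}")

    print(answer("What does a 2000 EUR transfer cost?"))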
It was frustrating that, aside from accuracy issues (which depended on the complexity of the question), it often seemed to work very well and fast with GPT3.5 but the exact same code did not meet usability expectations when we switched over to GPT4 due to slow speed (though the accuracy was awesome).
I only spent a month and a half on this project, but management didn't have the patience to wait for OpenAI to remove GPT-4 throttling, and because they couldn't see it working (at the speed they wanted), they had no reason to believe me that my solution was designed correctly... That's startups, I guess.
Anyway I still think there is a lot of hope for this technology and I'm confident that the problem can and will be solved. It was just unfortunate timing for me. Being an early adopter isn't always a good thing.
From what you said your solution wasn't designed correctly?
You should definitely have been using BM25 and SBERT for something like this, and you definitely should have been asking 3.5 for structured output and doing any math yourself.
If these were answers from a fixed set of documents there's also a ton of pre-processing you should have been doing.
-
I recently helped a friend with a problem of identifying a certain form of language in EDGAR-sourced contracts. He had someone who had tried feeding entire documents into LLMs to find these, and the result was a roughly 20-minute search time.
I took a few minutes sitting with him to come up with a demo that used synthetic data and SBERT to process documents in less than 30 seconds. 99% of that was stuff you could do 2 years ago, the part the LLM helped with was creating buckets of synthetic data quickly, and even that could be done procedurally if you had more time.
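For reference, a minimal BM25 + SBERT retrieval sketch (assuming the rank_bm25 and sentence-transformers packages; the model name is just a common default, the sections and query are illustrative, and the score fusion is deliberately naive):

    from rank_bm25 import BM25Okapi
    from sentence_transformers import SentenceTransformer, util

    sections = [
        "Fees: transfers above 1000 EUR cost 0.5 percent.",
        "Eligibility: customers must be 18 or older.",
        "Termination: either party may cancel with 30 days notice.",
    ]
    query = "how much does a large transfer cost"

    # Lexical ranking with BM25.
    bm25 = BM25Okapi([s.lower().split() for s in sections])
    bm25_scores = bm25.get_scores(query.lower().split())

    # Semantic ranking with sentence embeddings.
    model = SentenceTransformer("all-MiniLM-L6-v2")
    emb_sections = model.encode(sections, convert_to_tensor=True)
    emb_query = model.encode(query, convert_to_tensor=True)
    cos_scores = util.cos_sim(emb_query, emb_sections)[0]

    # Naive fusion: take the section that ranks best on the combined score.
    combined = [b + float(c) for b, c in zip(bm25_scores, cos_scores)]
    print(sections[max(range(len(sections)), key=combined.__getitem__)])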
>> you definitely should have been asking 3.5 for structured output and doing any math yourself.
I suspect that this solution would have required a lot more time given my lack of experience with these technologies, and there are a lot of unknowns for that particular use case. For example, a user could ask to compare legalese from 30 different providers to find all those which met certain conditions; even if you can narrow down the input to just a few sentences per provider, you still need to feed it 30 times that amount or else the AI has no way to compare all the providers.
With my solution, GPT only saw a tiny highly relevant fraction of all the documents. It's possible that the solutions you mentioned would have helped cut our input size further but it's not clear by how much given the complexity of the questions and the data.
That was going to be the next stage, but this would have been a more complicated solution with additional challenges but definitely a lot of code could have been re-used as it would just have been a matter of substituting the database lookup portion.
There are unknowns with the vector store solution because it doesn't suffice to just fetch a few relevant sentences in arbitrary order; we had to fetch every relevant piece of information in appropriately-sized chunks (some of which had to be multiple lines, some of which required the section heading for context) to formulate a correct answer. Sometimes there was something mentioned in a different section of the text which changed the outcome. Going down the vector database route would have taken longer and involved additional learning and it's not clear that we could have reduced the input size by doing that. I still think it was a good decision to start with a regular database first given that all information mapped neatly under the headings in our table of contents and that each section was relatively short. All sections were less than 500 words but most were only about 100 to 200 words.
Yes. This is the reality in many startups. I should probably have refused to work on this project and suggested some other task; that's what my colleague did; he only worked on the front end part. I took a risk, I misjudged the tool and failed.
I wasn't even upset (I got a decent severance), I'd been let go over far more trivial things when I was a junior years ago... Like when the boss fired me the first monday after he met my girlfriend at the office christmas party because he thought she was out of my league. That's why I don't bring her (now wife) to any office events nowadays.
For what it's worth, I can actually tell the hype around AI has died down a little. 8-10 months ago you'd hear about AI from grandmas and their interest in playing with GPT. That's gone. From a programmer's perspective, it is still useless. I still can't get ChatGPT to give me a useful ffmpeg command or a basic PHP database script. I don't know who is really using AI for what, other than crappy chatbots.
They say JSON works perfectly with ChatGPT (GPT-4), and still, out of 10 queries I get 1 or 2 messed up. I can't depend on it yet. Image generation is still terrible and useless, it can't be used for programming, neural text-to-speech still sucks big time, not sure who is doing what with it. Almost feels like crypto hype.
Really? I have a lot of success with bash and ffmpeg scripts. I don't even try to write them myself these days.
I don't do JSON stuff much, but either function calling or grammar-based sampling should be able to force correct formatting if the issue is syntax.
ImageGen seemed like a solved problem to me with Stable Diffusion about 6 months ago or so, between LoRAs and the tons of models available on sites like Civitai.
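One low-tech way to get dependable JSON out of a model is to validate against a schema and retry on failure (a sketch assuming the jsonschema package; the ask function is a hypothetical, stubbed stand-in for whatever chat API is used):

    import json
    from jsonschema import validate, ValidationError

    SCHEMA = {
        "type": "object",
        "properties": {"title": {"type": "string"}, "year": {"type": "integer"}},
        "required": ["title", "year"],
    }

    def ask(prompt: str) -> str:
        """Hypothetical stand-in for a chat-completion call."""
        return '{"title": "Neuromancer", "year": 1984}'   # stubbed reply

    def ask_for_json(prompt: str, retries: int = 3) -> dict:
        for _ in range(retries):
            raw = ask(prompt + "\nReply with JSON only, matching the schema: "
                      + json.dumps(SCHEMA))
            try:
                data = json.loads(raw)
                validate(instance=data, schema=SCHEMA)
                return data
            except (json.JSONDecodeError, ValidationError):
                continue                     # malformed output: just ask again
        raise RuntimeError("model never produced valid JSON")

    print(ask_for_json("Give me a classic cyberpunk novel."))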
SOTA Text to speech/voice cloning is scary good, but it's only available via SaaS like 11labs (perhaps just them).
One interesting weekend they allowed anyone to clone a voice (required only maybe 10-60s of audio) to say anything
I made Bob Ross read Neuromancer using 60 seconds of fuzzy audio from youtube, but 4chan made something like 20k fake audio recordings on vocaroo of everything horrible and hilarious you can think of. Then they shut that off for good reason on Monday.
There are some cool applications there. I watched an hour-long video about the talk-show wars between Leno and Conan and I couldn't figure out what the voice was, but it sounded like a smart old guy; apparently it was a weird clone of Biden. Which means anyone can now make YouTube content with any kind of voice. Which is bad for impersonation, but good if, say, you don't like the sound of your voice and want to go for a different one (maybe an old-timey radio broadcaster).
What definitely has become shittier is the quality of gpt4. But that's not an AI issue, that's OpenAI being OpenAI.
Same. GPT-4 completely replaced my googling/StackOverflow routine. It generates a customized solution for my specific needs with step-by-step instructions. I think it works in a smarter (but slower) mode if the question is non-trivial. With PDF plugins it helps me read and understand recent papers: it generates an excellent summary and answers all my questions regarding any advanced concepts in the paper. I believe it's already smarter than me. I don't know if it can replace me in the mid-term, but it definitely increases my productivity.
Image generation is thriving and now an indispensable part of the creative process. The tools are fresh and still evolving, but the impact even at this early stage is ridiculous. You can easily take rough sketches and have it spit out fully fleshed-out, colored prototypes. Depending on your perspective it's either an insane productivity multiplier or scary as f-ck, or both.
Look at https://civitai.com for a taste. Look around. You really believe this is terrible and useless? Most of these images came straight from SD with some minor tweaking, usually upscaling. I think it's quite something.
It's not just MidJourney, it's the entire ecosystem surrounding models like SD, ControlNet, etc. Some artists are swinging their fists, others shrug, but all are affected by it whether they like it or not and the damn thing is just getting started. SD came out in 2022. As a young aspiring artist I can imagine it's hard to not feel like this whole thing is a massive punch in the gut.
Just like for an experienced and highly skilled artist, it's easy to shrug this off when you're an accomplished programmer. It can't replace me, right? Well, true. It's a gradual thing, but the impact it has on junior programmers is undeniable. What it means for our field if leagues of young people lose interest in the meticulous acquisition of technical and often arcane knowledge through "manual programming" is anybody's guess. Maybe it's a good thing, maybe it's not.
> Text-to-speech neural still sucks big time
This is not my experience at all. I am actually practicing French with it and it can pick up my weird accent. It's ridiculous how good it has become in such a short time.
EDIT: Oh, sorry, I read it wrong. But in that case, the same thing applies. 11labs is insane and getting better. It fools many, many people on Youtube. The Bark project is also scary at times. This is just getting started.
> I can't still get chatgpt to give me a useful ffmpeg or a basic php database script.
I believe these issues are, at least in significant part, caused by the severely castrated and surprisingly lacking UI efforts from OpenAI. Just free-form "talking" with an LLM might be a good way to get grandma to use it, but for a professional programmer (or professional anything, really) it's bordering on stupid.
The prompts and settings you use determine 90% of your success. OpenAI's interface gives you nothing. They just recently introduced something like "profiles", but it's amateurish at best. It makes me feel like programming without an IDE (or even Vi/Emacs).
Sure, but even then you're expected to back up your speculation. There doesn't seem to be anything within the article to support the claim that we're at the tail end of development; just assumptions that today's limitations will apply tomorrow.
This is nuanced and interesting, despite most likely being completely wrong - as others have mentioned, it's predicated on LLMs being at the top of the sigmoid, and they just ain't.
Also, this goal-post-moving did crack me up rather:
> Sure, it is terrific at using its statistical models to come up with textual passages that read better than the average human’s writing, but that’s not a particularly high hurdle.
Data-centric token-guessing will be outperformed by more complex architectures; LLMs are too inefficient and their hardware requirements are disproportionate to the utility. Building a sand castle of "LLM APIs" that rely on plugging their deficiencies is a fundamentally bad decision that hides the underlying waste of resources. What would be great is if the input data it was fed were more structured and labeled, like images: imagine a "Book, 17th century, reliability: 54%, contains geographic errors". Instead, this "attention is all you need" approach treats garbage data as equal to top material.
Very interesting article. IMHO a key factor in how the world changes in the next 10-20 years will be how far the current wave of AI, based on neural networks can go.
If scientists and engineers manage to implement autonomous cars and robots that can do basic human tasks like clean, wait and take care of the sick and elderly we will wake up in a brand new world. It will put a lot of stress on the basics of society, such as democracy and having to work for a salary.
> If scientists and engineers manage to implement autonomous cars and robots that can do basic human tasks like clean, wait and take care of the sick and elderly we will wake up in a brand new world. It will put a lot of stress on the basics of society, such as democracy and having to work for a salary.
Based on the trends TFA discusses, we'll be seeing severe social stress if basic tasks don't get automated. All due to labor shortages. Focusing on alternative arrangements to salaried work is backwards, unless the goal is to boost labor participation.
Current demographics are baked in for the next 30 years and only point to a greater shortfall of labor in the future.[1] $1,000 signing bonuses for fast food jobs start to look quaint pretty quickly.
> My gut feeling is that it will either be too little or way too much.
We've historically been very good at coming up with new things for people to do. Why would this time be any different?
On the other hand it takes over two decades from the time someone's born until they are part of the workforce. That won't change and as a result we have a pretty good idea of what the upper threshold on labor looks like globally the next few decades.
The problem is not integrating millions of sensors of the same type (like individual pixels in a camera), but rather to integrate 2-10 different types of sensors, like microphones, cameras, lidars and tactile/temperature sensors.
Progress has been slow mostly because it's been hand-written, such as the sensor fusion in the F-35.
While there are some attempts at using neural networks to do much of the integration (like Tesla's self-driving), they're mostly quite primitive, and lack the kind of general understanding of the world that we're starting to see in LLMs.
We have started building multimodal neural networks, though, such as GPT-4's ability to handle images. The next step is to add video, which is the mode that requires the greatest amount of compute; once we have that, we will be able to integrate lower-bandwidth sensors using the same techniques.
That would be where general-purpose models start to become good enough to be deployed in robots. Imagine your Tesla has the ability to reason about the world the way GPT-4 can (and with enough fine-tuning that hallucinations are gone), and that it can do so based not only on text but also on all the sensors of the car; then it might be able to handle situations outside its training set much better than today, possibly better than many humans.
Once we're able to build multimodal models able to handle realtime video and s
But at any given time you're probably only going to require some millions or double digit billions, unless you're a nation state or something like that.
Don't think a robot needs millions of sensors. 90% of the information that a human uses comes through the eyes - so just 2 sensors. Really in terms of doing most types of work all a robot would need is vision and hearing.
Humans have an insane number of sensors all working with each other. The eyes are not just 2 sensors and a blind person can still function fairly well though with limitations.
The human-like robot dream is a complete non starter for me, total science fiction. Unless economic reality changes radically it will always be cheaper to simply pay a human to do these jobs. There are already many jobs that could be replaced by robots but by and large they haven’t been because buying, maintaining and powering robots costs more than paying humans minimum wage, and this fact isn’t changing any time soon. The type of robots you’re envisioning are even more complex and expensive… Do you really think a human-like robot with some level of useful general AI is going to be cheaper to build, maintain and run then low skilled workers?
If nothing else, there's a shortage of human workers. Yes, it will be easier to manufacture and power robots than to increase the number of human workers.
The form factor may not be humanoid, but if we can crack a general-purpose home robot, it would start out as a novelty plaything for the ultra rich, but at that point it would be only a matter of time before it started falling down the cost curve as we got more and more efficient & optimized with the manufacturing and the software. So yes, it almost certainly would eventually get cheaper than human labor. How long until that happens? No idea!
I hope beyond hope that we continue to iterate on the current GPT-4 model. But statements made by Altman himself make me leery that it might not happen. That, coupled with the fact that it is just so expensive to run, makes me wonder if this whole thing is sustainable.
Take this into consideration - NVDA’s massive profit in the last couple of quarters is someone else’s expense. When no one is making money on AI, can this continue? I absolutely think companies can make money using AI that people will pay for, but I am not convinced that companies can make money supplying AI services. I see it as almost a commodity type service, like internet access or something. There is no clear path to profitability, and that is ok to some extent, but with the massive costs this could come crashing to a halt very quickly.
With a few prompts I just wrote a small script with a bunch of openssl commands. It saved me an hour. And it works. And it even explained it. There is AGI and then there is "practically AGI". The latter has been achieved with LLMs.
Conversely, if you don't know what the limits are yourself (and you don't), you wouldn't know either way. So I'm not quite sure if you've added anything here, unfortunately.
My opinion is that we don't know the limits. This is essentially no opinion, because there is no data. Stating a limit without data is an opinion, though, and a wrong one.
> But almost everywhere else needs people. Badly. Across retail, restaurants, manufacturing, trades, and on and on, companies are struggling to hire. And this brings us back to the cross-country trip observation and that “signing bonus” sign in a McDonald’s window.
For jobs that pay just enough to not qualify for food stamps or Medicaid, but not enough to pay for food, medicine, and rent.
Previously it was more possible to earn little enough to qualify for benefits and still live, if not well, but the numbers have become misaligned and don't account for differences between expensive and cheaper parts of states.
Given present concerns, the situation doesn't change appreciably unless you go all in on automation, implement UBI, or both.
"too few people for all the jobs, for the most part"
- Are wages going up?
"But almost everywhere else needs people. Badly. Across retail, restaurants, manufacturing, trades, and on and on, companies are struggling to hire. "
- Wow, sounds like an exciting Star Trek future that I can't wait to be part of!
"labor became more expensive than capital"
- So wages must be going up right?
"In essence, the authors show that for automation to have widespread benefits it must deliver high productivity gains to more than compensate for the high displacement of workers."
- The reason this sentence makes no sense is that it is written in two frames at the same time: one of a planned economy and the other of capitalism.
"We are in a middle zone, however, with AI able to displace huge numbers of workers quickly, but not provide compensatory and broader productivity benefits."
- But what about the horses?
I tried my best to find the central thesis of this rambling multi-tone work. I think it is difficult because it was written with too many different frames.
Capital's willingness to ride out labor shortages without increasing wages seems to have become substantially greater in the last couple of decades.
TIRED: labor shortage -> wages rise!
WIRED: labor shortage -> y'all work harder for maybe a bit of overtime or maybe not, and we'll just absorb the complaints about customer service and product quality. Those damn employees aren't getting a penny more!
The food of 1969? I've had it. It was my aunt's cooking and my uncle's butchering - shooting a hog right in the head and then gutting it, draining the blood, and prepping it to go onto a gigantic smoker, and it was delicious. And if you really didn't feel like cooking food yourself, McDonald's did exist back then, along with things like Kentucky Fried Chicken, Krystal, Burger King, etc.
They didn't throw sugar into every single product on the market, either.
The home appliances? You mean blenders and toasters and refrigerators and mixers that would last 20-30 years before needing to be replaced because they were so well-engineered? Those 1969 home appliances? Like the KitchenAid K45 mixer that was released in 1962 and stayed in production for decades? Like the one my Mom got in 1971 that she still has today, whose motor still works fine, and the only thing that's ever been done is 30 minutes of taking it apart and checking the worm drive gears and re-greasing them? Those 1969 appliances??
And the cars? Yeah those were kinda shitty compared to what we have now.
For the most part though, you're not just off-base on the food and appliances, you're hilariously off-base.
Why do you think there is a zero-sum choice between real-terms purchasing power increases and the technology we have today? (Also worth noting: I'd happily take a lot of goods from a time when they were built to last.)
That's a graph of disposable personal income, not purchasing power, and it doesn't account for the increased cost of several major spending categories.
It's also the total disposable personal income for the entire country, and not median household income. See the "Units" field: it's in billions of chained 2012 dollars.
It's from 2018, but this shows what's going on more clearly:
As I live in California, I've seen 'em. They're not all Eichlers.
> You would like the salary saved from 1969.
A tech worker makes more than 5 times the median 2023 personal income and their industry didn't exist then, so, I mean, I wouldn't take the trade even if I could afford the house in Palo Alto.
And you wouldn't want to say this to a racial minority or even someone in Appalachia.
You’ve kinda proven the point. The argument made was cherry picking.
Here again - You've used the most highly paid subset of workers. In any other nation, and in other industries in America, tech workers don't get paid that much.
The average floor area per person (in the US) has roughly doubled since 1969. Not the most reliable source, but from skimming census data it seems to correlate: https://supplychenmanagement.com/2018/07/15/average-house-si.... I'm personally not much of a fan of large houses, but most people are.
I'm a bit disappointed that this part of the conversation is buried under everyone arguing about whether the article is right about AI or not. The really interesting part of the article is why the author(s) seem to think the reason McDonald's is doing signing bonuses is that people are just not working, and the article never seems to understand why people aren't flocking to jobs colloquially known to be absolutely awful, shit jobs.
Similarly, I'm very disappointed that the discussions in the article focus so much on _replacement_ of these workers, but in terms of human replacement, it really seems like AI is better suited to displace management and executives, not the people who perform specialized and physical work.
AI should help people and extend their abilities, and right now it's very good at that. It can be used to make our lives easier _now_, not to replace us, and I think this is the real reason the article is so heavily discussed. I think the article explains how the author reached a lot of the conclusions they did, and I agree the author is jumping around way too fast and, in my interpretation, making some very wild assertions without further fleshing out their points.
> it really seems like AI is better suited to displace management and executives, not the people who perform specialized and physical work
Management and executives are the ones holding the purse strings for these expensive solutions. Why would they implement something that could replace them?
Employers are always talking out of both sides of their mouths. They signal they want more workers, yet are more choosy than ever in terms of pre-employment screening. It's like someone who wants to work at McDonald's has to go through a huge gauntlet of background checks, interviews, and tests, yet companies insist there is a shortage. Maybe try lowering the requirements a bit. And then once you get the job, it's more requirements and rules. A lot of low-skilled people would rather just not work than deal with all that crap for a minimum wage job. Part of the problem is maybe that the quality of the pool of workers has declined, so more screening is needed.
I was particularly puzzled by the 2x2. High productivity is a no brainer, but why should we necessarily seek high displacement as its own goal? Low displacement would increase employment and GDP, is that not flourishing?
And what's with using the research scientists who authored Transformers as examples of surplus middle management? Google generated 1.4M per employee in 2022, second only to Apple. Armchair engineers will say they could do as well with a handful of employees, but if this is "low productivity" then sign me up.
I'm so done with writing that has an implicit but uncommunicated quantitative model. (AI costs are too high? Relative to what? Show us!) Almost all analytical / business / industry writing should be quantified.
I hope for better. We all should. It is within reach. Yes, I'm grumpy on this topic, but this doesn't make it unwarranted; it is a consequence of wanting better for a decade or longer.
At this point, if you claim to be a business analyst and you aren't sharing a simulation, logic, or code, I'm just not going to take you seriously.
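Even a toy model would be a start. Here's a sketch of the kind of thing I mean, where every number is a placeholder I invented for illustration rather than data from the article:

    # Toy break-even model: when does automating a task beat paying a person?
    # Every figure below is an invented placeholder, not data from the article.
    hourly_wage = 18.00          # assumed fully loaded labor cost per hour
    tasks_per_hour_human = 20    # assumed human throughput
    cost_per_ai_task = 0.04      # assumed inference + integration cost per task
    error_rate_ai = 0.08         # assumed fraction of AI outputs needing rework
    rework_minutes = 3           # assumed minutes of human time per AI error

    human_cost_per_task = hourly_wage / tasks_per_hour_human
    ai_cost_per_task = cost_per_ai_task + error_rate_ai * (rework_minutes / 60) * hourly_wage

    print(f"human: ${human_cost_per_task:.3f}/task, AI: ${ai_cost_per_task:.3f}/task")
    # With these made-up numbers: human ~$0.90/task, AI ~$0.11/task. The point
    # isn't the answer; it's that the assumptions are finally on the table.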
> too few people for all the jobs, for the most part.
Change that qualifier to "for a minority part" and it would be correct.
There are plenty of workers - old and young and everything in between - who would gladly work if a) you allowed them to and/or b) you paid them enough.
I do not think it is fair to use AI in support call centers as an example.
These call centers are not there to help customers (cancel a subscription, refund, change something, etc.), and AI is perfectly tuned for that. Or at least they have procedures, and AI is perfectly trained to follow them.
In other words, AI in support centers is designed and works to follow companies procedures and rules—which may not always align with customer expectations.
So is a simple form on a website in most cases. It would also be cheaper for the company and less frustrating for the customer. No one wants this "AI" chatbot bullshit except for companies selling "AI".
Or to make the customer who wants to cancel spend hours talking to the bot, so they finally give up and hopefully forget.
Having a simple form for cancel/refund/change/lost item/sent to wrong address/etc. would be an easier way to do and handle this. But companies do not want that, so … chatbot to the rescue.
Great article. Although I am not sure we really are reaching the value limits of current LLMs, as the author suggests. Some fields, such as legal litigation, are being disrupted as much as call centers have been, but the change is happening in a much slower fashion, for many reasons. The productivity gain in these fields is potentially high, but so is the human displacement, which is a relevant fact to consider.
The most interesting part of this article isn’t its assertion that AI innovation is about to slow, but rather that current LLMs can only produce “so-so automation” — that although LLMs can replace many jobs, LLMs aren’t good enough to offset the cost of people losing such jobs.
True as of today, but the article is overlooking multi-modal LLMs. Once AI can see, we'll go from "wow, GPT4 is pretty good at helping me do my job" to "uh oh, GPT4-Vision can do my job better than I can".
This is especially true if the model can read text off a screen. At that point, it’s likely to replace many workers who stare at screens for a few hours a day and pretend to work the remainder. The model won’t take coffee breaks, or long lunches, or take the afternoon off to drive their kid to the dentist. If you’re a worker whose work depends upon looking at a screen, and mostly involves moving data from one place to another with light transformations and interpretations, you’re going to be in trouble by the end of the decade.
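To make the "moving data from one place to another" scenario concrete, here's a rough sketch of that pipeline using plain OCR as a stand-in for a vision-capable model (this assumes Pillow, pytesseract, and a local tesseract install; the invoice regex and CSV layout are made up for the example):

    # Sketch: screenshot -> extract text -> light transformation -> write elsewhere.
    # pytesseract stands in for a vision model; the regex/CSV layout are invented.
    import csv
    import re

    from PIL import ImageGrab   # screen capture
    import pytesseract          # OCR wrapper around tesseract

    screenshot = ImageGrab.grab()
    text = pytesseract.image_to_string(screenshot)

    # "Light transformation": pull out anything that looks like an invoice line.
    rows = re.findall(r"(INV-\d+)\s+\$?([\d,]+\.\d{2})", text)

    with open("extracted_invoices.csv", "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["invoice_id", "amount"])
        writer.writerows(rows)

    print(f"moved {len(rows)} rows from screen to spreadsheet")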
And since everyone who has been writing has been replaced by machines as well, most of those texts will be machine-generated by then.
Machines reading and processing texts written by machines would be the logical consequence in this scenario, but I can't help finding the whole idea deeply absurd.
I haven't read the article (like many here) and I don't get why the writer would write this. LLMs/AI are not at a stage where they can replace people or their jobs (broadly speaking); they still have to be taught and engineered to solve your problems. For example, if I enter a prompt like "Write a Javascript function that does XYZ for me." and don't get the desired result, it's unfair for me to say "ChatGPT bad, AI bad".
At this moment, you have to work within the limits: use better prompts, and use ChatGPT as a guide rather than as your personal robot. With that approach, I think a lot of revolutionary product work is coming. Use the tool better to get the most out of it.
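As a concrete illustration of "use better prompts", compare a vague one-liner with a constrained request. This sketch assumes the 2023-era openai Python client; the task and parameters are just examples:

    # Sketch: a vague prompt vs. a constrained one (2023-era openai client assumed).
    import openai

    openai.api_key = "sk-..."  # your key

    vague = "Write a Javascript function that does XYZ for me."

    specific = (
        "Write a JavaScript function `debounce(fn, waitMs)` that returns a debounced "
        "version of `fn`. Requirements: reset the timer on every call, preserve `this` "
        "and arguments, include two usage examples, and note edge cases. No libraries."
    )

    resp = openai.ChatCompletion.create(
        model="gpt-4",
        messages=[{"role": "user", "content": specific}],
        temperature=0.2,  # keep it fairly deterministic for code tasks
    )
    print(resp.choices[0].message.content)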
We don't need better LLMs to eliminate all McDonalds workers. In fact there's no language model needed there at all. The job just needs better robotics.
How far are we from a point where you could leave an AI to prompt itself? That is, not just respond to something, but actively seek it out.
Let's say it used this place. Could it, without being ordered, read articles and comments and add its own comments? Maybe search the internet by itself for relevant content and post it? Maybe try to earn the maximal amount of upvotes? And all this without any bit of code poking it to do it?
I was going to ask "how would it know what's a good post?", but it's embarrassingly easy to train a model to classify content as interesting or not based on upvote information...
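Something like this toy sketch, for instance. The posts, scores, and threshold below are made up; a real version would need thousands of scraped titles and their scores:

    # Toy "interesting or not" classifier trained on upvote data (dummy examples).
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    posts = [
        "Show HN: I built a tiny database in Rust over a weekend",
        "Ask HN: Why does my landlord hate me?",
        "Postmortem of a 14-hour outage caused by a leap second",
        "My thoughts on stuff (blog spam)",
    ]
    scores = [412, 3, 280, 1]
    labels = [1 if s >= 50 else 0 for s in scores]  # arbitrary "interesting" cutoff

    clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
    clf.fit(posts, labels)

    print(clf.predict(["Show HN: a formally verified leap-second-safe scheduler"]))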
HN guidelines and moderation are doing a decent job of promoting interestingness in general. I don't think upvotes are a reliable indicator, but maybe they are as close to good enough as a system can be.
Depending on timing, you see great topics flying under the radar. Likewise, "poor" posts regularly prove to be quite popular. But then again, this is also subjective.
I think it comes down to the nebulous concept of "free will". Not just having a "bot" running some loop, but an entirely self-acting AI, one that could, for instance, go and find Reddit or Lemmy or some other place.
AI won't reengineer society, just like the last 100 "latest things" didn't. There is a massive undercurrent that always pushes in the opposite direction of technological advancement: tradition!
You need new people growing up and living in a new paradigm to get these changes; you can't just ask (or expect) people who have lived their lives one way to switch completely to some other way of living.
The absence of human workers has become a limiting factor on economic growth in the U.S.
The problem is that real GDP growth is flat or negative due to high inflation. Having more labor would make this worse, if anything; job creation tends to be inflationary. From 2010 to 2021 we saw strong economic growth and low inflation, but now things have suddenly changed due to too much inflation, not a lack of growth.
The jobs are already created, unfilled. The lack of workers causes wages to go up which adds to inflation (e.g., the $1000 McDonalds bonus). But if you add more workers, you increase productivity, and reduce wage inflation.
When I see VCs make this argument, all I hear is "oh shit, we're late to the party" and doing something like this as a desperate attempt to bring valuations down.
I don't think the assertion that we've reached the end of current wave of AI is correct. It hasn't even been rolled out to anything yet.
Once it is fully integrated into the OS/browser/corporate workspace and the MBAs have thrown LLMs at every business problem they can think of, then perhaps. But right now it hasn't even left high-tech circles.
When I read articles like this I can't help but question my own intuitions since they are so opposed to this author's thinking.
Firstly, there's his take on the worker shortage and the anecdotal $1,000 sign-on bonus at some local McDonald's. These seem tailored to a narrative interpretation of the state of the world that fits the author's own bias. I look at my own small town's homeless and drug-addicted population and consider the changes on that front over the last 5-10 years. I consider the insane rise in the cost of home ownership and the massive rise in rental costs locally. I just don't see $1,000 signing bonuses at fast food restaurants as a sign of some latent demand for labor. It feels like the author is using a single anecdote and a narrow reading of a few graphs to support a world view I don't see existing in my own experience.
Then he says: "We are quickly reaching the limits of current AI". You mean the AI that surpassed most informed people's expectations of AI capability less than one year ago? I feel like I'm in 1996 reading an article listing reasons why the Internet is reaching its limits.
And just as an aside, this author uses the term "explosive" especially in the context of "explosive growth" over and over. It may be that his target market just isn't me. The narrative and analysis probably hits home for a particular set of capitalist minded investor class individuals. Perhaps they really do see North America as having a huge labor shortage and that we just need the "right" kinds of automation so we don't all feel so bad about the continuing displacement of workers. As long as they keep seeing "explosive growth" in the value of their assets.
“Historically, when this has happened—labor became more expensive than capital—economies have responded with automation, so we should expect that again today.”
Foolish assertion. Immigration has always been the stop-gap solution for expensive labor to the point that it’s built in as an essential feature of liberalism.
disclaimer: i am in the middle of the journey of learning this subject, and have at least done finetuning on existing models. some may still consider my take misinformed.
one thing i am not a big fan of in this discourse is how the community makes up jargon to describe relatively simple things on one hand (a la "embeddings"), while coming up with misnomers like "hallucinations" to soften otherwise blunt things on the other. it contributes to missing the forest for the trees in this discussion.
while i agree it is quite impressive to see what the state of the art can seemingly achieve on the surface, the way we approach designing these models is fundamentally not aligned with what we want to do with their results. while some media for generative machine learning produce relatively harmless results (e.g. images and audio), it is text where i have a lot of reservations.
we train these models to essentially turn noise into something that humans will pass as acceptable (for the given prompt). sadly, however, people outside (and sadly "inside") this space claim it to be the word of God.
i still strongly believe that a hybrid system is going to be the only way to get the most out of the current approach. almost a decade ago we were already good at parsing sentence structure and "understanding" what the user writes. deciding what the output should be can easily be done with some hardcoded logic, with some noise thrown in by the models; however, it should only be treated with that level of seriousness.
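roughly the kind of hybrid i mean, as a sketch: the intents, patterns, and canned replies below are invented for the example, and llm_paraphrase is a placeholder for whatever model call you'd use to vary the wording.

    # Sketch of a hybrid pipeline: deterministic parsing and hardcoded logic decide
    # WHAT to say; the model would only vary HOW it is said.
    # All intents/patterns/replies are invented; llm_paraphrase is a placeholder.
    import re

    INTENTS = {
        "refund": re.compile(r"\b(refund|money back)\b", re.I),
        "cancel": re.compile(r"\b(cancel|unsubscribe)\b", re.I),
        "hours":  re.compile(r"\b(open|hours|closing time)\b", re.I),
    }

    CANNED = {
        "refund": "Your refund request has been logged; expect 5-7 business days.",
        "cancel": "Your subscription will end at the close of the billing period.",
        "hours":  "We are open 9am-6pm, Monday through Saturday.",
    }

    def llm_paraphrase(text: str) -> str:
        # Placeholder: a model would only reword this, never change its content.
        return text

    def respond(user_message: str) -> str:
        for intent, pattern in INTENTS.items():
            if pattern.search(user_message):
                return llm_paraphrase(CANNED[intent])
        return "Sorry, I couldn't parse that; routing you to a human."

    print(respond("hey can I get my money back for last month?"))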
i might be shouting into a void, but it is imperative that we stop taking what these models output as production-ready code, or even as a starting point! otherwise, the only thing you achieve is to further train the models running the chat services.
> There is a persistent structural imbalance in the U.S. workforce: too few people for all the jobs
Lies. If the minimum wage had increased in proportion to inflation, none of these "jobs" would exist.
The McDonald's sign-on bonus is nothing but a loan, just like all sign-on bonuses with a retention requirement. Such arrangements merely require that the minimum wage be low enough to carve that compensation out of the base wage, a modern American financial industry to efficiently enforce the contracts, and a society with no remaining semblance of dignity in sight.
The labor shortage is BS. It's rhetoric to increase immigration so we can continue sustaining a slave class and don't have to pay people properly. Canada's been very successful at this.