
I give it a 95% chance that an AI winter is coming. Winter in the sense that there won't be any new ways to move forward towards AGI. The current crop of AIs will be very useful, but it won't lead to the scary AGI people predict.

Reasons:

1) We are currently mining just about all the internet data that's available. We are heading towards a limit and the AIs aren't getting much better.

2) There's a limit to the processing power that can be used to assemble LLMs, and the more that's used, the more it will cost.

3) People will guard their data more and will be less willing to share it.

4) The basic theory that got us to the current AI crop was defined decades ago and no new workable theories have been put forth that will move us closer to an AGI.

It won't be a huge deal since we probably have decades of work to sort out what we have now. We need to figure out its impact on society. Things like how to best use it and how to limit its harm.

Like they say, "interesting times are ahead."




    We are currently mining just about all the internet data that's available. We are heading towards a limit and the AIs aren't getting much better.
The entire realm of video is underexplored. Think about the amount of content that lives in video. Image + text is already being solved, so video isn't the biggest leap. Embodied learning is underexplored. Constant surveillance is underexplored.

    There's a limit to the processing power that can be used to assemble LLMs, and the more that's used, the more it will cost
If the scaling law papers have shown us anything, it is that the models don't need to get much bigger. More data is enough for now.

    People will guard their data more and will be less willing to share it.
Fair. Though, companies might be able to prisoner's dilemma FOMO their way into everyone's data.

    was defined decades ago
The core ideas around self-attention emerged around 2015-2017. The ideas are as new as new ideas get. It's like saying the ideas for the invention of calculus existed for decades before Newton because we could already compute the area and volume of things. Yes, progress is incremental. There are new ideas out there, and in 20 years we'll inevitably find something new that some sad PhD student is working on today, all while regretting not working on LLMs.

    interesting times are ahead
Yep


I think OP was talking more about upending the universal dogma of "statistical optimization to minimize feedforward loss functions via gradient descent." We now have plenty more tricks of the trade (ReLUs were a big one, self-attention was another, making things convolutional was a third, plus smaller ones like dropout/batchnorm/LR schedules). However, in broad strokes, this paradigm remains relatively unchanged since LeCun+Bengio+Hinton in the 80s. The fundamental idea informally stretches back to Rosenblatt's perceptrons in the 50s, which in turn built on McCulloch and Pitts' 1943 modeling of neurons as multiply-accumulate units. We've spent lots of time questioning these assumptions (e.g. branching off into Boltzmann machines or spiking neurons, which are really fascinating), but to my knowledge, there haven't been promising alternatives yet.
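For concreteness, that whole paradigm fits in a few lines. Here's a minimal NumPy sketch of a feedforward net trained by gradient descent on a toy regression task (the architecture, data, and hyperparameters are arbitrary choices of mine, just to show the loop):

    import numpy as np

    # Toy data: learn y = sin(x) from samples.
    rng = np.random.default_rng(0)
    X = rng.uniform(-3, 3, size=(256, 1))
    y = np.sin(X)

    # One hidden ReLU layer, squared-error loss.
    W1 = rng.normal(0, 0.5, (1, 32)); b1 = np.zeros(32)
    W2 = rng.normal(0, 0.5, (32, 1)); b2 = np.zeros(1)
    lr = 0.01

    for step in range(5000):
        # Feedforward pass.
        h_pre = X @ W1 + b1
        h = np.maximum(h_pre, 0.0)           # ReLU
        pred = h @ W2 + b2
        loss = np.mean((pred - y) ** 2)

        # Backpropagation: the chain rule, layer by layer.
        g_pred = 2 * (pred - y) / len(X)
        g_W2 = h.T @ g_pred; g_b2 = g_pred.sum(0)
        g_h = g_pred @ W2.T
        g_pre = g_h * (h_pre > 0)            # ReLU gradient
        g_W1 = X.T @ g_pre; g_b1 = g_pre.sum(0)

        # Gradient descent update.
        W1 -= lr * g_W1; b1 -= lr * g_b1
        W2 -= lr * g_W2; b2 -= lr * g_b2

    print(f"final loss: {loss:.4f}")

Everything since - ConvNets, Transformers, the lot - still lives inside this loop; only the function between X and pred has changed.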

We have "something that appears to work" but is this itself a local minimum? Biology found its own efficient processes that look a bit different after all.

In my opinion, "fundamental breakthroughs" would answer questions like:

- Is there an inherent reason that humans learn through a process akin to statistical optimization, or is it a coincidence that feedforward-only gradient descent seems to work?

- Are there other models for "neurons" that go beyond GEMM, like spiking activations? We know from biology that they're nonlinear; how might they be modeled better?

- Are there ways of training effective agents beyond the reinforcement learning techniques, like Q-learning, that we've settled on?

- What does it even mean for a model to be able to "reason"? Does our current view of recursive models with hidden state really map to that philosophy one-to-one, or is there more?


But he has a very good point. Transformers are such a quantum leap over plain ANNs that it's almost a whole new paradigm; Karpathy calls them a brand new way to do computation.


Right, but transformers are artificial neural networks, so it's not clear to me what you mean.

The leap from RNNs to the Transformer architecture in 2017ish is similar to vision networks' leap from fully-connected layers to convolutional layers in 1987ish. In both cases, the key is to change the architecture to exploit some implicit structure in the data that wasn't quite used effectively before.

For images, each pixel is strongly related to its close neighbors, so convolution is a natural way to simplify models while capturing that locality.

For language, the opposite is true - almost any part of the text can implicitly reference anything that's been said previously, so you need representations that are somewhat position-independent. Approaches that can model that structure naturally work better than an RNN munching tokens one by one.

Each one was certainly a leap forward that unlocked its respective field, but at the end of the day, it's "just" an architecture tweak, you know?
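To make the contrast concrete, here's a small NumPy sketch of my own (not from anyone upthread): a 1D convolution mixes each position with a fixed local window using the same kernel everywhere, while self-attention mixes every position with every other, with weights computed from the content itself.

    import numpy as np

    rng = np.random.default_rng(0)
    T, D = 8, 4                      # sequence length, feature dim
    x = rng.normal(size=(T, D))      # a toy sequence of token vectors

    # Convolution: each output mixes a fixed, local window of neighbors.
    kernel = rng.normal(size=(3, D, D))          # window of 3 positions
    pad = np.pad(x, ((1, 1), (0, 0)))
    conv_out = np.stack([
        sum(pad[t + k] @ kernel[k] for k in range(3)) for t in range(T)
    ])

    # Self-attention: each output mixes *all* positions, with weights that
    # depend on query-key similarity (content), not on distance.
    Wq, Wk, Wv = (rng.normal(size=(D, D)) for _ in range(3))
    Q, K, V = x @ Wq, x @ Wk, x @ Wv
    scores = Q @ K.T / np.sqrt(D)                # (T, T) pairwise similarities
    weights = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)
    attn_out = weights @ V

    print(conv_out.shape, attn_out.shape)        # both (8, 4)

Same training loop around either one; the "tweak" is just which structural prior the layer encodes.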


Also, most scientific data lives in silos somewhere unknown. I went to a really interesting talk last year where a scientist in Uzbekistan saved a closet full of notebooks from old professors at his university. They had meticulously catalogued hydrology data, photos of glaciers, etc., and suddenly there was daily data going back to basically the formation of the Soviet Union for a region that had never had much data before. The department secretaries were going to burn all this as trash. Stuff like this exists everywhere, including on the internet.


EXACTLY! There's so much research and citizen science being done, and even the field of CS is no good at identifying, preserving, and disseminating the good ideas that our lossy proxies miss (proxies like "university prestige" or "went to grad school" or "was accepted at a top-tier conference").


> The current crop of AIs will be very useful but it won't lead to the scary AGI people predict.

I argue that this won't matter. I think Generative AIs will be much scarier than any sci-fi AGI. This is because, while sci-fi AGI apocalyptic scenarios involve the AGI seeking to exterminate humanity, usually as a form of revenge (see The Matrix, The Orville and many others) or as part of an optimization, scenarios involving Generative AIs simply involve humanity destroying itself, and those scenarios are already very plausible.

Prepare for the complete destruction of objectivity, the onslaught of spam, grifts and fake news at an unprecedented scale, and a flood of security vulnerabilities unlike any other. This will be the annihilation of interpersonal trust and of knowledge.


I mean, imagine for a moment that Russia or China were to spend a few tens of billions of dollars on a cluster to run an AI.

They have the human capability to ensure that it is state of the art in all current ways, and could theoretically use some processes that aren't disclosed to the public. I don't think anyone would say that China at least doesn't have the brainpower to do something like that.

Then, these government leaders could ask the AI to come up with simulations on how to conquer the world.

The AI would probably say something like, "America has to fall, since its military might is the strongest resistance to your ability to dominate the globe."

How would you do that?

Ensure weak or compromised politicians get elected, which will make the American people lose confidence in their government.

Spend resources to ensure that the financial stability of the country is negatively affected. Do this by disrupting the flow of goods into the country, viciously competing with any product based company of any merit, and purchasing housing in the major financial centers of the country.

This serves three purposes:

1: Businesses will become tight-fisted with their spending wherever they can as they won't always know if they will be able to purchase the goods they need when they need them. The easiest way to reduce spend in the short term is to underpay and overwork your employees by firing the more expensive employees and shifting their workloads to others, so that will be the first thing that happens.

2: The smarter and more capable people will attempt to escape the rat race by relying on their inventiveness. They will suffer in drudgery for years while they perfect a product that someone needs and is willing to pay for, and then, since manufacturing in America is ridiculously expensive thanks to China's price competition, they will outsource the manufacturing of their product to China. This will happen frequently, and they will be taking all of the risk on themselves.

Once they have a successfully tested product, China will clone the product and flood the market with inferior quality but massively cheaper clones, ensuring that the original business never achieves the success it could have and then has to cut costs, putting the employees of that company into the same conditions that the founder started the company to escape.

3: Housing will become very expensive, so the citizens will not be able to afford the same quality of housing that their parents did. This will further increase their unhappiness and destabilize the power of America.

Once all of that is done, then you let the kettle simmer for a while. Spend the money, time, and energy to keep the pressure up. Use the time to buy or subvert more politicians.

When this is done, the country will be weakened by its internal stresses and so charged that all it will need is a single strong event to vent all of that pressure on. A match to a powder keg will blow the whole thing up and America will collapse from civil war.

While that civil war is going on and the whole world is watching, launch your attacks. Take over your neighboring countries. Threaten nuclear annihilation against anyone who stands against you as, within a few years, you quadruple the size and power of your country.

By the time America pulls its head out of the fog and realizes what is going on, it will be too late.


>4) The basic theory that got us to the current AI crop was defined decades ago and no new workable theories have been put forth that will move us closer to an AGI.

I guess it really depends on what you mean by "basic theory", but my view is that the framework that got us to our current crop of models (vision now too, not just LLMs) is much more recent, namely transformers circa 2017. If you're talking about artificial neural networks in general, maybe. ANNs are really just another framework for a probabilistic model that is numerically optimized (albeit inspired by biological processes), so I don't know where to draw the line for what defines the basic theory... I hope you don't mean backprop either, as the chain rule is pretty old too.


Some of these are good arguments and can turn out to be true. But they are equally likely to be false.

1/ We humans are still generating data. To live is to generate data, and while it is dystopian to think about how these data can be harvested, it is still possible to get more information to feed the next-gen AIs. And remember that right now, we have only used text and images. Videos, audio, sensory inputs (touch, smell, taste, etc.), and even humans' brainwaves are still available as more training data. We are nowhere close to running out of stuff to teach AIs yet.

2/ Fine-tuning and optimized training have shown tremendous effects in reducing the size of these LLMs. We already have LLMs running on laptops and mobile phones, with reasonable performance, only half a year after the big release. There is lots of room to grow here.

3/ My nieces laughed when I told them TikTok is too invasive. Most people outside of HN do not care about data privacy as much as you might think.

4/ Sometimes it only takes one big breakthrough to open the floodgates. The transistor was that point for computers, and we are still developing new tech based on that decades-old invention. We don't know how much potential there is in these decades-old AI inventions, especially when many of them were only put into proper practice in the last decade. We didn't learn the big mistakes until very recently, after all.

Just some ways things can go differently. It is the future, we can't really predict it. Maybe an AI can...


5) AI-generated content will become more and more common over time. This will inevitably end up in the training sets of future AI. By recycling its own training material, AI will get less meaningful results over time.


>By recycling its own training material, AI will get less meaningful results over time.

I read this a lot, and it sounds intuitive on the surface. But I don't understand how it's justifiable. For example, all that exists in the Universe is the result of the application of very simple rules. It would make sense that what matters for intelligence and complexity is not information but computation. It ought to be possible to create a superintelligence with a few bytes of training data, given enough compute.
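As a toy illustration of "complexity from a few bytes of rules" (my example, not the parent's): Rule 110, an elementary cellular automaton whose entire specification fits in one byte, is Turing-complete and produces endlessly intricate structure from a single live cell.

    RULE = 110                      # the whole "theory" is one byte: 0b01101110

    def step(cells):
        # Each cell's next state is read out of the rule byte, indexed by
        # the 3-bit neighborhood (left, self, right).
        n = len(cells)
        return [(RULE >> (4 * cells[(i - 1) % n] + 2 * cells[i] + cells[(i + 1) % n])) & 1
                for i in range(n)]

    row = [0] * 79 + [1]            # a single live cell as the only "input data"
    for _ in range(40):
        print("".join(".#"[c] for c in row))
        row = step(row)

Whether that observation transfers to learning systems is exactly the open question, but it at least shows that raw information content isn't the only ingredient.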


Current approaches to ML are largely in the camp of throw enough "found data" at it and hope for the best. The exceptions are mostly games, like AlphaGo, where the ML can play against itself.

For generative AI, the hallucinations will poison the well. And they're not random, so the same or similar hallucinations will pop up all over and reinforce each other.
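A toy way to see the recycling concern (a sketch of mine, not a claim about any particular model): fit a simple generative model to samples drawn from the previous generation of itself, over and over, and watch the diversity of its output shrink - roughly the "model collapse" effect reported for generative models trained on their own output.

    import numpy as np

    rng = np.random.default_rng(0)

    # Generation 0: "real data" from a standard normal distribution.
    data = rng.normal(0.0, 1.0, size=20)

    # Each generation fits a Gaussian to the previous generation's output,
    # then samples fresh "training data" from that fit - i.e. the model is
    # trained on recycled, model-generated material.
    for gen in range(51):
        mu, sigma = data.mean(), data.std()
        if gen % 10 == 0:
            print(f"generation {gen:2d}: fitted std = {sigma:.3f}")
        data = rng.normal(mu, sigma, size=20)

    # With small samples the fitted spread tends to shrink generation after
    # generation: the tails of the original distribution are the first to go.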


Computers are electrical automatons that process binary signals and little else. Without a human at the end of the chain to interpret these signals (whatever representation they’ve been given), there is no meaning whatsoever.

So an AI that simply recycles its own input ad infinitum might produce something, but it won't be meaningful to us humans. Hence, it's unlikely to be useful apart from the novelty of it.


What is it about humans that makes only them capable of meaning? And how would you define meaning here?


>We are currently mining just about all the internet data that's available.

If AI starts to become economically important, there will be an incentive to create more data. If I'm OpenAI, why not pay a company to put microphones around their office, transcribing everything everyone says for training data? Buy troves of corporate (or personal) emails. If there are one hundred million office workers in the world, and you can convince/pay 10% of companies to let you spy on them, and each says or writes 5,000 words per day (these are all low estimates, in my opinion), you'd be getting fifty billion more words of data per day. You'd double the size of the Chinchilla training data in about a month.

If you incorporate video for a truly multimodal model, now you're talking about not only the entirety of YouTube, but potentially also CCTV footage. If you could get the data from every surveillance camera in the world, you'd generate one YouTube's worth of video every ten minutes (based on 150 million hours of video on YouTube).
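Back-of-envelope, using the figures assumed above (the worker count, words per day, camera count, and YouTube size are all rough guesses, and words are treated as roughly one token each):

    # Office-speech estimate.
    office_workers = 100_000_000        # assumed worldwide
    fraction_spied = 0.10               # 10% of companies opt in
    words_per_day = 5_000               # per worker, spoken + written
    total_per_day = office_workers * fraction_spied * words_per_day
    print(f"{total_per_day / 1e9:.0f} billion words per day")        # ~50 billion

    chinchilla_tokens = 1.4e12          # Chinchilla's training set, in tokens
    print(f"~{chinchilla_tokens / total_per_day:.0f} days to match Chinchilla")

    # Surveillance-video estimate.
    cameras = 1_000_000_000             # assumed ~1 billion CCTV cameras worldwide
    youtube_hours = 150_000_000         # assumed total hours of video on YouTube
    print(f"one YouTube every ~{youtube_hours / cameras * 60:.0f} minutes of global CCTV")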


Imagine OpenAI purchasing troves of corporate data from asset liquidations following bankruptcies, and the AI inadvertently becoming incredibly skilled at bankrupting corporations.


Then you just do the opposite of what it says; still useful.


> 3) People will guard their data more and will be less willing to share it.

I recently came across this myself when writing a reply on another forum. A feeling of reluctance to attempt to contribute something maybe-useful in a public, 'minable' space.

Almost feels like this has tarnished the 'magic' of the Internet, of sharing information and knowledge. I'm not saying this is the appropriate reaction, but it is what it is.


> I recently came across this myself when writing a reply on another forum. A feeling of reluctance to attempt to contribute something maybe-useful in a public, 'minable' space

I used to be like this, but the reality is that ideas are a dime a dozen. Any idea that you or I (or anyone else) have had has likely also been thought of by thousands (or maybe add a few more zeros) of people.

What makes things happen is people, not ideas. Talk to any VC or people involved in startups - it's all about the team, not the ideas they are working on.


1. I'd wager AI models will begin to learn via interacting with the world rather than just reading lots of text. That will reduce the need for a huge corpus of training data.

2. More efficient methods of training and running LLMs are emerging at an exponential rate.

3. There's already enough data to train an AGI.

4. The Transformer architecture isn't that old.


> I give it a 95% chance that an AI winter is coming. Winter in a sense that there won't be any new ways to move forward towards AGI.

Just note that this is not at all what most people mean when they say "AI Winter".

It is usually defined as less funding and general interest in investing in AI, not "we don't know how to move forward towards AGI".


1) Is everything really being mined already? My sense is that another round of GPT-like training on YouTube, EdX and Coursera data plus some other large video archives (BBC and the like) could still make quite a bit of difference. Text and images independently are one thing; having them together in context might be something else.

2) The available compute seems to be growing pretty rapidly and dropping in price. I think there are still quite some gains to be had from architectural optimizations (both in hardware and in models).

4) They were defined decades ago, but I think they only actually seemed to move us closer to AGI recently.

You might be right, and there are definitely interesting times ahead! But I kind of doubt that we will have decades to sort out what we have and figure out its impact on society (which is a bit scary).


I agree with this perspective. But I also see the current crop of AI as a "brute force" method towards AGI, which could still be combined with a symbolic system to make it significantly more efficient at basic reasoning. I sadly don't know why certain neural-symbolic approaches haven't been explored further despite their incredibly promising results in the past.


1) The AIs aren't getting much better since when? Last week?

You appear to be assuming that the only way to make current LLM-based AIs better is by feeding them more and more data, but that's simply not true. These current (really first-generation, notwithstanding GPT's versioning) AIs are very data/parameter inefficient, and do not differentiate between data used to develop reasoning and data that is just pure knowledge. The short-term path forward is more judicious training set design to use less data and require fewer model parameters, so that the model itself just learns to reason and doesn't need to do double duty as a database of facts as well. Retrieval-based plugins and similar approaches can be used to give access to external knowledge, roughly as sketched below.
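As a sketch of that retrieval idea (the documents, the toy retriever, and the llm_generate placeholder below are all hypothetical, just to show the shape of the approach): keep the facts in an external store, pull the relevant ones in at query time, and let the model reason over them instead of memorizing them.

    from collections import Counter
    import math

    # A toy external knowledge store; in practice this would be a real corpus
    # behind a vector index, not three hand-written strings.
    documents = [
        "The Transformer architecture was introduced in 2017.",
        "Chinchilla showed smaller models trained on more data can match larger ones.",
        "AlexNet won the ImageNet competition in 2012.",
    ]

    def embed(text):
        # Stand-in for a learned embedding: a bag-of-words count vector.
        return Counter(text.lower().split())

    def cosine(a, b):
        dot = sum(a[w] * b[w] for w in a)
        norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
        return dot / norm if norm else 0.0

    def retrieve(query, k=1):
        # Rank documents by similarity to the query and return the top k.
        q = embed(query)
        return sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

    def llm_generate(prompt):
        # Placeholder for whatever model answers the prompt; not a real API.
        return "[model answer conditioned on]\n" + prompt

    query = "When was the Transformer introduced?"
    context = "\n".join(retrieve(query))
    print(llm_generate("Context:\n" + context + "\n\nQuestion: " + query))

The model only has to read and reason over the retrieved context, not store it.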

2) There will be many different sizes and capabilities of AI. They don't all need to be superhuman.

We are entering the AI age, and just as the computer age didn't result in everyone needing a mainframe or datacenter in their garage (although there are uses for them), the AI age will not result in everyone running the world's most powerful superhuman AI on their smartphone. There will be a range of AIs for different use cases.

Current models are also very inefficient. We are essentially on day 1 of the NN/LLM AI era. Think of ChatGPT as the ENIAC of the age. DeepMind's Chinchilla scaling laws have already shown how inefficient these systems are, with a suitably trained 70B model equaling the performance of a 175B one. There is tons more work to be done on efficiency and smarter model design.
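For reference, the rough Chinchilla rule of thumb is about 20 training tokens per parameter for compute-optimal training; a quick sketch of what that implies (the model sizes are just examples, and the exact fitted exponents are in the paper):

    # Chinchilla rule of thumb: ~20 tokens per parameter for compute-optimal training.
    TOKENS_PER_PARAM = 20

    for params in (1e9, 7e9, 70e9, 175e9):
        tokens = TOKENS_PER_PARAM * params
        print(f"{params / 1e9:>5.0f}B params -> ~{tokens / 1e9:,.0f}B tokens")

    # 70B lands at ~1,400B tokens, which is what Chinchilla was trained on;
    # GPT-3's 175B was trained on roughly 300B tokens, far below its
    # compute-optimal budget - which is why a well-trained 70B can match it.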

3) Perhaps, but per point 1) we're not running out of data anyway, and there will always be people/companies happy to sell data for a price, even if it is not free.

Future AIs will learn the same way we do, by experimentation and curiosity/surprise, and will not need to be spoon-fed a training set the way today's systems are.

4) No idea what you're talking about here. NN optimization is hardly spent as a technique - that'd be like saying "we've reached the limit of what we can do with software development - we need a new idea".

LLMs as a path to AI, or a building block of AI, are of course a very new idea - a few years old at most. A few decades ago hardly anyone was even working on neural nets (the current deep learning revolution basically started with AlexNet in 2012); it was all GOFAI symbolic systems - failed approaches like SOAR and CYC.

If you compare the Transformer architecture to the human cortex, and our brain's overall cognitive architecture, some of the missing pieces are extremely obvious. We're not at any sort of impasse as to how to move things forward.



