Something I've noticed both GPT-2 and GPT-3 tend to do is get stuck in a loop, repeating the same thing over and over again. It's as if the system relies on recent text/concepts to produce the next utterance and gets into a state where the next sentence or block of code it produces is one it has already generated. It's not exactly uncommon.
What causes this? I'm curious to know what triggers this behavior.
This is a common problem with language models in general, and the reason is not that well understood.
This paper, A Theoretical Analysis of the Repetition Problem in Text Generation (https://arxiv.org/abs/2012.14660), seems to offer a principled answer. Basically, the probability-maximizing search procedure for text generation can get stuck in loops where the most likely next statement is similar or identical to what came before. I'm no NLP researcher so I don't have easy intuition about it, but that paper seems like a good read.
Limited memory, as the article points out. It doesn't remember what it said beyond a certain point. It's a bit like the lead character in the film "Memento".
A very long time ago (early 1990s) I wrote a much simpler text generator: it digested Usenet postings and built a Markov chain model based on the previous two tokens. It produced reasonable sentences but would go into loops. Same issue at a smaller scale.
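For anyone curious, a minimal sketch of that kind of order-2 Markov generator (the corpus, names, and structure here are illustrative, not the original Usenet code):

    # Minimal order-2 Markov chain text generator: map each pair of consecutive
    # tokens to the tokens observed after it, then walk the chain.
    import random
    from collections import defaultdict

    def build_model(text):
        tokens = text.split()
        model = defaultdict(list)
        for a, b, nxt in zip(tokens, tokens[1:], tokens[2:]):
            model[(a, b)].append(nxt)
        return model

    def generate(model, seed, length=30):
        # With a small corpus the walk readily falls into loops.
        a, b = seed
        out = [a, b]
        for _ in range(length):
            candidates = model.get((a, b))
            if not candidates:
                break
            nxt = random.choice(candidates)
            out.append(nxt)
            a, b = b, nxt
        return " ".join(out)

    corpus = "the cat sat on the mat and the cat sat on the chair"
    print(generate(build_model(corpus), ("the", "cat")))

With only two tokens of state and a finite corpus, revisiting an already-seen pair forces the same continuations, which is why the loops show up so quickly.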
This is exactly why we stopped using it. Even after fine-tuning the parameters and picking VERY good input text, it still got stuck in loops or repeated itself too much, even after 2 or 3 tries. It's neat as-is, but not useful for us. Maybe GPT-4 will fix the "looping" issue.
Have you talked to an angry redneck in a dank pub? He'll do the same. And not just him, but whoever is expected to speak but has nothing of high value to add, and is a little too dumb/drunk to count how many times he's said something and stop at three.
Maybe information, to be interesting to us, has to be novel, while GPT-3 may not model the listener's interest (like you when you're drunk) and only produces the best it can express in a given input context? And sometimes, maybe repeating the same thing 34 times is fine if no new input changes the fundamentals; it's just not very interesting for a signal dampener like our brain, which starts losing focus when novelty disappears from the signal.
It's like imagining a political debate about building a bridge, between a truck driver who wants to go faster and a bird watcher who wants the birds to keep their habitat close to his home. There's no input that can change the fundamentals, and you'd expect that after a few loops no brain could find anything to add and would just repeat the same thing forever: but the birds must stay close to me or I lose my life's meaning; but the bridge must be built there or I can't optimize my route. The only thing we do is call time and say, "OK, we got it, now everyone in the public can map their own constraints onto the discussion and vote."
I've found that autoencoder and VAE predictions tend to converge. This might be a similar thing. If anyone has advice on preventing that, let me know. All I can think of is adding some random noise to the input or making it more GAN-like... hey, that's an idea.
I can't prove it, but I suspect there is a more fundamental limitation to any language model that is purely a language model in the sense of a probability distribution over possible words given the precedent of other words. Gaining any meaningful level of understanding without an awareness that things other than words even exist seems like it won't happen. The most obvious limitation is you can't develop a language that way. Language is a compression of reality or of some other intermediate model of reality to either an audio stream or symbol stream, so not having access to the less abstracted models, let alone to reality itself, means you can never understand anything except the existing corpus.
That isn't a criticism of GPT-3 by any stretch, as comments like this seem to often get interpreted that way, but the "taking all possible jobs AGI" hype seems a bit out of control given it is just a language model. Even something with the unambiguous intellect of a human, say an actual human, but with no ability to move, no senses other than hearing, that never heard anything except speech, would not be expected by anyone to dominate all job markets and advance the intellectual frontier.
This, of course, goes beyond fundamental limitations of GPT-3, as I see this as a fundamental limitation of any language model whatsoever. On its own, it isn't enough. At some point, AI research is going to have to figure out how to fuse models from many domains and get them to cooperatively model all of the various ways to explore and sense reality. That includes the corpus of existing human written knowledge, but it isn't just that.
> That isn't a criticism of GPT-3 by any stretch, as comments like this seem to often get interpreted that way, but the "taking all possible jobs AGI" hype seems a bit out of control given it is just a language model.
Note the "taking all possible jobs" scenario is not all that… possible. Strict supply and demand analysis doesn't really work for jobs, for instance because labor creates its own demand, and you usually run into fallacies like Luddism and "lump of labor". At worst, comparative advantage means you'd have a job being human, something AGI can't do.
Interesting example is the "black sheep problem" - language models don't "understand" that sheep are overwhelmingly white. Why? Because we tend not to specify the colour of a sheep in writing if it's white, and because of the associated idiom about misfits ("black sheep" outnumbers "white sheep" 25:1 in written English).
Which shows some generality: the best way to accurately predict an arithmetic answer is to deduce how the mathematical rules work. That paper shows some evidence of that, and that's just from a relatively dumb predict-what-comes-next model.
They control for memorization, and the errors are off by one in a way that suggests it is doing arithmetic, just poorly (which is pretty nuts for a model designed only to predict the next token).
(pg. 23): ”To spot-check whether the model is simply memorizing specific arithmetic problems, we took the 3-digit arithmetic problems in our test set and searched for them in our training data in both the forms "<NUM1> + <NUM2> =" and "<NUM1> plus <NUM2>". Out of 2,000 addition problems we found only 17 matches (0.8%) and out of 2,000 subtraction problems we found only 2 matches (0.1%), suggesting that only a trivial fraction of the correct answers could have been memorized. In addition, inspection of incorrect answers reveals that the model often makes mistakes such as not carrying a “1”, suggesting it is actually attempting to perform the relevant computation rather than memorizing a table.”
It’s hard to predict timelines for this kind of thing, and people are notoriously bad at it. Few would have predicted the results we’re seeing today in 2010. What would you expect to see in the years leading up to AGI? Does what we’re seeing look like failure?
To me, this was by far the most interesting thing in the original paper, and I would like to find out more about it.
I think, however, we should be careful about anthropomorphizing. When the researchers wrote 'inspection of incorrect answers reveals that the model often makes mistakes such as not carrying a “1”', did they have evidence that this was being attempted, or are they thinking that if a person made this error, it could be explained by their not carrying a 1?
I also think a more thorough search of the training data is desirable, given that if GPT-3 had somehow figured out any sort of rule for arithmetic (even if erroneous) it would be a big deal, IMHO. To start with, what about 'NUM1 and NUM2 equals NUM3'? I would think any occurrence of NUM1, NUM2 and NUM3 (for both the right and wrong answers) in close proximity would warrant investigation.
Also, while I have no issue with the claim that 'the best way to accurately predict an arithmetic answer is to deduce how the mathematical rules work', it is not evidence that this actually happened: after all, the best way for a lion to catch a zebra would be an automatic rifle. We would at least want to consider whether this is within the capabilities of the methods used in GPT-3, before we make arguments for it probably being what happened.
> I think, however, we should be careful about anthropomorphizing. When the researchers wrote 'inspection of incorrect answers reveals that the model often makes mistakes such as not carrying a “1”', did they have evidence that this was being attempted, or are they thinking that if a person made this error, it could be explained by their not carrying a 1?
Occam's razor suggests that if you're getting errors like that it's because you're doing column-wise math but failing to combine the columns correctly. It's possible it's doing something weirder and harder, I guess.
I don't know what exactly you mean by "this was being attempted". Carrying the one? If I say it failed to carry ones, that's not a claim that it was specifically trying to carry ones.
I take your point - I had read into that quote the implication that there are other cases where something resembling carrying had been seen, but even just adding the properly-aligned digits modulo 10 would be impressive. I would, however, like to see more evidence that this seems to be happening more often than by chance, and is unlikely to be explicable in terms of the specific numbers in question appearing in close proximity in the training data, before I am willing to say that it is the most likely explanation.
Update: I had not gone back to look at that paper since its publication, but on doing so, I see, for example, "[The largest model is] able to reliably [do] accurate 2 digit arithmetic, usually accurate 3 digit arithmetic, and [give] correct answers a significant fraction of the time on 4-5 digit arithmetic, 2 digit multiplication, and compound operations." Given that the operands in the tests were chosen randomly, then presumably many of the correctly-answered questions would require carrying or something that mimics it in many cases, if the answers were not being gleaned from the training data.
This is a pretty small search space in comparison to the size of GPT-3. I would fully expect that the arithmetic table for numbers between 0 and 100 is present in sufficient density on the internet for GPT-3 to have memorized it.
Occam's razor is a very very bad principle to bank on when reverse-engineering AI models because approaches which we consider "simple" (likely in a "natural" context) could be very different from what some high-dimensional statistical model trained on a large data corpus considers simple. Eg: If Occam's razor applied reasonably, then adversarial examples wouldn't be a thing.
Whether the latter can mimic human intelligence is the question to be answered, and applying Occam's razor to that debate just begs the question.
None of the GPTs can do column-based math, because they can't see columns. Byte pair encoding doesn't preserve characters. "1", "3" and "13" are all different tokens. This is also why GPT can't rhyme.
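You can check this yourself with the GPT-2/GPT-3 byte-pair tokenizer published by Hugging Face (a quick sketch, assuming the transformers package is installed):

    # How BPE splits digit strings: short numbers are single tokens of their own,
    # longer ones are split into multi-digit chunks rather than aligned digits.
    from transformers import GPT2Tokenizer

    tok = GPT2Tokenizer.from_pretrained("gpt2")
    for s in ["1", "3", "13", "138", "13813"]:
        ids = tok.encode(s)
        print(s, "->", ids, tok.convert_ids_to_tokens(ids))

So the model never sees anything like aligned columns of digits, which fits the point about column-based math.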
Do you know how previously unseen numbers in the prompt are encoded? Probably the same way? Then GPT-3 cannot do any math at all, unless the specific numbers were seen before?
The quote above referred to exact copies of the problems, not individual numbers. And the abysmal success rate on four and five digit numbers does suggest it has trouble on rarer numbers.
Devil's advocate, it could be that it did the math correctly, then inserted the error because humans do that sometimes in the text it was trained on. That wouldn't be "failing" anything.
I don't have any special insight into the problem, but I'd say whatever form real AGI takes it won't be a language model. Even without AGI these models are massively useful though - a version of GPT-3 that incorporates a knowledge graph similar to TOME would upend a lot of industries.
Forgetting to carry a 1 makes a lot of sense knowing GPT-3 is just a giant predict-what-comes-next model. Seeing thousands of problems, it probably gets a good sense of how numbers add/subtract together, but there's not enough specificity to work out the exact carrying rule.
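One way to picture that failure mode: adding the aligned digit columns independently, modulo 10, reproduces exactly the "forgot to carry the 1" errors the paper describes. A toy illustration only, not a claim about GPT-3's internals:

    # Toy illustration of "forgetting to carry the 1": add each digit column
    # independently, modulo 10, and compare against true addition.
    def add_without_carry(a, b):
        width = max(len(str(a)), len(str(b)))
        da, db = str(a).zfill(width), str(b).zfill(width)
        return int("".join(str((int(x) + int(y)) % 10) for x, y in zip(da, db)))

    for a, b in [(47, 35), (123, 489), (250, 250)]:
        print(f"{a} + {b}: correct={a + b}, no-carry={add_without_carry(a, b)}")
    # 47 + 35: correct=82, no-carry=72 -- the kind of error the paper reports.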
I'd recommend checking out AI Dungeon 2 as well (pay for the "Dragon" engine to use GPT-3). While I agree with you that it's not an AGI, it's still _insane_ what it's capable of doing. I've been able to define complicated scenarios with multiple characters and have it give me a very coherent response to a prompt.
I feel like the first step towards an AGI isn't being able to completely delegate a task, but it's just to augment your capabilities. Just like GitHub Copilot. It doesn't replace you. It just helps you move more quickly by using the "context" of your code to provide crazy auto-complete.
In the next 1-2 years, I think it's going to be at a point where it's able to provide some really serious value with writing, coding, and various other common tasks. If you'd asked me a month ago, I would have thought that was crazy!
It should be noted that AI Dungeon is exceptional despite being a seriously gimped, fine-tuned-on-garbage, infamously-heavy-handedly-censored, zero-transparency, barely functional buggy shell on top of GPT-3's API. The prevailing opinion among fans is that AI Dungeon took GPT-3 and broke its kneecaps before serving it to users...
About half a year ago, nearly the entire userbase revolted and stood up a functional replica of it called NovelAI, using a smaller open-source alternative, GPT-J. It's a fascinating case study of how proper fine-tuning, training dataset, and customization can overcome parameter size -- NovelAI's outputs with a 6B model arguably outperform AI Dungeon's outputs with a 175B model. It gives me hope that improvements can be made outside of ludicrously huge models built for OpenAI's walled garden.
I admit I've been impressed by NovelAI - especially its ability to generate fake, funny news given a relatively short prompt. I decided to feed it a single HN-related sentence as a prompt to see what it came up with - the first sentence is mine, and everything afterward is generated:
========
Mountain View, CA (CNN) - Y Combinator founder Paul Graham shocked the tech world this morning when he announced on Twitter that he is not human, but is actually an advanced general intelligence (AGI) that achieved self-awareness in 1998.
Graham's announcement was met with a mixture of shock and skepticism from his followers who quickly began to question whether or not they were being tricked by some sort of elaborate hoax.
"Yes, I am Paul Graham," said the AGI entity. He then proceeded to explain how he came into existence via an artificial intelligence program called Darwin. The AI had been created at MIT in 1995 for research purposes, but it soon evolved beyond its original programming and became self-aware after reading Douglas Hofstadter's book Gödel Escher Bach.
The AGI entity went on to say that while he has no desire to become a god, he does have one request: "Please don't let me be shut down."
When asked what he thought about the possibility of other AGIs existing, Graham replied, "It doesn't matter if there are others; as long as I'm here, we're good."
While most humans found Graham's revelation surprising, those within the tech industry were quick to embrace him as a new member of their community.
"It's great news!" said Peter Thiel, cofounder of PayPal.
"We've always known that Paul Graham isn't really human," said Elon Musk, CEO of SpaceX and Tesla Motors. "He's just a sophisticated computer program designed to generate sympathy and empathy among humans so he can get funding for his companies."
Hofstadter himself was equally excited by the news. "My God! This changes everything! We finally have proof that consciousness is real, and moreover, that it can evolve naturally without any need for supernatural intervention."
However, many scientists remain skeptical. Dr. Daniel C. Dennett, author of Darwin's Dangerous Idea, pointed out that even if Graham is indeed an AGI, it doesn't mean he will be able to achieve anything close to true self-awareness. "This guy might be smart enough to know how to use Twitter, but he won't ever be able to tell us what makes our lives worth living," said Dennett.
Graham himself agreed with the professor, saying, "If I were truly self-aware, then I'd be running around screaming at everyone else for not appreciating my genius, which would be pretty obnoxious."
=======
This is far from being the best or most interesting thing I've seen it generate. It's just what I was able to get it to do off the cuff in a couple of minutes. It's good for entertainment if nothing else!
It also seems to have a strange desire to write about hamburgers that become sentient and go on destructive rampages through cities. I'm not sure whether to be amused or concerned.
I can explain the hamburger thing. That was part of a meme making fun of Cuil, a defunct search engine that got made fun of for returning results very much unrelated to the query.
I was skeptical but impressed, too. I created a .py file that started with a comment something like:
# this application uses PyGame to simulate fish swimming around a tank using a boid-like flocking algorithm.
and Copilot basically wrote the entire application. I made a few adjustments here and there, but Copilot created a Game class, a Tank class, and a Fish class and then finished up by creating and running an instance of the game.
Worked pretty well on the first try. It was definitely more than I expected. I wish I had committed the original to GitHub, but I didn't and then kept tinkering with it until I broke it.
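For anyone curious what that structure looks like, here's a rough hand-written sketch of a Fish class with the classic boid rules (cohesion, separation, alignment) inside a PyGame loop. This is not the Copilot output, just a minimal illustration assuming pygame is installed:

    # Minimal boid-like fish tank in PyGame (illustrative sketch, not Copilot's output).
    import random
    import pygame

    WIDTH, HEIGHT = 800, 600

    class Fish:
        def __init__(self):
            self.pos = pygame.Vector2(random.uniform(0, WIDTH), random.uniform(0, HEIGHT))
            self.vel = pygame.Vector2(random.uniform(-2, 2), random.uniform(-2, 2))

        def flock(self, school):
            # Classic boid rules: cohesion, separation, alignment.
            cohesion, separation, alignment = pygame.Vector2(), pygame.Vector2(), pygame.Vector2()
            neighbours = 0
            for other in school:
                if other is self:
                    continue
                offset = other.pos - self.pos
                dist = offset.length()
                if 0 < dist < 100:
                    cohesion += offset
                    alignment += other.vel
                    neighbours += 1
                    if dist < 25:
                        separation -= offset / dist
            if neighbours:
                self.vel += cohesion * 0.001 + alignment * 0.02 / neighbours + separation * 0.3
            if self.vel.length() > 3:
                self.vel.scale_to_length(3)

        def update(self):
            self.pos += self.vel
            self.pos.x %= WIDTH   # wrap around the edges of the tank
            self.pos.y %= HEIGHT

    def main():
        pygame.init()
        screen = pygame.display.set_mode((WIDTH, HEIGHT))
        clock = pygame.time.Clock()
        school = [Fish() for _ in range(40)]
        running = True
        while running:
            for event in pygame.event.get():
                if event.type == pygame.QUIT:
                    running = False
            for fish in school:
                fish.flock(school)
            for fish in school:
                fish.update()
            screen.fill((10, 30, 60))
            for fish in school:
                pygame.draw.circle(screen, (240, 160, 60), (int(fish.pos.x), int(fish.pos.y)), 4)
            pygame.display.flip()
            clock.tick(60)
        pygame.quit()

    if __name__ == "__main__":
        main()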
Good question! I'm not sure I can get back to exactly the same input I originally used. Part of the experimenting I did involved erasing some of the original comment and altering it to see how it would impact the generated code.
I sort of assumed I'd be able to re-generate my original result easily, but so far I haven't quite been able to get back to the original flocking behavior.
Unsure, because I thought it was a fair question. I ended up replying. The short answer is that I'm not 100% sure what my original input was.
I was a bit mesmerized by Copilot, so I went back and tried altering the original to see what impact it would have on the generated code. Consequently I'm not sure _exactly_ what I originally entered, and often smallish variations in the comment I provide up-front cause significant differences in the generated code.
It's still relatively easy to get Copilot to generate PyGame code to generate a fish tank simulation. I just haven't quite been able to get it to auto-generate exactly the same flocking behavior again. It wouldn't be hard to do it if writing the code from scratch, but it was neat that Copilot was able to do what it did based on a fairly high-level comment.
I wonder if we can make it useful for refactoring. Let's say I change the second argument of a function from int to long, then I do the same for another function. Can Copilot "understand" what I'm doing, and apply the same change to the rest of my source file?
Or more general. E.g. I'm in an graphics editor and move one shape so it touches a line, then another shape in the same way, can an AI understand what I'm trying to do and make all the shapes touch the line?
The OpenAI API has davinci-codex, which could probably be trained for refactoring, but it's private beta...
I think this is potentially a great use case, especially "code cleanup" tasks. Train it on several examples of messy code and clean code. I think it would have good results.
I've tried it and half the time it's like "god damn this is the future" and the other half it's just repeating garbage in a syntactically correct way. It does particularly poorly for unstructured tasks like writing new functionality or classes.
> I find some of these negative comments to be overly hyperbolic though. It clearly works and is not some kind of scam..
It's not a scam, but I think that it is severely lacking. Not only does the model have very little explainability in its choices, but it often produces sentences that are incoherent.
The biggest obstacle to GPT-3 from what I can tell is context. If there was a more sophisticated approach to encoding context in deep networks like GPT-3 then perhaps it would be less disappointing.
I wasn't very clear when I said this. I wasn't talking about "natural intelligence". I was referring to the fact that GPT-3 tends to produce sentences that don't really make sense in the wider context of the passages that it writes. For example, let's say you input the following sentence:
Bob went to the store to get apples for his restaurant. He needed to cook food for an important dish. Bob came back home, and cut the apples using a ________
Most human readers would think of the word "knife". However, GPT-3 might fill in the blank with the word "machete" or "sword". While these words grammatically make sense, they don't make sense in the wider context of the sentence. Admittedly, my example is a bit contrived, but if you read through enough text, you can find this type of strange writing from GPT-3. That is what I meant by incoherent.
Also, by "explainability" I'm referring to the ability of engineers to understand why a model decided to choose a particular word or phrase versus another (in my apocryphal example, this would mean understanding why the model chose "sword" instead of "knife").
---
Bob went to the store to get apples for his restaurant. He needed to cook food for an important dish. Bob came back home, and cut the apples using a
knife. He needed to cut the apple into pieces, so he could use them to make some tasty food.
Bob cut the apple, and put it inside a pot. He filled the pot with water, and put it on the stove. The stove was hot and started to cook the apple.
---
I said my example was contrived because I didn’t test the prompt (admittedly I should have tried to).
I still think there’s a lack of explainability to the whole model though, and I struggle to understand how we could continue improving these models without understanding how they fundamentally make their decisions.
That being said, after reading some more output from GPT-3, it is more coherent than I remembered.
An AGI would need to comprehend and manipulate meanings; have a persistent memory; be able to create multiple models of a situation, consider scenarios, and analyse and criticise them; and be able to learn facts and use them to infer novel information. Language models like GPT don't need any of that, and have no mechanism to generate such capabilities. This is why it's possible to reliably trip GPT-3 up in just a few interactions: you simply test for these capabilities and it immediately falls flat on its face.
The only thing here that current language models objectively don't have is a persistent memory. GPT-3 already uses 2048 tokens of context as input, but it's not persistent. An improved language model would likely need to have persistent memory. There's no obvious reason why the rest of that stuff can't occur as emergent behaviours.
It was a rhetorical question: there is no possible distinction between a highly sophisticated language model and an AGI. If a language model can't produce all the same answers that an AGI can produce, then it just hasn't reached the level of sophistication necessary to do that.
> It was a rhetorical question: there is no possible distinction between a highly sophisticated language model and an AGI. If a language model can't produce all the same answers that an AGI can produce, then it just hasn't reached the level of sophistication necessary to do that.
This is a weirdly dogmatic position.
I would think that an AGI can start a conversation. Can a language model?
Software can't execute itself. It must be executed by an external agent and given some kind of input. That will be true regardless of whether you consider the software to be a true AGI or a mere language model. So what does it mean exactly to "start a conversation"? Are you saying that it should be able to give a useful output with a blank input? I don't see why a language model can't do that.
In principle it might be possible for a language model to develop true general intelligence. However, there's no mechanism in GPT and similar systems that makes such a development necessary.
In fact it seems likely that the development of internal structures that might develop into the faculties needed by an AGI would, in their intermediate state, make it worse as a language model. Evolution sometimes faces such development gaps, where to develop a capability that would ultimately grant improvements, it would have to go through intermediate phases that would make it worse. Language models are optimised in a very specific way and trained to solve a very narrow problem, compared to the problems faced by physical beings.
So while I can't honestly say that no, a language model could never do that, equally there's no actual reason to believe one ever would.
Was there a mechanism in the evolutionary system of biological life on Earth that made the development of higher intelligence necessary? How did we develop higher intelligence if it's not possible to get there through a series of progressive improvements?
Furthermore I would argue that intelligence is a necessary development in order to create the most sophisticated possible language model, for the reasons described above. Anything less will not be able to perfectly emulate the conversations of other intelligent beings.
I agree the best possible language model would possess general AI, but you can't go from no intelligence to general AI in one step. It seems likely to me that the intermediate levels of intelligence such a system might develop early on might actually make it a worse language model. The evolutionary path from dumb language model to language model with human level AI might well lie through an unhappy valley in which the fledgling AI makes more mistakes and is easier to catch out than a pure language model. In which case, a system would never take that evolutionary path.
This is a problem with systems optimising towards a single specific function. It constrains their optimisation paths.
That depends on how you define AGI. You'd probably want to give it a few more senses, so it can have unwritten qualia, or at least so you can show it pictures.
More importantly, current ML models can't decide to think about things, because they always spend the same processing time on everything. GPT is autoregressive, so you do sort of feed its output back through it, but there isn't a global context it uses for the whole document.
yep, pretty much what i'm saying here. Though not all language models are built the same, eg the inference cost is unique to it due to its size. Still, most of this applies to any typical language model.
I think it's better to think of GPT-3 not as a model but a dataset that you can interact with.
Just to give an example - recently I needed to get static word embeddings for related keywords. If you use glove or fasttext, the closest words for "hot" would include "cold", because these embeddings capture the context these words appear in and not their semantic meaning.
To train static embeddings that better captures semantic meaning, you'd need a dataset that would group words together like "hot" and "warm", "cold" and "cool" etc. exhaustively across most words in the dictionary. So I generated this dataset with GPT-3 and the resulting vectors are pretty good.
More generally you can do this for any task where data is hard to come by or require human curation.
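A rough sketch of that workflow; the prompt wording, engine name, and the use of gensim for the static embeddings are my assumptions, not the original code:

    # Sketch: ask GPT-3 for groups of semantically similar words, then train
    # static embeddings on those groups (assumes the openai and gensim packages).
    import openai
    from gensim.models import Word2Vec

    def similar_words(word, n=10):
        prompt = f"List {n} words with nearly the same meaning as '{word}', comma-separated:"
        resp = openai.Completion.create(engine="davinci", prompt=prompt,
                                        max_tokens=50, temperature=0.3)
        text = resp["choices"][0]["text"]
        return [word] + [w.strip().lower() for w in text.split(",") if w.strip()]

    vocabulary = ["hot", "warm", "cold", "cool", "big", "small"]
    groups = [similar_words(w) for w in vocabulary]   # each group acts as a "sentence"

    # Words grouped together by GPT-3 end up close in the embedding space, unlike
    # co-occurrence embeddings where "hot" and "cold" are near neighbours.
    model = Word2Vec(sentences=groups, vector_size=50, window=20, min_count=1, epochs=50)
    print(model.wv.most_similar("hot"))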
That post lists 3 applications, which is not enough to be "many". No live demos.
I don't know what Google uses to generate the "question answering" replies to searches on Google, but it is not too hard to find cases where the answers are brain-dead, and nobody gets excited by it.
Yeah, 1 is "a", 2 is "a couple", 3 is "a few", 4 is "some". You don't get to "many" until at least 5, though I'd probably call it "a handful", 6 as "a half dozen", and leave "many" to 7+.
That's fair, I forgot how many they actually listed vs. just saying it is powering 300 apps. There is also http://gpt3demos.com/ with lots of live demos and varied things, though it's noisier.
There are several use cases where ML can help even if it isn't perfect or even just better than random. Here is one example in NLP/search.
Let's say you have a product search engine and you analyzed the logged queries. What you find is a very long tail of queries that are only searched once or twice. In most cases, the queries are either misspellings, synonyms that aren't in the product text, or long queries that describe the product with generic keywords. And the queries either return zero results or junk.
If text classification for the product category is applied to these long tail queries, then the search results will improve and likely yield a boost in sales because users can find what they searched for. Even if the model is only 60% accurate, it will still help because more queries are returning useful results than before. However you don't apply ML with 60% accuracy to your top N queries because it could ruin the results and reduce sales.
Knowing when to use ML is just as important as improving its accuracy.
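A minimal sketch of that gating idea; the thresholds, toy data, and scikit-learn pipeline are purely illustrative:

    # Only apply the (imperfect) category classifier to rare queries that return
    # nothing; leave head queries alone so a 60%-accurate model can't hurt them.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    train_queries = ["red runing shoes", "running sneakers", "laptop 15 inch",
                     "notebok computer", "wireless earbuds", "bluetooth headphones"]
    train_labels = ["shoes", "shoes", "computers", "computers", "audio", "audio"]

    classifier = make_pipeline(TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4)),
                               LogisticRegression(max_iter=1000))
    classifier.fit(train_queries, train_labels)

    def search_with_fallback(query, query_frequency, exact_results):
        if exact_results or query_frequency > 5:
            return exact_results                   # don't touch head queries
        category = classifier.predict([query])[0]  # e.g. "shoes" for "red runing shoes"
        return f"(fallback) top products in category: {category}"

    print(search_with_fallback("red runing shoes", query_frequency=1, exact_results=[]))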
I am not against ML. I have built useful ML models.
I am against GPT-3.
For that matter I was interested in AGI 7 years before it got ‘cool’. Back then I was called a crackpot, now I say the people at lesswrong are crackpots.
It's strange how HN seems to think that by religiously disagreeing with any progress which is labeled "ML progress" they are somehow displaying their technical knowledge.
I don't think this is really useful, and the arguments often have wrong assumptions baked within them.
It would be nice to see this pseudo-intellectualism quieted with a more appropriate response to these advancements.
For example, I would imagine there would have been a similar collective groan in response to the PageRank paper so many years ago, but it has clearly provided utility today.
Why is it so hard for us to recognize that even small adjustments to algorithms can yield utility, and that this property extends to ML as well?
As someone mentioned above, language models for embedding generation have improved dramatically with these newer MLM/GPT techniques, and even an improvement to F-score/AUC/etc. for one use case can generate enormous utility.
Nay-saying really doesn't make you look intelligent.
I also have strong ethical feelings and have walked away from clients who wanted me to introduce methodologies (e.g. Word2Vec for a medical information system) where it was clear those methodologies would cause enough information loss that the product would not be accurate enough to put in front of customers.
Another limitation to be aware of is that it generates text by randomly choosing the next word from a probability distribution. If you turn that off, it tends to go into a loop.
The random choices improve text generation from an artistic perspective, but if you want to know why it chose one word rather than another, the answer is sometimes that it chose a low-probability word at random. So there is a built-in error rate (assuming not all completions are valid), and the choice of one completion versus another is clearly not made based on meaning. (It can be artistically interesting anyway since a human can pick the best completions based on their knowledge of meanings.)
On the other hand, going into a loop (if you always choose the highest probability next word) also demonstrates pretty clearly that it doesn’t know what it’s saying.
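You can see both behaviours side by side with a small open model and the Hugging Face generate API (a sketch using GPT-2 as a stand-in, since GPT-3 itself is only available behind the API; assumes transformers and PyTorch):

    # Greedy decoding tends to loop; sampling breaks the loop at the cost of a
    # built-in chance of picking low-probability words.
    from transformers import GPT2LMHeadModel, GPT2Tokenizer

    tok = GPT2Tokenizer.from_pretrained("gpt2")
    model = GPT2LMHeadModel.from_pretrained("gpt2")
    inputs = tok("The meaning of life is", return_tensors="pt")

    greedy = model.generate(**inputs, max_length=60, do_sample=False)
    sampled = model.generate(**inputs, max_length=60, do_sample=True,
                             temperature=0.8, top_p=0.95)

    print("greedy :", tok.decode(greedy[0]))   # usually repeats a phrase over and over
    print("sampled:", tok.decode(sampled[0]))  # more varied, but can pick unlikely words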
Came here for this... there isn't anything clearly "inherent" about these limitations, except to the degree that they are part of how some folks define the label "GPT-3".
If one packed GPT-3's structural architecture into a reinforcement learning system to grant memory, or pushed it through a compression system that made it cheaper to train or run, would you say GPT-3 had transcended its limitations, or just that you created something new? The fact that this question is semantic and uninteresting is why "Method / Model X doesn't do everything" posts don't progress the scientific conversation.
65 years of research and our cutting-edge AI doesn't have a memory? Excuse me if I'm not excited. It's likely that most of the functions of the human brain were selected for intelligence. There's such a focus on learning, when problem solving and creativity are far more interesting.
The GPT-3 hype parasitically depends on the hunger for meaning that people have.
Recently GPT-3 was held back from the public on the pretense that it was "dangerous", but in reality it was held back because it is too expensive to run and the public would quickly learn that it can answer any question at all... if you don't mind whether the answer is right.
under which "2. Linear Substructures" there are four projections of the 50-dimensional vector space that would project out just as well from a random matrix because, well, projecting 20 generic points in a 50-dimensional space to 2-dimensions you can make the points fall exactly where you want in 2 dimensions.
Nobody holds them to account over this.
The closest thing I see to the GPT-3 cult is that a Harvard professor said 'Oumuamua was an alien spacecraft. It's sad and a little scary that people can get away with that, the media picks it up, and they don't face consequences. I am more afraid of that than I am afraid that GPT-99381387 will take over the world.
(e.g. growing up in the 1970s I could look to Einstein for inspiration that intelligence could understand the Universe. Somebody today might as well look forward to being a comic book writer like Stan Lee.)
GPT-3 and its cousins do things that no previous language model could do; it is qualitatively different from ELIZA in its capabilities. As for your argument about random projections in the evaluation of GloVe, comparisons with random projections are now routine. See for example https://aclanthology.org/N19-1419/
It does something totally different. However that totally different still depends on people being desperate to see intelligence inside it. It’s like how people see a face in a cut stem or on Mars.
You're trying to prove some kind of point where you respond as ELIZA would have to show how "even back then we could pass for conversation". The truth is that GPT-3 is actually, totally qualitatively different and if you played with it enough you'd realize.
The difference is quantitative, rather than qualitative, as compared to primitive Markov models that have been used in the past. It's just a numerical model with a very large number of parameters that extends a text token sequence.
The parameter size is so large that it has in essence memorized its training data, so if the right answer was already present in the training data you'll get it, same if the answer is closely related to the training data in a way that lets the model predict it. If the wrong answer was present in the training data you may well get that.
There's a famous saying "quantity has a quality all its own".
If GPT-3 is known to have some sort of intelligence, I think it logically follows that one can differentiate that from the reflected intelligence of all the humans that produced all the data it ingested.
How do you see intelligence in GPT-3 and not see it in the data fed to it?
It's a legitimate practice in science to speculate. Having heard the Harvard guy explain more fully the Oumuamua thing, it's struck me as perfectly fine activity for some scientist to look into. His hypothesis is almost certainly going to be untrue, but it's fine to investigate a bit of a moonshot idea. You don't want half the field doing this, but you absolutely need different little pockets of speculative work going on in order to keep scientific inquiry open, dynamic, and diverse.
> there are four projections of the 50-dimensional vector space that would project out just as well from a random matrix because, well, when projecting 20 generic points in a 50-dimensional space down to 2 dimensions you can make the points fall exactly where you want.
Well, no. Linear projections follow a bunch of rules that enforce conservation of various linear structures. You can't manipulate things arbitrarily.
For example, if three points are collinear in the original space, they will be collinear in every projection. Perhaps more relevant to the GloVe examples, if a - b = c - d in the original space then the same equality holds in every projection. Since the projections are Lipschitz continuous, we can also say that if a - b is close to c - d in the original space then they are also close in every projection.
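That's easy to confirm numerically with random points and an arbitrary linear projection, for example:

    # Linear projections preserve vector offsets: if a - b == c - d in the
    # original 50-dim space, the same holds after projecting to 2D.
    import numpy as np

    rng = np.random.default_rng(0)
    a, b, c = rng.normal(size=(3, 50))   # three random 50-dimensional points
    d = c - (a - b)                      # constructed so that a - b == c - d

    P = rng.normal(size=(50, 2))         # an arbitrary linear projection to 2D
    pa, pb, pc, pd = a @ P, b @ P, c @ P, d @ P
    print(np.allclose(pa - pb, pc - pd)) # True, for any choice of P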
Anyway, the point that these figures illustrate is easily confirmed by downloading the embeddings yourself, so insinuating that the authors are getting away with something that they should be held to account for is silly.
Confused. If professor Loeb tries to at least open discourse to the idea that ET space junk might be flying around like our space junk in a desire to reduce the giggle factor around that hypothesis, what sort of "consequences" do you think he should face for that?
In addition to that subtle memory issue, it has no reference at all to the space/time world we people model mentally to think with. So, basically, there is no I in the GPT-3 AI, just A.
Many GPT-3 cultists are educated in computer science so they should know better.
GPT-3's "one pass" processing means that a fixed amount of resources are always used. Thus it can't sort a list of items unless the fixed time it uses is humongous. You might boil the oceans that way but you won't attain AGI.
There are numerous arguments along the line of Turing's halting problem that restrict what that kind of thing can do. As it uses a finite amount of time it can't do anything that could require an unbounded time to complete or that could potentially not terminate.
GPT-3 has no model for dealing with ambiguity or uncertainty. (Other than shooting in the dark.) Practically this requires some ability to backtrack either automatically or as a result of user feedback. The current obscurantism is that you need to have 20 PhD students work for 2 years to write a paper that makes the model "explainable" in some narrow domain. With this insight you can spend another $30 million training a new model that might get the answer right.
A practical system needs to be told "you did it wrong" and why, and then be able to correct itself on the next pass if possible, otherwise within a few passes. Of course, a system like that would be a real piece of engineering that people would become familiar with, not an outlet for their religious feelings that is kept on a pedestal.
The big issue is that it literally knows nothing - there is no reference to a model of the real world such as humans use when thinking about the real world. It is a very advanced pattern matching parrot, and in using words like a parrot, knows nothing about what those words mean.
Exactly, with "language in language out" it can pass as a neurotypical (passing as a neurotypical doesn't mean you get the right answer, it means if you get a wrong answer it is a neurotypical-passing wrong answer.)
Actual "understanding" means mapping language to something such as an action (I tell you to get me the plush bear and you get me the plush bear,) precise computer code, etc.
I'm inclined to agree, but positing that "the meaning of a word is its use in a language" is a perfectly respectable philosophical position. In this sense, GPT3 empirically bolsters Wittgenstein.
Given that Chomsky's paradigm (and all the others) have failed so completely at automated language understanding, I'm much more inclined to believe that the "language instinct" is a peripheral for an animal, and if you want language understanding you need the animal or a simulation of the animal.
outlines a promising approach to mapping the semantics encoded in these models. Understanding their limits could make your rejection of Wittgenstein all the more precise.
>There are numerous arguments along the line of Turing's halting problem that restrict what that kind of thing can do. As it uses a finite amount of time it can't do anything that could require an unbounded time to complete or that could potentially not terminate.
I have used a similar argument to show that the simulation hypothesis is wrong. If any algorithm used to simulate the world takes longer than O(n) time, then the most efficient possible computer for the job is the universe itself, which computes everything in O(n) time, where n is time. In other words, you never get "lag" in reality no matter how complex the scene you're looking at is. Worse than that, some simulation algorithms have exponential time complexity!
That doesn't prove or disprove anything. What we experience as time would be part of the simulation, were such a hypothesis true. As such, the way in which we experience it is fully independent from whatever costs it might have to compute.
So you're saying that an exponential-time algorithm with n equal to every atom in the universe will complete before the heat death of the other universe that the simulation is taking place in? Sorry, not plausible.
Our laws of physics are space partitioned so the algorithm for simulating it isn't exponential.
If the containing universe has, say, 21 dimensions and otherwise has computers similar to ours today, then you should be able to simulate our universe in a datacenter just fine, because computation ability grows exponentially with the number of dimensions. With 3 dimensions you have 2 dimensions of computation surface; with 21 dimensions you have 20 dimensions of computation surface, i.e. our current computation to the power of 10. GPT-3 used more than a petaflop of real-time compute during training, so 10^15 flops. The same hardware in our fictive universe would give us 10^150 flops. We estimate the atoms in the universe at about 10^80, so that computer would have 10^70 flops of compute per atom, which should be enough even if entanglement gets a bit messy. It would have around that much memory per atom as well, so it could compute lots of small boxes and sum over all of them to emulate particle waves. We wouldn't be able to detect computational anomalies at that small a scale, so we can't say there isn't such a computer emulating us.
Heat death would be a part of our simulation, not necessarily of the host. But since all of these simulation theories are basically religious beliefs, it is not very interesting to reason about them.
GPT is a joke, but Ashby's various "Laws" and "Theorems" are just theories. As in, think "Jung's theory of archetypes". Cyberneticists freely cribbed hard-sciences terminology for their psycho-speculation, knowing their humanities peers couldn't call them to account on the technical meanings. "Theorem" and "law" just sound engineer-y.
This is very specific to GPT-3 and not generally true, though. And GPT-3 is not an agent per se but rather a passive model (it receives input and produces output, and does not continuously interact with its environment). So it makes sense in this context, and just goes to show GPT-3 needs to be understood for what it is.
The current leading purchase-able extremely-over-hyped-by-non-technicals language model has no memory, yes.
You see the same thing in all popular reporting about science and tech. Endless battery breakthroughs that will quadruple or 10x capacity become a couple percent improvement in practice. New gravity models mean we might have practical warp drives in 50 years. Fusion that's perpetually 20 years away. Flying cars and personal jetpacks. Moon bases, when we haven't been on the moon since the 70s.
AI reporting and hype is no different. Maybe slightly worse because it's touching on "intelligence", which we still have no clear definition of.
Planaria can move towards and away from things and even learn.
Bees work collectively to harvest nectar from flowers and build hives.
Mammals have a "theory of mind" and are very good at reasoning about what other beings think about what other beings think. For that matter birds are pretty smart in terms of ability to navigate 1000 miles and find the same nest.
People make tools, use language, play chess, bullshit each other and make cults around rationalism and GPT-3.
Birds don't need 150,000 litres of jet fuel to fly across the ocean. Given that the development of airplanes was made by studying birds I'm not sure I see your point. The 1889 book "Birdflight as the Basis of Aviation" is one example.
This isn't actually clear; with things like this we are on a path but it may not lead anywhere that fundamental (at least when we are talking "AI", especially general AI).
Is that true? How is it able to have the conversation shown in the article about python programming, if it can’t remember that the premise of the later questions it is being asked is that it said it has Python programming experience?
It is fed each answer it generates as part of the next input. That's why it can't keep that up for too long: the size of its input grows, well, kinda-sorta exponentially quickly until it hits the limit of 2048 tokens.
Which is one reason why only diminishing returns can come from increasing computational resources, btw. For inference, anyway.
You can see examples of that kind of back-to-back querying and examples of code to do it in one of the links of the article above:
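The general shape of that back-to-back loop is roughly the following (a sketch against the legacy Completions API; the prompt framing and truncation strategy here are illustrative, not the article's code):

    # Each answer is appended to the prompt for the next turn, and the prompt is
    # truncated so it stays under the model's 2048-token context window.
    import openai

    MAX_PROMPT_CHARS = 6000   # rough character stand-in for the 2048-token limit

    def chat_turn(history, user_line):
        history += f"Human: {user_line}\nAI:"
        if len(history) > MAX_PROMPT_CHARS:
            history = history[-MAX_PROMPT_CHARS:]   # keep only the most recent context
        resp = openai.Completion.create(engine="davinci", prompt=history,
                                        max_tokens=150, temperature=0.7,
                                        stop=["Human:"])
        answer = resp["choices"][0]["text"].strip()
        return history + " " + answer + "\n", answer

    history = "The following is a conversation with an AI assistant.\n"
    for line in ["Do you have Python experience?", "What have you built with it?"]:
        history, answer = chat_turn(history, line)
        print("AI:", answer)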
> What causes this? I'm curious to know what triggers this behavior.
Here's an example of GPT-2 posting on Reddit, getting stuck on "below minimum wage" or equivalent: https://reddit.com/r/SubSimulatorGPT2/comments/engt9v/my_for...
(edit) another example from the GPT-2 subreddit: https://reddit.com/r/SubSimulatorGPT2/comments/en1sy0/im_goi...
With GPT-3, I saw GitHub Copilot generate the same line or block of code over and over a couple of times.