i have a different take. i'm very glad flat earthers exist. in general, i would hope that the share of people who believe an idea is proportional to the probability of its truth, so even the wildest ideas should have some modicum of support. consider a world without this: i imagine it would necessarily have to be thought-policed. i believe this is how we should frame this discussion.
what i think the issue is, is that we have a broadcasting machine (social media, news, etc.) that runs on sensationalism, so you are always hearing about fringe ideas and given no signal as to the size of the population that supports them.
There's zero understanding in any of this; it's still essentially just superficial text parsing. Show me progress on Winograd schemas and I'd be impressed. It hasn't got anything to do with AGI; this is an application of ML to very traditional NLP problems.
I know that it isn't. That's part of the problem. There is no attempt to generate some sort of structure that can be interpreted semantically and reasoned about by the model. The model just operates on the input superficially and statistically. That's why there has been virtually no progress on trivial tasks such as answering:
"I took the water bottle out of the backpack so that it would be [lighter/handy]"
What is lighter and what is handy? No amount of stochastic language manipulation gets you the answer: you need to understand some rudimentary physics to answer the question, and as a precondition, you need a grammar or an ontology.
In the example above it guesses wrongly, but again this is not surprising, because it can't possibly get the right answer (other than by chance). The solution here cannot be found by correlating syntax; you can only answer the question if you understand the meaning of the sentence. That's what these schemas are constructed for.
The problem for me was how to formulate the sentence so that the natural next word would reveal what the network had modelled.
edit: Retracted a test where it seemed to know which to select, because further tries revealed it was random.
edit: I tried a few more times, and it does seem to be somewhat random, but the way it continues the sentence suggests it has some form of operational model. It's just hard to prompt it in a way that "forces" it to reveal which of the two it's talking about. Also, GPT-2's coherence range seems too short for this. I would love to try it with GPT-3.
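One way to force the choice (a rough sketch using the Hugging Face transformers library; the exact prompt wording and the helper function are my own) is to skip sampling entirely and instead score the two candidate referents by the log-likelihood GPT-2 assigns to each continuation:

    # Sketch: compare GPT-2's log-likelihood for the two candidate referents
    # instead of sampling a continuation. Prompt wording is illustrative.
    import torch
    from transformers import GPT2LMHeadModel, GPT2TokenizerFast

    tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
    model = GPT2LMHeadModel.from_pretrained("gpt2")
    model.eval()

    def total_logprob(text):
        """Sum of token log-probabilities GPT-2 assigns to the text."""
        ids = tokenizer(text, return_tensors="pt").input_ids
        with torch.no_grad():
            out = model(ids, labels=ids)
        # out.loss is the mean negative log-likelihood over the predicted tokens.
        return -out.loss.item() * (ids.shape[1] - 1)

    prompt = ("I took the water bottle out of the backpack so that it would be "
              "lighter. The thing that got lighter is the")
    for candidate in [" backpack.", " water bottle."]:
        print(candidate, total_logprob(prompt + candidate))

Comparing totals across candidates of different token lengths is crude (normalizing per token is the usual tweak), but it at least forces the model to commit to one referent instead of rambling past the question.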
FWIW, I fed this into AIDungeon (running on OpenAI) and got this back: “The bottle is definitely lighter than the pack because you can throw it much further away than you can carry the pack.
You continue on into the night and come to an intersection.”
I'm skeptical. Amazing progress has been made in the last 5-10 years, but it still feels like we need more paradigm shifts in the ML/AI field. It feels like we're approaching the upper limits of what stuffing mountains of data into a model can do.
But with the speed of the field, maybe we can figure it out in three years. It just seems like we're still missing some key components. Primarily, reasoning and learning causality.
Zero-shot and few-shot learning in GPT-3, and the lack of significant diminishing returns in scaling text models. Zero-shot learning is equivalent to saying "I'm just going to ask the model to do something it was not trained to do."
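For anyone unfamiliar with the terms, the difference is entirely in the prompt; the weights are frozen in both cases. An illustration in the style of the GPT-3 paper's translation demos (the task and wording below are just for illustration):

    # Illustrative prompt formats only; no API calls, no gradient updates.
    zero_shot = (
        "Translate English to French:\n"
        "cheese =>"
    )

    few_shot = (
        "Translate English to French:\n"
        "sea otter => loutre de mer\n"
        "peppermint => menthe poivrée\n"
        "cheese =>"
    )
    # "Learning" happens purely by conditioning on the prompt text.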
For those who are wondering about the reasoning behind this being the path to full AGI, I recommend this Gwern post, which goes into detail: https://www.gwern.net/newsletter/2020/05
From what I understand, it's not just that GPT-3 has impressive performance, but more what it signifies: the fact that massive scaling didn't produce diminishing returns, and if this pattern persists, it can get them to the finish line.
what is the difference between zero-shot learning in text and AGI? not saying there isn't one, but can you state what it is? you can express any intent in text (unlike other media), so solving zero-shot in text is equivalent to the model responding to all intents.
many people have different definitions for AGI though. for me it clicked when i realized that text has this universality property of capturing any intent.
Zero-shot learning is a way of essentially building classifiers. There's no reasoning, there's no planning, there's no commonsense knowledge (not in the comprehensive, deep way that would lead us to call it that), and there's no integration of these skills to pursue common goals. You can't take GPT and say, "OK, turn that into a robot that can clean my house, take care of my kids, cook dinner, and then be a great dinner guest companion."
If you really probe GPT, you'll see that anything beyond an initial sentence or two starts to show how superficial its understanding and intelligence really are; it's basically a really amazing version of Searle's Chinese Room argument.
I think this is generally a good answer, but keep in mind I said AGI "in text". My forecast is that within 3 years you will be able to give arbitrary text commands and get textual output for the equivalents of "clean my house, take care of my kids, ..."-type problems.
I also would contend that there is reasoning happening and that zero-shot demonstrates this. Specifically, reasoning about the intent of the prompt. The fact that you get this simply by building a general-purpose text model is a surprise to me.
Something I haven't seen yet is a model simulate the mind of the questioner, the way humans do, over time (minutes, days, years).
In 3 years, I'll ping you :) Already made a calendar reminder
Pattern recognition and matching isn't the same thing as reasoning. Zero-shot demonstrates reasoning about as much as solving the quadratic equation for a new set of coefficients does; it's simply the ability to create new decision boundaries leveraging the same classifying power and methodology. True AGI isn't bound to a medium; no one would say Helen Keller wasn't intelligent, for example.
I think pattern matching can be interpreted as a form of reasoning, but it is distinct from logical reasoning, where you draw implications from assumptions. GPT seems really bad at this kind of thing: it often outputs text with inconsistencies, and in the GPT-3 paper it performed poorly on tasks like Recognizing Textual Entailment, which mainly involve this kind of reasoning.
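For context, Recognizing Textual Entailment hands the model a premise and a hypothesis and asks whether the first supports the second. A made-up example pair, just to show the shape of the task:

    # Made-up RTE-style example; the task is binary classification of whether
    # the premise entails the hypothesis.
    example = {
        "premise": "I took the water bottle out of the backpack an hour ago.",
        "hypothesis": "The backpack still contains the water bottle.",
        "label": "not_entailment",  # the premise does not support the hypothesis
    }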
I think these are good examples, but to me "linear algebra thinking" lies in its generality. For example, the derivative is a linear operator, so how do you write it down as a matrix? Google's PageRank is the solution of a matrix equation; what does that matrix represent? Etc.
> For example, the derivative is a linear operator, so how do you write it down as a matrix?
Consider polynomials in X of degree up to, but not including, N. The powers 1, X, ..., X^(N-1) form a basis, so the coefficients of a polynomial can be put in a column vector. If D is the derivative operator, D X^k = k X^(k-1), so the derivative can be expressed as a sparse matrix with entries D_(k,k+1) = k. Visually, it's a matrix with the integers 1, 2, ..., N-1 on the superdiagonal.
You can also see that this is a nilpotent matrix for finite N, since repeated multiplication sends the entries further up into the upper right corner.
You can extend this to the infinite case for formal power series in X, too, where you don't worry about convergence.
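To make that concrete, a small numpy sketch (N = 4, so polynomials of degree at most 3; the sample polynomial is arbitrary):

    # The derivative on polynomials of degree < N as an N x N matrix
    # in the basis {1, X, ..., X^(N-1)}.
    import numpy as np

    N = 4
    D = np.zeros((N, N))
    for k in range(1, N):
        D[k - 1, k] = k              # D X^k = k X^(k-1): entry k on the superdiagonal

    # Coefficients of p(X) = 3 + 2X + 5X^2 + 4X^3, constant term first.
    p = np.array([3.0, 2.0, 5.0, 4.0])
    print(D @ p)                     # [ 2. 10. 12.  0.], i.e. 2 + 10X + 12X^2

    # Nilpotency: differentiating N times kills any polynomial of degree < N.
    print(np.linalg.matrix_power(D, N))   # zero matrix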
> Google's PageRank is a solution of a matrix equation, what does that matrix represent?
Isn't it just the adjacency matrix of a big graph?
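As far as I understand it, it's the link structure, but column-normalized into a stochastic matrix and blended with a damping term, and the PageRank vector is its stationary eigenvector. A rough power-iteration sketch on a made-up 4-page link graph (damping factor 0.85 assumed):

    # Rough PageRank sketch; the link graph and damping factor are illustrative.
    import numpy as np

    # A[i, j] = 1 if page j links to page i (made-up graph, no dangling pages).
    A = np.array([
        [0, 1, 1, 0],
        [1, 0, 0, 1],
        [1, 1, 0, 1],
        [0, 0, 1, 0],
    ], dtype=float)

    n = A.shape[0]
    S = A / A.sum(axis=0)                       # column-stochastic link matrix
    d = 0.85                                    # damping factor
    G = d * S + (1 - d) / n * np.ones((n, n))   # the "Google matrix"

    # Power iteration: the PageRank vector r satisfies G r = r.
    r = np.ones(n) / n
    for _ in range(100):
        r = G @ r
    print(r)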
Anyway, I agree with you. Matrices and linear algebra are a really good inspiration for higher-level concepts like vector spaces and Hilbert spaces and so on. That's where the real power lies. But even in such general domains, matrices are often used to do concrete computations, because we have a lot of tools for matrices.
[off topic] FYI, i found this comment by subscribing to the RSS feed of your HN comments on Fraidycat (by linking to https://edavis.github.io/hnrss/ for your username)
You must start with labeled data. It is easier to label pictures of parked cars than it is to label pictures of good/bad driving. For labeled video, the dimensionality is out of reach of ML for now, and would add lag to your system.