
The scary part is that you can give it instructions in plain language, and it will follow those instructions, in a semi-intelligent manner, without necessarily having seen those specific instructions before.

That a language model is able to do that is surprising, and it puts it much closer to AGI - not sentient or sapient, just general - than one might think.

Sure, it's "just generating text". But if it can sensibly and correctly generate arbitrary text, it's an AGI - it can solve any problem presented to it, as long as you present it in text form and accept its output in text form.

The variety and complexity of tasks it can solve is what's surprising.

"Here is the VHDL description of a CPU. Optimize it to make it faster." is text in, text out. Almost certainly not something it can solve now, but it may already be able to produce valid output for toy-style versions of the problem.

GPT-4 apparently can take images. It probably still won't be able to usefully respond to "you are seeing an image from the camera of the robot you are mounted on, you can say 'left' to rotate 5 degrees to the left, 'right' to rotate 5 degrees to the right, or 'forward' to drive 5 cm forward. Respond with the sequence of words that will navigate the robot through the maze you see."

But it no longer seems obvious that the same approach, just with more training data and compute thrown at it, won't be able to solve this.

And from there, it's not far to add "fire machine gun" to the command list and replace "navigate through the maze you see" with "dominate the battlefield you see".
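To make the text-in/text-out framing concrete, the control loop around the model would be trivial. Roughly something like this, where all three hooks are imaginary stubs I made up (not a real robot or model API):

    # Sketch of the text-only control loop described above. All three
    # hooks are imaginary stubs, not a real robot or model API.

    VALID_COMMANDS = {"left", "right", "forward"}

    def capture_image():
        raise NotImplementedError("grab a frame from the robot's camera")

    def ask_model(image, prompt):
        raise NotImplementedError("send image + prompt to a vision-capable LLM")

    def move_robot(command):
        raise NotImplementedError("rotate 5 degrees or drive 5 cm forward")

    def navigate(max_steps=100):
        for _ in range(max_steps):
            reply = ask_model(
                image=capture_image(),
                prompt="You see the maze from the robot's camera. "
                       "Reply with exactly one word: left, right, or forward.",
            )
            command = reply.strip().lower()
            if command in VALID_COMMANDS:
                move_robot(command)  # the model's text output becomes an action

The hard part is entirely inside ask_model; everything around it is plumbing.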




Models that can do this, or that are used in robotics, have existed for YEARS in the military and presumably in industry. I'm not really buying that GPT-4 represents a revolutionary leap forward here, and it seems really backwards to use an autoregressive language model for this use case.

Does any reputable academic or expert in LLMs actually support the hype behind GPT/autoregressive models as much as the HN crowd seems to?

Percy Liang and particularly Yann LeCun are pretty meh about them, despite being thought leaders in this space and running leading NLP groups.

I'm not sure where along the way we started confusing next-token prediction, which by definition produces output that must SOUND really smart and coherent because it's optimizing for plausible-sounding text, with actual intelligence. To our knowledge (GPT-4 isn't really open) there is zero grounding happening with the outputs; it certainly wasn't part of ChatGPT.

Someone tweeted a thread about GPT-4 modifying a molecule in an anti-malarial, and it didn't even get the original base molecule or the substitution correct, something that can be trivially done without an LLM by querying open biochemical databases...
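For what it's worth, that lookup really is a couple of lines against an open database. A rough sketch using PubChem's PUG REST interface (endpoint and property name are from memory, so treat them as an assumption and double-check):

    # Rough sketch: look up the actual base molecule in an open database
    # instead of trusting the LLM's recollection. Endpoint and property
    # name are PubChem PUG REST as I remember it; verify before relying on it.
    import requests

    def lookup_smiles(compound_name):
        url = (
            "https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/name/"
            f"{compound_name}/property/CanonicalSMILES/TXT"
        )
        resp = requests.get(url, timeout=10)
        resp.raise_for_status()
        return resp.text.strip()

    print(lookup_smiles("chloroquine"))  # a common anti-malarial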

> Sure, it's "just generating text". But if it can sensibly and correctly generate arbitrary text, it's an AGI - it can solve any problem presented to it, as long as you present it in text form and accept its output in text form

No, it can't. It APPEARS to solve problems like these, where the actual task is either already solved or trivial. If you read OpenAI's disclosures, and other papers by the FAIR group, they all disclose that the answers are routinely incorrect and just SOUND right. 'Write a snake game in python' is like lecture 2 of a 'python for non-engineers' course.

> but it seems no longer obvious that the same approach, just with more training data and compute thrown at it, won't be able to solve this.

Is it? Says who? Once again there are many major experts criticizing the end-game of RLHF. The problem space is much larger than what you can correct with a reward function.

If anything, recent work by FAIR and Stanford NLP suggests that more compute is not the end game.

OpenAI themselves acknowledge that we still haven't figured out how to reliably ground a language model in truth and avoid spewing BS the moment you're not showing it some trivial thing.

At best, the current approaches seem like glorified STS/IR models with the ability to output reasonable-sounding text (again, by definition, given that they work by next-token prediction).


It is possible that it will hit a dead end. But I believe nobody expected the current architecture to be anywhere near this good.

I can literally tell it, in plain text with a single example, to become a home assistant to control the lights and output JSON, then prompt it with natural language like "make it look like a submarine on battle stations" and it will output a setting where all the lights are red. In the exact format I asked it to use.

That's a bit more than "appearing to solve a problem". That, right there, is a directly usable application. I could literally plug a speech to text -> ChatGPT -> something that filters invalid JSON -> light controls together, teach it a few more commands, and have a much better home assistant than anything I've seen commercially available, limited mainly by the speech recognizer.
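Roughly like this (send_to_llm and set_lights are stubs I made up; the JSON filtering in the middle is just the standard library):

    # Sketch of the speech -> ChatGPT -> JSON filter -> lights pipeline.
    # send_to_llm() and set_lights() are made-up stubs, not a real API;
    # only the JSON filtering step is concrete.
    import json

    SYSTEM_PROMPT = (
        "You control the lights in my home. For every request, reply ONLY "
        'with JSON like {"living_room": "#ff0000", "kitchen": "#ff0000"}.'
    )

    def send_to_llm(system_prompt, user_text):
        raise NotImplementedError("call ChatGPT or a similar model here")

    def set_lights(settings):
        raise NotImplementedError("talk to the actual light controller here")

    def handle_command(transcribed_speech):
        reply = send_to_llm(SYSTEM_PROMPT, transcribed_speech)
        try:
            settings = json.loads(reply)   # drop anything that isn't valid JSON
        except json.JSONDecodeError:
            return                         # model went off-script; ignore it
        if isinstance(settings, dict):
            set_lights(settings)           # e.g. everything red for "battle stations"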

The incredible thing is that it doesn't just do that with ONE problem, it can do that with MOST simple problems that don't require really extensive background knowledge, and it is clearly able to encode knowledge (battle stations -> red), so it seems plausible to me that more data will let it handle more knowledge.


I'm not sure that's as impressive as it seems. It's good at predicting sequences it has seen before, so STS on steroids.

It's good at producing that JSON output because it's seen how to do it many times, and it's basically acting as very good semantic text similarity with generative output.

If you ask it to act as something niche enough that it hasn't seen many examples of it, it fails horribly.

We haven't exactly figured out what it's encoding. I don't think this empirical example is proof of that; whether the model actually understands what 'red' is the way a human does is yet to be determined.


I asked ChatGPT to generate hex colours in a pleasing palette. The hex codes matched the colours, and the palette was alright, but one colour was off.

Then I told it to replace that colour but keep the others the same, and it picked new, slightly different hex colours.


And this demonstrates what exactly?



