This is a great set of examples for showing off how current chatbot AIs have impressive capabilities but are also prone to getting important details wrong.
For instance:
1. The description it gives of how to generate a Shepard tone is wrong; to make a Shepard tone, you need to modulate _frequency_, not (or not only) amplitude (a correct construction is sketched after this list).
2. The Shepard tone it generates is also wrong, but in a different way. For instance, there's no bass at the end matching the bass at the beginning, so it can't be looped to create the illusion of infinite ascending motion.
3. The "Game of Life QR code" isn't actually playing the Game of Life in a way that results in a QR code. It looks like it's starting with a QR code and playing the game of life, then time-reversing the result; so you see the Game of Life running "backward" until you get to the QR code as an initial state. I say "seems like" because I can't be confident it hasn't made any mistakes. This _may_ be what the author intended by "working backwards"? But I took that to mean that it should "work backwards" by first finding a state S that stepped to the QR code, then finding a state S' which stepped to S, etc; so that you'd have a run of the game of life ending in a QR code. This would involve actual search, and there are patterns which simply cannot be achieved this way (have no predecessors).
4. The planet orbit simulation seems to have all planets in circular orbits, rather than elliptical. For Earth this is probably unnoticeable, but Mars' orbit varies from 1.38 AU to 1.67 AU - quite noticeably elliptic - but appears circular in the "simulation".
etcetera, etcetera. There are also plenty of "obvious" glitches in the graphics simulations, but those concern me less - precisely because they're obvious.
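For reference, this is roughly what a correct, loopable Shepard tone generator looks like (a sketch of the standard construction, my own code rather than anything from the post; it assumes numpy and scipy are available):

    # Octave-spaced partials sweep up one octave per loop while a bell-shaped
    # envelope over log-frequency keeps the extremes silent, so the loop is seamless.
    import numpy as np
    from scipy.io import wavfile

    sr = 44100            # sample rate
    loop_sec = 10.0       # length of one seamless loop
    n_partials = 8        # octave-spaced components
    f_min = 20.0          # bottom of the covered frequency range (Hz)

    t = np.arange(int(sr * loop_sec)) / sr
    u = t / loop_sec      # 0..1 progress through the loop

    signal = np.zeros_like(t)
    for k in range(n_partials):
        pos = (k + u) / n_partials                 # log-frequency position, wraps each loop
        freq = f_min * 2 ** (pos * n_partials)     # exponential sweep: one octave per loop
        amp = np.sin(np.pi * pos) ** 2             # silent at both ends of the range
        phase = 2 * np.pi * np.cumsum(freq) / sr   # integrate frequency to get phase
        signal += amp * np.sin(phase)

    signal /= np.max(np.abs(signal))
    wavfile.write("shepard.wav", sr, (signal * 32767).astype(np.int16))

Because each partial ends the loop exactly where the next one started (at the same amplitude), the file can be looped to give the illusion of infinite ascent.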
It does plug a big hole in ChatGPT. ChatGPT's math skills are mostly nonexistent (e.g. it cannot factorize a medium-sized number), but it can easily generate Python code to do almost all such things. There are probably many more things ChatGPT cannot do itself but can generate Python code to do. This brings it more in line with what a normal person (someone not familiar with LLMs) expects. If it can do a much broader set of things, the range of queries it can serve will grow enormously.
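For example, the kind of helper it can write on request, even though it can't do the arithmetic "in its head", might look like this (a hypothetical sketch, not taken from the article):

    # Trial-division factorization: trivially generated code that sidesteps the
    # model's own weak arithmetic.
    def factorize(n: int) -> list[int]:
        """Return the prime factors of n, with multiplicity."""
        factors = []
        d = 2
        while d * d <= n:
            while n % d == 0:
                factors.append(d)
                n //= d
            d += 1
        if n > 1:
            factors.append(n)
        return factors

    print(factorize(987654321))   # a "medium size" number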
GPT with tools, a scratchpad for memory and an iterative loop already seems like it could create something that's quite "intelligent" at solving various tasks that humans usually do with computers. I hope it's all being put together.
This is why "prompt engineering" is going to be a legit job, still requiring the knowledge and sensibilities of an engineer/programmer. A jack-of-all-trades problem-solving gig. Maybe consulting businesses will even build these glued-together tools for bigger companies.
Sam Altman states that "prompt engineering", in his mind, is a result of temporary limitations of the LLMs.
That he doesn't expect people to be doing what we call "prompt engineering" in 5 years. You'll simply ask for what you want and if there is ambiguity it can be sorted out interactively or it will just do the 'right' or obvious thing that we would expect.
I disagree with him. I've spent several years answering people's random questions (as a librarian) in multiple settings, and my first W2 job back in 2004 was in technology education. I also have a Linguistics degree and can program, so I understand what LLMs can and can't do fairly well, even if I'm nowhere near an expert. He's vastly overestimating the average person's awareness of their own thought process, or even their ability to accurately answer iterative questions in order to sort things out interactively. The weakness with such a claim isn't the AI, it's the humans' ability to provide the AI with accurate and useful information.
Not to mention that there's still a large portion of the general public that just freezes up when it comes to anything technological. I've provided support to a number of different communities and even educated people can't do this type of thinking easily.
It's possible that we could eventually get to that point, but it would require some massive educational efforts and a culture shift, which would also require substantial investment without a clear road to profit, and I believe America lacks the political will or even ability to implement anything like this. Since we can't act in unison, it will be down to companies, and for them it makes more sense to solve the problem by hiring prompt engineers with the savings they get from cutting elsewhere instead of training all their existing teams on how to learn new ways of thinking.
tl;dr: Technologically he's probably right, but he's massively overestimating the public's ability to use interactive tools to get information.
> obvious thing that we would expect
Any time you rely on something being 'obvious', the public is going to somehow fuck it up.
This is similar to the situation with software engineers interacting with other parts of the business or customers. IME the biggest skill of software engineers is not writing code, but understanding and translating business requirements into code, while accounting for _existing_ (prior) requirements that were often forgotten. The code as such is usually straightforward, but the requirements on either end are where the work is.
This is also what programmers typically complain about when they say their PM isn't "technical" enough. What they usually mean is they don't understand the details of the business / domain well enough to meaningfully contribute to the organization / break up of the work.
To that end, I'd expect these language models to eventually improve at that task too, perhaps responding with choices or diagrams to help users learn how to ask for what they actually want. That's in a sense what a prompt engineer is doing in this context. In that sense I think I agree with your assessment, although I also think there's a lot of room for these AIs to handle that task, at least partially, as they improve.
I agree. I'm not a SWE because I don't want to code for 40+ hours a week, but I've been picking up projects here and there for small businesses for a long time (I started helping my dad do the same when I was 5-6) and so I manage projects end to end and so much of it is the back and forth with the clients. Knowing the right questions to ask and how people think about things are some of my most important skills, and in that respect my education and library experience has been very helpful. I'm far from the best programmer - hell, I'd consider myself below average but have enough experience to still be passable, but I'm easy to work with and that counts for a lot.
The issue with language models in particular is that to use them effectively you have to fundamentally understand that you're translating human-brain-language into computer-language and understand that those are in fact two different things even if their delivery happens identically. For most coders, this jump happens semi-instinctively because they know how to test something to verify 'ah yes, this follows computer logic'. The general public thinks computers are wizardry. The other group of people who will be able to understand this are those who have thought deeply about language for whatever reason and therefore know that human-brain-language has qualities that extend beyond 'We use words'. This is also very hard to impart to novices; I have a Linguistics degree and studied a cumulative 10 years of 6 languages so I went through that knowledge transfer as well. They're both hard and one or the other is necessary to interface with language models effectively and will be for likely the next 10ish years just because providing a good technical solution will require the technical people to understand much more about the social side of communication and that will also take time.
I thought about this for a while and I think I would boil it down to being used to dealing with language as data instead of just as a communication medium. Experience with corpora, sentiment analysis, and the various parts of linguistics does give you a solid grounding in why the frequency distribution in the training set(s) occurs the way it does.
An example of things I consider when interfacing with an LLM that derive from my linguistic knowledge:
* That the language it's trained on is thoroughly modern/confined to the speech of a particular place in time. Which means that I know any viewpoint I receive is not only modern but from a very specific time and place. The language that everybody uses is sort of like the water fish swim in in that most people don't think about it. Which means I know that if I ask it something about (to use an example that is a culture war issue) the history of racism, I know that the answer is being run through modern conceptions of racism and if I want historical views, I need to get those elsewhere.
* That which words are most commonly used relies on the social and economic status of the speaker as well as word properties like phonetics and phonology. This makes it much easier to pick and choose which vocabulary and sentence structures to use in order to 'sub-pick' the part of the training set you want answers to. Asking 'how to grow peppers' and 'what soil variables control how well a Capsicum annuum plant grows' are going to get you different answers.
* Related to this, the differences between spoken and written English on a large scale - one problem with the 'everybody can just use LLMs' idea is that the LLMs are trained on written English but the majority of people interfacing with them speak to them as though it were a verbal conversation. There are parts of verbal communication that don't exist in written communication, and knowing how to provide scaffolding to make a request make sense requires understanding the difference between the two.
* A basic knowledge of sociolinguistics is fantastically helpful for developing personae and/or picking out biases in the training data set. (Somewhat similar to how I can usually ID an American's media diet and political affiliation after a 5-10 minute casual conversation).
> Technologically he's probably right, but he's massively overestimating the public's ability to use interactive tools to get information.
I think you are underestimating the pace of progress in the capability of the LLMs. It has infinite patience to query you for missing facts. Because it is not about the public's ability to use an interactive tool. It is about the LLM's capability to be a good focus group moderator/librarian/etc.
This role reversal is imminent from what it seems. The LLMs will be using us, rather than us using them.
This is exactly the thing LLMs seem to be worst at, even as they improve.
LLMs are confidently wrong rather than inquisitive/have an interest in being right. They theoretically have infinite patience, but they really have no patience and accept input at face value.
I don't doubt that these can be improved, but I also don't doubt that some people will be better at interacting with LLMs than others, and that to get good at using an LLM will take onboarding time that organizations would prefer to centralize
This resonates with me. I've got a bit of a tendency to experience spells of obsessiveness, and if you get very pedantic and very wordy with GPT, it will espouse anything you tell it to, so long as it conceptually can. Understanding human language to the max will be a useful skill because, as with any agents, communication is key to successful understanding.
If it's a language model, yes. I think if it's in the initial prompt (first message) or second message to the AI doesn't make much difference for its role in the process.
Many, many people are working on this right now. Having had a shot at it, the details become very important very quickly and it feels to me that getting it right will require a ton of iteration at this stage. Such that it feels easier to wait for GPT-N+1 rather than work on improving its harness. Regardless, again there is a ton of effort being poured into this very specific thing across the world.
Every night it could go through its conversations and condense them down to a tiny compressed sample leveraging context from previous long term memory. We could call it sleeping.
Yes, and perhaps during this nightly compression and fine-tuning, new inferences are made on previous conversations by connecting them with longer-term context, and we can call that dreaming.
My first thought of this was like a scratchpad, so if you have a chatgpt session there's a place where chatgpt can add/remove text, and it is continuously available as input.
Right now, you can't ask chatgpt to "think of a person, and the user will guess who it is". It can decide to think of something, but has nowhere to put that information. It should be put into the scratchpad.
Scratchpad could potentially be used for solving problems step by step too, much like how we consciously think through problems step by step. If it gets too complicated for us, the mental scratchpad we use is augmented by a physical notepad or other system.
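A minimal sketch of what that harness could look like (call_llm here is a stand-in for whatever chat API you use, not a real library call):

    # Keep a persistent scratchpad the model can read and overwrite each turn,
    # but which the user never sees.
    def call_llm(system: str, user: str) -> str:
        raise NotImplementedError("plug in your chat-completion API here")

    scratchpad = ""

    def chat_turn(user_message: str) -> str:
        global scratchpad
        system = (
            "You have a private scratchpad. Current contents:\n"
            f"{scratchpad}\n"
            "Answer with two sections: SCRATCHPAD: (new contents, kept private) "
            "and REPLY: (what the user sees)."
        )
        raw = call_llm(system, user_message)
        pad, _, reply = raw.partition("REPLY:")            # naive split of the two sections
        scratchpad = pad.replace("SCRATCHPAD:", "").strip()
        return reply.strip()

    # e.g. chat_turn("Think of a person and I'll try to guess who it is.")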
You could also say that it enables an internal monologue.
We also use tools a lot. We're also not very good with mental arithmetic but we can do quite a lot with a simple tool like a piece of paper and a pencil.
Obviously LLM models cannot be compared one to one with our brains, but one thing is clear: despite being executed by a computer, they are not the computer that is executing them, and as such, operations that are trivial for the computer are not necessarily trivial for the language model.
I got to thinking about hiring, and things like looking at code history yesterday. There will be a lot of people who suddenly became prolific Github contributors after December of last year. I myself wrote 2 small-to-medium apps in a day using ChatGPT and added them to Github. I'm wondering if looking at Github history is now perhaps as useless as asking a potential writer for a writing sample.
Another point, the apps that I wrote with GPT-4 were filling gaps in my platform and things that would never have come off the backburner due to the lack of ROI. I wonder how we can accelerate contributions to OSS projects. Knocking out new features, refactorings and technical debt 10x faster. Of course it has to be done with care but -- an army of willing AI contributors...
The same way you would with a human developer. Plan it out, break it down into bite sized tasks, give the AI each task or parts thereof and plumb it all together yourself.
> I myself wrote 2 small-to-medium apps in a day using ChatGPT and added them to Github. I'm wondering if looking at Github history is now perhaps as useless as asking a potential writer for a writing sample.
If I'm your hiring manager, I won't be bothered by it, since I know you can use ChatGPT to solve problems, which is what the company needs. But I will still need to assess you in more detail, to ensure that you can develop / solve problems well without ChatGPT, in case it's somehow unusable or provides wrong answers.
GitHub does not color the squares green for the past though, does it? I thought GitHub colors the squares green on the day you push the commits to them, not the alleged author date.
It seems like we're pretty close to the point where LLMs will be able to create simple client apps. When anyone can make, say, an android note-taking app to their own specifications for free, it's not hard to imagine the impact on the paid note-taking apps in the store today. "Make me a game like Clash of Clans, but free, and with..." is probably only a few years further out.
One interesting question is: when people are using LLMs to generate software for their own use, what will they do for the server component of client-server use cases? Is "LLM, go register me an EC2 instance and configure it not to go over $20/mo, here's my credit card number" totally out of the question?
Honestly, I think we're way, way, way off what you're suggesting about giving it inputs and getting specifications. This blog post shows ChatGPT recreating things it has seen before, given very specific instructions. The hard part of defining requirements is being specific in them, and that's what programmers do. It's pretty common in these threads to see people talking about ChatGPT and friends being totally wrong without being aware of how wrong they are.
> ? Is "LLM, go register me an EC2 instance and configure it not to go over $20/mo, here's my credit card number" totally out of the question?
I suspect we are _way_ closer to that than to having it respond to "make me a game like clash of clans".
I don't think your intuition is ambitious enough. We have 12 year olds making roguelikes and platformers today, with templates and toolkits. Sure the initial LLM-made apps will suck, the initial everything sucks, but a couple years of progress on something this new will be considerable. We've barely started tuning LLMs for specific use cases!
> The hard part of defining requirements is being specific in them
True but I don't think you're accounting for iteration. You know the "idea guy" cliche? Today, if/when this "idea guy" hires a team of devs from India to make his million-dollar idea into reality, his inability to write good requirements will lead to a useless app, and the idea guy asking for changes (that are also poorly described) until he runs out of money, right?
Now imagine that his "dev team" works for free and can turn around each new set of requirements in minutes rather than months. How long will it take him to learn how to write good acceptance criteria? A day? A week? It's hard, but not that hard. And a million or two other "idea guys" are doing the same thing, and sharing notes with each other.
The fact that we have 12 year olds asset flipping from tutorials today shows the value of those asset flips. If chatGPT cannibalises that market, I will then be as scared of ChatGPT taking my work as I am of a 12 year old who can follow a unity tutorial right now.
> It's hard but not that hard.
It really is that hard. It's basically 90% of a staff+ engineer's job.
I don't doubt that AI and ML in particular will absolutely change our industry, but as an engineer I see chatGPT as a tool like copilot, a debugger, or a linter. All those tools make me an order of magnitude more productive, but they're useless without the human making the decisions
Writing requirements is not usually done by staff+ engineers, is it? An awful lot of working software, the majority I'd guess, is built from requirements written by a non-technical PM who wasn't even in the software industry three years ago. I wonder if you're too good a programmer to have your thumb on the pulse on the kind of software bad programmers make? Because that's where the real disruption will come from. The question is not "will some bad programmers lose their jobs", it's "what will happen when the kind of simple software that used to be made by bad programmers becomes effectively free."
The working requirements of "when I click X, do Y" come from POs, sure. But the remaining implied requirements (what happens if there's an error in your app? How do you identify sensitive data that needs to be handled differently from "basic" data?) are, in my experience (working with teams of varying degrees of experience), defined by engineering.
We're talking about being a couple of years away from ChatGPT being able to copy and paste from tutorials here, and with my understanding of AI (I've done a few ML projects in the last decade, but nothing that stuck), that's the "easy" part. The impressive part of things like Midjourney is the synthesis, and we're not really seeing that, or any signs that it's coming, IMO.
> what happens if there's an error in your app? How do you identify sensitive data that needs to be handled differently to "basic" data?
Nothing happens, and you don't. That's what I meant by "bad software". I'm reminded of a story Clay Shirky told about AT&T analysts from the mid-90s trying to understand the web hosting business[0]:
> The ATT guys had correctly understood that the income from $20-a-month customers wouldn’t pay for good web hosting. What they hadn’t understood, were in fact professionally incapable of understanding, was that the industry solution, circa 1996, was to offer hosting that wasn’t very good. This, for the ATT guys, wasn’t depressing so much as confusing... For a century, ATT’s culture had prized - insisted on - quality of service; they ran their own power grid to keep the dial-tone humming during blackouts. ATT, like most organizations, could not be good at the thing it was good at and good at the opposite thing at the same time. The web hosting business, because it followed the “Simplicity first, quality later” model, didn’t just present a new market, it required new cultural imperatives.
You’re right that it still needs a skilled person to ask the right prompts, and I don’t see that changing anytime soon
But if a few people asking the right prompts is all you need, what happens to the other 50-100+ people a game like Clash of Clans would normally employ?
1. You don't have to know about the existence of such repos
2. You don't have to find a good blog article that works for your specific purpose
3. You can ask for changes to the code like "translate this code to python" "I want to use Azure instead of AWS" "add a database so I can save player scores" etc. and it will both write the code and explain what it did.
4. You can ask for elaboration if you don't understand something, it can describe code or it can further explain written text.
5. It can explain why you're getting an error message after you change something and suggest a way to fix it.
6. If you have a feature you want to add but don't know how, you can just ask it and it will show you how to implement the feature and how to integrate it into the existing code.
There's a lot more, too much to list but every one of these is far more useful than just cloning a repo and following an article.
You can ask the bot to flesh out the code and implement new features (roughly) in seconds. For example, if you have a notes app, you could ask the bot to write code that integrates it with existing calendar apps, and it'd take less than a minute to get you close.
I can Google a CloudFormation stack for a lambda and get an answer [0] in about as long as it takes an LLM to generate one. That said, as a programmer I'm not paid to copy from Google or Stack Overflow, I'm paid to figure out if it's working or not. As an example, ask ChatGPT to generate a lambda template with 16GB of memory and see what it does.
Until we see chatgpt figuring out how to fix issues like that, I'm not worried.
This CloudFormation template creates an IAM role and a Lambda function with 16GB (16384MB) of memory. The Lambda function runtime is Python 3.9, but you can change this to another runtime if necessary. The code block contains a basic Python script for returning a simple message; you can replace this with your own script.
Yep! Except Lambda limits are 10GB [0], so this doesn't work.
If you ask ChatGPT to generate a template for a lambda with 16 vCPUs, it will generate a template that allows a max of 16 concurrent invocations, which again is not what you asked it for, and unless you _know_ what reserved concurrent executions are, you won't know that's what it's generated.
In here, it says that maximum memory is 10,240 MB, so that may not work:
> Memory and computing power
> Memory is the principal lever available to Lambda developers for controlling the performance of a function. You can configure the amount of memory allocated to a Lambda function, between 128 MB and 10,240 MB. The Lambda console defaults new functions to the smallest setting and many developers also choose 128 MB for their functions.
Its response:
ChatGPT: I apologize for the confusion earlier. You are correct that the maximum memory allocation for a Lambda function is 10,240 MB. Here is the revised CloudFormation YAML template with 10,240 MB memory allocation:
Replace <S3_BUCKET_NAME> and <S3_KEY> with the appropriate values for your S3 bucket and object key that contain your Lambda function code. You may also adjust Runtime, Handler, and Timeout as needed.
===
Then I said:
What can we do to be able to use 16gb memory?
Then it said:
ChatGPT: To use 16 GB memory in a serverless context, you will need to consider other options as AWS Lambda currently supports up to 10,240 MB. Two alternatives are AWS Fargate or running a container with Amazon ECS.
Here's an example of using AWS Fargate with a task definition for a container with 16 GB of memory:
2. Create a task definition with 16 GB of memory (you can adjust the values as needed). Save the JSON below to a file called fargate-task-definition.json:
Replace <YOUR_SUBNET_ID> and <YOUR_SECURITY_GROUP_ID> with the appropriate values for your VPC and security group.
This will give you a Fargate container running with 16 GB of memory, and you can adjust the resources allocated to the container as needed.
It's a nice answer if you know that the problem is that the lambda has a limit of 10GB. See the other example of asking it to specify vCPUs - that's just plain wrong, and unless you know how lambda works, it's not going to work for you.
I think this proves the point though, that it's closer to Copilot - it can generate a bunch of boilerplate, but you the programmer still need to understand what it's generating to deploy it. The only difference is that instead of opening vscode and typing LambdaFunction, I ask chatgpt in english, but I still need to specify. _that's_ the part we're not seeing movement on yet.
I would like to crank this prediction up a bit more. I think we are pretty close to the point where it doesn't make sense to have simple client apps at all anymore. Why shouldn't the LLM do it all?
Let's say a simple single purpose web app such as Calendly. Why should I pay for such a service if I can just ask the LLM to create the entire app for me?
In TFA, the author let the LLM create an ics file. Why shouldn't the LLM also spawn a REST endpoint and serve stuff from it matching my prompt?
I agree, people thinking about using LLMs to make apps that they can then sell are missing the forest for the trees.
The more interesting question is, once non-technical people are able to auto-generate their own apps, what backend will they use and how will they handle auth? The easiest default option today is "cloud services, and 'log in with facebook' oauth" but we can hope for better...
I was able to use GPT4 to write a small, self-contained JavaScript function and it worked okay. But writing out the function is painfully slow, particularly if you want to iterate. Which you’ll need to do because sometimes it introduces bugs. You can tell it to fix them and it can, but then you have to wait.
I've had luck with telling it to just rewrite a certain part of the code. You can copy and paste the relevant part and ask it to just rewrite that or you can direct it using natural language, you just have to be specific. If the chat is getting too long you can also copy just the important parts or the code so far into a new chat and start from there.
I've found that copying the result of one session to the next can work pretty well (like when you hit token* limits), especially if you have the bot include comments.
Isn't this misunderstanding the point of an AI assistant? The idea is to have AI do the same things that people can do. The benefit is they do it a lot faster and cost a lot less and can work 24/7.
How long until I can make my own movie like Star Wars, but with my story line and my characters? Serious question: does anyone have insights into this problem?
It's far from movie quality right now, but there are a lot of people working on this sort of thing. People are also exploring text-to-3D scene, model, and character generation; pair that with text-to-animation, which is also being worked on, and we've at least taken the first steps towards that goal.
It's beautiful but a lot of these examples are things that have been done into oblivion on Github. As soon as you pick a more obscure problem ChatGPT just doesn't give you something that works at all no matter how hard you try to pry it :/
Which sucks because I want it to help me write multiplayer videogame net-code!!! It's too dang hard!!
Certainly a human with understanding good enough to produce a working example of thousands of different GitHub problems in 30 seconds each would have the synthesized knowledge to figure out netcode for a complex game really quickly and easily: how to structure it well, and even how to test and debug it, all immediately.
From what I've seen, there's truth to this, but it's quite good at translation from one language to another. Not only that, but it does a great job understanding documentation and what code means. Those combined- there's a huge surface area of potential utility.
Maybe Copilot-X will bring sufficient debugger information into the model to navigate a large codebase and iteratively work on it. I can't wait to graduate from one-shot next token prediction in VS Code. I want iterative and self correcting stuff.
It seems like the article talks a lot about what can be done with a new model but doesn't say how a language model (ChatGPT) is integrated with code interpreter, how they exchange data and how the data is represented as tokens for a language model. Can someone please explain? I understand how language model works, how it consumes and predicts tokens, just don't understand how it can run code and process the output and how all of this fits into token number limit.
It's trained to emit special commands when it wants to do things. Think of it like being a wizard: speaking magical words of power causes things to happen. So it might say:
Sure, I can make a Shepard tone for you.
>>> COMMAND { "action": "run_python", "program": "import blah\n...." }
and the driver (the program running the inference loop, AI people what do you really call this?) recognizes that the AI "predicted" this escape sequence and then when the command is finished being "predicted" it runs it. Then the result is added to the prompt. The AI can then see the result and use it.
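In toy form, that driver might look something like this (my guess at the shape of the mechanism, not OpenAI's actual protocol):

    # Scan the model's output for the command marker, run the embedded Python,
    # and append the result so the model can see it on the next turn.
    import contextlib, io, json

    MARKER = ">>> COMMAND "

    def run_python(program: str) -> str:
        buf = io.StringIO()
        with contextlib.redirect_stdout(buf):
            exec(program, {})   # a real system would sandbox this, of course
        return buf.getvalue()

    def drive(model_output: str, conversation: list[str]) -> None:
        conversation.append(model_output)
        for line in model_output.splitlines():
            if line.startswith(MARKER):
                cmd = json.loads(line[len(MARKER):])
                if cmd.get("action") == "run_python":
                    result = run_python(cmd["program"])
                    conversation.append(f"RESULT: {result}")   # fed back as part of the prompt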
Re: token limit. A token can be a whole word and there can be thousands of them in the context window simultaneously. So it can look back quite a long way. GPT-4 is rumored to have a token limit in the tens of thousands, although the true number is apparently not public. So that's a lot of code and results.
That said, if you asked for a program that emitted a million line CSV file and then expected the AI to read it, it would indeed get very confused and go wrong because it'd lose track of what it was trying to do. But that's no different to asking a human to mentally digest a million line CSV.
As I understand you don't even need to train the model, you can just tell it in plain English how to use the plugin, ie how to format the data, and it will do that if it sees fit.
I guess some additional training doesn't hurt and could make it more deterministic and reliable. But it's impressive how you can already tell it to create simple JSON structures from naturally worded descriptions, so I'm convinced it would already work reasonably well without additional training.
In my experience it still sometimes makes up the schema, or outright outputs mansplaining-like plaintext instead of JSON, even if I give it the correct schema. That happened about once in every 15~20 attempts; 5% is still too high to be considered reliable :(
I've tuned temperature, added a logit_bias to heavily prefer `{` token, this helped with the plain English v.s. JSON issue, but didn't help with hallucination. I guess I really need API access to {role: "tool"}.
That uses up context window and from what I understand it isn't as reliable as fine tuning. My guess is it's not just stuff in the prompt, it's been fine-tuned (re-trained) on examples.
If I asked a human to digest a million line csv, I'd expect them to sample some of the first results and read through the headers, then maybe pull it into excel and do some summaries or make some graphs.
Yes, one time use Python, use it and throw it away. Similarly diffusion images are mostly throwaways and personal stuff, one time use art. Waiting for the one-time use UIs.
This is not unlike the magic of a Jupyter notebook except that you mostly just write the markdown and Chat does away with a lot of the tedium/boilerplate.
"Prompt engineering" ChatGPT English in this case is like a new higher order programming language. Prolog meets Ruby. A Haskell for mortals. No more counting parentheses or semicolons which turns away a lot of people from coding.
A more advanced compiler is not necessarily sentient just because no traditional lexers/parsers are involved. It is shelling out to already written libraries and tools, which it didn't create, and doesn't know itself to glue without your explicit direction.
I suppose the next intriguing step is for it to not just spew the boilerplate but pick on its own what are the best components to connect - hard to predict how far away that might be.
> I suppose the next intriguing step is for it to not just spew the boilerplate but pick on its own what are the best components to connect - hard to predict how far away that might be.
There are already working demos of ChatGPT writing incorrect code, running the code and seeing the incorrect output multiple times, and then iterating until the code worked. AI can write and rewrite infinitely, thousands of times a day, until it gets the desired result, learning and refining its own process as it goes, and being fine-tuned with human feedback.
This is especially true in something completely digital like programming where it's possible for the AI to run the code and use the output as its next prompt. At the moment it needs human feedback to know if the art it just generated is garbage, but the code it can test itself.
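The loop itself is simple; something like this sketch (generate_code stands in for a hypothetical call to the model, the run/retry plumbing is the point being illustrated):

    # Write code, run it, and feed any traceback back to the model until it passes.
    import subprocess, sys, tempfile

    def generate_code(task: str, last_error: str | None) -> str:
        raise NotImplementedError("ask the model for code, including last_error as feedback")

    def solve(task: str, max_attempts: int = 5) -> str | None:
        error = None
        for _ in range(max_attempts):
            code = generate_code(task, error)
            with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
                f.write(code)
                path = f.name
            proc = subprocess.run([sys.executable, path], capture_output=True, text=True)
            if proc.returncode == 0:
                return code        # ran cleanly; accept it
            error = proc.stderr    # otherwise the traceback becomes the next prompt
        return None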
That's a good breakdown (I guess. I am not an AI expert).
It does beg the question: How well does the LLM work on non-English languages?
He did mention translators, but I think he was talking about a more primitive incarnation.
I suspect that the Chinese have a version that works well with Chinese input, and maybe the Koreans and Japanese, but I'll bet one of the limitations is how much material is available to learn on, and the contractors needed to verify the training.
It sounds like training is a really expensive task, and may mean that for-profit corporations could get a lock on it.
I'm wondering if we'll be seeing "AI patent trolls," where you are sued, because your LLM has similar responses to someone else's. I see a whole industry springing up, around that (probably with AI lawyers).
I'm not an English native speaker and my first language has only about 2 million speakers with very limited training corpuses. GPT-3.5 is already shockingly good at it, although the outputs do seem a bit less factually accurate and less practically useful than in English. It's really odd that it gets the language right in all aspects except the content.
The great thing about niche languages is that it's not cost effective to scammers to learn it for their purposes. That all changes now I suppose.
> It does beg the question: How well does the LLM work on non-English languages?
Well I haven't chatted to it much in Chinese, but I've asked extensively about Chinese, and that's probably one area where I get the most benefit. It can accurately translate, segment, romanize, and break down phrases; it usually does a pretty good job explaining why a given word was used, and is good at giving you example sentences using a given word or grammar pattern. I'd be surprised if it were significantly worse in Chinese than in English.
It speaks Dutch just as well as English. I've been talking to it for weeks about the house I am renovating. I double-check what it tells me, but it's very useful for getting a first general answer.
I am curious what things you are talking about in the context of home renovation.
I have not used ChatGPT much but I see more and more uses that are not obvious ones (and usually only the obvious ones are addressed in articles or posts)
The kind of questions I would ask someone with experience in construction.
"If I want to remove plaster from walls, which tools should I use? What should I do with the debris afterwards?"
"Which kinds of cavity wall insulation exist? Will this cause moisture problems in my house? What can I do to detect or prevent these kinds of problems?"
"Is it useful to insulate interior walls in an old house? Or should I just focus on the exterior? Will interior wall insulation raise the EPC (energy performance) of my house?"
Does the Game of Life QR code example have a real seed? The rendered GIF looks fake, because the cells in the top left suddenly decide to become a QR code with no outside input (gliders, etc). Given that GPT-4 is a world-renowned snake oil salesman, I'd love to see the proof.
I was thinking the same thing. That last frame from game of life to QR code looks very much not like a game of life transition. Apparently GPT is as good at faking visual "work" as it is at faking natural language "work".
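For what it's worth, the suspected trick is easy to reproduce: run Life forward starting from the QR pattern, then play the frames in reverse (this is a reconstruction of the suspicion, not the author's code):

    import numpy as np

    def life_step(grid: np.ndarray) -> np.ndarray:
        # count the 8 neighbours of every cell, with wraparound at the edges
        n = sum(np.roll(np.roll(grid, dy, 0), dx, 1)
                for dy in (-1, 0, 1) for dx in (-1, 0, 1)
                if (dy, dx) != (0, 0))
        return ((n == 3) | ((grid == 1) & (n == 2))).astype(np.uint8)

    qr = np.random.randint(0, 2, (33, 33), dtype=np.uint8)  # stand-in for a real QR matrix
    frames = [qr]
    for _ in range(60):
        frames.append(life_step(frames[-1]))
    animation = frames[::-1]   # reversed: ends on the QR code, so Life appears to "produce" it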
Is this like Jupyter, where the state of the interpreter is preserved? Or if you ask follow up questions does every interpreter block have to calculate everything from scratch?
I'm still on the waiting list, or I would test it myself.
Apparently OpenAI has some excellent developer relations and marketing people too. Is this guy even a programmer at all? His bio says "WSJ best selling novelist, Edgar & Thriller Award finalist, star of Shark Week, A&E Don’t Trust Andrew Mayne, creative applications and Science Communicator at OpenAI." so maybe not? This blog seems to have useful OpenAI related information, it's odd that it's on this guy's personal blog instead of the OpenAI website.
This morning I feel oddly compelled to play the fool so here are some near/medium term thoughts on where this may be going (worth less than what you paid for them):
1. The most important ChatGPT plugin is going to end up being the one that invokes itself recursively. The autoregression approach seems to be severely limiting what these models can do by limiting their ability to think without speaking. A few months ago I thought the obvious way to fix this was to train the model to emit special "bracket" tokens that would be deleted by the driver once the inner thought completed, leaving only a sort of "result" section, but GPT-as-a-GPT-plugin effectively does the same thing.
2. Whilst the first biggest win from the plugin will be "sub-thoughts", the next biggest will be training it how to dispatch multiple sub-thoughts in parallel. GPT already knows how to break a complex problem down into steps, but is still constrained by context window size and inference speed. Once it is taught how to split a problem up such that multiple independent inference sessions are able to work on it in parallel, it'll become feasible to make requests like "Build me a video game from scratch using Unreal Engine, set in the world of Harry Potter, about the adventures of a character named X" etc and it'll end up dispatching a massive tree of GPT sub-instances which end up working on the independent parts like character generation, writing the Unreal C++, prompting Midjourney and so on.
Parallel recursive LLMs are going to be much more awesome than current LLMs, and I mean that in both senses of the word (cool, awe-inspiring). In particular, this will allow us to pose questions like "How can we cure cancer?".
3. OpenAI needs a desktop app, pronto. Whilst the cloud model can take you some way, the most valuable data is locked behind authentication screens. The cloud approach faces difficult institutional barriers, because data access inside organizations is oriented around granting permissions to individuals even when they work in teams. Giving a superbrain superuser access doesn't fit well with that, because there's no robust method to stop the AI immediately blabbing business secrets or PII to whoever tickles it in the right way. That's one reason why the current wave of AI startups is focused on open source technical docs and the like. If ChatGPT is given tool access via a desktop app running on the end user's computer, it can access data using the same authentication tokens issued to individuals. This also neatly solves the question of who is accountable for mistakes: it's the user who runs the app.
For some reason I've never seen the idea of auto-recursive prompting in any of the papers or discussions. It makes so much sense. It can also help with model and compute size: instead of using the large model to, say, list the primes less than 1000, it can prompt GPT-3 to list and count them, then send the result back to GPT-4. Sounds quite feasible to implement too!
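In rough pseudocode, the delegation could look like this (cheap_llm and main_llm are hypothetical placeholders for calls to a small and a large model):

    # The large model never does the rote enumeration; it only sees the result.
    def cheap_llm(prompt: str) -> str:
        raise NotImplementedError("call the small/cheap model here")

    def main_llm(prompt: str) -> str:
        raise NotImplementedError("call the large model here")

    def answer_with_delegation(question: str) -> str:
        listing = cheap_llm("List every prime number below 1000, comma separated.")
        count = len([p for p in listing.split(",") if p.strip()])
        return main_llm(f"{question}\n\nContext: there are {count} primes below 1000:\n{listing}")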
Original author here. I'm a programmer. I started on the Applied team at OpenAI back in 2020 as a prompt engineer (I helped create many of the examples in the GPT-3 docs.) I became the Science Communicator for OpenAI in 2021.
My blog audience is very non-technical so I write very broadly. We've been super busy with the launch of GPT-4 and Plugins (I produced the video content, found examples, briefed media on technical details, etc.) so I was only able to grab a few hours to put these demos together.
As far as the ChatGPT prompts go, I included a few, but they're just simple instructions. Unlike GPT 3.5 where I'd spend an hour or more getting the right instruction to do zero-shot app creation, GPT-4 just gets it.
Wow, you learned programming specifically to work with AI? That is an inspiring level of flexibility in skills and self-identification. Perhaps many of us will need to learn how to do that sort of reinvention sooner, rather than later.
Not desktop at all. I'm focused on it operating its own computing resources using the recursive approach. I call it the multi-agent LLM approach. This way it can break down a complex task into components and attack each component in parallel or sequentially as it needs.
I'm not a researcher at all, but a practitioner with extensive quantitative development experience in an applied industry setting using ML tools.
I’ve been thinking that taking this up a level is more a systems architecture problem. The core LLM model is so incredibly flexible and powerful that what I’m working on is the meta application of that tool and giving it the ability to use itself to solve complex problems in layers.
Hopefully that makes sense. I already have a fairly extensive and detailed systems architecture design.
It is unclear from the OpenAI description or this really cool write-up whether the code interpreter is actually running a Python interpreter, or is emulating a Python interpreter. I assume it is actually running a Python interpreter, but want to make sure (so I know how much verification of results would be needed).
There you go. Before putting it in production, I recommend reading up some on seccomp, user namespaces, and other security options, but to be honest you're already pretty fine with just that.
Want more options? AWS lambda supports python. Cloud functions of various clouds (google, probably azure, etc) support python. Those are all sandboxes.
You can cross-compile python to wasm, and run it directly with javascript... or with any of a number of sandboxed wasm runtimes.
All that said, given the python packages they have available, my money is on a bunch of docker containers, not wasm or lambda or such.
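One plausible shape for that (pure speculation on my part, not OpenAI's actual setup) is a throwaway container per execution with no network and capped resources:

    # Run untrusted Python in a disposable Docker container.
    import subprocess

    def run_untrusted(code: str, timeout: int = 30) -> str:
        cmd = [
            "docker", "run", "--rm",
            "--network", "none",     # no outbound network
            "--memory", "512m",      # cap RAM
            "--cpus", "1",           # cap CPU
            "--pids-limit", "128",   # blunt fork bombs
            "python:3.11-slim",
            "python", "-c", code,
        ]
        proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
        return proc.stdout + proc.stderr

    print(run_untrusted("print(2 ** 100)"))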
I don't think this "pedantic reminder" is correct.
Feel free to prove me wrong though. Give me a binary to run which will escape a container, configured only with docker flags I choose to set, but without gvisor.
If it's not a "secure sandbox", surely the above should be easy... But of course you won't be able to. "contained.af" (temporarily broken right now unfortunately) has been running for 5+ years with docker, no gvisor, and no one capturing the CTF flag.
Sandboxing Python is perfectly possible. You just use kernel/process level sandboxing, or you could use something like the GraalVM sandbox which operates at the userspace level by exploiting compiler/VM tech.
Not since I spent two weeks at the Pret a Porter in Hong Kong Central staring at Bitcoin’s white paper in 2011 have I seen so much active interest in a new technology area. I estimate that about 25% of every developer’s time is now fixed on GPT experimentation. Quite an astonishing turn in the history of technology.
Turns out Tom Smykowski (the "I'm a people person, I take the customers' specifications to the engineers" character from Office Space) is the future of engineering.
It's a good day to be a programmer with stuff like this at our finger tips. Just a matter of imagination and creativity for what you could do with it and what problems you confront that can be mostly automated.
Even though it's doing most of the programming and gluing stuff together, understanding the fundamental structures underneath is still pretty critical to really take advantage of it and push the boundaries.
Memory and reward/punishment systems are so unexplored in that context. IMO that is what will make those systems get what a lot of people define as "agency".
See my experiment trying to write interactive fiction with ChatGPT. In two hours I managed to have it come up with the most advanced interactive narration system ever devised. I don't say this to boast; this is truly leagues ahead of what has been done in this field: the game plays on two planes, an action plane and a narration plane, with two characters, one that listens to the narrator, and one that incarnates him in the past. You can argue with the narrator to change the course of the narrated action.
Yesterday I went as far as implementing narrator/listener role reversal. The player can now take over narration. As a result he must write what an IF system would print on the screen, and the AI replies by issuing commands, taking on the role of the player.
This is unprecedented, all the more so 99% of the work is done by simply asking: "Let's play a session of interactive fiction".
I agree the comparison is a bit unfair since ChatGPT4 did not produce any code, so I would have to compare this against chatting with someone pretending to be an IF system.
But is it? The most popular language used to write interactive fiction is Inform7. Inform7 is a language that reads like English (a controlled subset of it), used to program text-based games that the player interacts with using a very limited form of English, the limit being your capacity at handling textual structures using programmatic constructs.
Couldn't we program entirely using natural language this way (and maybe have ChatGPT writing its own evals on the go)? To me it looks like a new kind of programming that's mostly conversational rather than instructional. Instead of composing together contextless bricks to build towers of abstraction and striving for easily adaptable/maintainable code, which are consequences of the fact that the concepts named in our codebase are influenced by the distributive aspects of meaning in the real world ("hey, want to hear the latest news? Management has decided multiple users should be able to manage the same account"), our building material is some kind of primordial plasma of context out of which logical, cartesian, categorical forms can be enticed to emerge by priming the medium. This could become semiotic programming, in contrast to symbolic programming. Another aspect of Inform7 that's particularly interesting is the fact that it is built upon a literate programming "web" engine. Seen in juxtaposition to its latent and broken homoiconic tendency, and transposed to ChatGPT, this makes for very interesting prompts where the lines drawn between documentation, code and programmed material get blurred, as I explain using text (maybe to myself?) how to imagine text using text.