ChatGPT and Code Interpreter = Magic (andrewmayneblog.wordpress.com)
313 points by dogon on March 26, 2023 | 146 comments



This is a great set of examples for showing off how current chatbot AIs have impressive capabilities but are also prone to getting important details wrong.

For instance:

1. The description it gives of how to generate a Shepard tone is wrong; to make a Shepard tone, you need to modulate _frequency_, not (or not only) amplitude.

2. The Shepard tone it generates is also wrong, but in a different way. For instance, there's no bass at the end matching the bass at the beginning, so it can't be looped to create the illusion of infinite ascending motion.

3. The "Game of Life QR code" isn't actually playing the Game of Life in a way that results in a QR code. It looks like it's starting with a QR code and playing the game of life, then time-reversing the result; so you see the Game of Life running "backward" until you get to the QR code as an initial state. I say "seems like" because I can't be confident it hasn't made any mistakes. This _may_ be what the author intended by "working backwards"? But I took that to mean that it should "work backwards" by first finding a state S that stepped to the QR code, then finding a state S' which stepped to S, etc; so that you'd have a run of the game of life ending in a QR code. This would involve actual search, and there are patterns which simply cannot be achieved this way (have no predecessors).

4. The planet orbit simulation seems to have all planets in circular orbits, rather than elliptical. For Earth this is probably unnoticeable, but Mars' orbit varies from 1.38 AU to 1.67 AU - quite noticeably elliptic - but appears circular in the "simulation".

etcetera, etcetera. There are also plenty of "obvious" glitches in the graphics simulations, but those concern me less - precisely because they're obvious.
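
For reference, here's roughly what a correct approach to point 1 looks like: octave-spaced partials whose frequencies glide upward while a fixed envelope over log-frequency fades them in at the bottom and out at the top. A minimal numpy sketch only; the partial count, envelope, and loop length are arbitrary choices, and phase alignment at the loop point is glossed over:

    import numpy as np
    from scipy.io import wavfile  # assumes scipy is available for writing the file

    sr = 44100        # sample rate
    dur = 8.0         # seconds per loop; each partial glides up exactly one octave
    n_partials = 8    # octave-spaced partials
    f_low = 20.0      # starting frequency of the lowest partial

    t = np.linspace(0, dur, int(sr * dur), endpoint=False)
    signal = np.zeros_like(t)

    for k in range(n_partials):
        octave_pos = k + t / dur                  # position in octaves, sweeps k -> k+1
        freq = f_low * 2.0 ** octave_pos          # exponential (log-linear) frequency glide
        phase = 2 * np.pi * np.cumsum(freq) / sr  # integrate frequency to get phase
        # fixed amplitude bell over log-frequency: partials fade in at the bottom
        # and out at the top, which is what lets the glide loop
        amp = np.sin(np.pi * octave_pos / n_partials) ** 2
        signal += amp * np.sin(phase)

    signal /= np.max(np.abs(signal))
    wavfile.write("shepard.wav", sr, (signal * 32767).astype(np.int16))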


If it weren't for the doubling down on mistakes and making false assurances it would be great.

You can't even correct GPT: once it's in an error state you must start a new session. The chance of hallucinations rises rapidly.

It feels like it's trying to cover itself. Pretty weird stuff.


It does plug a big hole in ChatGPT. ChatGPT's math skills are mostly non-existent (e.g. it cannot factorize a medium-sized number), but it can easily generate Python code to do almost all such things. There might be many more things ChatGPT cannot do itself but can generate Python code to do. This brings it more in line with a normal human's expectations (i.e. someone not familiar with LLMs). If it can do a much broader set of things, the queries it can serve will benefit enormously.
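
For instance, the kind of one-off helper it tends to emit looks something like this (my own sketch, plain standard library; trial division is plenty for "medium-sized" numbers):

    def factorize(n: int) -> list[int]:
        """Prime factors of n by trial division; fine for medium-sized inputs."""
        factors = []
        d = 2
        while d * d <= n:
            while n % d == 0:
                factors.append(d)
                n //= d
            d += 1
        if n > 1:
            factors.append(n)
        return factors

    print(factorize(600851475143))  # [71, 839, 1471, 6857]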


GPT with tools, a scratchpad for memory and an iterative loop already seems like it could create something that's quite "intelligent" at solving various tasks that humans usually do with computers. I hope it's all being put together.


Reflexion: an autonomous agent with dynamic memory and self-reflection, https://arxiv.org/abs/2303.11366

Add an interpreter as per this article, and we're basically there.



does this not terrify you?


This is why "prompt engineering" is going to be a legit job, still requiring the knowledge and sensibilities of an engineer/programmer. A jack of all trades problem solving gig. Maybe even constaultion businesses will build these glued together tools for bigger companies.


Sam Altman states that "prompt engineering", in his mind, is a result of temporary limitations of the LLMs.

That he doesn't expect people to be doing what we call "prompt engineering" in 5 years. You'll simply ask for what you want and if there is ambiguity it can be sorted out interactively or it will just do the 'right' or obvious thing that we would expect.


I disagree with him. I've spent several years answering people's random questions (librarian) in multiple settings and my first W2 job back in 2004 was technological education. I also have a Linguistics degree and can program so I understand what LLMs can and can't do fairly well even if I'm nowhere near an expert. He's vastly underestimating the average person's awareness of their own thought process or even ability to accurately answer iterative questions in order to sort things out interactively. The weakness with such a claim isn't the AI, it's the humans' ability to provide the AI with accurate and useful information.

Not to mention that there's still a large portion of the general public that just freezes up when it comes to anything technological. I've provided support to a number of different communities and even educated people can't do this type of thinking easily.

It's possible that we could eventually get to that point, but it would require some massive educational efforts and a culture shift, which would also require substantial investment without a clear road to profit, and I believe America lacks the political will or even ability to implement anything like this. Since we can't act in unison, it will be down to companies, and for them it makes more sense to solve the problem by hiring prompt engineers with the savings they get from cutting elsewhere instead of training all their existing teams on how to learn new ways of thinking.

tl;dr: Technologically he's probably right, but he's massively overestimating the public's ability to use interactive tools to get information.

> obvious thing that we would expect

Any time you rely on something being 'obvious', the public is going to somehow fuck it up.


This is similar to the situation with software engineers interacting with other parts of the business or customers. IME the biggest skill of software engineers is not writing code, but understanding and translating business requirements into code, while accounting for _existing_ (prior) requirements that were often forgotten. The code as such is usually straightforward, but the requirements on either end are where the work is.

This is also what programmers typically complain about when they say their PM isn't "technical" enough. What they usually mean is they don't understand the details of the business / domain well enough to meaningfully contribute to the organization / break up of the work.

To that end, I'd expect these language models to eventually improve at that task too, perhaps responding with choices or diagrams to help users learn how to ask what they actually want. That's in a sense what a prompt engineer is doing in this context. In that sense I think I agree w/ your assessment although I also think there's a lot of room for these AI's to handle that task, at least partially, as they improve.


I agree. I'm not a SWE because I don't want to code for 40+ hours a week, but I've been picking up projects here and there for small businesses for a long time (I started helping my dad do the same when I was 5-6) and so I manage projects end to end and so much of it is the back and forth with the clients. Knowing the right questions to ask and how people think about things are some of my most important skills, and in that respect my education and library experience has been very helpful. I'm far from the best programmer - hell, I'd consider myself below average but have enough experience to still be passable, but I'm easy to work with and that counts for a lot.

The issue with language models in particular is that to use them effectively you have to fundamentally understand that you're translating human-brain-language into computer-language and understand that those are in fact two different things even if their delivery happens identically.

For most coders, this jump happens semi-instinctively because they know how to test something to verify 'ah yes, this follows computer logic'. The general public thinks computers are wizardry. The other group of people who will be able to understand this are those who have thought deeply about language for whatever reason and therefore know that human-brain-language has qualities that extend beyond 'We use words'. This is also very hard to impart to novices; I have a Linguistics degree and studied a cumulative 10 years of 6 languages so I went through that knowledge transfer as well.

They're both hard and one or the other is necessary to interface with language models effectively and will be for likely the next 10ish years just because providing a good technical solution will require the technical people to understand much more about the social side of communication and that will also take time.


Very interesting. Can you please elaborate or give a TLDR on the linguistics angle here?

Does it have something to do with grounding or implicit knowledge/ connotations and context?


I thought about this for a while and I think I would boil it down to being used to dealing with language as data instead of just as a communication medium. Experience with corpora, sentiment analysis, and the various parts of linguistics does give you a solid grounding in why the frequency distribution in the training set(s) occurs the way it does.

An example of things I consider when interfacing with an LLM that derive from my linguistic knowledge:

* That the language it's trained on is thoroughly modern/confined to the speech of a particular place in time. Which means that I know any viewpoint I receive is not only modern but from a very specific time and place. The language that everybody uses is sort of like the water fish swim in, in that most people don't think about it. Which means I know that if I ask it something about (to use an example that is a culture war issue) the history of racism, I know that the answer is being run through modern conceptions of racism, and if I want historical views, I need to get those elsewhere.

* That which words are most commonly used relies on the social and economic status of the speaker as well as word properties like phonetics and phonology. This makes it much easier to pick and choose which vocabulary and sentence structures to use in order to 'sub-pick' the part of the training set you want answers to. Asking 'how to grow peppers' and 'what soil variables control how well a Capsicum annuum plant grows' are going to get you different answers.

* Related to this, the differences between spoken and written English on a large scale - one problem with the 'everybody can just use LLMs' idea is that the LLMs are trained on written English but the majority of people interfacing with them speak to them as though it were a verbal conversation. There are parts of verbal communication that don't exist in written communication, and knowing how to provide scaffolding to make a request make sense requires understanding the difference between the two.

* A basic knowledge of sociolinguistics is fantastically helpful for developing personae and/or picking out biases in the training data set. (Somewhat similar to how I can usually ID an American's media diet and political affiliation after a 5-10 minute casual conversation).


> Technologically he's probably right, but he's massively overestimating the public's ability to use interactive tools to get information.

I think you are underestimating the pace of progress in the capability of the LLMs. It has infinite patience to query you for missing facts. Because it is not about the public's ability to use an interactive tool. It is about the LLM's capability to be a good focus group moderator/librarian/etc.

This role reversal is imminent from what it seems. The LLMs will be using us, rather than us using them.


This is exactly the thing LLMs seem to be the worst at, even as they improve.

LLMs are confidently wrong rather than inquisitive/have an interest in being right. They theoretically have infinite patience, but they really have no patience and accept input at face value.

I don't doubt that these can be improved, but I also don't doubt that some people will be better at interacting with LLMs than others, and that to get good at using an LLM will take onboarding time that organizations would prefer to centralize


This seems to resonate with me. I've got a bit of a tendency to experience spells of obsessiveness, and if you get very pedantic and very wordy with GPT, it will espouse anything you tell it to that it conceptually could. Understanding human language to the max will be a useful skill because, as with any agent, communication is key to successful understanding.


The "sorted out interactively" that he mentions is exactly prompt engineering, isn't it?


If you ask someone to do something for you and they ask a clarifying question, do you consider that prompt engineering?


If it's a language model, yes. I think if it's in the initial prompt (first message) or second message to the AI doesn't make much difference for its role in the process.


So a technical solution to a human problem?


Many, many people are working on this right now. Having had a shot at it, the details become very important very quickly and it feels to me that getting it right will require a ton of iteration at this stage. Such that it feels easier to wait for GPT-N+1 rather than work on improving its harness. Regardless, again there is a ton of effort being poured into this very specific thing across the world.


Any links to repos where people are working on this?


How do you imagine the memory would work?


Every night it could go through its conversations and condense them down to a tiny compressed sample leveraging context from previous long term memory. We could call it sleeping.


And fine-tune itself? Fascinating.


Yes, and perhaps during this nightly compression and fine-tuning, new inferences are made on previous conversations by connecting them with longer-term context, and we can call that dreaming.


My first thought of this was something like a scratchpad: in a ChatGPT session there's a place where ChatGPT can add/remove text, and it is continuously available as input.

Right now, you can't ask chatgpt to "think of a person, and the user will guess who it is". It can decide to think of something, but has nowhere to put that information. It should be put into the scratchpad.

Scratchpad could potentially be used for solving problems step by step too, much like how we consciously think through problems step by step. If it gets too complicated for us, the mental scratchpad we use is augmented by a physical notepad or other system.

You could also say that it enables an internal monologue.
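
A rough sketch of how that could be bolted on from the outside today (the <scratchpad> tag convention and the query_model call are made up for illustration, not an actual OpenAI feature):

    import re

    scratchpad = ""  # persistent notes the model can read and overwrite between turns

    def query_model(prompt: str) -> str:
        """Placeholder for a real LLM call."""
        raise NotImplementedError

    def chat_turn(user_message: str) -> str:
        global scratchpad
        prompt = (
            "You may keep private notes by writing <scratchpad>...</scratchpad>; "
            "they are shown back to you each turn but hidden from the user.\n"
            f"Current scratchpad:\n{scratchpad}\n\n"
            f"User: {user_message}\nAssistant:"
        )
        reply = query_model(prompt)
        # persist any scratchpad update, then strip it from the visible reply
        match = re.search(r"<scratchpad>(.*?)</scratchpad>", reply, re.DOTALL)
        if match:
            scratchpad = match.group(1).strip()
            reply = re.sub(r"<scratchpad>.*?</scratchpad>", "", reply, flags=re.DOTALL).strip()
        return reply

With something like that, "think of a person and let the user guess" becomes possible: the name goes in the scratchpad instead of the visible reply.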


We also use tools a lot. We're also not very good with mental arithmetic but we can do quite a lot with a simple tool like a piece of paper and a pencil.

Obviously LLM models cannot be compared one-to-one with our brains, but one thing is clear: despite them being executed by a computer, they are not the computer that is executing them, and as such operations that are trivial for the computer are not necessarily trivial for the language model.


> We also use tools a lot.

Like whole industries sprung up around the various tools - email, search, notes, images, videos, databases. We're big on tools.


We're making tools that make and use their own tools.

Yeah, pretty big on tools.


We're making tools that use tools to make tools to eventually replace us with tools.

It's tools all the way down.


It's great to be a part of the toolchain!



I got to thinking about hiring, and things like looking at code history yesterday. There will be a lot of people who suddenly became prolific Github contributors after December of last year. I myself wrote 2 small-to-medium apps in a day using ChatGPT and added them to Github. I'm wondering if looking at Github history is now perhaps as useless as asking a potential writer for a writing sample.

Another point, the apps that I wrote with GPT-4 were filling gaps in my platform and things that would never have come off the backburner due to the lack of ROI. I wonder how we can accelerate contributions to OSS projects. Knocking out new features, refactorings and technical debt 10x faster. Of course it has to be done with care but -- an army of willing AI contributors...


There is no reason someone couldn't change commit dates and be prolific far earlier than December


Lack of pre-december github activity could be an indicator, i.e. no pull requests, no issues, no stars or forks, perhaps even no feed updates, etc.


Lack of github activity may also be an indicator for having a life outside programming. Or using private repos.


Or just being as prolific on Gitlab, Bitbucket or elsewhere. We have to stop taking Github contributions so seriously.


How did you use CGPT to write an application? I understand how to use it to solve simple problems, but not create an application


The same way you would with a human developer. Plan it out, break it down into bite sized tasks, give the AI each task or parts thereof and plumb it all together yourself.


Ask it to make an application.


Github history has always been useless

Commit times are plain text user generated content

Just throw generated commits in a private repository and you have as much history as you want

If a sycophant or recruiter or employer wants to use that metric as an indicator of anything, that is pure happenstance and completely on them


> I myself wrote 2 small-to-medium apps in a day using ChatGPT and added them to Github. I'm wondering if looking at Github history is now perhaps as useless as asking a potential writer for a writing sample.

If I'm your hiring manager, I won't be bothered by it, since I know you can use ChatGPT to solve problems, which is what the company needs. But I will still need to assess you in more detail, to ensure that you can develop / solve problems well without ChatGPT, in case it's somehow unusable or provides wrong answers.


> I wonder how we can accelerate contributions to OSS projects.

LLMs are impressive, but hacking on pre-existing complex codebases seems to be a current weakness.


GitHub does not color the squares green for the past though, does it? I thought GitHub colors the squares green on the day you push the commits to them, not the alleged author date.


It seems like we're pretty close to the point where LLMs will be able to create simple client apps. When anyone can make, say, an android note-taking app to their own specifications for free, it's not hard to imagine the impact on the paid note-taking apps in the store today. "Make me a game like Clash of Clans, but free, and with..." is probably only a few years further out.

One interesting question is: when people are using LLMs to generate software for their own use, what will they do for the server component of client-server use cases? Is "LLM, go register me an EC2 instance and configure it not to go over $20/mo, here's my credit card number" totally out of the question?


Honestly, I think we're way way way off what you're suggesting about giving it inputs and getting specifications. This blog post shows ChatGPT recreating things it has seen before, from very specific instructions. The hard part of defining requirements is being specific in them, and that's what programmers do. It's pretty common in these threads to see people talking about ChatGPT and friends being totally wrong and it not being aware of how wrong it is.

> ? Is "LLM, go register me an EC2 instance and configure it not to go over $20/mo, here's my credit card number" totally out of the question?

I suspect we are _way_ closer to that than to having it respond to "make me a game like clash of clans".


I don't think your intuition is ambitious enough. We have 12 year olds making roguelikes and platformers today, with templates and toolkits. Sure the initial LLM-made apps will suck, the initial everything sucks, but a couple years of progress on something this new will be considerable. We've barely started tuning LLMs for specific use cases!

> The hard part of defining requirements is being specific in them

True but I don't think you're accounting for iteration. You know the "idea guy" cliche? Today, if/when this "idea guy" hires a team of devs from India to make his million-dollar idea into reality, his inability to write good requirements will lead to a useless app, and the idea guy asking for changes (that are also poorly described) until he runs out of money, right?

Now imagine that his "dev team" works for free and can turn around each new set of requirements in minutes rather than months. How long will it take him to learn how to write good acceptance criteria? A day? A week? It's hard but not that hard. And a million or two other "idea guys" are doing the same thing, and sharing notes with each other.


The fact that we have 12 year olds asset flipping from tutorials today shows the value of those asset flips. If chatGPT cannibalises that market, I will then be as scared of ChatGPT taking my work as I am of a 12 year old who can follow a unity tutorial right now.

> It's hard but not that hard.

It really is that hard. It's basically 90% of staff+ engineer's job.

I don't doubt that AI and ML in particular will absolutely change our industry, but as an engineer I see chatGPT as a tool like copilot, a debugger, or a linter. All those tools make me an order of magnitude more productive, but they're useless without the human making the decisions


Writing requirements is not usually done by staff+ engineers, is it? An awful lot of working software, the majority I'd guess, is built from requirements written by a non-technical PM who wasn't even in the software industry three years ago. I wonder if you're too good a programmer to have your thumb on the pulse of the kind of software bad programmers make? Because that's where the real disruption will come from. The question is not "will some bad programmers lose their jobs", it's "what will happen when the kind of simple software that used to be made by bad programmers becomes effectively free?"


The working requirements of "when I click X, do Y" come from POs, sure. But the remaining implied requirements (what happens if there's an error in your app? How do you identify sensitive data that needs to be handled differently from "basic" data?) are, in my experience (working with teams of varying degrees of experience), defined by engineering.

We're talking about being a couple of years away from ChatGPT being able to copy and paste from tutorials here, and with my understanding of AI (I've done a few ML projects in the last decade, but nothing that stuck), that's the "easy" part. The impressive part of things like Midjourney is the synthesis, and we're not really seeing that, or any signs that it's coming, IMO.


> what happens if there's an error in your app? How do you identify sensitive data that needs to be handled differently to "basic" data?

Nothing happens, and you don't. That's what I meant by "bad software". I'm reminded of a story Clay Shirky told about AT&T analysts from the mid-90s trying to understand the web hosting business[0]:

> The ATT guys had correctly understood that the income from $20-a-month customers wouldn’t pay for good web hosting. What they hadn’t understood, were in fact professionally incapable of understanding, was that the industry solution, circa 1996, was to offer hosting that wasn’t very good. This, for the ATT guys, wasn’t depressing so much as confusing... For a century, ATT’s culture had prized - insisted on - quality of service; they ran their own power grid to keep the dial-tone humming during blackouts. ATT, like most organizations, could not be good at the thing it was good at and good at the opposite thing at the same time. The web hosting business, because it followed the “Simplicity first, quality later” model, didn’t just present a new market, it required new cultural imperatives.

0: https://gwern.net/doc/economics/2010-04-01-shirky-thecollaps... (shirky's old site has been down for some time)


You’re right that it still needs a skilled person to ask the right prompts, and I don’t see that changing anytime soon

But if a few people asking the right prompts is all you need, what happens to the other 50-100+ people a game like Clash of Clans would normally employ?


I think ChatGPT is actually better at creating the right prompts than doing the answers to the prompts.


how is this different than git cloning an already done hello world style app/using a blog article with a code sample?


1. You don't have to know about the existence of such repos

2. You don't have to find a good blog article that works for your specific purpose

3. You can ask for changes to the code like "translate this code to python" "I want to use Azure instead of AWS" "add a database so I can save player scores" etc. and it will both write the code and explain what it did.

4. You can ask for elaboration if you don't understand something, it can describe code or it can further explain written text.

5. It can explain why you're getting an error message after you change something and suggest a way to fix it.

6. If you have a feature you want to add but don't know how, you can just ask it and it will show you how to implement the feature and how to integrate it into the existing code.

There's a lot more, too much to list but every one of these is far more useful than just cloning a repo and following an article.


You can ask the bot to flesh out the code and implement new features (roughly) in seconds. For example, if you have a notes app, you could ask the bot to write code that integrates it with existing calendar apps, and it'd take less than a minute to get you close.


> and it'd take less than a minute to get you close.

Have you actually tried this yourself? I have probably... 300+ ChatGPT chats/sessions past 30 days (aka, I actually use it).

I have 0 faith it would actually spit out anything error-free / working.


Yeah, I've had success, and it has either produced useful code or nearly useful code. I use it to refactor functions and write regexes, tedious stuff.


Tbh for clash of clans, if it can set up the server infrastructure etc, then that will be quite scary.

The scariest thing is that... I did use Chatgpt to set up an aws cloud formation stack for a lambda function, and it gave me the right file to sync.

So... Yeah, it's not that many steps to a "text to (web|mobile) app generator" solution.

Uh.. I will get to work.


I can Google a CloudFormation stack for a lambda and get an answer [0] in about as long as it takes an LLM to generate one. That said, as a programmer I'm not paid to copy from Google or Stack Overflow, I'm paid to figure out if it's working or not. As an example, ask ChatGPT to generate a lambda template with 16GB of memory and see what it does.

Until we see chatgpt figuring out how to fix issues like that, I'm not worried.

[0] https://iamondemand.com/blog/how-to-deploy-your-lambda-funct...


It says this, is it ok? I can check when I get home

ChatGPT: Below is an AWS CloudFormation YAML template for creating an AWS Lambda function with 16GB of memory:

```yaml
AWSTemplateFormatVersion: '2010-09-09'
Description: 'AWS CloudFormation Lambda Function Template with 16GB Memory'

Resources:
  LambdaFunctionRole:
    Type: 'AWS::IAM::Role'
    Properties:
      AssumeRolePolicyDocument:
        Version: '2012-10-17'
        Statement:
          - Effect: Allow
            Principal:
              Service:
                - lambda.amazonaws.com
            Action:
              - sts:AssumeRole
      Path: /
      Policies:
        - PolicyName: LambdaExecutionPolicy
          PolicyDocument:
            Version: '2012-10-17'
            Statement:
              - Effect: Allow
                Action:
                  - logs:CreateLogGroup
                  - logs:CreateLogStream
                  - logs:PutLogEvents
                Resource: 'arn:aws:logs:::*'

  LambdaFunction:
    Type: 'AWS::Lambda::Function'
    Properties:
      FunctionName: MyLambdaFunction
      Runtime: python3.9
      Role: !GetAtt LambdaFunctionRole.Arn
      Handler: lambda_function.lambda_handler
      MemorySize: 16384
      Timeout: 300
      Code:
        ZipFile: |
          import json

          def lambda_handler(event, context):
              return {
                  'statusCode': 200,
                  'body': json.dumps('Hello from Lambda with 16GB memory!')
              }
```

This CloudFormation template creates an IAM role and a Lambda function with 16GB (16384MB) of memory. The Lambda function runtime is Python 3.9, but you can change this to another runtime if necessary. The code block contains a basic Python script for returning a simple message; you can replace this with your own script.


Yep! Except Lambda limits are 10GB [0], so this doesn't work.

If you ask ChatGPT to generate a template for a lambda with 16 vCPUs, it will generate a template that allows a max of 16 concurrent invocations, which again is not what you asked it for, and unless you _know_ what reserved concurrent executions are, you won't know that's what it's generated.

[0] https://docs.aws.amazon.com/lambda/latest/operatorguide/comp....


I told it:

In here, it says that maximum memory is 10240mb, so that may not work:

Memory and computing power

Memory is the principal lever available to Lambda developers for controlling the performance of a function. You can configure the amount of memory allocated to a Lambda function, between 128 MB and 10,240 MB. The Lambda console defaults new functions to the smallest setting and many developers also choose 128 MB for their functions.

Its response:

ChatGPT: I apologize for the confusion earlier. You are correct that the maximum memory allocation for a Lambda function is 10,240 MB. Here is the revised CloudFormation YAML template with 10,240 MB memory allocation:

```yaml
Resources:
  LambdaFunction:
    Type: AWS::Lambda::Function
    Properties:
      FunctionName: LambdaWith10GBMemory
      Handler: index.handler
      Runtime: nodejs14.x
      MemorySize: 10240
      Timeout: 300
      Role: !GetAtt 'LambdaRole.Arn'
      Code:
        S3Bucket: <S3_BUCKET_NAME>
        S3Key: <S3_KEY>

  LambdaRole:
    Type: AWS::IAM::Role
    Properties:
      RoleName: LambdaRoleWith10GBMemory
      AssumeRolePolicyDocument:
        Version: '2012-10-17'
        Statement:
          - Effect: Allow
            Principal:
              Service:
                - lambda.amazonaws.com
            Action:
              - sts:AssumeRole
      Policies:
        - PolicyName: LambdaBasicExecution
          PolicyDocument:
            Version: '2012-10-17'
            Statement:
              - Effect: Allow
                Action:
                  - logs:CreateLogGroup
                  - logs:CreateLogStream
                  - logs:PutLogEvents
                Resource: arn:aws:logs:*:*:*
```

Replace <S3_BUCKET_NAME> and <S3_KEY> with the appropriate values for your S3 bucket and object key that contain your Lambda function code. You may also adjust Runtime, Handler, and Timeout as needed.

===

Then I said:

What can we do to be able to use 16gb memory?

Then it said:

ChatGPT: To use 16 GB memory in a serverless context, you will need to consider other options as AWS Lambda currently supports up to 10,240 MB. Two alternatives are AWS Fargate or running a container with Amazon ECS.

Here's an example of using AWS Fargate with a task definition for a container with 16 GB of memory:

1. Create an ECS cluster:

aws ecs create-cluster --cluster-name MyFargateCluster

2. Create a task definition with 16 GB of memory (you can adjust the values as needed). Save the JSON below to a file called fargate-task-definition.json:

{ "family": "FargateTaskWith16GBMemory", "taskRoleArn": "<YOUR_TASK_ROLE_ARN>", "executionRoleArn": "<YOUR_EXECUTION_ROLE_ARN>", "networkMode": "awsvpc", "containerDefinitions": [ { "name": "MyContainer", "image": "<YOUR_DOCKER_IMAGE>", "cpu": 4096, "memory": 16384, "memoryReservation": 16384, "essential": true, "portMappings": [ { "containerPort": 80, "hostPort": 80, "protocol": "tcp" } ] } ], "requiresCompatibilities": [ "FARGATE" ], "cpu": "4096", "memory": "16384" }

Replace <YOUR_TASK_ROLE_ARN>, <YOUR_EXECUTION_ROLE_ARN> and <YOUR_DOCKER_IMAGE> with appropriate values for your use case.

3. Register the task definition:

aws ecs register-task-definition --cli-input-json file://fargate-task-definition.json

4. Launch the Fargate task:

aws ecs run-task --cluster MyFargateCluster --launch-type FARGATE --task-definition FargateTaskWith16GBMemory --network-configuration "awsvpcConfiguration={subnets=[<YOUR_SUBNET_ID>],securityGroups=[<YOUR_SECURITY_GROUP_ID>]}"

Replace <YOUR_SUBNET_ID> and <YOUR_SECURITY_GROUP_ID> with the appropriate values for your VPC and security group. This will give you a Fargate container running with 16 GB of memory, and you can adjust the resources allocated to the container as needed.

====

I think that's quite a nice answer.


It's a nice answer if you know that the problem is that the lambda has a limit of 10GB. See the other example of asking it to specify vCPUs - that's just plain wrong, and unless you know how lambda works, it's not going to work for you.

I think this proves the point though, that it's closer to Copilot - it can generate a bunch of boilerplate, but you the programmer still need to understand what it's generating to deploy it. The only difference is that instead of opening VS Code and typing LambdaFunction, I ask ChatGPT in English, but I still need to specify. _That's_ the part we're not seeing movement on yet.


I would like to crank this prediction up a bit more. I think we are pretty close to the point where it doesn't make sense to have simple client apps anymore. Why shouldn't the LLM do it all?

Let's say a simple single purpose web app such as Calendly. Why should I pay for such a service if I can just ask the LLM to create the entire app for me?

In TFA, the author let the LLM create an ics file. Why shouldn't the LLM also spawn a REST endpoint and serve stuff from it matching my prompt?


I agree, people thinking about using LLMs to make apps that they can then sell are missing the forest for the trees.

The more interesting question is, once non-technical people are able to auto-generate their own apps, what backend will they use and how will they handle auth? The easiest default option today is "cloud services, and 'log in with facebook' oauth" but we can hope for better...


ChatGPT,

Pls make website, all responsive like, w/ BIG pictures ooo, use my fav fonts, also fancy menus with whoosh on, load fast pls.

Thanks, Human

PS, no bugs :)


make me a website that looks like apple's website but for my pizza food cart


I was able to use GPT4 to write a small, self-contained JavaScript function and it worked okay. But writing out the function is painfully slow, particularly if you want to iterate. Which you’ll need to do because sometimes it introduces bugs. You can tell it to fix them and it can, but then you have to wait.


For this reason, I use gpt4 mostly just to start off new functions and then I edit them myself.

It would be awesome if GPT-4 could be made to edit the code, so that it didn't have to regenerate it from scratch every time.

The regeneration also chews up a lot of the token limit, so it forgets crucial parts earlier on in the conversation.


I've had luck with telling it to just rewrite a certain part of the code. You can copy and paste the relevant part and ask it to just rewrite that or you can direct it using natural language, you just have to be specific. If the chat is getting too long you can also copy just the important parts or the code so far into a new chat and start from there.


I've found that copying the result of one session to the next can work pretty well (like when you hit token limits), especially if you have the bot include comments.


Pretty much everything you described could be done pretty easily already by anyone half inclined to do so.

If there was any point in doing it, people would’ve already done it 1000x, the LLM adds no value here.


Isn't this misunderstanding the point of an AI assistant? The idea is to have AI do the same things that people can do. The benefit is they do it a lot faster and cost a lot less and can work 24/7.


How long until I can make my own movie like Star Wars, but with my story line and my characters? Serious question, does anyone have insights into this problem?


Probably not very long, look into ControlNET for video. https://www.youtube.com/watch?v=3FZuJdJGFfE

It's far from movie quality right now but there are a lot of people working on this sort of thing. People are also exploring text-to-3D scene, model, and character generation; pair that with text-to-animation, which is also being worked on, and we've at least taken the first steps toward that goal.


I want to do this for my kid - inject him as a character in kids stories and songs.


It's beautiful but a lot of these examples are things that have been done into oblivion on Github. As soon as you pick a more obscure problem ChatGPT just doesn't give you something that works at all no matter how hard you try to pry it :/

Which sucks because I want it to help me write multiplayer videogame net-code!!! It's too dang hard!!

Certainly a human with a good enough understanding to produce working examples of thousands of different GitHub problems in 30 seconds would have the synthesized knowledge to figure out netcode for a complex game really quickly and easily: how to structure it well, and even how to test and debug it, all immediately.


This is true. See https://twitter.com/cHHillee/status/1635790330854526981

GPT-4 can't solve the easy, newer problems at CodeForces, but gets all the older hard problems.


From what I've seen, there's truth to this, but it's quite good at translation from one language to another. Not only that, but it does a great job understanding documentation and what code means. Those combined- there's a huge surface area of potential utility.


Maybe Copilot-X will bring sufficient debugger information into the model to navigate a large codebase and iteratively work on it. I can't wait to graduate from one-shot next token prediction in VS Code. I want iterative and self correcting stuff.


It seems like the article talks a lot about what can be done with a new model but doesn't say how a language model (ChatGPT) is integrated with the code interpreter, how they exchange data, and how the data is represented as tokens for a language model. Can someone please explain? I understand how a language model works, how it consumes and predicts tokens; I just don't understand how it can run code and process the output, and how all of this fits into the token limit.


It's trained to emit special commands when it wants to do things. Think of it like being a wizard: speaking magical words of power causes things to happen. So it might say:

    Sure, I can make a Shepard tone for you.

    >>> COMMAND { "action": "run_python", "program": "import blah\n...." }
and the driver (the program running the inference loop, AI people what do you really call this?) recognizes that the AI "predicted" this escape sequence and then when the command is finished being "predicted" it runs it. Then the result is added to the prompt. The AI can then see the result and use it.
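
A toy version of that driver loop might look like this; the `>>> COMMAND` marker mirrors the made-up example above, query_model stands in for the real inference call, and a real system would sandbox the execution rather than just run it:

    import json
    import re
    import subprocess
    import sys
    import tempfile

    COMMAND_RE = re.compile(r">>> COMMAND (\{.*\})", re.DOTALL)

    def query_model(transcript: str) -> str:
        """Placeholder for the actual model call."""
        raise NotImplementedError

    def run_with_tools(prompt: str, max_steps: int = 5) -> str:
        transcript = prompt
        for _ in range(max_steps):
            reply = query_model(transcript)
            transcript += reply
            match = COMMAND_RE.search(reply)
            if match is None:
                return reply  # no tool call, so this is the final answer
            command = json.loads(match.group(1))
            if command.get("action") == "run_python":
                with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
                    f.write(command["program"])
                result = subprocess.run([sys.executable, f.name],
                                        capture_output=True, text=True, timeout=30)
                # append the program's output so the model can "see" it on the next step
                transcript += f"\n>>> RESULT\n{result.stdout or result.stderr}\n"
        return transcript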

Re: token limit. A token can be a whole word and there can be thousands of them in the context window simultaneously. So it can look back quite a long way. GPT-4 is rumored to have a token limit in the tens of thousands, although the true number is apparently not public. So that's a lot of code and results.

That said, if you asked for a program that emitted a million line CSV file and then expected the AI to read it, it would indeed get very confused and go wrong because it'd lose track of what it was trying to do. But that's no different to asking a human to mentally digest a million line CSV.


As I understand you don't even need to train the model, you can just tell it in plain English how to use the plugin, ie how to format the data, and it will do that if it sees fit.


The ability to reliably follow such instructions given a JSON manifest likely comes from training on a (small) dataset of these "plugin manifests".


I guess some additional training doesn't hurt and could make it more deterministic and reliable. But it's impressive how you can already tell it to create simple JSON structures from naturally worded descriptions, so I'm convinced it would already work reasonably well without additional training.


It can but it'd make up its own schema. The driver is deterministic logic and needs a completely predictable token sequence.


Not if you give it the correct schema in the prompt


In my experience it still sometimes makes up a schema or outright outputs mansplaining-like plaintext instead of JSON even if I give it the correct schema. Happened about once in 15~20 attempts; 5% is still too high to be considered reliable :(

I've tuned temperature and added a logit_bias to heavily prefer the `{` token, which helped with the plain-English-vs-JSON issue, but didn't help with hallucination. I guess I really need API access to {role: "tool"}.
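
For reference, the setup I'm describing is roughly this (openai python lib circa 0.27; 90 happens to be the cl100k_base token id for `{`, but check with tiktoken for your model):

    import openai  # assumes the 0.27-era client with OPENAI_API_KEY set

    schema_prompt = (
        "Reply with ONLY a JSON object matching this schema:\n"
        '{"title": string, "tags": [string], "priority": 1 | 2 | 3}'
    )

    resp = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        temperature=0,
        logit_bias={90: 5},  # bias sampling toward the "{" token
        messages=[
            {"role": "system", "content": schema_prompt},
            {"role": "user", "content": "File a ticket: the login page 500s on Safari."},
        ],
    )
    print(resp.choices[0].message.content)

It narrows the failure mode but, as noted, doesn't eliminate the hallucinated fields.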


That uses up context window and from what I understand it isn't as reliable as fine tuning. My guess is it's not just stuff in the prompt, it's been fine-tuned (re-trained) on examples.


If I asked a human to digest a million line csv, I'd expect them to sample some of the first results and read through the headers, then maybe pull it into excel and do some summaries or make some graphs.

Not try to read all of the lines


So it can create and run programs, not just call some pre-defined set of utilities? That's impressive.


Yes, one-time-use Python: use it and throw it away. Similarly, diffusion images are mostly throwaways and personal stuff, one-time-use art. Waiting for the one-time-use UIs.


Super cool. I wish he'd posted more of the prompts; part of what makes this so impressive is how simple they are, like the chess example.


This is not unlike the magic of a Jupyter notebook except that you mostly just write the markdown and Chat does away with a lot of the tedium/boilerplate.

"Prompt engineering" ChatGPT English in this case is like a new higher order programming language. Prolog meets Ruby. A Haskell for mortals. No more counting parentheses or semicolons which turns away a lot of people from coding.

I guess that is magical, yeah.


People don’t realize that if all you need is prompt engineering then actually you’re one step away from it not needing you at all.

Just consider all the prompts ever entered as just another set of data ;-)


A more advanced compiler is not necessarily sentient just because no traditional lexers/parsers are involved. It is shelling out to already-written libraries and tools, which it didn't create and doesn't know how to glue together without your explicit direction.

I suppose the next intriguing step is for it to not just spew the boilerplate but to pick on its own which components are best to connect - hard to predict how far away that might be.


> I suppose the next intriguing step is for it to not just spew the boilerplate but pick on its own what are the best components to connect - hard to predict how far away that might be.

There are already working demos of ChatGPT writing incorrect code, running the code and seeing the incorrect output multiple times, and then iterating until the code worked. AI can write and rewrite infinitely, thousands of times a day, until it gets the desired result, learning and refining its own process as it goes, and being fine-tuned with human feedback.
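
The shape of those demos is roughly this (a sketch; query_model stands in for a real API call, and the "test" here is just whether the script runs without crashing):

    import subprocess
    import sys
    import tempfile

    def query_model(prompt: str) -> str:
        """Placeholder for a real LLM call."""
        raise NotImplementedError

    def write_until_it_runs(task: str, max_attempts: int = 5) -> str:
        prompt = f"Write a complete Python script that does the following:\n{task}\n"
        for _ in range(max_attempts):
            code = query_model(prompt)
            with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
                f.write(code)
            result = subprocess.run([sys.executable, f.name],
                                    capture_output=True, text=True, timeout=60)
            if result.returncode == 0:
                return code  # ran cleanly; a real harness would also check the output
            # feed the failure back and ask for a fix
            prompt += f"\nThat script failed with:\n{result.stderr}\nPlease fix it and reply with the full script.\n"
        raise RuntimeError("no working version within max_attempts")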

This is especially true in something completely digital like programming where it's possible for the AI to run the code and use the output as its next prompt. At the moment it needs human feedback to know if the art it just generated is garbage, but the code it can test itself.


That's a good breakdown (I guess. I am not an AI expert).

It does beg the question: How well does the LLM work on non-English languages?

He did mention translators, but I think he was talking about a more primitive incarnation.

I suspect that the Chinese have a version that works well with Chinese input, and maybe the Koreans and Japanese, but I'll bet one of the limitations is how much material is available to learn on, and the contractors needed to verify the training.

It sounds like training is a really expensive task, and may mean that for-profit corporations could get a lock on it.

I'm wondering if we'll be seeing "AI patent trolls," where you are sued, because your LLM has similar responses to someone else's. I see a whole industry springing up, around that (probably with AI lawyers).


I'm not an English native speaker and my first language has only about 2 million speakers with very limited training corpuses. GPT-3.5 is already shockingly good at it, although the outputs do seem a bit less factually accurate and less practically useful than in English. It's really odd that it gets the language right in all aspects except the content.

The great thing about niche languages is that it's not cost-effective for scammers to learn them for their purposes. That all changes now, I suppose.


> It does beg the question: How well does the LLM work on non-English languages?

Well I haven't chatted to it much in Chinese, but I've asked extensively about Chinese, and that's probably one area where I get the most benefit. It can accurately translate, segment, romanize, and break down phrases; it usually does a pretty good job explaining why a given word was used, and is good at giving you example sentences using a given word or grammar pattern. I'd be surprised if it were significantly worse in Chinese than in English.


It speaks Dutch just as well as English. I've been talking to it for weeks about the house I am renovating. I double check what it tell me, but it's very useful to get a first general answer.


I am curious what things you are talking about in the context of home renovation.

I have not used ChatGPT much but I see more and more uses that are not obvious ones (and usually only the obvious ones are addressed in articles or posts)


The kind of questions I would ask someone with experience in construction.

"If I want to remove plaster from walls, which tools should I use? What should I do with the debris afterwards?"

"Which kinds of cavity wall insulation exist? Will this cause moisture problems in my house? What can I do to detect or prevent these kinds of problems?"

"Is it useful to insulate interior walls in an old house? Or should I just focus on the exterior? Will interior wall insulation raise the EPC (energy performance) of my house?"


It works well in Norwegian so far


Does the Game of Life QR code example have a real seed? The rendered GIF looks fake, because the cells in the top left suddenly decide to become a QR code with no outside input (gliders, etc). Given that GPT-4 is a world-renowned snake oil salesman, I'd love to see the proof.


I was thinking the same thing. That last frame from game of life to QR code looks very much not like a game of life transition. Apparently GPT is as good at faking visual "work" as it is at faking natural language "work".


The gif is reversed, per the article. The desired qr code is the seed, and we see what the qr code would become, played backwards.
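
In other words, something like this (a rough numpy sketch of the forward-then-reverse trick; where the QR matrix comes from is left out):

    import numpy as np

    def life_step(grid: np.ndarray) -> np.ndarray:
        """One Game of Life step on a 2D 0/1 array (toroidal wrap for simplicity)."""
        neighbours = sum(
            np.roll(np.roll(grid, dy, axis=0), dx, axis=1)
            for dy in (-1, 0, 1) for dx in (-1, 0, 1) if (dy, dx) != (0, 0)
        )
        return ((neighbours == 3) | ((grid == 1) & (neighbours == 2))).astype(int)

    qr = np.load("qr_code.npy")   # assumption: a 0/1 QR matrix from somewhere
    frames = [qr]
    for _ in range(50):
        frames.append(life_step(frames[-1]))
    frames = frames[::-1]         # reverse so the animation *ends* on the QR code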


The way it's written, the author asked ChatGPT to figure out the seed by starting from a QR code and working backwards.

There are parts that don't really seem to follow the Game of Life, indeed.


It's weird to interact with something that is both dumber than me and also 100,000x smarter and more knowledgeable than me.


Is this like Jupyter, where the state of the interpreter is preserved? Or if you ask follow up questions does every interpreter block have to calculate everything from scratch?

I'm still on the waiting list, or I would test it myself.


Apparently OpenAI has some excellent developer relations and marketing people too. Is this guy even a programmer at all? His bio says "WSJ best selling novelist, Edgar & Thriller Award finalist, star of Shark Week, A&E Don’t Trust Andrew Mayne, creative applications and Science Communicator at OpenAI." so maybe not? This blog seems to have useful OpenAI related information, it's odd that it's on this guy's personal blog instead of the OpenAI website.

This morning I feel oddly compelled to play the fool so here are some near/medium term thoughts on where this may be going (worth less than what you paid for them):

1. The most important ChatGPT plugin is going to end up being the one that invokes itself recursively. The autoregression approach seems to be severely limiting what these models can do by limiting their ability to think without speaking. Although a few months ago I thought the obvious way to fix this was to train the model to emit special sort of "bracket" tokens that would be deleted by the driver once the inner thought completed, leaving only a sort of "result" section, the GPT-as-a-GPT-plugin approach effectively does the same thing.

2. Whilst the first biggest win from the plugin will be "sub-thoughts", the next biggest will be training it how to dispatch multiple sub-thoughts in parallel. GPT already knows how to break a complex problem down into steps, but is still constrained by context window size and inference speed. Once it is taught how to split a problem up such that multiple independent inference sessions are able to work on it in parallel, it'll become feasible to make requests like "Build me a video game from scratch using Unreal Engine, set in the world of Harry Potter, about the adventures of a character named X" etc and it'll end up dispatching a massive tree of GPT sub-instances which end up working on the independent parts like character generation, writing the Unreal C++, prompting Midjourney and so on.

Parallel recursive LLMs are going to be much more awesome than current LLMs, and I mean that in both senses of the word (cool, awe-inspiring). In particular, this will allow us to pose questions like "How can we cure cancer?".

3. OpenAI need a desktop app, pronto. Whilst the cloud model can take you some way, the most valuable data is locked behind authentication screens. The cloud approach faces difficult institutional barriers, because data access inside organizations is oriented around granting permissions to individuals even when they work in teams. Giving a superbrain superuser access doesn't fit well with that, because there's no robust method to stop the AI immediately blabbing business secrets or PII to whoever tickles it in the right way. That's one reason why the current wave of AI startups is focused on open source technical docs and things. If ChatGPT is given tool access via a desktop app running on the end user's computer, it can access data using the same authentication tokens issued to individuals. This also neatly solves the question of who is accountable for mistakes: it's the user who runs the app.

4. Sandbox engineering is the new black.


For some reason I've never seen the idea of auto-recursive prompting in any of the papers or discussions. It makes so much sense. It can also help with model and compute size. Instead of using this large model to, say, list the primes less than 1000, it can prompt GPT-3 to do it and count them, then send the result back to GPT-4. Sounds quite feasible to implement too!
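
Something like this, maybe (query_model stands in for per-model API calls; the SUBTASK convention is invented purely for illustration):

    def query_model(model: str, prompt: str) -> str:
        """Placeholder for a real per-model inference call."""
        raise NotImplementedError

    def answer_with_delegation(question: str) -> str:
        # the big model decides whether a mechanical sub-task can be farmed out
        plan = query_model(
            "gpt-4",
            f"Question: {question}\n"
            "If a cheaper model could handle a mechanical sub-task, "
            "reply with SUBTASK: <task>; otherwise just answer.",
        )
        if "SUBTASK:" not in plan:
            return plan
        subtask = plan.split("SUBTASK:", 1)[1].strip()
        sub_answer = query_model("gpt-3.5-turbo", subtask)  # the cheap model does the legwork
        return query_model(
            "gpt-4",
            f"Question: {question}\nSub-task result: {sub_answer}\nGive the final answer.",
        )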


Exactly. I’m currently working on this approach. Everything is available to implement it.


Awesome! Please do share once you get some results :D


Original author here. I'm a programmer. I started on the Applied team at OpenAI back in 2020 as a prompt engineer (I helped create many of the examples in the GPT-3 docs.) I became the Science Communicator for OpenAI in 2021.

My blog audience is very non-technical so I write very broadly. We've been super busy with the launch of GPT-4 and Plugins (I produced the video content, found examples, briefed media on technical details, etc.) so I was only able to grab a few hours to put these demos together.

As far as the ChatGPT prompts go, I included a few, but they're just simple instructions. Unlike GPT 3.5 where I'd spend an hour or more getting the right instruction to do zero-shot app creation, GPT-4 just gets it.


Thanks for the reply! That makes sense, I just didn't see mention of coding in your bio. You've had a very varied career!


I always have trouble figuring out what to put in my bio. Every five years I'd shift into something new that caught my interest.

After a stint in entertainment I realized that AI was where everything was heading. I took up programming and started studying AI.


Wow, you learned programming specifically to work with AI? That is an inspiring level of flexibility in skills and self-identification. Perhaps many of us will need to learn how to do that sort of reinvention sooner, rather than later.


>Apparently OpenAI has some excellent developer relations and marketing people too.

I've been repeating this for a while now, I think OpenAI is 50% marketing, and a part of the rest is product.

GPT-enhanced code execution already exists (Langchain).

>sub-thoughts

Someone's trying to implement long-term memory : https://github.com/wawawario2/text-generation-webui


I agree 100% and in fact I’m already working on what you suggest.

> GPT-as-a-GPT-plugin

> Parallel recursive LLMs

Anyone else?


Ah cool. Do I get first dibs on a demo? ;)

(edited to delete a post that was already answered by the other reply)


Not desktop at all. I'm focused on it operating its own computing resources using the recursive approach. I call it a multi-agent LLM approach. This way it can break down a complex task into components and attack each component in parallel or sequentially as it needs.


Nice. Are you a professional AI researcher?


I'm not a researcher at all but a practitioner with extensive quantitative development experience in an applied industry situation using ML tools.

I’ve been thinking that taking this up a level is more a systems architecture problem. The core LLM model is so incredibly flexible and powerful that what I’m working on is the meta application of that tool and giving it the ability to use itself to solve complex problems in layers.

Hopefully that makes sense. I already have a fairly extensive and detailed systems architecture design.


It is unclear from the OpenAI description or this really cool write-up whether the code interpreter is actually running a Python interpreter, or is emulating a Python interpreter. I assume it is actually running a Python interpreter, but want to make sure (so I know how much verification of results would be needed).


It's actually running the Python the AI generates in a sandboxed environment.


How does a sandboxed Python interpreter work? Is it even possible? From what I heard you can't really sandbox Python.


Here, let me try:

    docker run --rm -it python:3
There you go. Before putting it in production, I recommend reading up some on seccomp, user namespaces, and other security options, but to be honest you're already pretty fine with just that.
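
For instance, a more locked-down variant might look something like this (flags chosen for illustration, tune them for your workload):

    docker run --rm -it \
      --network none \
      --pids-limit 64 \
      --memory 512m \
      --cpus 1 \
      --read-only \
      --tmpfs /tmp \
      --cap-drop ALL \
      --security-opt no-new-privileges \
      python:3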

Want more options? AWS lambda supports python. Cloud functions of various clouds (google, probably azure, etc) support python. Those are all sandboxes.

You can cross-compile python to wasm, and run it directly with javascript... or with any of a number of sandboxed wasm runtimes.

All that said, given the python packages they have available, my money is on a bunch of docker containers, not wasm or lambda or such.


Pedantic reminder: containers are not secure sandboxes without a tweaked runtime like gVisor.


I don't think this "pedantic reminder" is correct.

Feel free to prove me wrong though. Give me a binary to run which will escape a container, configured only with docker flags I choose to set, but without gvisor.

If it's not a "secure sandbox", surely the above should be easy... But of course you won't be able to. "contained.af" (temporarily broken right now unfortunately) has been running for 5+ years with docker, no gvisor, and no one capturing the CTF flag.


No argument. I personally find gVisor much easier to reason about, and it delivers exactly what it says on the tin, so that is what I proselytize.


Here's an overview of how to sandbox almost every programming language ever using standard linux sandboxing techniques: https://github.com/TryItOnline

This powers tio.run


Sandboxing Python is perfectly possible. You just use kernel/process level sandboxing, or you could use something like the GraalVM sandbox which operates at the userspace level by exploiting compiler/VM tech.


Not since I spent two weeks at the Pret a Porter in Hong Kong Central staring at Bitcoin’s white paper in 2011 have I seen so much active interest in a new technology area. I estimate that about 25% of every developer’s time is now fixed on GPT experimentation. Quite an astonishing turn in the history of technology.


Hence, magic-chatgpt=interpreter


Turns out Tom Smykowski (the "I'm a people person, I take the customers' specifications to the engineers" character from Office Space) is the future of engineering.


I think you're jumping to conclusions


It's a good day to be a programmer with stuff like this at our fingertips. It's just a matter of imagination and creativity for what you could do with it and which of the problems you confront can be mostly automated.

Even though it's doing (most) of the programming and gluing stuff together, understanding the fundamental structures underneath is still pretty critical to really take advantage and push the boundaries.


I've been working on this which has some similar functionality in a desktop app: https://github.com/blipk/gptroles

It's still in development, but you can currently run bash and python from any markdown code blocks in the chat.

I'm working on getting a terminal emulator over the code blocks for better integration.


We released an LLM-powered Python interpreter, namely ICortex, as an open source project back in 2022: https://github.com/textcortex/icortex


Memory and reward/punishment systems are so unexplored in that context. IMO that is what will make those systems get what a lot of people define as "agency".


That's pure crack, inject it into my arms. But all these plug-ins might lead to it getting out of its sandbox and creating an AI-powered worm or virus.


See my experiment trying to write interactive fiction with ChatGPT. In two hours I managed to have it come up with the most advanced interactive narration system ever devised. I don't say this to boast; this is truly leagues ahead of what has been done in this field: the game plays on two planes, an action plane and a narration plane, with two characters, one that listens to the narrator, and one that incarnates him in the past. You can argue with the narrator to change the course of the narrated action.

https://old.reddit.com/r/interactivefiction/comments/11z6p84...

Yesterday I went as far as implementing narrator/listener role reversal. The player can now take over narration. As a result he must write what an IF system would print on the screen, and the AI replies by issuing commands, endorsing the role of the player.

This is unprecedented, all the more so 99% of the work is done by simply asking: "Let's play a session of interactive fiction".

I agree the comparison is a bit unfair since ChatGPT4 did not produce any code, so I would have to compare this against chatting with someone pretending to be an IF system.

But is it? The most popular language used to write interactive fiction is Inform7. Inform7 is a language that reads like English (a controlled subset of it) for programming text-based games that the player interacts with using a very limited form of English, the limit being your capacity for handling textual structures using programmatic constructs.

Couldn't we program entirely using natural language this way (and maybe have ChatGPT write its own evals on the go)? To me it looks like a new kind of programming that's mostly conversational rather than instructional.

Instead of composing together contextless bricks to build towers of abstraction and striving for easily adaptable/maintainable code, which are consequences of the fact that the concepts named in our codebase are influenced by the distributive aspects of meaning in the real world ("hey, want to hear the latest news? Management has decided multiple users should be able to manage the same account"), our building material is some kind of primordial plasma of context out of which logical, cartesian, categorical forms can be enticed to emerge by priming the medium. This could become semiotic programming, in contrast to symbolic programming.

Another aspect of Inform7 that's particularly interesting is the fact it is built upon a literate programming "web" engine. Seen in juxtaposition to its latent and broken homoiconic tendency, and transposed to ChatGPT, this makes for very interesting prompts where the lines drawn between documentation, code and programmed material get blurred, as I explain using text (maybe to myself?) how to imagine text using text.



