Native JSON Output from GPT-4 (yonom.substack.com)
604 points by yonom on June 14, 2023 | 248 comments



I'm concerned that OpenAI's example documentation suggests using this to A) construct SQL queries and B) summarize emails, but that their example code doesn't include clear hooks for human validation before actions are called.

For a recipe builder it's not so big a deal, but I really worry how eager people are to remove human review from these steps. It gets rid of a very important mechanism for reducing the risks of prompt injection.

The top comment here suggests wiring this up to allow GPT-4 to recursively call itself. Meanwhile, some of the best advice I've seen from security professionals on secure LLM app development is to isolate queries from each other whenever possible, to reduce the potential damage that a compromised agent can do before its "memory" is wiped.

There are definitely ways to use this safely, and there are definitely some pretty powerful apps you could build on top of this without much risk. LLMs as a transformation layer for trusted input is a good use-case. But are devs going to stick with that? Is it going to be used safely? Do devs understand any of the risks or how to mitigate them in the first place?

3rd-party plugins on ChatGPT have repeatedly been vulnerable in the real world, I'm worried about what mistakes developers are going to make now that they're actively encouraged to treat GPT as even more of a low-level data layer. Especially since OpenAI's documentation on how to build secure apps is mostly pretty bad, and they don't seem to be spending much time or effort educating developers/partners on how to approach LLM security.


In my opinion the only way to use it safely is to ensure your AI only has access to data that the end user already has access to.

At that point, prompt injection is no longer an issue - because the AI doesn't need to hide anything.

Giving GPT access to your entire database, but telling it not to reveal certain bits, is never going to work. There will always be side channel vulnerabilities in those systems.


> e.g. define a function called extract_data(name: string, birthday: string), or sql_query(query: string)

This section in OpenAI's product announcement really irritates me because it's so obvious that the model should have access to a subset of API calls that themselves fetch the data, as opposed to giving the model raw access to SQL. You could have the same capabilities while eliminating a huge amount of risk. And OpenAI just sticks this right in the announcement, they're encouraging it.

When I'm building a completely isolated backend with just regular code, I still usually put a data access layer in front of the database in most cases. I still don't want my REST endpoints directly building SQL queries or directly accessing the database, and that's without an LLM in the loop at all. It's just safer.

It's the same idea as using `innerHTML`; in general it's better when possible to have those kinds of calls extremely isolated and to go through functions that constrain what can go wrong. But no, OpenAI just straight up telling developers to do the wrong things and to give GPT unrestricted database access.
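
For example, instead of a raw sql_query(query: string) function, you'd expose a handful of narrow wrappers. A rough sketch in the functions format (names and fields here are made up, and `db` is assumed to be a parameterized DB handle):

    functions = [{
        "name": "get_recent_orders",
        "description": "Fetch the current user's most recent orders",
        "parameters": {
            "type": "object",
            "properties": {
                "limit": {"type": "integer", "description": "How many orders, max 20"}
            },
            "required": ["limit"],
        },
    }]

    def get_recent_orders(user_id, limit):
        # Parameterized query runs server-side; the model never sees or writes SQL.
        return db.execute(
            "SELECT id, total FROM orders WHERE user_id = ? "
            "ORDER BY created_at DESC LIMIT ?",
            (user_id, min(limit, 20)),
        ).fetchall()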


SQL doesn’t necessarily have to mean full database access.

I know it’s pretty common to have apps connect to a database with a db user with full access to do anything, but that’s definitely not the only way.

If you’re interested in being safer, it’s worth learning the security features built in to your database.


> If you’re interested in being safer, it’s worth learning the security features built in to your database.

The problem isn't that there's no way to be safe, the problem is that OpenAI's documentation does not do anything to discourage developers from implementing this in the most dangerous way possible. Like you suggest, the most common way this will be implemented is via a db user with full access to do anything.

Developers would be far more likely to implement this safely if they were discouraged from using direct SQL queries. Developers who know how to safely add SQL queries will still know how to do that -- but developers who are copying and pasting code or thinking naively "can't I just feed my schema into GPT" should be pushed towards an implementation that's harder to mess up.


It's hard for me to believe openai's documentation will have any effect on developers who write or copy-and-paste data access code without regard to security, no matter what it says.

If you provide an API or other external access to app data and the app data contains anything not everyone should be able to access freely then your API has to implement some kind of access control. It really doesn't matter if your API is SQL-based, REST-based, or whatever.

A SQL-based API isn't inherently less secure than a non-SQL-based one if you implement access control, and a non-SQL-based API isn't inherently more secure than a SQL-based one if you don't implement access control. The SQL-ness of an API doesn't change the security picture.


> If you provide an API or other external access to app data and the app data contains anything not everyone should be able to access freely then your API has to implement some kind of access control. It really doesn't matter if your API is SQL-based, REST-based, or whatever.

I don't think that's the way developers are going to interact with GPT at all, I don't think they're looking at this as if it's external access. OpenAI's documentation makes it feel like a system library or dependency, even though it's clearly not.

I'll go out on a limb, I suspect a pretty sizable chunk (if not an outright majority) of the devs who try to build on this will not be thinking about the fact that they need access controls at all.

> A SQL-based API isn't inherently less secure than a non-SQL-based one if you implement access control, and a non-SQL-based API isn't inherently more secure than a SQL-based one if you don't implement access control. The SQL-ness of an API doesn't change the security picture.

I'm not sure I agree with this either. If I see a dev exposing direct query access to a database, my reaction is going to be very dependent on whether or not I think they're an experienced programmer already. If I know them enough to trust them, fine. Otherwise, my assumption is that they're probably doing something dangerous. I think the access controls that are built into SQL are a lot easier to foot-gun, I generally advise devs to build wrappers because I think it's generally harder to mess them up. Opinion me :shrug:

Regardless, I do think the way OpenAI talks about this does matter, I do think their documentation will influence how developers use the product, so I think if they're going to talk about SQL they should be showing, in code, examples of how to implement those access controls. "We're just providing the API, if developers mess it up it's their fault" -- I don't know, good APIs and good documentation should try, when possible, to provide a "pit of success[0]" for naive developers. In particular I think that matters when talking about a market segment that is getting a lot of naive VC money thrown at it, sometimes without a lot of diligence, and where those security risks may end up impacting regular people.

[0]: https://blog.codinghorror.com/falling-into-the-pit-of-succes...


You don't need to directly run the query it returns, you can use that query as a sub-query on a known safe set of data and let it fail if someone manages to prompt inject their way into looking at other tables/columns.

That way you can support natural language to query without sending dozens of functions (which will eat up the context window)
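
One mechanical way to get that "let it fail" behaviour, sketched with sqlite3's authorizer hook rather than a literal subquery (same spirit; `model_generated_sql` is whatever the model returned):

    import sqlite3

    ALLOWED = {("recipes", "name"), ("recipes", "calories")}  # assumed safe columns

    def authorizer(action, arg1, arg2, dbname, trigger):
        if action == sqlite3.SQLITE_SELECT:
            return sqlite3.SQLITE_OK
        if action == sqlite3.SQLITE_READ:
            return sqlite3.SQLITE_OK if (arg1, arg2) in ALLOWED else sqlite3.SQLITE_DENY
        return sqlite3.SQLITE_DENY  # coarse: no writes, no anything else

    conn = sqlite3.connect("file:app.db?mode=ro", uri=True)  # read-only connection
    conn.set_authorizer(authorizer)
    try:
        rows = conn.execute(model_generated_sql).fetchall()
    except sqlite3.DatabaseError:
        rows = None  # query touched something it shouldn't; fail instead of leaking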


You can do that (I wouldn't advise it, there are still problems that are better solved by building explicit functions; but you can use subqueries and it would be safer) -- but most developers won't. They'll run the query directly. Most developers also will not execute it as a readonly query, they'll give the LLM write access to the database.

If OpenAI doesn't know that, then I don't know what to say, they haven't spent enough time writing documentation for general users.


You can't advise for or against it without a well defined problem: for some cases explicit functions won't even be an option.

Defining basic CRUD functions for a few basic entities will take a ton of tokens in schema definitions, and still suffers from injection if you want to support querying on data that wasn't well defined a priori, which is a problem I've worked on.

Overall if this was one of their example projects I'd be disappointed, but it was a snippet in a release note. So far their actual example projects have done a fair job showing where guardrails in production systems are needed, I wouldn't over-index on this.


> You can't advise for or against it without a well defined problem: for some cases explicit functions won't even be an option.

On average I think I can. I mean, I can't know without the exact problem specifications whether or not a developer should use `innerHTML`/`eval`. But I can offer general advice against it, even though both can be used securely. I feel pretty safe saying that exposing SQL access directly in an API will usually lead to more fragile infrastructure. There are plenty of exceptions of course, but there are exceptions to pretty much all programming advice. I don't think it's good for it to be one of the first examples they bring up for how to use the API.

----

> Overall if this was one of their example projects I'd be disappointed

I have similar complaints about their example code. They include the comment:

> # Note: the JSON response from the model may not be valid JSON

But they don't actually do schema validation here or check anything. Their example project isn't fit to deploy. My thought on this is that if every response for practically every project needs to have schema validation (and I would strongly advise doing schema validation on every response), then the sample code should have schema validation in it. Their example project should be something that could be almost copy-and-pasted.

If that makes the code sample longer, well... that is the minimum complexity to build an app on this. The sample code should reflect that.
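
For reference, the missing validation step is only a few lines, e.g. with the jsonschema package (a sketch; `schema` is the same parameters object you sent in the function definition):

    import json
    from jsonschema import ValidationError, validate

    def parse_function_args(message, schema):
        try:
            args = json.loads(message["function_call"]["arguments"])
            validate(instance=args, schema=schema)
            return args
        except (KeyError, json.JSONDecodeError, ValidationError) as err:
            # reject, retry, or fall back -- but never act on unvalidated arguments
            raise ValueError(f"model returned invalid arguments: {err}")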

> and still suffers from injection if you want to support querying on data that wasn't well defined a-priori

This is a really good point. My response would be that they should be expanding on this as well. I'm really frustrated that OpenAI's documentation provides (imo) basically no really practical/great security advice other than "hey, this problem exists, make sure you deal with it." But it seems to me like they're already falling over on providing good documentation before they even get to the point where they can talk seriously about bigger security decisions.


> your AI only has access to data that the end user already has access to.

That doesn't work for the same reason you mention with a DB ... any data source is vulnerable to indirect injection attacks. If you open the door to ANY data source, this is a factor, including ones under the sole "control" of the user.


>At that point, prompt injection is no-longer an issue [...]

As far as input goes, yes. But I am more worried about agents that can take actions that affect the outside world, like sending emails on your behalf.


I was going to say “I look forward to it and think it’s hilarious,” but then I remembered that most victims will be people learning to code, not companies. It would really suck to suddenly lose your recipe database when you just wanted to figure out how this programming stuff worked.

Some kind of “heads up” tagline is probably a good idea, yeah.


I think the victims will mostly be the users of the software. The personal assistant that can handle your calendar and emails and all would be able to do real damage.


I don't understand why they have done this? Like, how did the conversations go when it was pointed out to them what a pretty darn bad idea it was to recommend connecting chatgpt directly to a SQL database?

I know we are supposed to assume incompetence over malice, but no one is that incompetent. They must have had the conversations, and chose to do it anyway.


Why is this unreasonable to you? I can imagine using this, just run it with read access and check the sql if the results are interesting.


Even read only. You are giving access to your data to a black box API.


If it's on Azure anyway I don't see the big deal, especially if you are an enterprise and so buying it via azure instead of directly.


Perhaps they plan on having ChatGPT make a quick copy of your database, for your convenience of course.


i think people are underestimating the potential here for agents building - it is now a lot easier for GPT4 to call other models, or itself. while i was taking notes for our emergency pod yesterday (https://www.latent.space/p/function-agents) we had this interesting debate with Simon Willison on just how many functions will be supplied to this API. Simon thinks it will be "deep" rather than "wide" - eg a few functions that do many things, rather than many functions that do few things. I think i agree.

you can now trivially make GPT4 decide whether to call itself again, or to proceed to the next stage. it feels like the first XOR circuit from which we can compose a "transistor", from which we can compose a new kind of CPU.


It was already quite easy to get GPT-4 to output json. You just append ‘reply in json with this format’ and it does a really good job.

GPT-3.5 was very haphazard though and needs extensive babysitting and reminding, so if this makes gpt3 better then it’s useful - it does have an annoying disclaimer though that ‘it may not reply with valid json’ so we’ll still have to do some sense checks on the output.

I have been using this to make a few ‘choose your own adventure’ type games and I can see there’s a TONNE of potential useful things.


> You just append ‘reply in json with this format’ and it does a really good job.

It does an ok job. Except when it doesn't. Definitely misses a lot of the time, sometimes on prompts that succeeded on previous runs.


It literally does it every time perfectly. I remember I put together an entire system that would validate the JSON against a zod schema and use reflection to fix it and it literally never gets triggered because GPT3.5-turbo always does it right the first time.


> It literally does it every time perfectly. I remember I put together an entire system that would validate the JSON against a zod schema and use reflection to fix it and it literally never gets triggered because GPT3.5-turbo always does it right the first time.

Danger! There be assumptions!!

gpt-? is a moving target and in rapid development. What it does Tuesday, which it did not do on Monday, it may well not do on Wednesday

If there is a documented method to guarantee it, it will work that way (modulo OpenAI bugs - and now Microsoft is involved....)

What we had before, what you are talking of, was observed behaviour. An assumption that what we observed in the past will continue in the future is not something to build a business on


ChatGPT moves fast. The API version doesn’t seem to change except with the model and documented API changes.


No it doesn't lol. I've seen it just randomly not use a comma after one array element, for example.


Yep. Incorrect trailing commas ad nauseam for me.


Are you saying that it returned only JSON before? I'm with the other commenters: it was wildly variable and always at least said "Here is your response", which doesn't parse well.


If you want a parsable response, have it wrap that with ```. Include an example request/response in your history. Treat any message you can’t parse as an error message.

This works well because it has a place to put any “keep in mind” noise. You can actually include that in your example.
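
Something like this (a rough sketch of the approach, not battle-tested):

    import json, re

    def parse_reply(text):
        match = re.search(r"```(?:json)?\s*(.*?)```", text, re.DOTALL)
        if not match:
            raise RuntimeError(f"model sent an error/noise instead of JSON: {text}")
        return json.loads(match.group(1))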


Yeah no


The solution that worked great for me - do not use JSON for GPT to agent communication. Use comma separated key=value, or something to that effect.

Then have another pure code layer to parse that into structured JSON.

I think it’s the JSON syntax (with curly braces) that does it in. So YAML or TOML might work just as well, but I haven’t tried that.
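
The pure-code layer ends up being tiny, e.g. (assuming flat key=value output with no commas inside values):

    import json

    def kv_to_json(reply):
        pairs = (item.split("=", 1) for item in reply.split(",") if "=" in item)
        return json.dumps({k.strip(): v.strip() for k, v in pairs})

    kv_to_json("name=Tomato soup, servings=4, vegetarian=yes")
    # '{"name": "Tomato soup", "servings": "4", "vegetarian": "yes"}'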


Coincidentally, I just published this JS library[1] over the weekend that helps prompt LLMs to return typed JSON data and validates it for you. Would love feedback on it if this is something people here are interested in. Haven’t played around with the new API yet but I think this is super exciting stuff!

[1] https://github.com/jacobsimon/prompting


Looks promising! Do you do retries when returned json is invalid? Personally, I used io-ts for parsing, and GPT seems to be able to correct itself easily when confronted with a well-formed error message.


Great idea, I was going to add basic retries but didn’t think to include the error.

Any other features you’d expect in a prompt builder like this? I’m tempted to add lots of other utility methods like classify(), summarize(), language(), etc


It's harder to form a tree with key value. I also tried the relational route. But it would always mess up the cardinality (one person should have 0 or n friends, but a person has a single birth date).


You could flatten it using namespaced keys. Eg.

    {
      parent1: { child1: value }
    }
Becomes one of the following:

    parent1/child1=value
    parent1_child1=value
    parent1.child1=value
..you get the idea.


It's also harder to stream JSON? Maybe I'm overthinking this.


even with gpt 4, it hallucinates enough that it’s not reliable, forgetting to open/close brackets and quotes. This sounds like it’d be a big improvement.


Not that it matters now but just doing something like this works 99% of the time or more with 4 and 90% with 3.5.

It is VERY IMPORTANT that you respond in valid JSON ONLY. Nothing before or after. Make sure to escape all strings. Use this format:

{“some_variable”: [describe the variable purpose]}


99% of the time is still super frustrating when it fails, if you're using it in a consumer facing app. You have to clean up the output to avoid getting an error. If it goes from 99% to 100% JSON that is a big deal for me, much simpler.


Except it says in the small print to expect invalid JSON occasionally, so you have to write your error handling code either way


If you're building an app based on LLMs that expects higher than 99% correctness from it, you are bound to fail. Workarounds for negative scenarios and retries are mandatory.


Yup. Is there a good/forgiving "drunken JSON parser" library that people like to use? Feels like it would be a useful (and separable) piece?


Honestly, I suspect asking GPT-4 to fix your JSON (in a new chat) is a good drunken JSON parser. We are only scraping the surface of what's possible with LLMs. If Token generation was free and instant we could come up with a giant schema of interacting model calls that generates 10 suggestions, iterates over them, ranks them and picks the best one, as silly as it sounds.
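
A rough sketch of that "drunken JSON parser" loop, feeding the parser error back in a fresh chat (pre-1.0 openai SDK assumed):

    import json
    import openai

    def repair_json(broken, attempts=2):
        for _ in range(attempts):
            try:
                return json.loads(broken)
            except json.JSONDecodeError as err:
                fix = openai.ChatCompletion.create(
                    model="gpt-3.5-turbo",
                    messages=[{
                        "role": "user",
                        "content": f"Fix this so it parses as JSON ({err}). "
                                   f"Reply with JSON only:\n{broken}",
                    }],
                )
                broken = fix["choices"][0]["message"]["content"]
        return json.loads(broken)  # still raises if it never got fixed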


That's hilarious... if parsing GPT's JSON fails, keep asking GPT to fix it until it parses!


It shouldn't be surprising though. If a human makes an error writing JSON, what do you do? You make them look over it again. Unless their intelligence is the bottleneck, they might just be able to fix it.


It works. Just be sure to build a good error message.


I already do this today to create domain-specific knowledge focused prompts and then have them iterate back and forth and a ‘moderator’ that chooses what goes in and what doesn’t.


Wouldn't you use traditional software to validate the JSON, then ask chatgpt to try again if it wasn't right?


In my experience, telling it "no thats wrong, try again" just gets it to be wrong in a new different way, or restate the same wrong answer slightly differently. I've had to explicitly guide it to correct answers or formats at times.


Try different phrasing, like "Did your answer follow all of the criteria?".


It forgets commas too


Nah, this was solved by most teams a while ago.


I feel like I’m taking crazy pills with the amount of people saying this is game changing.

Did they not even try asking gpt to format the output as json?


> I feel like I’m taking crazy pills....try asking gpt to format the output as json

You are taking crazy pills. Stop

gpt-? is unreliable! That is not a bug in it, it is the nature of the beast.

It is not an expert at anything except natural language, and even then it is an idiot savant


I like to define a JSON schema (https://json-schema.org/) and prompt GPT-4 to output JSON based on that schema.

This lets me specify general requirements (not just JSON structure) inline with the schema and in a very detailed and structured manner.
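
Roughly like this (a sketch; the schema and wording are just an example):

    import json

    recipe_schema = {
        "type": "object",
        "properties": {
            "title": {"type": "string", "description": "Max 8 words, no emoji"},
            "steps": {
                "type": "array",
                "items": {"type": "string"},
                "description": "Imperative voice, one action per step",
            },
        },
        "required": ["title", "steps"],
    }

    prompt = (
        "Return a recipe as JSON conforming to this JSON Schema, and treat every "
        "description field as a requirement:\n" + json.dumps(recipe_schema, indent=2)
    )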


In a production system, you don’t need easy to do most of the time, you need easy without fail.


Ok, just playing devil's advocate here. How many FAANG companies have you seen have an outage this year? What's their budget?

I think a better way to reply to the author would have been "how often does it fail"?

Every system will have outages, it's just a matter of how much money you can throw at the problem to reduce them.


If 99.995% correct looks bad to users, wait until they see 37%.


It's fine, but the article makes some good points why - less cognitive load for GPT and less tokens. I think the transistor to logic gate analogy makes sense. You can build the thing perfectly with transistors, but just use the logic gate lol.


Is there any publicly available resource to replicate your work? I would love to just find the right kind of "incantation" for gpt-3.5-t or gpt-4 to output a meaningful story arc etc.

Any examples of your work would be greatly helpful as well!


I'm not the person you're asking, but I built a site that allows you to generate fiction if you have an OpenAI API key. You can see the prompts sent in console, and it's all open source:

https://havewords.ai/


I have an open source project doing exactly this at https://www.generativestorytelling.ai/ GitHub link is on the main page!


I could not get GPT-4 to reliably not give some sort of text response, even if it was just a simple "Sure" followed by the JSON.


Pass in an agent message with "Sure here is the answer in json format:" after the user message. GPT will think it has already done the preamble and the rest of the message will start right with the json.
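
Concretely, something like this (pre-1.0 openai SDK assumed; the "agent" message is the assistant role):

    import openai

    response = openai.ChatCompletion.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": "You output JSON only."},
            {"role": "user", "content": "List three sorting algorithms with their complexity."},
            {"role": "assistant", "content": "Sure, here is the answer in JSON format:"},
        ],
    )
    print(response["choices"][0]["message"]["content"])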


Did you try using the API and providing a very clear system message followed by several examples that were pure JSON?


Yep. I even gave it a JSON schema file to use. It just wouldn't stop adding extra verbiage.


I just use a regex to select everything between the first and last curly bracket, which reliably fixes the “sure, here’s your object” problem.
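
E.g. (a sketch; assumes exactly one object in the reply):

    import json, re

    def extract_json(text):
        return json.loads(re.search(r"\{.*\}", text, re.DOTALL).group(0))

    extract_json('Sure, here\'s your object: {"ok": true} Hope that helps!')  # {'ok': True}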


Say it's a json API and may only reply with valid json without explanation.


Lol yes of course I tried that.


I've had good luck with both:

https://github.com/drorm/gish/blob/main/tasks/coding.txt

and

https://github.com/drorm/gish/blob/main/tasks/webapp.txt

With the second one, I reliably generated half a dozen apps with one command.

Not to say that it won't fail sometimes.


Combine both ? :)


Just end your request with

‘’’json

Or provide a few examples of user request and then agent response in json. Or both.


Does the ```json trick work with the chat models? Or only the earlier completion models?


Works with chat. They’re still text completion models under all that rlhf


GPT-4 was already a massive improvement on 3.5 in terms of replying consistently in a certain JSON structure - I often don't even need to give examples, just a sentence describing the format.

It's great to see they're making it even better, but where I'm currently hitting the limit still in GPT-4 for "shelling out" is about it being truly "creative" or "introspective" about "do I need to ask for clarifications" or "can I find a truly novel way around this task" type of things vs "here's a possible but half-baked sequence I'm going to follow".


It is “good enough”. Where I struggle is maintaining its memory through a longer request where multiple iterations fail or succeed and then all of a sudden its memory is exceeded and starts fresh. I wish I could store “learnings” that it could revisit.


Sounds like you want something like tree of thoughts: https://arxiv.org/abs/2305.10601


Interestingly the paper's repo starts off :

Blah Blah "...is NOT the correct implementation to replicate paper results. In fact, people have reported that his code cannot properly run, and is probably automatically generated by ChatGPT, and kyegomez has done so for other popular ML methods, while intentionally refusing to link to official implementations for his own interests"

Love a good GitHub Identity Theft Star farming ML story

But this method could have potential for a chain of function


It's interesting to think about this form of computation (LLM + function call) in terms of circuitry. It is still unclear to me however, if the sequential form of reasoning imposed by a sequence of chat messages is the right model here. LLM decoding and also more high-level "reasoning algorithms" like tree of thought are not that linear.

Ever since we started working on LMQL, the overarching vision all along was to get to a form of language model programming, where LLM calls are just the smallest primitive of the "text computer" you are running on. It will be interesting to see what kind of patterns emerge, now that the smallest primitive becomes more robust and reliable, at least in terms of the interface.


Exactly, we humans can use specialized models and traditional tool APIs and models and orchestrate the use of all these without understanding how these things work in detail.

To do accounting, GPT-4 (or future models) doesn't have to know how to calculate. All it needs to know is how to interface with tools like calculators, spreadsheets, etc. and parse their outputs. Every script, program, etc. becomes a thing that has such an API. A lot of what we humans do to solve problems is breaking down big problems into problems where we know the solution already.

Real life tool interfaces are messy and optimized for humans with their limited language and cognitive skills. Ironically, that means they are relatively easy to figure out for AI language models. Relative to human language the grammar of these tool "languages" is more regular and the syntax less ambiguous and complicated. Which is why gpt 3 and 4 are reasonably proficient with even some more obscure programming languages and in the use of various frameworks; including some very obscure ones.

Given a lot of these tools with machine accessible APIs with some sort of description or documentation, figuring out how to call these things is relatively straightforward for a language model. The rest is just coming up with a high level plan and then executing it. Which amounts to generating some sort of script that does this. As soon as you have that, that in itself becomes a tool that may be used later. So, it can get better over time. Especially once it starts incorporating feedback about the quality of its results. It would be able to run mini experiments and run its own QA on its own output as well.


"Trivial" is misleading. From OpenAI's docs and demos, the full ReAct workflow is an order of magnitude more difficult than typical ChatGPT API usage with a new set of constaints (e.g. schema definitions)

Even OpenAI's notebook demo has error handling workflows, which were actually necessary since ChatGPT returned incorrectly formatted output.


Maybe trivial isn't the right word, but it's still very straight-forward to get something basic, yet really powerful...

ReAct Setup Prompt (goal + available actions) -> Agent "ReAction" -> Parse & Execute Action -> Send Action Response (success or error) -> Agent "ReAction" -> repeat

As long as each action has proper validation and returns meaningful error messages, you don't need to even change the control flow. The agent will typically understand what went wrong, and attempt to correct it in the next "ReAction".

I've been refactoring some agents to use "functions" and so far it seems to be a HUGE improvement in reliability vs the "Return JSON matching this format" approach. Most impactful is the fact that "3.5-turbo" will now reliably return JSON (before you'd be forced to use GPT-4 for a ReAct-style agent of modest complexity).

My agents also seem to be better at following other instructions now that the noise of the response format is gone (of course it's still there, but in a way it has been specifically trained on). This could also just be a result of the improvements to the system prompt though.


For 3.5, I found it easiest to specify a simple, but parsable, format for responses and then convert that to JSON myself.

I'll have to see if the new JSON schema support is easier than what I already have in place.


The first transistors were slow, and it seems this "GPT3/4 calling itself" stuff is quite slow. GPT3/4 as a direct chat is about as slow as I can take. Once this gets sped up.

I am sure it will, as you can scale out, scale up and build more efficient code and build more efficient architectures and "tool for the job" different parts of the process.

The problem now (using auto gpt, for example) is accuracy is bad, so you need human feedback and intervention AND it is slow. Take away the slow, or the needing human intervention and this can be very powerful.

I dream of the breakthrough "shitty old laptop is all you need" paper where they figure out how to do amazing stuff with a 1Gb of space on a spinny disk and 1Gb RAM and a CPU.


I agree with this. We’ve already gotten pretty good at json coercion, but this seems like it goes one step further by bundling decision making in to the model instead of junking up your prompt or requiring some kind of eval on a single json response.

It should also be much easier to cache these functions. If you send the same set of functions on every API hit, OpenAI should be able to cache that more intelligently than if everything was one big text prompt.


Wow your brand is huge. Crazy growth. i wonder how much these subtle mentions on forums help


They're the only one commenter on HN I noticed keeps writing "smol" instead of "small", and is associated with projects with "smol" in their name. Surely I'm not the only one who missed it being a meme around 2015 or sth., and finds this word/use jarring - and therefore very attention-grabbing? Wonder how much that helps with marketing.

This is meant with no negative intentions. It's just that 'swyx was, in my mind, "that HN-er that does AI and keeps saying 'smol'" for far longer than I was aware of latent.space articles/podcasts.


Personally, I associate "smol" with "doggo" and "chonker" and other childish redditspeak.


and fun fact i used to work at Temporal too heheh.


i mean hopefully its relevant content to the discussion, i hope enough pple know me here by now that i fully participate in The Discourse rather than just being here to cynically plug my stuff. i had a 1.5 hr convo with simon willison and other well known AI tinkerers on this exact thing, and so I shared it, making the most out of their time that they chose to share with me.


100%, if the API itself can choose to call a function or an LLM, then it's way easier to build any agent loop without extensive prompt engineering + worrying about errors.

Tweeted about it here as well: https://twitter.com/jerryjliu0/status/1668994580396621827?s=...


You still have to worry about errors. You will probably have to add an error handler function that it can call out to. Otherwise the LLM will hallucinate a valid output regardless of the input. You want it to be able to throw an error and say it couldn't produce the output in the given format.


> "you can now trivially make GPT4 decide whether to call itself again, or to proceed to the next stage."

Does this mean the GPT-4 API is now publicly available, or is there still a waitlist? If there's a waitlist and you literally are not allowed to use it no matter how much you are willing to pay then it seems like it's hard to call that trivial.


Not GP, but it's still the latter...i've been (im)patiently waiting.

From their blog post the other day: With these updates, we’ll be inviting many more people from the waitlist to try GPT-4 over the coming weeks, with the intent to remove the waitlist entirely with this model. Thank you to everyone who has been patiently waiting, we are excited to see what you build with GPT-4!


If you put contact info in your HN profile - especially an email address that matches one you use to login to openai, someone will probably give you access...

Anyone with access can share it with any other user via the 'invite to organisation' feature. Obviously that allows the invited person to make requests billed to the inviter, but since most experiments are only a few cents that doesn't really matter much in practice.


Good to know, but I've racked up a decent bill for just my GPT 3.5 use. I can get by with experiments using my ChatGPT Plus subscription, but I really need my own API access to start using it for anything serious.


"With these updates, we’ll be inviting many more people from the waitlist to try GPT-4 over the coming weeks, with the intent to remove the waitlist entirely with this model. Thank you to everyone who has been patiently waiting, we are excited to see what you build with GPT-4!"

https://openai.com/blog/function-calling-and-other-api-updat...


Interesting observation, @swyx. There seems to be a connection to transitive closure in SQL queries, where the output of the query is fed as the input to the query in the next iteration [1]. We are thinking about how to best support such recursive functions in EvaDB [2].

[1] http://dwhoman.com/blog/sql-transitive-closure.html [2] https://evadb.readthedocs.io/en/stable/source/tutorials/11-s...


The thing is the relevant context often depends on what it's trying to do. You can give it a lot of context in 16k but if there are too many different types of things then I think it will be confused or at least have less capacity for the actual selected task.

So what I am thinking is that some functions might just be like gateways into a second menu level. So instead of just edit_file with the filename and new source, maybe only select_files_for_edit is available at the top level. In that case I can ensure it doesn't overwrite an existing file and lose important stuff that was already in there, by providing the requested files' existing contents along with the function allowing the file edit.


Not sure that’s true. I haven’t completely filled the context with examples but I do provide 8 or so exchanges between user and assistant along with a menu of available commands and it seems to be able to generalize from that very well. No hallucinations either. Good idea about sub menus though, I’ll have to use that.


I think big context only makes sense for document analysis.

For programming you want to keep it slim. Just like you should keep your controllers and classes slim.

Also people with 32k access report very very long response times of up to multiple minutes which is not feasible if you only want a smaller change or analysis.


What would be an example where there needs to be an arbitrary level of recursive ability for GPT4 to call itself?


writing code of higher complexity (we know from CICERO that longer time spent on inference is worth orders of magnitude more than the equivalent in training when it comes to improving end performance), or doing real world tasks with unknown fractal depth (aka yak shave)


Who is Simon Willison? Is he big in AI?


formerly cocreator of Django, now Datasette, but pretty much the top writer/hacker on HN making AI topics accessible to engineers https://hn.algolia.com/?dateRange=pastYear&page=0&prefix=tru...


Oh wow, nice! Big fan of his work


Do you people always have to overhype this shit?


Do you have to be nasty?

That's a person you're replying to with feelings, so why not default to being kind in comments as per HN guidelines?

As it happens, swyx has built notable AI related things, for example smol-developer

https://twitter.com/swyx/status/1657892220492738560

and it would be nice to be able to read his and other perspectives without having to read shallow, mean, dismissive replies such as yours.


[flagged]


Hey, I understand the frustration (both the frustration of endless links on an over-hyped topic, and the frustration of getting scolded by another user when expressing yourself) - but it really would be good if you'd post more in the intended spirit of this site (https://news.ycombinator.com/newsguidelines.html).

People sometimes misunderstand this, so I'd like to explain a bit. (It probably won't help, but it might, and I don't like flagging or banning accounts without trying to persuade people first if possible.)

We don't ask people to be kind, post thoughtfully, not call names, not flame, etc., out of nannyism or some moral thing we're trying to impose. That wouldn't feel right and I wouldn't want to be under Mary Poppins's umbrella either.

The reason is more like an engineering problem: we're trying to optimize for one specific thing (https://hn.algolia.com/?dateRange=all&page=0&prefix=true&sor...) and we can't do that if people don't respect certain constraints. The constraints are to prevent the forum from burning itself to a crisp, which is where the arrow of internet entropy will take us if we don't expend energy to stave it off.

It probably doesn't feel like you're doing anything particularly wrong, but there's a cognitive bias where everyone underestimates the damage they're causing (by say 10x) and overestimates the damage others are causing (by say 10x) and that compounds into a major bias where everyone feels like everyone else is the problem. We need a way out of that dynamic if we're to have any hope of keeping this place interesting. As you probably realize, HN is forever on the brink of caving into a pit. We need you to help nudge it back from that, not push it over.

Of course you're free to say "what do I care if HN burns itself to a crisp, fuck you all" but I'd argue you shouldn't take that nihilistic position because it isn't in your own interests. HN may be annoying at times, but it's interesting enough for you to spend time here—otherwise you wouldn't be reading the site and posting to it. Why not contribute to making it more interesting rather than destroying it for yourself and everyone else? (I don't mean that you're intentionally destroying it—but the way you've been posting is unintentionally contributing to that outcome.)

I'm sure you wouldn't drop lit matches in a dry forest, or dump motor oil in a mountain lake, trample flower gardens, or litter in a city park, for much the same reason. It's in your own interest to practice the same care for the commons here. Thanks for listening.


Thanks for the effort of explanation.

If someone were egregiously out of line, typically, I feel community sentiment reflects this.

Personally, I feel your assessment of cognitive bias at play is way off base. I don't think it's a valid comparison to claim that someone is causing "damage" by merely expressing distaste. That's a common tool that humans use for social feedback. Is cutting off the ability for genuine social feedback or adjustment and forcing people to be saccharine out of fear of reprisal from the top really an optimal solution to an engineering problem? It seems more like a simulacrum of an HR department where the guillotine is more real: your job and life rather than merely your ability to share your thoughts on a corner of the Internet.

Think about the engineering problem you find yourself in with this state of affairs: something very similar to the kind of content you might find on LinkedIn, a sort of circular back-patting engine devoid of real challenge and grit because of the aforementioned guillotine under which all participants hang.

And, quite frankly, you do see the effects of this in precisely the post in this initial exchange: hyperbole and lack of deep critical assessment are artificially inflated. This isn't a coincidence: this has been cultured very specifically by the available growing conditions and the starter used -- saccharine hall monitors that fold like cheap suits (e.g. very poorly, lots of creases) when the lowest level of social challenge is raised fo their ideas.

You know what it really feels like? A Silicon Valley reconstruction of all the bad things about a workplace, not a genuine forum for debate and intellectual exploration. If you want to find a place to model such behavior, the Greeks already have you figured out - how do you think Diogenes would feel about human resources?

That being said, I appreciate the empathy.

Obviously, I feel a bit like a guy Tony Soprano beat up and being forced to apologize afterwards to him for bruising his knuckles.


Not to belabor the point but from my perspective you've illustrated the point about cognitive bias: it always feels like the other person started it and did worse ("I feel a bit like a guy Tony Soprano beat up and being forced to apologize afterwards to him for bruising his knuckles") and it always feels like one was merely defending oneself reasonably ("merely expressing distaste"). This is the asymmetry I'm talking about.

As you can imagine, mods get this kind of feedback all the time from all angles. The basic learning it adds up to is that everybody always feels this way. Therefore those feelings are not a reliable compass to navigate by.

This is not a criticism—I appreciate your reply!

Edit:

> forcing people to be saccharine [...] like a simulacrum of an HR department

We definitely don't want that and the site guidelines don't call for that. There is tons of room to make your substantive points thoughtfully without being saccharine. It can take a little bit of reflective work to find that room, though, just because we (humans in general) tend to get locked into binary oppositions.

The best principle to go by is just to ask yourself: is what I'm posting part of a curious conversation? That's the intended spirit of the site. It's possible to tell if you (I don't mean you personally, I mean all of us) are functioning in the range of curiosity and to refrain from posting if you aren't.

It is true that the HN guidelines bring a bit of blandness to discourse because they eliminate the rough-and-tumble debate that can work well in much smaller groups of close peers. But that's because that kind of debate is impossible in a large public forum like HN—it just degenerates immediately into dumb brawls. I've written about this quite a bit if you or anyone wants to read about that:

https://hn.algolia.com/?dateRange=all&page=0&prefix=true&que... (I like that analogy)

https://hn.algolia.com/?dateRange=all&page=0&prefix=true&que...


I think your argument is reasonable from a logical perspective, and I would generally make a similar argument as I would find the template quite persuasive.

However, I, again, feel you're improperly pushing shapes into the shape-board again. Of course, understanding cognitive bias is a fantastic tool to improve human behavior from an engineering perspective, and your argumentum ad numerum is sound.

That being said, you're focusing too much on what my emotional motivation might be rather than looking at the system - do you really think there isn't an element of that dynamic I outlined in an interaction like this? Of course there is.

Anyhow, you know, I don't have the terminology in my back-pocket, but there's definitely a large blind-spot when someone is ignoring the spirit of intellectual curiosity in a positive light rather than a negative one.

In this case, don't you think a tool like mild negative social feedback might be a useful mechanism? Of course, there's a limit, and if such a person were incapable of further insight, they'd probably not be very useful conversants. That's obviously not happening here.

One final thing is relevant here - you just hit on a pretty important point. There is a grit to a certain type of discourse that is actually superior to this discourse, I'd happily accept that point. Why not just transfer the burden of moderation to that point, rather than what you perceive to be the outset? Surely, you'll greatly reduce your number of false positives.

I provide negative social feedback sometimes because I feel it's appropriate. In the future, I probably won't. That being said, it's obvious that I've never sparked a thoughtless brawl, so the tolerance is at least inappropriately adjusted sufficiently to that extent.


What’s your problem? There’s nothing overhyped about that comment. People, including me, are building complex agents that can execute multi stage prompts and perform complex tasks. Comparing these first models to a basic unit of logic is more than fair given how much more capable they are. Do you just have an axe to grind?


[flagged]


How is it inappropriate? How is it not building?


After reading the docs for the new ChatGPT function calling yesterday, it's structured and/or typed data for GPT input or output that's the key feature of these new models. The ReAct flow of tool selection that it provides is secondary.

As this post notes, you don't even need the full flow of passing a function result back to the model: getting structured data from ChatGPT in itself has a lot of fun and practical use cases. You could coax previous versions of ChatGPT to "output results as JSON" with a system prompt, but in practice results are mixed, although even with this finetuned model the docs warn that there still could be parsing errors.

OpenAI's demo for function calling is not a Hello World, to put it mildly: https://github.com/openai/openai-cookbook/blob/main/examples...


IIRC, there's a way to "force" LLMs to output proper JSON by adding some logic to the top token selection. I.e. in the randomness function (which OpenAI calls temperature) you'd never choose a next token that results in broken JSON. The only reason it wouldn't would be if the output exceeds the token limit. I wonder if OpenAI is doing something like this.


Note that you don’t necessarily need to have the AI output any JSON at all — simply have it answer when being asked for the value to a specific JSON key, and handle the JSON structure part in your own, hallucination-free code: https://github.com/manuelkiessling/php-ai-tool-bridge
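
A rough Python analogue of the idea (not the linked library's API): ask for one value at a time and keep the structure entirely in your own code.

    import openai

    def fill_template(document, questions):
        result = {}
        for key, question in questions.items():
            reply = openai.ChatCompletion.create(
                model="gpt-3.5-turbo",
                messages=[{
                    "role": "user",
                    "content": f"{question}\nAnswer with the bare value only.\n\n{document}",
                }],
            )
            result[key] = reply["choices"][0]["message"]["content"].strip()
        return result  # the JSON shape never depends on model output

    contact = fill_template(email_text, {   # email_text: whatever you're extracting from
        "name": "What is the sender's full name?",
        "company": "What company does the sender work for?",
    })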


Would be nice if you could send a back and forth interaction for each key. This approach turns into lots of requests that reapply the entire context and ends up slow. I wish I could just send a Microsoft guidance template program, and process that in a single pass.


Thanks for sharing!


It would seem not, as the official documentation mentions the arguments may be hallucinated or be a malformed JSON.

(unless they mean the JSON syntax is valid but may not conform to the schema; they're unclear on that).


For various reasons, token selection may be implemented as upweighting/downweighting instead of outright ban of invalid tokens. (Maybe it helps training?) Then the model could generate malformed JSON. I think it is premature to infer from "can generate malformed JSON" that OpenAI is not using token selection restriction.


the linked article hypothesizes:

> I assume OpenAI’s implementation works conceptually similar to jsonformer, where the token selection algorithm is changed from “choose the token with the highest logit” to “choose the token with the highest logit which is valid for the schema”.


Note that this (token selection restriction) is even available on OpenAI API as logit_bias.


But only for the whole generation. So if you want to constrain things one token at a time (as you would to force things to follow a grammar) you have to make fresh calls and only request one token which makes things more or less impractical if you want true guarantees. A few months ago I built this anyway to suss out how much more expensive it was [1]

[1] https://github.com/newhouseb/clownfish#so-how-do-i-use-this-...


I think the problem is that tokens are not characters. So even if you had access to a JSON parser state that could tell you whether or not a given character is valid as the next character, I am not sure how you would translate that into tokens to apply the logit biases appropriately. There would be a great deal of computation required at each step to scan the parser state and generate the list of prohibited or allowable tokens.

But if one could pull this off, it would be super cool. Similar to how Microsoft’s guidance module uses the logit_bias parameter to force the model to choose between a set of available options.


You simply sample tokens starting with the allowed characters and truncate if needed. It’s pretty efficient, there’s an implementation here: https://github.com/1rgs/jsonformer


This is the best implementation I've seen, but only for Hugging Face models: https://github.com/1rgs/jsonformer


How would a tweaked temp enforce a non broken output exactly?


It's not temperature, but sampling. Output of LLM is probabilistic distribution over tokens. To get concrete tokens, you sample from that distribution. Unfortunately, OpenAI API does not expose the distribution. You only get the sampled tokens.

As an example, on the link JSON schema is defined such that recipe ingredient unit is one of grams/ml/cups/pieces/teaspoons. LLM may output the distribution grams(30%), cups(30%), pounds(40%). Sampling the best token "pounds" would generate an invalid document. Instead, you can use the schema to filter tokens and sample from the filtered distribution, which is grams(50%), cups(50%).
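
The filtering step in that example, spelled out (hypothetical probabilities):

    allowed_units = {"grams", "ml", "cups", "pieces", "teaspoons"}
    raw = {"grams": 0.30, "cups": 0.30, "pounds": 0.40}   # model's (assumed) distribution

    filtered = {t: p for t, p in raw.items() if t in allowed_units}
    total = sum(filtered.values())
    dist = {t: p / total for t, p in filtered.items()}    # {'grams': 0.5, 'cups': 0.5}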


Not traditional temperature, maybe the parent worded it somewhat obtusely. Anyway, to disambiguate...

I think it works something like this: You let something akin to a json parser run with the output sampler. First token must be either '{' or '['; then if you see [ has the highest probability, you select that. Ignore all other tokens, even those with high probability.

Second token must be ... and so on and so on.

Guarantee for non-broken (or at least parseable) json


What's the implication of this new change for Microsoft Guidance, LMQL, Langchain, etc.? It looks like much of their functionality (controlling model output) just became obsolete. Am I missing something?


If anything this removes a major roadblock for libraries/languages that want to employ LLM calls as a primitive, no? Although, I fear the vendor lock-in intensifies here, also given how restrictive and specific the Chat API is.

Either way, as part of the LMQL team, I am actually pretty excited about this, also with respect to what we want to build going forward. This makes language model programming much easier.


> Although, I fear the vendor lock-in intensifies here,

The openAI API is super simple - any other vendor is free to copy it, and I'm sure many will.


`Although, I fear the vendor lock-in intensifies here, also given how restrictive and specific the Chat API.`

Eh, would be pretty easy to write a wrapper that takes a functions-like JSON Schema object and interpolates it into a traditional "You MUST return ONLY JSON in the following format:" prompt snippet.
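
Sketch of such a wrapper (same functions-style schema in, vendor-neutral prompt snippet out):

    import json

    def schema_to_prompt(fn):
        return (
            f"You are acting as the function `{fn['name']}`: {fn['description']}\n"
            "You MUST return ONLY JSON matching this JSON Schema, nothing else:\n"
            + json.dumps(fn["parameters"], indent=2)
        )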


Langchain added support for `function_call` args yesterday:

* https://github.com/hwchase17/langchain/pull/6099/files

* https://github.com/hwchase17/langchain/issues/6104

IMHO, this should make Langchain much easier and less chaotic to use.


It's only been added to the OpenAI interface. Function calling is really useful when used with agents. To include that to agents would require some redesign as the tool instructions should be removed from the prompt templates in favor of function definitions in the API request. The response parsing code would also be affected.

I just hope they won't come up with yet another agent type.



LangChain is a perpetual hackathon.


They have something closer to a simple Hello World example here:

https://platform.openai.com/docs/guides/gpt/function-calling

That example needs a bit of work I think. In Step 3, they're not really using the returned function_name; they're just assuming it's the only function that's been defined, which I guess is equivalent for this simple example with just one function but less instructive. In Step 4, I believe they should also have sent the function definition block again a second time since model calls in the API are memory-less and independent. They didn't, although the model appears to guess what's needed anyway in this case.
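
A minimal round trip with both nits addressed might look like this (a sketch assuming the 0613 models, the pre-1.0 openai SDK, and the docs' get_current_weather example as the stand-in implementation):

    import json
    import openai

    AVAILABLE = {"get_current_weather": get_current_weather}  # your real implementations

    msg = openai.ChatCompletion.create(
        model="gpt-3.5-turbo-0613", messages=messages, functions=functions
    )["choices"][0]["message"]

    if msg.get("function_call"):
        name = msg["function_call"]["name"]                   # use what the model chose
        args = json.loads(msg["function_call"]["arguments"])  # validate these in real code
        result = AVAILABLE[name](**args)

        messages += [msg, {"role": "function", "name": name, "content": json.dumps(result)}]
        followup = openai.ChatCompletion.create(
            model="gpt-3.5-turbo-0613", messages=messages,
            functions=functions,  # calls are stateless, so send the definitions again
        )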


That SQL example is going to result in a catastrophe somewhere when someone uses it in their project. It is encouraging something very dangerous when allowed to run on untrusted inputs.


Marvin Minsky was so damn far ahead of his time with Society of Mind.

Engineering of cognitively advanced multiagent systems will become the area of research of this century / multiple decades.

GPT-GPT > GPT-API in terms of power.

The space of possible combinations of GPT multiagents goes beyond imagination, since even GPT-4 on its own already does.

Multiagent systems are best modeled with signal theory, graph theory and cognitive science.

Of course "programming" will also play a role, in sense of abstractions and creation of systems of / for thought.

Signal theory will be a significant approach for thinking about embedded agency.

Complex multiagent systems approach us.


Makes me think of the Freud/Jungian notions of personas in us that are in various degrees semi-autonomously looking out for themselves. The “angry” agent, the “child” agent, so on.


Building agents that use advanced APIs was not really practical until now. Things like Langchain's Structured Agents worked somewhat reliably, but due to the massive token count it was so slow, the experience was _never_ going to be useful.

Due to this, the speed at which our agent processes results has improved 5-6 times, and it does actually do a pretty good job of keeping to the schema.

One problem that is not resolved yet is that it still hallucinates a lot of attributes. For example, we have a tool that allows it to create contacts in the user's CRM. I ask it to:

"Create contacts for the top 3 Barcelona players."

It creates a structure like this:

1. Lionel Messi - Email: lionel.messi@barcelona.com - Phone Number: +1234567890 - Tags: Player, Barcelona

2. Gerard Pique - Email: gerard.pique@barcelona.com - Phone Number: +1234567891 - Tags: Player, Barcelona

3. Marc-Andre ter Stegen - Email: marc-terstegen@barcelona.com - Phone Number: +1234567892 - Tags: Player, Barcelona

And you can see it hallucinated email addresses and phone numbers.


ChatGPT can be useful for many things, but you really should not use it if you want to retrieve factual data. This might partly be resolved by querying the internet like Bing does, but purely on the language model side these hallucinations are just an unavoidable part of it.


Yep, the answer is always to have it write the code / query / function / whatever you need, then parse that and retrieve the data from an external system.


I would never rely on an LLM as a source of such information, just as I wouldn't trust the general knowledge of a human being used as a database. Does your workflow include a step for information search? With the new json features, it should be easy to instruct it to perform a search or directly feed it the right pages to parse.


For those who want to test out the LLM as API idea, we are building a turnkey prompt to API product. Here's Simon's recipe maker deployed in a minute: https://preview.promptjoy.com/apis/1AgCy9 . Public preview to make and test your own API: https://preview.promptjoy.com


This is really cool, I had a similar idea but didn't build it. I was also thinking a user could take these different prompts (I called them tasks) that anyone could create, and then connect them together like a node graph or visual programming interface, with some Chat-GPT middleware that resolves the outputs to inputs.


Congrats on the first-time user experience, I could experiment with your API in a few seconds, and the product is sleek!


This is cool! Are you using one-shot learning under the hood with a user provided example?


BTW: Here's a more performant version (fewer tokens) https://preview.promptjoy.com/apis/jNqCA2 that uses a smaller example but will still generate pretty good results.


This is still pretty fast - impressive! Are there any tricks you're doing to speed things up?


Thanks. We find few-shot learning to be more effective overall. So we are generating additional examples from the provided example.


I own this domain: prompts.run. Do you want it?


The JSON schema not counting toward token usage is huge, that will really help reduce costs.


> Under the hood, functions are injected into the system message in a syntax the model has been trained on. This means functions count against the model's context limit and are billed as input tokens. If running into context limits, we suggest limiting the number of functions or the length of documentation you provide for function parameters.


I believe functions do count in some way toward the token usage; but it seems to be in a more efficient way than pasting raw JSON schemas into the prompt. Nevertheless, the token usage seems to be far lower than previous alternatives, which is awesome!


But it does count toward token usage. And they picked JSON schema which is like 6x more verbose than typescript for defining the shape of json.


That is up in the air and needs more testing. Field descriptions, for example, are important but extraneous input that would be tokenized and count in the costs.

At least for ChatGPT, input token costs were cut by 25%, so it roughly evens out.


I'm wondering if introducing a system message like "convert the resulting json to yaml and return the yaml only" would adversely affect the optimization done for these models. The reason is that yaml uses significantly fewer tokens compared to json. For the output, where data type specification or adding comments may not be necessary, this could be beneficial. From my understanding, specifying functions in json now uses fewer tokens, but I believe the response still consumes the usual amount of tokens.
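
For anyone who wants to measure the difference on their own payloads, a small sketch with tiktoken (assuming the cl100k_base encoding used by the gpt-3.5/gpt-4 chat models; the sample data is made up):

  import json
  import tiktoken
  import yaml  # PyYAML

  enc = tiktoken.get_encoding("cl100k_base")
  data = {"dish": "spaghetti bolognese", "servings": 4,
          "ingredients": ["spaghetti", "ground beef", "tomato sauce"]}

  as_json = json.dumps(data, separators=(",", ":"))   # compact JSON
  as_yaml = yaml.safe_dump(data, sort_keys=False)     # block-style YAML

  print("json tokens:", len(enc.encode(as_json)))
  print("yaml tokens:", len(enc.encode(as_yaml)))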


I think one should not underestimate the impact on downstream performance the output format can have. From a modelling perspective it is unclear whether asking/fine-tuning the model to generate JSON (or YAML) output is really lossless with respect to the raw reasoning powers of the model (e.g. it may perform worse on tasks when asked/trained to always respond in JSON).

I am sure they ran tests on this internally, but I wonder what the concrete effects are, especially comparing different output formats like JSON, YAML, different function calling conventions and/or forms of tool discovery.


That's what I'm doing. I ask ChatGPT to return inline yaml (no wasting tokens on line breaks), then I parse the yaml output into JSON once I receive it. A bit awkward but it cuts costs in half.
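
Roughly like this (a sketch, assuming the model cooperates and returns flow-style YAML):

  import json
  import yaml  # PyYAML

  # e.g. the model was asked for inline (flow-style) YAML to save tokens
  raw = "{dish: spaghetti bolognese, servings: 4, vegetarian: false}"

  parsed = yaml.safe_load(raw)   # flow-style YAML is close to JSON anyway
  print(json.dumps(parsed))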


This is useful, but for me at least, GPT-4 is unusable because it sometimes takes 30 seconds + to reply to even basic queries.


Also the rate limit is pretty bad if you want to release any type of app


More importantly: there's a waiting list.

Also, if you want to use both the ChatGPT web app and the API, you'll be billed for both separately. They really should be unified and billed under a single account. The difference is literally just whether there's a "web UI" on top of the API... or not.


It works pretty well. You define a few “functions” and enter a description of what each does; when the user prompts, it will understand the prompt and tell you which “function” it likely should use, which is just the function name. I feel like this is a new way to program, a sort of fuzzy-logic type of programming.
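
Rough sketch of that flow with the current Python client (the get_weather function here is made up):

  import json
  import openai

  functions = [{
      "name": "get_weather",
      "description": "Get the current weather for a city",
      "parameters": {
          "type": "object",
          "properties": {"city": {"type": "string"}},
          "required": ["city"],
      },
  }]

  resp = openai.ChatCompletion.create(
      model="gpt-3.5-turbo-0613",
      messages=[{"role": "user", "content": "Do I need an umbrella in Oslo?"}],
      functions=functions,
      function_call="auto",  # let the model decide whether to call anything
  )

  msg = resp["choices"][0]["message"]
  if msg.get("function_call"):
      name = msg["function_call"]["name"]                    # which function it picked
      args = json.loads(msg["function_call"]["arguments"])   # its JSON arguments
      # ...run your own deterministic code here, then send the result back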


> fuzzy logic

Yes and no. While the choice of which function to call is dependent on an llm, ultimately, you control the function itself whose output is deterministic.

Even today, given an api, people can choose to call or not call based on some factor. We don’t call this fuzzy logic. E.g., people can decide to sell or buy stock through an api based on some internal calculations - doesn’t make the system “fuzzy”.


If you feed that result into another IO box you may or may not know if it is the correct answer, which may need some sort of error detection. I think this is going to be the majority of the use cases.


Hm, I see what you mean. Afaict, only the decision to call or not call a function is up to the model (fuzzy). Once it decides to call the function, it generates mostly correct JSON based on your schema and returns that to you as is (not very fuzzy).

It’ll be interesting to test APIs which accept user inputs. Depending on how ChatGPT populates the JSON, the API could be required to understand/interpret/respond to lots of variability in inputs.


Yeah, I’ve tested it. You should use the curl example they gave, since you can test it instantly by pasting it into your terminal. The descriptions of the functions are prompt engineering in addition to the original system prompt; I need to test that interplay more, it’s so new.


Glad we didn't get too far into adopting something like Guardrails. This sort of kills its main value prop for OpenAI.

https://shreyar.github.io/guardrails/


i mean only at the most superficial level. she has a ton of other validators that aren't superseded (e.g. SQL is validated by branching the database - we discussed it on our pod https://www.latent.space/p/guaranteed-quality-and-structure)


yeah, listened to the pod (that's how I found out about guardrails!).

fair point, I should have said: "value prop for our use case"... the thing I was most interested in was how well Guardrails structured output.


haha excellent. i was quite impressed by her and the vision for guardrails. thanks for listening!


Luckily it's for LLMs, not openai


Guardrails is an awesome project and will continue to be even after this.


Is there a decent way of converting to a structure with a very constrained vocabulary? For example, given some input text, converting it to something like {"OID-189": "QQID-378", "OID-478":"QQID-678"}. Where OID and QQID dictionaries can be e.g. millions of different items defined by a description. The rules for mapping could be essentially what looks closest in semantic space to the descriptions given in a dictionary.

I know this should be solvable with local LLMs and BERT cosine similarity (it isn't exactly that, but it's a start on the idea), but is there a way to do this with decoder models rather than encoder models plus other logic?


You can train custom GPT 3 models, and Azure now has vector database integration for GPT-based models in the cloud. You can feed it the data, and ask it for the embedding lookup, etc...

You can also host a vector database yourself and fill it up with the embeddings from the OpenAI GPT 3 API.
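
A minimal sketch of that embedding-lookup idea with the OpenAI API (the dictionary contents are made up, and at millions of entries you'd want a real vector index rather than this brute-force scan):

  import numpy as np
  import openai

  def embed(texts):
      resp = openai.Embedding.create(model="text-embedding-ada-002", input=texts)
      return np.array([d["embedding"] for d in resp["data"]])

  # Hypothetical OID dictionary: id -> description
  oid_descriptions = {"OID-189": "customer billing address",
                      "OID-478": "primary contact email"}
  oid_ids = list(oid_descriptions)
  oid_vecs = embed(list(oid_descriptions.values()))

  def closest_oid(text):
      q = embed([text])[0]
      sims = oid_vecs @ q / (np.linalg.norm(oid_vecs, axis=1) * np.linalg.norm(q))
      return oid_ids[int(np.argmax(sims))]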


Unfortunately this doesn't really work, as the model is not limited in its decoding vocabulary.

Does anyone have other suggestions that may work in this space?


The way OpenAI implemented this is really clever, beyond how neat the plugin architecture is: it lets them peek one layer inside your internal API surface and infer what you intend to do with the LLM output. They're collecting some good data here.


huh, i never thought of it that way. i thought openai pinky swears not to train on our data tho


I was thinking more as data for their product org


Recent and related:

Function calling and other API updates - https://news.ycombinator.com/item?id=36313348 - June 2023 (154 comments)


IMO this isn't a dupe and shouldn't be penalized as a result.


It's certainly not a dupe. It looks like a follow-up though. No?


More of a very timely but practical demo.


Ok, thanks!


OpenAI integration is going to be a goldmine for criminals in the future.

Everyone and their momma is gonna start passing poorly validated/sanitized client input to shared sessions of a non-deterministic function.

I love the future!


In the “future”?


Nice to have an endpoint which takes care of this. I've been doing this manually, it's a fairly simple process:

* Add "Output your response in json format, with the fields 'x', which indicates 'x_explanation', 'z', which indicates 'z_explanation' (...)" etc. GPT-4 does this fairly reliably.

* Validate the response, repeat if malformed (a rough sketch of this loop is below).

* Bam, you've got a json.

I wonder if they've implemented this endpoint with validation and carefully crafted prompts on the base model, or if this is specifically fine-tuned.
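
For reference, the validate-and-retry step above can be as simple as this (a sketch; model and prompts are placeholders):

  import json
  import openai

  def ask_for_json(prompt, retries=3):
      messages = [{"role": "user", "content": prompt}]
      for _ in range(retries):
          resp = openai.ChatCompletion.create(model="gpt-4", messages=messages,
                                              temperature=0)
          text = resp["choices"][0]["message"]["content"]
          try:
              return json.loads(text)
          except json.JSONDecodeError:
              # tell the model what went wrong and try again
              messages.append({"role": "assistant", "content": text})
              messages.append({"role": "user",
                               "content": "That was not valid JSON. Reply with only the JSON object."})
      raise ValueError("no valid JSON after retries")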


It appears to be fine-tuning:

"These models have been fine-tuned to both detect when a function needs to be called (depending on the user’s input) and to respond with JSON that adheres to the function signature."

https://openai.com/blog/function-calling-and-other-api-updat...


I will experiment with this at the weekend. One thing I found useful when supplying a JSON schema in the prompt was that I could supply inline comments and tell it when to leave a field null, etc. I found that much more reliable than describing these nuances elsewhere in the prompt. Presumably I can't do this with functions, but maybe I'll be able to work around it in the prompt (particularly now that I have more room to play with).


Running an LLM every time someone clicks on a button is expensive and slow in production, but probably still ~10x cheaper to produce than code.


New techniques like semantic caching will help. This is the modern era's version of building a performant social graph.


What's semantic caching?


With LLMs, the inputs are highly variable so exact match caching is generally less useful. Semantic caching groups similar inputs and returns relevant results accordingly. So {"dish":"spaghetti bolognese"} and {"dish":"spaghetti with meat sauce"} could return the same cached result.


Or store it as a sentence embedding and calculate the vector distance, but that creates many edge cases.
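
A toy sketch of the embedding-based variant (the 0.95 threshold is arbitrary, and brute-force search won't scale, but it shows the idea):

  import numpy as np
  import openai

  class SemanticCache:
      """Reuse a cached answer when a new query embeds close enough to an old one."""
      def __init__(self, threshold=0.95):
          self.threshold = threshold
          self.keys, self.values = [], []

      def _embed(self, text):
          resp = openai.Embedding.create(model="text-embedding-ada-002", input=[text])
          v = np.array(resp["data"][0]["embedding"])
          return v / np.linalg.norm(v)

      def get(self, query):
          if not self.keys:
              return None
          sims = np.array(self.keys) @ self._embed(query)  # cosine; vectors are normalized
          i = int(np.argmax(sims))
          return self.values[i] if sims[i] >= self.threshold else None

      def put(self, query, value):
          self.keys.append(self._embed(query))
          self.values.append(value)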


Just this morning I wrote a JSON object. I told GPT to turn it into a schema. I tweaked that and then gave a list of terms for which I wanted GPT to populate the schema accordingly.

It worked pretty well without any functions, but I did feel like I was missing something because I was ready to be explicit and there wasn’t any way for me to tell that to GPT.

I look forward to trying this out.


It's a shame they couldn't use yaml, instead. I compared them and yaml uses about 20% fewer tokens. However, I can understand accuracy, derived from frequency, being more important than token budget.


I think YAML actually uses more tokens than JSON without indents, especially with deep data. For example "," being a single token makes JSON quite compact.

You can compare JSON and YAML on https://platform.openai.com/tokenizer


I would imagine JSON is easier for a LLM to understand (and for humans!) because it doesn't rely on indentation and confusing syntax for lists, strings etc.


It's a lot more straightforward to use JSON programmatically than YAML.


If you are using any kind of type checking instead of blindly trusting generated json it's exactly the same amount of work.


It really shouldn't be, though. I.e. not unless you're parsing or emitting it ad-hoc, for example by assuming that an expression like:

  "{" + $someKey + ":" + $someValue + "}"
produces a valid JSON. It does - sometimes - and then it's indeed easier to work with. It'll also blow up in your face. Using JSON the right way - via a proper parser and serializer - should be identical to using YAML or any other equivalent format.
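
To make it concrete, a trivial example of where the hand-rolled version falls over:

  import json

  some_key, some_value = "note", 'she said "hi"'

  handrolled = "{" + some_key + ":" + some_value + "}"
  # -> {note:she said "hi"}   ...not JSON: unquoted key, unescaped quotes

  proper = json.dumps({some_key: some_value})
  # -> {"note": "she said \"hi\""}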


Even if the APIs for both were equally simple, modules for manipulating json are way more likely to be available in the stdlib of whatever language you’re using.


JSON can be minified.


This was technically possible before. I think the approach used by many - myself included - is to simply embed the result in a markdown code block and then match it with a regex pattern. Then you just need to phrase the prompt to generate the desired output.

This is an example of that, generating the arguments for MongoDB's `db.runCommand()` function: https://aihelperbot.com/snippets/cliwx7sr80000jj0finjl46cp
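
The extraction step is small (a sketch; it assumes the model wrapped its answer in a fenced block):

  import json
  import re

  def extract_json_block(text):
      # grab the contents of the first ``` fence, with or without a "json" language tag
      m = re.search(r"```(?:json)?\s*(.*?)```", text, re.DOTALL)
      return json.loads(m.group(1) if m else text)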


In the OpenAI blog post they mention "Convert “Who are my top ten customers this month?” to an internal API call", but I'm assuming they mean GPT will respond with structured JSON (which we define via a schema in the function prompt) that we can use to more easily programmatically make that API call?

I could be confused but I'm interpreting this function calling as "a way to define structured input and selection of function and then structured output" but not the actual ability to send it arbitrary code to execute.

Still amazing, just wanting to see if I'm wrong on this.


This does not execute code!


Ok, yea this makes sense. Also for others curious of the flow here's a video walkthrough I just skimmed through: https://www.youtube.com/watch?v=91VVM6MNVlk


> The process is simple enough that you can let non-technical people build something like this via a no-code interface. No-code tools can leverage this to let their users define “backend” functionality.

> Early prototypes of software can use simple prompts like this one to become interactive. Running an LLM every time someone clicks on a button is expensive and slow in production, but probably still ~10x cheaper to produce than code.

Hah wow... no. Definitely not.


Has anyone tried throwing their backend Swagger at this and having ChatGPT perform user story tests?


Wouldn't this be possible with a solution like Guidance, where you have a pre-structured JSON format ready to go and all you need is the text: https://github.com/microsoft/guidance


Can I use this to make it reliably output code (say JavaScript)? I haven't managed to do it with just prompt engineering as it will still add explanations, apologies and do other unwanted things like splitting the code into two files as markdown.


Here’s an approach to return just JavaScript:

https://github.com/williamcotton/transynthetical-engine

The key is the addition of few-shot exemplars.


Here's a demo of some system prompt engineering which resulted in better results for the older ChatGPT: https://github.com/minimaxir/simpleaichat/blob/main/examples...

Coincidentally, the new gpt-3.5-turbo-0613 model also has better system prompt guidance: with the demo above and some further prompt tweaking, it's possible to get ChatGPT to output code super reliably.


Not this, but using the token-selection-restriction approach, you can get an LLM to produce output that conforms to an arbitrary formal grammar completely reliably. JavaScript, Python, whatever.


Did people really struggle with getting JSON output from GPT-4? You can literally do it zero-shot by just saying "match this TypeScript type".

GPT3.5 would output perfect JSON with a single example.

I have no idea why people are talking about this like it’s a new development.


Unfortunately, in practice that works only most of the time. At least in our experience (and the article says something similar) sometimes ChatGPT would return something completely different when JSON-formatted response would be expected.


I've been using the same prompts for months and have never seen this happen on 3.5-turbo let alone 4.

https://gist.github.com/BLamy/244eec016beb9ad8ed48cf61fd2054...


In my experience if you set the temperature to zero it works 99.9% of the time, and then you can just add retry logic for the remaining 0.1%


It’s pretty interesting how the work they’ve been doing on plugins has fed into this.

I suspect that they’ve managed to get a lot of good training data by calling the APIs provided by plugins and detecting when it’s gone wrong from bad request responses.


I'm trying to experiment with the API but the response time is always in the 15-25second range. How are people getting any interesting work done with it?

I see others on the OpenAI dev forum complaining about this too, but no resolution.


I pass a Kotlin data class and ask ChatGPT to return JSON which can be parsed by that class. It reduces errors with date-time parsing and other formatting issues, and takes up fewer tokens than the approach in the article.


I thought GPT-4 was doing a pretty good job at outputting JSON (for some of the toy problems I've given it like some of my gardening projects.) Interesting to see this hit the very top of HN


I've used GCP Vertex AI for a specific task and the prompt was to generate a JSON response with keys specified and it does generate the result as JSON with said keys.


The issue is that it's not guaranteed, unlike this new OpenAI feature. Personally, I've found Vertex AI's JSON output to be not so great; it often uses single quotes in my experience. But maybe you have figured out the right prompts? I'd be interested in what you use if so.


Is it possible to fine-tune with custom data to output JSON?


That's not the current OpenAI recipe. Their expectation is that your custom data will be retrieved via a function/plugin and then be subsequently processed by a chat model.

Only the older completion models (davinci, curie, babbage, ada) are available for fine-tuning.


Here is code (with several examples) that takes it a couple of steps further by validating the output JSON against a pydantic model and providing feedback to the LLM when it gets either of those wrong:

https://github.com/jiggy-ai/pydantic-chatcompletion/blob/mas...
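
The core of that pattern looks roughly like this (a sketch with a pydantic v1-style API, not the linked repo's actual code; the Recipe model and prompts are placeholders):

  import openai
  from pydantic import BaseModel, ValidationError

  class Recipe(BaseModel):
      name: str
      minutes: int
      ingredients: list[str]

  def chat_to_model(prompt, model_cls, retries=3):
      messages = [{"role": "user", "content":
                   f"{prompt}\nRespond with JSON matching this schema:\n{model_cls.schema_json()}"}]
      for _ in range(retries):
          resp = openai.ChatCompletion.create(model="gpt-3.5-turbo-0613", messages=messages)
          text = resp["choices"][0]["message"]["content"]
          try:
              return model_cls.parse_raw(text)   # validates both the JSON and the types
          except ValidationError as e:
              messages.append({"role": "assistant", "content": text})
              messages.append({"role": "user",
                               "content": f"That failed validation:\n{e}\nReturn corrected JSON only."})
      raise ValueError("model never produced valid output")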


We're not far from writing a bunch of stubs and querying GPT at startup to resolve the business logic. I guess we're going to need a new JAX-RS soon.


Actually I'm looking to take GPT-4 output and create file formats like keynote presentations, or pptx. Is that currently possible with some tools?


I would recommend creating a simplified JSON schema for the slides (say, a presentation is an array of slides, each slide has a title, body, optional image, optional diagram, and each diagram is one of pie, table, ...). Then use a library to generate the pptx file from the generated content.
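
For example, once you have that structure, something like python-pptx can turn it into a deck (a rough sketch; the schema and content here are made up, and images/diagrams/error handling are omitted):

  from pptx import Presentation

  deck = {"slides": [
      {"title": "Q2 results", "body": "Revenue up 12%\nChurn down 1.5%"},
      {"title": "Next steps", "body": "Hire\nShip\nRepeat"},
  ]}

  prs = Presentation()
  layout = prs.slide_layouts[1]   # "Title and Content" in the default template
  for s in deck["slides"]:
      slide = prs.slides.add_slide(layout)
      slide.shapes.title.text = s["title"]
      slide.placeholders[1].text_frame.text = s["body"]
  prs.save("generated.pptx")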


Library? What library?

It seems to me that a Transformer should excel at Transforming, say, text into pptx or pdf or HTML with CSS etc.

Why don't they train it on that? So I don't have to sit there with manually written libraries. It can easily transform HTML to XML or text bullet points so why not the other formats?


I don't think the name "Transformer" is meant in the sense of "transforming between file formats".

My intuition is that LLMs tend to be good at things human brains are good at (e.g. reasoning), and bad at things human brains are bad at (e.g. math, writing pptx binary files from scratch, ...).

Eventually, we might get LLMs that can open PowerPoint and quickly design the whole presentation using a virtual mouse and keyboard but we're not there yet.


It’s just XML. They can produce HTML and transform Python into PHP etc.

So why not? It’s easy for them, no?


apparently pandoc also supports pptx

so you can tell GPT4 to output markdown, then use pandoc to convert that markdown to pptx or pdf.



I have been using gpt4 to translate natural language to JSON already. And on v4 (not v3) it hasn't returned any malformed JSON iirc


- If the only reason you're using v4 over v3.5 is to generate JSON, you can now use this API and downgrade for faster and cheaper API calls.

- Malicious user input may break your JSON (by asking GPT to include comments around the JSON, as another user suggested); this may or may not be an issue (e.g. if one user can influence other users' experience).


What if you ask it to include comments in the JSON explaining its choices?


Newbie in machine learning here. It’s crazy that this is the top post just today. I’ve been doing the intro to deep learning course from MIT this week, mainly because I have a ton of JSON files that are already classified, and want to train a model that can generate new JSON data by taking classification tags as input.

So naturally this post is exciting. My main unknown right now is figuring out which model to train my data on. An RNN, a GAN, a diffusion model?


Did you read the article? To do it with OpenAI you would just put a few output examples in the prompt and then give it a function that takes the class, where the output parameters correspond to the JSON format you want, or just a string containing JSON.

You could also fine-tune an LLM like Falcon-7b, but that's probably not necessary and has nothing to do with OpenAI.

You might also look into the OpenAI Embedding API as a third option.

I would try the first option though.


having gpt-4 as a dependency for your product or business seems... shortsighted


what if you don't have a product yet? a tied product is better than no product


as long as you walk into it with your eyes wide open



