In my experience, using TypeScript for prompt management alongside something like zod or TypeBox is super useful for getting rid of the stringly-typed prompt problem. Write a function that wraps template strings for the English bits, and inside the templates for your examples do:
JSON.stringify({ ... } satisfies MyOutputType)
which ensures your examples always exactly match what you expect the LLM to return, even as you iterate on the prompt and output types. Then call the LLM and validate the output with zod/TypeBox. Once the wrapper function is written, instead of juggling a bunch of string interpolation every time you want the LLM to analyze something, you have a function that takes input data and returns your output type, guaranteed.
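To make that concrete, here's a minimal sketch of the pattern (the `analyzeSentiment` task and the `callLLM` helper are made up for illustration):

    import { z } from "zod";

    // Hypothetical LLM client; stands in for whatever API you actually call.
    declare function callLLM(prompt: string): Promise<string>;

    const OutputSchema = z.object({
      sentiment: z.enum(["positive", "negative", "neutral"]),
      confidence: z.number().min(0).max(1),
    });
    type MyOutputType = z.infer<typeof OutputSchema>;

    // `satisfies` type-checks the example against the output type at compile
    // time, so the few-shot example can never drift out of sync with the schema.
    const EXAMPLE = JSON.stringify({
      sentiment: "positive",
      confidence: 0.95,
    } satisfies MyOutputType);

    async function analyzeSentiment(text: string): Promise<MyOutputType> {
      const prompt = `Classify the sentiment of the text below.
    Respond with JSON shaped exactly like this example:
    ${EXAMPLE}

    Text: ${text}`;
      const raw = await callLLM(prompt);
      // Throws if the model returned garbage; this is where you'd retry.
      return OutputSchema.parse(JSON.parse(raw));
    }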
With OpenAI tool calling, you can even convert the zod/TypeBox type into JSON Schema, give that to OpenAI, and have a fairly good chance the LLM doesn't make any mistakes without even needing retries. (Although you should still retry if the zod/TypeBox/whatever validator fails.)
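Reusing the schema from the sketch above, that looks roughly like this (via the `zod-to-json-schema` package and the official `openai` SDK; treat it as a sketch, the exact types may need massaging):

    import OpenAI from "openai";
    import { zodToJsonSchema } from "zod-to-json-schema";

    const openai = new OpenAI();

    async function analyzeWithTools(text: string): Promise<MyOutputType> {
      const completion = await openai.chat.completions.create({
        model: "gpt-4o",
        messages: [{ role: "user", content: `Classify the sentiment of: ${text}` }],
        tools: [{
          type: "function",
          function: {
            name: "report_sentiment",
            description: "Report the sentiment classification result",
            parameters: zodToJsonSchema(OutputSchema) as Record<string, unknown>,
          },
        }],
        // Force the model to call our function rather than reply in prose.
        tool_choice: { type: "function", function: { name: "report_sentiment" } },
      });
      const args = completion.choices[0].message.tool_calls?.[0]?.function.arguments;
      // Still validate: tool calling reduces but doesn't eliminate bad outputs.
      return OutputSchema.parse(JSON.parse(args ?? "{}"));
    }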
If you're using llama.cpp, there is also a tool that converts TypeScript interfaces into the BNF grammar format llama.cpp uses to constrain output tokens: https://grammar.intrinsiclabs.ai/
The paper is full of insightful parts. For instance, one challenge with LLMs is parsing the model output. The participants suggest that instead of forcing the model to return JSON, it should be allowed to respond in its preferred format, and those results can then be parsed into a structured form.
Quote: "if the model is kind of inherently predisposed to respond with a certain type of data, we don’t try to force it to give us something else because that
seems to yield a higher error rate"
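In that spirit, one approach is to let the model answer however it likes and then fish the structured part out of the response afterwards. A sketch of such a tolerant parser (my own, not the paper's implementation):

    function extractJson(response: string): unknown {
      // Prefer a fenced ```json block if the model produced one.
      const fenced = response.match(/```(?:json)?\s*([\s\S]*?)```/);
      if (fenced) return JSON.parse(fenced[1]);
      // Otherwise fall back to the first {...} span in the free-form text.
      const start = response.indexOf("{");
      const end = response.lastIndexOf("}");
      if (start !== -1 && end > start) {
        return JSON.parse(response.slice(start, end + 1));
      }
      throw new Error("no JSON found in model output");
    }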
Building a copilot the first time will be a mess; the second time it's much clearer. It's a fun time to be working with local models. Thinking in layers and abstracting isn't that hard with the right prompt about the specific pain points... every problem seems to have been solved before.
Shameless plug: I work on CopilotKit - open-source copilot building blocks for React apps.
Designed to alleviate exactly the pain points in the article.
Devs define simple Copilot entrypoints -- state (frontend + backend + 3rd party), actions, purpose-specific LLM chains, etc. -- and the CopilotKit engine takes care of the rest.
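Simplified, an entrypoint definition looks something like this (check the docs for exact signatures):

    import { useCopilotReadable, useCopilotAction } from "@copilotkit/react-core";

    function TaskList({ tasks, addTask }: {
      tasks: string[];
      addTask: (title: string) => void;
    }) {
      // Expose frontend state so the copilot can read it.
      useCopilotReadable({
        description: "The user's current task list",
        value: tasks,
      });

      // Expose an action the copilot is allowed to take on the user's behalf.
      useCopilotAction({
        name: "addTask",
        description: "Add a new task to the user's list",
        parameters: [
          { name: "title", type: "string", description: "The task title" },
        ],
        handler: async ({ title }) => addTask(title),
      });

      return <ul>{tasks.map((t) => <li key={t}>{t}</li>)}</ul>;
    }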
I have not had a chance to try it yet but Microsoft's Copilot Studio seems to address the "pain points" listed in the article. Can anyone here by chance attest to it?
I bet the prompt for the (doubtlessly AI-generated) image above the article was "too many tinkerers spoil the robot". Also, wearing glasses seems to be part of the job requirements...