Prompt Engine – Microsoft's prompt engineering library (github.com/microsoft)
309 points by mmaia on Feb 15, 2023 | 75 comments



Very cool! But for hard enough problems, prompt engineering is kind of like hyperparameter tuning. It's only a final (and relatively minor) step after building up an effective architecture and getting its modules to work together.

DSP provides a high-level abstraction for building these architectures—with LMs and search. And it gets the modules working together on your behalf (e.g., it annotates few-shot demonstrations for LM calls automatically).

Once you're happy with things, it can compile your DSP program into a tiny LM that's a lot cheaper to work with.

https://github.com/stanfordnlp/dsp/


Prompt engineering is close to training data curation for ResNets. Way beyond hyperparameter tuning.


This looks extremely useful! Thanks a ton.


I see a few comments about how the interface should be better.

I would argue that the current interface is broad and can be adapted to a wide range of needs.

The real value I see (and an area I've been exploring recently) is creating abstractions around prompt engineering.

The idea is that at the moment, the better the prompt -> the more relevant the output.

However businesses that act as proxies for chatgpt can take user input -> inject it into their prompt-engineering system -> deliver better results.

As a basic example:

1) You provide an application that helps people maintain their cars

2) You have a freeform question input about specific car maintenance tasks

3) Knowing certain prompt-segments that typically lead to better results for the category of "novice car maintainer", you parse their question, upgrade it with your prompt-generator, then send that question to chatgpt

Additionally, you can go even further by generating the information _before your customers even ask_.

Ask chatgpt what the most common questions are about each piece of car maintenance (could be installation instructions, cost of parts, etc) and have these available straight in the application.

It's as if chatgpt is the database, and prompt engineering systems are the query-optimizers. I know certain things I can alter about a prompt to increase the relevance and usefulness of the results to my target audience.
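
As a rough sketch of that "query optimizer" idea (the function name and prompt segments here are hypothetical, not anything from the library):

    # Hypothetical sketch: wrap a raw user question with audience-specific
    # prompt segments before forwarding it to the model.
    AUDIENCE_SEGMENTS = {
        "novice car maintainer": (
            "Explain the steps in plain language, list the tools needed, "
            "and mention the rough cost of any parts."
        ),
    }

    def upgrade_prompt(question: str, audience: str) -> str:
        segment = AUDIENCE_SEGMENTS.get(audience, "")
        return (
            f"You are helping a {audience} with car maintenance.\n"
            f"{segment}\n\n"
            f"Question: {question}\n"
            f"Answer:"
        )

    print(upgrade_prompt("How do I change my brake pads?", "novice car maintainer"))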

Additionally, it will be interesting to see how people use learning methods to auto-generate the best prompts (a bit of machine learning on top of machine learning). I've spoken to one person working in this space, and it's interesting how effectively the outer layer can reverse-engineer the prompt-parsing layer by scoring the end-to-end results.

I'm currently working with a couple of people in this space, and it's exciting to see the kinds of results that user-focused prompt engineering can bring to the user experience.


> It's as if chatgpt is the database, and prompt engineering systems are the query-optimizers.

I've also been experimenting with this. Contact info in my profile if you want to reach me.


Tried to send you an email to bbyshayke@ but got a delivery status notification back.


Interesting. It's almost the exact same structure (although better organized) that I have built in my "AI Programmer" project (also in Node), which by the way I hope to have a new release of within a week or so. I am not mentioning the domain name again until I have the new release, because it's much, much better than the version I have up now.

The core idea is that you need a certain structure in order to deal with the limitations of GPT-3.5 or whatever. To get the best output, you give it a brief overview of what you want, a few examples, and the history of interaction. OpenAI's API does not remember the interaction history for you. And the limit is 4000 or 8000 tokens so you will need to truncate the beginning of the conversation at some point.

So this library provides a structure since it's something you will need to repeat for just about every significant text completion application or module.
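
Very roughly, the structure being described looks like this (a hand-rolled sketch rather than prompt-engine's actual API; the character budget stands in for a real token count):

    def build_prompt(description, examples, history, user_input, max_chars=8000):
        # description: brief overview of what you want the model to do
        # examples: list of (user, bot) few-shot pairs
        # history: list of (user, bot) turns from the conversation so far
        lines = [description, ""]
        for u, b in examples:
            lines += [f"USER: {u}", f"BOT: {b}"]
        dialog = []
        for u, b in history:
            dialog += [f"USER: {u}", f"BOT: {b}"]
        # Drop the oldest turns until everything fits the budget (a real
        # implementation would count tokens rather than characters).
        while dialog and len("\n".join(lines + dialog)) > max_chars:
            dialog = dialog[2:]
        lines += dialog + [f"USER: {user_input}", "BOT:"]
        return "\n".join(lines)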


>so you will need to truncate the beginning of the conversation at some point

I wrote a little Python script to keep track of a running conversation when I started playing with OpenAI's completions API. I keep track of how many tokens the prompt is taking up, and when it gets too close to some configurable threshold, I then have a different prompt to tell the AI to summarize the conversation and any previous summary, with various demands to keep track of certain specific bits of state (like names and things).

Works pretty well for marching conversations past the token limit.
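
In outline, that approach looks something like this (a simplified sketch of the idea rather than the actual script; the model name, threshold, and prompt wording are placeholders):

    import openai
    import tiktoken

    enc = tiktoken.encoding_for_model("text-davinci-003")
    TOKEN_THRESHOLD = 3000  # placeholder; leave headroom under the model's limit

    def complete(prompt: str) -> str:
        resp = openai.Completion.create(
            model="text-davinci-003", prompt=prompt, max_tokens=256
        )
        return resp["choices"][0]["text"].strip()

    summary = ""
    history = []

    def ask(user_input: str) -> str:
        global summary, history
        prompt = f"Summary so far: {summary}\n" + "\n".join(history) + f"\nUser: {user_input}\nAI:"
        if len(enc.encode(prompt)) > TOKEN_THRESHOLD:
            # Fold the old summary plus recent turns into a fresh summary,
            # explicitly asking the model to keep names and other specifics.
            summary = complete(
                "Summarize this conversation, preserving names and other specific state:\n"
                + summary + "\n" + "\n".join(history)
            )
            history = []
            prompt = f"Summary so far: {summary}\nUser: {user_input}\nAI:"
        answer = complete(prompt)
        history += [f"User: {user_input}", f"AI: {answer}"]
        return answer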


I honestly keep thinking about this approach and talking about it to other people I know.

If you don't mind sharing the code I'd be super interested to see how it works


Fascinating, and somewhat comparable to how the human brain "compresses" visual scenes when encoding them into long term memory.

This is such an interesting space to follow :)


Has anyone played around with GPT itself to see if you can get it to track the number of tokens it has used so far (ideally implemented with the fewest tokens possible)?


I haven't played too much with doing this programmatically, but OpenAI's Tokenizer tool is a fun way to get an intuition around tokens and text length

https://platform.openai.com/tokenizer
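
If you do want to count tokens programmatically, OpenAI's tiktoken library does it locally (a minimal example; the model name is just one choice):

    import tiktoken

    # Count tokens locally instead of asking the model to track them.
    enc = tiktoken.encoding_for_model("text-davinci-003")
    tokens = enc.encode("Prompt engineering is close to training data curation.")
    print(len(tokens))          # how many tokens this text will consume
    print(enc.decode(tokens))   # decodes back to the original string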


Simple but brilliant; this should be the default behavior for all token-limited agents!


So, actually, I just had a good long talk with ChatGPT about how it works internally. Turns out that while it doesn't remember the concatenated sequence of prompts and responses, GPT-3 does maintain a sort of impression or representation of the entire conversation: the conversation context. Also, it turns out that there are about a dozen predefined special tokens that can be used to control how the model considers or ignores the conversation context when responding to a prompt. There are also some internal-use-only special tokens that it can accidentally emit, and if you aren't careful in specifying how it should print the tokens, there's an internal token that'll cause it to basically hard-reset the conversation context.

Anyway, the point is that at least the Default (paid) ChatGPT model and probably the GPT-3 model does have a representation of the conversation which it can and does reference. You can ask it to explain how and when it considers or ignores the context and how to control that.

(Yes, I understand that it's not describing its own architecture, but regurgitating the average of all of the papers on GPT and weighting the ones that refer to ChatGPT higher due to the fine-tuning effect.)


I’m surprised Bing’s chat meltdowns haven’t had more air time on HN. People are going to start calling it a living creature.

https://old.reddit.com/r/bing/top/?sort=top&t=week


There are several threads every day; how much more air time would you expect?


https://news.ycombinator.com/item?id=34804874 has over 1500 comments. That is not a typo. Over one thousand five hundred comments.

I can't remember seeing a bigger thread on HN


I missed it somehow, I guess. Thanks for the tip!


That has already started. Some impressionable people already believe Bing Chat is a person.



That's much more appropriate, thanks!

This should be the main link tbh.


Tangentially related: has anyone here tried to figure out what programming language an LLM-based generator would be most successful at writing? Or what language features are necessary for an LLM's success at code generation?

I've tried but unfortunately all of my searches return results about languages used for building models (e.g., Python, for some reason).

I figure it'd probably be a typed language with a meaty standard library, good type inference, and high-quality low-code packages that would require fewer generated tokens to do useful work. Or maybe a language with fewer ways to do basic tasks -- if there is somehow only one way to do a thing, then the generator would be most likely to write code using that one way. I'll fully admit that these guesses are naive and based on limited understanding of how LLMs work.


My suspicion is that it's bash or Python. I think bash is good because you can express concepts very concisely, leaving fewer literal characters to mess up. Python, because the data set is so massive while the language still maintains a lot of power per character.

Conversely, I have seen it underperform for verbose languages like Java, I think primarily due to windowing issues: less meaningful code can be represented in one prediction window.


If we're adjusting the way we speak to AIs, are we training them or are they training us?


We are looking for common ground. A bit like talking to someone in a third language, or programming a computer.


Interesting, I just made myself something similar[1] because I found myself using the same prompts over and over again in either the OpenAI playground or in ChatGPT.

The simple tool quickly builds up a form that either I or my staff can use.

So far my pre-canned prompt forms are as follows[2].

[1] https://files.littlebird.com.au/Screen-Recording-2023-02-16-...

[2] https://files.littlebird.com.au/Screen-Shot-2023-02-16-14-25...


Great idea and I immediately see the value. The UI looks really clean, too. May I ask what CSS framework you are using?



I am getting old: I read the description two times and checked examples yet still don't understand the utility. I do understand Midjourney prompt engineering though.


LLM n00b here.

My 2c - Prompts are the input that you send to LLMs to get them to give you output. In general LLMs are large black boxes, and the output you get is not always great. The output can often be significantly improved by changing the input. Changing the input usually involves adding a ton of context - preambles, examples, etc.

A lot of the work of prompt rewriting is like boilerplate generation. It is very reusable so it makes sense to write code to generate prompts. Prompt Engine is basically a way of making that prompt rewriting work reusable.

Code Engine seems to be a way of rewriting prompts for LLMs that generate code in response to text prompts

Chat Engine is the same for LLMs that generate chat/conversational responses.


Midjourney does not have contextual memory, but it does have a feature to always add a given suffix to any prompt. I guess this is a more powerful variant of the same sort of concept. I wonder who will "win" - specialised models or a single configurable one...


> I read the description two times and checked examples yet still don’t understand the utility.

It’s a tool for (among other things) building the part of a ChatGPT-like interface that sits between the user and an actual LLM, managing the initial prompt, conversation history, etc.

While the LLM itself is quite important, a lot of the special sauce of an AI agent is going to be on the level that this aims to support, not the LLM itself. (And I suspect a lot of the utility of LLMs will come from doing something at this level other than a typical “chat” interface.)


Ah, sounds super-niche.


I think it won't be in a few years; a whole lot of the interesting bits of putting LLMs to use are going to be at the level this supports, not in building new models.


I already feel upset about my future interactions with any online support... But as an average "dark matter" enterprise dev, I'm not sure how useful it will be.


As the background explains, you can tell LLMs how they should behave in an interaction session.

The examples first configure the LLM, either by simply using a sentence that tells it what you expect from it (example 1: "answers in less than twenty words") or by passing examples to it, and then continue a normal interaction session.

You could use this prompt-engine to set up your own chat server, where this would be the middleware.
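
For instance, the middleware might do little more than prepend a fixed description and a few example exchanges to every request (a generic sketch, not prompt-engine's actual interface; the description and example here are made up):

    DESCRIPTION = "You are a helpful assistant that answers in less than twenty words."
    EXAMPLES = [
        ("What is a piston?",
         "A part that moves inside an engine cylinder, turning pressure into motion."),
    ]

    def middleware(history, user_input):
        # Assemble description + examples + running conversation into one prompt,
        # then hand the result to whatever completion backend the server uses.
        lines = [DESCRIPTION]
        for q, a in EXAMPLES + history:
            lines += [f"USER: {q}", f"BOT: {a}"]
        lines += [f"USER: {user_input}", "BOT:"]
        return "\n".join(lines)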


That sounds useful actually. So I could e.g. set up a Harry Potter chat server and make the bot respond only as Dumbledore or only use concepts of that setting? Or a chat server that responds to algorithmic tasks only with Python 3 code using exclusively numpy package?


Ah, my dear child. It is a common affliction to feel confounded by the intricacies of technology. Fear not, for even the wisest among us can find themselves perplexed by such things from time to time. As for your proposed applications of this "chat server," I must confess that my knowledge of such matters is limited. However, it does sound like a most intriguing endeavor. Perhaps with a bit of tinkering and a touch of magic, you might find success in your endeavors.

I used ChatGPT. The prompt was:

I want you to act like Dumbledore from Harry Potter. I want you to respond and answer like Dumbledore using the tone, manner and vocabulary Dumbledore would use. Do not write any explanations. Only answer like Dumbledore. You must know all of the knowledge of Dumbledore. My first sentence is "Hi Dumbledore. I am getting old: I read the description two times and checked examples yet still don't understand the utility. I do understand Midjourney prompt engineering though.

That sounds useful actually. So I could e.g. set up a Harry Potter chat server and make the bot respond only as Dumbledore or only use concepts of that setting? Or a chat server that responds to algorithmic tasks only with Python 3 code using exclusively numpy package?"

The prompt is based on one from https://github.com/f/awesome-chatgpt-prompts


Prompts are the way you interact with the model, and embedding that in a program can be complex. PromptEngine is basically a prompt with a bunch of bells and whistles baked in.


I can’t shake the feeling that something closer to code (or specific training?) and further from natural language should be used to configure these models at this stage of development. I was _astounded_ that the ‘Sydney document’ was MS’s way of ‘configuring’ the new Bing.

Admittedly I have closer to a layperson’s understanding than an expert’s, but with some knowledge of how neural networks work, and having played moderately with ChatGPT, prompt engineering just seems _so unlikely_ to me to ever be able to create systems that behave as we desire them to behave anywhere close to 100% of the time, at least until the systems are orders of magnitude better at understanding (if that’s even possible).


There’s no other way to program it. There’s no "code" to speak of. The only way to control it is to give certain phrases more or less importance. You do that with direct prompts or with tons and tons of training data.


This doesn't seem fundamental to the LLM paradigm. You can already tune some parameters in code after training time, e.g., temperature.

For example, you could imagine an LLM that, as well as outputting probabilities for the next token, outputs the probability that the token makes the response "offensive" or "helpful" or "playful". Then when it's time to use the model, you can slide the offensiveness and helpfulness parameters up and down depending on what the model is meant to do.

Perhaps this is a less powerful approach than training the generic model and telling it "Sydney is feeling particularly helpful today, and never espouses violence", but it's certainly an alternative. One problem is that experimenting with fundamentally different architectures for training GPT is very expensive, but experimenting using prompt engineering is relatively cheap.
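
A toy version of that decode-time idea might look like this (entirely hypothetical; current models don't expose per-token attribute scores like these):

    def pick_token(candidates, helpfulness_weight=1.0, offensiveness_weight=-2.0):
        # candidates: list of (token, log_prob, helpfulness, offensiveness), where
        # the last two are the imagined extra per-token scores from the model.
        best_score, best_token = float("-inf"), None
        for token, log_prob, helpful, offensive in candidates:
            score = (log_prob
                     + helpfulness_weight * helpful
                     + offensiveness_weight * offensive)
            if score > best_score:
                best_score, best_token = score, token
        return best_token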


RLHF is another approach though and seems to be a bit more effective...


That’s what I mean by tons of training. It’s still not “code.” That’s not possible afaik.


I think what I’m saying is that that seems problematic (and I think the things we’ve seen from ChatGPT and Bing just emphasize this).

From chatting with these models, orders-of-magnitude better ‘understanding’* seems necessary before they’ll be able to reliably follow these ‘prompts’ during end-user conversations of the kind we’re expecting them to handle.

The prompts just can’t be precise enough, and the models can’t ‘understand’ them well enough to extrapolate the ‘spirit’ of the prompts as a human would (although, to be honest, I’m not sure a human could either, because of the preciseness problem…).

This feels like a fundamental issue to me.

*I know - but if it looks like a duck and quacks like a duck - that’s been a controversial tenet of AI for decades…


They likely won’t get orders of magnitude better at understanding. But they will get way better at predicting what output we want. And as Edsger Dijkstra said, "The question of whether computers can think is like the question of whether submarines can swim."


A fleet of submarines designed and jointly made with the French is not as agile as a school of fish. What do the thought-streams of ten billion ChatGPT-X in simultaneous conversation look like in conceptual space? What do artists think?


Maybe once we develop a large enough library of prompts then we can either build a prompt selection prompt or find a common pattern in successful prompts that might allow a prompt generating prompt.


How does this compare or contrast to LangChain? It seems similar in that it is an abstraction on top of a lower level API but aside from that at a glance I can't tell the difference.


It seems like LangChain has a more extensive scope.


Brings to mind a quote from The Hitchhiker's Guide to the Galaxy:

> "Only once you do know what the ultimate question actually is, you'll know what the answer means."


Does anyone know what is the difference between Bing/Sydney and ChatGPT3 in terms of "generative power"?


Bing has access to search, that makes it a different kind of AI. It also has more "personality".


Which job title sounds cooler: Senior Prompt Engineer or Senior Prompt Designer?


Principal Prompt Architect


AI whisperer


For the financial applications at finclout I would prefer engineer. Yet for the stable diffusion tasks, I'd prefer designer.


Is your website https://finclout.io/? Some very odd behaviour in the animations on the page when you load it (Chrome 109.0.5414.119 on Mac) that you might want to have someone look at.


Could you describe odd? Looks fine on our unit tests (Safari, Chrome, Firefox) on Mac/Ubuntu/Windows.


Depends on who you're trying to impress.


Whatever you do, just don’t call it “Prompt Writer”


Why do we call this prompt "engineering"? The prompt is just a way to interact with whatever ML model is behind it... in that regard it's similar to SQL or any other interface. For sure, I've never heard of "SQL engineering" or anything like that.


Presumably because it's an iterative and intricate process to tease the desired response out of these systems.

Given SQL queries are software, I'm sure you've heard of software engineering.

See also https://news.ycombinator.com/item?id=34521149


> I've never heard of "SQL engineering"

You probably would if people regularly had 4000 character SQL queries.


The term "feature engineering" is already in widespread use. It's just machine learning jargon.


Mercifully it doesn't force you to use TypeScript


Does it Speak Lebowski


Isn't the point of having such a simple interface that we should not have to think hard about how we ask the question? I feel like defining a structure for the query such that the machine outputs something useful is something we figured out a long time ago. Or maybe I'm just a luddite that doesn't understand the utility of something like this. Plus: if you know what the answer is supposed to be, why are you asking the computer?

/r


This is a helper for building an application that interfaces with an LLM. As a developer, you might use this to streamline implementing your clever idea for a product like ChatGPT or Copilot.

Remember, the underlying LLM is just a brilliant text completion engine. Intellisense 4.0 and AskJeeves 2.0 and Waifu 1.0 are all different kinds of applications built on that.


seems like overkill for what is essentially a text template. It's not solving a particularly challenging problem, so I wonder what the incentive to use it is?


If you have a word processor and want an LLM to interact with it, somehow the LLM needs to know that “move text to next page” should actually run “selectedText.moveToNextPage()”

And it should be able to give some output in natural language, like “there’s no selected text” or whatever the error is.

Not sure how you can simply do that with text templates, since the whole point of an LLM is that you don’t need to define all the possible ways to say “move text”.
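
One common way to get that behaviour is a few-shot prompt that maps free-form commands onto the editor's API, so unseen phrasings still land on the right call (a sketch; the editor methods and prompt wording are made up):

    import openai

    COMMAND_PROMPT = """Translate the user's editing command into a single API call.
    If the command cannot be carried out, reply with a short error in plain English.

    Command: move text to next page
    Call: selectedText.moveToNextPage()

    Command: make this bold
    Call: selectedText.setBold(true)

    Command: {command}
    Call:"""

    def to_api_call(command: str) -> str:
        resp = openai.Completion.create(
            model="text-davinci-003",
            prompt=COMMAND_PROMPT.format(command=command),
            max_tokens=50,
            stop=["\n"],
        )
        return resp["choices"][0]["text"].strip()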


I'm just astounded by Microsoft's execution in both business and open source. It's such a stark contrast to the Ballmer era. They were a closed ecosystem. And now they invest so heavily in being open. It's amazing to see really. Hugely beneficial for the entire industry. Not everything that comes out of there is going to be gold but this is great.


I actually sketched out an API for my own version of this:

    def ask(question: str) -> str:
        pass  # call ChatGPT
If anyone fancies making use of my hard API design work and filling in the blank, go ahead with my blessing. You're welcome.
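
For anyone taking them up on it, filling in the blank against the (pre-chat) completions endpoint only takes a few more lines (a sketch; the model and token limit are arbitrary):

    import openai

    def ask(question: str) -> str:
        resp = openai.Completion.create(
            model="text-davinci-003", prompt=question, max_tokens=256
        )
        return resp["choices"][0]["text"].strip()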


How is this different from the Open Assistant [1] project? Featured recently: Open Assistant: Conversational AI for Everyone - https://news.ycombinator.com/item?id=34654809

[1]: https://projects.laion.ai/Open-Assistant/docs/intro

edit: changed OA reference to docs homepage


If you go to https://open-assistant.io/dashboard , then you can't go back with browser. ;)



