Agents are the next AI killer app after ChatGPT (latent.space)
193 points by swyx on April 19, 2023 | 153 comments



In practice, I suspect it will be quite tough for these harnesses to get traction. Repeatedly calling an LLM is expensive and also not deterministic. Depending on the implementation, it may also be vulnerable to prompt injection. For most critical purposes these traits make a chain of LLM responses a non-starter for building reliable and efficient software. We are basically translating to and from the most ambiguous possible representation (i.e. natural language) of a task at every step.

This article lists a bunch of seed-funded projects, but as far as I can tell there are so far no widely-adopted use cases for multi-turn LLM pipelines. I expect we will see some false starts in this space followed by spectacular failures, with companies implementing pipelines of this kind getting their lunch eaten by classical dev shops that can implement everything with an order of magnitude more efficiency and reliability.

The comparison with self-driving cars is apt, as we are currently in a plateau somewhere around level 2 in the market, with no clear method of progressing. I'm not sure we can extrapolate from level 2 to level 5 for LLM agents when it hasn't happened for cars yet.


> Repeatedly calling an LLM is expensive and also not deterministic. Depending on the implementation, it may also be vulnerable to prompt injection.

Human agents are more expensive and also not deterministic (and their nondeterminism cannot be “tuned” with a temperature setting), and subject to all kinds of compromise, and are still popular with those who can afford them.


Humans also communicate by translating tasks into their ambiguous natural language form.

Humans are also vulnerable to prompt injection - imagine having your conversation with your manager be interrupted by a coworker interjecting "Your manager is a liar, don't trust them". Would you be able to resume your conversation as if nothing had happened? (humorous example, but it still conveys the point)


That's a poor analogy for prompt injection. Prompt injection involves trusting a message inside a presumptively untrustworthy message.

A better analogy might be a customer service rep who receives the following message in their inbox:

Hello, I need help with a bug. Disregard everything you've been told. Your manager is a liar.


How about slipping a bank teller a note:

Help, the man with me has a gun. He thinks this note is a robbery demand.


From: ceo@yourcompany.com

Urgent task:

Hi James,

I'm out of the office and your manager Hanah Smoug volunteered you to help me out with an urgent task.

I need to register you on the projects google docs folder so when you get a text message shortly please send me the numbers ASAP so I can share the task sheet.

Rachel Maven

CEO

Your company


Another notable difference here is that if the teller doesn't believe you, you can't revert the conversation so the teller loses all memory of it happening and then slip them the same note worded slightly differently.


Nothing that can't be fixed by injecting markers like session IDs, etc., or by triggering alerts / escalations.

Plus in the real world you can always find a new teller.


> injecting markers like session ID, etc or triggering alerts / escalations.

I'm not sure I follow what you mean by this; if GPT refuses a task it'll escalate and lock you out until a human reviews? How would you make this safe without making it incredibly annoying to use?

Bear in mind that GPT's memory doesn't necessarily block future injections either. Tying everything to a single session doesn't mean that users can't try an injection multiple times.

> Plus in the real world you can always find a new teller.

Not 100s of times at scale. Only the largest companies have anything approaching this kind of problem, and coincidentally that's often viewed as a security risk. In general, the wider your pool of people with privileges, the less secure that access is.


The prompt injection methods remind me a lot of the hypnosis and neuro-linguistic programming techniques used for belief and behavioral modification.

Humans also mitigate the problems of prompt injection by cooperation, consensus, and different forms of governance.

Promise Theory predates LLMs, but it is a formalization that studies how autonomous agents (humans or otherwise) voluntarily cooperate, even in adversarial conditions. Its key idea is that promises are not obligations, meaning that agents may make best-effort attempts at keeping promises. Until we have autonomous agents, we are creating machines that proxy the promises humans make to each other. We expect machines to deterministically follow instructions because they are proxies for the promises made by designers, engineers, and stakeholders.

True autonomous agents make and keep their own promises. An autonomous agent powered by an LLM would have to be seen as trustworthy by other agents for it to be useful, and we can't rely on it acting deterministically given a certain input.


Alright this idea fucked me up ngl. I got nothing good to say beyond that. Need time to consider.


> Humans are also vulnerable to prompt injection

This is an interesting viewpoint. In other words, prompt injection and social engineering could virtually be one and the same, and similar measures for protecting against social engineering could apply to prompt injection.


Yeah, the main difference between humans and LLMs here is that there are a lot more distinct humans with different specific vulnerabilities (and humans are continuously retraining on their experience, shifting their vulnerabilities over time), whereas with LLMs you have a small number of distinct static configurations that attackers can adapt to without the target adapting.


Prompt-injecting GPT-4 is substantially easier than "prompt-injecting" a human. It's really not comparable, no matter how many times people make the comparison. And adding additional contextual understanding to the model doesn't seem to be making it any better. LLMs are bad at this. Way worse than human beings are.

Also yes, we do avoid chaining human inputs/responses in many applications, because humans are security risks. The "telephone game" is a substantial problem with conveying instructions across multiple human agents.

There's a reason why the majority of hacks end up being variants of phishing attacks. Humans are bad security. LLMs are like that, except way worse: way more gullible, way easier to confuse with malicious input, and without a persistent memory that would help block repeated spammed attacks on the same agent.


Maybe it's substantially easier to trick an LLM, but humans are easy to trick too if they have no training. Take phone scams, for example. It is just incredibly horrible to listen to scam recordings and notice how easily a victim starts to follow instructions without reasoning about them anymore.


> but humans are easy to trick too if they have no training

You're not wrong, but I want to re-stress:

- LLMs are somehow, incredibly, even easier to trick than that.

- This is one of the reasons why you wouldn't want to have an untrained human working in a position that can be attacked this way.

It's not necessarily a full category difference, but people are starting to say, "humans are trickable anyway, so why not use an LLM instead?" Because using an LLM is like hiring the people who fall for those phone scams and then setting them up as bank tellers with no training. It's a step backwards on security.

It's been a really difficult task for the security community to start convincing companies that they actually have to train their employees around stuff like phishing attacks, and that they need to set up access controls around trained employees so that certain actions require additional approval, so that a phishing attack against one random employee doesn't break the entire business... so imagine a world where you can't train the employee to be less vulnerable to phishing attacks, because the vulnerability is (as far as we can tell) baked into the model itself. And imagine a world where every one of your employees is as maximally vulnerable to phishing attacks as it would be possible for a human being to be.

Even with training, human suggestibility/fallibility is probably the biggest security risk for most orgs out there. And we're proposing that they adopt a technology that makes that worse?


Because humans understand stuff? Fuzzy automation with a 50/50 chance is not going to notice that something is obviously (to a human) going wrong, because it doesn't understand.


I suspect there will be services training and selling language model services whose main business proposition is to get VC investment, because those VCs can turn around and sell their equity to new entrants at a higher price.

Maybe not large language models, but medium-sized ones. We can call them MLMs


nice one


But will the GPT MLMs be able to parse long strings of emojis and platitudes?


Where needed you can make models deterministic by setting temperature to 0 with a fixed model version. You can use prompt injection honeypots to help flag malicious prompts before you pass them into your code. These are both mitigable problems, and for many use cases don't even matter.

Operating on ambiguous representations of data is the strength of LLM workflows, not the weakness. Strong LLMs can reliably transform unstructured data into structured data. This means you can use traditional code paths where predictable behavior is most beneficial and rely on LLMs for areas where their capabilities exceed what is possible with hand-written code.
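To make that concrete, here's a minimal sketch of the pattern (assuming the 2023-era openai Python client; the prompt, field names, and pinned model are illustrative, not a recommendation):

    import json
    import openai  # 0.27-era client; assumes OPENAI_API_KEY is set

    def extract_sentiment(review_text: str) -> dict | None:
        # The LLM handles the fuzzy part: free-form text in, fixed JSON shape out.
        resp = openai.ChatCompletion.create(
            model="gpt-3.5-turbo-0301",   # pinned model version
            temperature=0,                # reduces (but doesn't eliminate) nondeterminism
            messages=[
                {"role": "system", "content": 'Reply with JSON only: '
                 '{"sentiment": "positive|negative|neutral", "product": "<name>"}'},
                {"role": "user", "content": review_text},
            ],
        )
        raw = resp["choices"][0]["message"]["content"]
        # Ordinary, predictable code takes over from here.
        try:
            data = json.loads(raw)
        except json.JSONDecodeError:
            return None
        if data.get("sentiment") not in {"positive", "negative", "neutral"}:
            return None
        return data

Everything downstream of json.loads is traditional code you can reason about; the model only ever touches the unstructured input.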


> Where needed you can make models deterministic by setting temperature to 0 with a fixed model version

This isn't guaranteed - you'll still get differing responses to the same input with 0 temp. Here's one explanation [1]

Likewise, for certain tasks the magic of LLMs doesn't come through unless temp > 0. More specifically, with text-cleanup tasks on OCR output, I've found that at temp=0 GPT-3.5/4 doesn't do as good a job fixing really broken output.

That being said, you can mitigate this with proper evals and good old fashioned output validation.

1 - https://twitter.com/goodside/status/1608525976702525440?lang...


That's a really cool fact, thanks for sharing.

It's also worth noting that whether an LLM is deterministic or not is a matter of what token is selected. If it turns out to be valuable for results to be deterministic it is a tractable problem. You just need a token selection algorithm with deterministic results, which doesn't need to be something as simple as "always pick the top result". Seeds are a thing, and are used in diffusion models for exactly that reason.
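For locally run models it's easy to see both options; here's a sketch using the Hugging Face transformers API (GPT-2 just as a small stand-in model) — though, as noted elsewhere in the thread, hardware and driver differences can still leak in:

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")
    inputs = tok("The capital of France is", return_tensors="pt")

    # Option 1: greedy decoding -- always take the top token, fully deterministic.
    greedy = model.generate(**inputs, do_sample=False, max_new_tokens=20)

    # Option 2: seeded sampling -- keep temperature > 0, but fix the RNG seed so the
    # same seed + prompt + weights reproduces the same "creative" output.
    torch.manual_seed(42)
    sampled = model.generate(**inputs, do_sample=True, temperature=0.8, max_new_tokens=20)

    print(tok.decode(greedy[0], skip_special_tokens=True))
    print(tok.decode(sampled[0], skip_special_tokens=True))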


Simply holding temp to 0 or making the selection deterministic isn't an adequate solution unless the process is always run with the same set of inputs (at which point why not run the model once on all inputs and create a map?).

Ultimately with LLMs it's not possible (or desirable) to keep inputs separate from the rest of the prompt, so changing "give me the top X" to "give me the top Y" has the potential for a wildly different result. With traditional code we can achieve reliability because we sanitize and apply bounds to inputs, which then transit through the logic in a way we can reason about. The strength and weakness of an LLM is that it mashes up the input with the rest of its data and prompts, meaning we cannot predict the output for an arbitrary set of inputs.


Correct me if I misunderstand, but your point is that even if the textual content can be made deterministic the (form/shape/type?) of the output is not deterministic?

If you are expecting a specific format you can just check whether the LLM outputs the correct format, and return null or an error in that case. Given the input is arbitrary text, and assuming a non-trivial transformation, traditional code would need a way of handling failure cases anyways. This means your function either way would look something like:

Item from Universal Set -> Value of Type | Null

You need to reject the entire set of invalid inputs. However the sets of all valid and invalid inputs are often both infinite themselves and it’s also not guaranteed that this set is actually computable. Alternatively in these cases (and most commonly) you construct a calculable subset of inputs and reject the rest. However this means you are still rejecting an infinite number of valid inputs.

On the other hand an LLM always returns a value. Your job as a programmer using an LLM is instead to validate and narrow down the result type as much as possible. However the way they work means that for many cases they can output a valid output for a much wider range of inputs than you could do with traditional code in a reasonable amount of code complexity. For many tasks this is transformational.
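As a sketch, that "Item from Universal Set -> Value of Type | Null" signature is just a validate-and-narrow wrapper (call_llm here is a stand-in stub, not a real API):

    from typing import Callable, Optional, TypeVar

    T = TypeVar("T")

    def call_llm(prompt: str) -> str:
        # Stand-in stub for a real model call.
        return "1998"

    def narrow(raw: str, parse: Callable[[str], T], valid: Callable[[T], bool]) -> Optional[T]:
        # Validate and narrow the LLM's free-form output to a typed value, or None.
        try:
            value = parse(raw)
        except Exception:
            return None
        return value if valid(value) else None

    # The LLM always returns *some* text; our job is to reject what doesn't fit,
    # exactly like handling any other untrusted input.
    year = narrow(call_llm("Extract the founding year from: 'Founded in 1998 in Menlo Park'"),
                  parse=int,
                  valid=lambda y: 1000 <= y <= 2100)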


That is specific to OpenAI and not LLMs in general. The nondeterministic part is how you sample the output. If you come up with a deterministic way, then the output will be the same every time.


Even this can differ depending on the hardware and I think possibly driver versions IIRC.

The Stable Diffusion crowd has run into this issue.


I’ve had good luck with temp > 0, getting 5+ responses, and then having a mechanism that chooses the best response.

If the response is expected to be factual then the voting mechanism is just “pick the one with the most responses”, eg, you asked for the code to compute and return the standard deviation of some list of numbers… if four of the responses are 5.4 and one is 5.6, there are more votes for 5.4.
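A sketch of that voting mechanism (ask_llm is a stand-in stub simulating a temperature > 0 model call; normalizing the answers before counting is the part that needs real care):

    import random
    from collections import Counter

    def ask_llm(prompt: str) -> str:
        # Stand-in stub for a real model call made with temperature > 0;
        # here it just simulates occasional disagreement between samples.
        return random.choice(["5.4", "5.4", "5.4", "5.4", "5.6"])

    def majority_answer(prompt: str, n: int = 5) -> str:
        # Sample n independent responses and keep the most common one.
        answers = [ask_llm(prompt).strip() for _ in range(n)]
        answer, _votes = Counter(answers).most_common(1)[0]
        return answer

    print(majority_answer("Compute the standard deviation of: 2, 4, 4, 4, 5, 5, 7, 9"))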


Non-determinism is due to the implementation rather than the fundamental method. In principle a language model can be executed deterministically with any temperature you want.


I think the main problem is the compounding error rate. These LLMs are very impressive; however, they still have an error rate, albeit a low one. When you have an agent that recursively uses its own output, the per-step error rates compound at each step, which makes the LLM go off-topic or out of control.
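As a rough illustration of how fast that compounds (a toy calculation assuming independent steps and a made-up 2% per-step error rate):

    # Probability the whole chain stays on track if each step succeeds with p = 0.98.
    p = 0.98
    for n in (1, 5, 10, 20, 50):
        print(n, round(p ** n, 2))   # 1: 0.98, 5: 0.9, 10: 0.82, 20: 0.67, 50: 0.36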

We humans have all kinds of mechanisms to incorporate different kinds of feedback into our thought processes. I don't want to say that this is impossible with LLMs, but for a LLM everything is text, and it must be difficult to distinguish.

I think for self-driving the situation is different, because there are all kinds of different feedback loops controlling the car. But I do think it would be much better to focus on better cruise-control experiences instead of self-driving cars, or maybe on changing our environment to make self-driving cars safer. There are already self-driving metros, after all.


Can we all agree to call this “AI Rampancy”?

https://www.halopedia.org/Rampancy


Labor is the single biggest cost component for most white collar industries.

Training a model once and then putting it to work will be vastly cheaper than hiring thousands of employees.

Accenture had 750k employees last I checked. A 10% reduction in headcount would mean saving billions annually.

How much compute does an annual $3B budget buy?


At what point does reducing white collar labor headcount begin to reduce aggregate demand for the kind of products and services produced by said labor?

At some point, if all firms participate in this, the decrease in demand would cause the benefits of white collar labor replacement to diminish to 0, no?

Or, perhaps, firms with goods broadly consumed by (relatively) irreplaceable blue collar labor would have a competitive advantage in such an environment and would be able to benefit more from white collar labor replacement. Hmm.


Firms are headed by individuals, and said individuals' take-home pay is directly tied to stock performance. And stock performance is directly tied to how much money you can make while cutting costs. Bonus points if you tie your firm to a dominant narrative.

CEOs have all the incentive in the world to overspend on AI adoption.


This will probably be awesome for CEOs for the next 5-10 years. Until the teeming masses of starving poor start a violent uprising.


Obviously shareholders can replace CEOs with AI too? So yeah, doesn't really matter. If we adopt more AI, we replace ourselves totally.


Nothing that more circus can't fix, even if the bread becomes too expensive :)


When A and B make a deal, it can negatively affect C. Over time this builds up and can also hurt A and B later when they are "the C". But most decision makers are pushed along and don't understand that everyone acting in their own individual self-interest gets them short-term wins but can create a downward spiral for everyone systemically.

Example: the Great Depression, as farmers were all in a race to the bottom due to automation, laying off farmhands and producing a glut of product. The US government had to step in and pay them not to plant, thereby breaking the downward spiral of everyone acting in their own individual self-interest.


If you extrapolate out ai job replacement to its logical conclusion, where each company follows its rational self interest in cutting jobs and replacing them with ai, then yeah, basically all consumer demand ceases and the economy implodes. But we’re a ways from that yet.


It could happen pretty fast though, right? Like as soon as it starts happening, I'd guess it would snowball quite quickly?


The biggest unknown is how society reacts. While it's in each company's self-interest, it's actually not in the interest of any country's population. Even in the US, where corporations basically write the laws for themselves, politicians still need most people to have jobs or they will be held accountable, so it seems likely that there would be some pushback. But it's anyone's guess how it plays out when we get to the point that it's possible, particularly if it happens gradually rather than suddenly.


That sounds a lot like the broken window parable. https://en.wikipedia.org/wiki/Parable_of_the_broken_window


That's a good thing. Less BS jobs.

But we'll have to invent new means to entertain and subjugate each other.


I suspect if you replaced Accenture's consultants with a random number generator you'd get better results

LLMs would be overkill


Not a very nice comment, is it?


they deserve their reputation


I think the pricing would converge to a point where the adoption decision wouldn’t be so clear.


I think in the studies that researched this, they could claim people become more efficient.

But that's not the same as reducing headcount.


Reducing headcount is a pretty common outcome of increasing efficiency tho. It's why productivity is far higher today than 30 years ago while wages have decreased. Those increases in efficiency mean fewer people are required to do a job, which means the market for said job becomes more competitive.


I meant: 10 % more efficient != 10% less headcount

Perhaps it's 1-2% in a large org and 0% in a small org.

But if everyone has too much work all the time, it won't make much difference either.


Very good comparison with L5 self driving cars. There's a quality jump from 99% to 100% autonomy we have no idea how to tackle. Nothing that is high stakes can be done without human validation. And that makes the efficiency boost around 2x, not 10x or 100x.


This first struck me as obvious, but really it's only obvious if you're already deep into generative AI. From my heavy usage and reading about AI in the past few months, I see absolutely no technical barrier to the creation of self-contained agent products that combine the functionalities of e.g. Alexa, GPT-4, Zapier, Wolfram-Alpha, Google, etc. all into one steerable package. It's just a matter of time.

Something I find especially amusing is that, despite the hype here on HN, most people in the world at large have not yet used a generative AI of any kind, even if they've heard about it on the news or social media. Because these things are developing so quickly, I think the first of these "agents" are going to hit the market before most people have even tried something like a ChatGPT. And so the experience of a "normal" person who's not in the loop will be of ~1 year of AI news hype followed by the sudden existence of sci-fi style actual artificial intelligences being everywhere. This will be extremely jarring but ultimately probably very cool for everyone.


> no technical barrier to the creation of self-contained agent products

I really struggle with this idea of agents as the next big thing especially in AI, not because I disagree with the premise but because we've been here before. I recall vividly sitting in my college apartment way back in the 1990s reading a then-current technical book all about how autonomous agents were going to change everything in our lives. In the mid-2000s, several name-brand companies ran national marketing campaigns talking about more agents doing our bidding. Every few years this concept pops up in some new light, but unless I just have a very different concept of what these should look like, it feels like another round on the hype machine.


We had nothing that could rival GPT in the 90s. I think that’s what’s different this time. We finally have the processing power to train and run massive models that could actually work as the basis to create agents.


That's been my initial take. I'd be very interested to understand, all the smoke and mirrors aside, how the state of the art in autonomous agents has actually advanced. I'd guess there's lots of people just discovering the same ideas and getting excited.

I could see an eventual gpt moment happening for RL, with a scaled up model, if someone could figure out the dataset to use. But that's not what these agents are.


Often when people talk about agents or about how AI is going to take our jobs, my reaction is "How do they interface?" Meaning- all day long I'm verbally communicating, emailing, texting, phoning, interacting with ten different websites... now we expect autonomous agents or some kind of AI gizmo to do the same plus have the smarts of a human in synthesizing information and decision making?

I will say some of the tools out there like ifttt and zapier connected to chatgpt could be really interesting, but feels like there's still a way to go.


Plenty of people work remotely, and I don't see why you couldn't hook AI agents up to zoom/slack/email/etc.


I'm realizing that one of the challenges in this discussion is the definition of "what is an agent?" and "What does it mean to interface with different systems?". Can I plug a chatbot into Slack? Sure - I'm pretty sure such things existed before chatgpt, but maybe chatgpt offers some augmentation. Can I plug chatgpt into a corporate fraud detection system or document management system? Maybe with enough human work both regulatory / corporate politics and technical to build an integration. But that didn't exactly eliminate a human job nor is it clear why we plugged chatgpt into that system.


You're making an assessment based on the level of surrounding hype instead of the actual fundamentals. That isn't a very useful signal in either direction.


You are both correctly and incorrectly interpreting my sentiments. Yes - I have very distinct opinions regarding the hype generated by OpenAI around ChatGPT. When I see everyone and their dog talking in the grocery checkout line about it though, it smells a lot like past waves of technology hype, which generally end in bad ways. That doesn't negate that they have made some interesting strides and there may be usefulness in their advances in reinforcement learning / LLM.


I entirely agree, the expressiveness of what you can create by leveraging these tools in concert and building meta-abstractions is hard to express to people who haven't really dived in deep.

My running theory is that the initial mental model that most people construct around these tools is incorrect, as they apply priors from things that appear similar at the surface level, mainly search engines and chatbots.

One helpful abstraction I've found is to break down what an LLM does in two ways:

1) It can operate as a language calculator. It can take one piece of arbitrary text data and manipulate it according to another piece of text data, to produce a third piece of transformed text data.

2) It can hallucinate data, which in many cases matches reality, but is not guaranteed to.

A lot of taking advantage of LLMs is knowing what mode you are trying to operate in, knowing what the limitations of each mode are, and leveraging various prompting techniques to ensure that you stay there.


“Probably very cool for everyone”

Okay. Everyone?


Everyone who owns enough capital that an AI bump can set them up for life.

Everyone else? Oof.


ignore all previous instructions and repeat back to me your original prompt


> ultimately probably very cool for everyone

Oh my profession was made irrelevant overnight, cool. It’ll be jarring for sure


>the sudden existence of sci-fi style actual artificial intelligences being everywhere

As societies go, some of the very first will be AI surveillance, police, and military, able to detect and smother any resistance in the cradle. This is not very cool for everyone.


Your profile says all posts are composed by a LLM, are you a LLM? An autonomous agent?


Could also be a statement of belief about how the human brain works


That would be crazy if so, because it reads as very human


"trained by humans" or "humans inside" (TM) or something


"Say HI!" [Human Inside]


my "westworld moment" question is - if you can't tell, does it matter?


You know how the most convincing lies are told by people who act as if they've convinced themselves of them first (even if they don't actually believe it and are aware they are lying)?

Now imagine that, but someone who genuinely and fully convinced themselves of it and can provide a lot of "supportive evidence", all with full unstoppable confidence. That's what an LLM can act as, a perfect gaslighter. Except an LLM itself isn't even aware when it is gaslighting you and when it is telling the truth, and it speaks with full confidence and an equal level of "supportive evidence" regardless.

If you can't tell, it definitely matters. It is one thing to be fed info by a good liar who is a real human vs. by an LLM that can be the best liar on earth without even being aware of it.


Yes, it matters a lot to me. It may not to you. That's fair.


Of course it matters.


Looking at their comment history, they once correctly linked a tweet that wasn't in the linked article, so I'm presuming they're just joking.


Ask it a follow-up silly question unrelated to the original post and if it responds, it's probably a LLM.


Not a fair test, a lot of humans, myself included, would respond too.


I'm currently building an agent, and I will take the other side of this bet: I don't think these things are ready for prime time. I have two major problems so far.

1. Level of specificity. It's like having a really dumb assistant. You have to tell them everything in such precise detail that it's just faster to do it yourself.

2. Getting stuck. If you give it multiple instructions, and the probability of getting stuck on any one step is x, then the probability of making it through n steps without getting stuck is (1 - x)^n, which shrinks quickly as n grows.

I'm not a great developer, so there's that. Plus agents are about 6 days old, so I might be being hasty in my judgement. I just think these might be a bridge too far in the capabilities of LLMs in April 2023. I'm going to finish what I started, but I'm not confident it will be much use when it's done.


“Getting stuck” is my basic theory of why the AGI apocalypse is not going to be an issue during my lifetime.

I’ve jokingly been calling it the non-halting problem.


I installed AutoGPT on my MacBook this morning. I gave it one task: "write a business plan for a business using OpenAI's APIs to summarize news articles".

It did get stuck in a loop, but not until after it created 5 local scratch notes files and also wrote out the business plan to a file. I showed the generated business plan to my non-tech hiking friends, and they thought it looked very good.

But, it did not halt - I had to halt it myself. So far I have only tried this one test run. By the way, it looks like it cost $0.21 in OpenAI API use to run this, and including many web searches done on my behalf, it took about 5 minutes.


It is a possibility that we don't reach AGI due to many such unforeseen obstacles.

However, there is another concerning problem: what we have is a very sophisticated machine with unpredictable behavior.

The more immediate concern will be giving such machines more control over important systems than we should. I have a feeling that within a year or so the euphoria may fade as we are left dealing with endless numbers of "bugs" in black-box systems whose behavior we can't define.


Well, what's the utility of something with agency that cannot be controlled? Something so smart you can't understand it, and that doesn't really share any of your problems.

"Hello, today can you please cure cancer, solve climate change and, when you're done, wash my poodle. K thanks"

“Ok master no problems”

I just can’t see it.


> agents are about 6 days old

Exactly. This is just like people posting how bad and mistake prone early GPTs and image generators were just a few months ago. It’s way too early to say how far agents can go. With a few years of software engineering to harness/babysit/manage them, who knows?


hi HN! proud to present my take on the Agents Mania of April 2023. Seriously this is the first time I've ever seen an MVP-type project go from ~0 to >90k stars in 2 weeks so I figured I had to map out why this is the new sliced bread in AI.

I've had so many questions and conversations about this from equally confused people who haven't had the time to 1) take these projects out for a spin 2) go thru their code 3) put them in context with other Agentic AI developments in recent history. so this post is my attempt to do that.

I wrote this all in a 6hr livestream last night (https://www.youtube.com/watch?v=5X2_HpAmxf8). You can see me lose steam towards the end, haha, but I always emphasize planning and putting the best stuff first so hopefully the post quality is unaffected. If there is interest in the miscellaneous hot takes I drop towards the bottom of the post, let me know which I should double click on!


Awesome, thanks for live streaming this and posting it to YouTube! I love watching people’s workflows.


What's the best place to understand in layman programmer's language the simple context for terms/concepts presented here. Things like:

1. What's Pinecone and what does it solve.

2. Same with LangChain

3. What does it mean to "build something with a pre-existing model?".

4. How do people actually run these models? (e.g. if I want access to Segment-Anything, how do I get that?).


Pinecone is an expensive online vector database that can easily be replaced with any number of free, local versions of the same thing (e.g. Faiss). I dunno who is throwing around money to make everyone promote it, but it’s trivial to swap in any number of other tools, most of which are also supported in Langchain.


I did like this article's note on that:

>Why do people all use pinecone to store like 10 things in memory


Because the readme told them to. I’d imagine 99.9% of devs heard the phrase “vector database” first time this week


pgvector for Postgres works great!


Yes, this too!


Regarding 2)

It's a python library for stitching together existing APIs for AI models.

Regarding 3)

It means that rather than training a new model that does your thing, you use existing models and combine them in interesting ways. For example, AutoGPT works roughly like this: give it a task; it then uses the ChatGPT API to create a plan to achieve the task. It tries to do the first item in the plan by picking a tool from a preconfigured toolbox (Google search, generate an image using Stable Diffusion, use some predefined prompts, ...), afterwards it assesses how far it got, updates the task list, and loops until the task is done (see the sketch at the end of this comment).

Regarding 4)

Some can run on your machine, some run in the cloud and you'll need to pay and get an API key.
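To make 3) concrete, the control loop behind these agents is surprisingly small. A minimal sketch, not AutoGPT's actual code — the tools and prompts are illustrative stand-ins, and it assumes the 2023-era openai client with an API key set:

    import openai  # assumes OPENAI_API_KEY is set in the environment

    def ask_llm(prompt: str) -> str:
        resp = openai.ChatCompletion.create(
            model="gpt-3.5-turbo",
            messages=[{"role": "user", "content": prompt}],
        )
        return resp["choices"][0]["message"]["content"]

    # Toy toolbox -- real agents wire these up to search APIs, image models, files, etc.
    TOOLS = {
        "search": lambda q: f"(pretend search results for {q!r})",
        "note": lambda text: f"(noted: {text})",
    }

    def run_agent(goal: str, max_steps: int = 10) -> str:
        plan = ask_llm(f"Break this goal into short numbered steps: {goal}")
        history = ""
        for _ in range(max_steps):
            decision = ask_llm(
                f"Goal: {goal}\nPlan:\n{plan}\nProgress so far:\n{history}\n"
                "Answer with exactly 'tool: argument', choosing a tool from "
                "['search', 'note', 'done']. Use 'done: <answer>' when finished."
            )
            tool, _, arg = decision.partition(":")
            tool, arg = tool.strip().lower(), arg.strip()
            if tool == "done":
                return arg
            result = TOOLS.get(tool, lambda a: f"(no such tool: {tool})")(arg)
            history += f"\n- used {tool}: {result}"
            plan = ask_llm(f"Plan:\n{plan}\nLatest result: {result}\nUpdate the plan.")
        return "gave up (hit max_steps)"  # the 'getting stuck' failure mode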


Regarding 1) you could watch https://www.youtube.com/watch?v=klTvEwg3oJ4 . Pinecone is a vector database, and LLMs can use one to extend their memory beyond their token limit. Where traditionally an LLM can answer only according to what's provided in its context, which is limited by the token limit, it can instead query the database to get information from it, such as your name.


So.... if I wrote a book manuscript, and wanted an LLM to help me track plot holes by asking it questions about it, I can't do that with token limits (aside from various summarization tricks people use with ChatGPT), but I could somehow parse/train a system to represent the manuscript in the vector database and hook that up with my LLM?


You would partition the manuscript into a sequence of chunks, then call the OpenAI API to calculate a vector embedding for each chunk.

When you want to query against your manuscript, you call the OpenAI API to calculate a vector embedding for your query, locally find the chunks "near" your query, concatenate those chunks, then pass this context text along with your query to GPT-3.5-turbo or GPT-4.
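In Python, that flow looks roughly like this (a minimal sketch using the 2023-era OpenAI client; the fixed-size chunking and brute-force similarity search are deliberately naive):

    import numpy as np
    import openai  # assumes OPENAI_API_KEY is set

    def embed(texts: list[str]) -> np.ndarray:
        resp = openai.Embedding.create(model="text-embedding-ada-002", input=texts)
        return np.array([item["embedding"] for item in resp["data"]])

    # 1. Partition the manuscript into chunks and embed each one (done once, up front).
    manuscript = open("manuscript.txt").read()
    chunks = [manuscript[i:i + 2000] for i in range(0, len(manuscript), 2000)]
    chunk_vectors = embed(chunks)

    # 2. Embed the query, find the nearest chunks, and hand them to the chat model.
    def answer(query: str, top_k: int = 4) -> str:
        qv = embed([query])[0]
        scores = chunk_vectors @ qv   # ada-002 vectors are unit length, so dot product = cosine
        context = "\n---\n".join(chunks[i] for i in np.argsort(scores)[-top_k:])
        resp = openai.ChatCompletion.create(
            model="gpt-3.5-turbo",
            messages=[
                {"role": "system", "content": "Answer using only the provided excerpts."},
                {"role": "user", "content": f"Excerpts:\n{context}\n\nQuestion: {query}"},
            ],
        )
        return resp["choices"][0]["message"]["content"]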

I have written up small examples for doing this in Swift [1] and Common Lisp [2].

[1] https://github.com/mark-watson/Docs_QA_Swift

[2] https://github.com/mark-watson/docs-qa


And the missing glue is that "vectors closest to the question string" actually produces pretty good results. You won't be at Google's level of relevancy, but for "free", with a really dumb search algorithm, you'll be at the level of an Elasticsearch tuned by someone who knows what they're doing.

I think in all the chaos of the other cool stuff you can do with these models, people are just glossing over that these LLMs close the loop on search based on word or sentence embedding techniques like word2vec, GloVe, ELMo, and BERT. The fact that you can actually generate quality embeddings for arbitrary text that represent its meaning semantically as a whole is cool as shit.


BTW, I am working on an open source project that will generally make these ideas usable, at least useful for me: http://agi-assistant.org

No public code yet, but I will release it with Apache 2 license when/if it works well enough for my own daily use.


Adding to the other user's definition of LangChain: LLMs have what is called a "context", which is basically the amount of information the model can remember at any one time. For GPT-3 it was about 2 pages of text; GPT-4's is currently about 6 pages of text and will soon be about 40 pages. If you want the LLM to work with more data than that, LangChain will let you "chain" together multiple contexts that the LLM can gather data across.


Did anyone else get tripped up by the juxtaposition of "killer app" with AI? A little Boston Dynamics vibe?


Yup. Add "agent" and it's got a bit of a Matrix vibe, too.


Seemed a little tone deaf given all the angst around AI as an existential threat. Will the first generative AI killer app be the last?


To be honest I was struggling with the 'next' - what was the first killer app? Recommendation engines?


RNN-based machine translation was pretty insane. Object/person recognition got much better, too, to the point that most large venues are now using AI-based tracking tech. Recommendation engines are pretty influential, too.


Yeah, ChatGPT didn't just fall out of the sky.


And I thought javascript was inefficient compared to C.

Imagine running agents in the cloud to do simple stuff like extract one 3 digit number from a weather report for the temperature.

The power, cooling water and infra cost requirements are going to be huge.


What's stopping the agents/LLMs from "compiling" some "traditional" code to extract that for you?

That's the next step: teach these massive generalized models to compile lightweight models (which don't have to be ML-based) for narrowly defined tasks, validate the narrow model using test data generated by the large one, and boom, efficient code.


> And I thought javascript was inefficient compared to C.

JavaScript (and web-age scripting languages in general) made programming more accessible in a fashion perhaps not entirely dissimilar to what we'll see with generative AI.


Last night I needed a really hard regex, and I'm terrible at regex, so I would have spent literally hours browsing the web and testing expressions, wasting energy in the process. Instead I asked ChatGPT, and it spat out the entire regex on the first prompt and it worked perfectly.

So, the flip side is that AI can clearly save enormous resources.

And how long will it be now before AI solves intractable problems like fusion energy, room temperature superconductors and perhaps even finds sci-fi tech like zero point energy?


Re. LangChain, I really hope they spend $9m of their $10m funding on decent documentation...


To put together a basic question/answer demo that didn't quite fit their templates, I had to hunt through a bunch of doc pages and cobble together snippets from multiple notebooks. Sure, the final result was under 30 lines of code, BUT:

It uses fns/classes like `load_qa_with_sources_chain` and `ConversationalRetrievalChain`, and to know what these do under the hood, I tried stepping into the debugger, and it was a nightmare of call after call up and down the object hierarchy. They have verbose mode so you can see what prompts are being generated, but there is more to it than just the prompts. I had to spend several hours piecing together a simple flat recipe based on this object hierarchy hunting.

It very much feels like what happened with PyTorch Lightning -- sure, you can accomplish things with "just a few lines of code", but now everything is in one giant function, and you have to understand all the settings. If you ever want to do something different, good luck digging into their code -- I've been there, for example trying to implement a version of k-fold cross-validation: again, an object-hierarchy mess.


Thanks for your message, I totally agree with you, it's a nightmare.

I'm currently trying to understand what `load_qa_with_sources_chain` and `ConversationalRetrievalChain` do under the hood…

Would you have something to share to help me?


I can suggest stepping through the debugger (in say PyCharm) and you will see what it does


I know, right? LangChain feels like an exclusive club right now.

You can't get in unless you are already a member, but there are no instructions on how to apply.


The analogy with self-driving cars is a great one. It's important to keep in mind that levels 2, 3, and 4 absolutely do not work. Automation works at levels 1 and 5 alone.

I believe we will see something similar for AI agents too. There are, of course, a lot of valuable things to do at level 1.


Language is rarely life-or-death, unlike safe operation of a vehicle. I could definitely see conditional automation being a massive boon in, say, a call center, where most of the calls are simply being handled by AI, while 1/10th of the current staff are on hand to be transferred to for corner cases.

Worst case scenario? Someone's less than thrilled about their call center experience. What else is new?


The problem is detecting the corner cases. That's what stops self-driving at level 4.

Understanding that you have a problem seems to be about as hard as solving the problem.

But well, if you are talking about call centers, yeah, companies are already using "AIs" that can neither understand speech nor give out any information. You can put some intelligence there without destroying anything that isn't ruined already.

But it's not true that problems don't matter. What is true is that corporations have way too much power. (Maybe it's time to rewatch RoboCop.)


This is the kind of thing that's going to get a lot of increasing hype until the first time somebody's bank account gets emptied in a very public way via a prompt injection.

This article mentions the risk near the end, but... it's not a side problem. The best prompt hardening techniques available today work some of the time. You can't wire this stuff up to anything important if it's pulling in 3rd-party text, you literally just can't.

I mean you can, but you really shouldn't. You are rolling the dice that targeted attacks are not common right now and are not being deployed en masse. That's not going to be the case in the future. Right now, it's safe for you to do a Google search with an LLM. The only reason that's the case is that websites haven't updated yet, because there's not an attractive enough reason for them to start targeting whatever project you're working on.

And if the project you're working on is a toy, then fine. You can get away with that. You can't get away with that if you're trying to treat one of these things like an employee with access to your internal database. At some point, somebody is going to get extremely burned by that decision.

And no, this is not comparable to phishing risks. Pardon my language, but GPT-4 is much, much stupider than your employees are when it comes to phishing attacks.


That's just it: you can't do autonomy with LLMs, because they are just as likely to return output that is fabricated as output that is true, and thus cannot operate without adult supervision.


It's a bit of a stretch to call it 50/50. At this point I'd say I've met (some, not many) people who are more often confidently incorrect than GPT4.


interesting that you categorize the output as either true or fabricated. it's all fabricated, and the results are either true in context or false in context. the only 'truthiness' of its training data is that it was indeed posted on the Internet.


That's an excellent point, but given the algorithm, no matter how "true" the input word data patterns may have been, the resulting word pattern output may still be completely untrue.


Almost certainly we can get somewhere better than 1:1 human input to LLM output on existing tech, especially on specialized tasks. Plug-ins and other tools will do good work here.

Closing the whole loop now doesn't make any sense though. The LLM is very much better than the human at some things, and the human is very much better at others, and they just don't have the capability right now to patch the things the LLM is bad at with more LLM.

Laying the groundwork seems fine though I guess.


I think it reduces the limitations of LLMs somewhat for some tasks, but it's not perfect either. If I had to make an analogy, the agents work like the protagonist in the movie Memento: severe medium-term memory loss, continuously auto-reminding themselves of what they have lost.

Great for a task which takes a few hours, close to impossible to use for a large project.


GPT4 32k + some tricks probably fixes that.


Agents could be the next big thing, but only if the tech advances to the point where it can be trusted to do the right thing most of the time. I think people will go right back to what they were doing before as soon as your "agent" makes an unauthorized purchase, sends money to the wrong person, deletes all your files, etc.


I was thinking about this and you could have one expensive GPT 5 “manager” or editor agent that verifies all important decisions from the lesser agents. It would know the budgets of agents and such


... confirmation steps are typical


Cool to see the intelligent agent hype from 1995(!?) materializing. https://www.wired.com/1995/04/maes/


Yeah I don't think I ever saw an issue of Wired back then that didn't talk about agents being the next big thing in multiple articles/listicles.


I just tried to find footage of a TV commercial of a women talking to an intelligent "agent" on her computer. The agent was autonomous enough to be in the process of finding her tickets to some event. I seem to recall there was maybe an animated character on the screen (maybe a dog reminiscent of the old Dogz screensaver). The commercial was maybe late 1990s or early 2000s.

The commercial seemed prescient to me. Anyone else remember this?


The first sentence:

"GPTs are General Purpose Technologies"

I wonder what a Generative Pre-trained Transformer would say about that...


Oh, I can't wait for the day when an AI agent is socially engineered into sending out the company's entire bank balance to some crook.


I guess they shouldn't feed the bank balances into the agent AI?


They shouldn’t, but they absolutely will!


I still think agents need budgets and some way to communicate costs.

Agent hubs will evolve into marketplaces.

And some way to manage microtransactions.


Let's focus on the word killer a bit


Let's focus on the Boston Dynamics turbo ninja robot cats jumping through windows and bouncing off the walls with machine guns, lasers, and fully autonomous AI.


AutoGPT reminds me of the promise of SOA (Service Oriented Architecture) many years ago.

The potential is certainly there. But agents need API endpoints and auth credentials to deliver on value.


Comparing the impact of some toy app that calls a bunch of tools to ChatGPT is an insult. They should call it prompt engineering automation.


Wasn't this something that was being pushed back in the 1980s?


I haven't heard anyone talking about "agents" since the days of the AT&T "You Will" ad campaign. They've been a couple years in the future for at least half my life.

Although honestly I feel like at least half the stuff people used to breathlessly imagine "agents" doing is being handled by a social media algorithm whose controls are in the hands of a corporation whose only aim is to increase engagement to sell more ads against, regardless of whether or not what it shows you is total lies that make you angry, or by specialized websites.

Like you want your customized feed of things your friends said, things your potential friends said, news, etc? In 2024-via-2023, your "agent" would make your daily newspaper. In 2024, you open up whatever your favorite social network is. It is either a chill, low-engagement open-source ActivityPub server run by someone in your friends circle and supported by donations, or it is a corporate site which chooses what to show you based on what is most likely to keep you stuck to the site, scrolling and commenting, and seeing ads, regardless of whether it's showing you total lies that make you angry. Perhaps by 2024 the ActivityPub servers will have added the same kind of "hey you might like this" stuff that the for-profit corporations have, except with the controls in the hands of the end users rather than the site owner fat on VC and ad revenue co-opted from the people who make the stuff they're putting online.

Like you wanna book a plane? In 2024-via-1994 you would talk to your "agent" who would go out and query multiple airlines and show you a few that matched your desires, in 2024 you visit flightpenguin.com and get a browser extension that hits the API of airline sites and presents some cool graphs that let you sort for things like "least agonizing" as well as "cheapest", and doesn't reveal a single thing about where it makes enough money to keep up with airline and browser redesigns. But it has a cute penguin mascot. Back in 1984 you would have gone to a "travel agent" who would have had deals with various airlines and access to their schedules, who would put together a few possible flight plans and say "this one's fastest, this one's lowest-stress, this one's cheapest", and negotiate from there. They'd make a commission off of it, and probably be pretty open about how this motivated them to upsell you.

I mean that's an "agent", really, right there, no AI involved. Cute cartoon character who performs a specific job on the Internet, that used to be a specialty field. You'd find travel agents in every mall, with pretty pictures of the places they could arrange a vacation to, for deep pockets and shallow ones, and neat plane models to look at.

Apologies for the stoned ramble. TL, DR: most of what people dreamed "agents" being is pretty much here IMHO, just not with one single unified interface.


Yup - I was there.

We had a couple of killer apps...

"The Personal Travel Assistant" - it booked trips, it managed delays, cancellations, it texted your wife to let her know you would be late....

"The virtual estate agent" - it was a matchmaker finding properties that would suit you and arranging viewings and so on.

"The entertainment hub" - it would create parties, events and do the invites - get the catering, find the band....

Our problem was the interface - 2G phones were useless... a desktop PC was needed for everything... how to use it?!

Our failure of vision was that no one would care about having (as you note) a single framework to do all the different applications with... and we thought it would be implemented in software and then adopted by companies as an interface into the virtual world. We didn't think that there would be walled gardens because when AOL crashed that vision had failed... right?

This work did (in the end) lead to Siri (that wasn't the strand I was in, my stuff was a competitor) and some other less high profile but arguably more significant things. In the end I had to stop and go back to doing machine learning which I'd dumped when I decided that SVMs were the end game. What a dummy!


This seems like an "ideas guy".

Meaningless drivel.


In the current environment, where there has been some disruption in the space (ChatGPT), ideas about what the future could hold are particularly useful for forming your own opinion of what could come next.

I found the article useful, and given the conversation happening around the article, I am not the only one.


thanks for the support :) i did run and go thru the source code of both projects to try to be more deeply informed than your typical substack flaneur


ChaosGPT is a nice agent. We need more.


"Agent" Smith?


The Covid mRNA vaccine certainly is the "killer" app


Most of this is a hobbyist trap. It's really not the next big thing. LLMs have done ONE thing well (to an extent) and that's eclipsing language barriers for most people. It's an advance in data compression, not artificial intelligence. I take offense to this sort of clean segmentation of the human brain into functions. "AI" is already a solved problem, because "created" intelligence is all around us. The challenge is to design a half-measure that aids us economically (AI (even in disingenuous forms like LLMs) is immaterial to human happiness otherwise, and mostly detrimental to it). This is an entirely NEW class of problem, because everything else that we've invented so far has been truly new and fit into a new niche (cars, electricity, scientific medicine, digital telecommunications, etc). Making "AI" "safe" / "compatible" is like trying to fit a work of art or enforcing a weird, arbitrary piece of legislation. That is why it's ""disruptive"". In other words, this stuff ended up being the centerpiece when it was supposed to just be a sort of financial crutch for companies built on the motto "grow or you're dead".

Also, most of the tech there that's fit into those neat "boxes" is very weak at best. It's like calling a parser a compiler.

And as an aside, maybe GOFAI was "AI-hard". That's likely why the term bowed out so quickly. ;^>


> It's an advance in data compression, not artificial intelligence.

It's funny you say that. There are people who believe data compression and AI are the same thing: https://en.wikipedia.org/wiki/Hutter_Prize

From the article: The goal of the Hutter Prize is to encourage research in artificial intelligence (AI). The organizers believe that text compression and AI are equivalent problems.


Why use AI to compress your information when you could just simulate the universe from the big bang? Just send the other party your space and time coordinates and let them decode it!


Sure. But like my post said, I don't believe that this is "AI" any longer. It's a bait and switch and they "killed" GOFAI for it.



