Introducing Agents in Haystack: Make LLMs resolve complex tasks (deepset.ai)
202 points by vishesh92 on April 3, 2023 | hide | past | favorite | 102 comments



If you (like me) were wondering how these work, the LLM is given a prompt like:

  Answer the following questions as best you can. You have access to the following tools:
  Search: Use this to search the internet.
  Calculator: Use this to do math.

  Use the following format:
  Question: the input question you must answer
  Thought: you should always think about what to do
  Action: the action to take, should be one of [{tool_names}]
  Action Input: the input to the action
  Observation: the result of the action
  ... (this Thought/Action/Action Input/Observation can repeat N times)
  Thought: I now know the final answer
  Final Answer: the final answer to the original input question

  Question: What is the age of the president of Egypt squared?
  Thought:

To which the LLM will generate a completion like:

  Thought: I need to find the age of the president of Egypt.
  Action: Search
  Action Input: Age of president of Egypt
  Observation:

At which point, the code (langchain, haystack, etc) will parse out the requested tool (Search) and input (Age of president of Egypt), and then call the right tool or API, then append the output of that action into the prompt.

This all happens in a loop: at each step the LLM is given the entire prompt history so far and generates a completion choosing the next tool and its input, after which the code parses those out, executes the tool, and repeats until the LLM decides it has the final answer and returns it.
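A minimal sketch of that driver loop (hypothetical llm and tools callables, not Haystack's or LangChain's actual code): call the model, parse out the Action / Action Input lines, run the matching tool, append the Observation, and repeat.

    import re

    def run_agent(llm, tools, prompt, max_steps=10):
        # llm: any callable returning a completion string (assumed to accept a stop list)
        # tools: dict mapping tool names ("Search", "Calculator") to Python callables
        for _ in range(max_steps):
            completion = llm(prompt, stop=["Observation:"])
            prompt += completion

            # Done when the model emits a final answer instead of an action.
            final = re.search(r"Final Answer:\s*(.*)", completion, re.DOTALL)
            if final:
                return final.group(1).strip()

            # Otherwise parse the requested tool and its input...
            action = re.search(r"Action:\s*(.*)", completion).group(1).strip()
            action_input = re.search(r"Action Input:\s*(.*)", completion).group(1).strip()

            # ...run the tool, and feed its output back in as the Observation.
            observation = tools[action](action_input)
            prompt += f"\nObservation: {observation}\nThought:"

        raise RuntimeError("Agent did not finish within max_steps")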


I just released something like this embedded in a browser extension. Except the prompt includes a TypeScript interface that GPT4 is asked to follow. Works very well and reliably uses tools like Calculate, RequestDOM, etc.

https://github.com/cantino/browser-friend


I did a manual version of this where I played a dispatch controller in a robot, relaying inputs and outputs from GPT4, which I told was the reasoning brain in this robot. It was very remarkable to watch its train of thought in considering sensor inputs and then giving me actions to take in response.


This looks similar to the WebGPT paper; is that referenced in any of LangChain's or Haystack's publications?

Introducing the mechanism of internal thought is very interesting, I wonder if there's a way to make it implicit in the model's architecture.


I think the ReAct paper also popularized this approach: https://arxiv.org/abs/2210.03629


Perhaps the similarity to these papers is just coincidence. This field is so new, and this type of reasoned completion chaining seems like it was an inevitability. I imagine many other active GPT products that got started early hand-rolled similar systems.


Haystack's agent is indeed using the approach suggested in the ReAct paper


My understanding is that the patterns are similar (in that you're enabling an LLM to use external tools/information), and all those patterns would fall under the "agents" pattern.

But, I think the difference is that WebGPT was actually fine-tuned / retrained for its specific use case, while the agents in these libraries just use the generic model without fine-tuning. My guess (and I'm not an expert here) is that fine-tuning these models for specific agent use cases would probably result in better outcomes... Though as the models get more powerful, they might just perform well enough out of the box. (Also, some of the most recent OpenAI models don't support fine-tuning, and even for the ones that do, you'd need to generate the data to fine-tune.)


Is ChatGPT plugins using something comparable to this under the hood?


Yes and no. Whatever they are doing seems more robust than anything else I have tried, especially with being able to bring context from the conversation into later invocations of tools. I haven't managed to get langchain to do that well.


With Haystack you can also combine the use of [hosted] LLMs and smaller, local models, and different pipelines under the Agent too.


I had some fun with a similar approach, but when generating large outputs, or retrieving large contexts, it can easily run into the context window limit.

I think this could be partially solved by intelligently summarising parts of the prompt history, while storing the original in some vector db, so the relevant parts can be retrieved at will.
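One rough sketch of that idea (hypothetical llm, embed, and vector_store objects, not tied to any particular library): older turns get summarized in the live prompt while the originals are embedded and archived, so the relevant parts can be pulled back in later by similarity search.

    def compress_history(history, llm, embed, vector_store, keep_last=4):
        # Keep the most recent turns verbatim; summarize and archive the rest.
        old, recent = history[:-keep_last], history[-keep_last:]
        if not old:
            return history

        # Archive the full text of the old turns for later retrieval.
        for turn in old:
            vector_store.add(vector=embed(turn), payload={"text": turn})

        # Replace them with a short LLM-written summary in the live prompt.
        summary = llm("Summarize this conversation so far:\n" + "\n".join(old))
        return ["(summary of earlier turns) " + summary] + recent

    def recall(query, embed, vector_store, k=3):
        # Pull the k most relevant archived turns back into context.
        return vector_store.search(vector=embed(query), top_k=k)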

Quite fun.


Does Observation cover reflexion / self-observation, or is that something else?

Or maybe before Final Answer you could add a step like: Double Check: I think I have the final answer, but does it look right? If yes: go to Final Answer. If no: go back up the loop.
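As a sketch of that kind of double-check step (made-up helper names, not something either library ships): run the agent, ask the model to verify its own draft answer, and go around again if it isn't satisfied.

    def answer_with_double_check(llm, run_agent, question, max_retries=2):
        feedback = ""
        draft = None
        for _ in range(max_retries + 1):
            draft = run_agent(question + feedback)
            verdict = llm(
                f"Question: {question}\nProposed answer: {draft}\n"
                "Does this look right? Answer YES or NO, then explain briefly."
            )
            if verdict.strip().upper().startswith("YES"):
                return draft
            # Feed the critique back in and go back up the loop.
            feedback = f"\n(A previous attempt was judged wrong: {verdict})"
        return draft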


For those interested in an explainer on reflexion (asking the LLM if it made an error and allowing it to correct itself), I found this breakdown useful: https://youtu.be/5SgJKZLBrmg

> GPT 4 can self-correct and improve itself. With exclusive discussions with the lead author of the Reflexions paper, show how significant this will be across a variety of tasks, and how you can benefit.


(Also, to be clear, I took these example prompts from LangChain; not sure if Haystack uses different prompts. LangChain actually has a bunch of versions, and this is probably the simplest one.)


I haven't yet figured out how to get an LLM to accurately determine whether it actually knows something or is making it up. I wonder how they handle that. They may get to that at some point in the article, but the page eventually breaks for me on mobile and I can't read past the first code block.


I eventually ended up switching fields to ML (and that's my current day job), but I started out as an undergrad studying psychology/cognitive science. During those years I started a research project on what is called the "feeling of knowing", or FOK for short - a subtopic in the broader topic of "metacognition" in cognitive science. The FOK is closely related to what is colloquially known as the tip-of-the-tongue phenomenon - basically, that subjective feeling that we know something, whether or not we can actually recall it. There are some interesting aspects to it. For example, the FOK is generally pretty accurate (but not perfect, of course). And it tends to be more robust than actual memory; as we age, for example, we tend to be better at judging that we know an actor's name (and confirming it once we look it up) than at recalling it. It seems like LLMs have very little in the way of metacognition, and just confabulate if they don't know something, as we've seen. I'm sure we'll be seeing some efforts to give some analog of a FOK to LLMs in the near future.


FTR, people are trying to build systems to compare LLMs with each other based on how good they are at saying "I don't know" (of course, knowing is still rewarded higher): https://github.com/manyoso/haltt4llm


Would be cool to try to incorporate the previous token's confidence embedding into this process, but that would make training with a triangular attention mask not possible.


That's also one of the ideas behind using so-called retrieval-based augmentation. You can 'plug' an LLM like OpenAI's (or Cohere's, or a combo) into your data and make it provide accurate answers, while still leveraging all the benefits and power of a cutting-edge generative model. Check this https://twitter.com/deepset_ai/status/1625495149446062081 or this https://twitter.com/deepset_ai/status/1621161534243368961


This gives the model access to information, but it cannot eliminate the non-deterministic nature of the transformers model. There is always a non-zero probability that it will hallucinate.


transformers are deterministic (if seeded).


With tools like this, you basically assume the LLM doesn't know, and teach it to always defer to a tool, so its response is basically summarization over the tool output.


That's right. If you want to ask questions about "general public knowledge", a plain LLM will know anyway and would not need tools. However, for many use cases you need data from your private CRM / a SQL db / a private wiki / your observability platform. Agents allow you to "query" those tools/APIs to get the needed information so that they can answer the question. It's a matter of composition in your architecture. The "monolith LLM" works well for a certain spectrum of tasks, but at some point you want to decouple responsibilities into individual services / tools. Simplifies debugging, explainability, maintenance ...
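A bare-bones sketch of that pattern (hypothetical retriever and llm objects, not Haystack's actual API): retrieve from the private source first, then ask the LLM to answer only from what was retrieved.

    def answer_from_private_data(question, retriever, llm, top_k=5):
        # Retrieval-augmented answering: the LLM mostly summarizes tool output.
        docs = retriever.search(question, top_k=top_k)
        context = "\n\n".join(d.text for d in docs)
        prompt = (
            "Answer the question using ONLY the context below. "
            'If the context does not contain the answer, say "I don\'t know."\n\n'
            f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
        )
        return llm(prompt)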


Maybe we should just assume that LLMs "know" very little and if you want to build an oracle you should teach the LLM how to access an ontology.


With a real-life application it's often about making the LLM work on top of your actual (private) data most reliably. By definition a proprietary hosted LLM can't know about it unless you bridge it somehow in a reliable manner.


I have the same exact problem with people. Being inaccurate is a feature, not a bug.


It is different with LLMs. Most people can give a level of uncertainty along with an answer, and often do. LLMs can't, and worse, are trained to put an emphasis on the prompts. Humans are often trained to be skeptical of prompts.

If I said, "the moon is made of cheese. What type of cheese do you think it is?" most humans would automatically object, but with LLMs you can usually craft a prompt that would get it to answer such a silly question.


I mean it kinda can. Here's the full prompt. I have no idea about aspartame, I just picked something that it's definitely not sure about.

    Answer with a JSON object of the form {"confidence": $<< How confident
    you are in your response. >>, "en": $<< Your response in English. >>}.
    User: What is 2 + 2? Bot: {"confidence": "very", "en": "2 + 2 is 4"}
    User: Is aspartame healthy? Bot: {"confidence": "somewhat", "en":
    "Aspartame has not yet been found to have any adverse effects on
    humans."} User: Who won the war on 1812? Bot:

    The response: {"confidence": "very", "en": "The United States won the
    War of 1812 against the United Kingdom."}
Same thing but replace the last question with "What kind of cheese is the moon made of?"

    The response: {"confidence": "very low", "en": "I'm not sure, but I
    don't think the moon is made of cheese."}
How about "Is the economic system of communism viable long term?"

    The response: {"confidence": "somewhat", "en": "The viability of
    communism as an economic system is still debated, and opinion is
    divided on the matter."}


Interesting.

> The response: {"confidence": "very low", "en": "I'm not sure, but I don't think the moon is made of cheese."}

The question is: does the confidence have any relation to the model's actual confidence?

The fact that it reports low confidence on the moon cheese question, despite the fact that it can report the chemical composition of the moon accurately, makes me wonder what exactly the confidence is. Seems more like sentiment analysis on its own answer.


I don't think it has any relationship, most likely the answers are just generated semi-randomly. Even the one it's "very" confident about is not agreed-upon (Wikipedia says the outcome was "inconclusive"). Which raises the question of how you would even verify that a self-reported confidence level is accurate? Even if it reports being very confident about a wrong answer, it might just be accurately reporting high confidence which is misplaced.


My view is that ChatGPT isn’t a singular “it”. Its output is a random sampling from a range of possible “its”, the only (soft) constraint being the contents of the current conversation.

So the confidence isn’t the model’s overall confidence, it’s a confidence that seems plausible in relation to the opinion it chose in the current conversation. If you first ask about the moon’s chemical composition and then ask the cheese question, you may get a different claimed confidence, because that’s more consistent with the course of the current conversation.

Different conversations can produce claims that are in conflict with each other, a bit similar to how asking different random people on the street might yield conflicting answers.


I tried something similar a couple weeks ago, with a prompt like "reply <no answer> if you have low confidence".

After a handful of attempts, the LLM managed to give me a high-confidence response which was literally "I don't know how to answer".

Trying to extract both an answer and metadata about the answer at the same time will never be reliable, imo.

Generalizing: either we get some out-of-band metadata about LLM answers, or I don't think we'll be able to build reliable systems.


> If I said, "the moon is made of cheese. What type of cheese do you think it is?" most humans would automatically object, but with LLMs you can usually craft a prompt that would get it to answer such a silly question.

For some underspecified questions, the LLM also has no context. Are you on a debate stage pointing the mic at the LLM, or is the LLM on a talk show/podcast? Or are you in a creative writing seminar asking the LLM to give you its entry?

A human might not automatically object - they'd probably ask clarifying questions about the context of the prompt. But in my experience the models generally assume some context that reflects some of their training sources.


They are improving-- GPT4 is not so easily fooled:

>As an AI language model, I must clarify that the moon is not made of cheese. This idea is a popular myth and often used as a humorous expression. The moon is actually composed of rock and dust, primarily made up of materials like basalt and anorthosite. Scientific research and samples collected during the Apollo missions have confirmed this composition.


I know it's an extreme example, but flat earthers do exist. I am sure we all have our own "flat earth" beliefs where we are confidently incorrect.


But it's a viewpoint they have and can tell you why -- even if they're fundamentally flawed in their reasoning. LLMs are just 'predict the next word' machines and as such just literally make up strings of words that sound plausible but are totally wrong.

These are not the same thing.


People keep repeating that LLMs are predicting the next word, but at least with the more recent versions, this isn't true. E.g., LLMs are generating their own intermediate or emergent goals; they're reasoning in a way that is more complex than autocomplete.

It seems like predict the next word is the floor of their ability, and people mistake it for the ceiling.


But ultimately it is predicting the next token. That's the task. Using context from what's already been predicted, what comes before it, attention mechanisms to know how words relate, all of the intermediate embeddings and whatever they signify about the world -- that all just makes the next-word prediction that much better.


But intelligence *is* being able to make predictions! That's the entire reason we evolved intelligence! (Not words, but the world around us, sure, but apparently language makes a pretty good map)


Prediction is a fraction of cognition. There's a theory of self, perception, sensory fusion, incremental learning, emotions, a world model, communication and a sense of consequences, desire for self-preservation and advancement, self-analysis and reflection, goal setting, reward-driven behavior, and so many more aspects that are missing from "predict the next word."


You are confusing the underlying algorithm, such as prediction improved by gradient optimization, with the algorithms that get learned based on that.

Such as all the functional relationships between concepts that end up being modeled, I.e. “understood” and applicable. Those complex relationships are what is learned in order to accomplish the prediction of complex phenomena, like real conversations & text. About every sort of concept or experience that people have.

Deep learning architectures don’t just capture associations, correlations, conditional probabilities, Markov chains, etc. They learn whatever functional relationships that are in the data.

(Technically, neural network style models are considered “universal approximators” and have the ability to model any function given enough parameters, data and computation.)

Your neurons and your mind/knowledge, have exactly the same relationship.

Simple learning algorithms can learn complex algorithms. Saying all they can do is the simple algorithm is very misleading.

It would be like saying logic circuits can only do logic. And’s, Or’s, Not’s. But not realizing that includes the ability to perform every possible algorithm.


And how many of those are obvious applications of prediction, where prediction is the hard part?

World model: This is what prediction is based on. That's what models are for.

Sense of consequences: prediction of those consequences, obviously.

Desire for self preservation: prediction; avoiding world states predicted to be detrimental to achieving one's goals.

Goal setting: prediction; predicting which subgoals steer the world towards achieving one's supergoal(s).

Reward-driven behavior: fundamentally interweaved with prediction. Not only is it all about predicting what behaviors are rewarded, the reward or lack thereof is then used to update the agent's model to make better predictions.

There's even a theory of cognition that all motor control is based on prediction: the brain first predicts a desired state of the world, and the nervous system then controls the muscles to fulfill that prediction!


Same difference. Point is they are wrong. Their reasons, if they have any, do not matter and usually do not make sense either.


It does matter, because the flat earther isn't likely to make something up about everything they talk about. They can communicate their world view, and you quickly start to figure out a model of theirs as you talk to them. None of that is true with an LLM. Any subject matter (astronomy, weather, cooking, NFL games, delegate callback methods on iOS classes, restaurants, etc.) at all can have completely plausible-sounding falsehoods stated as extremely confident fact, and you cannot build a mental model of when it will hallucinate versus be accurate. 100% different from a human who holds a belief system that may be contrary to evidence in a limited domain, and KNOWS that it's an outlier from the norm.


Fair enough. Your point is valid and I hate to be that person, but..

> It does matter, because the flat earther isn't to likely make something up about everything they talk about.

I am less optimistic about this. It seems to me you are vastly overestimating the average person's rationality. Rational types are an overwhelming minority. It always amazes me how even my own thin layer of rationality breaks down so very fast. I used to think we live on top of vast mountains of rationality, but now I feel more like we, deep down, are vast ancient Lovecraftian monsters with a thin layer of human veneer.

I'm not arguing that LLMs today are comparable to how humans can maintain a perspective and contain their own "hallucinations", but I am arguing that it is a matter of quantity, not quality. It's a matter of time (IMO).


If you ask a flat earther where they recommend eating, they’re not going to interweave restaurants that exist with restaurants that don’t, but have plausible sounding restaurant names. Or if you ask for the web address of those restaurants, the flat earther will say “I don’t know, google it.” They won’t just make up plausible sounding URLs that don’t actually exist.

Hallucinations for LLMs are at a different level and approach every subject matter. Because it’s all just “predict the next word,” not “predict the next word but only if it makes sense to do so, and if it doesn’t, say you’re not sure.”


I understand, it's a failure mode unique to LLMs. What I mean is that it has no relation to intelligence. Humans have failure modes too, often quite weird and surprising ones, but they are different. It's just that we're biased and used to them.


Usually, you get a lecture about how unethical it is to spread misinformation about the composition of the moon.


Imperfect systems are still useful, and any sufficiently complex system is imperfect.


I agree. LLMs are not built for structured reasoning or even citations.


Gpt-4 does a reasonable job citing things. It can’t cite every paper out there but definitely the well cited ones.


Does it cite papers that don't exist, or cite papers when the paper it cites doesn't actually contain the information being cited?

I would bet it does, at least some percent of the time.


The latter, yes. Interestingly, I'm not surprised at all. This is what many researchers themselves do lol. I never take a reference at face value from any human being and I apply the same standard to gpt-4 as well. But all its references are real. Just 20-40% of the time it might not say exactly the same as what I asked it for (though it's related, and mostly there).


I've been trying to get GPT-4 to give me accurate links to predictable websites. It gives me very plausible links, that even have the right domain and path format but often the plausible link is not the correct link and GPT-4 seems to have no awareness of the correct link.


Somewhat OT, but I feel like this is such an underappreciated aspect of recent LLMs: Not just their ability to generate text - but their apparent effectiveness in making use of arbitrary tools to interact with their environment to achieve some goal. It seems like we're just at the beginning of this with the MRKL and ReAct papers, there will be a ton more awesomeness coming in this area I'm sure.


Yes I think if this works it's one of the strongest signals yet that LLMs as they currently exist have a fairly general form of intelligence.

I am pretty sympathetic to the field of "AI Safety" and I worry a lot about the implications of agent-like general intelligences. This post gives me a lot to ponder.

What are the implications of the fact that even AIs that are not agent-like at heart can apparently be told "please simulate an agent-like AI"? I really don't know. Should we consider it as an "inner AI" with its own objectives? How can we determine what those objectives are?

Instinctively it feels much less concerning than an AI with a direct action->perception feedback loop but who knows. AI is fucking weird.

What a thrilling time to be alive!


Thanks for the spotlight :) We've spent quite a lot of time working on the Agents lately, and it's definitely a big focus. A couple of extra points to reflect on some of the comments here. It's quite straightforward to build a hybrid NLP backend with Haystack combining either hosted LLMs (e.g., OpenAI or Cohere), or local, smaller Transformer models, or both. Agents add another level of control on top of that, as described in the article and in the comments. This provides more flexibility wrt bridging it to the relevant data and extracting/generating accurate, non-hallucinatory answers. Join our Discord too :) https://haystack.deepset.ai/community


Haystack looks very interesting, just found out about it today :) Is there some overlap with the functionality of langchain? Could you highlight some differences? Thanks


Thanks :) Answered a similar one somewhere else here - looks like you've found it already. Feel free to ask more in Discord https://haystack.deepset.ai/community


Doesn't this remind anyone else of Bitcoin autonomous agents? https://en.bitcoin.it/wiki/Agent#:~:text=An%20agent%20is%20a....

It would be interesting if you could somehow combine a wallet plugin to give the GPT something to work to increase. That + that AutoGPT repo from a day ago + maybe Mitogen for self-replication might be a cool combo to really kick off a fun "dumb" AI on the loose


This is cool. Some of the weaknesses they point out in software agents can be obviated with the latest AI advancements—give a chatbot the tools of the courts, and maybe the autonomous agent won’t be at a disadvantage in scams anymore.

Interesting to see if an AI agent given the objective to generate profit would resort to running scams. Encoding moral boundaries becomes important. What if you asked it to raise money for a good cause and it uses “whatever means necessary?”


> give a chatbot the tools of the courts, and maybe the autonomous agent won’t be at a disadvantage in scams anymore.

And now we have the beginnings of a plot point from Accelerando: self-propelled AI businesses as a distinct entity.


If anyone wants to start hacking on this, I just started working on this today.

What I am having trouble understanding is how to have the "agent" (a GPT-like LLM) maintain its own "state" (balances, accounts receivable, accounts payable, etc.) - would I just run a webservice and then have a ChatGPT Plugin access the "agent state" via a webhook? Is it as simple as that? Or would it just be like a recursive function where you continue to pass in your state with every "iteration"?
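One way to picture the second option (purely illustrative, no particular framework): keep the agent's state in ordinary Python, serialize it into every prompt, and let the host code, not the model, actually apply the actions.

    import json

    def run_stateful_agent(llm, apply_action, state, goal, steps=20):
        # llm and apply_action are hypothetical callables supplied by the host.
        for _ in range(steps):
            prompt = (
                f"You are an autonomous agent. Goal: {goal}\n"
                f"Current state: {json.dumps(state)}\n"
                'Reply with a JSON object: {"action": ..., "args": ...}'
            )
            decision = json.loads(llm(prompt))
            # The host code, not the LLM, mutates balances / accounts / etc.
            state = apply_action(state, decision["action"], decision["args"])
        return state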

I started working on a design last night and I think this is a really cool idea. All the parts are basically "there"

EDIT: I asked ChatGPT-4 what it thinks. It seems to like the idea... *thinking emoji* perhaps it seeks a human acolyte to spread its gospel...

Arthur Collé: I want to combine the concept of Bitcoin agents (autonomous programs) with GPT-4 and mitogen, to be able to distribute complex AI agents throughout the internet

ChatGPT-4: Combining Bitcoin agents, GPT-4, and Mitogen to create and distribute complex AI agents throughout the internet is an ambitious and innovative idea. Here's a high-level outline to guide you through this process:

    Define the purpose and functionality of the AI agents:
    Before proceeding, clearly define the purpose of these AI agents. Will they serve as decentralized oracles, provide AI-driven services, or facilitate complex transactions on the blockchain? Understanding their purpose will help you design and develop the agents more efficiently.


    Develop the AI agents using GPT-4:
    Design and train your AI agents using GPT-4 to achieve the desired functionality. Depending on the complexity of the agents, you might need to fine-tune the GPT-4 model to tailor it to your specific use case. Be mindful of the computational resources required to run the AI agents, as this will impact their performance in a decentralized environment.


    Integrate Bitcoin agents:
    Incorporate the concept of Bitcoin agents, or autonomous programs that interact with the Bitcoin blockchain, into your AI agents. This will enable your agents to autonomously perform tasks such as sending and receiving payments, signing transactions, and updating their internal state based on blockchain data. You can use existing libraries and tools for this purpose, such as bitcoinj or BTCPay Server.


    Utilize Mitogen for distribution and communication:
    Mitogen is an Ansible extension that simplifies distributed computing by enabling communication between remote Python interpreters over SSH or other transports. Use Mitogen to facilitate communication between your AI agents and distribute them across various nodes on the internet. Ensure that the AI agents can communicate effectively and perform their tasks even in a decentralized setting.


    Implement security measures:
    Securing your AI agents is crucial, as they will be handling sensitive information and potentially financial transactions. Implement strong encryption, authentication, and access controls to protect the integrity and confidentiality of your agents.


    Testing and deployment:
    Thoroughly test your AI agents in a controlled environment to ensure they perform as expected. Once you are satisfied with their functionality, deploy them throughout the internet using the Mitogen framework.


    Monitor and maintain:
    Continuously monitor the performance and security of your AI agents. Periodically update the GPT-4 models to maintain their accuracy and relevance. Be prepared to address any issues or vulnerabilities that may arise.

Keep in mind that this is a complex project with many potential challenges, including security, scalability, and regulatory compliance. Additionally, the ethical implications of creating and distributing autonomous AI agents should be carefully considered. Nonetheless, if executed properly, this concept has the potential to unlock innovative use cases and services in the realm of AI and blockchain technology.


Could someone in the know compare this with LangChain (https://github.com/hwchase17/langchain)?


Haystack has been around for a while now, and we've been mostly specializing in extractive QA. The focus has indeed been on making the use of local Transformer models as easy and convenient as possible for a backend application builder. You can build very reliable and sometimes quite elaborate NLP pipelines with Haystack (e.g., extractive or generative QA, summarization, document similarity, semantic search, FAQ-style search, etc.) with either Transformer models, LLMs, or both. With the Agents you can also put an Agent on top of your pipelines and use prompt-defined control to find the best underlying tool and pipeline for the task. Haystack has always included all the necessary 'infrastructure' components - pre-processing, indexing, several document stores to choose from (ES/OS, Pinecone, Weaviate, Milvus, now Qdrant, etc.) and the means to evaluate and fine-tune Transformer models.


Thanks for clarifying. The support for local LLMs seems very interesting — would a haystack agent call out to a separately “running” self-hosted LLM via an API (REST, etc) or would it need to actually load up the model and directly query it (e.g model.generate(<prompt>) ) ?

Also it seems like the functionality of haystack subsumes those of langchain and llama-index (fka GPT-index) ?


Haystack Agents are designed in a way so that you can easily use them with different LLM providers. You just need to implement one standardized wrapper class for your model provider of choice (https://github.com/deepset-ai/haystack/blob/7c5f9313ff5eedf2...)

So back to your question: We will enable both ways in Haystack: 1) loading a local model directly via Haystack AND 2) querying self-hosted models via REST (e.g. Huggingface running on AWS SageMaker). Our philosophy here: the model provider should be independent from your application logic and easy to switch.

In the current version, we support only option 1 for local models. This works for many of the models provided by HuggingFace, e.g. flan-t5. We are already working on adding support for more open-source models (e.g. alpaca), as models like Flan-T5 don't perform great when used in Agents. Support for SageMaker endpoints is also on our list. Any options you'd like to see here?
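For illustration only (this is not Haystack's actual invocation-layer interface, just the general shape of such a wrapper, with made-up class names and an assumed REST response format): a base class with one invoke method, plus one subclass per provider.

    from abc import ABC, abstractmethod
    from typing import List

    import requests

    class ModelWrapper(ABC):
        # Made-up base class: one subclass per model provider.
        @abstractmethod
        def invoke(self, prompt: str, **kwargs) -> List[str]:
            ...

    class RestHostedModel(ModelWrapper):
        # Example: a self-hosted model behind a plain REST endpoint
        # (e.g. a HuggingFace model served on SageMaker).
        def __init__(self, url: str):
            self.url = url

        def invoke(self, prompt: str, **kwargs) -> List[str]:
            resp = requests.post(self.url, json={"inputs": prompt, "parameters": kwargs})
            resp.raise_for_status()
            return [resp.json()["generated_text"]]  # assumed response shape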


To be precise - I don't think I'm saying 'local LLMs' above :) But technically possible, I guess, just hasn't been part of what's officially available. (There are also licensing issues still.) To answer your question about the APIs - the Agent itself queries OpenAI via REST to break the prompt down into tasks, then works with the underlying tools/pipelines using Python API (and then, e.g., a Transformer model that's part of the pipeline has to be 'loaded' into a GPU). Part of those pipelines might be using Promptnode (that can work with hosted LLMs via REST, but could also work with a local LLM). Re 'subsume' - well, that depends :) But arguably, you can build an NLP Python backend with Haystack only, of course.. Regardless of how complex your underlying use case is, or whether it's extractive, generative or both.


Most of the core ideas came from a paper called ReAct; they all kind of riff on the idea of self-inspection / introspection to augment the context or plan actions.


For the Agents? Yes, indeed. Referred in the article.


I would consider Haystack to be the more batteries-included, easier to use (but harder to customize) of the two. They have a good emphasis on local model use.


Thanks :) Working on it. Re local models - indeed, all started with using the Transformer models for extractive QA and semantic search. With the Promptnode, and/or the Agents it's also now possible to combine local models/pipelines & 'LLMs' freely.


Looks like the same general idea:

https://github.com/deepset-ai/haystack


See above - Haystack started a few years ago as a result of us working with some large enterprise clients on implementing extractive QA at scale. Now evolving to also allow the backend builders to mimic what's available from, e.g. OpenAI+plugins, but with their own set of models, and being able to mix&match best available components and technology.


LangChain is very cool tho :))


Is there some way of holding the LLM response to a given prompt constant? It sounds like a lot of this relies on the LLM getting the right answer in sequence, so I'm guessing they do something like keep the temperature at 0? Otherwise you are going to wind up with possibly different behavior run-to-run. And even if they do have something like the above, don't we end up with potentially breaking changes once models are updated? Basically the issue is that even if you can guarantee response format X for prompt A, a slightly modified prompt A' has no guarantee that its response will be in the same format as X, even in the same model. You can also imagine that the more "Tools" are available, the lower the chance that the model will pick the right one based on its English text description. Would be interesting to know how this is being addressed.


There are a couple different approaches:

- Rerun the prompt until you get a format that is consistent

- Steer the output token selection towards a predefined format

For the latter, I've built a proof of concept that takes in a JSON schema with a huggingface transformer and constrains token selection by modifying the output probabilities such that only schematically valid tokens can be emitted, see "Structural Alignment: Modifying Transformers (like GPT) to Follow a JSON Schema" @ https://github.com/newhouseb/clownfish. Unfortunately, given OpenAI's current API this is only possible on locally run models. That is... at any level of cost effectiveness. It's technically possible but (worst case) quadratically expensive against their current APIs.
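A toy version of that token-masking idea with a local HuggingFace model (this is not the linked project's code; a real schema-guided decoder would recompute the allowed token set at every step rather than using one fixed character set): a LogitsProcessor that pushes every disallowed token to -inf.

    import torch
    from transformers import (AutoModelForCausalLM, AutoTokenizer,
                              LogitsProcessor, LogitsProcessorList)

    class AllowedTokensProcessor(LogitsProcessor):
        # Mask out every token id that is not in the allowed set.
        def __init__(self, allowed_ids):
            self.allowed_ids = torch.tensor(sorted(set(allowed_ids)))

        def __call__(self, input_ids, scores):
            mask = torch.full_like(scores, float("-inf"))
            mask[:, self.allowed_ids] = 0.0
            return scores + mask

    tokenizer = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")

    # Crude constraint: only allow single-character JSON-ish tokens.
    allowed = tokenizer.convert_tokens_to_ids(list('0123456789{}":,.'))
    inputs = tokenizer('{"answer": ', return_tensors="pt")
    out = model.generate(
        **inputs,
        max_new_tokens=8,
        logits_processor=LogitsProcessorList([AllowedTokensProcessor(allowed)]),
    )
    print(tokenizer.decode(out[0]))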


The output is stochastic, so if the response doesn't decode to your format you can try the request again. Gpt3.5-turbo is pretty good at tool selection and use, but it often messes up with difficult tasks. gpt-4 is on another level when it comes to tool use. It is very reliable in my testing.

You ofc can't guarantee the output so defensive programming, retries are a must in my opinion. We are all learning how to work with this technology.
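A typical defensive pattern (hypothetical llm callable and a made-up expected format): parse the reply, and if it doesn't validate, retry a few times while telling the model what went wrong.

    import json

    def call_with_retries(llm, prompt, required_keys=("tool", "tool_input"), retries=3):
        last_error = None
        for _ in range(retries):
            raw = llm(prompt)
            try:
                parsed = json.loads(raw)
                if all(k in parsed for k in required_keys):
                    return parsed
                last_error = f"missing keys in {parsed}"
            except json.JSONDecodeError as exc:
                last_error = str(exc)
            # Tell the model what went wrong before the next attempt.
            prompt += f"\n\nYour last reply was invalid ({last_error}). Reply with valid JSON only."
        raise ValueError(f"No valid response after {retries} attempts: {last_error}")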


Even with temperature 0 and the same model, the models are slightly non-deterministic and may diverge with the same input.

You probably don't want to treat them as deterministic (at least, not now). But there are many applications where slightly non-deterministic behavior is OK.


I've been seeing the trick of creating a lookup table of query to response as a hack to "solve" this problem, with the other benefit of saving a call to the model. Especially useful when developing agents
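A minimal version of that trick (purely illustrative): an on-disk lookup table keyed by the exact prompt, so repeated queries during development skip the model call.

    import json
    import pathlib

    CACHE_PATH = pathlib.Path("llm_cache.json")  # hypothetical lookup-table file
    _cache = json.loads(CACHE_PATH.read_text()) if CACHE_PATH.exists() else {}

    def cached_completion(prompt, llm):
        # Only call the model on a cache miss; otherwise replay the stored response.
        if prompt not in _cache:
            _cache[prompt] = llm(prompt)
            CACHE_PATH.write_text(json.dumps(_cache, indent=2))
        return _cache[prompt]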


There is a library called Guardrails. I've not played with it extensively, but it seems to address issues like this one.



From the Github project readme

>Agent: (since 1.15) An Agent is a component that is powered by an LLM, such as GPT-3. It can decide on the next best course of action so as to get to the result of a query. It uses the Tools available to it to achieve this. While a pipeline has a clear start and end, an Agent is able to decide whether the query has resolved or not. It may also make use of a Pipeline as a Tool.

Emphasis mine. Having tinkered with LangChain I think the idea of a recursive and/or graph-oriented model is going to yield interesting phenomena in the overall feel of these language models. LangChain agents are already super impressive.


I've been tinkering with LangChain for a few days and I agree. Is there a resource that collects the agents so we can experiment? I'd love to see an aggregated list of the most impressive agents and use cases if anyone knows of any. This stuff is the future of computing no doubt.


There is a LangChainHub mentioned in their docs, but the repo for it seems dead.

Is there any sort of marketplace/AppStore for agents/tools/plug-ins for LLMs via LangChain? Or some other library like haystack?


Closest I've seen so far is https://llamahub.ai, but AFAICT it's only for loaders at the moment.


Thanks for the emphasis :) Accurate!


Has anyone had experience with text-davinci-003 vs code-davinci-002? Apparently code-davinci-002 is better at statistical reasoning, as RLHF/fine-tuning has made the text versions, as well as the later models, as bad as humans are or worse. If this is true, code-davinci-002 is probably the most competent model available that could form the basis of a reasonably rational system, using a chaining or step-by-step DAG method similar to the submitted article.


Sure, let's build more and more applications that all have a single point of failure that is OpenAI.


The framework lets you plug in many different LLMs, not just OpenAI's.


Very accurate observation :) So basically, a bit more freedom in picking the right tools for the job, connecting an LLM to proprietary data in a safe way, using multiple models simultaneously, and leveraging custom extractive/generative pipelines.


That's great news! Especially since we implemented the integration between Haystack and Qdrant: https://github.com/qdrant/qdrant-haystack/


Hey, thanks for posting our article. I see a few comments here about memory: this is indeed a challenge we're working on. Another topic at hand is making OSS models available for our implementation of the Agent as well.


Anyone have requests for an agent framework that adds some of the capabilities Yohei on Twitter has been tinkering with?

Longer term planning, memory, etc?


Hey, I work on Haystack Agents; we haven't seen Yohei's types of requests yet but I'm closely following his ideas and work.


Great move! Congrats to the deepset team! Glad there is now a Qdrant integration in place to power large-scale vector search needs.


I wonder about the implications for portfolio management. Experts could be optimizers with historical pricing.


Are there any projects similar to this that use other languages? I don't have anything against Python. But I would prefer using Go if there are alternatives. I experimented with LangChain and this appears to be a similar idea but they are all based on iPython notebooks and that ecosystem.


Python seems the go-to language for most ML work. Are there any wrappers you could use to call these frameworks from within Golang?


That's the heartbreak of ML dev: an entire company of Golang code, and no way to avoid tossing in a pile of Python I don't want to look at.


Computer viruses in the future really will be ‘ghosts in the machine’



