Hacker News new | past | comments | ask | show | jobs | submit login
Retrieval in LangChain (langchain.dev)
212 points by gk1 on March 27, 2023 | hide | past | favorite | 69 comments



Enabling the 'terminal' and 'python-repl' tools in a langchain agent demonstrates some pretty remarkable behavior.

The link below is the transcript of a session in which I asked the agent to create a hello world script and executes it. The only input I provide is on line 17. Everything else is the langchain agent iteratively taking an action, observing the results and deciding the next action to take.

This is just scratching the surface. I've seen it do some crazy stuff with the AWS CLI. And this is just with GPT-3.5, I don't have access to GPT-4 yet and it clearly has better capabilities.

https://pastebin.com/qJsbufVj


If you put GPT-4 on a loop with access to the shell it manages to do whatever is needed to finish the job

https://raw.githubusercontent.com/jla/gpt-shell/assets/examp...


My experience with GPT-4 has been really disappointing. It didn't feel like a step up from 3.5.

As an example, I've been trying to use it to learn Zig since the official docs are ... spartan. And I've said, "here's my code, here's the error, what's wrong with it?" and it will go completely off the rails suggesting fixes that don't do anything (or are themselves wrong).

In my case, understanding/fixing the code would have required GPT-4 to know the difference between allocating on the stack/heap and the lifetimes of pointers. It never even approached the right solution.

I haven't yet gotten it to help me in even a single instance. Every suggestion is wrong or won't compile, and it can't reason through the errors iteratively to find a fix. I'm sure this has to do with a small sample of Zig code in its training set, but I reckon an expert C coder could have spotted the bug instantly.


If you are using GPT-4 to try to deal with the fact that technical documentation on the public internet is sparse for your topic of interest, you are likely to be disappointed, since GPT-4’s training set likely has the same problem, so you are, in effect, hoping it will fill in gaps in missing data, prompting hallucinations.

It’ll be much better on subjects where there is too much information on the public internet for a person to efficiently manage and sift through.


I think you're right. My hope was that it could reason through the problem using knowledge from related sources like C and an understanding below the syntax of what was actually happening.

But it most certainly did not.


Depending on what you're doing, you might find few-shot techniques useful.

I used GPT 3.0 to maintain a code library in 4 languages, I'd write Dart (basically JS, so GPT knows it well), then give it a C++ equivalent of a function I had previously translated, and it could do any C++ from there.


1. GPT4 is learning from the same spartan docs as you, likely

2. GPT4's training data likely doesn't include significant Zig use, since large parts of its training data cut off a few years ago. I use Rust and it doesn't know about any recently added Rust features, either.

This has interesting implications because it means people will gravitate towards languages/frameworks/libraries that GPT knows well, which means even less training data will be generated for the new stuff. This is a form of value lock-in.


> This has interesting implications because it means people will gravitate towards languages/frameworks/libraries that GPT knows well, which means even less training data will be generated for the new stuff. This is a form of value lock-in.

That's the kind of problem that most people are just failing to see. The usage of this models might not in itself be problematic, but the changes that it bring are often unexpected and too deep for us to see clearly now. And yet, people are rushing towards them at full speed.


It's inevitable, really. But that's like saying Washing Machine changed fashion. It might have, but the changes aren't all that abominable, either.


GPT-4 is just regurgitating what its "learned" from previously scraped content on the Internet. If somebody didn't answer it on StackOverflow before 2021, it doesn't know it. It can't reason able anything, it doesn't "understand" stacks or pointers.

That said its really good at regurgitating stuff from StackOverflow. But once you step beyond anything that someone has previously done and posted to the Internet, it quickly gets out of its depth.


It's a step up by an order of magnitude for certain things. Like chess. It is really good at chess actually. But not programming. Seems maybe marginally better on average. Worse in some ways.


It can't learn zig without plenty of samples


Yeah I can't wait to get API access to gpt-4, it is a stepwise more capable based on the stuff I've done with chatgpt on gpt-4.

That said, even gpt-3.5 will try multiple routes to get to the same endpoint. It seems to get distracted pretty easily though.


One demo of gpt-4’s superiority over gpt-3 is to come up with a prompt that determines the language of some given text.

I couldn’t figure out a gpt-3 prompt that could handle “This text is written in French” correctly (it thinks it’s written in French), but with gpt-4 you can include in the prompt to disregard what the text says and focus on the words and grammar that it uses.


> It seems to get distracted pretty easily though.

That’s true, gpt-4 is way more easy to guide with the system messages and it doesn’t forget the instructions as the conversation goes on.


About how long did this take to run?

It would be great if it also summarized what the error was, what was the fix, and how to run the code that it created. That’s all in the output but could be pulled out at the end.


A minute or two.

You can ask it to summarize things if you like. It sometimes forgets to do so, however.


What is the aws cli stuff? I was thinking about writing a terraform agent tool.


You install and configure the CLI to run locally, then ask it to do something. For example, ask it to create a website using s3 static website feature. It creates the bucket, creates the content, uploads the content, configures the bucket static website features and configures the permissions on the bucket and content.

I just started tinkering with terraform, which it seems to understand fairly well.


That pastebin is mindboggling.


I'm glad our AI overlords are as bamboozled by python vs. python3 as us lowly humans


How so?


it worked through a software problem and came to a conclusion based on the computer's feedback


I feel like it brute forced an answer? I actually think that this could've been achieved with Stack Overflow Search API?


It looks to me like something a junior developer would do. It's not pure brute force


Looking like something is different to doing what a junior developer would do. I admit they'd probably look through a database of known existing answers...not disputing that.

I'd say that a person using Linux for the first time might do what is happening in that demo.

Anyway, not to diminish the bot or LLMs or whatever, but it's definitely something that could be done using the Stack Overflow Search API or similar. I'd say it's actually a very primitive / boring use of an LLM.


Wait until it's directed to hack into power grids!


Can you share the code for llm.py?


    #!/home/ubuntu/venv/bin/python3.10
    from langchain.agents import load_tools
    from langchain.agents import initialize_agent
    from langchain.chat_models import ChatOpenAI
     
     
    llm = ChatOpenAI(model='gpt-3.5-turbo',temperature=0)
    tools = load_tools(['python_repl', 'requests', 'terminal', 'wolfram-alpha', 'serpapi', 'wikipedia', 'human',  'pal-math', 'pal-colored-objects'], llm=llm)
     
    agent = initialize_agent(tools, llm, agent="chat-zero-shot-react-description", verbose=True)
     
    agent.run("Ask the human what they want to do")

Note that you'll need to get api keys for openai and serpapi and a app id from wolfram-alpha.


People are joking that they can't wait for the next big thing in AI to come out in a few weeks, but this seems pretty big. After fiddling with it for a while, its not perfect, but this isn't that far from being able to replace (or at least dramatically change) my job as a software engineer.

For example, I asked it to write conway's game of life, and it took about 4-5 attempts but it wrote fully functional code that popped up a matplotlib window with a fully functional simulation. This would've taken me a day at least.

I asked it to write a FastAPI backend that uses SQLite for storing blog Posts, and it struggled with that one a lot and couldn't quite get it right, although I think that's largely a limitation of the python REPL from langchain as opposed to GPT.

On the one hand I'm excited to build all sorts of new things and projects with this, but on the other hand I'm worried my standard of living will decline because my skills will become super commodified :/


And you only used the most popular example of getting started and one of if not the most popular programming languages. Amazing.


Really excited to see LangChain moving really fast in this space. They turn your favorite Llm into a real boy that can do real work.

Some "agents" in their vernacular that I've built.

* A reminder system that can take a completely free-form English description and turn it into a precise date and time and schedule it with an external scheduler.

* A tool that can take math either in English or ascii like y = 2x^2 + ln(x) and turn it into rendered LaTeX.

* A line editor that let's you ingest Word documents and suggests edits for specific sections.

* A chess engine.

Like it's crazy at just how trivial all this stuff is to build.


I share your enthusiasm for LangChain (as well as LlamaIndex). I don’t remember being as obsessed by any new technology. I am writing a book on the subject [1] but to be honest the documentation and available examples are so very good, that my book has turned into just a write up of my own little projects.

I agree that some things really are trivial to implement and I think this opens the door to non-programmers who know a little Python or JavaScipt to scratch their own itches and build highly personalized systems.

[1] https://leanpub.com/langchain


That's funny...I actually just bought your book because I have a bone to pick with the documentation. It's good at explaining stuff, but I feel it fails to show how some of the parts work together. It would be immediately more obvious if there were just full examples rather than partial ones.

Thanks for writing this.


Thanks for the kind comments! The final version should be available in a day or two.


These examples are great, but the chess engine sounds specially interesting and can't think of how I'd do it with langchain. Do you have a git link or something written down, on how you accomplished this?


> Some "agents" in their vernacular that I've built.

Are any of those open source / are they on Github? Could you link it?


Making retrieval really really good is part of the mission of LlamaIndex! Given a natural language input, find the best way to return a set of documents that is relevant to your LLM use case (question-answering, summarization, more complex queries too).

- We integrate with vector db's + ChatGPT Retrieval Plugin

- Submitted a Retrieval PR to langchain here: https://github.com/hwchase17/langchain/pull/2014

- would love to explore further integrations as a plugin in any outer agent system


I want to use llamaindex. My input would be a slack export but I don't want any data to go to openai I want it all to happen locally or within my own EC2 instance. I have seen https://github.com/jerryjliu/llama_index/blob/046183303da416... but it calls hugging face.

My plan was to use https://github.com/cocktailpeanut/dalai with the alpaca model then somehow use llamaindex to input my dataset - a slack export. But it's not too clear how to train the alpaca model.


Who would have thought we'd be able to follow the birth of Skynet in real-time.


To be fair, LangChain may be the reprogrammed ChatGPT that we use to fight off Skynet


One thing I love about LangChain is I can basically use it to keep abreast of all that's going on around LLMs. Which models are available, which patterns (eg agents), papers like MRKL or ReAct. The library is always at the cutting edge.


I've been playing around with sentence embeddings to search documents, but I wonder how useful they are as a natural language interface for a database. The way one might phrase a question might be very different content wise from how the document describes the answer. Maybe it might be possible to do some type of transform where the question is transformed into a possible answer and then turned into a embedding but I haven't found much info on that yet.

Another idea I've had is to "overfit" a generative model like GPT on a dataset but pay more attention to how url and the like are tokenised


> Maybe it might be possible to do some type of transform where the question is transformed into a possible answer and then turned into a embedding but I haven't found much info on that yet.

Here you go https://twitter.com/theseamouse/status/1614453236349693953


"The way one might phrase a question might be very different content wise from how the document describes the answer."

You have late-interaction models, which replace the dot product with a few transformer layers and are able to learn complex semantics.

Of course this would adversely affect latency and embedding size, so you might want to compress and cache the answers, hence (shameless plug):

https://aclanthology.org/2022.acl-long.457/


Embeddings can be trained specifically to cause questions and content including their answers to have similar representations in latent space. This has been used this to create QA retrieval systems. Here's one commonly used example:

https://huggingface.co/sentence-transformers/multi-qa-MiniLM...


In your first paragraph, you are describing Hypothetical Document Embeddings (HyDE) [0]. I've tested it out, and in certain cases, it works amazingly well to get more complete answers.

[0] https://python.langchain.com/en/latest/modules/chains/index_...


> might phrase a question might be very different content wise from how the document describes the answer

That's what hypothetical embeddings solve: https://summarity.com/hyde

There are also encoding schemes for question-answer retrieval (e.g. ColBERT)


> The way one might phrase a question might be very different content wise from how the document describes the answer.

If the embeddings are worth their salt, then they should not be influenced by paraphrasing with different words. Try the OpenAI embeddings or sbert.net embedding models.


is there an example what what your talking about?

Also would you just return a list of likely candidates and loop over the result set to see if any info is relevant to the question and then have the the final pass try to answer the question.


How many embeddings can fit into a single input?


embeddings are really good at that, you dont need to use similar words at all.


A little off-topic: are LLMs the death knell for new languages, frameworks, tools, processes, etc? I can see how an LLM is going to be such a huge productivity boost that they’ll be hard to avoid everywhere, but then new stuff won’t have any training data. Will anyone ever go through the effort of everything being 10x - 100x less effective with new tools since there’s no training data?


Tools could be made so that an AI can learn to use it, generating training data in the process.


LangChains is a great tool and looking forward to swapping between LLM's like it was an API!


In the back of my head im thinking about how you could make a stock portfolio analysis chatbot. I think you would want to vectorize a set of documents containing summarized historical + a few documents containing recent data. When asked to analyze a particular portfolio, the portfolio, along with the short term + long term vectors are passed into the LLM. Im not sure if this is the ideal approach though.


Thanks for this info! Regarding the first instance of a non-LangChain Retriever - the ChatGPT Retrieval Plugin. I know this was recently open sourced by OpenAI for their own purposes to help gain more traffic to them. Probably to help people figure out endpoints to hook into ChatGPT. But how about for other usages? Be interested in how this fits in with your vectorstores. Thanks for the read!


Does anyone have an opinion on LangChain versus Deepset Haystack?

Haystack seems more polished for NLP tasks, but LangChain looks more extensible long term?

Thanks!


Haystack unfortunately seems like a dead end. LLMs are the way forward.


Can anyone use the ChatGPT Retrieval plugin yet? Or is it limited to whitelisted beta testers?


Still whitelisted atm.


Noob here. Is this similar to what ChatGPT Retrieval plugin does, but for other LLM's?


LangChain is LLM agnostic. People are using it with Cohere's LLMs and even self-hosted LLMs like LLaMA.


I would say LangChain is similar to ChatGPT itself.

- LangChain's Retriever is analogous to ChatGPT Retrieval Plugin. - In general, LangChain has tools for what ChatGPT calls Plugins. - ChatGPT uses OpenAI's GPT-4 LLM. LangChain uses ... any LLM (i.e. configurable).


It's the backend of retrieval plugin basically


The retrieval plugin does not use LangChain, afaik.


Guess I can't delete my comment but I guess "basically" in this context means a similar type implementation


How can I use this with llama ?





Consider applying for YC's Spring batch! Applications are open till Feb 11.

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: