New: LangChain templates – fastest way to build a production-ready LLM app (github.com/langchain-ai)
137 points by johhns4 11 months ago | hide | past | favorite | 69 comments



I worked on a simple company-idea chatbot app as a take-home test for a startup I was hoping to contract with. They asked me to use LangChain and SolidStart JS. No biggie, it seemed, as I've done a few SolidJS and OpenAI API things in the past. What I found was that either I am very much not an ML / prompt engineer / whatever term this falls under, or LangChain is completely useless as an abstraction layer.

The examples did not work at the time, as mentioned in this thread, and after struggling through and getting a few things working with the docs, I then had to add memory to the agent (to reuse the chat context in successive questions), and I really struggled to get that working. Part of it was also dealing with SolidStart, whose docs were half-baked as well. I eventually got it all working, but with what seemed like twice as much code and maybe three times the effort of just using OpenAI's API directly.

The part that really seemed off to me was the classes and the distinction between chains, agents, memory, and the other abstractions. They didn't seem to add value or make the coding easier. I even did a dry run just calling OpenAI's API directly, and it worked way better. Sitting there with a working solution and some time to reflect, it all reminded me of the fast fashion of JavaScript years past. I do understand the high-level usefulness of the chain idea and such, but the solution didn't seem very pragmatic in the end, at least with LangChain.


Did you look into Flowise or Langflow? These are UI drag-n-drop implementations of LangChain and can be easily self-hosted. It's trivial to create a chatbot with data coming from other apps, the internet, etc. You can use OpenAI or Anthropic keys, and it even has an embeddable chat UI.


Why would I use drag-and-drop UIs to replace one of my favorite things, coding, when I use infrastructure-as-code to replace cloud console panels, FastAPI to generate OpenAPI documents, and even D2-lang to draw graphical diagrams?


> They asked me to use LangChain and SolidStart JS

They were, at some level, trolling you. Either way, intentionally or not, it says more about them than about the position itself.


They did indeed use those things in their own setup. Whether that is good or bad is a matter of opinion, but I think they were trying to provide a realistic preview of the work that needed to be done. I think he did come to realize the "challenges" of the combination, though.


Off topic - what's your opinion on SolidJS? What's the right use case for it vs other frameworks?


I think it keeps what is good about React and ditches what is bad. It is familiar enough to pick up if you've done any component- and hook-based React. It probably serves people who know what they are doing and want more performance and predictability out of their front-end code, and maybe a better mental model than wondering why React has an infinite-rerender issue or other weird quirks we've grown not to notice / appreciate in React. There is probably also enough out-of-the-box functionality that you don't have to bundle React with some other meta-framework, such as NextJS or similar.

While SolidJS is good, I really don't think SolidStart is close to being useful or good. I don't understand its value-add on top of SolidJS the way I understand NextJS on top of React. When I had to use SolidStart, there were a few times I just used SolidJS in place of some SolidStart built-ins because I couldn't get them to do what I wanted and the docs were nearly non-existent. I even had to read its source code to paint a complete picture of where the docs were at the time. In addition, I have no idea why people so often pick things this new for apps that see production use. SolidStart really just made things more complicated for as simple an app as I used it on. I couldn't imagine using it for anything non-trivial at all.


Langchain is the weirdest thing. It just does not make any sense to me. I see absolutely no benefit and plenty of cost associated.


My best guess is that Langchain might be a good thing if one plans on switching between LLM implementations. Then again, this reminds me of the fantasy that using an ORM will make one's application plug-n-play with any relational database (guaranteed or your money back).

Otherwise, I have to agree. Langchain to a large extent seems to base its existence on a problem that barely exists. Outside of LLMs as services, the challenging part about LLMs is figuring out how to get one up and running locally. The hard part isn't writing an application that can work with one. Maintaining "memory" of conversations is relatively trivial, and though a framework might give me a lot of stuff for free, it doesn't seem worth giving up the precision of writing code to do things in a very specific way.

Perhaps Langchain is "good" for programming noobs who might benefit from living in just the Langchain universe. The documentation provides enough baby steps that someone who has maybe a few months of experience writing Python can whip something together. However, I'm really giving it the benefit of the doubt here. I really hope noobs aren't getting into programming because they want to build "the next ChatGPT", inherit a bunch of bad ideas about what programming is from Langchain, and then enter the workforce with said ideas.


Prompting is the most difficult part, and it is LLM-bound and not very transferable. One can make all kinds of prompts, might want to glue them into some semi-intelligent state machine making its own decisions (= an agent), or run multi-task prompts (a single LLM call returning multiple answers to various NLP tasks). I guess LangChain helps with the semi-intelligent part, but its problem (and the problem of LlamaIndex) is that they were originally written for GPT-2/3, and newer LLMs have since obliterated the need for their specific style of chaining. Running basic prompt templates in either almost always guarantees higher latency for a minimal gain in accuracy.

I guess it's all about whether you believe the most recent LLMs to be good enough to do their own adequate decision making inside their own hallucinations, or if you need to enforce it externally. If the latter, you use LangChain or LlamaIndex. If the former, you rely on OpenAI functions/Claude 2 iterative prompting with minimal Python glue.

LangChain and LlamaIndex also have some nice functionality like document imports, RAG, and re-ranking, but one can simply copy the corresponding code and use it standalone without the rest of the library.


>Maintaining "memory" of conversations is relatively trivial

In my experience, it is actually surprisingly hard. I guess it depends on just how "human" you want it to feel. I wrote about it here: https://kristiandupont.medium.com/empathy-articulated-750a66...


I'm allergic to medium.com these days.


Sorry, I know that's a feeling that several people share. To be clear, the post is not behind the paywall, but you can also read it on dev.to, if you want: https://dev.to/kristiandupont/empathy-articulated-2dfj


As much as I hate to say so, I share the sentiment. It feels very much like the days of AbstractSingletonProxyFactoryBean. Abstractions have diminishing returns.


The problem is that LangChain is almost everywhere using the wrong abstraction. I've used the JS version on a project and it was probably more trouble than it's worth. Eventually I ended up using only the most basic things out of it (and even those are kind of broken). I thought the problems were due to it being a translation of a Python library but I've heard that one is very bad as well.


I share the same sentiment. I see the value of the different integrations, which can probably save you some coding time. But overall, it makes something simple more complicated (or at least makes it look that way) with all those unnecessary abstractions.


It's like the massive WordPress deluxe slideshow plugins


Langchain has compelling examples but once you begin asking, "How will I deploy this to production?" you find yourself replacing it bit by bit until there's nothing left of it to use.


On the subject of cost: I looked at how they do relevance extraction, i.e., extracting the portions of a passage that are relevant to a query (a critical step in a RAG pipeline), and they do it the obvious way, i.e., have the LLM parrot out the relevant sentences one by one ("parrot" not inspired by their logo, I swear ;). Before looking at their LLMChainExtractor that does this, I had already implemented it massively more efficiently by pre-annotating the passage with sentence numbers and having the LLM simply tell you the relevant sentence numbers. It was natural to do this in an agent loop with function-calling in the Langroid[1] framework. This is both faster (since the LLM has to generate very few tokens) and cheaper (remember, token generation costs double the input cost, at least with GPT-4).

I was actually surprised that LangChain doesn’t do it this way. Just an example of how we shouldn’t assume the established implementations are the best ones and one should always be skeptical and take a fresh look. I posted about this a couple days ago—

https://www.linkedin.com/posts/pchalasani_rag-llm-langchain-...

[1] Langroid: https://github.com/langroid/langroid
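For the curious, the numbering trick is roughly this (a minimal sketch with illustrative helper names; the actual LLM call is elided):

```python
import re

def annotate(passage: str) -> tuple[str, list[str]]:
    """Split a passage into sentences and prefix each with its number."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", passage.strip()) if s]
    numbered = "\n".join(f"[{i}] {s}" for i, s in enumerate(sentences, 1))
    return numbered, sentences

def extract_relevant(llm_reply: str, sentences: list[str]) -> list[str]:
    """The LLM replies with numbers only (e.g. '1, 3'), so it never
    regenerates the sentence text token by token."""
    ids = [int(n) for n in re.findall(r"\d+", llm_reply)]
    return [sentences[i - 1] for i in ids if 1 <= i <= len(sentences)]

# Prompt the model with `numbered` plus an instruction like:
# "Return only the numbers of the sentences relevant to the query, comma-separated."
```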


It’s fun to jump on the hater bandwagon, but did you try using it?

If you look at the documentation (1), the API surface is relatively trivial and obvious to me.

Every interaction is a prompt template + an LLM + an output parser.

What’s so hard to understand about this?

Is writing an output parser that extends “BaseOutputParser” really that bad?

The parser and LLM are linked using:

“chatPrompt.pipe(model).pipe(parser);”

How… verbose. Complicated.
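For the skeptics: that pipe is plain function composition. Here's the same shape in framework-free Python with a stubbed model (hypothetical names throughout, just to illustrate the pattern):

```python
def chat_prompt(question: str) -> str:
    # the "prompt template" stage
    return f"Answer in one word.\nQ: {question}\nA:"

def fake_model(prompt: str) -> str:
    # stand-in for an actual LLM call
    return "  Paris.  "

def parser(raw: str) -> str:
    # the "output parser" stage
    return raw.strip().rstrip(".")

def pipe(*stages):
    """Compose stages left to right, like chatPrompt.pipe(model).pipe(parser)."""
    def run(x):
        for stage in stages:
            x = stage(x)
        return x
    return run

chain = pipe(chat_prompt, fake_model, parser)
```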

People who like to have a go at langchain seem to argue that this is “so trivial” you could just do it yourself… but also not flexible enough, so you should do it yourself.

Don’t get me wrong, I think they’ve done some weird shit (LCEL), but the streaming and batching isn’t that weird.

You see no benefit in using it?

Ok.

…but come on, it’s not that stupid; I would prefer it was broken into smaller discrete packages (large monolithic libraries like this often end up with lots of bloated half baked stuff in them), and I’d rather it focused on local models, not chatgpt…

…but come on. It’s not that bad.

No benefit?

You’ve implemented streaming and batched actions yourself have you?

The API is complicated.

The documentation kind of sucks.

…but the fundamentals are fine, imo.

It irritates me to see people shitting on this project when they haven’t tried it; I don’t even particularly like it… but if you haven’t actually used it, ffs, don’t be a dick about it.

If you have used it, maybe a more nuanced take than “it does not make any sense to me” is more helpful to people considering if they want to use it, or parts of it, or what the cost of implementing those parts themselves might be.

I personally think these templates (like https://github.com/langchain-ai/langchain/blob/master/templa...) don’t offer any meaningful value, lack documentation and context and fail to explain the concepts they’re using… but they at least demonstrate how to do various tasks.

It's probably a valuable reference resource, but not a starting point for people.

[1] - https://js.langchain.com/docs/get_started/quickstart


> People who like to have a go at langchain seem to argue that this is “so trivial” you could just do it yourself… but also not flexible enough, so you should do it yourself.

At least for now, and for the most popular use cases, this _is_ true. The framework seems as though it was written by people who had not actually done ML work prior to GPT-4's announcement. Regardless of whether that's true, the whole point of a highly robust large language model is that _every_ problem you have can be easily defined as a formatted string.

The whole idea of deep learning is that you don't need rules engines and coded abstractions, just English or whatever other modality people are comfortable communicating in. This is not necessarily true for all such cases at the moment. RAG needs to do a semantic search before formatting the string, for instance. But as models get more robust and advanced, the need for any abstraction other than plain language goes to zero.


> the whole point of a highly robust large language model is to be so robust that _every_ problem you have is easily defined as a formatted string

Using language models is about automation, parsing, etc. like any NLP task.

What you're talking about (it would be nice) is sufficiently distant from what we have right now as to be totally irrelevant.

I agree langchain is a naive implementation, but NLP libraries are complicated.

They have always been complicated.

Not being complicated is not the goal of these libraries; it’s getting the job done.


> What you're talking about (it would be nice) is sufficiently distant from what we have right now as to be totally irrelevant.

I disagree. :shrug: Guess we'll see who is right in 10-20 years. It also sounds as though we're talking about different things, maybe? A lot of automation, parsing, and NLP tasks are very much "solved" by GPT-4, ignoring (important) edge cases and respecting the relative lack of progress in the domain before GPT-3.

If you need agents and stuff, then yeah we haven't got that figured out. But neither will you (general you) with your hodge podge of if statements wrapping an LLM.


If we can only tell in 10-20 years, then you are wrong today, which was the point.


I'm not sure about Python, but in TypeScript streaming is just a for loop. Similarly, if you're using OpenAI, actions are already built into their client library and are trained against in the models, so there's nothing special you need to do. Memory is just an array!

They have some neat extras like sample selectors that can be useful — although even then, if you have so many examples you need a sample selector, finetuning gpt-3.5 is often better than using a sample selector with gpt-4 (and is considerably cheaper) in my experience.
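To make "memory is just an array" concrete, here's a minimal sketch (in Python rather than TypeScript; `send` stands in for whatever chat-completions client you use):

```python
# Minimal chat "memory": keep the message history in a list and replay
# it on every call. No framework abstraction required.
history: list[dict] = [{"role": "system", "content": "You are helpful."}]

def ask(question: str, send) -> str:
    """Append the question, call the model with the full history,
    and record the reply so the next question has context."""
    history.append({"role": "user", "content": question})
    reply = send(history)  # e.g. a thin wrapper around your chat API call
    history.append({"role": "assistant", "content": reply})
    return reply
```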


> You’ve implemented streaming and batched actions yourself have you?

Streaming and batching really aren't that onerous to build yourself. Especially if your design goal isn't to support every single LLM provider and related endpoints. And it's the kind of boilerplate that you build once and usually never touch again, so the front-loaded effort amortizes well over time.

With that said, I do think some of the langchain hate is overstated. There are pieces of it that can be useful in isolation, like the document loaders, if you're trying to spin up a prototype quickly to test some ideas. But the pitch they make is that it's the fastest/easiest way to build LLM-based applications end to end. That pitch I find dubious, and that's being charitable.
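For instance, "batching" is mostly just bounded concurrency; a sketch of the boilerplate you'd build once (the `call` argument stands in for your async LLM client):

```python
import asyncio

async def batch(call, prompts, concurrency=4):
    """Run `call` over prompts with bounded concurrency -- most of what a
    framework's batch helper gives you, preserving input order."""
    sem = asyncio.Semaphore(concurrency)

    async def one(p):
        async with sem:
            return await call(p)

    return await asyncio.gather(*(one(p) for p in prompts))
```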


I tried using their JS package but it changes every couple of weeks in very breaking ways... on top of that it's a huge package that took up most of my CF Worker's space allowance


Fundamentals are outdated: it's all based on old GPT-2/3 models, which needed a lot of hand-holding, and the whole point of chains was that those models were too dumb to run multi-task prompts. Not to mention that by default some tasks are executed sequentially when they could run in parallel, slowing everything down (see how they did NER).


> it's all based on old GPT-2/3 models

Are you sure?

There are examples of using mistral eg. https://github.com/langchain-ai/langchain/blob/master/templa...

This is exactly what I'm talking about. How can you say that when there is evidence that blatantly contradicts it?

This reeks of “…or so I’ve heard, but I never actually looked into it myself…”


They keep adding new models, but it's bolted onto an underlying architecture based on old assumptions that no longer hold for LLMs with emergent abilities like GPT-3.5/4.


> The API is complicated. The documentation kind of sucks. …but the fundamentals are fine, imo.

That's damning with faint praise.


All I ask is people have a considered opinion, not a mob mentality about it.

FWIW, I implemented my own library with the features I wanted from langchain; it took about a week.

I don’t recommend people do that themselves though, unless (until) they have a clear idea what they’re trying to accomplish.

Langchain is fine to get started with and play around with imo.


Anyone knows of a good alternative? This seem like such an obvious gap in the "market". Interested especially for JS/TS.


When the best alternative is "do it yourself," there really can't be a market-friendly alternative.


I haven't tried any of them but there's semantic-kernel and autogen by Microsoft, and Haystack.

https://github.com/microsoft/semantic-kernel

https://github.com/microsoft/autogen

https://haystack.deepset.ai/



i'll throw my hat in for txtai, see this for a refreshingly simple RAG implementation: https://neuml.hashnode.dev/custom-api-endpoints


Relying on these types of templates is a one-way ticket to technical debt on top of the technical debt of already using LangChain.


Can you expand a bit on what makes langchain tech debt? I've been dabbling for a few days and have seen it referenced, but I've never used it. Looking at it, it seems focused on being a library for interacting with LLMs, plus a pile of utils to go with that library. I tend not to use "library and utils" packages for my own personal projects, but beyond that I don't see on the surface what makes langchain tech debt. Can you explain more?


I'm glad you asked because I wrote an entire blog post on it a few months ago! https://minimaxir.com/2023/07/langchain-problem/

That post also gets a very surprising amount of Google traffic.


Hi Max, I'm unaffiliated w/ langchain, but happen to be friends w/ Harrison. He's a very genuine person who authentically desires continuous improvement of LC's tech. Would you be open to hopping on a call some time to discuss how you might make such tech better?

I have read your grievances and they're fair enough, so what I'd love to hear are your solutions -- that is, your honest thoughts on what you think a best-practices AI framework would look like at a high level (what framework features to emphasize/invest in, what to downplay as nice-to-have rather than need-to-have, etc). Naturally I'd also be curious to hear how you might evolve LC to achieve your vision; that said, even just hearing your architectural/design thoughts would be great.

As you mention, this is a rapidly developing space where best practices haven't fully converged, so I would love to tap your knowledge accordingly. lmk if it's ok to reach out and I'll email you.


I talked with Harrison about my concerns before that article was published. I have nothing to add from what the article says.

As the article mentions, there's no practical way to fix LangChain at this point without alienating its existing users who've adapted to its many problems.


Fair, but I'd still like to learn from ya ;-)

If I shoot you an email and you aren't interested, please just archive it. But if you're in the teaching mood I'd love to be your student for 30 min :D


In my experience, it leads to slow, expensive solutions. It might be really helpful if it provided special purpose small, fast handler models that could be run locally so openai would only be used when really called for.


Have you taken a closer look at AutoGen? I am interested to know how you think it compares to LangChain.


AutoGen (https://github.com/microsoft/autogen) is orthogonal: it's designed for agents to converse with each other.

The original comparison to LangChain from Microsoft was Guidance (https://github.com/guidance-ai/guidance) which appears to have shifted development a bit. I haven't had much experience with it but from the examples it still seems like needless overhead.


Good writeup.


> and a pile of utils to go with that library

I think in terms of tech debt, that's a big part of it. I don't think I've ever seen a python package that was supposed to be used as a library with that many dependencies (that you also can't just pick a reasonable subset from via extras).

I'd rather use a tiny core library with good interfaces + a big ecosystem around it than the kitchen sink approach that langchain takes.


hi all - Harrison (co-founder) of LangChain here

We released this as a way to make it easier to get started with LLM applications. Specifically, we've heard that when people were using chains/agents they often wanted to see exactly what was going on inside, or change it in some way. This basically moves the logic for chains and agents into these templates (including prompts), which are just Python files you can run as part of your application, making it much easier to modify or change them.

Happy to answer any questions, and very open to feedback!


> we've heard that when people were using chains/agents they often wanted to see exactly what was going on inside, or change it in some way.

I certainly agree, but I'm having trouble seeing how templates help with this. The templates appear to be a consolidation of examples like those that were already emphasized in the current documentation. This is nice to have, but what does it do to elucidate the inner workings?


I find this approach of mixing prompts, code, and dependency loading slows down development. Product people should iterate and test various prompts while developers focus on code. Am I wrong to expect this?


Not wrong. I think you are probably right. One reason we aren't seeing that is that the space is constantly evolving; we're in an era before the best practices solidify and become obvious.

Also, langchain is, at best, not that useful and silly.


What if you're depending on aspects of the LLM output that are caused by the prompt? How can you make sure the product person doesn't cause the output to lose some of its more machine-readable aspects while still giving them leeway to improve the prompts?

Maybe there is a way to do this, but my toy fiddlings would encounter issues if I tried to change my prompt in total isolation from caring about the formatting of the output.

To give a concrete example, I've been using local CPU bound LLMs to slowly do basic feature extraction of a very long-running (1000+ chapters) niche fan fiction-esque story that I've been reading. Things like "what characters are mentioned in this chapter?", features which make it easier to go back and review what a character actually did if we haven't been following them for a while.

To get my data from my low-rent LLMs in a nice and semi-machine readable format, I've found it best to ask for the response to be formatted in a bulleted list. That way I can remove the annoying intro prefix/postfix bits that all LLMs seem to love adding ("sure, here's a list..." or "... hopefully that's what you're looking for").

I've found that innocent changes to the prompt, unrelated to my instructions to use a bulleted list, can sometimes cause the result formatting to become spotty, even though the features are being extracted better (e.g. it stops listing in bullets but it starts picking up on there being "unnamed character 1").

I've only been fiddling with things for about a week though, so maybe there's some fundamental knowledge I'm missing about the "LLM app pipeline architecture" which would make it clear how to solve this better; as it is now I'm basically just piping things in and out of llama.cpp

If folks have thoughts on addressing the prompt-to-output-format coupling, I'd love to hear about it!
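For what it's worth, the defensive parsing I ended up with looks roughly like this (a sketch; the regex is tuned to the bullet styles I've seen, and your models may emit others):

```python
import re

def parse_bullets(raw: str) -> list[str]:
    """Keep only bulleted/numbered lines, dropping the chatty prefix and
    postfix text LLMs love to add around a list."""
    items = []
    for line in raw.splitlines():
        # match "- item", "* item", "• item", "1. item", "2) item"
        m = re.match(r"^\s*(?:[-*\u2022]|\d+[.)])\s+(.*\S)", line)
        if m:
            items.append(m.group(1))
    return items
```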


I'm using GPT and its functions, which are basically JSON schemas for its output. It makes the formatting a lot more stable, but even then it's still doing tokenized completion and the function definition is just another aspect of the prompt.

I've done some productive collaborating with someone who only works at the prompt level, but you can't really hand off in my experience. You can do some things with prompts, but pretty soon you are going to want to change the pipeline or rearrange how data is output. Sometimes you just won't get a good response that is formatted the way you want, and have to accept a different function output and write a bit of code to turn it into the representation you want.

Also the function definition (i.e., output schema) looks separate from the prompt, but you absolutely shouldn't treat it like that; every description in that schema matters, as do parameter names and order. You can't do prompt engineering without being able to change those things, but now you will find yourself mucking in the code again. (Though I make the descriptions overridable without changing code.)

Anyway, all that just to say that I agree that code and prompt can't be well separated, nor can pipeline and prompt.


You can use the function definitions and few shot prompting (examples) to great effect together.

E.g. I was trying to build a classifier for game genres from descriptions. I could use the function definitions to easily define all the main genres and add subgenres as enums. That ensured that the taxonomy was pretty much always followed. But then I used examples to coerce the model into outputting in {'strategy':'turn-based'} format rather than {'genre':'strategy', 'subgenre':'turn-based'}. The tokens saved on that could then be used to do more classifications per gpt call, making the whole thing cheaper.
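Concretely, the function definition is just a JSON schema like this (a sketch; the genre lists are illustrative, and the few-shot examples that coerce the compact output shape live in the prompt, not the schema):

```python
# A function/tool schema in the style of OpenAI function calling: the enums
# pin the taxonomy so the model can't invent genres. Few-shot examples in
# the prompt then coerce the terser {"strategy": "turn-based"} output form.
classify_genre = {
    "name": "classify_genre",
    "description": "Classify a game description into a genre and subgenre.",
    "parameters": {
        "type": "object",
        "properties": {
            "genre": {
                "type": "string",
                "enum": ["strategy", "rpg", "shooter", "simulation"],
            },
            "subgenre": {
                "type": "string",
                "enum": ["turn-based", "real-time", "tactical", "other"],
            },
        },
        "required": ["genre", "subgenre"],
    },
}
```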


I liked using LangChain and their docs this year to learn basic patterns (RAG, summarization, etc) as someone completely new to the LLM space.

LangChain gets a lot of pushback in production scenarios, but I think going through some of their tutorials is a very reasonable way to learn more about how you could apply gen AI to various use cases.


Yes, agreed. Also look at LlamaIndex to learn more about what you can do these days. Both are good for inspiration or for quick, no-changes-needed examples. When one needs to customize, that's when they both get in the way.


I actually took a look at LangChain a few months ago because I was curious, and it really seems like a very shallow framework over these LLM APIs. I don't know why anyone would use something that looks so unprofessional.


What does production-ready even mean here? There is still no viable product I've seen from LLMs other than expensive-to-run toys. We're still making up what these might be useful for as we go; it's akin to saying "here's a production-ready website builder" in 1992, before we could have imagined wtf to use the internet for.


Have you used it for any NLP problems? Language models exceed previous SOTA drastically for many applications.


Lot of Langchain grudges in this thread, feel the need to counterbalance by adding:

1. This domain is moving very fast

2. Creating a wrapper around a fast-moving domain is difficult, lots of thrash on design patterns

3. Langchain's out of the box examples have undoubtedly made it easier to grok LLM patterns

That being said -- this article is about building a production-ready LLM app, and currently I strictly treat Langchain as a learning aide.

Ultimately I'm glad Langchain exists, and I hope to see Harrison et al. bring more improvements to the underlying abstractions. Possibly with a more functional inspiration?


“We will be releasing a hosted version of LangServe for one-click deployments of LangChain applications. Sign up here to get on the waitlist.”

There it is! Why have one level of lock-in when you can have two?


I assume this is just going to be some kind of Python hosting service, possibly with LLM configurations and billing included? LangChain applications are just... full Python applications, right? There's nothing super declarative AFAIK, no tighter runtime they could implement beyond a complete container.


I think LangChain serves its purpose when you are new to AI and LLMs and have to experiment a lot. That was my case. The landscape is changing so fast it is hard to keep up. With LangChain I was able to try different LLMs and their features with minimal effort. Granted, I haven't taken anything to production yet. But use LangChain to narrow down to the LLM you like; then you can call that LLM directly without going through LangChain.


This is not hate. Langchain was a delightful way to get used to the concepts, and then it was easier, faster, and better to reconstruct the elements I needed myself than to keep digging through LC docs trying to figure out why X was acting weird sometimes. I wouldn't recommend it for anything more than that.


I tried langchain agents and other abstractions and gave up after a few weeks. Things break randomly or after switching to a different LLM (Claude vs ChatGPT). I determined that it was because of the prompt, so I spent hours trying to craft a custom prompt. It works fine for me now, but I wonder why I had to go through all that Frankenstein hackery with Langchain instead of just rolling my own. I only use langchain for data loading now.


In this thread: a lot of people complaining about Langchain’s architecture or use in production. I’m using it in an alpha quality production app and it is fine. What exactly is knee-capping the complainers so badly?

I won’t defend a lot of the tech choices that have been made, but the “free” tooling (LangSmith), integrations, and modicum of cross-model compatibility are worth it to me.


The other comments have sufficient justification (confusing documentation, technical debt, easier to just roll-your-own) for their dislike of LangChain. This has been the case for nearly every HN thread about LangChain. It is not a vast anti-LangChain conspiracy.

I personally used it in an alpha-quality production app too, and it was fine at first, but I found out after a month of hacking that it didn't work for the business needs, for the reasons stated above.

Said free tooling just increases lock-in which is not ideal for any complex software project.


Offtopic: I find a lot of the langchain hate is misplaced and misguided. Sure, if you want to ship a weekend project using the LLM APIs directly is probably fine. But I've gotten a lot of use out of Langchain's abstractions on real-world projects that evolve quickly over time. At a minimum, it provides a common language for building LLM applications.

Some valid criticisms: (1) the learning curve is steep (2) the APIs and docs are volatile (though to be expected).

This reminds me of the old Django vs Flask debate. Sure, Flask is easy to get started with, but over time you end up building an undocumented, untested Django.



