Show HN: ChainForge, a visual tool for prompt engineering and LLM evaluation (chainforge.ai)
177 points by fatso784 on Aug 7, 2023 | 29 comments
Hi HN! We've been working hard on this low-code tool for rapid prompt discovery, robustness testing, and LLM evaluation. We've just released documentation to help new users learn how to use it and what it can already do. Let us know what you think! :)



I think you should probably mention that its source is available! [0]

I don't personally have a need for this right now, but I can really see the use for the parameterised queries, as well as comparisons across models.

Thanks for your efforts!

0: https://github.com/ianarawjo/ChainForge


"source available" means one thing, but this is properly MIT open source and i believe the authors should receive credit for that in this age of frankenlicenses

EDIT: ah: "This work was partially funded by the NSF grant IIS-2107391." ok cool we the taxpayer funded it haha


I didn't want to open that can of worms: "If you use ChainForge for research purposes, or build upon the source code, we ask that you cite this project in any related publications. The BibTeX you can use for now is…"

That's outside of the MIT licence as far as I'm concerned


I think that's the wrong end of the stick. When you publish research, the software you used/built on is part of the methods and needs to be cited. The authors are doing you a courtesy by providing a pasteable citation.

Similar "we would appreciate citations" statement for (BSD-licensed) pandas: https://pandas.pydata.org/about/citing.html

8000+ pubs citing pandas: https://scholar.google.com/scholar?cites=9876954816936339312


It says "we ask", not "you have to". I'd read that as an informal request, not a legally binding condition. Also, I do want to open the can of worms and say that whoever doesn't even have the respect to include a citation on request when using someone else's work should just write everything themselves.


Hi! I’m one of the grad students working on this. This is merely a request to get more visibility; it will also help us get more grants. We don’t have any intention of restricting the “openness” of it.


Y'all sure are some great students! ;)


Awesome, thank you!

If I'd called it truly open source, I half expected to get shot down.

I know where we stand now :)


It seems to be more powerful than Langflow and Flowise:

https://github.com/logspace-ai/langflow

https://github.com/FlowiseAI/Flowise


ChainForge looks similar visually, but is very different in practice. We target evaluating and inspecting LLM outputs, rather than building LLM applications. So some things are certainly easier to do in CF, while others will be easier in tools that target LLM app building.


OK, I need somebody to do a comparison table for us...


We will most likely do this comparison when we write our research paper. I can post it here when we do.


I can't comment on the features, but ChainForge has some catching up to do, mindshare-wise. Below are some community insights for Langflow, Flowise, and ChainForge:

https://devboard.gitsense.com/logspace-ai/langflow

https://devboard.gitsense.com/FlowiseAI/Flowise

https://devboard.gitsense.com/ianarawjo/ChainForge

Flowise currently has the largest active community (based on GitHub data).

Full Disclosure: This is my tool


We don’t view it as a competition here. These other tools are for LLM app building, which is great, but isn’t the focus of ChainForge. Instead, we’re focused on helping devs find the right prompt, and evaluate and inspect LLM outputs. So while we might visually look similar, the goals are rather different.

Some CF users, for instance, might not be app builders at all; they just want to audit models.

I think both problems, prompt engineering and LLM app building, are hard and deserve their own dedicated tools.


Back in finance they had a saying: traders know the price of everything but the value of nothing.

That's what it's like to blindly compare tools by GitHub numbers.


I sort of know where you're coming from, but I think it's hard to discount the fact that 80 new people took the time to create 99 issues in Flowise in the last 6 weeks. No matter how you look at it, that's a strong indicator of engagement.

I'm not saying ChainForge is bad, but it will need to go up against that kind of community engagement from projects with a head start. However, if you believe that people contributing code (26) and participating in non-code activity (150) in Flowise in the last 6 weeks are just novelty metrics, then yes, comparing numbers is silly.


Yeah, I'd love to see these tools discussed in an upcoming Latent Space episode.


This looks excellent! It's a great interface for two things I'm struggling to make LlamaIndex do: explain and debug multi-step responses for agent flows, and cache queries aggressively. If I can work out how to hook it into my LlamaIndex-based pile, happy days.

Feature/guidance request: how to actually call functions, and how to loop on responses to resolve multiple function calls. I've managed to mock a response to get_current_weather using this contraption: https://pasteboard.co/aO9BmHG5qsFt.png. But it's messy, and I can't see a way to actually evaluate function calls. And if I involve the Chat Turn node, the message sequences seem to get tangled with each other. Probably I'm holding it wrong!
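For what it's worth, the kind of check I'm after would look roughly like this in a Python evaluator node. I'm assuming the node calls an evaluate(response) function and that response.text holds the model's raw output, which may not be exactly how ChainForge exposes it:

    import json

    # Sketch of a function-call check inside a Python evaluator node.
    # Assumes evaluate(response) is the entry point and response.text
    # is the raw model output -- both assumptions on my part.
    def evaluate(response):
        try:
            call = json.loads(response.text)
        except json.JSONDecodeError:
            return False  # model didn't return a parseable function call
        # Pass only if the model picked the mocked function and gave a location
        return (call.get("name") == "get_current_weather"
                and "location" in call.get("arguments", {}))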


Thank you for the kind words! Looking at the photo, I think you wouldn’t need the last prompt node there.

As far as evaluating functions goes, that's unfortunately a ways off. But we generally prioritize features based on how many people have asked for them in GitHub Issues. (For instance, Chat Turn nodes came from an Issue.) If you post a feature request there, it'll move up our priority list, and we can also clarify precisely what the feature should be.


We just used this on a project and it was very helpful! Cool to see it here on HN


Hey Eric! Thank you! As an aside, we are looking to interview some people who’ve used ChainForge (you see, we are academics who must justify our creations through publications… crazy, I know). Would you or anyone on your team be interested in a brief chat?

You can contact me on Twitter: https://twitter.com/IanArawjo
Or find my email on my CV: ianarawjo.com

At any rate, glad it was helpful!


May I ask, how was it useful? I find it cool, but I have a hard time justifying using it.


I like it! Any plans to add Google Vertex AI support?


There is a long-term vision of supporting fine-tuning through an existing evaluation flow. We originally created this because we were worried about how to evaluate ‘what changed’ between a fine-tuned LLM and its base model. I wonder if Vertex AI has an API that we could plug in, though, or if it’s limited to the UI.


I meant for completion, chat, and embeddings. Some examples here: https://cloud.google.com/vertex-ai/docs/generative-ai/chat/t...

Vertex AI has the same API as PaLM, as far as I know. However, the authorization goes through Google Cloud, so I use it like any other GCP API.
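From memory, a minimal chat call through the Python SDK looks something like this (the project id is a placeholder, and the exact module path may have shifted between SDK versions):

    # Minimal Vertex AI chat call, using ambient GCP credentials
    # (e.g. from `gcloud auth application-default login`).
    import vertexai
    from vertexai.language_models import ChatModel

    vertexai.init(project="my-gcp-project", location="us-central1")  # placeholder project
    chat_model = ChatModel.from_pretrained("chat-bison@001")
    chat = chat_model.start_chat()
    print(chat.send_message("Hello!").text)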

I love the idea of adding fine-tuning as a node, though. Here is the API for creating a model tuning job: https://cloud.google.com/vertex-ai/docs/generative-ai/models...

I wish I could use ChainForge nodes in Node-RED.


What exactly is prompt discovery?


AFAIK it's finding out what prompts to use with which LLM to get the answer you want.

E.g. this:

> Compare response quality across prompt permutations, across models, and across model settings to choose the best prompt and model for your use case.
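Concretely, "prompt permutations" just means filling one template with every combination of variables and seeing which version works best. A toy illustration (my own, not ChainForge code):

    from itertools import product

    # Expand one template into a grid of prompt permutations -- the kind
    # of fan-out ChainForge automates visually.
    template = "Summarize the following {doc_type} in a {tone} tone:\n{text}"
    doc_types = ["email", "legal contract"]
    tones = ["formal", "casual"]

    for doc_type, tone in product(doc_types, tones):
        print(template.format(doc_type=doc_type, tone=tone, text="..."))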


Cool project


Thanks!



