Show HN: ChainForge, a visual tool for prompt engineering and LLM evaluation (chainforge.ai)
177 points by fatso784 on Aug 7, 2023 | 29 comments
Hi HN! We've been working hard on this low-code tool for rapid prompt discovery, robustness testing, and LLM evaluation. We've just released documentation to help new users learn how to use it and what it can already do. Let us know what you think! :)



I think you should probably mention that its source is available! [0]

I don't personally have a need for this right now, but I can really see the use for the parameterised queries, as well as comparisons across models.

Thanks for your efforts!

0: https://github.com/ianarawjo/ChainForge


"source available" means one thing, but this is properly MIT open source and i believe the authors should receive credit for that in this age of frankenlicenses

EDIT: ah: "This work was partially funded by the NSF grant IIS-2107391." ok cool we the taxpayer funded it haha


I didn't want to open that can of worms: "If you use ChainForge for research purposes, or build upon the source code, we ask that you cite this project in any related publications. The BibTeX you can use for now is…"

That's outside of the MIT licence as far as I'm concerned


I think that's the wrong end of the stick. When you publish research, the software you used/built on is part of the methods and needs to be cited. The authors are doing you a courtesy by providing a pasteable citation.

Similar "we would appreciate citations" statement for (BSD-licensed) pandas: https://pandas.pydata.org/about/citing.html

8000+ pubs citing pandas: https://scholar.google.com/scholar?cites=9876954816936339312


It says "we ask", not "you have to". I'd read that as an informal request, not a legally binding condition. Also, I do want to open the can of worms and say that whoever doesn't even have the respect to include a citation on request when using someone else's work should just write everything themselves.


Hi! I’m one of the grad students working on this. This is merely a request to get more visibility; it will also help us get more grants. We don’t have any intention of restricting the “openness” of it.


Y'all sure are some great students! ;)


Awesome, thank you!

If I'd called it truly open source, I half expected to get shot down.

I know where we stand now :)


It seems to be more powerful than Langflow and Flowise:

https://github.com/logspace-ai/langflow

https://github.com/FlowiseAI/Flowise


ChainForge looks similar visually, but is very different in practice. We target evaluating and inspecting LLM outputs, rather than building LLM applications. So some things are certainly easier to do in CF, while others will be easier in tools that target LLM app building.


OK, I need somebody to do a comparison table for us...


We will most likely do this comparison when we write our research paper. I can post it here when we do.


I can't comment on the features, but ChainForge has some catching up to do, mindshare-wise. Below are some community insights for Langflow, Flowise, and ChainForge:

https://devboard.gitsense.com/logspace-ai/langflow

https://devboard.gitsense.com/FlowiseAI/Flowise

https://devboard.gitsense.com/ianarawjo/ChainForge

Flowise currently has the largest active community (based on GitHub data).

Full Disclosure: This is my tool


We don’t view it as a competition here. These other tools are for LLM app building, which is great, but isn’t the focus of ChainForge. Instead, we’re focused on helping devs find the right prompt, and evaluate and inspect LLM outputs. So while we might visually look similar, the goals are rather different.

Some CF users, for instance, might not be app builders at all; they just want to audit models.

I think both problems, prompt engineering and LLM app building, are hard and deserve their own dedicated tools.


Back in finance they had a saying: traders know the price of everything but the value of nothing.

That's what it's like to blindly compare tools by GitHub numbers.


I sort of know where you're coming from, but I think it's hard to discount the fact that 80 new people took the time to create 99 issues in Flowise in the last 6 weeks. No matter how you look at it, that's a strong indicator of engagement.

I'm not saying ChainForge is bad, but it will need to go up against that kind of community engagement from projects with a head start. However, if you believe that people contributing code (26) and participating in non-code activity (150) in Flowise in the last 6 weeks are just novelty metrics, then yes, comparing numbers is silly.


Yeah, I'd love to see these tools discussed in an upcoming Latent Space episode.


This looks excellent! It's a great interface for two things I'm struggling to make LlamaIndex do: explain and debug multi-step responses for agent flows, and cache queries aggressively. If I can work out how to hook it into my LlamaIndex-based pile, happy days.

Feature/guidance request: how to actually call functions, and how to loop on responses to resolve multiple function calls. I've managed to mock a response to get_current_weather using this contraption: https://pasteboard.co/aO9BmHG5qsFt.png. But it's messy, and I can't see a way to actually evaluate function calls. And if I involve the Chat Turn node, the message sequences seem to get tangled with each other. Probably I'm holding it wrong!
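For what it's worth, the kind of check I'm after would look roughly like this in a Python evaluator node. I'm assuming the node calls an evaluate(response) function and that response.text holds the model's raw output, which may not be exactly how ChainForge exposes it:

    import json

    # Sketch of a function-call check inside a Python evaluator node.
    # Assumes evaluate(response) is the entry point and response.text
    # is the raw model output -- both assumptions on my part.
    def evaluate(response):
        try:
            call = json.loads(response.text)
        except json.JSONDecodeError:
            return False  # model didn't return a parseable function call
        # Pass only if the model picked the mocked function and gave a location
        return (call.get("name") == "get_current_weather"
                and "location" in call.get("arguments", {}))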


Thank you for the kind words! Looking at the photo, I think you wouldn’t need the last prompt node there.

As far as evaluating functions goes, that's unfortunately a ways off. But we generally prioritize features based on how many people have asked for them in GitHub Issues. (For instance, Chat Turn nodes came from an Issue.) If you post a feature request there, it'll move up our priority list, and we can also clarify precisely what the feature should be.


We just used this on a project and it was very helpful! Cool to see it here on HN


Hey Eric! Thank you! As an aside, we are looking to interview some people who’ve used ChainForge (you see, we are academics who must justify our creations through publications… crazy, I know). Would you or anyone on your team be interested in a brief chat?

You can contact me on Twitter: https://twitter.com/IanArawjo
Or find my email on my CV: ianarawjo.com

At any rate, glad it was helpful!


May I ask, how was it useful? I find it cool, but I have a hard time justifying using it.


I like it! Any plans to add Google Vertex AI support?


There is a long-term vision of supporting fine-tuning through an existing evaluation flow. We originally created this because we were worried about how to evaluate ‘what changed’ between a fine-tuned LLM and its base model. I wonder if Vertex AI has an API that we could plug in, though, or if it’s limited to the UI.


I meant for completion, chat, and embeddings. Some examples here: https://cloud.google.com/vertex-ai/docs/generative-ai/chat/t...

Vertex AI has the same API as PaLM, as far as I know. However, the authorization goes through Google Cloud, so I use it like any other GCP API.
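From memory, a minimal chat call through the Python SDK looks something like this (the project id is a placeholder, and the exact module path may have shifted between SDK versions):

    # Minimal Vertex AI chat call, using ambient GCP credentials
    # (e.g. from `gcloud auth application-default login`).
    import vertexai
    from vertexai.language_models import ChatModel

    vertexai.init(project="my-gcp-project", location="us-central1")  # placeholder project
    chat_model = ChatModel.from_pretrained("chat-bison@001")
    chat = chat_model.start_chat()
    print(chat.send_message("Hello!").text)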

I love the idea of adding fine-tuning as a node, though. Here is the API for creating a model tuning job: https://cloud.google.com/vertex-ai/docs/generative-ai/models...

I wish I could use ChainForge nodes in Node-RED.


What exactly is prompt discovery?


AFAIK it's finding out what prompts to use with which LLM to get the answer you want.

E.g. this:

> Compare response quality across prompt permutations, across models, and across model settings to choose the best prompt and model for your use case.
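Concretely, "prompt permutations" just means filling one template with every combination of variables and seeing which version works best. A toy illustration (my own, not ChainForge code):

    from itertools import product

    # Expand one template into a grid of prompt permutations -- the kind
    # of fan-out ChainForge automates visually.
    template = "Summarize the following {doc_type} in a {tone} tone:\n{text}"
    doc_types = ["email", "legal contract"]
    tones = ["formal", "casual"]

    for doc_type, tone in product(doc_types, tones):
        print(template.format(doc_type=doc_type, tone=tone, text="..."))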


Cool project


Thanks!



