Hi HN! We’ve been working hard on this low-code tool for rapid prompt discovery, robustness testing, and LLM evaluation. We’ve just released documentation to help new users learn how to use it and what it can already do. Let us know what you think! :)
"source available" means one thing, but this is properly MIT open source and i believe the authors should receive credit for that in this age of frankenlicenses
EDIT: Ah: "This work was partially funded by the NSF grant IIS-2107391." OK, cool, we the taxpayers funded it, haha.
I didn't want to open that can of worms: "If you use ChainForge for research purposes, or build upon the source code, we ask that you cite this project in any related publications. The BibTeX you can use for now is.."
That's outside of the MIT licence as far as I'm concerned
I think that's the wrong end of the stick. When you publish research, the software you used/built on is part of the methods and needs to be cited. The authors are doing you a courtesy by providing a pasteable citation.
It says "we ask", not "you have to". I'd say it's open to interpretation that this is an informal request and not legally binding. Also I do want to open the can of worms and say that whoever doesn't even have the respect to include a citation on request when using someone else's work should just write everything themselves.
Hi! I’m one of the grad students working on this. This is merely a request to get more visibility. It will also help us get more grants. We don’t have any intention of restricting the “openness” of it.
ChainForge looks similar visually, but is very different in practice. We target evaluating and inspecting LLM outputs, rather than building LLM applications. So, some things are certainly easier to do in CF, while others will certainly be easier in other tools that target LLM app building.
I can't comment on the features, but ChainForge has some catching up to do... mindshare-wise. Below are some community insights for langflow, Flowise, and ChainForge.
We don’t view it as a competition here. These other tools are for LLM app building, which is great, but isn’t the focus of ChainForge. Instead, we’re focused on helping devs find the right prompt, and evaluate and inspect LLM outputs. So while we might visually look similar, the goals are rather different.
Some CF users, for instance, might not be app builders at all; they just want to audit models.
I think both problems (prompt engineering and LLM app building) are hard and deserve their own dedicated tools.
I sort of know where you are coming from, but I think it is hard to discount the fact that 80 new people took the time to create 99 issues in the last 6 weeks in Flowise. No matter how you look at it, this is a strong indicator that this project has strong engagement.
I'm not saying ChainForge is bad, but it will need to go up against that kind of community engagement that other projects with a head start have. However, if you believe people contributing code (26) and participating in non-code activity (150) in Flowise in the last 6 weeks are just novelty metrics, then yes, comparing numbers is silly.
This looks excellent! It's a great interface for two things I'm struggling to make LlamaIndex do: explain and debug multi-step responses for agent flows, and cache queries aggressively. If I can work out how to hook it into my LlamaIndex-based pile, happy days.
Feature/guidance request: how to actually call functions, and how to loop on responses to resolve multiple function calls. I've managed to mock a response to get_current_weather using this contraption: https://pasteboard.co/aO9BmHG5qsFt.png . But it's messy, and I can't see a way to actually evaluate function calls. And if I involve the Chat Turn node, the message sequences seem to get tangled with each other. Probably I'm holding it wrong!
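For reference, this is roughly the loop I'm trying to reproduce with nodes, sketched in plain Python rather than in ChainForge (a minimal sketch assuming the legacy openai 0.x SDK with the "functions" parameter and OPENAI_API_KEY set in the environment; the model name and the mocked get_current_weather are just placeholders):

    import json
    import openai  # legacy 0.x SDK; reads OPENAI_API_KEY from the environment

    FUNCTIONS = [{
        "name": "get_current_weather",
        "description": "Get the current weather for a location",
        "parameters": {
            "type": "object",
            "properties": {"location": {"type": "string"}},
            "required": ["location"],
        },
    }]

    def get_current_weather(location: str) -> dict:
        # Mocked result, standing in for a real weather API call.
        return {"location": location, "temperature_c": 21, "conditions": "sunny"}

    messages = [{"role": "user", "content": "What's the weather in Toronto?"}]

    while True:
        resp = openai.ChatCompletion.create(
            model="gpt-3.5-turbo-0613", messages=messages, functions=FUNCTIONS
        )
        msg = resp["choices"][0]["message"]
        if not msg.get("function_call"):
            break  # plain text reply; no more function calls to resolve
        # Resolve the requested call and feed the result back as the next turn.
        args = json.loads(msg["function_call"]["arguments"])
        result = get_current_weather(**args)
        messages.append(msg)
        messages.append({
            "role": "function",
            "name": msg["function_call"]["name"],
            "content": json.dumps(result),
        })

    print(msg["content"])

It's that "call, resolve, append, call again" cycle that I can't see how to express or evaluate with the current nodes.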
Thank you for the kind words! Looking at the photo, I think you wouldn’t need the last prompt node there.
As far as evaluating functions go, that’s unfortunately a ways off. But we generally prioritize things based on how many people have posted GitHub Issues asking for them. (For instance, Chat Turn nodes came from an Issue.) If you post a feature request there, it’ll move up our priority list, and we can also clarify precisely what the feature should be.
Hey Eric! Thank you! As an aside, we are looking to interview some people who’ve used ChainForge (you see, we are academics who must justify our creations through publications… crazy, I know). Would you or anyone on your team be interested in a brief chat?
There is a long-term vision of supporting fine-tuning through an existing evaluation flow. We originally created this because we were worried about how to evaluate ‘what changed’ between a fine-tuned LLM and its base model. I wonder if Vertex AI has an API that we could plug in, though, or if it’s limited to the UI.
I don't personally have a need for this right now, but I can really see the use for the parameterised queries, as well as comparisons across models.
Thanks for your efforts!
0: https://github.com/ianarawjo/ChainForge