Hey HN,
We are Zain and Ashish, founders of Vanna AI. We recently embarked on an experiment to see if large language models (LLMs) could help generate SQL queries for real-world datasets. We initially started this project as a web app, but realized it was most useful and had the broadest applicability as a Python package, since you can then incorporate it into an existing workflow (Jupyter notebook, Slackbot, etc.).
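To give a feel for the notebook workflow, here's a rough sketch. The function names (train, ask) and arguments are illustrative assumptions and may not match the package's actual API exactly:

    # Illustrative sketch only -- function names and signatures are
    # assumptions and may differ from the released package.
    import vanna as vn

    # Teach the model about your schema with DDL and example queries
    vn.train(ddl="CREATE TABLE customers (id INT, name TEXT, signup_date DATE)")
    vn.train(sql="SELECT COUNT(*) FROM customers WHERE signup_date > '2023-01-01'")

    # Ask a question in plain English and get back a SQL query
    sql = vn.ask("How many customers signed up this year?")
    print(sql)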
We've had some good success with customer datasets, but we've also heard a lot of skepticism, so we decided to write a paper about the methodology we're using and how various LLMs compare.
Let us know if you have any questions or requests. The underlying Python package is open source. There is currently a server component to store and retrieve metadata, but a fully open-source, locally runnable version will be available by next week.
Cheers!
How complex were the 'static examples' that were used? Can you share the three examples that were used in the tests?
Were the "contextually relevant sql" ran in addition to or isolated from the "static examples"?