I'm curious whether people have tried this out with their own datasets and seen success. I've been using similar techniques at work to build an internal bot that lets employees talk to our structured datasets (a couple of MySQL tables). It works okay in practice, but there are a few challenges:
1. We have many enums and data types specific to our business that will never be in these foundation models. Those have to be manually defined and fed into the prompt as context as well (i.e. the equivalent of adding documentation in Vanna.ai); see the sketch after this list for the kind of thing I mean.
2. People ask many kinds of time-related questions, like 'how much demand was there in the past year?'. If you store your data by quarter, how do you prompt the model to take the current date into account AND recognize that it needs the last 4 quarters? This has typically broken for me.
3. It took a LOT of diverse example SQL queries for it to generate the right SQL for a set of plausible user questions (15-20 queries for a single MySQL table). Given that users can ask anything, it has to be extremely robust, and needing this much context for a single table makes it hard to scale to tens or hundreds of tables. I'm wondering if there's a more efficient way of doing this?
4. I've been using the Llama 2 70B generation model, but I'm curious whether other models work significantly better than this one at generating SQL queries?
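For 1., to make it concrete, the kind of business-specific context I mean could be expressed as annotated DDL pasted into the prompt. This is only an illustrative sketch, not our real schema; the table, columns, and enum values are made up:

```sql
-- Hypothetical, made-up DDL of the kind that gets pasted into the prompt as
-- context; the ENUM values and COMMENTs carry the business-specific meaning
-- a foundation model can't know on its own.
CREATE TABLE orders (
  id          BIGINT PRIMARY KEY,
  status      ENUM('PLACED', 'PICKED', 'IN_TRANSIT', 'DELIVERED', 'RTO')
              COMMENT 'RTO = returned to origin',
  demand_qty  INT COMMENT 'units requested, not units fulfilled',
  fiscal_qtr  CHAR(6) COMMENT 'fiscal quarter label, e.g. 2023Q4',
  created_at  TIMESTAMP
) COMMENT 'one row per order; demand questions aggregate demand_qty';
```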
For 2., we ended up stuffing the prompt with examples for common date ranges ("this month", "last year", "this year to date"), some date math, and examples of our date fields (we have a timestamp plus extracted Year, Month, Day, etc.), plus a "last 4 quarters" sketch at the end of this comment. For example:
Current date: `current_date()`
3 days ago: `current_date() - INTERVAL 3 DAY`
Beginning of this month: `date_trunc('month', current_date())`
...
4. I get the best results with GPT-4; I haven't tried Llama yet. GPT-3.5 and GPT-4-turbo tend to "forget" things on complex queries, but maybe we just need more tuning.
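For the "last 4 quarters" case in 2., what seems to matter is showing the model one fully worked example that anchors on the current date and then filters whole quarters relative to it. The parent is on MySQL, which has no date_trunc, so a sketch there might use QUARTER()/MAKEDATE instead (the demand_by_quarter table and quarter_start column are hypothetical, just for illustration):

```sql
-- Sketch: "demand in the past year" read as the last 4 complete quarters.
-- MySQL flavour; table and column names are hypothetical.

-- First day of the current quarter:
SET @this_q_start = MAKEDATE(YEAR(CURDATE()), 1)
                    + INTERVAL (QUARTER(CURDATE()) - 1) QUARTER;

-- Last 4 complete quarters = [current quarter start - 4 quarters, current quarter start):
SELECT SUM(demand_qty)
FROM   demand_by_quarter
WHERE  quarter_start >= @this_q_start - INTERVAL 4 QUARTER
  AND  quarter_start <  @this_q_start;
```

The point is that the example shows both the current-date anchor and the whole-quarter arithmetic, so the model can distinguish "past year" (rolling 365 days) from "last 4 quarters" (complete quarters).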