I feel like by the time I could turn it into a product, Microsoft & friends will release something that makes it look like a joke. If there is no one on the SQL Server team working on this right now, I don't know what the hell their leadership is thinking.
I am not chasing this rabbit. Someone else will almost certainly catch it first. For now, this is a fun toy I enjoy in my free time. The moment I try to make money with it the fun begins to disappear.
Broadly speaking, I do think this is approximately the only thing that matters once you realize you can put pretty much anything in a big SQL database. What happens when 100% of the domain is in-scope of an LLM that has iteratively optimized itself against the schema?
Microsoft may well catch the rabbit that queries schemas and generates valid SQL.
But that rabbit can't understand the meaning of the data just by looking at column names and table relationships.
Let's say you want to know how sales and inventory are doing compared to last year at your chain of retail stores.
Will Microsoft's rabbit be smart enough to know that the retail business is seasonal, so it must compare the last x weeks this year with the same weeks last year? And account for differences in timing of holidays? And exclude stores that weren't open last year?
Will it know that inventory is a stock and sales is a flow, so while it can sum daily sales, it's nonsensical to sum daily inventory?
The real AI magic isn't generating SQL with four joins, it's understanding the mechanics of each industry and the quirks of your organization to extract the intent from ambiguous and incomplete natural language.
If I can TLDR your comment, which I agree with: the real value is in doing real work.
“Hustlers” burn countless hours trying to “optimize” work out of the picture.
Historically, there’s a lot of money in just sitting down with a to-do list of customer problems and solving them at acceptable cost, come hell or high water.
I will be extremely surprised if Microsoft build this for open source databases, however someone else will definitely build it if you don't, that is completely true :-)
Disclaimer: I work at Microsoft on Postgres related open source tools (Citus & PgBouncer mostly)
Microsoft is heavily investing in Postgres and its ecosystem, so I wouldn't be extremely surprised if we would do this. We're definitely building things to combine AI with Postgres[1]. Although afaik no-one is working actively on query generation using AI.
But I actually did a very basic POC of "natural language queries" in Postgres myself last year:
Postgres is dear to me. Met its founders when I was in college at Berkeley, worked heavily with it at a previous company around 2015, used it for all my own projects. I'm glad to see it getting more attention lately (seemingly).
I didn’t know this, it seems they love open source even thought they have competing commercial products. Maybe there is just more money is selling cloud than there is in selling commercial databases?
> I feel like by the time I could turn it into a product, Microsoft & friends will release something that makes it look like a joke. If there is no one on the SQL Server team working on this right now, I don't know what the hell their leadership is thinking.
If Cortana for Azure isn't a thing in the works, I *really* don't know what the hell their leadership is working on. I could see insane value in "why is my website slow?" and getting actionable responses.
If the do release it , they will only release it for enterprise. Many many sql server installs are sql server standard. There is an entire ecosystem of companies built on selling packages that support sql server standard, wee DevArt, RedGate.
> Microsoft & friends will release something that makes it look like a joke
True, Microsoft & Friends have gotten greedy every passing year. Before they used to develop the platform (OS,DB etc.,) and let others develop and sell apps on it that would benefit them as well as the whole ecosystem.
Now they want every last dollar they can squeeze out of the ecosystem. So they don't leave any stone unturned and they have big pockets to do that.
There are already several products out there with varying success.
Some findings after I played with it awhile:
- Langchain already does something like this
- a lot of the challenge is not with the query itself but efficiently summarizing data to fit in the context window. In other words if you give me 1-4 tables I can give you a product that will work well pretty easy. But when your data warehouse has tens or hundreds of tables with columns and meta types now we need to chain together a string of queries to arrive at the answer and we are basically building a state machine of sorts that has to do fun and creative RAG stuff
- the single biggest thing that made a difference in effectiveness was not what op mentioned at all, but instead having a good summary of what every column in the db was stored in the db. This can be AI generated itself, but the way Langchain attempts to do it on the fly is slow and rather ineffective (or at least was the case when I played with it last summer, it might be better now).
Not affiliated, but after reviewing the products out there the data team I was working with ended up selecting getdot.ai as it had the right mix of price, ease of use, and effectiveness.
Shameless plug – we're working on this at Velvet (https://usevelvet.com) and would love feedback. Our tool can connect and query across disparate data sources (databases and event-based systems) and allows you to write natural language questions that are turned automatically into SQL queries (and even make those queries into API endpoints you can call directly!). My email is in my HN profile if anyone wants to try it out or has feedback.
What made you say this? How is this different from the hundreds of AI startups already focusing on this, or even the submission that we're having this conversation on?