MQL – Client and server to query your db in natural language (github.com/shurutech)
59 points by akashkahlon 5 months ago | 31 comments



My experience with this kind of tool is that it is at least as hard to learn the tool as it is to learn the technology it abstracts over.

I think that's because thinking about the problem I am trying to solve is always the hardest part, and I have to learn a syntax and semantics no matter what. And the syntax and semantics of SQL are directly tied to the mathematics of relational databases. Natural language isn't.

Furthermore, there are decades of good technical documentation for SQL written by diverse authors for diverse levels of technical experience. Natural language projects are one-offs, and writing documentation is usually a lower priority than making code go.


Agreed. Dozens of companies have built and sold "business intelligence" tools and report builders and visual query interfaces, all promising to ease the interface between man and data and make information easily accessible.

And then every one of these tools turns out to only be usable (barely) by some "data analysts" and never by the executives to whom the system was originally sold.


I think this boils down to fundamental complexity and information theory.

Meaning that there's a certain amount of complexity involved in solving any problem. While abstractions are great and useful, they reduce (by their nature) specificity (and consequently, functionality).

We see this issue over and over again with "no code" and "low code" platforms, which are great for to-do apps, but as soon as you get into real-world application requirements, the platform needs to become so complex it's easier to just use a programming language to solve the problem (bubble is a good example).

I think the same issue applies to data querying, but perhaps more so.

The problem domain is different. Most of the time accuracy is the most important constraint with data queries. For example, if I need to get a list of patients to notify about a drug recall, "mostly correct" isn't going to cut it.

So then the problem becomes developing a language that's specific and can accurately describe and model the problem. Spoken languages aren't great at that. By the time you contort a language like English into a form that can accurately and consistently describe the query, it's probably easier to just use a language that was designed for querying, like SQL or PRQL.

In fact, spoken languages are so terrible at describing problems that an entire industry of business analysts, project managers, UX experts and others exists just for the purpose of translating what people need into what's delivered.

I doubt ML models are going to ever replace that. They're sure to provide assistance, but a statistical model is just that - no matter how many of them you chain together, how big it is, or how you weight the model.


IMO, this tool is way simpler than SQL. Once setup is done, it is very easy to use for non-tech people. In SQL you have to be 100% correct with the syntax, which is not the case here.


For the majority of people from non-tech business functions, the ability to ask for insights from data is liberating. Tools like this can unlock their potential to make more informed decisions. Imagine a store manager of a hyperlocal grocery startup managing a dark store. What if they could ask questions like "What is the fulfilment rate of a certain SKU between 12-3 pm in their store for the past 7 days?"
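
For illustration, here is roughly the query that question implies. This is only a sketch under assumptions: the `order_lines` table, its columns, and the definition of fulfilment rate (fulfilled units divided by requested units) are all hypothetical, and SQLite stands in for whatever database the store actually uses. Even this small example has to pin down choices the question leaves open, such as whether "12-3 pm" includes 3:00 and what "past 7 days" is measured from.

```python
import sqlite3

# Hypothetical schema: one row per order line in a dark store.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE order_lines (
        store_id      TEXT,
        sku           TEXT,
        ordered_at    TEXT,     -- 'YYYY-MM-DD HH:MM:SS', store local time
        requested_qty INTEGER,
        fulfilled_qty INTEGER
    )
    """
)
conn.executemany(
    "INSERT INTO order_lines VALUES (?, ?, ?, ?, ?)",
    [
        ("S1", "MILK-1L", "2024-03-05 12:30:00", 4, 4),
        ("S1", "MILK-1L", "2024-03-06 14:10:00", 6, 3),
        ("S1", "MILK-1L", "2024-03-06 16:00:00", 5, 5),  # outside 12-3 pm
        ("S1", "MILK-1L", "2024-02-20 13:00:00", 2, 0),  # older than 7 days
        ("S2", "MILK-1L", "2024-03-06 12:45:00", 3, 1),  # different store
    ],
)

# Assumed definition: fulfilled units / requested units,
# for orders placed from 12:00 up to (not including) 15:00.
FULFILMENT_RATE_SQL = """
SELECT 1.0 * SUM(fulfilled_qty) / SUM(requested_qty)
FROM order_lines
WHERE store_id = :store_id
  AND sku = :sku
  AND ordered_at >= datetime(:as_of, '-7 days')
  AND ordered_at <  :as_of
  AND time(ordered_at) >= '12:00:00'
  AND time(ordered_at) <  '15:00:00'
"""


def fulfilment_rate(store_id, sku, as_of):
    """Return the rate as a float, or None when no rows match
    (SUM over an empty set is NULL in SQL)."""
    params = {"store_id": store_id, "sku": sku, "as_of": as_of}
    return conn.execute(FULFILMENT_RATE_SQL, params).fetchone()[0]


rate = fulfilment_rate("S1", "MILK-1L", "2024-03-07 00:00:00")
print(rate)
```

With the sample rows above, two order lines fall inside the window (7 units fulfilled out of 10 requested), so `rate` is 0.7. Change the assumed definition and the number changes, which is the kind of ambiguity a natural-language front end has to resolve somewhere.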


text-to-sql is a dead end. There's no way for a model to correctly interpret the meaning of every column in a real world database using the `information_schema` alone. Most cloud warehouses (e.g. Snowflake) don't use foreign keys, so you don't even know the joins.

Imagine you hire a highly skilled data analyst (e.g. 9 out of 10 proficiency in SQL) and start asking them questions about your database. They won't answer them, they'll ask you more questions. The conversation would go something like:

you: what is our churn rate by channel?

new analyst: where do we store "channel"? what do we use to process payments? where is that data stored? do we include discounts in MRR / churn? etc.

If a human can't do it, an LLM can't either. An LLM isn't able to write the SQL from scratch and get the right answers without a ton of additional context. We're working on an approach using a semantic layer at https://www.definite.app/ if you're interested in this sort of thing.


Agreed, but perhaps more semantic meaning could be expressed in metadata for tables and columns, extending beyond what's typically found in information_schema. (This may be the semantic layer you are talking about.)

Here it seems MQL isn't a query language as much as it's a text-to-SQL translator and you're right... without a bit more understanding of the data's role and purpose and intent it's a hard job for anyone, human or AI.

It strikes me that as I write an SQL statement I'm not only using knowledge of SQL but also knowledge of the domain and database structure that I don't even think about until I need to show someone else how to do the query.


> There's no way for a model to correctly interpret the meaning of every column in a real world database using the `information_schema` alone.

Why would text-to-sql be limited to information_schema alone? Human analysts would use additional documentation, why wouldn't an LLM-based text-to-sql system?


I should have clarified. There's a large number of apps that are:

1. taking info strictly from SQL (e.g. information_schema, query history)

2. taking a user input / question

3. writing SQL to answer that question

An app like this is what I call "text-to-sql". Totally agree a better system would pull in additional documentation (which is what we're doing), but I'd no longer consider it "text-to-sql". In our case, we're not even directly writing SQL, but rather generating semantic layer queries (i.e. https://cube.dev/).
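
A minimal sketch of the three steps listed above, to make the shape of such an app concrete. It is not how any particular product works: SQLite's catalog stands in for `information_schema`, the table names are made up, and `generate_sql` is a hypothetical placeholder for whatever model call an app would make.

```python
import sqlite3
from typing import Callable


def schema_summary(conn: sqlite3.Connection) -> str:
    """Step 1: describe tables and columns using only database metadata."""
    lines = []
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        cols = conn.execute(
            "SELECT name, type FROM pragma_table_info(?)", (table,)
        ).fetchall()
        col_desc = ", ".join(f"{name} {col_type}" for name, col_type in cols)
        lines.append(f"{table}({col_desc})")
    return "\n".join(lines)


def build_prompt(schema: str, question: str) -> str:
    """Step 2: combine the schema with the user's question."""
    return f"Schema:\n{schema}\n\nQuestion: {question}\nSQL:"


def text_to_sql(
    conn: sqlite3.Connection,
    question: str,
    generate_sql: Callable[[str], str],
) -> str:
    """Step 3: hand the prompt to a model (any callable) and return its SQL."""
    return generate_sql(build_prompt(schema_summary(conn), question))


# Made-up tables for the demo.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, channel TEXT)")
conn.execute("CREATE TABLE payments (customer_id INTEGER, amount REAL)")


def fake_model(prompt: str) -> str:
    # Stand-in for an LLM call; a real model sees only what is in the prompt.
    return "SELECT channel, COUNT(*) FROM customers GROUP BY channel"


prompt = build_prompt(schema_summary(conn), "what is our churn rate by channel?")
print(prompt)
print(text_to_sql(conn, "what is our churn rate by channel?", fake_model))
```

Nothing in the prompt says how the two tables join or what counts as churn, so the model has to guess at both.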


Yes. And also, don't forget that different stakeholders ask in different ways, using different words, which turns the situation into a nightmare. But I think it's possible to make it work with mid-size databases.


providing some context about the data, the schema + samples from the entries works quite well, definitely room for improvement but already quite usable imho
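
Roughly what that looks like in practice, as a sketch only: SQLite is used as a stand-in database, and the `subscriptions` table and the `table_context` helper are made up for the example. The idea is to hand the model each table's definition plus a few real rows so it can see what the values look like.

```python
import sqlite3


def table_context(conn, table, sample_rows=3):
    """Return a table's CREATE statement plus a few sample rows as text."""
    ddl = conn.execute(
        "SELECT sql FROM sqlite_master WHERE type = 'table' AND name = ?",
        (table,),
    ).fetchone()
    if ddl is None:
        raise ValueError(f"unknown table: {table}")
    # The name was found in the catalog above, so quoting it here is safe
    # as long as it contains no double quote.
    cursor = conn.execute(f'SELECT * FROM "{table}" LIMIT ?', (sample_rows,))
    header = [col[0] for col in cursor.description]
    lines = [ddl[0], "Sample rows:", " | ".join(header)]
    for row in cursor.fetchall():
        lines.append(" | ".join("NULL" if v is None else str(v) for v in row))
    return "\n".join(lines)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE subscriptions (id INTEGER, channel TEXT, cancelled_at TEXT)"
)
conn.executemany(
    "INSERT INTO subscriptions VALUES (?, ?, ?)",
    [(1, "ads", None), (2, "referral", "2024-01-15"), (3, "ads", "2024-02-01")],
)

ctx = table_context(conn, "subscriptions", sample_rows=2)
print(ctx)
```

The resulting text can be prepended to the user's question before it goes to the model. Sample rows can expose real data, so a production version would likely need to mask or filter them.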


Agreed, very usable if you know SQL and iterate from whatever the LLM spits out.


agree, with familiarity with SQL one can use it as a reference for generating the first draft or even the final query


Genuine question: does anyone here actually want to query their database with natural language?


It's really helpful with MongoDB Query Language (also MQL). Document models without a rigid schema and a less intuitive API are where this stuff comes in real handy. MongoDB's GUI Compass already shipped a feature to generate queries and aggregation pipelines from natural language.


The people that hire data analysts do.


Is this to be trusted with things that have to be accurate, such as a subpoena?

Besides, I feel like a data analyst should be able to know what questions to ask, not just how to translate business requests to sql.


If you have to be accurate, "natural language" is not going to be the way to do it.


Nice job getting something released! How does this compare to the other similar open source solutions like Vanna AI and DataHerald?


Thank you. We have not done that comparison yet, but we will check these two out to learn more. We calculated the accuracy with a test data set which is part of the repo; we will see how we can compare this with the others.


That "natural language will magic away complexity" mindset has done so much damage.


> As of the current version, MQL is designed to work exclusively with PostgreSQL


Yes, we are working on adding MySQL support as well. Would you suggest any other integrations before or after MySQL? Happy to learn.


Isn't SQL already a way to query your DB with natural language?


No, SQL is not natural language.


Or one could, you know, learn SQL.


Most people would rather work in languages they already know. Natural language processing will allow programming languages to become as niche as assembly is, essentially. You won't need to interface with it much because the models will get that good.


:D We have been working with many non-tech founders and business people who are genuinely interested in data but they cannot learn SQL, due to different constraints.


What are those constraints? Really usable SQL for business people can be learnt in a day-long workshop or less. If they can do Excel, they can do SQL too.


Time, focus, priorities, or just curiosity to learn. People have their reasons. Nothing against SQL btw, but it's very difficult to make someone learn something, especially someone in authority. There must be people who have done it; it's not all or nothing, it just needs curiosity and effort to learn.


Agreed. It matches my experience as well. Honestly, I boil it down to personality and orientation more than seniority in the org. I have seen VP/Director-level people who secretly tinker in SQL while delegating 99% of the time. Those who want to learn SQL will do it soon, and the rest have some kind of mental block.



