Ah, this brings back fun memories. Over a decade ago, Microsoft SQL Server had English Query to do the same thing. Developers had to define all the words & relationships that would be used, and keep them up to date over time. Microsoft discontinued it around 2005.
I remember briefly working with it and realizing, “No, this is never going to work because users have a tough time spelling things accurately.”
If misspelling is one of the main problems than it would be interesting to have suggestions based on how similar a word was to another, so that you could have something similar to the rust compiler’s “<undefined variable name> does not exist, did you mean <defined variable name>”. I think it uses Levenshtein distance, and maybe some other algorithms/heuristics to pick it’s best guess.
I am not criticizing this project -- obviously there's a desire for software to understand natural language questions and certainly makes sense to target SQL.
However, it immediately reminds me of the main drawback of SQL: it was designed with the aim that it would be easy for non-programmers to use, at a point in history when that seemed to make sense to the designers. Hence SQL itself has a natural languagey feel, e.g. a keyword like "group by" has an embedded space despite "group" and "by" having no meaning individually. As many others have observed less superficially, SQL lacks composability. It feels less like a programming language and more like a natural language interface to a relational database. It's certainly great, and has proven the test of time, etc etc, but I agree that we can do better, e.g. the following article's discussion, if not their proposed solution.
The reverse is way more interesting to me.... do a seemingly bunch of random aggregations and joins and send me an email of all the English versions and let me look at the results of those if I care to. The issue with these types of ideas is always that only "clean" or heavily analyzed data can generate this level of automation... and this type of tool is really only useful for when you are looking at virgin data or data that is crap.
There's nothing wrong with being resource efficient, but relatively SQL queries used to take two minutes, and the result printed on paper, and the whole machine cost two million.
Yeah and sending a message to someone across the sea used to take months and probably wouldn’t even get there half the time, that doesn’t meant it’s not ridiculous that I can’t run two instances of slack without my computer melting.
Without really digging into this, if it's a neural model like a sibling comment mentions perhaps you can do model compression that shrinks many multiples, if that is the majority of the image size.
Very interesting. I did a similar project for my machine learning class in grad school. I'm curious what kind of architecture/model framework you used for this.
Why do people fear programming that much, so that even a simple SQL query is tried to be abstracted?
Same thing with visual "no code" programming. Is that really simpler to endlessly move around colorful blocks than writing a few lines in, say, Python?
Yes? If you work with it every day you forget how obtuse and opaque programming can be. Most normal people barely understand basic consumer electronics.
Programming is like many things not an immensely difficult task, but it is one that requires knowledge and experience, so a true novice confronted with a SQL manual would not be able to quickly write the right code. Someone who's done programming before would have a much easier time.
The programmer's equivalent is "what's the StackOverflow answer for how to do X with Y?".
(e.g., Microsoft demonstrated some of their tech with a Stack Overflow Bot https://github.com/microsoft/BotFramework-Samples/tree/maste... -- idk if anyone actually uses it though).
Sure, I should do it the "go fast by going slow" way and spend time to read the manual and go through the tutorial.
But often I'll want to know something which many people will have asked.
It's great to see a nice open source example of work in this area. It would be helpful if there were also some examples of where the model fails either by not being able to produce a query or producing an incorrect query.
I feel that SQL queries that people will need help with are not the SQL 101 examples shown here.
Looking at the queries I write as a product manager, I wouldn't have the slightest clue how to translate them to English. How do you LEFT JOIN on a relation and fetch NTiles on a calculated value from three different tables? Surely this will require some English-to-schema dictionary that someone who knows SQL will have to maintain.
Humans think with language. Anything that can be understood can be written in language. The README acknowledges the need for further research and development but it's not hard to imagine how a more complex query could be written in English.
"How many bridges are there that are in the same state as any bridge designed by O.H. Amman and are shorter than the number of kilometers between the Earth and the Moon divided by the number of people living in the state and have a maximum load that is at least as great as the combined weight of all currently living Asian elephants?"
When someone walks over and asks, "What does this query do?" Do you just tell them that it cannot possibly be put into English? Of course not, knowing the answer and knowing how to say the answer in at least one human language are the same thing.
The example is probably wrong or not working since the English questions uses "throgs neck" in lower case instead of the captialized "Throgs Neck" version.
Those are two different strings in SQL and shouldn't match exactly.
This already shows one difficulty of translating a natural language to a SQL query.
Just curious, what tool do you use to write SQL today and how much time do you spend doing it? Would you be interested in this to reduce the time you spend writing SQL?
I think anyone who uses sql would find this tool (and a lot of similar abstractions) too imprecise, verbose, and error prone.
What's the longest bridge in mariland - or was it merryland (or maybe state_code=MD), is it in bridges table or bridge, or maybe bridge_specs joined on bridges by some id...I'll just write my sql in my favorite sql autocomplete tool.
I agree, but I was thinking it could give people that don't know much SQL a starting point. They could ask a natural language question > get SQL > tweak SQL. So in your example, they might get something back like:
SELECT MAX(length) FROM bridges WHERE state = 'merryland'
And they could recognize the error of merryland vs. MD.
You can e.g. do this using semantic parsing, I remember some older papers but cannot find them right now. For a more recent paper see [1] (haven't read it though).
You can also use Deep Learning for this. There are some interesting approaches using Reinforcement Learning [2]
I remember briefly working with it and realizing, “No, this is never going to work because users have a tough time spelling things accurately.”