This is called Question Answering (QA). "Bedroom" and "apartments" are different kinds of entities from "Capitol Hill" and "Seattle". You could do as you say and try to understand the question based on some of the words that appear, using statistics. This is a "bag of words" approach.
The general idea of NLP is not different from general computer science, i.e. 1) narrow the problem, 2) solve it, 3) try to solve a bigger problem.
The tower of sentence structure in NLP is (a quick sketch of these layers follows the list):
- bag of words
- part-of-speech + named entity tagging
- dependency parsing/framing
- semantic tagging
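To make the layers concrete, here is a minimal sketch assuming spaCy and its small English model are installed (it is just one library that exposes these layers, not the only way to do it):

```python
# Minimal sketch of the pipeline layers, assuming spaCy is installed:
#   pip install spacy && python -m spacy download en_core_web_sm
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Find a 2 bedroom apartment in Capitol Hill, Seattle")

# Part-of-speech layer
print([(token.text, token.pos_) for token in doc])

# Named-entity layer (exact labels depend on the model)
print([(ent.text, ent.label_) for ent in doc.ents])

# Dependency layer (which word modifies which)
print([(token.text, token.dep_, token.head.text) for token in doc])
```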
The idea is to create templates for the most common questions. Then, by parsing a question and recognizing named entities like "Capitol Hill" and "Seattle" and common nouns like "apartment", you can resolve the question. It's not an ordinary dictionary hash lookup, since for a given template there are several "keys"; the value of the dictionary is the correct search method. It reminds me of multiple method dispatch that supports dispatch by value.
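A rough illustration of that "dispatch by value" idea, where the lookup key is the whole combination of recognized slots rather than a single value. The slot names and search functions below are hypothetical placeholders, not from any real library:

```python
# Hedged sketch: dispatch a parsed question to a search method based on
# the combination of recognized slot types.

def search_rentals(slots):
    return f"searching rentals: {slots}"

def search_restaurants(slots):
    return f"searching restaurants: {slots}"

# The "key" is the set of slot types a template expects;
# the value is the search method to call.
DISPATCH = {
    frozenset({"quantity", "housing_type", "neighborhood", "city"}): search_rentals,
    frozenset({"cuisine", "neighborhood", "city"}): search_restaurants,
}

def answer(slots):
    handler = DISPATCH.get(frozenset(slots))
    if handler is None:
        raise ValueError("no template matches these slots")
    return handler(slots)

print(answer({"quantity": "2", "housing_type": "apartment",
              "neighborhood": "Capitol Hill", "city": "Seattle"}))
```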
Also, something to take into account: in the "assistant" example you give, the assistant can ask for confirmation. You don't explicitly state that you are looking to "rent" something, so the system might not recognize the question, but it can guess that you are talking about renting something because that's the most popular search around Capitol Hill, Seattle. You can implement a "suggest this question" feature that feeds back into the "question dispatch" algorithm so it recognizes this question later.
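One possible shape for that fallback and feedback loop, as a minimal sketch with invented names and popularity counts: when no template matches, propose the most popular template for the recognized location and record confirmed phrasings for later retraining of the dispatcher.

```python
from collections import Counter, defaultdict

# Hypothetical popularity counts per location (e.g. from past searches).
POPULARITY = {
    "Capitol Hill, Seattle": Counter({"rental_search": 42, "restaurant_search": 7}),
}

# Questions users confirmed for each template; a later pass can feed these
# back into the question-dispatch step so the phrasing is recognized directly.
CONFIRMED_PHRASINGS = defaultdict(list)

def suggest_template(location):
    """When no template matches, guess the most popular one for the location."""
    template, _count = POPULARITY[location].most_common(1)[0]
    return template

def record_confirmation(question, template):
    """Store the phrasing the user confirmed, as feedback for the dispatcher."""
    CONFIRMED_PHRASINGS[template].append(question)

guess = suggest_template("Capitol Hill, Seattle")          # -> "rental_search"
record_confirmation("2 bedroom on Capitol Hill?", guess)   # feedback loop
```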
This is mostly a Dynamic Programming approach. Advanced NLP pipelines use logic, probabilistic programming, graph theory or all of them ;)
The other big problems of NLP are:
- summary generation
- automatic translation
It's important to note that, like other systems, it must be goal driven. You can start from the goal and work backward, inferring the previous steps, or start from the initial data and work forward. Again, it's very important to simplify: factorize by recognizing patterns. That's the main idea behind theory of mind.
Have a look at this SO question [1] where I try to fully explain an example QA system. The Coursera NLP course is a good start.
OpenCog doesn't deal solely with NLP, but it gives an example of what a modern artificial cognitive assistant can be made of.
Thanks for this. I looked at your SO answer, and I feel that what you call the 'narrow search approach' is what I'm looking for.
Above you said:
> The idea is to create templates for the most common questions.
I assume here that a template would be an abstract phrase where things like Named Entities (Seattle, Capitol Hill), Adjectives (2 bedroom), etc. are removed and substituted by variables. Correct?
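For concreteness, the kind of substitution I have in mind looks something like this rough sketch, reusing spaCy's entity layer (the braces format is just for illustration, and the labels are whatever the model emits):

```python
import spacy

nlp = spacy.load("en_core_web_sm")

def abstract(question):
    """Replace recognized entity spans with typed variables (rough sketch)."""
    doc = nlp(question)
    out = question
    # Walk entities right-to-left so character offsets stay valid.
    for ent in reversed(list(doc.ents)):
        out = out[:ent.start_char] + "{" + ent.label_ + "}" + out[ent.end_char:]
    return out

print(abstract("Find a 2 bedroom apartment in Capitol Hill, Seattle"))
# e.g. "Find a {CARDINAL} bedroom apartment in {GPE}, {GPE}" (labels depend on the model)
```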
Could supervised learning then be used to map natural language questions to templates? After all, there are only so many ways in which you can ask a particular abstract question (i.e. a template) in a limited domain.
What I'm thinking then are the following steps:
- 1. Source questions that cover the domain. (e.g.: Mechanical Turk)
- 2. Manually come up with abstract templates that cover these questions. (Although somehow I feel it must be possible to semi-automate this using Wrapper Induction or something)
- 3. Manually label a test set <question -> template>
- 4. Have the system learn/classify the remaining questions and test for accuracy (what classifiers would you use here?)
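As a strawman for step 4, something like this scikit-learn baseline is roughly what I have in mind (the questions and template labels are invented, and this is just one possible classifier, not a recommendation):

```python
# Strawman for step 4: TF-IDF features plus a linear classifier,
# assuming scikit-learn is installed.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

questions = [
    "2 bedroom apartment in Capitol Hill Seattle",
    "cheap studio for rent near downtown",
    "good sushi restaurants on Capitol Hill",
    "where can I get pizza in Fremont",
]
templates = ["rental_search", "rental_search", "restaurant_search", "restaurant_search"]

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
clf.fit(questions, templates)

print(clf.predict(["looking for a 1 bedroom near Fremont"]))  # predicted template label
```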
Flow of new question:
1. If coverage in step 2 was big enough, the system should be able to infer the template.
2. A template should be translatable to a bunch of queries (e.g.: GraphQL format). Not the hard part I believe.
Out pops your answer in machine form. Bonus points to transform that answer into a Natural Language answer using some generative grammar.
Of course the devil is in the details but from 10,000 feet does this look solid? Suggestions/glaring omissions? Thanks again.
1. There is the Yahoo QA dataset, which might be helpful. You can also crawl specific websites for such questions.
2. Semi-manually come up with templates (a grammar for the questions). You have to analyse the dataset in an unsupervised way to find the common patterns and sanitize the results (a rough clustering sketch follows this list).
3. Maybe step 2 is enough.
4. Markov networks may be useful in this context, but I could be wrong.
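For the unsupervised pass in step 2, a crude sketch would be to cluster TF-IDF vectors of the questions and then hand-write a template per cluster. This assumes scikit-learn; the question list and cluster count are placeholders you would tune against the real dataset.

```python
# Crude sketch for step 2: cluster questions to surface common patterns,
# then write and clean templates by hand for each cluster.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

questions = [
    "2 bedroom apartment capitol hill seattle",
    "studio for rent near downtown seattle",
    "best sushi on capitol hill",
    "pizza places open late in fremont",
]

vectors = TfidfVectorizer().fit_transform(questions)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(vectors)

for cluster, question in sorted(zip(labels, questions)):
    print(cluster, question)  # inspect each cluster and hand-write a template for it
```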
> A template should be translatable to a bunch of queries (e.g.: GraphQL format). Not the hard part I believe.
Yes, once you have the templates with typed variables (named entities, adjectives, etc.) like you describe, you can write the code to search for the results. I doubt GraphQL is a good solution for that problem. You can't translate the templates into a search on the fly; it's a mapping that you need to build manually or automatically.
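In other words, something like this hand-built mapping from template to search parameters, rather than a generic on-the-fly translation. The template name and the backend's parameter names here are invented for illustration:

```python
# Hedged sketch of a hand-built mapping from template to search parameters.

def build_rental_query(slots):
    """Turn the filled template slots into parameters for the search backend."""
    return {
        "type": "rental_listing",
        "bedrooms": int(slots["quantity"]),
        "neighborhood": slots["neighborhood"],
        "city": slots["city"],
    }

QUERY_BUILDERS = {
    "rental_search": build_rental_query,
    # "restaurant_search": build_restaurant_query, ... one builder per template
}

slots = {"quantity": "2", "neighborhood": "Capitol Hill", "city": "Seattle"}
print(QUERY_BUILDERS["rental_search"](slots))
```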
Beware that NLP is something of a rabbit hole.
[1] http://stackoverflow.com/questions/32432719/is-there-any-nlp...