This is called Question Answering (QA). "Bedroom" and "apartments" are different kinds of entities from "Capitol Hill" and "Seattle". You could do as you say and try to understand the question based on some of the words that appear, using statistics. This is a "bag of words" approach.
The general idea of NLP is not different from general computer science, i.e. 1) narrow the problem, 2) solve it, 3) try to solve a bigger problem.
The tower of sentence structure in NLP is (a quick sketch of these layers follows the list):
- bag of words
- part-of-speech + named entity tagging
- dependency parsing/framing
- semantic tagging
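To make the layers concrete, here is a minimal sketch assuming spaCy and its small English model are installed (it is just one library that exposes these layers, not the only way to do it):

```python
# Minimal sketch of the pipeline layers, assuming spaCy is installed:
#   pip install spacy && python -m spacy download en_core_web_sm
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Find a 2 bedroom apartment in Capitol Hill, Seattle")

# Part-of-speech layer
print([(token.text, token.pos_) for token in doc])

# Named-entity layer (exact labels depend on the model)
print([(ent.text, ent.label_) for ent in doc.ents])

# Dependency layer (which word modifies which)
print([(token.text, token.dep_, token.head.text) for token in doc])
```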
The idea is to create templates for the most common questions. Then, by parsing a question and recognizing named entities like "Capitol Hill" and "Seattle" and common nouns like "apartment", you can resolve the question. It's not an ordinary dictionary hash lookup, since for a given template there are several "keys"; the value of the dictionary is the correct search method. It reminds me of multiple method dispatch that supports dispatch by value.
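A rough illustration of that "dispatch by value" idea, where the lookup key is the whole combination of recognized slots rather than a single value. The slot names and search functions below are hypothetical placeholders, not from any real library:

```python
# Hedged sketch: dispatch a parsed question to a search method based on
# the combination of recognized slot types.

def search_rentals(slots):
    return f"searching rentals: {slots}"

def search_restaurants(slots):
    return f"searching restaurants: {slots}"

# The "key" is the set of slot types a template expects;
# the value is the search method to call.
DISPATCH = {
    frozenset({"quantity", "housing_type", "neighborhood", "city"}): search_rentals,
    frozenset({"cuisine", "neighborhood", "city"}): search_restaurants,
}

def answer(slots):
    handler = DISPATCH.get(frozenset(slots))
    if handler is None:
        raise ValueError("no template matches these slots")
    return handler(slots)

print(answer({"quantity": "2", "housing_type": "apartment",
              "neighborhood": "Capitol Hill", "city": "Seattle"}))
```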
Also, something to take into account: in the "assistant" example you give, the assistant can ask for confirmation. You don't explicitly state that you are looking to "rent" something, so the system might not recognize the question, but it can guess that you are talking about renting something because that's the most popular search around Capitol Hill, Seattle. You can implement a "suggest this question" feature that feeds back into the "question dispatch" algorithm so it recognizes this question later.
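One possible shape for that fallback and feedback loop, as a minimal sketch with invented names and popularity counts: when no template matches, propose the most popular template for the recognized location and record confirmed phrasings for later retraining of the dispatcher.

```python
from collections import Counter, defaultdict

# Hypothetical popularity counts per location (e.g. from past searches).
POPULARITY = {
    "Capitol Hill, Seattle": Counter({"rental_search": 42, "restaurant_search": 7}),
}

# Questions users confirmed for each template; a later pass can feed these
# back into the question-dispatch step so the phrasing is recognized directly.
CONFIRMED_PHRASINGS = defaultdict(list)

def suggest_template(location):
    """When no template matches, guess the most popular one for the location."""
    template, _count = POPULARITY[location].most_common(1)[0]
    return template

def record_confirmation(question, template):
    """Store the phrasing the user confirmed, as feedback for the dispatcher."""
    CONFIRMED_PHRASINGS[template].append(question)

guess = suggest_template("Capitol Hill, Seattle")          # -> "rental_search"
record_confirmation("2 bedroom on Capitol Hill?", guess)   # feedback loop
```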
This is mostly a Dynamic Programming approach. Advanced NLP pipelines use logic, probabilistic programming, graph theory or all of them ;)
The other big problems of NLP are:
- summary generation
- automatic translation
It's important to note that, like other systems, it must be goal driven. You can start from the goal and work backward, inferring the previous steps, or start from the initial data and work forward. Again, it's very important to simplify: factorize by recognizing patterns. That's the main idea behind theory of mind.
Have a look at this SO question [1] where I try to fully explain an example QA system. The Coursera NLP course is a good start.
OpenCog doesn't deal solely with NLP, but it gives an example of what a modern artificial cognitive assistant can be made of.
Thanks for this. I looked at your SO answer, and I feel that what you call the 'narrow search approach' is what I'm looking for.
Above you said:
> The idea is to create templates for the most common questions.
I assume here that a template would be an abstract phrase where things like Named Entities (Seattle, Capitol Hill), Adjectives (2 bedroom), etc. are removed and substituted by variables. Correct?
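For concreteness, the kind of substitution I have in mind looks something like this rough sketch, reusing spaCy's entity layer (the braces format is just for illustration, and the labels are whatever the model emits):

```python
import spacy

nlp = spacy.load("en_core_web_sm")

def abstract(question):
    """Replace recognized entity spans with typed variables (rough sketch)."""
    doc = nlp(question)
    out = question
    # Walk entities right-to-left so character offsets stay valid.
    for ent in reversed(list(doc.ents)):
        out = out[:ent.start_char] + "{" + ent.label_ + "}" + out[ent.end_char:]
    return out

print(abstract("Find a 2 bedroom apartment in Capitol Hill, Seattle"))
# e.g. "Find a {CARDINAL} bedroom apartment in {GPE}, {GPE}" (labels depend on the model)
```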
Could supervised learning then be used to map natural language questions to templates? After all, there are only so many ways in which you can ask a particular abstract question (i.e. a template) in a limited domain.
What I'm thinking then are the following steps:
- 1. Source questions that cover the domain. (e.g.: Mechanical Turk)
- 2. Manually come up with abstract templates that cover these questions. (Although somehow I feel it must be possible to semi-automate this using Wrapper Induction or something)
- 3. Manually label a test set <question -> template>
- 4. Have the system learn/classify the remaining questions and test for accuracy (what classifiers would you use here?)
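As a strawman for step 4, something like this scikit-learn baseline is roughly what I have in mind (the questions and template labels are invented, and this is just one possible classifier, not a recommendation):

```python
# Strawman for step 4: TF-IDF features plus a linear classifier,
# assuming scikit-learn is installed.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

questions = [
    "2 bedroom apartment in Capitol Hill Seattle",
    "cheap studio for rent near downtown",
    "good sushi restaurants on Capitol Hill",
    "where can I get pizza in Fremont",
]
templates = ["rental_search", "rental_search", "restaurant_search", "restaurant_search"]

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
clf.fit(questions, templates)

print(clf.predict(["looking for a 1 bedroom near Fremont"]))  # predicted template label
```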
Flow of new question:
1. If coverage in step 2 was big enough, the system should be able to infer the template.
2. A template should be translatable to a bunch of queries (e.g.: GraphQL format). Not the hard part I believe.
Out pops your answer in machine form. Bonus points to transform that answer into a Natural Language answer using some generative grammar.
Of course the devil is in the details but from 10,000 feet does this look solid? Suggestions/glaring omissions? Thanks again.
1. There is the Yahoo QA dataset, which might be helpful. You can also crawl specific websites for such questions.
2. Semi-manually come up with templates (a grammar for the questions). You have to analyse the dataset in an unsupervised way to find the common patterns and sanitize the results (a rough clustering sketch follows this list).
3. Maybe step 2 is enough.
4. Markov networks may be useful in this context, but I could be wrong.
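For the unsupervised pass in step 2, a crude sketch would be to cluster TF-IDF vectors of the questions and then hand-write a template per cluster. This assumes scikit-learn; the question list and cluster count are placeholders you would tune against the real dataset.

```python
# Crude sketch for step 2: cluster questions to surface common patterns,
# then write and clean templates by hand for each cluster.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

questions = [
    "2 bedroom apartment capitol hill seattle",
    "studio for rent near downtown seattle",
    "best sushi on capitol hill",
    "pizza places open late in fremont",
]

vectors = TfidfVectorizer().fit_transform(questions)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(vectors)

for cluster, question in sorted(zip(labels, questions)):
    print(cluster, question)  # inspect each cluster and hand-write a template for it
```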
> A template should be translatable to a bunch of queries (e.g.: GraphQL format). Not the hard part I believe.
Yes, once you have the templates with typed variables (named entities, adjectives, etc.) like you describe, you can write the code to search for the results. I doubt GraphQL is a good solution for that problem. You can't translate the templates into a search on the fly; it's a mapping that you need to build manually or automatically.
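In other words, something like this hand-built mapping from template to search parameters, rather than a generic on-the-fly translation. The template name and the backend's parameter names here are invented for illustration:

```python
# Hedged sketch of a hand-built mapping from template to search parameters.

def build_rental_query(slots):
    """Turn the filled template slots into parameters for the search backend."""
    return {
        "type": "rental_listing",
        "bedrooms": int(slots["quantity"]),
        "neighborhood": slots["neighborhood"],
        "city": slots["city"],
    }

QUERY_BUILDERS = {
    "rental_search": build_rental_query,
    # "restaurant_search": build_restaurant_query, ... one builder per template
}

slots = {"quantity": "2", "neighborhood": "Capitol Hill", "city": "Seattle"}
print(QUERY_BUILDERS["rental_search"](slots))
```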
Beware that NLP is something of a rabbit hole.
[1] http://stackoverflow.com/questions/32432719/is-there-any-nlp...