Curated list of speech and natural language processing resources (github.com/edobashira)
101 points by sebg on Oct 1, 2015 | 27 comments



Those curated lists popping up all over the place seem to indicate a need for pre-Google-style AltaVista/Yahoo portals.


Curation is always the next step after an explosion of content. Yahoo was curation of the whole internet. Then it got too hard. Now, we have enough content in tiny sub-niches to need curation on that level. I definitely see the need for curation of resources around the topic I am interested in (Apache Solr).

Unfortunately, I haven't seen a good software platform that actually lets you build a good curation site. The ones that exist want you to build the content for them. I want one I can run/own/brand on my own. I suspect there might be some in the library space though (haven't searched _very_ hard yet).


> Unfortunately, I haven't seen a good software platform that actually lets you build a good curation site

Emacs and HTML work fine, and have been a pretty good solution for the last 20 years.


If your time is free - sure. I prefer to outsource markup consistency, repetition of the same content under different tags, and promoted-item management to software.
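For what it's worth, the three chores listed here (consistent markup, the same entry repeated under several tags, promoted items) fit in a short script. This is only a minimal sketch: the entries, the example.org URLs, and the `tags`/`promoted` field names are all made up for illustration, not taken from any existing curation platform.

```python
from collections import defaultdict
from html import escape

# Hypothetical entries: titles, example.org URLs, and the "tags"/"promoted"
# fields are invented for this sketch.
RESOURCES = [
    {"title": "Relevance tuning notes", "url": "https://example.org/relevance",
     "tags": ["solr", "search"], "promoted": False},
    {"title": "Getting started guide", "url": "https://example.org/start",
     "tags": ["solr"], "promoted": True},
    {"title": "Analyzer cheat sheet", "url": "https://example.org/analyzers",
     "tags": ["search"], "promoted": False},
]


def group_by_tag(resources):
    """List every resource under each of its tags, promoted items first."""
    by_tag = defaultdict(list)
    for res in resources:
        for tag in res["tags"]:
            by_tag[tag].append(res)
    for items in by_tag.values():
        # False sorts before True, so "not promoted" puts promoted items on top.
        items.sort(key=lambda r: (not r["promoted"], r["title"].lower()))
    return dict(by_tag)


def render_html(resources):
    """Emit one heading plus one list per tag, always with the same markup."""
    lines = []
    for tag, items in sorted(group_by_tag(resources).items()):
        lines.append(f"<h2>{escape(tag)}</h2>")
        lines.append("<ul>")
        for res in items:
            url = escape(res["url"], quote=True)
            title = escape(res["title"])
            marker = " (promoted)" if res["promoted"] else ""
            lines.append(f'  <li><a href="{url}">{title}</a>{marker}</li>')
        lines.append("</ul>")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_html(RESOURCES))
```

Each entry is written once and shows up under every tag it carries, which is the part that gets tedious by hand.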


There are about a zillion things out there that can do that.

Including Emacs, which you could use with a bit of elisp :-)


Been there, done that (elisp included), don't think that's quite what I had in mind.

But thank you for persevering. :-)


There always seemed to be a need for dedicated lists. Rather than curate, I'm trying to build a dedicated mini "search engine" for Swift/iOS Resources: http://www.h4labs.com/dev/ios/swift.html

The Internet contains so much information on any given topic that if you have a question, it probably has already been answered. If we could build better search engines, we could learn anything in a fraction of the time.


Indeed I've also noticed that.

The web grew chaotically in its first decades, but now it looks as if, on one end, the larger websites have killed the smaller ones, and on the other, it has grown so large that search is no longer enough.

You need organization.

(sorry for the offtopic)


The search engine doesn't find it because it doesn't exist. A list is better than nothing. How to get things done is more interesting. At least the Coursera NLP course is good, but it's only an introduction.

If you are looking for ready-to-use NLP software, you can go with Solr and that's it. What I mean is that there are way too many NLP libraries. That said, it might be because there are many ways to do it. Anyway, I really think we need a scikit-learn for NLP.

organization != knowledge.


https://github.com/facebook/MemNN should be in the language modelling (or Deep Learning) part. I'll give them a pass because it was only released a couple of days ago.

The original Word2Vec[1] is missing too. While Gensim and GloVe are nice, Word2Vec still outperforms them both in some circumstances.

Surely there is a good LSTM language modelling project somewhere too? I can't think of one off the top of my head though. There's some code in Keras[2], but maybe Karpathy's char-RNN would be better[3] because of the documentation.

[1] https://code.google.com/p/word2vec/

[2] https://github.com/fchollet/keras/blob/master/examples/lstm_...

[3] https://github.com/karpathy/char-rnn


LSTM --> right now, Torch 7 and Theano are receiving the bulk of the attention.


Keras is based on Theano - an easy way to get started.


Consider a speech-to-structured-search app in a limited domain, like a specialized Siri/Google Now. For example, something like a real estate search assistant with possible questions like: "what new 2 bedroom apartments have become available in Capitol Hill, Seattle this week?"

Perhaps naively, it seems a big part of deducing the meaning could be done with ordinary dictionary lookups of terms like 'bedroom', 'apartments', "Capitol Hill", "Seattle", etc.

Is this indeed naive, or is this 'dictionary lookup'-technique part of the bag of tricks used? If so, any good references to use this in combination with other techniques described here?

Highly interested in this topic, but looking for a nice introduction to get used to the terminology of the field.


This is called Question Answering (QA). "bedroom" and "apartments" are different kinds of entities from "Capitol Hill" and "Seattle". You could do as you say and try to understand the question from statistics over some of the words that appear in it. This is a "bag of words" approach.

The general idea of NLP is not different from general computer science, i.e. 1) narrow the problem 2) solve it 3) try to solve a bigger problem.
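A rough sketch of what "bag of words" means in practice, assuming a hand-written keyword list per intent (the intent names and keyword sets below are invented for illustration). Word order is thrown away, so "Capitol Hill" falls apart into two unrelated tokens, which is one reason the entities need their own step.

```python
import re
from collections import Counter


def bag_of_words(text):
    """Lowercase the text and count its word tokens, ignoring order."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


# Hypothetical intents and keywords, made up for this sketch.
INTENT_KEYWORDS = {
    "real_estate_search": {"bedroom", "apartment", "apartments", "available", "rent"},
    "restaurant_search": {"restaurant", "dinner", "menu", "table"},
}


def guess_intent(question):
    """Pick the intent whose keywords occur most often, or None if none occur."""
    bag = bag_of_words(question)
    scores = {
        intent: sum(bag[word] for word in keywords)
        for intent, keywords in INTENT_KEYWORDS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


if __name__ == "__main__":
    q = ("what new 2 bedroom apartments have become available "
         "in Capitol Hill, Seattle this week?")
    print(guess_intent(q))
```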

The tower of sentence structure in NLP is:

- bag of words

- part of speech + named entity tagging

- dependency tagging/framing

- semantic tagging

The idea is to create templates for the most common questions. Then you parse the question, recognizing the named entities like "Capitol Hill" and "Seattle" and common nouns like "apartment", and with those you can resolve it. It's not an ordinary dictionary hash lookup, since a given template has several "keys". The value in the dictionary is the correct search method. It reminds me of multiple dispatch that supports dispatching by value.

Also something to take into account is that in the "assistant" example you give, the assistant can ask for confirmation. You don't explicitly state that you are looking to "rent" something. So the system might not recognize the question, but just guess that you are talking about renting something because it's the most popular search around Capitol Hill, Seattle. You can implement a "suggest this question" feature that feeds back into the "question dispatch" algorithm so it later recognizes this question.
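To make the "several keys, value is the search method" idea concrete, here is a minimal sketch. The gazetteer entries, slot names, and the two stand-in search functions are all assumptions for illustration; a real system would call an actual search backend.

```python
import re

# Hypothetical gazetteer: phrase -> (slot name, canonical value).
GAZETTEER = {
    "capitol hill": ("neighborhood", "Capitol Hill"),
    "seattle": ("city", "Seattle"),
    "apartments": ("property_type", "apartment"),
    "apartment": ("property_type", "apartment"),
}


def extract_slots(question):
    """Find known phrases (longest first) and a bedroom count in the question."""
    text = question.lower()
    slots = {}
    for phrase in sorted(GAZETTEER, key=len, reverse=True):
        if re.search(r"\b" + re.escape(phrase) + r"\b", text):
            slot, value = GAZETTEER[phrase]
            slots.setdefault(slot, value)
    match = re.search(r"\b(\d+)\s+bedroom", text)
    if match:
        slots["bedrooms"] = int(match.group(1))
    return slots


# Stand-in search methods; they only report which one was chosen.
def search_by_neighborhood(slots):
    return ("search_by_neighborhood", slots)


def search_by_city(slots):
    return ("search_by_city", slots)


# The dispatch key is the *set* of slot types found, not a single word.
DISPATCH = {
    frozenset({"property_type", "neighborhood", "city", "bedrooms"}): search_by_neighborhood,
    frozenset({"property_type", "city"}): search_by_city,
}


def answer(question):
    """Route the question to a search method, or None if no template fits."""
    slots = extract_slots(question)
    handler = DISPATCH.get(frozenset(slots))
    return handler(slots) if handler else None
```

A `None` result is the natural place to ask the user for confirmation instead of guessing.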

This is mostly a Dynamic Programming approach. Advanced NLP pipelines use logic, probabilistic programming, graph theory or all of them ;)

The other big problems of NLP are:

- summary generation

- automatic translation

Important to note is that, like other systems, it must be goal driven. You can start from the goal and go backward, inferring the previous steps, or start from the initial data and go forward. Again, it's very important to simplify. Factorize by recognizing patterns. It's the main idea regarding the theory of the mind.

Have a look at this SO question [1], where I try to fully explain an example QA. The Coursera NLP course is a good start.

OpenCog doesn't deal solely with NLP but gives an example of what a modern artificial cognitive assistant can be made of.

Beware that NLP is kind of loop-hole.

[1] http://stackoverflow.com/questions/32432719/is-there-any-nlp...


Thanks for this. Looked at your SO answer, and feel what you call the 'narrow search approach' is what I'm looking for.

Above you said: > The idea is to create templates for most common questions.

I assume here that a template would be an abstract phrase where things like Named Entities (Seattle, Capitol Hill), Adjectives (2 bedroom), etc. are removed and substituted by variables. Correct?
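A small sketch of that abstraction step, assuming a hand-made phrase list (the `ENTITY_TYPES` table and the placeholder names are invented here). It replaces known phrases and the first number with typed variables and keeps the removed values as bindings.

```python
import re

# Hypothetical phrase -> placeholder type table.
ENTITY_TYPES = {
    "capitol hill": "NEIGHBORHOOD",
    "seattle": "CITY",
    "apartments": "PROPERTY_TYPE",
    "this week": "PERIOD",
}


def to_template(question):
    """Return (template, bindings) with known phrases replaced by <TYPE>."""
    text = question.lower()
    bindings = {}
    # Longest phrases first so multi-word names win over their parts.
    for phrase in sorted(ENTITY_TYPES, key=len, reverse=True):
        label = ENTITY_TYPES[phrase]
        pattern = r"\b" + re.escape(phrase) + r"\b"
        if re.search(pattern, text):
            bindings[label] = phrase
            text = re.sub(pattern, f"<{label}>", text)
    number = re.search(r"\b\d+\b", text)
    if number:
        bindings["NUM"] = int(number.group())
        text = re.sub(r"\b\d+\b", "<NUM>", text, count=1)
    return text, bindings


if __name__ == "__main__":
    template, bindings = to_template(
        "What new 2 bedroom apartments have become available "
        "in Capitol Hill, Seattle this week?"
    )
    print(template)
    print(bindings)
```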

Could supervised learning then be used to map natural language questions to templates? After all, there's only so many ways in which you can ask a particular abstract question (i.e.: template) in a limited domain.

What I'm thinking then are the following steps:

- 1. Source questions that cover the domain. (e.g.: Mechanical Turk)

- 2. Manually come up with abstract templates that cover these questions. (Although somehow I feel it must be possible to semi-automate this using Wrapper Induction or something)

- 3. Manually label a test set <question -> template>

- 4. Have the system learn/classify the remaining questions and test for accuracy (what classifiers would you use here?)
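Steps 3 and 4 could be prototyped with something as plain as a Naive Bayes text classifier. That choice is mine for the sake of a sketch, not something the thread prescribes, and the labeled questions and template names below are invented. It is pure Python so it runs without extra libraries.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


class NaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing."""

    def fit(self, examples):
        self.word_counts = defaultdict(Counter)
        self.label_counts = Counter()
        self.vocab = set()
        for text, label in examples:
            tokens = tokenize(text)
            self.label_counts[label] += 1
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        tokens = tokenize(text)
        total = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.label_counts.items():
            # log prior + sum of smoothed log likelihoods
            score = math.log(count / total)
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for tok in tokens:
                score += math.log((self.word_counts[label][tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical hand-labeled <question -> template> pairs.
LABELED = [
    ("what new 2 bedroom apartments are available in capitol hill this week", "new_listings"),
    ("show me apartments that became available in seattle today", "new_listings"),
    ("any new listings for houses this month", "new_listings"),
    ("how much is a 2 bedroom apartment in seattle", "price_lookup"),
    ("what is the average rent in capitol hill", "price_lookup"),
    ("how expensive are houses in seattle", "price_lookup"),
]

if __name__ == "__main__":
    model = NaiveBayes().fit(LABELED)
    print(model.predict("which new 3 bedroom houses became available this week"))
```

With a real dataset you would hold out part of the labeled pairs and measure accuracy on those rather than eyeballing single predictions.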

Flow of new question:

1. if coverage in 2 was big enough, the system should be able to infer the template.

2. A template should be translatable to a bunch of queries (e.g.: GraphQL format). Not the hard part I believe.

Out pops your answer in machine form. Bonus points to transform that answer into a Natural Language answer using some generative grammar.

Of course the devil is in the details but from 10,000 feet does this look solid? Suggestions/glaring omissions? Thanks again.


1. There is the Yahoo QA dataset that might be helpful. Also, you can crawl specific websites for such questions.

2. Semi-manually come up with templates (a grammar for the questions). You have to analyse the dataset in an unsupervised way to find the common patterns and sanitize the results.

3. Maybe step 2 is enough.

4. Markov networks are useful in this context, but I could be wrong.
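One crude way to do the unsupervised pass from point 2, offered as a heuristic of my own rather than an established method: mask every word that appears in fewer than `min_df` questions, then count the skeletons that remain. The sample questions are made up.

```python
import re
from collections import Counter


def skeleton(tokens, doc_freq, min_df):
    """Keep frequent words, replace runs of rare words with a single '_'."""
    out = []
    for tok in tokens:
        piece = tok if doc_freq[tok] >= min_df else "_"
        if piece == "_" and out and out[-1] == "_":
            continue  # collapse consecutive gaps
        out.append(piece)
    return " ".join(out)


def common_patterns(questions, min_df=2):
    """Return (skeleton, count) pairs, most frequent first."""
    tokenized = [re.findall(r"[a-z0-9]+", q.lower()) for q in questions]
    # Document frequency: in how many questions does each word occur?
    doc_freq = Counter(tok for toks in tokenized for tok in set(toks))
    skeletons = Counter(skeleton(toks, doc_freq, min_df) for toks in tokenized)
    return skeletons.most_common()


# Hypothetical questions for illustration.
QUESTIONS = [
    "What apartments are available in Seattle?",
    "What houses are available in Portland?",
    "What condos are available in Tacoma?",
    "How much is rent in Spokane?",
    "How much is rent in Boise?",
]

if __name__ == "__main__":
    for pattern, count in common_patterns(QUESTIONS):
        print(count, pattern)
```

The output still needs the manual "sanitize" pass: the gaps are untyped, and a frequent city name would survive as a literal word.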

> A template should be translatable to a bunch of queries (e.g.: GraphQL format). Not the hard part I believe.

Yes, once you have the templates with typed variables (named entities, adjectives, etc.) like you describe, you can write the code to search for the results. I doubt GraphQL is a good solution for that problem. You can't translate the templates into a search on the fly. It's a mapping that you need to build manually or automatically.

I think in your case SQL will be fine. Have a look at https://github.com/machinalis/quepy
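A minimal sketch of the hand-built mapping from template to SQL, using Python's built-in `sqlite3` and named parameters. The `listings` table, its columns, the template name, and the sample rows are all hypothetical, and this does not use quepy's API.

```python
import sqlite3

# Hand-built mapping: template id -> parameterized SQL.
TEMPLATE_SQL = {
    "new_listings": (
        "SELECT address FROM listings "
        "WHERE bedrooms = :bedrooms AND neighborhood = :neighborhood "
        "AND city = :city AND listed_on >= :since "
        "ORDER BY address"
    ),
}


def run_template(conn, template_id, bindings):
    """Look up the SQL for a template and run it with the extracted values."""
    sql = TEMPLATE_SQL[template_id]
    return [row[0] for row in conn.execute(sql, bindings)]


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE listings (address TEXT, bedrooms INTEGER, "
        "neighborhood TEXT, city TEXT, listed_on TEXT)"
    )
    # Made-up rows; ISO dates compare correctly as strings.
    conn.executemany(
        "INSERT INTO listings VALUES (?, ?, ?, ?, ?)",
        [
            ("101 Example Ave", 2, "Capitol Hill", "Seattle", "2015-09-29"),
            ("202 Sample St", 2, "Capitol Hill", "Seattle", "2015-08-01"),
            ("303 Placeholder Rd", 3, "Capitol Hill", "Seattle", "2015-09-30"),
            ("404 Demo Ln", 2, "Ballard", "Seattle", "2015-09-30"),
        ],
    )
    bindings = {
        "bedrooms": 2,
        "neighborhood": "Capitol Hill",
        "city": "Seattle",
        "since": "2015-09-24",  # "this week" resolved to a date by the caller
    }
    return run_template(conn, "new_listings", bindings)


if __name__ == "__main__":
    print(demo())
```

Values go in as bound parameters rather than being pasted into the SQL string, so whatever the entity extractor produces cannot alter the query.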


About the "Text-to-Speech" section there, I was really impressed with the updated Swedish "Alva" voice in OS X El Capitan: it correctly pronounces "tomten" in different ways in the first and second occurrence in this example:

say -v Alva "Tomten dricker julmust på tomten"

"Tomten" can mean either "Santa Claus" or "the yard"/"the plot" depending on context, and apparently they're able to detect this properly.


OS X makes progress with every release on this front. I typically test it with a few tricky French sentences (think "les poules du couvent couvent") and it seems to improve, but it's hard to say from the outside what gets better in the model ("Mes fils ont cassé mes fils" still fails, for instance, but seems harder to detect to me).


I think the OP was talking about Text-to-Speech, and you are (maybe?) talking about speech recognition?

(The irony of this misunderstanding being kicked off by a comment about the text-to-speech engine understanding the context of a word amuses me)


What is Apple's approach to NLP and speech? What algorithms are they using?


I'm glad to see the CMU pronouncing dictionary in there. It was instrumental when I wrote a web app[1] to generate Spoonerisms[2] (my apologies for the UI and the fact that I haven't yet removed the more obscure words, especially obscure homophones, from my cmudict subset).

The cmudict isn't under the text-to-speech subheading in this list, but I think the folks at Carnegie Mellon may have considered text-to-speech applications, like a talking GPS navigator, when they compiled the dictionary. I recall the cmudict containing lots of US city names.

[1] https://spoonerizer.appspot.com/

[2] https://en.wikipedia.org/wiki/Spoonerism


Also missing TextBlob, which was featured on HN recently on the front page.


Could anyone point me to some sentiment analysis frameworks and/or update the list to include some?


No NLTK love?


Why use NLTK when we can use spaCy instead? http://spacy.io/


English only.


Came here to say this.



