
Are there any feasible production-ready NLP solutions? My default approach has been to browse the codebases of different computational linguistics labs, but something more central would be very handy. As it stands, I just say something to the tune of "please don't use nltk; use opennlp, cleannlp, redshift, etc."



If you prefer C, you can try SENNA: http://ml.nec-labs.com/senna/. It includes POS tagging, chunking, NER, constituency parsing and semantic role labeling. It doesn't do dependency parsing yet, though.

It's super fast (thanks to C) and very accurate (thanks to its deep learning approach). The license doesn't allow commercial use, though.

SENNA can be used with NLTK: http://pydoc.net/Python/nltk/2.0.2/nltk.tag.senna/


ClearNLP looks the best imo, especially if you're already using Java. If you're using Python...well, Redshift isn't production-ready, but if you needed to, it'd be the best thing to base your work off.

ClearNLP has a lot of nifty bells and whistles that would make a big difference. In particular, it selects the model for you, based on how similar your text is to various subsets of the training data. So you get a model that's better matched to your data, which will improve real-world accuracy a lot.
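To make the model-selection idea concrete, here is a rough sketch of picking a model by text similarity. This is not ClearNLP's actual code or algorithm; it swaps in plain bag-of-words cosine similarity, and `select_model`, `DOMAIN_SAMPLES` and the sample texts are all hypothetical.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Lowercased whitespace tokens mapped to their counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two word-count Counters."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_model(text, domain_samples):
    """Return the name of the domain whose sample text is most similar."""
    query = bag_of_words(text)
    return max(
        domain_samples,
        key=lambda name: cosine(query, bag_of_words(domain_samples[name])),
    )


# Hypothetical stand-ins for subsets of the training data, one per model.
DOMAIN_SAMPLES = {
    "newswire": "the government said on tuesday that the company shares fell",
    "medical": "the patient was given a dose and the symptoms of the disease improved",
}

if __name__ == "__main__":
    print(select_model("the patient reported symptoms after the second dose",
                       DOMAIN_SAMPLES))
```

A real system would compare against much larger samples and probably use better features than raw word counts, but the shape is the same: score the input against each training subset and load the model trained on the closest one.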



