Wow, I just used the AllenNLP demo mentioned here and it is quite amazing! I took a random article from Google News, which happened to be about Flynn's FBI criticism. I asked a couple of questions like "who is going to jail" or "who is leading the investigation" and it worked flawlessly. The article is only around 15 sentences too!
A few years ago I read the novel "Galatea 2.2" by Richard Powers which is all about training a neural network to do just that and thought "now this is some bullshit".
I love textacy; it has so much out of the box: topic modeling, topic extraction, summarization, and it's built on top of spaCy. https://github.com/chartbeat-labs/textacy
If anyone from the dev team is here, can you look into integrating the "make the research into production" part into AllenNLP? Facebook currently has fairseq, this, and other NLP repos. AllenNLP makes it easier to model most classes of NLP problems with a clean dependency-injection interface, with most common tasks abstracted out cleanly.
AllenNLP dev here. We're going to do a "PyTorch 1.0" release of AllenNLP next week, and then after that we're planning to investigate how to incorporate the new "production" aspects.
Could you guys elaborate on the relationship between PyText, torchtext, and AllenNLP? I've briefly used the latter two, but with how quickly things are moving it'd be nice to have a quick answer from the devs themselves.
PyText dev here. Torchtext provides a set of data abstractions that help with reading and processing raw text data into PyTorch tensors; at the moment we use Torchtext in PyText for training-time data reading and preprocessing.
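To make that concrete, the idea behind those data abstractions can be sketched in a few lines of plain Python. The names below are illustrative only, not the Torchtext API: tokenize raw text, build a vocabulary, and numericalize each example into a padded list of integer ids ready to become tensors.

```python
# Library-free sketch of a text-to-tensor pipeline (illustrative names,
# not the Torchtext API): tokenize, build a vocab, numericalize + pad.

PAD, UNK = "<pad>", "<unk>"

def build_vocab(texts):
    """Map every token seen in the corpus to an integer id."""
    vocab = {PAD: 0, UNK: 1}
    for text in texts:
        for tok in text.lower().split():
            vocab.setdefault(tok, len(vocab))
    return vocab

def numericalize(text, vocab, max_len):
    """Convert one raw string into a padded list of token ids."""
    ids = [vocab.get(tok, vocab[UNK]) for tok in text.lower().split()]
    ids = ids[:max_len]
    return ids + [vocab[PAD]] * (max_len - len(ids))

corpus = ["the model reads text", "text becomes tensors"]
vocab = build_vocab(corpus)
batch = [numericalize(t, vocab, max_len=5) for t in corpus]
```

The real library adds batching, bucketing by length, and pretrained-vector loading on top of this core idea.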
AllenNLP is a great NLP modeling library aimed at providing reference implementations and prebuilt state-of-the-art models, making it easy to iterate on and research models for different NLP tasks.
We've built PyText to be a rich NLP modeling library (along the lines of AllenNLP) but with production capabilities baked into the design from day one.
Examples are:
- We provide interfaces to make sure data preprocessing can be consistent between training and runtime
- The model interfaces are compatible with ONNX and torch.jit
- A core goal for us in the next few months is to be able to run models trained in PyText on mobile.
There are other differences as well, like support for distributed training and multi-task learning.
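To illustrate the first bullet above, here's a minimal sketch, assuming nothing about PyText's actual API, of the underlying idea: keep one preprocessing function as the single source of truth so training-time and runtime featurization can never drift apart.

```python
# Illustrative sketch (not PyText's API): a single shared preprocessing
# function guarantees training and inference produce identical features.
import re

VOCAB = {"<unk>": 0, "turn": 1, "on": 2, "the": 3, "lights": 4}

def preprocess(text, vocab):
    """Shared featurization: lowercase, strip punctuation, map to ids."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [vocab.get(tok, vocab["<unk>"]) for tok in tokens]

def train_example(raw_text):
    # Training-time path uses the shared function...
    return preprocess(raw_text, VOCAB)

def serve_request(raw_text):
    # ...and so does the runtime path: consistency by construction.
    return preprocess(raw_text, VOCab := VOCAB)
```

With a design like this, exporting the model (e.g. via ONNX or torch.jit) only has to worry about the tensor-in/tensor-out part, because the text-to-tensor step is identical in both paths.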
That being said, so far our library of models has mostly been shaped by our current production use cases; we are actively working on enriching it with more models and tasks while keeping production capabilities and inference speed in mind.
AllenNLP is great, and influenced the design of PyText in several ways. There are some central design decisions of AllenNLP that make it incompatible with PyTorch's jit tracing and so make productionizing models require much more manual work. It also generally leaves preprocessing up to the user, so preprocessing consistently between training and inference is outside the scope of what AllenNLP does.
What is a good data structure for holding your parsed corpus? Ideally I'd like to be able to count the number of sentences and paragraphs, get average word counts for these, and easily run queries such as "nouns that fit this regex" or "POS that precedes a named entity".
I've been looking at spaCy, but as far as I can tell it is hard-coded to use universal parts of speech.
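One structure that makes those queries easy, independent of any particular library, is a plain list of per-sentence token records: corpus-level counts and pattern queries then become simple comprehensions. The toy tags below are made up for illustration; a real tagger (spaCy or otherwise) would fill them in.

```python
# Sketch of a parsed corpus as nested lists of token records.
# Tags and example tokens are invented for illustration only.
import re
from dataclasses import dataclass

@dataclass
class Token:
    text: str
    pos: str         # coarse part-of-speech tag
    is_entity: bool  # part of a named entity?

corpus = [  # one inner list per sentence
    [Token("Flynn", "PROPN", True), Token("criticized", "VERB", False),
     Token("the", "DET", False), Token("investigation", "NOUN", False)],
    [Token("Agents", "NOUN", False), Token("questioned", "VERB", False),
     Token("Smith", "PROPN", True)],
]

n_sentences = len(corpus)
avg_words = sum(len(s) for s in corpus) / n_sentences

# "nouns that fit this regex" (here: nouns with 8+ characters)
long_nouns = [t.text for s in corpus for t in s
              if t.pos == "NOUN" and re.match(r".{8,}", t.text)]

# "POS that precedes a named entity"
pos_before_entity = [s[i - 1].pos for s in corpus for i in range(1, len(s))
                     if s[i].is_entity]
```

Paragraph structure can be added as one more level of nesting, and swapping in a different tagset is just a matter of what you store in `pos`.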
Edit: Super wow, the documentation is amazing as well (https://allennlp.org/tutorials).