Pytext: A natural language modeling framework based on PyTorch (github.com/facebookresearch)
219 points by benryon on Dec 14, 2018 | 16 comments



Wow, I just used the AllenNLP mentioned here and it is quite amazing! I took a random article from Google News, which happened to be about Flynn's FBI criticism. I asked a couple of questions like "who is going to jail" or "who is leading the investigation" and it worked flawlessly. The article is only around 15 sentences, too!

Edit: Super wow, the documentation is amazing as well (https://allennlp.org/tutorials).


A few years ago I read the novel "Galatea 2.2" by Richard Powers, which is all about training a neural network to do just that, and thought "now this is some bullshit".


I love textacy. It has so much out of the box (topic modeling, topic extraction, summarization), and it's built on top of spaCy. https://github.com/chartbeat-labs/textacy


If anyone from the dev team is here, could you look into integrating the "take research into production" part into AllenNLP? Facebook currently has fairseq, this, and other NLP repos. AllenNLP makes it easier to model most classes of NLP problems, with a clean, dependency-injectable interface and the most common tasks cleanly abstracted out.
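For readers unfamiliar with the "dependency injectable" style: AllenNLP builds components from configuration by looking up registered classes by name. The sketch below is a much-simplified, hypothetical illustration of that pattern in plain Python; `Tokenizer`, `register`, `from_config`, `WhitespaceTokenizer`, `CharacterTokenizer`, and `TextPipeline` are invented for this example and are not AllenNLP's actual API.

```python
from typing import Callable, Dict, List, Type


class Tokenizer:
    """Hypothetical base class for a pluggable component."""

    # Maps a config name to the concrete subclass that implements it.
    _registry: Dict[str, Type["Tokenizer"]] = {}

    @classmethod
    def register(cls, name: str) -> Callable[[Type["Tokenizer"]], Type["Tokenizer"]]:
        def decorator(subclass: Type["Tokenizer"]) -> Type["Tokenizer"]:
            cls._registry[name] = subclass
            return subclass
        return decorator

    @classmethod
    def from_config(cls, config: dict) -> "Tokenizer":
        # "type" selects the implementation; the remaining keys become constructor kwargs.
        params = dict(config)
        subclass = cls._registry[params.pop("type")]
        return subclass(**params)

    def tokenize(self, text: str) -> List[str]:
        raise NotImplementedError


@Tokenizer.register("whitespace")
class WhitespaceTokenizer(Tokenizer):
    def __init__(self, lowercase: bool = False) -> None:
        self.lowercase = lowercase

    def tokenize(self, text: str) -> List[str]:
        tokens = text.split()
        return [t.lower() for t in tokens] if self.lowercase else tokens


@Tokenizer.register("character")
class CharacterTokenizer(Tokenizer):
    def tokenize(self, text: str) -> List[str]:
        return [c for c in text if not c.isspace()]


class TextPipeline:
    """Receives its tokenizer from outside instead of hard-coding one."""

    def __init__(self, tokenizer: Tokenizer) -> None:
        self.tokenizer = tokenizer

    def featurize(self, text: str) -> List[str]:
        return self.tokenizer.tokenize(text)


# Swapping the tokenizer is a config change, not a code change.
config = {"tokenizer": {"type": "whitespace", "lowercase": True}}
pipeline = TextPipeline(Tokenizer.from_config(config["tokenizer"]))
print(pipeline.featurize("Who is leading the investigation"))
# ['who', 'is', 'leading', 'the', 'investigation']
```

The design choice is that the model code depends only on the `Tokenizer` interface, so experiments can swap implementations from a config file.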


PyTorch dev here. We'll talk with the AllenNLP folks to see if we can make this happen.

We just released PyTorch 1.0 stable last Friday, which adds stable production capabilities, so this stuff is fresh out of the oven. https://github.com/pytorch/pytorch/releases/tag/v1.0.0
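The production story in PyTorch 1.0 centers on TorchScript (`torch.jit`): a model can be traced into a serialized program that loads without its original Python class. A minimal sketch, assuming a toy bag-of-embeddings classifier invented for illustration (`BagOfEmbeddings` and its sizes are hypothetical):

```python
import io

import torch
import torch.nn as nn


class BagOfEmbeddings(nn.Module):
    """Toy text classifier: average the token embeddings, then apply a linear layer."""

    def __init__(self, vocab_size: int = 100, dim: int = 16, num_classes: int = 3) -> None:
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.out = nn.Linear(dim, num_classes)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        # token_ids: (batch, seq_len) integer tensor of vocabulary indices
        return self.out(self.embed(token_ids).mean(dim=1))


model = BagOfEmbeddings().eval()
example = torch.randint(0, 100, (2, 5))

# Tracing runs the model once on the example input and records the tensor operations.
traced = torch.jit.trace(model, example)

# Serialize to an in-memory buffer (a file path works too) and load it back.
# The loaded module does not need the BagOfEmbeddings class to be importable.
buffer = io.BytesIO()
torch.jit.save(traced, buffer)
buffer.seek(0)
loaded = torch.jit.load(buffer)

with torch.no_grad():
    assert torch.allclose(model(example), loaded(example))
```

A file saved this way can also be loaded from C++ with `torch::jit::load`, which is what makes this route useful for serving without a Python runtime.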


AllenNLP dev here. We're going to do a "PyTorch 1.0" release of AllenNLP next week, and then after that we're planning to investigate how to incorporate the new "production" aspects.


Win-win! Collaboration via Hacker News ;-)


Could you guys elaborate on the relationship between PyText, torchtext, and AllenNLP? I've briefly used the latter two, but with how quickly things are moving it'd be nice to have a quick answer from the devs themselves.


PyText dev here. Torchtext provides a set of data abstractions that help with reading raw text data and processing it into PyTorch tensors. At the moment we use Torchtext in PyText for training-time data reading and preprocessing.
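The core job those data abstractions automate is: tokenize, build a vocabulary, map tokens to integer ids, and pad a batch to a rectangular shape. The plain-Python sketch below shows those steps; it is not torchtext's API, and `build_vocab`, `numericalize`, `pad_batch`, and the `<pad>`/`<unk>` conventions are hypothetical choices for illustration.

```python
from collections import Counter
from typing import Dict, List

PAD, UNK = "<pad>", "<unk>"


def tokenize(text: str) -> List[str]:
    # Deliberately simple: lowercase and split on whitespace.
    return text.lower().split()


def build_vocab(corpus: List[str], min_freq: int = 1) -> Dict[str, int]:
    counts = Counter(tok for text in corpus for tok in tokenize(text))
    # Reserve index 0 for padding and index 1 for unknown tokens.
    itos = [PAD, UNK] + sorted(t for t, c in counts.items() if c >= min_freq)
    return {tok: i for i, tok in enumerate(itos)}


def numericalize(text: str, vocab: Dict[str, int]) -> List[int]:
    # Tokens never seen during vocabulary building map to <unk>.
    return [vocab.get(tok, vocab[UNK]) for tok in tokenize(text)]


def pad_batch(seqs: List[List[int]], pad_id: int = 0) -> List[List[int]]:
    width = max(len(s) for s in seqs)
    return [s + [pad_id] * (width - len(s)) for s in seqs]


train = ["Who is going to jail", "Who is leading the investigation"]
vocab = build_vocab(train)
batch = pad_batch([numericalize(t, vocab) for t in ["who is going", "who is new here"]])
print(batch)  # [[9, 4, 2, 0], [9, 4, 1, 1]]
```

Passing `batch` to `torch.tensor` would produce the `(batch, seq_len)` integer tensor a model consumes.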

AllenNLP is a great NLP modeling library that aims to provide reference implementations and prebuilt state-of-the-art models, and to make it easy to iterate on and do research with models for different NLP tasks.

We've built PyText to be a rich NLP modeling library (along the lines of AllenNLP) but with production capabilities baked into the design from day one.

Examples are:
- We provide interfaces to make sure data preprocessing is consistent between training and runtime.
- The model interfaces are compatible with ONNX and torch.jit.
- A core goal for us in the next few months is to be able to run models trained in PyText on mobile.
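One way to keep preprocessing consistent between training and runtime is to treat the preprocessing configuration as an artifact saved alongside the model, so that inference rebuilds exactly the same transform instead of re-implementing it. A minimal plain-Python sketch of that idea; `Preprocessor`, its fields, and the JSON format are hypothetical and are not PyText's actual interfaces.

```python
import json
import re
from dataclasses import asdict, dataclass
from typing import List


@dataclass(frozen=True)
class Preprocessor:
    lowercase: bool = True
    max_tokens: int = 32
    # Regex that defines a token; shared verbatim by training and serving.
    token_pattern: str = r"[A-Za-z0-9']+"

    def __call__(self, text: str) -> List[str]:
        if self.lowercase:
            text = text.lower()
        return re.findall(self.token_pattern, text)[: self.max_tokens]

    def to_json(self) -> str:
        return json.dumps(asdict(self))

    @classmethod
    def from_json(cls, blob: str) -> "Preprocessor":
        return cls(**json.loads(blob))


# Training side: the config is saved next to the model weights.
train_pre = Preprocessor(max_tokens=5)
saved = train_pre.to_json()

# Serving side: rebuild from the saved config rather than re-implementing it.
serve_pre = Preprocessor.from_json(saved)
text = "Who is leading the FBI's investigation?"
assert serve_pre(text) == train_pre(text)
print(serve_pre(text))  # ['who', 'is', 'leading', 'the', "fbi's"]
```

Because the serving code never restates the tokenization rules, a change to them at training time cannot silently diverge from what runs in production.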

There are other differences too, such as support for distributed training and multi-task learning.

That being said, so far our library of models has mostly been shaped by our current production use cases; we are actively working on enriching it with more models and tasks while keeping production capabilities and inference speed in mind.


AllenNLP is great, and it influenced the design of PyText in several ways. However, some central design decisions in AllenNLP make it incompatible with PyTorch's JIT tracing, so productionizing its models requires much more manual work. It also generally leaves preprocessing up to the user, so keeping preprocessing consistent between training and inference is outside the scope of what AllenNLP does.
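The comment above does not say which AllenNLP design decisions conflict with tracing, but one general property of `torch.jit.trace` shows why some model code resists it: tracing records only the operations executed for the example input, so Python control flow that depends on tensor values gets frozen into whichever branch happened to run. A minimal sketch with a hypothetical function, `double_unless_large`, invented for illustration:

```python
import warnings

import torch


def double_unless_large(x: torch.Tensor) -> torch.Tensor:
    # Data-dependent branch: which path runs depends on the tensor's values.
    if bool(x.sum() > 10):
        return x.clamp(max=1.0)
    return x * 2


small = torch.ones(3)          # sum is 3, so the "x * 2" branch runs
large = torch.full((3,), 5.0)  # sum is 15, so the clamp branch runs

with warnings.catch_warnings():
    # Tracing warns that converting a tensor to a Python bool may not generalize.
    warnings.simplefilter("ignore")
    traced = torch.jit.trace(double_unless_large, small)

print(double_unless_large(large))  # tensor([1., 1., 1.])
print(traced(large))               # tensor([10., 10., 10.]), the small-input branch was frozen
```

`torch.jit.script`, which compiles Python control flow into TorchScript instead of recording a single execution, is the usual alternative when a model's branches depend on its inputs.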


What is a good data structure for holding a parsed corpus? Ideally I'd like to be able to count the number of sentences and paragraphs, get average word counts for these, and easily run queries such as "nouns that match this regex" or "the POS that precedes a named entity".

I've been looking at spaCy, but as far as I can tell it is hard-coded to use universal parts of speech.
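One simple answer to the question above is a nested structure (document, then paragraphs, then sentences, then tokens) in which each token carries its text, tag, and entity label; counts and queries then become short comprehensions. A minimal sketch; the `Token` fields and the hand-written annotations are hypothetical stand-ins for whatever a tagger produces, and the tag set is just a string, so universal or fine-grained tags both fit.

```python
import re
from dataclasses import dataclass
from statistics import mean
from typing import Iterator, List, Optional, Tuple


@dataclass
class Token:
    text: str
    pos: str                   # any tag set: universal or fine-grained
    ent: Optional[str] = None  # named-entity label, or None


Sentence = List[Token]
Paragraph = List[Sentence]
Document = List[Paragraph]


def sentences(doc: Document) -> Iterator[Sentence]:
    for para in doc:
        yield from para


def nouns_matching(doc: Document, pattern: str) -> List[str]:
    rx = re.compile(pattern)
    return [t.text for s in sentences(doc) for t in s if t.pos == "NOUN" and rx.search(t.text)]


def pos_before_entities(doc: Document) -> List[Tuple[str, str]]:
    # (POS of the preceding token, entity text) for each entity token
    # whose predecessor in the same sentence is not part of an entity.
    out = []
    for s in sentences(doc):
        for prev, tok in zip(s, s[1:]):
            if tok.ent and not prev.ent:
                out.append((prev.pos, tok.text))
    return out


doc: Document = [
    [[Token("Flynn", "PROPN", "PERSON"), Token("criticized", "VERB"),
      Token("the", "DET"), Token("FBI", "PROPN", "ORG")]],
    [[Token("The", "DET"), Token("investigation", "NOUN"), Token("continues", "VERB")],
     [Token("Investigators", "NOUN"), Token("met", "VERB")]],
]

print(len(doc), sum(1 for _ in sentences(doc)))  # 2 3  (paragraphs, sentences)
print(mean(len(s) for s in sentences(doc)))      # 3    (average words per sentence)
print(nouns_matching(doc, r"^[Ii]nvestigat"))    # ['investigation', 'Investigators']
print(pos_before_entities(doc))                  # [('DET', 'FBI')]
```

If the structure is filled from spaCy, storing `token.tag_` (the fine-grained tag) instead of `token.pos_` (the universal tag) is one way to avoid universal parts of speech.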


What's NLP in this context?


Natural language processing


Thanks. I had assumed Neuro-Linguistic Programming.


Agreed, it is potentially ambiguous if one doesn’t follow the field.

Clicking the link takes you to the GitHub repo, which states ‘natural language processing’ in its title (though perhaps it didn’t earlier).

The title of this HN post has been edited now anyhow.


Is this like bare Dialogflow?



