Evidence that this is the most accurate parser is here;
the previous approach mentioned is a March 2016 paper, "Globally Normalized Transition-Based Neural Networks," http://arxiv.org/abs/1603.06042
"On a standard benchmark consisting of randomly drawn English newswire sentences (the 20 year old Penn Treebank), Parsey McParseface recovers individual dependencies between words with over 94% accuracy, beating our own previous state-of-the-art results, which were already better than any previous approach."
From the original paper: "Our model achieves state-of-the-art accuracy on all of these tasks, matching or outperforming LSTMs while being significantly faster. In particular for dependency parsing on the Wall Street Journal we achieve the best-ever published unlabeled attachment score of 94.41%."
This seems like a narrower standard than described, specifically being better at parsing the Penn Treebank than the best natural language parser for English on the Wall Street Journal.
The statistics listed on the project GitHub actually contradict these claims by showing the original March 2016 implementation has higher accuracy than Parsey McParseface.
Those are the part-of-speech tag accuracies. spaCy's accuracy on the PTB evaluation is 92.2% --- so it makes 20% more errors than P. McP. On the other hand, spaCy is about 200x faster.
I've been watching the line of research behind SyntaxNet closely, and have been steadily working on replacing spaCy's averaged perceptron model with a neural network model. This is one of the main differences between spaCy and Parsey McParseface.
The key advantage of the neural network is that it lets you take advantage of training on lots and lots more text, in a semi-supervised way. In a linear model, you grow extra parameters when you do this. The neural network stays the same size --- it just gets better. So, you can benefit from reading the whole web into the neural network. This only works a little bit in the linear model, and it makes the resulting model enormous.
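Very roughly, the size argument looks like this (a toy sketch with made-up numbers, not either system's actual architecture):

    # Toy comparison: a sparse linear model grows one weight per new feature
    # string it encounters, while a dense network's size is fixed by its
    # chosen dimensions. The numbers below are invented for illustration.

    def linear_model_params(n_feature_strings, n_classes):
        # One weight per (feature string, class) pair -- keeps growing as
        # more text introduces more feature strings.
        return n_feature_strings * n_classes

    def neural_model_params(vocab_rows, embed_dim, hidden_dim, n_classes):
        # Embedding table + hidden layer + output layer -- fixed once the
        # dimensions are chosen, however much raw text is read.
        return (vocab_rows * embed_dim
                + embed_dim * hidden_dim
                + hidden_dim * n_classes)

    print(linear_model_params(10_000_000, 90))       # grows with the data
    print(neural_model_params(50_000, 64, 200, 90))  # stays the same size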
Another difference is that spaCy is trained on whole documents, while P. McP. is trained in the standard set-up, using gold pre-processing. I speculate this will reduce the gap between the systems in a more realistic evaluation. Of course, P. McP. can do the joint training too if they choose to. I've reached out to see whether they're interested in running the experiment: https://github.com/tensorflow/models/issues/65
spaCy doesn't use the GPU. Not sure what their speed is on GPU. I wouldn't be surprised if it's hard to use the GPU well for their parser, because minibatching gets complicated. Not sure.
> better at parsing the Penn Treebank than the best natural language parser for English on the Wall Street Journal
I'm pretty sure "the 20 year old Penn Treebank" and "the Wall Street Journal" are referring to the same dataset here. In the early 1990s the first large treebanking efforts were on a corpus from the WSJ, and they were released as the Penn Treebank: https://catalog.ldc.upenn.edu/LDC95T7 People report results on this dataset because that's what the field has been testing on (and overfitting to) for decades.
Yes, the press release is (actually) pretty difficult to parse, and it's opaque about how the comparison is measured, which is why I wanted to call into question the blog's headline, "The World's Most Accurate Parser." It seems clearer now, but evidently Google doesn't feel the need to prove outright that they are the best in the world at these tasks, which is questionable given the size of their audience. In all, it seems they have tested against several other dependency parsers, but clearly not all of them. It's fair to say the parser is "highly accurate," but it still falls victim to some of the same issues that most statistical parsers do, and while it is faster than some dependency parsers, it is not faster than all of them.
The point about overfitting is valid, too, which is another reason why this "most accurate such model in the world" claim is obnoxious.
It's also fair to note that their advance is in fractions of percentage points on this specific dataset over models that are 5-10 years older.
> The statistics listed on the project GitHub actually contradict these claims by showing the original March 2016 implementation has higher accuracy than Parsey McParseface.
So you're referring to this LSTM?
"Andor et al. (2016)* is simply a SyntaxNet model with a larger beam and network. For futher information on the datasets, see that paper under the section "Treebank Union"."
After spending a few months hand-coding an NLP parser, I am rather intrigued by LSTMs. I like the idea of finding coefficients, as opposed to juggling artificial labels.
Yes, my mistake. Their claim is that SyntaxNet (originally described in the paper and improved over one month) is the best in field, whereas Parsey McParseface is just one trained instance.
Coincidentally, I had a parent/teacher conference with my 1st grader's teacher yesterday afternoon. Regarding reading level & comprehension, she remarked that current research indicates anything below about 98% comprehension isn't sufficient for reading "fluency". Before the past few years, the standard was 95% comprehension = fluency, but those extra few percentage points apparently make an enormous difference (probably because of colloquial & jargon edge-case usages that carry specific meanings in specific contexts but aren't easy to programmatically detect; that's just my supposition).
Sorry, but that just doesn't make any sense to me. Practically 70% seems like enough to understand most narrative. I've read some really difficult texts (translated German theology) and for anything of meaningful complexity 98% is unreachable without a huge vocabulary and understanding of both oddities of grammar and the construction of narrative or argument.
The paper you mention reports the world's best results, and that model is McParseface with a broader beam search and more hidden layers.
This is an open-sourcing of the March 2016 method (SyntaxNet; note that the paper reports results from several trained models) as well as a trained model that is comparable in performance but faster (McParseface).
It is very hard to separate those two things from the way they write.
This is really cool, and props to Google for making it publicly available.
The blog post says this can be used as a building block for natural language understanding applications. Does anyone have examples of how that might work? Parse trees are cool to look at, but what can I do with them?
For instance, let's say I'm interested in doing text classification. I can imagine that the parse tree would convey more semantic information than just a bag of words. Should I be turning the edges and vertices of the tree into a feature vectors somehow? I can think of a few half-baked ideas off the top of my head, but I'm sure other people have already spent a lot of time thinking about this, and I'm wondering if there are any "best practices".
This would be very interesting when applied to Biblical Studies. Any serious academic discussion of biblical texts will involve syntactical breakdown of the text being discussed. Most of the time the ambiguities resolve clearly, but it's still quite common for a phrase to have several possible syntactical arrangements that are not immediately clear. These ambiguities are also challenging because the languages are dead (at least as used in the biblical texts). So the type of ambiguity in "Alice drove down the street in her car" can lead to some significant scholarly disagreement.
I could see Parsey McParseface helping to identify patterns in literature contemporaneous to the biblical texts. Certain idiomatic uses of syntax, which would have been obvious to the original readers, could be identified much more quickly.
I was going to say... my main interest in this project is precisely for Biblical studies... I could talk about analyzing the Bible for hours, but let's just say there's way more depth than many even realize. The Aleph Tav in relation to the Book of Revelation is one such example: many translations omit it, but the Aleph Tav Study Bible explores it in depth. There could be many discoveries made with these kinds of projects that are missed by just about anyone only reading a translation.
There are a ton of Jewish Idioms in the Bible that many don't understand at all, including "No man knows the day or the hour" which is a traditional Jewish Wedding Idiom. Lots and lots of things could be explored with enough data and resources.
I'd think that the advantage of machine translation is on corpora that are not known up front (i.e. user-supplied text) or corpora that are exceptionally large.
If you have a small (ish), well-known text, I don't think you will get much insight from machine translation. Certainly there are plenty of uses for computer text analysis/mining in biblical studies, but I doubt translation is one of them. And for obscure idioms or hapax legomena, machine translation definitely can't help you because by definition there are no other sources to rely on.
With a sufficient level of precision, there's room for machine analysis to "reveal" things we are ignoring out of custom. A lot of text analysis done by people is full of biases and deferral to authorities.
E.g. I remember from school getting into an argument with a teacher over the interpretation of a poem. "His" interpretation, which was really the interpretation of some authority who'd written a book, was blatantly contradicted by the text if you assumed that the author hadn't suddenly forgotten all his basic grammar, despite all the evidence everywhere else that he was always very precise in this respect.
Of course, in some of these kind of instances, it will be incredibly hard to overcome the retort that any "revelation" is just a bug.
In a more general sense, people are typically exceedingly bad at parsing text, judging by how often online debates devolve into bickering caused largely by misunderstanding the other party's argument. Often to the extent of even ending up arguing against people who you agree with. Having tools that help clarify the parsing for people might be interesting in that respect too.
Well, I wouldn't look for idioms, but it would be interesting to throw information such as Strong's Concordance into the mix. I've yet to fully think through an application for this library, but it would be fun to play around with it nonetheless. I would be analyzing the Hebrew / Greek / Syriac scripts, seeking verses omitted or missing, etc. It would make for interesting study if anything.
You might be interested in Andrew Bannister's research on computer analysis of the Quran. He wrote a book on it [1], and there's also this paper which gives a high-level overview [2].
> Any serious academic discussion of biblical texts will involve syntactical breakdown of the text being discussed.
I once interned for a company that's been doing this for years. They have all kinds of features tracing individual words through various different languages, etc.
Actually it's not very appropriate for studying bible text. In Biblical Studies you would prefer not to have any errors at all, and since you work with a limited corpus you can afford to annotate by hand. People have in fact done this and I collaborated with a group that has been working on this for decades.
For actual syntactical breakdown of the Bible, I agree. Biblical scholars, and even competent pastors, can syntactically analyze the Bible sufficiently well.
I would think the technology could be helpful in a fairly narrow way: identifying syntactical constructions outside the bible to help explain ambiguous syntactical constructions within it (For example, Ugaritic texts, another ancient Semitic language similar to Hebrew, are often studied to aid in understanding portions of the Old Testament). Scholars have been doing this without computers for some time and have begun to do this type of analysis with software. I would imagine more sophisticated software would yield at least some new insights.
Most of the really good applications are part of larger systems. Parsing is good in machine translation, for instance. You transform the source text so that it's closer to the target language. Parsing is also useful for question answering, information extraction, text-to-speech...
Here's an example of using information from a syntactic parser to decorate words, and create an enhanced bag-of-words model: https://spacy.io/demos/sense2vec
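As a rough illustration of the "decorate words with parser output" idea (not the sense2vec code itself; assumes a spaCy English model such as en_core_web_sm is installed):

    # Rough sketch of "decorating" tokens with parser output before building
    # a bag-of-words; not the sense2vec code, just the general idea.
    import spacy
    from collections import Counter

    nlp = spacy.load("en_core_web_sm")  # assumes this English model is installed
    doc = nlp("Alice drove down the street in her car.")

    # Make e.g. "drive|VERB|ROOT" and "street|NOUN|pobj" distinct features,
    # instead of collapsing every use of a word into one count.
    decorated = [f"{tok.lemma_}|{tok.pos_}|{tok.dep_}"
                 for tok in doc if not tok.is_punct]
    print(Counter(decorated))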
This is actually really useful for a project I'm working on. I'm trying to detect bias in news sources using sentiment analysis and one of the problems I've run into is identifying who exactly is the subject of a sentence. Using this could be really helpful in parsing out the noun phrases and breaking them down in order to find the subject.
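For anyone curious, the kind of thing I have in mind is roughly this - a sketch assuming spaCy's English model and its standard nsubj/nsubjpass dependency labels:

    # Sketch: pull out the grammatical subject of each sentence via the
    # dependency labels nsubj / nsubjpass (spaCy English model assumed).
    import spacy

    nlp = spacy.load("en_core_web_sm")
    doc = nlp("The senator criticized the new policy in a statement on Tuesday.")

    for sent in doc.sents:
        for tok in sent:
            if tok.dep_ in ("nsubj", "nsubjpass"):
                # tok.subtree yields the full noun phrase headed by the subject
                phrase = " ".join(t.text for t in tok.subtree)
                print(phrase, "-> subject of ->", tok.head.lemma_)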
I've been experimenting with Stanford's CoreNLP to identify named entities for analyzing RSS feeds and I was really impressed by how well it worked, having known nothing about the state of NLP research before I started. Especially things like being able to identify coreferences.
I was actually pretty disappointed with the NER in CoreNLP - I fed a few articles (including this one) into it, and while it's impressive that a computer can do this at all, it's pretty far away from being able to build a usable product. It seems to over-recognize Persons, for example - Parsey McParseFace was tagged as a person, as were Alice and Bob, as was Tesla (in another article), and while all of these are understandable, they weren't the intended meanings in the articles. I was also pretty disappointed with the date parser: while it gets some tricky ones like "Today" and "7 hours ago", it misses very common abbreviations like 7m or 7min or even "7min ago".
Yeah, I looked at SuTime, but it fell down on many common cases (the CoreNLP online demo is actually integrating SuTime into the annotations it produces).
Another option is Natty [1], but it also seems to fail on the same examples. Natty at least has an ANTLR grammar that's reasonably easy to understand, though.
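If it's just the short relative forms like "7m ago" that are missing, a tiny hand-rolled fallback in front of the library might be enough; a sketch, not code from Natty or SuTime:

    # Tiny illustrative fallback for relative forms like "7m ago" / "7min ago".
    import re
    from datetime import datetime, timedelta

    UNITS = {"s": "seconds", "sec": "seconds", "m": "minutes", "min": "minutes",
             "h": "hours", "hr": "hours", "d": "days"}

    def parse_relative(text, now=None):
        now = now or datetime.utcnow()
        m = re.search(r"(\d+)\s*(sec|s|min|m|hr|h|d)\b\s*ago", text.lower())
        if not m:
            return None
        return now - timedelta(**{UNITS[m.group(2)]: int(m.group(1))})

    print(parse_relative("posted 7min ago"))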
Thanks 'nl, 'nostrademons and 'rcpt for the links! I've been using Chronicity[0] in my project, and I hand-hacked a Polish-to-English regexp "translator" to make it work with Polish language[1]. I'll be looking at the sources of the libraries you provided as well as papers they reference; maybe I'll manage to steal some code :).
Yes, I've used that before. I'm currently using Textacy for python which is also really good. However, extracting the named entities from a sentence is still a ways off from determining what's the subject of the sentence, although it gives a good indication. Using NER + quality POS tagging and tree building should do the trick for me I think.
Are you using BOW for sentiment analysis? Also, have you tried tinkering with Watson's sentiment analyzer?
I'm working on a project that analyzes sentiment from speech, and I've been meaning to start on text sentiment analysis, but I'm not sure where to start.
Here is an application of parse trees: sentiment analysis with recursive neural networks based on how components of the parse tree combine to create the overall meaning.
They are useful as a preprocessing step for a lot of downstream NLP tasks. It shouldn't be hard to find more papers that take advantage of the tree structure of language.
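The recursive-network idea is basically: give every parse-tree node a vector computed from its children's vectors, then classify the root. A toy sketch (not the actual model from the sentiment work):

    # Toy sketch of recursive composition over a parse tree: each node's
    # vector is a function of its children's vectors, and a classifier on the
    # root vector would predict the sentiment label.
    import numpy as np

    rng = np.random.default_rng(0)
    DIM = 8
    W = rng.normal(size=(DIM, 2 * DIM)) * 0.1          # shared composition weights
    embed = {w: rng.normal(size=DIM) * 0.1
             for w in ("not", "very", "good", "bad")}  # toy word vectors

    def compose(node):
        # A node is either a word (str) or a (left, right) pair of subtrees.
        if isinstance(node, str):
            return embed[node]
        left, right = node
        return np.tanh(W @ np.concatenate([compose(left), compose(right)]))

    phrase_vec = compose(("not", ("very", "good")))    # vector for "not very good"
    print(phrase_vec.shape)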
The typical approach is something like a tree kernel (https://en.wikipedia.org/wiki/Tree_kernel). Looked into them briefly for a work project that never got off the ground, can't say too much about using them in practice.
> Parse trees are cool to look at, but what can I do with them?
One really simple and obvious thing is word sense disambiguation. Plenty of homonyms are different parts of speech (e.g. the verb "lead" and the noun "lead"). I'm sure there's lots of more sophisticated stuff you can do as well, but this might be the lowest-hanging fruit.
However, for that you just need PoS tags (which is also provided by this Google thing, yes).
And of course the hard part of WSD is detecting whether "bank" refers to the bank of a river, or the financial institution, or the building where the institution is located, or [you name it].
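To make the easy part concrete: a tagger alone separates the verb "lead" from the noun "lead" (a sketch assuming a spaCy English model; any decent POS tagger would do), though as said it won't touch the river-bank vs. financial-bank kind of ambiguity:

    # Sketch: POS tags alone separate homonyms that differ in part of speech.
    # The tags are model predictions, so ideally VERB vs. NOUN here.
    import spacy

    nlp = spacy.load("en_core_web_sm")
    for text in ("They lead the league in wins.", "The pipe is made of lead."):
        for tok in nlp(text):
            if tok.text.lower() == "lead":
                print(f"{text!r}: lead -> {tok.pos_}")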
I use parse trees as a kind of "advanced language model" for when I need to replace a word in a sentence (see for example: http://www.aclweb.org/anthology/P13-1142 ), it's so much better than using just simple n-grams.
Idea: point this at political speeches / security breach notifications / outage postmortems / etc, and rate them by how many ambiguities with starkly different dependency parses there are... (Well of _course_ we mean the roads inside Alice's car when we made that commitment!)
One of the projects I'd love to develop is an automated peer editor for student essays. My wife is an English teacher and a large percentage of her time is taken up by grading papers. A large percentage of that time is then spent marking up grammar and spelling. What I envision is a website that handles the grammar/spelling bit. More importantly, I'd like it to be a tool that the students use freely prior to submitting their essays to the teacher. I want them to have immediate feedback on how to improve the grammar in their essays, so they can iterate and learn. By the time the essays reach the teacher, the teacher should only have to grade for content, composition, style, plagiarism, citations, etc. Hopefully this also helps to reduce the amount of grammar that needs to be taught in class, freeing time for more meaningful discussions.
The problem is that while I have knowledge and experience in the computer vision side of machine learning, I lack experience in NLP. And to the best of my knowledge NLP as a field has not come as far as vision, to the extent that such an automated editor would have too many mistakes. To be student facing it would need to be really accurate. On top of that it wouldn't be dealing with well formed input. The input by definition is adversarial. So unlike SyntaxNet which is built to deal with comprehensible sentences, this tool would need to deal with incomprehensible sentences. According to the link, SyntaxNet only gets 90% accuracy on random sentences from the web.
That said, I might give SyntaxNet a try. The idea would be to use SyntaxNet to extract meaning from a broken sentence, and then work backwards from the meaning to identify how the sentence can be modified to better match that meaning.
Thank you Google for contributing this tool to the community at large.
I think this is still risky if used in a context where the student might think that the computer is somehow always right. Great English writers often deliberately use sentence fragments or puns, or use a word with a nonstandard part-of-speech interpretation (especially using a noun as a verb). They may also sometimes use a sentence that's difficult for readers to parse and then explain the ambiguity after the fact.
If a teacher gave students a grammar-checking tool to check their writing, they might assume that the tool knew better than they did, which is only sometimes true.
Knowing the rules is not the same thing as having formal training.
With then-innovative art like Cubism, which in some sense broke earlier rules, the point is that those artists, like Picasso, were able to do representational art in correct proportions -- they were just going beyond that.
This is a nontrivial issue, because there are always students who think they can skip learning boring mundane old fashioned art and go straight to something like cubism, but in 99% of cases that doesn't work well at all compared with learning "the rules" first.
With writing, many great writers have broken "the rules" with punctuation, spelling, grammar, etc. But the important thing is that they do so on purpose.
Whereas if one doesn't know the rules in the first place, one doesn't have the choice of whether to follow them or break them.
Such a person will always break those rules they don't know (subconscious knowledge counts btw) -- but not for aesthetic reasons, only out of ignorance.
Perhaps my favourite example of knowing when and how to break the rules is Franz Schubert's "Erlkönig" [1], because it is so stark.
If you listen to it without paying attention to the text (based on a poem by Goethe of the same name; both the German text and an English translation are found at [1]), parts of it sound like horrible jammering and poor harmonies, and it's easy to write it off as not sounding very nice.
Here's [2] a much clearer rendition (two singers, with much stronger delineation of the three different characters) than the one linked from Britannica.
If you do pay attention to the text, it is very clear that the unpleasant parts are very deliberate:
The singer(s) switches between the role of a father, his sick dying child, and the Erl-king that occurs in the hallucinations of the child while the father is riding to bring the child to a doctor.
The big difference between the unpleasant-sounding parts of this song and a bad composer is the clear intent and delineation - Schubert made things sound bad intentionally explicitly at the points he wanted to illustrate pain and fear, rather than because he didn't know how to make things sound pleasant when he wanted to.
The song clearly proves this by setting the child's jammering and the father's fearful attempts to soothe him up against much more pleasant segments where the Erl-king speaks and tries to seduce the child to come with him.
You only get that clear separation if you know how to evoke each effect precisely. Arguably a particularly bad composer wouldn't even know how to make things sound bad the "right way" - there's a big difference between random bad sounds and making things evoke a child in pain.
Your logic is flawed. "Artist A did X before doing Y, therefore X is necessary to do Y". It doesn't compute.
I've heard great painters say that the only thing that matters is that you paint. Plenty of wonderful painters did not study their predecessors in depth.
I'm assuming that your 99% number is fabricated? Incidentally 99% of statistics are made up.
I am also afraid of the misuse of such software. It is also possible that the teacher does not know that much and will look at the software as correct.
And come to think of it, isn't there a saying: Did stupidity require smart computers or did smart computers allow for stupidity?
An intelligent software (or one that pretends to be intelligent) might allow for any unqualified bloke to be in a position where they can teach.
Sorry for not being clear. This is intended for use in teaching English. What you describe falls under the purview of Creative Writing. Specifically it'd target English in grade school up until the early college classes (the ones most people skip out of based on tests). After that, yes, you'd be right and this tool wouldn't be appropriate.
Not sure how they work exactly, but have you looked at http://noredink.com (and as another commenter mentioned, http://grammarly.com/)? I'd be interested in your thoughts.
I've bounced my idea off my wife before and asked whether such a thing existed. I do recall her mentioning things like noredink and commenting that they weren't a match for one reason or another (the specific reasons escape me at the moment).
Visiting noredink.com now ... I can't actually figure out what it is the site offers from a cursory glance, so I can't even begin to figure out whether it matches my idea and what issues it may have had that excluded it from her classroom.
I'll bounce grammarly off her later though, in case that's new.
"A large percentage of that time is then spent marking up grammar and spelling."
As an aside, I don't think this is the optimal way to teach people how to write. What were the ideas in those papers? How were they organized? Do the student's arguments make sense? I think that's what most students spend most of their time thinking about when writing an essay, and it can be a bit demoralizing to see the teacher care just as much about whether the grammar was right. Most students can fix grammar mistakes relatively easily once they notice them anyway.
I don't think I remember actually being taught how to write in primary school. I think they should reteach grammar in high school from the beginning. Most people's brains just can't pick up a systematic treatment of some of the finer points when younger. I went and did all the grammar quizzes over at http://grammar.ccc.commnet.edu/grammar/quiz_list.htm a couple of years ago. It improved my writing a lot, and only took a weekend.
I honestly didn't learn how to write until I got to graduate school. My very patient adviser had to beat it into my skull. Writing is a very practical art...you need to practice at it a lot vs. studying and memorizing rules, and my pre-grad school education didn't really force that.
Can you tell what other resources you've used to learn, what practices were helpful in studying grammar, what was not helpful in your study of grammar?
> Most students can fix grammar mistakes relatively easily once they notice them anyway.
You would be surprised! Especially with people whose first languages aren't English. Having something to provide feedback on grammar problems early on would be really useful. Of course, once they get grammar down, the next step is rhythm and flow, as well as reducing redundancy, and the biggest problem, as you say, is always the story, but you have to peel through lots of grammar problems before you get to that point. I edit a lot of research papers for my Chinese peers (most have PhDs, I work in a China-based research lab, so that isn't weird), so I'm pretty clear on the problems.
Doesn't Grammarly[0] already do this? It analyzes the input for common grammar mistakes and proposes ways to fix them. As a student, I occasionally use Grammarly to proofread a paper for me, and it has worked pretty well so far.
SyntaxNet is, by definition, for syntactic analysis - it would likely not help you much with semantics, i.e. with extracting meaning. It could maybe help you automatically determine if a sentence is grammatically correct, though.
buffalo NN ROOT
+-- buffalo NN nn
| +-- Buffalo NNP nn
| | +-- Buffalo NNP nn
| | +-- buffalo NNP nn
| +-- buffalo NN nn
+-- Buffalo NNP nn
+-- buffalo NNP nn
I assume this is wrong, and probably triggered by the "ly" ending? Also, odd that it dropped the second clause and didn't see any trouble with the parsed results lacking a single verb. I guess it has to be able to handle fragments, etc.
Yes, we derive syntactic meaning from grammatical structure. It's one thing getting a machine to understand grammar and another to get a human to understand. If anyone is interested, Doing Grammar by Max Morenberg is an excellent source of knowledge about grammar. [0] He approaches grammar very systematically, which is helpful if people want to train machines.
I'd love to hear Chomsky's reaction to this stuff (or someone in his camp on the Chomsky vs. Norvig debate [0]).
My understanding is that Chomsky was against statistical approaches to AI, as being scientifically un-useful - eventual dead ends, which would reach a certain accuracy, and plateau - as opposed to the purer logic/grammar approaches, which reductionistically/generatively decompose things into constituent parts, in some interpretable way, which is hence more scientifically valuable, and composable - easier to build on.
But now we're seeing these very successful blended approaches, where you've got a grammatical search, which is reductionist, and produces an interpretable factoring of the sentence - but its guided by a massive (comparatively uninterpretable) neural net.
It's like AlphaGo - which is still doing search, in a very structured, rule based, reductionist way - but leveraging the more black-box statistical neural network to make the search actually efficient, and qualitatively more useful. Is this an emerging paradigm?
I used to have a lot of sympathy for the Chomsky argument, and thought Norvig et al. [the machine learning community] could be accused of talking up a more prosaic 'applied ML' agenda into being more scientifically worthwhile than it actually was.
But I think systems like this are evidence that gradual, incremental, improvement of working statistical systems, can eventually yield more powerful reductionist/logical systems overall.
I'd love to hear an opposing perspective from someone in the Chomsky camp, in the context of systems like this.
(Which I am hopefully not strawmanning here.)
I don't think you're strawmanning in general, there have been a lot of symbolic AI people who scoffed at any mention of statistics or real-world data, but it's not the case that you have to eschew all empiricism just because you use rules.
See e.g. http://visl.sdu.dk/~eckhard/pdf/TIL2006.pdf which gets 99% on POS and 96% on syntax function assignment – Constraint Grammar parsers are the state of the art of rule-based systems, and the well-developed ones beat statistical systems. CG's are also multitaggers – they don't assume a word has to have only one reading, it might actually be ambiguous, and in that case it shouldn't be further disambiguated (that's why they use F-scores instead of plain "accuracy").
CG's also require manual work, so it's not like you can download a corpus and unsupervisedly learn everything; but on the other hand, for what languages in the world do you have a large enough data set to unsupervisedly learn a good model? And for what training methods can you even get good models from unlabeled data? The set of languages for which there are large annotated corpora (especially treebanks) is even smaller … So CG's are also heavily used for lesser-resourced languages (typically in combination with finite state transducers for morphological analysis), where the lack of training data means it's a lot more cost-effective to write rules (and turn existing dictionaries into machine-readable FST's) than it is to create annotated training data (which would often involve OCR-ing texts, introducing yet another error source). CG writers still tend to have a very empirical mindset – no toy sentences like "put the cone on the block", but continual testing on any real-world text they can get their hands on.
Not in either camp, quit the nlp field due to disillusionment that it would lead to anything useful or meaningful.
Both rule-based and statistical approaches are fundamentally flawed by not incorporating any real world information. Humans do language going from real world info, mapping to grammar or rules. Computers are trying to go the other way and are not going to succeed other than as mere toys.
Even tech progression-wise, both rule and model/nn approaches are really bad since there is no meaningful sense of iterative progress or getting better step by step, unlike cpu chips or memory speeds. They are more of a random search in a vast space, hit or miss, getting lucky or not, which is very bad no good, as a technology or as a career.
How about looking at it this way. We now have multiple implementations that get 94%+ success _without_ knowing anything about the world. Isn't that remarkable?
Now to get to 99.4%+ how about we combine techniques such as spaCy or Parsey McParseFace (love the name Google) with very simple real-world cognitive models. So for the example given "Alice drove down the street in her car." a simple cognitive model would _know_ that streets cannot be in cars and so be able to disambiguate. A cognitive model wouldn't know all the facts about the world, it would know certain things about streets, certain things about cars and be able to infer on the fly whether the relationship between streets and cars matches either the first parse possibility or the second. To me this seems like the obvious next step. If it's obvious to me it must have been obvious to someone else so presumably somebody is working on it.
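Something like this toy sketch is what I have in mind - obviously not a real system; the "cognitive model" here is just a hand-written lookup for the one example:

    # Toy sketch of re-ranking candidate attachments with a hand-written
    # "world model"; not any real system, just the shape of the idea.

    # Two candidate analyses of "Alice drove down the street in her car":
    candidates = [
        {"attach 'in her car' to": "drove"},   # the driving happened in her car
        {"attach 'in her car' to": "street"},  # a street located inside her car
    ]

    # A few brute facts such a model might know.
    CAN_CONTAIN = {
        "car": {"person", "dog", "groceries"},
        "street": {"car", "person"},
    }

    def plausible(candidate):
        if candidate["attach 'in her car' to"] == "street":
            # This reading claims a street can be inside a car.
            return "street" in CAN_CONTAIN["car"]
        return True

    for c in candidates:
        print(c, "->", "plausible" if plausible(c) else "implausible")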
The 94% success rate is in made up, limited tests. Real world, they fail constantly in weird horrific or laughable ways. See any Microsoft AI public demo ever. It's like the self driving car claim of millions of miles without accidents, except that humans took over anytime there was a chance of one.
>> We now have multiple implementations that get 94%+ success _without_ knowing anything about the world. Isn't that remarkable?
That success only lasts in the limited context of the corpora used for training. Step outside that and success goes down to 60% or much worse. And that's just tagging and things, shallow parsing. Meaning? Discourse? Don't even think about it.
I suspect another way of looking at it is that these models actually learn about the real world by reading about it in the WSJ -- of course their knowledge of it is not as deep as our own, but good enough for what they do.
That is, if you took the well NLP trained model, then you could in principle extract out of it facts like "streets are not found inside cars".
That's why I never considered Chomsky's approach to make sense. Purely statistical methods aren't perfect either, but they do include some real-world information implicitly - training sets aren't random, they're taken from human communication.
But those probability models are really far removed from real world/human experience. No human is going to claim a leopard skin sofa could be an actual leopard for example. Not gonna be even a little confused.
There is just a ton of information and context the computer probability models do not have. They can use all the big data they want, but are capturing only a very thin slice of real world information.
> But those probability models are really far removed from real world/human experience. No human is going to claim a leopard skin sofa could be an actual leopard for example. Not gonna be even a little confused.
Mhm.
When humans see a jeopardy answer looking for the name of an ancient king, they might give the wrong name, because quick, did Hadrian rule before or after Caesar?
If Watson gets it wrong, its answer is something like "What are trousers?".
It seems quite obvious that different things are going on there.
The problem with statistical information is data sparsity. You could read all English texts ever written (or spoken for that matter) and the number of meaningful combinations left to see would still be infinite. If you try to learn language only from finite examples, you'll never see enough of it to learn it well. That's why Google reports results against the Penn Treebank. It's not even clear what's a good metric outside of finite corpora (which the field has been overfitting to for decades, like someone noted above).
Prior knowledge solves that problem. A human encounters the same sparsity a computer does when learning from text but prior knowledge allows us to connect rare features to a larger model in which they are, in a way, less rare.
If you think about it, there is an iteration happening within machine learning that is essentially building that prior knowledge about the world by reusing previous models as inputs to new ones. For example, spaCy uses word2vec vectors to do parsing and NER, and then sense2vec uses spaCy's POS tags to create word vectors.
Prior knowledge _might_ solve that problem. It's not really solved yet so who knows. Yeah, work is ongoing and word vectors sound cool and all, but in the past people said the same thing about bag-of-words models and look where we are now.
Humans solve sparsity, sure, we learn language from ridiculously few data points, but who knows what it is that we do, exactly? If we knew, we wouldn't be discussing this.
Let's restate the problem to make sure we're talking about the same thing: the problem is that the number of possible utterances in a given language that are grammatically correct according to some grammar of that language is infinite (or so big as for it to take longer than our current universe has to live before an utterance is repeated).
And it's a problem because it's impossible to count infinity given only finite time. I don't see how prior knowledge, or anything else, can solve this.
Which must mean humans do something else entirely, and all our efforts that are based on the assumption that you can do some clever search and avoid having to face infinity, are misguided and doomed to fail.
I think both approaches are needed for general AI: neural networks, or something like them, for low level perception and recognition; and symbolic AI for higher level reasoning. Without the symbolic layer, you can't be sure what's going on.
Symbolic AI has been very closely guided by cognitive psychology. Artificial neural networks ignore neurophysiology, so even when they work, they tell us very little about how the brain works.
I keep hearing claims that symbolic AI is the wrong approach for anything, and that it failed. Yet there were quite a few successes (expert systems, discovery learning, common sense reasoning, for example) before sources of funding dried up.
> Artificial neural networks ignore neurophysiology, so even when they work, they tell us very little about how the brain works.
That is completely wrong. People like Geoff Hinton spend most of their time thinking about how the brain works (indeed, his background is cognitive psychology). The "convolution" part of convolutional neural networks is designed to mimic how the optic nerve interfaces with the brain.
> I keep hearing claims that symbolic AI is the wrong approach for anything, and that it failed. Yet there were quite a few successes (expert systems, discovery learning, common sense reasoning, for example) before sources of funding dried up.
The funding dried up because they ran into the limits of what is possible.
No equivalent of error backpropagation has ever been found in real neurons, and it's biologically implausible. So ANNs are almost certainly using a different learning mechanism from the one used in the brain. Even single neurons are quite complex and very little of this complexity is present in neural networks.
The visual system (retina, lateral geniculate nucleus, visual cortex) was fairly well understood well before ANNs were developed. A few uncontroversial ideas (e.g. that cells take their inputs from neighbouring cells in the previous layer) were adopted for use in ANNs.
I was around at the time of, and affected by, the AI winter. There was certainly no consensus among those working in AI that they had got as far as they could. Work stopped when funding was cut, often for political reasons.
The most mature area at the time, apparently ripe for commercialization, was expert systems. However, it was very hard to commercialize them: customers couldn't think of any suitable applications, and when they could, they couldn't spare the time of their experts.
Finally, the main reason for the AI winter was probably that AI was unable to live up to the grossly inflated expectations, simply because the expectations were grossly inflated. This seems to be happening again, with neural networks.
> I was around at the time of, and affected by, the AI winter. There was certainly no consensus among those working in AI that they had got as far as they could. Work stopped when funding was cut, often for political reasons.
I wasn't around, but I got curious about symbolic systems after listening to MIT's AI course[1]. Did some reading about the subject. The impression I got matches what you describe.
It's ridiculous how many people here dogmatically recite statements about failures of symbolic systems without (apparently) knowing anything about how those systems were used and what they achieved. If you listen to the comments, it sounds as if research on symbolic systems only ever produced crude, useless toys. That was certainly my impression before I took some time to actually look into it. A bit of straightforward Googling can show that it's a gross misrepresentation of history. For example, MIT's lecture on knowledge engineering [2] has some really interesting info on this subject.
I've done symbolic AI work. It's great within limits. Deep learning on its own isn't the complete solution either, but statistics and learning are more important than symbolics for achieving breakthrough performance.
If you want to be convincing, give us links to actual neurology research, not to Hinton "explaining away" the objections of actual neuroscientists to his suppositions about the human brain by making more suppositions. It's pretty obvious that he made up his mind decades ago and isn't going to be critical of his own theories.
1) You have a working system. You know only bits and pieces of how it works.
2) You build a crude model of the system. It kinds of sucks at doing the stuff the System is doing well.
3) People over several decades apply tons and tons of task-specific optimizations and modifications to your model. Those modifications have nothing to do with the original system, but because of them the model finally achieves good performance at some tasks.
4) You use the hype generated by #3 to claim that you were right all along and that your model captures the essential aspects of the original system.
5) When people point out that your model works in ways that clearly don't match the original system, you make a claim that it's the original system that approximates your model, not the other way around. Without any observations of the original system supporting your claim.
I don't see how the paper you're linking to supports Hinton's supposition. It's a study of topology; it doesn't aim to show that biological neural networks learn via backpropagation.
> People like Geoff Hinton spend most of their time thinking about how the brain works (indeed, his background is cognitive psychology).
If that was as significant a factor as you make it sound, the progress in artificial neural networks would be closely tied to the progress of neurology. So where are all the citations of neurology and cognitive psychology papers in recent AI/ANN research?
Chomsky was never really interested in AI at all. I don't see anything in these results that has any implications for any position that Chomsky's taken. Chomsky's always pretty much taken it for granted that surface constituency structure can be extracted by statistical methods.
I think you are right, and I think similar hybrid processes happen in the human brain in order to make sense of the world. In the end, strong AI will look very much like a massive hybrid system and a conscious controller that takes that information and integrates it into an understood model of the world.
Try writing to Chomsky. He's well known for replying personally to his email. I know that for a fact; he replied to one of mine once (not about language).
Please don't. If you're going to bother the man, ask a more interesting question. He's made his position on this abundantly clear in the Chomsky-Norvig debate.
This looks fantastic. I've been fascinated with parsers ever since I got into programming in my teens (almost always centered around programming language parsing).
Curious - The parsing work I've done with programming languages was never done via machine learning, just the usual strict classification rules (which are used to parse ... code written to a strict specification). I'm guessing source code could be fed as data to an engine like this as a training model but I'm not sure what the value would be. Does anyone more experienced/smarter than me have any insights on something like that?
As a side-point:
Parsey McParseface - Well done. They managed to lob a gag over at NERC (Boaty McBoatface) and let them know that the world won't end because a product has a goofy name. Every time Google does things like this they send an unconscious reminder that they're a company that's 'still just a bunch of people like our users'. They've always been good at marketing in a way that keeps that "touchy-feely" sense about them, and they've taken a free opportunity to get attention for this product beyond just the small circle of programmers.
As NERC found out, a lot of people paid attention when the winning name was Boaty McBoatface (among other, more obnoxious/less tasteful choices). A story about a new ship isn't going to hit the front page of any general news site normally, and I always felt that NERC missed a prime opportunity to continue with that publicity and attention. It became a topic talked about by friends of mine who would otherwise have never paid attention to anything science related. It would have been comical, should Boaty's mission turn up a major discovery, to hear 'serious newscasters' say the name of the ship in reference to the breakthrough. And it would have been refreshing to see that organization stick to the original name with a "Well, we tried, you spoke, it was a mistake to trust the pranksters on the web, but we're not going to invoke the 'we get the final say' clause because that wasn't the spirit of the campaign. Our bad."
> Curious - The parsing work I've done with programming languages was never done via machine learning,
Artificial languages (such as programming languages) are usually designed to be unambiguous. In other words, there is a 1:1 mapping from a sentence or fragment to its abstract representation.
Natural language is ambiguous, so there is usually 1:N mapping from a sentence to abstract representations. So, at some point you need to decide which of the N readings is the most likely one.
Older rule-based approaches typically constructed all readings of a sentence and used a model to estimate which reading is the most plausible. In newer deterministic, linear-time (transition-based) parsers, such ambiguities (if any) are resolved immediately during each parsing step.
In the end it's a trade-off between having access to global information during disambiguation and having a higher complexity. So, naturally, the rule-based systems have been applying tricks to aggressively prune the search space, while transition-based parsers are gaining more and more tricks to incorporate more global information.
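The transition-based skeleton itself is tiny; here's a minimal arc-standard sketch. The action sequence is hand-picked for this one sentence, whereas a real parser scores SHIFT / LEFT_ARC / RIGHT_ARC with a learned model at every step, which is where the ambiguity resolution happens:

    # Minimal arc-standard transition system over one example sentence.
    words = ["Alice", "drove", "down", "the", "street"]

    def parse(words, actions):
        stack, buf, arcs = [], list(range(len(words))), []
        for act in actions:
            if act == "SHIFT":              # move the next word onto the stack
                stack.append(buf.pop(0))
            elif act == "LEFT_ARC":         # second-from-top becomes dependent of top
                dep = stack.pop(-2)
                arcs.append((stack[-1], dep))
            elif act == "RIGHT_ARC":        # top becomes dependent of second-from-top
                dep = stack.pop()
                arcs.append((stack[-1], dep))
        return arcs                         # list of (head index, dependent index)

    actions = ["SHIFT", "SHIFT", "LEFT_ARC",   # Alice <- drove
               "SHIFT", "SHIFT", "SHIFT",
               "LEFT_ARC",                     # the <- street
               "RIGHT_ARC",                    # street <- down
               "RIGHT_ARC"]                    # down <- drove

    for head, dep in parse(words, actions):
        print(words[dep], "<-", words[head])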
> Humans do a remarkable job of dealing with ambiguity, almost to the point where the problem is unnoticeable; the challenge is for computers to do the same. Multiple ambiguities such as these in longer sentences conspire to give a combinatorial explosion in the number of possible structures for a sentence.
Isn't the core observation about natural language that humans don't parse it at all? Grammar is a secondary, derived construct that we use to give language some stability; I doubt anyone reading "Alice drove down the street in her car" actually parsed the grammatical structure of that sentence, either explicitly or implicitly.
Various syntactic theories (HPSG, GPSG, minimalism, construction grammars) from linguistics are certainly derived constructs, but most researchers would agree that they all reflect real abstractions that humans make. I think the NLP community has done a good job of harvesting the substantive aspects (which tend to be fairly consistent across theories) without overfitting on specific cases. "Alice drove down the street in her car" is easy for people to process; "The horse raced past the barn fell" is not, because it requires a pretty drastic reinterpretation of the structure when you get to the last word.
That said, there is some interesting work on "good-enough" language processing, which suggests that people maintain some fuzziness and don't fully resolve the structure when they don't need to. [1]
> but most researchers would agree that they all reflect real abstractions that humans make
They reflect a particular language in its well-written form. However, humans are extremely robust against syntax errors. I am not a linguist, but I think this speaks in favor of lexicalist approaches: we can be very free in word order, as long as our brain can match up e.g. verbs with their expected arguments.
No, the academic consensus is pretty much the opposite. For example, by trying to rigorously state the way we form yes/no questions in English - the process that converts "the man who has written the book will be followed" to "will the man who has written the book be followed?" instead of the incorrect "has the man who written the book will be followed?" - you will find that the rules must involve imposing some sort of tree structure on the original sentence. The fact that we do it correctly all of the time on sentences we've never seen before means that we must have parsed the original sentence.
(Example sentences taken from https://he.palgrave.com/page/detail/syntactic-theory-geoffre..., although any introductory linguistics/syntax textbooks will spend a few pages making the case that humans understand language by first parsing it into some kind of tree structure).
> the process that converts "the man who has written the book will be followed" to "will the man who has written the book be followed?" instead of the incorrect "has the man who written the book will be followed?"
And yet the following is also correct - in terms of real-world usage, not some prescriptive definitions:
"The man who has written the book will be followed, right mate?"
> you will find that the rules must involve imposing some sort of tree structure on the original sentence.
The rules are, and the brain may be, but I feel those are different tree structures. Moreover, I wonder if the "tree structures" of our brains aren't just artifacts of recursive pattern matching - we also know that when reading, humans process whole groups of words at a time, and only if there's some mismatch they process pieces in more detail. Any recursive process like this will generate a tree structure as its side effect.
Anyway, thanks for the examples. I might pick a linguistic book at some point. Right now the idea of understanding natural language by parsing it into "NOUN PHRASES" and "VERB PHRASES" and stuff seems completely backwards, given how humans have no trouble parsing "invalid" sentences, or using them - especially in spoken language.
(Not to mention our ability to evolve the language, and how the grammatically invalid constructs tend to be introduced, used, understood with no trouble and at some point they become grammatically accepted - see e.g. recent acceptance of "because <noun>").
Yes, of course your "right mate" example is also grammatically correct. The point is that people routinely and naturally do the complicated transformation to "will the man who has written the book be followed?", and that transformation can't be done by simple pattern matching. Hence, humans who are able to do the complicated transformation must be mentally parsing the sentence. The fact that there is an alternative simple transformation to form the yes-no question is irrelevant because the ability to use the complicated transformation still exists.
> given how humans have no trouble parsing "invalid" sentences
I think you misunderstand slightly - the claim linguists make is not "humans are unable to understand invalid sentences because they can't parse them"; the claim is that when you see an invalid (cannot be parsed into a proper tree) sentence, you have a gut feeling that it "sounds off", and if you're a native speaker you would never accidentally produce such ill-formed sentences. You can still understand the meaning of a sentence like "I this morning fish eat", but you also immediately notice that it's "off" - and that's the phenomenon that syntax tries to explain.
Furthermore, the way you understand sentences like "I this morning fish eat" is different from the way you understand "I ate fish this morning"; in the former it feels like you're guessing. It could work for communicating simple thoughts, but I doubt a non-speaker of English armed with an English dictionary could convey a complicated thought requiring many words by that same guessing process. In fact, the reason why language evolved tree syntax is probably because it is needed to convey long, complicated thoughts.
> because <noun>
I'm glad you mentioned that! First, modern linguistics is very far from prescriptive. In fact the first thing they teach you (at around the same time they make the claim that "humans parse sentences into tree structure") is that linguistics is a descriptive field - language changes over time, the study of the rules of language and how these rules change is interesting and important, but it's pointless to "enforce" the rules. Even new constructions like "because X" have rules that govern them; e.g. see http://allthingslinguistic.com/post/72252671648/why-the-new-... - constructions like "because want" and "because need" exist, but no one says "because adore", and something interesting explains why. (To be fair, I haven't really internalized the "because X" construction so I can't claim that I find "because adore" unnatural, but the article says it's the same reason why "omg want" and "omg need" are currently grammatical but "omg adore" is not, and even if you're not familiar with the "omg X" construction, it gives independent evidence in that "omg adore" has no tumblr tags; of course, it may become grammatical in the future, but that would be because the rules have changed over time, not because there are no rules.) To that point,
> or using [ill-formed sentences] - especially in spoken language.
actually, if a sentence is used in spoken language routinely and non-accidentally, linguists take it as evidence that it's grammatical and then work backwards to find the rules that explain why it is so. How else could they do it?
Thanks for your answers. You've raised a lot of good points, and I need to think them through.
> the claim is that when you see an invalid (cannot be parsed into a proper tree) sentence, you have a gut feeling that it "sounds off", and if you're a native speaker you would never accidentally produce such ill-formed sentences. You can still understand the meaning of a sentence like "I this morning fish eat" but you also immediately notice that it's "off" - and that's the phenomena that syntax tries to explain.
I see. Yeah, most of the way I think about how mind processes language comes from focusing on that "gut feeling", that on one hand tells you that this perfectly understandable sentence is somehow "off", and on the other hand lets you form perfect sentences without ever explicitly thinking about grammar.
> First, modern linguistics is very far from prescriptive. In fact the first thing they teach you (at around the same time they make the claim that "humans parse sentences into tree structure") is that linguistics is a descriptive field
It seems to me that I've been operating under invalid assumption that linguistics is mostly prescriptive. Thanks for that. Any recommendation for an intro book I could grab to read in my spare time?
> Any recommendation for an intro book I could grab to read in my spare time?
Unfortunately I think the field suffers from a lack of such books.
1. You could try Steven Pinker's "The Language Instinct", although it's a general-audience book that doesn't really try to teach you linguistics proper
I'm not sure about the claim on implicit lack of parsing structure. I read your example as who did what, where, in what. There must be some level of structural parsing and recognition so we understand it was Alice who drove in a car, that the car is owned by Alice, and that she, Alice, drove down the street, in her car. That we automatically understand all this seems to indicate some level of implicit parsing, right? Admittedly, it's been many years since I did any study of linguistics and language acquisition, so I'm pretty ignorant of the current state of knowledge here. Am I just layering my grammatical parsing atop an existing understanding that doesn't parse at all?
I think observing how children learn their native language is pretty informative. They can speak and understand it very well, whether or not they were taught formal grammar at school. Personally, I know very, very little of Polish grammar (i.e. of my native language), and only a little bit more of English grammar - and that is only because foreign language courses are pretty heavily grammar-laden.
I'm not a linguist, but seeing how people a) can understand grammatically malformed sentences perfectly well, and b) can easily derive meaning from "sentences" stripped of verbs ("I her dinner cinema Washington"), it seems to me that most of the work is being done by pattern-matching against known words and phrases. E.g. "drove down the street" is a kind of semantic unit on its own.
Again, I'm not a linguist, but a lot of introspection, as well as observing other people, strongly suggests to me that humans do anything but parse grammatical structures.
It's precisely how strongly we conform to grammar, without having been taught it, that shows it's key to our internal representations and to how we learn language.
Here is the undeniable proof that syntactic structure exists. Consider the sentence `The magician pointed at the man with the hat.' This is a perfectly natural sentence, of which there are two likely interpretations. One is that the magician used a hat to point at the man. The other is that the man who was pointed at wore a hat.
What distinguishes these interpretations? Only the underlying syntactic structure, i.e. whether to parse it as `the magician pointed at (the man) with the hat' or as `the magician pointed at (the man with the hat)'. This `hierarchical structure' of our sentences is syntactic structure at its essence.
You argue that humans can understand sentences with whatever grammar, and parsing is pretty much pattern-matching of words. But what about the sentence pair: `Benny chased Jenny' versus `Jenny chased Benny'? These have the same words, and mean different things. It is only our syntactic understanding of how words are ordered in English that allows us to understand these sentences.
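To make the two readings of the magician sentence concrete, here is a toy sketch (my own illustration, not output from any of the parsers discussed; the arc labels are Stanford-style but chosen by hand) of the two dependency analyses a parser has to choose between:

    # Two dependency analyses of "The magician pointed at the man with the hat."
    # Each arc is (head, dependent, label). Same words, same order; only the
    # attachment of "with" differs.

    instrument_reading = [            # the magician points USING the hat
        ("pointed", "magician", "nsubj"),
        ("pointed", "at", "prep"),
        ("at", "man", "pobj"),
        ("pointed", "with", "prep"),  # "with the hat" modifies the verb
        ("with", "hat", "pobj"),
    ]

    modifier_reading = [              # the man WEARS the hat
        ("pointed", "magician", "nsubj"),
        ("pointed", "at", "prep"),
        ("at", "man", "pobj"),
        ("man", "with", "prep"),      # "with the hat" modifies the noun
        ("with", "hat", "pobj"),
    ]

A dependency parser's output is, in essence, one such set of arcs; choosing between these two sets is exactly the attachment decision being argued about.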
> Here is the undeniable proof that syntactic structure exists.
There are multiple hypotheses of what a sequence of words can mean, which is not the same thing as 'we form explicit syntax trees in our heads when reading a sentence'.
I could also give you the bag of words
magician point man hat
You would derive meaning from this bag of words, probably arriving at the same interpretations as in your example. However, as a sentence it is utterly ungrammatical. Note that I am not contending that we don't use some form of syntax at all. E.g., I think that someone whose native language has a freer word order than English will assign more hypotheses to the bag of words above (e.g., my brain also considers the less likely option that the magician is the object).
Another problematic aspect of this hypothesis is that a longer sentence will have so many possible parses that it would take a long time to construct and consider all parses. Moreover, I find it unlikely that we have thousands of exact syntax trees in our head that we compare.
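For a rough sense of those combinatorics (my own back-of-the-envelope illustration, not a figure from the paper): even just counting binary bracketings of a word sequence, ignoring labels and grammar constraints, the count is a Catalan number and explodes quickly. A real grammar rules most of these out, but enough survive to make exhaustively comparing stored trees implausible.

    from math import comb

    def catalan(n):
        # Number of distinct binary bracketings of a sequence of n + 1 items.
        return comb(2 * n, n) // (n + 1)

    for words in (5, 10, 20, 30):
        print(words, "words ->", catalan(words - 1), "possible binary bracketings")
    # 5 -> 14, 10 -> 4862, 20 -> 1767263190, 30 -> ~10^15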
> There are multiple hypotheses of what a sequence of words can mean, which is not the same thing as 'we form explicit syntax trees in our heads when reading a sentence'.
Yeah. I'm playing with a different idea now - maybe that "tree structure" that "undeniably exists" in our brains isn't an explicit syntax tree, but an artifact of recursive, adaptive pattern-matching? I.e. if you look at things like reading speed or "understanding" speed, you'll notice that people tend to process stuff in large blocks until something "does not click", and they have to focus and process the block in detail. That sort of feels like a recursive refinement, and any process that recurses in more than one place generates a tree structure as a side effect.
I'm not sure what you're implying. The fact that we are not consciously aware of parsing grammatical structures doesn't mean it doesn't happen.
For example, we know for sure that the brain applies sophisticated mathematical algorithms to signals coming from the ears to locate sound in 3D space, yet we are certainly not consciously aware of it - we just "know" where the source is located.
Regarding grammar, there is a theory from Chomsky called Universal Grammar, which holds that we are born with grammatical structures in the brain.
Ah, sure. Those are excellent points. I wasn't really thinking about how we bridge grammatical incorrectness. For myself, perhaps because I'm a grammar nerd, I feel like I always parse someone's mistaken statements into their grammatically correct forms. But I can recognize doing that after I've already figured out what they were intending to say. Same happened with my kids. That's a helpful vector for thinking about the problem, for sure. Thanks!
> I doubt anyone reading "Alice drove down the street in her car" actually parsed the grammatical structure of that sentence, either explicitly or implicitly.
You do need to analyze a sentence to understand it. Think of a classical attachment ambiguity such as "the boy saw the girl with the telescope". There are two readings of the sentence, and just like a Gestalt, you're typically perceiving it as one or the other. This involves a process of disambiguation, which is evidence that you have parsed the sentence.
echo 'Bob brought the pizza to Alice.' | syntaxnet/demo.sh
Input: Bob brought the pizza to Alice .
Parse:
brought VBD ROOT
 +-- Bob NNP nsubj
 +-- pizza NN dobj
 |   +-- the DT det
 +-- to IN prep
 |   +-- Alice NNP pobj
 +-- . . punct
Really, the mechanism of all these parsers, including SyntaxNet, is the same in that they use statistical training data to set up a neural network. Here's a paper on the Stanford CoreNLP parser, which you can compare with Google's paper: http://cs.stanford.edu/people/danqi/papers/emnlp2014.pdf
So, really, all of the above parsers are weak in that they only output a single best parse, when in reality sentences can have more than one valid structure, the principal example being the second sentence you've provided. I don't think Google's model has a better sense of humor than the others, no. I anticipate that they have all used relatively similar training data.
However, there is probably a trivial way to get the second sentence to parse as
Subject ----- Verb ----- Object
 Noun        Verb  Article Noun
  |   \        |       |     |
Fruit  flies  like     a   banana .
and that is to provide training data with more occurrences of ...
The more sentences using simile that the parser finds, the less likely the neural net is to consider 'like' as a verb. It's also impacted by all of the uses of [flies like].
That's the nature of statistical language tools.
The stock parser debuted here gives the same answer as CoreNLP, by the way.
flies VBZ ROOT
 +-- Fruit NNP nsubj
 +-- like IN prep
 |   +-- banana NN pobj
 |       +-- a DT det
 +-- . . punct
I'm glad they point out that we need to move on from Penn Treebank when measuring the performance of NLP tools. Most communication doesn't sound like the Penn Treebank, and the decisions that annotators made when labeling Penn Treebank shouldn't constrain us forever.
Too many people mistake "we can't make taggers that are better at tagging Penn Treebank" for "we can't make taggers better", when there are so many ways that taggers could be improved in the real world. I look forward to experimenting with Parsey McParseface.
Say I wanted to use this for English text with a large amount of jargon. Do I have to train my own model from scratch, or is it possible to retrain Parsey McParseface?
How expensive is it to train a model like Parsey McParseface?
I started working on a parser as a side project that could parse simple sentences, create a knowledge graph, and then you could ask questions based on the graph. I used http://m.newsinlevels.com at level 1 to feed it news articles and then you could ask questions.
It worked pretty well but I lost interest once I realized I would have to feed it tons of words. So could I use this to do something similar?
As someone who has published work in the NLP area, I always take claimed results with a grain of salt. With that said, I still will have to read the paper to know the implementation details, although my problem with generic linguistic approaches, such as what this one seems to be, is that they are usually hard to "port" to other languages.
For instance, the way they parse sequences of words may or may not be too specific to the English language. It is somewhat similar to what we call "overfitting" in the data-mining area, and it may invalidate this technique for other languages.
When I worked in this area (up to 2014), I worked mainly on language-independent statistical approaches. As with everything, this has its trade-offs: you can extract information from more languages, but, in general, with less certainty.
But in general, it is good to see that the NLP area is still alive somewhere, as I can't seem to find any NLP jobs where I live! :)
Edit: I've skimmed it, and it is based on a neural network, so in theory, if it were trained on other languages, it could return good enough results as well. It is normal for English/American authors to include only English datasets, but I would like to see an application to another language. This is a very specialized domain of knowledge, so I'm quite limited in my analysis.
According to their paper (http://arxiv.org/pdf/1603.06042v1.pdf), the technique can also be applied to sentence compression. It would be cool if Google publishes that example code/training-data as well.
> It is not uncommon for moderate length sentences - say 20 or 30 words in length - to have hundreds, thousands, or even tens of thousands of possible syntactic structures.
Does "possible" mean "syntactically valid" here? If so I'd be interested in a citation for it.
I don't see how a linguistic parser can cope with all the ambiguities in human speech or writing. It's more than a problem of semantics, you also have to know things about the world in which we live in order to make sense of which syntactic structure is correct.
e.g. take a sentence like "The cat sat on the rug. It meowed." Did the cat meow, or did the rug meow? You can't determine that by semantics, you have to know that cats meow and rugs don't. So to parse language well, you need to know an awful lot about the real world. Simply training your parser on lots of text and throwing neural nets at the code isn't going to fix this problem.
This is exactly the type of problem that a good parser should be able to solve, and training a parser on lots of data and throwing neural nets may indeed be a viable solution. Why wouldn't it be? The article describes how their architecture can help make sense of ambiguity.
In terms of a basic probabilistic model, P(meow | rug) would be far lower than P(meow | cat), and that alone would be enough to influence the parser to make the correct decision. Now, if the sentence were "The cat sat on the rug. It was furry", that would be more ambiguous, just like it is for an actual human to decode. But models trained on real-world data do learn about the world.
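As a toy version of that point (all counts below are invented; a real system would estimate them from a large corpus, or learn embeddings instead), even bare co-occurrence statistics are enough to resolve the easy case and to leave the "furry" case genuinely ambiguous:

    # Toy antecedent resolution for "It meowed." / "It was furry." using
    # hypothetical co-occurrence counts between nouns and predicates.
    cooccurrence = {
        ("cat", "meow"): 5200,
        ("rug", "meow"): 3,
        ("cat", "furry"): 2100,
        ("rug", "furry"): 1900,
    }

    def plausibility(noun, predicate):
        # Among occurrences of the predicate, how often is this noun its argument?
        total = sum(c for (_, p), c in cooccurrence.items() if p == predicate)
        return cooccurrence.get((noun, predicate), 0) / total

    def resolve_it(candidates, predicate):
        return max(candidates, key=lambda n: plausibility(n, predicate))

    print(resolve_it(["cat", "rug"], "meow"))   # -> "cat", by a huge margin
    print(resolve_it(["cat", "rug"], "furry"))  # -> "cat", but only barely (ambiguous for humans too)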
That's exactly why it's using a neural net and yes, a lot of text will fix this problem. The only reason why we know cats meow and rugs don't is by learning about cats and rugs. Throw enough training data at it and the parser will figure out what is meowing.
An interesting example of this you can easily try for yourself is playing with Google's voice to text features--if you say silly things like "the rug meowed" you will have terrible results because no matter how clearly it can hear you its training data tells it that makes no sense.
This is actually a rather serious limitation of statistical approaches to language: they work best with utterances that have already been said, or with concepts that are already strongly associated in common speech. Such utterances may make up the bulk of what we say and write, but the remainder isn't gobbledygook. It contains most of the intimacy, poetry, and humor of interpersonal communication, all of which trade on surprise and novelty.
That's basically Chomsky's argument with the "colorless green ideas" sentence. If you put words together to form a sentence never seen before, supposedly a statistical model cannot help you. The thing is, a paper later showed that a simple Markov model is actually perfectly able to discriminate this grammatical sentence from an ungrammatical one. Novel and surprising sentences are never completely alien. They use familiar structures of the language, and combinations of words and other building blocks that we have seen before, and this is exploited when we analyze such sentences. Surprise and novelty are actually strongly related to statistics (cf. information theory).
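A much-simplified sketch in the spirit of that result (this is not the model from that paper; the tags and probabilities below are invented): back off from word bigrams to part-of-speech classes, and the grammatical word order already scores orders of magnitude higher than the reversed one, even though neither sentence was ever seen.

    # Class-based bigram scoring: score(sentence) = product of P(tag_i | tag_{i-1}).
    # (Word-given-tag terms would be identical for both orders, so they are omitted.)
    # All probabilities are hypothetical; a real model estimates them from a tagged corpus.
    word_tag = {"colorless": "ADJ", "green": "ADJ", "ideas": "NOUN",
                "sleep": "VERB", "furiously": "ADV"}

    tag_bigram = {
        ("<s>", "ADJ"): 0.10, ("<s>", "ADV"): 0.02,
        ("ADJ", "ADJ"): 0.20, ("ADJ", "NOUN"): 0.60, ("ADJ", "</s>"): 0.01,
        ("NOUN", "VERB"): 0.40, ("NOUN", "ADJ"): 0.01, ("NOUN", "</s>"): 0.20,
        ("VERB", "ADV"): 0.30, ("VERB", "NOUN"): 0.10,
        ("ADV", "VERB"): 0.05, ("ADV", "</s>"): 0.30,
    }

    def score(words):
        tags = ["<s>"] + [word_tag[w] for w in words] + ["</s>"]
        p = 1.0
        for prev, cur in zip(tags, tags[1:]):
            p *= tag_bigram.get((prev, cur), 1e-4)  # small back-off for unseen transitions
        return p

    grammatical = "colorless green ideas sleep furiously".split()
    reversed_order = list(reversed(grammatical))
    print(score(grammatical), score(reversed_order))  # ~4e-4 vs ~2e-9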
That is not necessarily true. The problem of ambiguity is fundamental to natural language processing, and a lot of research goes into addressing it. If we also see a sentence where unambiguously the word "cat" is the subject of the verb "meow", then this could give our parser clues about ambiguous attachment or, in this case, anaphora resolution (to what does "it" refer). In any intro NLP class, you will learn about lexicalized parsing, which takes the head word of the phrase into account when making parsing decisions. I haven't read the paper on this parser yet, but I don't think it is hard to see that your sentence could be accurately parsed given enough data. Look up "word embeddings" for instance, which are fundamental to deep learning for NLP and could probably be trained to assist in disambiguating anaphora or attachment.
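To make the embeddings suggestion a bit more concrete, here's a minimal sketch (the three-dimensional vectors are made up; real embeddings are learned from large corpora and have hundreds of dimensions): similarity in embedding space can serve as one feature when a lexicalized parser or coreference component decides what "it" attaches to or refers to.

    import math

    # Hypothetical 3-d embeddings; real ones (word2vec, GloVe, ...) are learned from text.
    emb = {
        "cat":  [0.9, 0.1, 0.3],
        "rug":  [0.1, 0.8, 0.2],
        "meow": [0.8, 0.2, 0.4],
    }

    def cosine(u, v):
        dot = sum(a * b for a, b in zip(u, v))
        norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
        return dot / norm

    for candidate in ("cat", "rug"):
        print(candidate, round(cosine(emb[candidate], emb["meow"]), 3))
    # "cat" comes out far closer to "meow" than "rug" does, nudging the model
    # toward the right antecedent without any hand-coded world knowledge.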
Actually, "animacy" is a fundamental feature in semantics. It's part of your mental lexicon that a cat is animate and a rug isn't, and you would simply infer from that which is the referent for "it". As semantic challenges go, this is a very trivial one. In general the border between linguistic and world knowledge can become blurred. There may be limits to what can be learned purely from text, but seeing that this model achieved 94%, a lot can be learned purely from (annotated) text.
Anyone planning (or already busy) training Parsey with one of the alternative Treebanks available from Universal Dependencies [1]? Would love to know your results when you have any :)
I am personally looking for a somewhat reliable NLP parser which can handle Dutch at the moment. Preferably one which can handle POS tagging without hacking it in myself.
> At Google, we spend a lot of time thinking about how computer systems can read and understand human language in order to process it in intelligent ways.
There are 6 links in this sentence in the original text. I get that it can help to give more context, but I think it actually makes the text harder for a human to parse. It also feels like they have hired a cheap SEO consultant to do some backlink integration.
I wonder what's with that X McXface thing. Yes, I've heard something about Boaty McBoatface (whatever that is), but is the Boatface thing original, or is it some kind of obscure American meme?
It wasn't the original, and yes, there is a sort of meme like it - not always with -face, though. I've heard Beardy McBeardson; the pattern is usually X-y Mc-X-something.
:). It seems to have come from an Internet poll. Assuming everything on the (English-language parts of the) Internet is American is usually a safe heuristic ;).
Definitely; I think it's originally from Friends (possibly in a slightly different form; Xey McXerson). Haven't tracked down the exact reference, but it's well over a decade old.
How would you feed a sentence to a neural net? As I understand, the inputs are usually just floating point numbers in a small range, so how is the mapping performed? And what if the sentence is longer than the number of input neurons? Can that even happen, and pose a problem?
There are a bunch of blogs, tutorials, etc., around word2vec and other methods of generating vectors from a training set of words.
Also, in the tensorflow models codebase where this syntaxnet code lives, there is another TensorFlow-based method of generating word embeddings, with demonstration code, called Swivel.
I think Syntaxnet is just using the NN to guide search instead of doing end-to-end parsing. That said, you can feed "sequences of stuff" into recurrent neural nets! See
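To sketch the usual answer in plain Python/NumPy (the vocabulary, dimensions, and weights below are all made up, and this is an Elman-style toy rather than anything SyntaxNet actually runs): each word is mapped to an integer index, the index selects a row of a learned embedding matrix, and a recurrent net consumes those vectors one step at a time, so sentence length isn't limited by a fixed number of input neurons.

    import numpy as np

    # Toy vocabulary and randomly initialised parameters (a real model learns these).
    vocab = {"<unk>": 0, "bob": 1, "brought": 2, "the": 3, "pizza": 4, "to": 5, "alice": 6}
    emb_dim, hidden_dim = 8, 16
    rng = np.random.default_rng(0)
    E = rng.normal(size=(len(vocab), emb_dim))        # embedding matrix: one row per word
    W_xh = rng.normal(size=(emb_dim, hidden_dim))     # input-to-hidden weights
    W_hh = rng.normal(size=(hidden_dim, hidden_dim))  # hidden-to-hidden (recurrent) weights

    def encode(sentence):
        # Map words to embedding vectors, then run a simple recurrent net over them.
        ids = [vocab.get(w.lower(), vocab["<unk>"]) for w in sentence.split()]
        h = np.zeros(hidden_dim)
        for i in ids:                  # one step per word, so any length works
            x = E[i]                   # look up the word's embedding vector
            h = np.tanh(x @ W_xh + h @ W_hh)
        return h                       # fixed-size summary of a variable-length sentence

    print(encode("Bob brought the pizza to Alice").shape)  # -> (16,)

SyntaxNet itself, as far as I can tell from the paper, uses a feed-forward network over embeddings of a fixed window of features of the parser state rather than a recurrent encoder, but the embedding-lookup step is the same idea.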
One interesting use I can think of is new, improved readability scores that take into account words that are common or uncommon depending on their part of speech (e.g. a text that used "effect" as a noun would be lower-level than one that used "effect" as a verb).
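A rough sketch of how that could work (the per-million frequencies and the scoring rule are invented; a real version would take tag-conditioned counts from a large POS-tagged corpus and the tags from a parser like this one):

    import math

    # Hypothetical per-million frequencies of (word, part-of-speech) pairs.
    freq_per_million = {
        ("the", "DET"): 60000.0,
        ("was", "VERB"): 9000.0,
        ("they", "PRON"): 6000.0,
        ("small", "ADJ"): 300.0,
        ("change", "NOUN"): 250.0,
        ("effect", "NOUN"): 120.0,   # "the effect was small" -> common usage
        ("effect", "VERB"): 1.5,     # "to effect change"     -> rare usage, reads harder
    }

    def rarity(word, tag):
        # Rarer (word, tag) usages contribute more to the difficulty score.
        return -math.log(freq_per_million.get((word, tag), 0.1) / 1_000_000)

    def difficulty(tagged_sentence):
        # tagged_sentence: list of (word, tag) pairs from a tagger/parser.
        return sum(rarity(w, t) for w, t in tagged_sentence) / len(tagged_sentence)

    print(difficulty([("the", "DET"), ("effect", "NOUN"), ("was", "VERB"), ("small", "ADJ")]))
    print(difficulty([("they", "PRON"), ("effect", "VERB"), ("change", "NOUN")]))  # higher = harder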
You know, it's a brilliant name. It's a person's name, like Siri or Cortana. But if they'd named it Jane or John, the mass media would have a fit. So they side-stepped that by making it sound computer-y, while remaining a proper name. Brilliant.
(INB4: yes, that's a reference to an old joke about a Lufthansa pilot in Frankfurt complaining that he's forced to speak English to the control tower.)
Honestly though, English is quite a good starting language due to its simplicity, but I do hope more attention will be spent on making technology like this trainable on other languages as well.
According to the section titled "Why is Parsing So Hard For Computers to Get Right?" in the blog post, determining the difference between "An English parser" and "An English parser" .. is Parsey's great strength.
I used to have a woman in my life, a pumpkin-latte-sipping, wannabe graphic designer, and that is how she referred to everything: xx-y mc-xx-yface, seemingly in an attempt to remain a teenager well into her third decade.
I am so glad to be rid of her vapid nonsense.
It is a silly name and makes me think it is not a serious effort. But names can be deceiving.
I'm pretty sure the borders between the two are fluid at Google.
I also wish middle management were a bit less afraid of such things. You can usually get cool ideas approved in the higher echelons (because a CEO doesn't have to fear too much, or possibly because these people actually are more imaginative).
Names that have meaning and tell a story are incredibly useful for marketing, even if they sometimes sound unprofessional. E.g. 'Plan B' (morning-after pill), 'CockroachDB', 'Virgin'.
It's beyond me how anyone could have chosen the predictable public outcry instead of naming that boat Boaty McBoatface. That's probably the least offensive name that ever resulted from an internet poll.
This kind of parser isn't all that useful anyway. Parts of speech are one of those things people use to talk about language with, but you don't actually use them to understand language.
You do subconsciously. When you read "Dave punched John" you don't need to think "hmmm, is 'John' the object or the subject?", but if your brain hasn't figured out which is the object and which the subject, you won't know who is hitting whom.
You don't need to be able to name or define parts of speech but you need to be able to parse them, or you won't understand anything.
Yeah, except are you sure that the way you parse a sentence maps 1:1 to what you learned as "parts of speech"? I think an equivalent understanding, one that seems more intuitive to the way my own brain works (if I can believe introspection), is that "punched" is to be read by default as "--punched-->", and "was/is punched by" pattern-matches to "<--punched--". The arrow denotes who's punching whom.
That's your brain taking a shortcut because you know that "punched" is a verb, and that a verb in that form is likely to be preceded by a subject noun and followed by an object noun, etc. Even if you never study grammar and couldn't even answer the question "which is the verb in this sentence?", your brain has still learned to recognise the different types of word, which is why you understand them.
Disclaimer: I'm not a linguist, but by layman's standards I'm pretty confident.
"On a standard benchmark consisting of randomly drawn English newswire sentences (the 20 year old Penn Treebank), Parsey McParseface recovers individual dependencies between words with over 94% accuracy, beating our own previous state-of-the-art results, which were already better than any previous approach."
From the original paper, "Our model achieves state-of-the-art accuracy on all of these tasks, matching or outperforming LSTMs while being significantly faster. In particular for dependency parsing on the Wall Street Journal we achieve the best-ever published unlabeled attachment score of 94.41%."
This seems like a narrower standard than described, specifically being better at parsing the Penn Treebank than the best natural language parser for English on the Wall Street Journal.
The statistics listed on the project GitHub actually contradict these claims by showing the original March 2016 implementation has higher accuracy than Parsey McParseface.