Paving the way for human-level sentence corrections (grammarly.com)
72 points by elizacassan on April 1, 2017 | 21 comments



The problem that the authors are trying to tackle is an interesting and difficult one.

I have noticed that when natural language is handled with artificial intelligence / machine learning techniques, the work done by computer scientists would very often have greatly benefited from collaboration with a linguist or other sort of language expert, especially in the design phase of an experiment. This work is a good example of what I mean.

People trained in CS or similar precise fields develop, over time, a tendency to think in terms of "getting the right result" (I say this as one of these people). When dealing with natural language, however, sometimes there simply is no single correct result.

Consider the topic of fluency that the authors work on: is there a rigorous, objective definition of "fluent"? The answer, as any linguist would tell you, is "no". There are idiomatic expressions, grammatical structures, contractions, slang, and so on that vary from city to city within a country, let alone globally. What may sound "fluent" to one native speaker of a language may sound strange to another. It is impossible to evaluate "fluency" objectively in the general case. In particular, any practicing linguist will be able to give examples, likely off the top of their head, of English sentences that would be rated as "fluent" by someone from one geographical area and "awkward" by someone from another.

Furthermore, using Mechanical Turk to find humans to rate the fluency of a particular sentence makes for an unclean dataset and evaluation benchmark. The linked post says that, in the end, 50 people found via Mechanical Turk rated sentences for fluency; since any one language is used significantly differently around the globe, there will be an unpredictable range of fluency ratings for at least some sentences across just 50 people around the world. Choosing a different 50 people to rate the same sentences would most likely result in different fluency ratings.
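
To make the sampling concern concrete, here is a toy sketch (entirely made-up ratings, not data from the study) of how re-drawing a small panel of raters shifts the measured fluency of a single sentence:

    import random
    from statistics import mean, stdev

    # Made-up 1-5 fluency ratings for one sentence, from a pool of raters
    # whose judgments vary by dialect and region.
    ratings = [5, 5, 4, 5, 3, 4, 5, 2, 4, 5, 3, 5, 4, 4, 2, 5, 3, 4, 5, 4]

    # Re-draw a "panel" of raters three times, as if re-running the study.
    random.seed(0)
    for trial in range(3):
        panel = random.sample(ratings, 10)
        print(f"panel {trial}: mean={mean(panel):.2f}, sd={stdev(panel):.2f}")

Each draw gives a different mean, so with a small pool the benchmark's ground truth moves with the sample.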

I do not mean to detract from the authors' work; this is a difficult problem to tackle, with no clear path to a general solution. However, I am forced to wonder why the authors, who, based on their biographies linked to in the article, seem to have a range of experience, did not comment on the considerations I've mentioned here.


Maybe they went to ten linguists, and all they got as an answer was "there is no objective definition of 'fluent'. You are trying to find a single correct result that doesn't exist!"

Then, armed with the naiveté of thinking that if there is something like 'fluency' it must be possible to measure it, they just threw a bit of money at the problem. Note that asking a representative group of people is the closest you can get to exactly what you want to measure (apart from asking everyone). It doesn't matter that there's no agreed-upon method to measure the quality of pizza: if I maximise the subjective impression, I'll get exactly what I wanted.


Using Mechanical Turkers to rate fluency would arguably be an even more dubious evaluation benchmark if more rigorous and consistent standards for fluent English existed; people regarded as having good writing and editing skills can find better-paying sources of part-time remote work than AMT. Some of the examples of human editing shown in the blog entry certainly don't look fluent to me...


I don't quite get this... According to the bar graph, the automated systems fail to correct around half of even orthographic (spelling) mistakes. One example is "advertissment", which macOS is now trying really hard to correct against my will in this text area.
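
For what it's worth, even a minimal Norvig-style corrector handles this class of error (a sketch below, assuming some large plain-text corpus.txt, not whatever the authors used), since "advertissment" is only one edit away from "advertisement":

    import re
    from collections import Counter

    # Word frequencies from a hypothetical large plain-text corpus.
    WORDS = Counter(re.findall(r'[a-z]+', open('corpus.txt').read().lower()))

    def edits1(word):
        # All strings one delete, transpose, replace, or insert away.
        letters = 'abcdefghijklmnopqrstuvwxyz'
        splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
        deletes = [L + R[1:] for L, R in splits if R]
        transposes = [L + R[1] + R[0] + R[2:] for L, R in splits if len(R) > 1]
        replaces = [L + c + R[1:] for L, R in splits if R for c in letters]
        inserts = [L + c + R for L, R in splits for c in letters]
        return set(deletes + transposes + replaces + inserts)

    def correct(word):
        # Prefer the word itself, then any known word one edit away.
        candidates = ({word} & WORDS.keys()) or (edits1(word) & WORDS.keys()) or {word}
        return max(candidates, key=lambda w: WORDS[w])

    print(correct('advertissment'))  # -> 'advertisement', given a decent corpus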

Another example is "From this scope, social media has shorten our distance", where "scope" is supposed to be "perspective". That seems to be something that machine learning should easily pick up on, and indeed, when I just tried it on Google Translate, I couldn't get it to make this mistake unless my original (German) sentence was also awkward.

So I'm unsure how much value there is in beating systems that fail rather spectacularly. I also don't quite understand why you would need manually created data for this task, instead of just buying everything ever written for TOEFL essay questions and pitting it against the New York Times' archive.

It's obviously quite likely that there are good reasons for all this. They may have thought a bit longer about it than I just did.


Hmm, I'm sensing a bit of garbage-in, garbage-out here. For starters, their original sentences contain unlikely typos instead of homonyms, which would be much more common. (It complicates the learning as well, I'm sure, since some changes were made to correct similarly spelled terms, which could really change a sentence's meaning once applied.) Second, the human corrections aren't that good. We really need to stop creating data sets using anonymous exploited labor that is paid pennies. (They did screen the Amazon Turk users, but if you live in America or work at a university, is there really a shortage of fluent English speakers around you?) Overall I'd say the fluency-editing approach shows promise and would be a boon to ESL learners, but the training data needs to be improved.


This is a great project -- in Phase One, the algorithm will correct sentences written by people who didn't learn basic literacy in school and who subsequently endeavor to avoid reading or writing any text, preferring video. In Phase Two, the algorithm will do away with the poorly written source and create something entirely on its own. Based on my sampling of contemporary human-crafted sentences, Phase Two will take place just in time.

Apropos, my all-time favorite malapropism took place 50 years ago when I was a teenage TV repairman. I visited a household, spied a record turntable, and asked, "Is that a stereo turntable?" "No," replied the customer, "It's monorail."

I was able to avoid blurting out, "I think you mean monaural, yes?" -- for three reasons. One, it's regarded as bad form to correct the grammar of customers, who are always right. Two, technically, the turntable was in fact monorail (i.e. able to follow only one recorded track). Three, I was too busy trying not to laugh.


> In Phase Two, the algorithm will do away with the poorly written source and create something entirely on its own. Based on my sampling of contemporary human-crafted sentences,

A problem with this is that there will be a tendency for it to become normative. This is what happened to the OED. Originally it was an etymological dictionary of the usage of English. Now it is regarded as an arbiter of 'correct' English.


> A problem with this is that there will be a tendency for it to become normative.

Yes, true. It would turn description into prescription, but we're already approaching that point. I'm not advocating this, only mentioning it.

> This is what happened to the OED. Originally it was an etymological dictionary of the usage of English. Now it is regarded as an arbiter of 'correct' English.

I suspect those behind the OED would deny that as a goal, while acknowledging it as an outcome.

I have a little fun with people who think dictionaries prescribe correct usage, by pointing out that, according to current dictionaries, "literally" and "figuratively" mean the same thing. This is true because that's how people use the words, and a dictionary's purpose is to dispassionately record how people use words, without judgment or rancor.

This is why "reign it in" (now seen regularly) will become an accepted substitute for "rein it in" -- people want to say it that way, so be it. Reigning is what a monarch does to a kingdom, reining is what a cowboy does to a horse, but people are free to say what they want.


This was an interesting read for someone unacquainted with the field: it appears to be very difficult to fix "awkwardness" in sentences; none of the methods were able to reduce it significantly. It seems to me that awkwardness is based more on common usage than on actual grammar. Perhaps this could be improved with a solution similar to Google Translate's, which looks at real-world usage instead of syntax?


Real-world usage would have to be curated, though; awkward sentence constructions and word choices do happen in real-world usage. Or, as the article shows, there can be multiple, very different ways of fixing awkwardness. I'm not sure what it would look like to find a solution that's "fitted" to several of these.


Fluency is nice, but semantics matter the most.


s/are comprised of/are composed of/


Are you saying that "are comprised of" is not grammatically correct in its context in the article? Why?



I have a feeling this is one of those places where ML will not be useful until we have strong AI.

Certain grammatical errors are impossible to fix unless you understand the overall meaning of the text. Sometimes this meaning is embedded over many paragraphs. Errors involving incorrect word usage are unsolvable when words have more than one meaning and you don't comprehend the subject at hand.


You don't think we can "fake it" in the vast statistical majority of cases simply by relying on a corpus containing nearly the same cases?

We can already "understand the meaning" in a latent space well enough to do machine translation between language pairs the model wasn't trained on, and additions and subtractions in the latent space of word2vec suggest the vectors have picked up some semantic meaning from the text.

I don't think this is a problem that requires strong AI in the vast majority of cases, just very large, well-groomed corpora and clever engineers.
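
The vector-arithmetic claim is easy to reproduce with gensim and a pretrained model (the GoogleNews vectors here are just one common choice, not something from the article):

    from gensim.models import KeyedVectors

    # Load pretrained word2vec vectors (a large download, assumed present).
    kv = KeyedVectors.load_word2vec_format(
        'GoogleNews-vectors-negative300.bin', binary=True)

    # king - man + woman lands near "queen", with no explicit semantics anywhere.
    print(kv.most_similar(positive=['king', 'woman'], negative=['man'], topn=3))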


>Certain grammatical errors are impossible to fix unless you understand the overall meaning of the text. Sometimes this meaning is embedded over many paragraphs.

A non-strong AI can get clues to that meaning (without really understanding anything) based on the words in those previous and subsequent paragraphs, and a huge text corpus.
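
As a crude sketch of what "clues without understanding" can look like (assuming tokens.txt, a hypothetical pre-tokenized corpus): pick whichever confusable word appears most often in the surrounding context.

    from collections import Counter

    # Trigram counts over a hypothetical tokenized corpus.
    tokens = open('tokens.txt').read().split()
    trigrams = Counter(zip(tokens, tokens[1:], tokens[2:]))

    def choose(candidates, left, right):
        # Score each candidate purely by how often it occurs in this context.
        return max(candidates, key=lambda w: trigrams[(left, w, right)])

    # In any large news corpus ('to', 'rein', 'in') should swamp
    # ('to', 'reign', 'in'), so raw counts disambiguate with zero semantics.
    print(choose(['rein', 'reign'], 'to', 'in'))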


Once we have strong AI, whatever that buzzword means, what then would be the usefulness of understanding slang?

Personally, I think the usefulness is already being able to interpret a concept encoded in slang as the same as the concept derived from a message encoded in a different dialect (or language).

I would never assume a machine spoke this language, only that it understood it. Machines should evolve toward speaking succinctly, so as not to include unnecessary complexity in their messages, as they would strive to be well understood like all other persons do. I fail to see why we would want to produce slang-encoded messages, unless we want to mask the fact that we are a machine.


Ambiguous messages do not imply slang. Plenty of words have multiple meanings in normal and formal English. It's a much worse problem in tonal languages like Chinese. Tell me how you could grammatically correct this without understanding meaning: https://en.m.wikipedia.org/wiki/Lion-Eating_Poet_in_the_Ston...

"Strong AI" isn't a buzzword either; it's been in use for as long as I can remember. Maybe you would be able to understand my grammar better if I said "superhuman general intelligence" and wasted a bunch of space in the process.

I don't think you read my comment? You seem to imply that the corrections would be unambiguous, while my point was that some errors are uncorrectable without understanding meaning.


> Plenty of words have multiple meanings in normal and formal English.

There are some stats from WordNet on polysemy in English. Obviously this depends on the granularity of a dictionary's sense inventory, but regardless, English has many polysemous words (26,000+ according to WordNet). More importantly, these polysemous words also tend to be the most common words; hence words like "set" having around 120 definitions in the Oxford English Dictionary.

https://wordnet.princeton.edu/wordnet/man/wnstats.7WN.html#s...
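
You can poke at those polysemy numbers yourself via NLTK's WordNet interface (sense granularity differs from the OED's, so the counts won't match its ~120 for "set"):

    import nltk
    nltk.download('wordnet', quiet=True)  # fetch the WordNet data if missing
    from nltk.corpus import wordnet as wn

    # Number of distinct senses (synsets) WordNet lists for common words.
    for word in ['set', 'run', 'line', 'point']:
        print(word, len(wn.synsets(word)))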


If comprehension at that level doesn't exist, someone has an incentive to correct those texts down to a lower level. I certainly do. We are not talking about poetry, are we?



