Hacker News

As a (former) linguist, I'm baffled by people talking about language as "unstructured". Sure, it isn't _rigidly_ structured, but "un"? There's a lot of structure in it; it's just so complex (and so connected to extralinguistic structures) that our ML models don't grok it!



As a (never) linguist, I think the edge-case successes demonstrate how far we really have to go in ML generally. Driving is an entirely engineered phenomenon. From the species of animal that might run out in front of a car to every atmospheric perturbation, we can define every event that's going to affect it, and cars have been designed over more than 100 years to deal with those events.

Medical imaging: another area where highly trained humans have thought rigorously (for about the same amount of time) about what this all means.

What ML is lacking is the data and labels a baby bootstraps from into the "common world". How many ML models have been trained on the taste of breast milk, the smell of mother, the smell of dirt, the upward view of everything? (Think about how much time babies, since the Stone Age, have spent lying on their backs, looking up.)

These things have to be sorted out in a fairly unsupervised way, using little more than reflexes (cry, suck, fencer, grasp, etc.) for a while: the smell of mom sometimes comes with warm milk, but not always. The baby has to associate this warm body with the smell, not just with food. The sound of mom doesn't always come with the smell of mom.


> I'm baffled about talking about language as "unstructured".

It's an (imprecise) term in computer science that may not refer to the same thing you are thinking of. Hence the confusion.

https://en.wikipedia.org/wiki/Unstructured_data


Typically "unstructured" is opposed to tabular data, where rows and columns are a natural representation.
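A minimal Python sketch of that distinction, using a made-up table and a made-up sentence (both hypothetical): in the tabular data a question is a direct column lookup, while the same fact in free text has no named field and has to be extracted first, here with a deliberately crude heuristic.

```python
import csv
import io

# Hypothetical structured (tabular) data: every row has the same named columns.
table = io.StringIO("name,age,city\nAda,36,London\nAlan,41,Manchester\n")
rows = list(csv.DictReader(table))

# Asking for ages is a direct column lookup.
ages = [int(row["age"]) for row in rows]

# Hypothetical unstructured data: free text with no predefined schema.
review = "Ada, who is 36 and lives in London, loved it."

# There is no "age" field to look up; the value must be extracted.
# Crude and brittle heuristic: take the first all-digit token.
tokens = review.replace(",", " ").split()
first_number = next(int(tok) for tok in tokens if tok.isdigit())
```

The sentence clearly has linguistic structure; "unstructured" only means it isn't laid out as rows and columns a program can index without first interpreting it.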



