
Reading this story brings to mind the history of algorithms in machine translation. Early approaches tried to explicitly define the rules for converting between tongues, using meticulously laid-out systems of vocabulary and syntax. This proved untenable, in part because of the complex and ever-changing nature of language. Modern systems such as Google Translate instead use machine learning algorithms that are fed large amounts of source material and computationally discern the relationships within it.

I wonder if a similar approach could be taken with language construction. Instead of spending 25+ years fleshing out the details of a language in painstaking detail, computer programs could be devised that, given large amounts of input, determine the most "efficient" means of expressing information. Not only would this approach be far less labor-intensive, it could also accommodate the rapidly evolving nature of language, for example by adding to its "dictionary" in response to new phenomena in need of naming.
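
To make the idea a bit more concrete, here's a toy sketch (in Python) of one way a program might "learn" dictionary entries from raw input. It's just byte-pair-encoding-style merging, with a made-up corpus and merge count, not a proposal for a real system: it repeatedly merges the most frequent adjacent pair of symbols, so frequently co-occurring material earns its own entry, and re-running it on new text would coin entries for new phenomena.

    from collections import Counter

    def learn_vocab(corpus_words, num_merges=10):
        # Start with each word split into single characters.
        words = [list(w) for w in corpus_words]
        merges = []
        for _ in range(num_merges):
            # Count adjacent symbol pairs across the whole corpus.
            pairs = Counter()
            for w in words:
                pairs.update(zip(w, w[1:]))
            if not pairs:
                break
            best = max(pairs, key=pairs.get)
            merges.append(best)
            merged = best[0] + best[1]
            # Replace every occurrence of the best pair with one symbol.
            new_words = []
            for w in words:
                out, i = [], 0
                while i < len(w):
                    if i + 1 < len(w) and (w[i], w[i + 1]) == best:
                        out.append(merged)
                        i += 2
                    else:
                        out.append(w[i])
                        i += 1
                new_words.append(out)
            words = new_words
        return merges

    # Made-up toy corpus; a real run would use large amounts of text.
    print(learn_vocab(["lower", "lowest", "newer", "newest"] * 5))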




Interlingua was constructed this way, at least as far as its vocabulary goes. They made the mistake, IMHO, of making the grammar naturalistic, which made it very easy to read for people who already speak a Romance language; writing it, on the other hand, was made difficult by the same choice.

You could perhaps use a typological database of grammatical features of the world's languages and somehow select an "optimal" combination from it, but that's a far cry from letting a computer determine the most efficient means of expressing information; we have no idea how to define information/meaning, so for now that remains an impossible dream. I don't think the problem is that designing languages is hard per se; it's that people can't be bothered to agree on one and learn it.


Maybe it could use Minimum Message Length: http://en.wikipedia.org/wiki/Minimum_message_length
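
Roughly, MML scores a hypothesis by the length of a two-part message: the bits needed to state the model, plus the bits needed to encode the data given that model; the shortest total wins. A toy Python sketch of the principle, where the vocabularies, the greedy tokenizer, and the 8-bits-per-character model cost are all made-up simplifications:

    import math
    from collections import Counter

    def message_length(data, vocab):
        # Part 1: cost of stating the model (the vocabulary itself),
        # crudely approximated here as 8 bits per character per entry.
        model_bits = sum(8 * len(v) for v in vocab)
        # Tokenize greedily, longest entry first; assumes every single
        # character of the data is in the vocabulary, so a match always exists.
        vocab_by_len = sorted(vocab, key=len, reverse=True)
        tokens = []
        i = 0
        while i < len(data):
            for v in vocab_by_len:
                if data.startswith(v, i):
                    tokens.append(v)
                    i += len(v)
                    break
        # Part 2: cost of the data given the model, via Shannon code lengths.
        counts = Counter(tokens)
        total = sum(counts.values())
        data_bits = sum(-c * math.log2(c / total) for c in counts.values())
        return model_bits + data_bits

    data = "the cat sat on the mat " * 20
    chars = sorted(set(data))               # model 1: characters only
    richer = chars + ["the ", "at "]        # model 2: plus two phrases
    print(message_length(data, chars), message_length(data, richer))

If the richer vocabulary pays for its own description by shortening the encoded data, it wins; that's the same tradeoff a language's lexicon faces when deciding whether a phenomenon deserves its own word.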


It sounds like an experiment worth running, and it could lead to some interesting results. On the other hand, I imagine most conlangers enjoy devising the details of their language.


In fact it doesn't just sound like an experiment worth doing. It sounds like something that somebody somewhere might have already done.


To be glib, it has been done. We call it language.

Seriously, though, take a look at the link I posted in https://news.ycombinator.com/item?id=8180924

One of the techniques used is to computationally create a space of possible ways to partition semantic domains on a plane whose dimensions are simplicity and informativeness, in order to see where in that space real languages lie. While it hasn't been done (to my knowledge) for a whole language, it's a potential direction to go.
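
To sketch what that kind of analysis looks like: the toy Python version below, with a made-up four-point domain, a made-up similarity function, and a crude "fewer categories = simpler" proxy, enumerates every way to partition the domain and scores each partition on both dimensions. One could then ask which partitions are Pareto-optimal and whether natural systems cluster near that frontier.

    import math

    DOMAIN = [0, 1, 2, 3]  # made-up four-point 1-D semantic domain

    def sim(x, y):
        # Made-up similarity function: nearby meanings are more similar.
        return math.exp(-(x - y) ** 2)

    def partitions(items):
        # Enumerate every way to carve the domain into categories.
        if not items:
            yield []
            return
        first, rest = items[0], items[1:]
        for p in partitions(rest):
            for i in range(len(p)):
                yield p[:i] + [p[i] + [first]] + p[i + 1:]
            yield p + [[first]]

    def simplicity(p):
        # Crude proxy: fewer categories = simpler.
        return -len(p)

    def informativeness(p):
        # Expected similarity between the speaker's intended meaning and
        # a listener who guesses uniformly within the named category.
        total = 0.0
        for cat in p:
            for x in cat:
                total += sum(sim(x, y) for y in cat) / len(cat)
        return total / len(DOMAIN)

    scored = sorted(((simplicity(p), informativeness(p), p)
                     for p in partitions(DOMAIN)), reverse=True)
    for s, i, p in scored:
        print(f"simplicity={s:3d}  informativeness={i:.3f}  {p}")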



