
Can we tokenize these signals? A ripe new market for LLM-based solutions may be opening up!
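For what it's worth, the tokenizing step itself seems tractable. A minimal sketch of one plausible approach, quantizing per-click acoustic features into a discrete vocabulary with k-means (the feature file, feature choice, and vocabulary size here are all my own assumptions, not anything from the article):

    import numpy as np
    from sklearn.cluster import KMeans

    # Hypothetical input: one acoustic feature vector per click
    # (e.g. MFCCs), extracted beforehand from the recordings.
    features = np.load("click_features.npy")  # (n_clicks, n_dims)

    # VQ-style: quantize the continuous features into a discrete
    # "vocabulary" of 256 token ids.
    kmeans = KMeans(n_clusters=256, n_init="auto").fit(features)
    tokens = kmeans.predict(features)  # one integer id per click

    # Each recording is now a token sequence an LLM could train on.
    print(tokens[:20])

The hard part isn't producing tokens, it's knowing what the resulting sequences mean.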



If we could (we probably can), we'd have a fluent LLM that would generate sounds that we still couldn't understand.


It's not a completely useless idea. LLMs are pretty good at relating parallel concepts to each other. If we could annotate the whale speech with behavioral data, we might catch something we'd otherwise have missed. And since whale calves have to learn (almost) from scratch, it seems worthwhile to train an LLM alongside a real whale infant as it learns.


It's an important step, though. With a whale LLM and chatbot we'd have a tool to study whale language and communication actively, rather than just listening to their interactions passively. I can think of all sorts of cool experiments with an algorithm that can generate whale click sounds and elicit predictable replies from actual whales.


Couldn't the LLM translate it to words that have similar embeddings in our own language? Translation is one of the tasks that LLMs excel at.


Don't you actually need manual translations to feed into the model first for that? AFAIK LLMs are not sci-fi-esque magic universal translators.


Not necessarily. For example:

https://engineering.fb.com/2018/08/31/ai-research/unsupervis...

> Training an MT model without access to any translation resources at training time (known as unsupervised translation) was the necessary next step. Research we are presenting at EMNLP 2018 outlines our recent accomplishments with that task. Our new approach provides a dramatic improvement over previous state-of-the-art unsupervised approaches and is equivalent to supervised approaches trained with nearly 100,000 reference translations. To give some idea of the level of advancement, an improvement of 1 BLEU point (a common metric for judging the accuracy of MT) is considered a remarkable achievement in this field; our methods showed an improvement of more than 10 BLEU points.

That said, this specific method does require the relative conceptual spacing of words to be similar between the two languages; I don't see how that would be the case for Human <-> Whale languages.
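For the curious, the alignment step in those methods boils down to something like orthogonal Procrustes: rotate one embedding space onto the other. A minimal sketch with random stand-in data; the seed pairs it assumes are exactly what we lack for whales:

    import numpy as np

    def procrustes_align(X, Y):
        # Orthogonal Procrustes: find the rotation W minimizing
        # ||X @ W - Y||_F, closed form via the SVD of X^T Y.
        U, _, Vt = np.linalg.svd(X.T @ Y)
        return U @ Vt

    # Hypothetical seed dictionary: row i of X is a "whale token"
    # embedding, row i of Y the human word it supposedly maps to.
    X = np.random.randn(100, 300)  # stand-in whale embeddings
    Y = np.random.randn(100, 300)  # stand-in human embeddings
    W = procrustes_align(X, Y)
    aligned = X @ W  # whale vectors expressed in the human space

    # This only works if the two spaces are roughly isometric --
    # the "similar relative conceptual spacing" assumption above.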


No, translation from one language to another doesn't happen in a vacuum. There are millions of examples of text translated by humans; without them, an LLM wouldn't learn anything.


You would need to align the vectors of words in our language to the ones in their language... which requires already knowing their language (or enough of it).


From the article: “I’ve no doubt that you could produce a language model that could learn to produce sperm-whale-like sequences,” Dr. Rendell said. “But that’s all you get.”



