My guess: train a generative model to predict whale sounds, based on recordings of real ones, and hope that the resulting latent space maps onto that of a human-trained LLM. We'd need a stupidly large number of recordings of whale song, a tokenization scheme, and a few already-translated sounds/phrases to serve as anchor points for aligning the two latent spaces.
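For the tokenization part, a minimal sketch of what I mean: quantize spectrogram frames into a discrete codebook with k-means, so each recording becomes a token sequence you could feed to a standard next-token model. Everything here (function name, frame sizes, 256-token codebook) is a made-up placeholder, not anything a real project uses; serious efforts would presumably learn a neural codec instead.

```python
import numpy as np
import librosa
from sklearn.cluster import KMeans

def tokenize_recordings(wav_paths, n_tokens=256, sr=16000):
    """Crude VQ tokenizer: log-mel spectrogram frames -> k-means codebook IDs."""
    frames = []
    for path in wav_paths:
        y, _ = librosa.load(path, sr=sr)
        # Hop/window defaults are guesses tuned for human speech; whale
        # vocalizations span very different frequencies and timescales.
        mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=64)
        frames.append(np.log1p(mel).T)  # shape: (time, n_mels)
    all_frames = np.concatenate(frames)
    codebook = KMeans(n_clusters=n_tokens, n_init="auto").fit(all_frames)
    # Each recording is now a sequence of discrete token IDs, ready for
    # ordinary language-model training.
    return [codebook.predict(f) for f in frames], codebook
```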
Exactly. Also, I think an alternative model, trained more generally to identify large-scale linguistic patterns across a language, could be cross-referenced with the aforementioned more standard LLM to at least point toward possible meanings, patterns, etc.
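One cheap "large-scale pattern" signal you could cross-reference before touching an LLM: check whether the tokenized songs show a Zipf-like rank-frequency curve, which natural languages roughly follow. A toy sketch, assuming the token sequences from the earlier hypothetical tokenizer; a similar slope would only be a hint, not proof of language.

```python
from collections import Counter
import numpy as np

def zipf_exponent(token_seqs):
    """Fit the slope of the log-log rank-frequency curve.

    Human languages tend toward a slope near -1 (Zipf's law). A comparable
    slope in whale token statistics would suggest language-like structure
    worth digging into with heavier models.
    """
    counts = Counter(t for seq in token_seqs for t in seq)
    freqs = np.array(sorted(counts.values(), reverse=True), dtype=float)
    ranks = np.arange(1, len(freqs) + 1)
    slope, _ = np.polyfit(np.log(ranks), np.log(freqs), 1)
    return slope
```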
We'd need contextual tracking of what the whales are actually doing/communicating to match against the songs. An LLM would be excellent at finding correlated patterns between the vocalizations and actions, and then mapping those to similar English concepts, but all of that requires the behavioral data too. Cameras strapped to whales, maybe?
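Even before an LLM, the song/behavior correlation could start as simple co-occurrence statistics. A hedged sketch (hypothetical function, and it assumes the hard part is already done: time-aligned token/behavior pairs from tags or footage): rank token-behavior pairs by pointwise mutual information to surface candidate "this sound tends to accompany this action" links.

```python
from collections import Counter
import numpy as np

def token_behavior_pmi(pairs, min_count=5):
    """Pointwise mutual information between song tokens and behavior labels.

    `pairs` is a list of (token, behavior) tuples from time-aligned audio
    and behavioral observations. High-PMI pairs are starting points for
    interpretation, not translations.
    """
    joint = Counter(pairs)
    tok = Counter(t for t, _ in pairs)
    beh = Counter(b for _, b in pairs)
    n = len(pairs)
    pmi = {}
    for (t, b), c in joint.items():
        if c >= min_count:  # skip rare pairs; PMI is noisy at low counts
            pmi[(t, b)] = np.log((c / n) / ((tok[t] / n) * (beh[b] / n)))
    return sorted(pmi.items(), key=lambda kv: -kv[1])
```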