It's struggling with Norwegian, which I guess isn't shocking. The large model performs a fair bit better than the small one, though neither is "good".
Though I assume the amount of Norwegian it has been exposed to is fairly limited, so in that light I'm actually impressed as well.
I tried it on a news segment from the radio[1]. This is the large model's output:
[00:14.000 --> 00:17.200] En skamløs krenking av FN pakten.
[00:17.200 --> 00:24.000] USAs president og verdensledere svarer på den russiske presidentens atomtrusler og krigsmobilisering.
[00:25.500 --> 00:29.400] Arbeidsklær som er ment til å være til begge kjønn, har det med å være tilpasset.
[00:29.400 --> 00:33.400] Men hvordan ville det gått, om det var motsatt?
[00:34.100 --> 00:38.900] Dyrevernsorganisasjon vil ha digital merking av regnstyr,
[00:38.900 --> 00:44.900] men næringen selv insisterer på den gamle tradisjonsrike måten med rissing av kniv.
[00:45.600 --> 00:51.400] Mange strømselskaper er positive til å tilby kundene fastpris på strøm, og det årevis.
[00:51.400 --> 00:59.900] Da risikerer de å måtte betale mye i nettopp åretsvis, sier aktører som aldri tilbyr fastpris.
[00:59.900 --> 01:21.900] Dette er onsdagens Dagsnytten. Jeg heter Espen Ås.
For reference, here's what he actually said, from the source[1] itself:
* En skamløs krenking av FN-pakten. USAs president og verdensledere svarer på den russiske presidentens atomtrusler og krigsmobilisering.
* Arbeidsklær som er ment å være til begge kjønn, er som regel tilpasset ... menn. Hvordan hadde det gått om det var motsatt?
* Dyrevernsoganisasjon vil ha digital merking av reinsdyr, men næringen selv insisterer på den gamle tradisjonsrike måten med rissing av kniv.
* Mange strømselskaper er positive til å tilby kundene fastpris på strøm - og det i årevis.
- Da risikerer de å måtte betale mye i nettopp; årevis, sier aktør som aldri tilbyr fastpris
Dette er onsdagens Dagsnytt 18 - jeg heter Espen Aas.
The translation didn't fare that well though:
[00:14.000 --> 00:17.000] A shameless violation of the UN treaty.
[00:17.000 --> 00:24.000] The US president and world leaders respond to the Russian president's nuclear threats and war mobilization.
[00:24.000 --> 00:33.000] Work clothes that are meant to be for both genders have to be suitable, but how would it be if it was the other way around?
[00:34.000 --> 00:44.000] The animal welfare organization will have a digital marking of reindeer, but the industry itself insists on the old traditional way of tearing a knife.
[00:45.000 --> 00:51.000] Many electricity companies are positive in offering customers fixed electricity prices, and that is annual.
[00:51.000 --> 00:58.000] Then they risk having to pay a lot in just a year, says an actor who has never offered fixed prices.
[00:58.000 --> 01:20.000] This is Wednesday's Dagsnytt 18. My name is Espen Ås.
For reference, here's Google Translate's attempt, which is pretty good:
* A shameless violation of the UN Charter. The US president and world leaders respond to the Russian president's nuclear threats and war mobilization.
* Work clothes intended for both sexes are usually adapted to ... men. How would it have gone if it had been the other way around?
* Animal welfare organizations want digital marking of reindeer, but the industry itself insists on the old, traditional way of marking with a knife.
* Many electricity companies are positive about offering customers a fixed price for electricity - and for years.
- Then they risk having to pay a lot in precisely; for years, says a player who never offers a fixed price
This is Wednesday's Dagsnytt 18 - my name is Espen Aas.
Re-reading the transcription, I guess I was a bit harsh by saying it's not "good". It gets most of it right, but it keeps messing up some key words. Like "regnstyr" (not a word) rather than "reinsdyr" (reindeer), or "Dagsnytten" rather than "Dagsnytt 18".
It also didn't handle the hanging "... menn" ("men"), instead hearing it as "Men" ("but") at the start of the following sentence. Almost everyone would understand from context that it was the end of the sentence.
The double-A vs Å is not an issue, as it's the same letter; double-A is the older form.
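To put a rough number on "gets most of it right", one option is a word error rate (WER) between the published text and the model output. This is only a minimal sketch: the function names are mine, the tokenization (lowercase, punctuation dropped) is an assumption, and it is not how the Whisper authors evaluate. Note that it would count "Aas" vs "Ås" as an error unless you normalize that yourself.

```python
import re


def tokenize(text: str) -> list[str]:
    # Assumption: lowercase and keep only runs of letters/digits,
    # so "FN-pakten" and "FN pakten" compare as equal.
    return re.findall(r"\w+", text.lower())


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref, hyp = tokenize(reference), tokenize(hypothesis)
    if not ref:
        raise ValueError("reference must contain at least one word")
    # Standard dynamic-programming edit distance, one row at a time.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # word missing from hypothesis
                            curr[j - 1] + 1,      # extra word in hypothesis
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


reference = "Dette er onsdagens Dagsnytt 18 - jeg heter Espen Aas."
hypothesis = "Dette er onsdagens Dagsnytten. Jeg heter Espen Ås."
print(f"WER: {word_error_rate(reference, hypothesis):.2f}")
```

A single number hides which words were wrong, though. Getting "reinsdyr" or "Dagsnytt 18" wrong hurts a lot more than a missed "i".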
The small model was considerably worse than the large one though.
I am impressed; some of the words are not that common, such as atomtrusler, krigsmobilisering, strømselskaper and dyrevernsorganisasjon, yet it got them right.
Everything (and everyone, including myself :D ) seems to struggle with Norwegian. It seems the corpus is simply too small, and/or maybe the market is.
DeepL didn't do any Norwegian last I looked, even though it does most other Germanic languages (including Danish and Swedish).
Duolingo doesn't have a Norwegian class for Germans either, though they do have one with English as the source language.
How are you getting the transcription of the NRK episode? I am learning Norwegian and often struggle to find reliable transcriptions where the text exactly matches the audio (subtitles are often heavily edited compared to what's actually being said).
The stuff I quoted was listed as an abstract of sorts for the episode. I know NRK is very good at providing subtitles for their TV productions, but as you say they're abbreviated.
I'm guessing audio books paired with the actual books would be the best source for that? I mean, there's Mozilla Common Voice, but it's quite limited in the Norwegian department and perhaps not quite as interesting as an audio book would be.
[1]: https://radio.nrk.no/podkast/dagsnytt_atten/l_5ce3e323-97a3-... (not sure if it's available outside of Norway)