The Seamless Communication models (meta.com)
720 points by skadamat 9 months ago | 350 comments



I look forward to the day when I'm wearing my headphones in a foreign land and hearing all of the discussions in my own language.

The "universal translator" which was part of Star Trek and a lot of other Sci-Fi I was exposed to as a kid was something I was really fascinated with. My Dad worked as a simultaneous French->English translator and sadly spent long hours away from home and, as a kid, I started trying to build a translator so that it could do his work and he could be home more.

Translation is important work and one that could help a lot of people. It's my hope that we get to the point where these models work entirely on locally carried resources.


I worked on building exactly this earlier this year. I was hanging out in Taiwan for a few months and thought, surely the Babel Fish should exist by now.

I did several experiments recording from all the microphones I could on my iPhone and AirPods while out in the wild. My conclusion: it's impossible right now for that hardware given the microphones we have and what they pick up.

So much of what's spoken is at a combination of (a) high distance (b) low volume (c) background obscuration. Something that was clear as day to my ears would barely register on the mics. While context is of course an issue, the raw audio didn't have enough to even translate.

The one caveat is that there might be low-level (i.e., Apple-only) access to headphone microphones that capture the environment to do noise cancellation. I'm not sure though---I couldn't find them on any API.

For cases where you do have clear audio, existing apps (e.g., Google Translate) are so close to achieving this, but don't let you specify audio outputs with enough fine grained control. By default, it will start screaming out of your phone what you were attempting to silently translate.


Also a lot of spoken language involves context that AI is nowhere near understanding yet, let alone all the cultural baggage necessary to accurately translate/localize a lot of utterances.

"Can you stand up?" would be translated differently into Japanese depending on whether you're implying you need them to move their butt off your cell phone versus directly inquiring as to the function of their legs after a car accident. If you speak English and hear it as a background without the rest of the context being picked up, your brain instinctively knows it can interpret it either way, no problem.

But if you're Japanese and the AI picks a specific way to translate it, then you are completely unaware of the ambiguity because the AI resolved it with a 50% chance of being wrong.


>"Can you stand up?" would be translated differently into Japanese depending on whether you're implying

nitpicky, but is it though? not really. and it's as much 'difference depending on what you're implying' as there would be in english comparing just saying 'can you stand up' or specifying 'from the seat/at all'.


Probably not the strongest example but there are definitely phrases that are specific in one language but ambiguous in another.


There are certainly nuances, even when 'understood'

Google: "A bit sticky, things are pretty sticky down there."


I'm on mobile so can't find the link but years ago there was a DARPA (iirc) program trying to solve this problem in the context of surveillance in a loud crowded room. Their conclusion was that there needed to be n+1 microphones in the room to be able to cleanly differentiate all of the noise, where n is the number of noise sources, which in their case was number of conversations going on in the room (assuming no other loud sources of noise like music).

I think it's totally doable but you'd need many more microphones in order to deal with real world noise. As MEMS microphone quality improves, this should eventually be possible with a combination of smartphone/headphone/some other device like something around your neck.


Apart from the dynamic range challenges for sensing, source separation is hard. There's been a pretty long line of research into the area - see "cocktail-party problem". AFAIK it's still a mostly unsolved problem.
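
For a feel of why the textbook case is tractable but the real-world one isn't, here's a tiny sketch of classic ICA-based separation with as many microphones as sources (assuming scikit-learn and NumPy; the signals and mixing matrix are made up). It works on clean, linear, instantaneous mixes and falls apart once you add reverb, moving speakers, or more talkers than mics:

    # Toy cocktail-party setup: two synthetic "speakers", two microphones,
    # each mic hearing a different linear mix. FastICA recovers the sources
    # (up to scale and ordering) because the mix is clean and instantaneous.
    import numpy as np
    from sklearn.decomposition import FastICA

    rng = np.random.default_rng(0)
    t = np.linspace(0, 8, 4000)
    s1 = np.sin(2 * t)                      # "speaker" 1
    s2 = np.sign(np.sin(3 * t))             # "speaker" 2
    S = np.c_[s1, s2] + 0.05 * rng.standard_normal((4000, 2))

    A = np.array([[1.0, 0.5],               # mixing matrix: how loudly each
                  [0.4, 1.0]])              # mic hears each speaker
    X = S @ A.T                             # what the two mics actually record

    ica = FastICA(n_components=2, random_state=0)
    recovered = ica.fit_transform(X)        # estimated sources, shape (4000, 2)
    print(recovered.shape)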


There's also some magic to the Universal Translator and Babel Fish: they perform zero-shot real time translation.

That is, they are able to translate (in all directions) novel languages that were not previously heard[0]. It is an open question, with a likely negative answer, whether there is a universal grammar even among humans[1] (the definition itself is vague, but even the most abstract version is suspect and highly unlikely to be universal across species). I think no one will be surprised if it turns out to always be impossible to interpret an entire language from only a few words (let alone do it in real time).

This isn't a knock-down, because even a trained device is insanely useful; it's just a note about limitations and triage. This is awesome stuff and I can't wait for the day we have translation headphones. It's an incredibly complex problem that I'm sure is not short of surprises.

[0] There are a few exceptions such as Star Trek TNG's episode Darmok, S5E2, where the Tamarians' language is unable to be translated due to its reliance on cultural references (the literal words are translated but the semantic meanings are not). It's a well known episode and if you hear anyone saying "Shaka, when the walls fell" (translates to "Failure") they are referencing this episode (often not using the language accurately but who cares (nerds. The answer is nerds)).

[1] https://en.wikipedia.org/wiki/Universal_grammar


Can’t speak for ST, but did they ever say the babel fish understood languages it never heard before? I thought the galaxy was just exceptionally well-cataloged, given the HHG itself, and humans were hardly unknown.


The babel fish translated via brainwave energy and a telepathic matrix:

> The Babel fish is small, yellow and leech-like, and probably the oddest thing in the Universe. It feeds on brainwave energy received not from its own carrier but from those around it. It absorbs all unconscious mental frequencies from this brainwave energy to nourish itself with. It then excretes into the mind of its carrier a telepathic matrix formed by combining the conscious thought frequencies with the nerve signals picked up from the speech centres of the brain which has supplied them. The practical upshot of all this is that if you stick a Babel fish in your ear you can instantly understand anything said to you in any form of language. The speech patterns you actually hear decode the brainwave matrix which has been fed into your mind by your Babel fish.


“Now it is such a bizarrely improbable coincidence that anything so mind-bogglingly useful could have evolved purely by chance that some thinkers have chosen to see it as a final and clinching proof of the nonexistence of God.

“The argument goes something like this: ‘I refuse to prove that I exist,’ says God, ‘for proof denies faith, and without faith I am nothing.’

“‘But,’ says Man, ‘the Babel fish is a dead giveaway, isn’t it? It could not have evolved by chance. It proves you exist, and so therefore, by your own arguments, you don’t. QED.’

“‘Oh dear,’ says God, ‘I hadn’t thought of that,’ and promptly vanishes in a puff of logic.

“‘Oh, that was easy,’ says Man, and for an encore goes on to prove that black is white and gets himself killed on the next zebra crossing.

“Most leading theologians claim that this argument is a load of dingo’s kidneys, but that didn’t stop Oolon Colluphid making a small fortune when he used it as the central theme of his best-selling book, Well That about Wraps It Up for God.

“Meanwhile, the poor Babel fish, by effectively removing all barriers to communication between different races and cultures, has caused more and bloodier wars than anything else in the history of creation.”


I couldn't help but hear this in my mind as it was read in the voice of the narrator from the old BBC "Hitchhiker's Guide" mini-series.


I think the idea of the Babel Fish might encroach on computational complexity limits in some sense. Imagine a future "Theory of Everything" book written in an alien language. The book has a total of 1 million characters across its pages, where each character is distinct. Now the Babel Fish must be able to "translate" such a language to English given its oracle-like powers? Can it do the job?


While Arthur Dent does read some stuff throughout the series that couldn't possibly be in English like signs on an alien planet, the full nature of the babel fish is rather vague and we don't know if it would work that way. As far as I can tell, all the written text it translates for Dent is in the context of living civilizations so the babel fish has brainwave energy to feed into the telepathic matrix - presumably telepathically using the knowledge of nearby persons for the translation.

That said, given the Heart of Gold improbability drive, I don't think information theoretic violations are your biggest problems.


Well, then. Magic indeed!


The problem is you need a full sentence, plus surrounding sentences to properly translate a lot of things (aka context matters).

So no matter what, conversations in your native speech would have to be delayed before translation.


My understanding is that they trained a separate model to specifically estimate when they have enough context to begin translating, as a skilled translator would.


My mom used to do English/French translation. Her favorite example was the word "file". That word has multiple translations in French depending on the context, and that context may simply be implied by who is speaking. You may not be able to figure it out based on the conversation alone.


Even the native original version needs the proper context. Sometimes you need the entire sentence to figure out what the sentence was really about.

I'm reminded of Mark Twain complaining about verbs arriving at the very end of sentences in German (among a myriad of other complaints)

"The Awful German Language* -Mark Twain https://faculty.georgetown.edu/jod/texts/twain.german.html


Sometimes you even need a second sentence or even a few to understand what the first sentence was about.


So then we need something like neuralink to get the whole thought from one's brain first, then the sentences are processed properly for the context, then translated before the speech is delivered.


Most thoughts are in a language. There is no one underlying universal machine language for the brain.


Are most thoughts in language? This doesn't reflect my experience. Language floats on top, but there are layers underneath. You can also feel it when you end up thinking in another language. It does not go through the first one but is a thing of its own.

Pretty sure there is nothing universal there though as you say.


What about deafblind people, do they think in a language?


I think I could adapt to that. But it would be an interesting experiment.


If I am not wrong, Google Pixel buds offer live translate feature.


Not in the voice of the original speaker.


now if I could just get the pixel buds tech to remove the voice of the original speaker and translate some youtube videos from thick accent english into no accent am-english.


This is a really interesting use case. I could definitely see this as a service for content providers to get more reach and I think you could justify a subscription price for the service based on this.

By creating speaker-specific tonal ranges and profiles, you maintain better cohesion in the final product.


Obligatory, not directed at you in particular since I'm sure you mean no offense, but just voicing a pet peeve:

I grew up bilingual outside the US, and speak English with a hybrid British/Indian/Middle Eastern accent (with some of my personal quirks, and mixing increasing amounts of various American accents over time). I can understand English in nearly any accent (Singaporean, Chinese, Vietnamese, Indian, Nigerian, eastern European) as long as the words involved are globally used and the grammar is passably queen's. Especially after hearing it for about an hour. And people who natively speak English with these various accents usually can understand my English better than they can an average American accent. Yet in this country, my accent is belittled, despite being perfectly understood and more versatile. Even by others who don't speak with the American accent!

This is the problem of the "default accent" anywhere being referred to as "no accent", and therefore anything deviating being considered "having an accent". This makes "accent" a negative trait, scaling from 0-bad to heavy-bad. But if the vernacular were such that we said "American accent" instead of "no accent", then no one's accent is bad, just not what you're used to.

Most of my non-American peers who were raised on English have a better command of the language than my American ones, yet they are mocked for their accents as if they don't know the language, when in reality it's the Americans' lack of familiarity with the language (as it's used globally) preventing them from comprehending it.

So yes, put in more work, the world is shrinking and English is the global language (for better or worse). What you're saying is spoken from a position of privilege because the culture allows you to mock others' accents and imply your version of it is the correct one that everyone else should put in work to provide you with, rather than the other way around.

Every time you hear English with an accent other than British, American or Australian, remember that it usually means the speaker knows at least one entire other language as well, probably one that you would sound like an idiot if you tried to speak it. Don't be rude or dismissive of their command of English.

In fact, you were so close — you called it a "no accent am-english", when you could have just called it what it is — "an american accent".


https://www.bbc.com/culture/article/20180207-how-americans-p...

What accent? Whose accent? Brits are as diverse, accent-wise, as Americans: London, Cockney, New England, Southern...

A lot of Indians that I know have a very "proper" British accent, one that's maybe a bit aristocratic; it's quite an irony for a former colony. https://www.bbc.com/future/article/20220915-what-the-queens-...

The context matters, but so does history.


There is another way of looking at this, in the context of the parent post: we could suggest that any accent could be converted to “no accent” where American accents are converted to British, or where standard Japanese is converted to a Nagoya pronunciation. Whatever seems like your preference of “no accent”. With this interpretation of the parent post, it’s not specifically about any particular English accent. I’ve been told by others that I have an accent yet I think I don’t have one - and honestly, I think most people have either encountered this - having an accent when you think you don’t have one - or haven’t travelled enough! :)

And I mean, yes, there are people who know they don’t sound like whatever ideal accent they have in mind, and there are people who will make fun of accents - but, and I can’t stress this enough, depending on the context literally any accent can be made fun of, sadly. I’ve had people mock my “American” accent while travelling, for example. It sucks, but it’s not easy to single out any accent as “default” unless it’s literally enforced by a government and taught that way in schools. Last I checked, the US is not one of those countries and English is not as centrally controlled as e.g. French can be.


> In fact, you were so close — you called it a "no accent am-english", when you could have just called it what it is — "an american accent".

There are many american accents. Your suggestion makes the sentence much less clear.

And by specifying "american" they're already making it clear there is no such thing as a universal base accent for english.


This would carry some weight if you didn’t take an opportunity to take a shit on Americans’ English in the middle.


I appreciate your sharing, and stating that you assume I meant no offense, and that your thoughts are not directed at me specifically.

I could have been more specific, but my request for the tech to vary would, I think, lead to specific options for different people.

And actually to be even more.. not sure the word.. I want 'the Chicago accent' I think it's called, or Midwest / no accent. Personally, as much as I enjoy some entertainment from Jersey / NY accents, I would not volunteer to watch tutorials on tech taught by the Sopranos cast - as funny as that might be (and I get that if you are from the NE, you may be learning just fine being taught with such a language style).

As annoying as some of the Cali style of language is, I can understand the words and meanings without squinting my ears and spending double the brain cycles trying to understand the words, while then interpreting the meaning and trying to put together concepts for understanding new ways of coding or using tech.

I've run into folks in Louisiana that I could not understand at all and had to ask for an interpreter at a gas station. From Florida to Chicago to Seattle down to Miss and Ala - I can hear what people are saying and learn without spending lots of extra energy trying to understand.

With that being said, I understand there are parts around Miami where accents may be thicker (or not) - and with some folks, even if they're using the right words and grammar, I may need to slow down the speech to actually learn if they were teaching a class.

The slow down and speed up options already exist with youtube.

"So yes, put in more work"

- I do try a bit. I don't mind accents with some folks and media. For example, I can listen to and enjoy Shankar sharing via the 'Hidden Brain' series, partially because his accent is limited but also because the media requires less thought intensity.

I have tried many youtubes, and bought a few courses taught from folks in India and other places where I just could not muster the energy. I literally squint with my ears and feel like my head gets hot trying to decipher what is being said, translate into what is meant, and how it should create new patterns of understanding in my brain.

I can only do that for so long and I am done. Now I just skip any learning video that has non-am English speakers. When I consider courses to sign up for or buy, I have to research the authors / speakers and find video of them to hear the audio, because I just can't learn well that way.

"other than British," - True story, a few years ago I had to call an ISP in Britain(?) and the person I got to to file an issue with, I could not understand them. I had ask 'what did you just say' many times. I laughed at myself for even thinking of saying 'can you slow down and speak clearer English please' - I mean, crazy... I was paying by the minute for the long distance at the time and it ended up being a 25 minute call that could of been 10 if I had a magic translate without accent device.

"a position of privilege because the culture allows you to mock others' accents"

- This is truly not about mocking accents, this is truly about my lack of ability to learn well.

Yes, I would definitely sound like an idiot trying to speak another language. Like I said, I do not learn as well as some others.

Truly not my intent to be rude. I apologize if the shortness came off that way, I was trying to be brief in the hope that there's a chance that some tech like this exists and someone here could point me to it. Before I posted, I DDG'ed it and found a couple of things attempting to be in that space with a 'speak to sales' type of 'you'll never afford this' button for info.

I will never be dismissive of anyone's command of English, or other spoken language, or computer language or anything like that. There is no way for me to know someone else's situation and circumstances led them to their current command of whatever language. If someone is trying to learn more at any age; I applaud and encourage them - being rude or dismissive does not encourage more learning.

"no accent am-english", when you could have just called it what it is — "an american accent". - Well maybe, but actually I meant to be more specific, as mentioned a bit above - I mean '"no accent" American accent' - because there are plenty 'American accent' types that I would want removed by a magic earpiece to make it easier for me to understand and learn.


Hold on: there is a midwestern "accent" that is generally considered "no accent at all", just "generic American", but there is most definitely a distinctive "Chicago accent" that is not that. Just Google [da bears snl]. It's its own whole thing. I have aunts that speak in it, and you could instantly locate them from it.


Indeed, and I believe there are also Chicago slang words and terms that would be very city-centric - that would not be spoken in a majority of the midwest (I am guessing)

I believe I had heard the term 'the Chicago accent' as a radio-broadcasting term, one that may have come more from 'the Chicago market' or the mid-section of the US.. not meaning specific urban/city speech but the market segment for that part of the country, as opposed to the east coast market, etc.

Looking at the top 12 radio markets; New York, Los Angeles, Chicago, San Francisco, Dallas-Ft. Worth, Houston, Atlanta, Philadelphia, Washington, Boston, Detroit, Miami

If you pick any of those and think of broadcasting that city's base accent to the rest of the country.. I could see most of them annoying the other cities.. maybe Philly being a close 2nd.. and I don't think anyone can determine what a base accent for DC is.. that would depend on which neighborhood, or however you'd take the median of that.. (and I have no idea what the accent / tone / language of Frisco is in general, never been, and maybe it was different 20 years ago vs today)


There is a very specific Chicago accent and it is distinct from the Midwest accent. It is not so strong as to be harder to understand if you aren't from here, but it does exist.


I appreciate the thoughtful reply. I don't think you're rude, and I get what you're saying as someone who thinks a lot about accents and languages. However, I still think you missed my point.

There is no "no accent". An accent is a baseline feature of intelligible human speech, like a voice, or a volume, or a language. You can't say stuff without those features. When you say "the Chicago accent", or the "Midwest accent", that's an accent! Not "no accent".

I understand it's common usage to refer to the default "radio accent" as "no accent", but in a country like America, all kinds of people with all kinds of accents speak English. Reinforcing an expectation that a certain (usu. majority-white-spoken) one is the "default" by referring to it as "no accent", implicitly suggests all others are erroneous affectations, even if I trust that is not your personal intent.

All that said, I think your idea for a translation device capable of revocalizing what is said with an unfamiliar accent into one you are used to is not a bad one, and likely easier than translating between languages while retaining expressiveness.


> reinforcing an expectation that a certain (usu. majority-white-spoken)

Wow, you just keep digging in don’t you? When these Americans you deride say “no accent”, do you think they are referring to the “majority-white-spoken” Scottish accent?

No, of course not. Get that race baiting out of here.


[flagged]


I don’t think it’s unreasonable to remind people that a “default” accent does not exist, and that AI-editing an accent out starts to feel a bit like dystopian identity erasure and homogenization. Even if we scope ourselves to Americans speaking English as a first language, there are dozens of diverse accents across the country.


I think this is one of those times when my Mom, understanding my desire to be understood and to ask questions about motives and related understanding, would observe the, oblivious to me, effect of inflaming the conversation and say, "Charles, this is not the time." :-)


I don’t like seeing a comment that’s relatively reasonable get greyed out just because it grinds somebody’s gears. Alas, I only have one counter-downvote to give, so I feel obliged to comment.


My original statement was wanting a translator device, hardware or software, so I could understand and learn better.

There was no desire for identity erasure or homogenization; leave whoever's voice the way it is online, just give me an option to translate it. I added more about my issue downthread.

Diverse accents across the country - absolutely! Which is why I said 'no accent am-english' (for me, as I can't learn well outside that) - and assuming this tech exists, it could help me, and perhaps be tweaked to change to other accents for other people.. also mentioned in a downthread reply.


When I see the phrase "heavily-accented English" I think of Scottish and Geordie speakers. Extremely few of the Indian/Nepalese/Bengali/Pakistani speakers I have known have such a strong accent that it makes it difficult to understand. At least the younger ones.


Did you actually read the comment, or the part that I quoted? It wasn't just a reminder that a "default" accent does not exist - that would have been fine. It was a full-throated condemnation of a person without evidence. That's quite a bit more than just a reminder about default accents.


It would be really cool as an assistance in practicing correct pronunciation and accent. Hearing your voice saying it right and then hearing how you actually said it the last time you tried might help you to get both into alignment.


Another lesson we can learn from sci-fi is that very often the different species on a planet each have their tribal / local languages and dialects but all speak a common tongue. I think this is the more humanizing approach, rather than delegating even more of our fleshly processing power to machines.


This seems to be what is happening in Europe (and perhaps more generally across the globe), with English being the common tongue.

Question is, what will happen to the tribal / local languages? Will they survive?


Historically, we've seen the larger languages build themselves up by intentionally stamping out the teaching / use of smaller local languages. France banned some regional languages from appearing on broadcast television for years, etc.

This might be required to get full buy-in for a unified language, which is a bit sad but makes some sense - if you ensure it's taking up more and more of media and culture, more people know it from immersion, and other languages are reduced to being spoken at home / with friends, and that's going to cut into how many people really are fluent in them.


It varies. A lot of local languages have gone extinct already. There are linguists hard at work trying to document / record dying languages, but it won't be the same as living the language from childhood.


then of course, there's always Darmok and Jalad at Tanagra


I'm wearing the Ray-Ban Meta right now and they are already mind-blowing; I can already talk to the Meta AI assistant seamlessly. I bet one of the future iterations will have exactly this.


Curious, what do you ask it besides take a picture / video or what's the weather?

I have a pair and have only asked it that so far...


Whenever I have a question that I'd previously have pulled up Bard/ChatGPT for, if I'm wearing my glasses.

Kind of like having an expert next to you all the time.


I look forward to the day when that problem is solved by a company that doesn’t mine my data to sell ads.


how am i supposed to talk shit with my friends about other people in public then


Speak in metaphor and/or code.

I’ve been in mixed language communities in which I wasn’t sure who spoke what, and I have found this to be quite effective when done right.

Good time to reference the ST:TNG "Darmok" episode and quotes like "Darmok and Jalad at Tanagra".


Cincinnati when the Turkeys fell.


I'm curious to know how well these models can pick up slang. Maybe if you talk shit in as thick a slang as you can it won't be able to give a good enough translation.


I'd love to see a map of how it matches up to regional English/British accents and their slang.


With my bi/trilingual friends who speak the same languages, we intermix them to make our point more clear. Don’t think models will be good enough for mixes for a few more years, so we’re safe!


Can you show us an example of such a sentence?


Hm, think of things like “On va bruncher” (we’re going to brunch). The word “brunch” doesn’t exist in French, but we add suffixes to fit it into the sentence. Very common in Montreal. My French isn’t good enough to do that on the fly, but my francophone friends do it all the time.

In my other languages that I am actually fluent in, it’s kinda the same — you use specific suffixes to soften or embolden your point and so on. Maybe add “exclamation making sounds in specific language” too. Eventually your nouns and verbs end up in different languages, with different suffixes where it “makes sense”, yet the person whom you’re talking to will “get it”.

Would be curious to try the new Seamless model on such speeches.


This is extremely common for every new technology: “upload,” “download,” “stream,” “google,” “FaceTime,” most code patterns, all the new ML apps, “venmo” or whatever the name of the app you use for payment, etc. all of those are taken as is, slapped a verb termination and it’s good enough. That’s true in German, Danish, Dutch, French, Italian, and Spanish.

The only thing that doesn’t work is if you talk to people too young to remember Skype. Then you feel old.


Ok but that's simply mixing multiple languages by means of neologisms, loanwords, as in Franglais, Spanglish, Taglish (Filipino-English creole, as seen in www.inquirer.net) etc. (which are technically creoles, as seen daily in web posts and text).

Verbing a noun is pretty simple; French borrowing English nouns as loanwords then verbing them is another. I was expecting something intermixed and trilingual.

"The word “brunch” doesn’t exist in French": well not the official French language per the Académie Française, but functionally it does, once everyday French-speakers start using it [0] "Many were quick to point out, as Reuters reports, that French President Emmanuel Macron commonly uses English idioms, including “start-up nation” and “bottom-up.”". I'm guessing Québecois people are even more fluid about this, since they have to be functionally bilingual(/trilingual) in daily interactions, and France's Toubon law doesn't apply to them.

For example, it would be interesting to chart the ratio of the Académie-mandated 'courriel' vs 'email' in majority-French-language posts in various regions. [1]

[0]: ["How France Tries to Keep English Out of Public Life" (8/2019)](https://news.ycombinator.com/item?id=20730219)

[1]: [Should You Use "le Courriel" for "Email"? (2019)](https://www.thoughtco.com/le-courriel-vocabulary-1371793)


Sorry, I missed this comment! Here's how I usually talk with my friends who are also fluent in English, Russian and Turkish, I think it would be a better example:

Ne smotra na fact that tomorrow bir o kadar da sunny diyildir, bilo bi horosho progulyatsa around the seawall.

(Despite the fact it's not really sunny tomorrow, it would be nice to go for a walk around the seawall)

It's very weird to type it out, as we only use it when we speak. At least, when I type, I tend to think in one language, so there's less organic mixing happening. Since it's a speaking-only thing for me, I tend to add the usual filler words, suffixes for superlative forms, etc., but from all three languages. Apologies for using French/English, as that didn't convey my point properly.

It's not about loan words, it's exactly just mixing random languages together as they roll off your tongue. I know it sounds stupid, but I've been talking to my siblings that way my entire life, so it's very "natural" to me. Most of my university friends are at least bilingual as well, and from what I've been told, they do the same with their friends in their own languages.


That example is much better, thanks.


I would think this model would fail with heavy Québécois lingo, as opposed to standard French.


Cockney English and French Verlan come to mind.

I don't know about cockney, but verlan is very much alive.


Reinventing polari is certainly one way to make yourself less understood...



learn Klingon?


Klingon is definitely going to be in the top 50 languages covered…


Babel Fish


> so that it could do his work and he could be home more.

That's a nice way of saying unemployed.


Can't wait for someone to roll a language tutor out with this tech.

Everyone gets a personal tutor for hours a day.

I would absolutely love a VR game where I just need to work in China or Mexico all day and pick up the language that way.


This is what I'd like to build (the tutor part at least, not the VR game part yet). I'm planning to extend my current English only rough prototype[1] to support Mandarin. (I happen to be learning Mandarin myself at the moment, and there are a bunch of open source bilingual Mandarin LLMs and speech synthesizers from China to choose from.)

I think a lot of people are working on similar things right now. I know of one called http://yourteacher.ai

[1] https://apps.microsoft.com/detail/9NC624PBFGB7


Is there a high quality speech synthesizer (ideally local) for Mandarin you have found? There are some subtleties with tone sandhi rules and how they interact with prosody that I feel are lacking with current TTS voices I’ve tried.


I love the idea of LLMs being super-efficient language tutors. And you have a good point; coming soon: "We've been getting a lot of these tourists here lately, they're eerily fluent, but all seem to have the same minor speech impediment" (read: messed-up weights in a commonly used speech model).


I've been using ChatGPT 4 to translate and explain various texts in Mandarin and it's been very on point (checking with native speakers from time to time, or internet searches). As expected, it has trouble with slang and cross-language loanwords from time to time. However, for languages with much less information online, it hallucinates like crazy.

> coming soon: "We've been getting a lot of these tourists here lately, they're eerily fluent, but all seem to have the same minor speech impediment"

Haha, if that were to pass, that would still be a far better outcome than our current situation of completely blind machine translation (especially for various Asian languages that are very sensitive to phrasing) and mispronunciation by non-native speakers.


> all seem to have the same minor speech impediment

Ah, that is called an accent.


Kind of. Accents are typically derived from the intersection of natural languages, specifically which ones you learned the phonetics of first. (With the exception of the Mid-Atlantic accent...)

This would be something quite novel, as the speech irregularities would not have their origin in people.

I don't know what you would call it but it needs at least some adjective before accent to differentiate it IMO


The first one I plan to try is https://github.com/netease-youdao/EmotiVoice

I don't have the expertise to judge the quality of Mandarin pronunciation myself, being a beginner. But it sounds OK in English and it's made by native Mandarin speakers in China so I expect that it sounds better in Mandarin than English.


Sounds pretty good, although still lacking in natural-sounding tone sandhi (e.g. try 一下, it should be yi2xia4 instead of yi1xia4).


Do you have a favorite Chinese learning app?


the azure neural tts voices in chinese are the best i’ve heard, specifically the “xiaochen” voice. i use it in anki daily to generate sentences for my mandarin decks with an api key/plugin. it’s not something you run locally of course, but they have a decent enough free tier.

i’m hoping a voice as realistic as this becomes a local app soon, but i’ve not found anything that’s nearly as natural sounding yet. (also, honorable mention to chatgpt’s “sky.” she pronounces mandarin with a funnily american accent, but it sounds natural and not as robotic as the open-source alternatives i’ve tried)
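
for reference, generating a clip with that voice through the speech sdk is only a few lines. rough sketch, not my anki plugin setup; the voice id below is my best guess at the "xiaochen" voice, and the key/region are placeholders:

    # Synthesize one Mandarin sentence to a wav file with the Azure Speech SDK
    # (pip install azure-cognitiveservices-speech). The voice id and the
    # credentials below are assumptions/placeholders.
    import azure.cognitiveservices.speech as speechsdk

    speech_config = speechsdk.SpeechConfig(subscription="YOUR_KEY", region="YOUR_REGION")
    speech_config.speech_synthesis_voice_name = "zh-CN-XiaochenNeural"
    audio_config = speechsdk.audio.AudioOutputConfig(filename="card_audio.wav")

    synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config,
                                              audio_config=audio_config)
    result = synthesizer.speak_text_async("我们明天去公园散步。").get()
    print(result.reason)   # expect ResultReason.SynthesizingAudioCompleted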


To me the key functionally for any language learning app is giving you feedback on your pronunciation and general understanding. I’ve been using Duolingo to learn Mandarin and when I try to speak to anyone it’s difficult for them to understand me, because my pronunciation is all wrong. The app is just feeding info to me one way, and I can try my best to recreate what I’m hearing, but there’s no way to know if I’m messing it up. They do have a speaking feature but it doesn’t work very well, certainly not to the same level as speaking with a real person who is fluent in the language and having them correct you.


As a quick solution, you should try recording yourself speaking and then listen to it to check your pronunciation against some reference. So for example, find a YouTube video in the language you're learning that also has good subtitles (use https://filmot.com/ ) and listen to how they say the phrase and then record yourself saying the same phrase and play it back and compare.


I practiced for a long time using the below pronunciation trainer and I get a ton of compliments from native speakers on how accurate my pronunciation is.

https://fluent-forever.com/product/fluent-forever-pronunciat...


It's the same struggle language learners have faced for a long time, app or no app. I did careful studies of French grammar, read French books, listened to audio tapes in French, and I still got pronunciation wrong often.

I resisted using Duolingo because I knew their speaking feature sucks. But the only reason I need an app rather than books or audio tapes is that I need something to correct my pronunciation.


There are other language learning apps, such as Busuu, which make you record and peer-review other people's pronunciations.


I would love a game that helped you learn a language (not necessarily VR though as I don't have that equipment). The game drops you into a world (a country of the language the game is meant to teach you) where no one speaks your language and you have to figure out what people are saying in order to fulfill quests. You get some hints, like maybe you have a simple translation guide in your inventory or sometimes you meet people who can speak a few words of your language. That would motivate me to learn faster than self-taught tutorials.

I'd love to learn French and the game would take place in locations all around modern France.

It would have to have a good story. Maybe something in the style of the Professor Layton series could be interesting, or something more open world.


If Professor Layton itself has a French translation then you're more than half of the way there! Existing games are already quite good for language learning. But indeed they're missing the "realistic" element that you're after.


I think it would be so ironic if advanced AI ended up simply teaching us new languages quickly instead of translating for us.


Finally Esperanto has a use case!


Might be able to generate a better language than what we have.


Good point. Maybe they invent a better language and easily teach it to everyone.


Absolutely, what I've noticed is that the current apps are great for beginners but after a certain point the only way to improve your ability to speak a new language is to well... speak it. I built Proseable to help people move beyond the generic how to order a coffee or ask to go to the bathroom, and have more meaningful conversations in the real world. Check it out!

https://www.proseable.com/


There are already a few of them. Check out https://hallo.ai


I wouldn't feel good about anything that's not focused on a single language.

You end up with the Duolingo problem where you know to say the names of 20 different fruits but not how to introduce yourself.


> You end up with the Duolingo problem where you know to say the names of 20 different fruits but not how to introduce yourself.

Not sure if this is a Duolingo problem. There are modules in Duolingo specifically for saying your name. I think it's the travel module.


(The Duolingo problem, AIUI: Duolingo is designed around the premise that, by exposing your subconscious to a small set of words and phrases in target languages, your brain should be able to trivially construct output shims from Universal Grammar, which must exist, to the desired languages; but that doesn't work in practice and you end up with the small set of words and phrases your subconscious has recorded.)


Never seen that in Duolingo. It starts with the basics and phrases, not random useless vocabulary.


I was going to Italy and started using Duolingo to try and help. I learned such useful phrases as "the children have bread".


Duo has a different problem for me. The lack of focus means some languages don't get features. Chinese still doesn't have Stories (there's an unofficial version of it, but we've been waiting years).


Duolingo's problem is not that they have a bunch of languages; it's that achieving fluency in a target language is about being able to produce/generate phrases, and they just have you consume and sort words and phrases. With any AI language tutor, the student must produce phrases in order to practice, and that moves them along the path to fluency.


I built one for people in Latam to practice languages in a conversational way through a WhatsApp chat https://wa.me/+5491162951713?text=hola%20Speakeasy


https://chat.quazel.com teaches languages through speaking and having specific goals on where to steer the conversation. It's actually pretty nice. It has many language combinations to learn, e.g. a French speaker learning German. They seem to use LLMs and text-to-speech for now, but I'm sure they'll get there with this tech progressing so fast. Duolingo is from yesteryear in my opinion; learning by speaking is way better.


I built just this a month ago with the Azure AI speech API, which is already pretty good at multilingual speech.

https://github.com/adrianmfi/gpt-tutor

I look forward to testing whether switching to Seamless can improve it further; Seamless supporting nearly 100 languages is a nice improvement.
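
For anyone curious what the Azure side looks like, the speech-translation flow is roughly this shape. This is a hedged sketch, not the code in the repo; the key/region are placeholders and the language pair is arbitrary:

    # One-shot speech translation with the Azure Speech SDK
    # (pip install azure-cognitiveservices-speech). Listens on the default
    # microphone, recognizes English, and returns a French translation.
    import azure.cognitiveservices.speech as speechsdk

    config = speechsdk.translation.SpeechTranslationConfig(
        subscription="YOUR_KEY", region="YOUR_REGION")
    config.speech_recognition_language = "en-US"
    config.add_target_language("fr")

    recognizer = speechsdk.translation.TranslationRecognizer(translation_config=config)
    result = recognizer.recognize_once()

    if result.reason == speechsdk.ResultReason.TranslatedSpeech:
        print("heard:     ", result.text)
        print("translated:", result.translations["fr"])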


> game

Yes! Better yet, you're a spy, or a hostage negotiator, or the leader of any kind of enterprise (army, business, aid organization) ...

Programming games like that will resemble directing improv theater. You can't program every response; you'll have to instead fit each character with beliefs and motivations.

I can hardly wait.


For Language Acquisition, Input Is All You Need. (Mostly)

What would be really cool is something that can autodub videos or audio into your target language. The hardest problem learning languages that aren't English is often finding content to consume in them.

Disclaimer: I am a Krashenist, so this take is biased


Isn’t having the AI do it for you better than having the AI teach humans to do it?


Sure, if you're not into personal growth. Not everyone wants to become the useless bit of lard sitting in a chair while a computer does everything for them. Yet. Some of us still like to do the actual things, but just need some assistance along the way. We still have a bit of time before we're all the humanoids from Wall-E


> We still have a bit of time before we're all the humanoids from Wall-E

Obligatory reminder that the movie itself explains that people are what they are not because of their lifestyle, but because of the time spent in low-gravity environment.


not sure that really matters to the point


Yeah thats why I mill my own grain and am getting into textiles.


I love it when people use these pathetic extreme examples when they don't have any meaningful arguments.


That isn't an extreme example at all, people used to mill grain and make clothing by hand, now we don't. We somehow are not sitting around getting fat even though technology takes care of those tasks.

The parent's suggestion is that if we don't have to learn languages, that will lead to us all lying down drinking Big Gulps while robot slaves take care of us. Their take is the extreme example. People have literally made this same suggestion about every technological advance and it never comes true.


Not necessarily. It depends on the use case. For taking a vacation, having an AI that can instantly translate to your native language would be amazing. That’d solve a lot of real world problems, no doubt.

However, translation has a great deal of subjectivity embedded in it, particularly when there aren’t 1:1 translations. Case-in-point: there are many English translations of the Christian bible, all similar enough, but there are enormous variations in some cases. And there are at least as many branches of Christianity as there are English translations of the Bible. Some of them strictly recommend the same translation, and they still disagree on the meaning of various passages.

Besides the problems inherent to translation, learning another language gives you another paradigm of thinking. The words we use, the way we construct sentences, etc., all impact our view of the world. Here’s a paper that discusses the impact of the over-reliance on English in cognitive sciences, and how this has downstream effects: https://www.sciencedirect.com/science/article/pii/S136466132...

Learning languages as an adult also has protective benefits. It reduces the probability of Alzheimer’s (maybe dementia, overall?).


In the way that watching porn is better than having sex.


It depends on what your goal is; for some tasks it's possible that getting the AI to do it is best, but, e.g. the existence of auto-pilot doesn't mean that hobbyist pilots wouldn't benefit from/enjoy exercising the same skills manually.


Even a perfect human translator following you around wouldn't be anywhere near as good as knowing the language yourself.


Maybe prior to fluency, for something like an odd business or tourist trip.

But there's a point in language learning where you can come to express yourself directly in a new language without intermediary "thinking" in your first tongue. The communicative and expressive potential of that mode is much higher than trying to squeeze one's intent through any kind of translation, machine or internal.

Plus, you know, it's fun.


Started a project to do this a while back. It's pretty fleshed out:

https://www.parcero.ai/

I could integrate this instead of Polly pretty easily.


seen a lot of these, but none for Indian languages. Would love to try an Indian language one!


Are Indian languages hard for English speakers?


I'm learning Hindi and there are some things that are easy (a phonetic alphabet, nothing like 7 different sounds for 'ough'), but the sentence structure is very different and can be hard to get right. Pronunciation isn't too bad for the most part, but there are a few tricky things, for example four different 't' sounds and four different 'd' sounds. The hardest part is that there really aren't that many resources. Even though Hindi is the third most spoken language in the world, you will find far more resources for many of the less spoken European languages.


and the language tutor company could have you pilot around a menial labor droid while you are learning...


But will people use them?


Seamless Streaming looks really promising! We just had a new employee start a few months back with profound hearing loss and our company had no idea what to do with him from an accessibility standpoint. They threw out solutions like Dragon, not realizing those solutions are not real-time.

He ended up rolling his own solution by standing up Whisper in one of our clusters and writing a basic front end and API to take his laptop’s mic input and chunk it every few seconds to send to the model and get back text in pseudo-realtime. We got him a pretty beefy Alienware so he wouldn’t be tied to the cluster GPUs. I can’t wait to see what he does with these new models!
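
For anyone who wants to try something similar, the core of that kind of chunked mic-to-Whisper loop is only a few lines. This is a rough sketch assuming the open-source whisper and sounddevice packages, not his actual code; the chunk length and model size are arbitrary:

    # Record a few seconds from the default mic, transcribe, repeat.
    # pip install openai-whisper sounddevice
    import sounddevice as sd
    import whisper

    SAMPLE_RATE = 16_000        # Whisper expects 16 kHz mono
    CHUNK_SECONDS = 5

    model = whisper.load_model("base.en")

    while True:
        audio = sd.rec(int(CHUNK_SECONDS * SAMPLE_RATE),
                       samplerate=SAMPLE_RATE, channels=1, dtype="float32")
        sd.wait()                                    # block until the chunk is captured
        result = model.transcribe(audio.flatten(), fp16=False)
        text = result["text"].strip()
        if text:
            print(text)                              # a real front end would push this to a UI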


Just wanted to say you’re a great employer to be so incredibly accommodating to the point you get them an Alienware and let them roll an accessibility solution

We need more support for employees like this!


Second this!

Also, what about Apple's latest M3 series chips? Are these in the same realm as Alienware in terms of AI compute?


I think the general consensus on Apple Silicon is that they're great _for a laptop_, but still aren't going to beat a dedicated graphics card + high-end CPU like an i9/Ryzen 9. The biggest thing going for Apple is performance/watt, though, which is critical for a laptop.


I think this is missing the main reason to use Apple Silicon, which is that your dedicated graphics card probably has 24GB or less of RAM, whereas e.g. an M2 Ultra Mac Studio can have 192GB of RAM with a far superior memory bandwidth to anything on x86. This is important because even a "small" LLM like Llama2 13B would require quantization to fit in the 24GB RAM that the dedicated graphics card will give you, whereas the Mac could run Llama2 70B without quantization (at FP16).
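
The arithmetic behind that is just parameter count times two bytes per FP16 weight, ignoring KV cache, activations, and runtime overhead:

    # Back-of-the-envelope weight memory at FP16 (2 bytes per parameter).
    def fp16_weight_gib(params_billion: float) -> float:
        return params_billion * 1e9 * 2 / 1024**3

    print(f"Llama2 13B: {fp16_weight_gib(13):.0f} GiB")   # ~24 GiB, already tight on a 24 GB card
    print(f"Llama2 70B: {fp16_weight_gib(70):.0f} GiB")   # ~130 GiB, fits in 192 GB unified memory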


Whisper doesn't need that much RAM though.


They're better than most consumer x86 CPUs but worse than using a GPU. Where they shine is when the ML model can't fit in the GPU's VRAM, since you have better options for RAM size with Macs.


They definitely are in terms of energy efficiency


>Just wanted to say you’re a great employer to be so incredibly accommodating to the point you get them an Alienware

So gracious, to give a software developer some hardware to run the software they need to work, that costs a whopping nothing more than what other people in the industry get on the average.

>and let them roll an accessibility solution

"You're such a good employer! You let your employee build their own accessibility ramp to the back entrance in their own time, and even got them a mortar spatula to do so!" We need more support for employees like this!

>We need more support for employees like this!

And less support for employers like this.


Not sure why you're being downvoted. Literally the equivalent of building your own ramp.


I didn't downvote, but I considered doing so because nowhere that I saw in GP does it say in his own time, and that's a critical piece of the equation. Hallucinating that datum means they got the argument wrong, and worse they were harshly critical of the company based on that wrongly assumed information.

It reminds me of the Homer Simpson quote, "I don’t mind being called a liar when I’m lying, or about to lie, or just finished lying, but NOT WHEN I’M TELLING THE TRUTH!" I would be equally critical if it was warranted, but when it isn't it's deeply unfair to the accused.

If the person wanted to build their own ramp, and the employer let them do it on the clock, that's a completely different scenario than the employee having to come in during their off-hours to build the ramp just so they can go to work.


Yeah, it wasn’t on his own time. He had a full budget and this was right in line with stuff he had already done research in anyway, so he just went for it.


> He had a full budget and this was right in line with stuff he had already done research in anyway, so he just went for it.

So you're saying, not only you didn't pay him extra, but that the company got to benefit from him building the system as it was already in line with your other projects?

Unless his working hours were reduced, he did it in his own time.

Unless his pay was increased, he did it in his own time.

Unless the expectations for him were scaled down, he did it in his own time.

Merely allowing them to hack on a project that is in line with their work is exactly like having someone build their own ramp because "it's already in line with other construction they did on the project".


>nowhere that I saw in GP does it say in his own time

Nowhere did I see that GP said the employee was paid extra, had the expectations on other projects reduced in writing and deadlines shifted, or had his working hours reduced at same pay.

Saying "oh hey, you can work on this during 9-to-5 as long as you get your other shit done on time" means the project was done in his own time.


>Not sure why you're being downvoted. Literally the equivalent of building your own ramp.

Because we are on Hackernews, where everyone likes to think themselves a scrappy startup owner, and not a person with a disability who might need accommodations from one.


Awesome! I love hearing about places making the effort to be inclusive.

As someone who’s profoundly deaf myself, another less technical approach is to install Rogue Amoeba’s Loopback, and use it to pipe audio from a given app into a tool like Google Meet or Otter.ai using the Loopback device as the audio source. This effectively provides real time captions for anything running on your existing machine.


Google Chrome [1] also has captioning built-in [2], so this could also work from a plain page that hooks into the loopback device. Pretty sure it's using the same speech-to-text backend that Google Meet uses.

The nice thing about Chrome feature is you can move the caption box around and keep it in the foreground while doing other things, although styling options seem limited (the text might be a little small for some).

[1] on desktop, not sure about mobile

[2] via chrome://settings/accessibility -> Live Caption


We definitely explored using these tools, but we’re constrained by government sponsor rules regarding data protection in our day to day work. We can use ZoomGov captions, but most of the other tools weren’t approved. It looks like Windows 11 has a real time solution of some kind, but we’re still stuck on 10.


I have profound hearing loss and rely on the Windows 11 captions. They are absolutely best in class and head/shoulders above any other automatic captioning I've used. My coworkers have a variety of accents (Hispanic, Eastern European, South African) and it does a great job with all of them.

Additionally, it supports multiple languages (only one at a time sadly), so I also use it for Japanese captions and it's equally great there.


Clever use of Google Meet as a tool! Also, Google Pixel phones now provide realtime captions to any speech playing on the phone (Accessibility > Live Caption). You can also choose a "preferred language" and the captions will be automatically translated to that language from other languages.


Whisper is pretty good for speech to text, and can be run in a resource-constrained environment. I tried a demo running in a browser using WASM on my phone, and even the tiny model is not bad.


>Awesome! I love hearing about places making the effort to be inclusive.

The extent of the effort being getting their employee a slightly-more-expensive-than-average tool that would enable them to do their job better regardless of the disability?

Such inclusive, much pat-yourself-on-the-back, wow.

"We gave our woodworking shop employee a quality saw so that they'd make their own accessibility ramps!"


I have literally been told in job interviews that the company would not be “allowed” to hire me because I’m hearing impaired, so yes, making an effort to support an employee’s disability and their needs is worth recognizing.


>I have literally been told in job interviews that the company would not be “allowed” to hire me because I’m hearing impaired,

That is literally grounds for a lawsuit.

>so yes, making an effort to support an employee’s disability and their needs is worth recognizing.

...if the effort were actually made. Which is not the case.

Merely being compliant with the law is a very low bar, and we should recognize that they did not do anything beyond what the ADA requires them to.


What would you have them do instead?


>What would you have them do instead?

Why instead? They didn't do a bad thing, just nothing beyond what the law requires.


>What would you have them do instead?

A simple answer for you, BTW:

Pay a student $15/hr to transcribe all meetings in real time for the employee.

Or pay money to professionals in the field to set up a solution. You know, people who get paid to do just that.

Jesus.


So what? Okay, in the case of a ramp, if you need one you probably are going to have difficulty building one. So pay employee Sally to build it instead, absolutely.

But hearing loss does not impair standing up servers and software. They can pay the employee who probably is the expert at this, the guy with the hearing loss, or go task Emil to go do it to ... avoid 'appearances'?


> They can pay the employee who probably is the expert at this, the guy with the hearing loss,

Aaaaaaaaand who said they paid him?

Not the OP, for sure. It seems like the disabled person set the system up in their own time to help the company communicate with him in meetings.

>or go task Emil to go do it to ... avoid 'appearances'?

They could hire Emil to live-transcribe the speech into text.

They could hire the services of a professional company that would set up speech recognition for their meetings.

They could have done a lot of things.

What they did is nothing. Gee, a software company gave their employee a laptop, how totally unheard of!

Why are people here so willing to pass this off as some act of charity?


The problem with Whisper is it's not really optimized for command recognition versus general dictation.

- Whisper processes 30 second audio chunks. So if you process 5 seconds of audio you have to pad it out with 25 seconds of silence. Hence a loss of efficiency with wasted CPU / GPU cycles on 25 seconds per chunk in the case above.

- Whisper most likely can't handle hundreds of commands, much less a thousand, performantly.

- Whisper doesn't handle short commands with a high degree of accuracy when post-processing commands out of free dictation utterances.

Command dictation should be weighted higher than general dictation when decoding.

I work with a little under 1,500 commands in Dragon NaturallySpeaking. DNS is hot garbage as a program, despite having the best accuracy to date along with the feature of commands and dictation in one utterance. You get to pay $750 for the privilege.

I've yet to see a free and open source speech recognition engine that can handle both dictation and commands with a high degree of accuracy.

Please please let me know if there's alternatives out there. I would definitely pay to support an open source project like this that focuses on command and dictation.

Most open source solutions out there nowadays focus so much on IoT command recognition with intents. That's not well suited for controlling your computer with grammars containing voice commands.


Is 30s the input size set by the model, or programs that wrap the model? Is it how it's trained?


It's a property of the model itself.

> Input audio is split into 30-second chunks, converted into a log-Mel spectrogram, and then passed into an encoder.

https://openai.com/research/whisper
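For illustration, a minimal sketch with the openai-whisper package that makes the fixed window visible ("command.wav" is a placeholder for a short utterance):

    # Even a ~5 s clip is padded to Whisper's fixed 30 s window before the
    # log-Mel spectrogram is computed, which is where the wasted cycles go.
    import whisper

    model = whisper.load_model("tiny")
    audio = whisper.load_audio("command.wav")   # decoded to 16 kHz mono float32
    audio = whisper.pad_or_trim(audio)          # always padded/trimmed to exactly 30 s
    mel = whisper.log_mel_spectrogram(audio).to(model.device)
    result = whisper.decode(model, mel, whisper.DecodingOptions(fp16=False))
    print(result.text)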


Check out Willow! It does essentially this, using WebRTC. It doesn't handle the near-real-time response yet, but it does stream the audio to the server and the change would be pretty minor.


> Check out Willow! It does essentially this, using WebRTC. It doesn't handle the near-real-time response yet, but it does stream the audio to the server and the change would be pretty minor.

Simple voice-to-text is not what's needed for dictating commands. But if I could load commands on the fly and decode utterances against them, that may be useful.

The client would need to be able to send its commands to the server on the fly.


I recommend checking out: https://talonvoice.com/


It's not open source nor does the author intend to open the stack.


Do they need realtime transcription?

Computer: webcaptioner.com

Android: Live Transcribe (g.co/livetranscribe)

iOS: Live Caption with the 'mic' icon enabled.

Web conferencing: Meet, Zoom, Teams all support realtime CC, which is pretty good.


Y’all should turn that into a product, or at least open source it and get the positive PR + helping others


> Y’all should turn that into a product, or at least open source it and get the positive PR + helping others

There you go. https://github.com/dictation-toolbox/dragonfly


That's very nice of you


>He ended up rolling his own solution

>That's very nice of you

...doesn't compute.

What exactly was nice here?


> We got him a pretty beefy Alienware so he wouldn’t be tied to the cluster GPUs.

Probably this.


As opposed to what, an IBM 286 that everyone else gets?

An "Alienware" is not a Ferrari, those things cost about the same as any other professional-level workstations.


Impressive work, really excited for this.

I will note though that I feel safer getting an occasional bad word than I do having a translator straight up deceive me.

For example, "what the fuck" in English->Spanish is giving "qué diablos" output. Definitely toning down the meaning there.

If someone says something mean to me, I want to know it.


This may be an intentional decision given that there are several ways to say "what the fuck" in Spanish, such as "qué mierda" or "qué carajos". And that's not including regional expressions like "qué coño" or "qué chingados". So, saying "qué diablos" may be the most common expression across dialects conveying the same meaning.


Yeah could be, I still need to read the paper to better understand the safety tuning.

Would be interesting to see some work stress-testing the ability to convey ill-intent across multiple languages. Accurately conveying ill-intent is safety-critical for the person being threatened.


It’s amazing how far text to speech has come in the past few years, but what I’m wondering is when this tech will finally make it into local TTS engines baked into the OS (eg for screen readers, etc)


The accessibility nerd in me is excited!


This is already built into recent iOS devices and it’s called Live Captions.


Same with Android (Pixel phones at least).

I'm the most excited for an open source one though, and it would be incredible if this could become it. I do 95% of my compute on desktop linux and it sucks being behind.


My wife was training to be a professional voice actor to do dubbing in several languages when we met.

I told her then that the industry would be disrupted by AI before she retired.

Glad she pivoted. Really impressive results.


> I told her then that the industry would be disrupted by AI before she retired.

Yes. I just discovered there is a text-to-speech addon [1] (now a few months old) for World of Warcraft that adds voices for every NPC in the game... It is so impressive and such a game changer (pun intended) that I naively asked in the chat of the Twitch stream I was watching "when did Blizzard add voices to the NPCs??". For an instant I really thought Blizzard had contracted actors, but no, someone like you and me just used AI to generate realistic voices for every character in the game. I don't think it's ready yet to completely replace actors in video games (surely it will be in the near future, though), but voice acting is something so expensive to do that I can see studios and developers in 2024 already using this tech for all the optional dialogues and secondary characters' voices.

[1] https://www.curseforge.com/wow/addons/voiceover


I've wondered at what point this would happen. I think it could now, but from what I've read the voice actor unions are able to prevent it currently (at least for AAA games or non-indie devs). Many of them have agreements/contracts in place for the foreseeable future, and being the first big company to replace them is a heap of terrible press that nobody is going to want to touch. I think it's the same reason Hollywood reached the AI agreement recently too.


I think that there are a lot of voice actors that are not unionized tho. And games like The Finals already use AI for many voices.


Another recent example: The Finals uses AI voice generation for realtime game announcements

https://youtu.be/kZ87wiHps9s


It won't replace high-end talent; I don't think models can replicate the nuance for a long time. However, the entire low-to-mid end of the market is going to get nuked from low earth orbit.


It won’t replace it, but it’s very likely to supplant it, just about destroying the segment by reducing demand by being good enough and so much cheaper, especially as people get more used to it.

Typesetting. Music engraving. Bookbinding. The quality of all these fields has been materially harmed by advancements.

Computer typesetting has, by and large, been a significant regression, though the gap has largely been made up now if you make the right choices.

Published music scores used to be set by experts. Now they’re set by novices using software that is mechanical in method and generally quite insipid. Most are atrocious compared to the old masters, and mediocre at best compared to the typical published scores from a hundred years ago; and very few popular scores are really good (… and if they are, there’s a reasonably high chance they’ve used GNU LilyPond, which has focused on this problem). But the barrier for entry is so much lower, and people have got used to the inferior results, so I don’t know if anyone engraves music the old way, and even people that know better largely just shrug and make do with the new. Like with computer typesetting, there is hope because things have slowly improved. But most will continue to be mediocre.

Books used to be bound with cold glue. It takes time to set, but the results are very good, supple and long-lasting. Then along came hot-melt glue, and it’s just so much friendlier for cheap manufacturing because books are finished within a few minutes instead of a day or two, that I don’t think anyone produces books the old way any more, even though the results are abysmal in comparison (compare the binding and reading experience of a paperback from the ’40s or ’50s with one from the turn of the century; no one after tasting the old will desire the new; for he says, the old is good). But they’re just (barely) good enough. Unlike the other two, I don’t think there’s any hope here—the regressive advancement crowded out the superior but dearer option so that no place was found for it.


You can still get relatively good published music scores from a few of the old German shops (Schirmer, Henle, etc.), but they are very expensive. They are a joy to use when playing, though, since the music is very clearly laid out and page turns are in the perfect place, etc. Finale and Sibelius are controllable enough that you can use them to do fantastic layout, but many people either do not understand how to make a score readable or don't care enough.


That, and what GP describes, is what I see as the overall trend of the market to hollow out the middle. It's not just about technology (though it plays a big role); it's all the optimization coming from competitive pressure - materials, processes, business models, marketing.

What seems to universally happen is, the market bifurcates - one part is in a race to the bottom, the other (much smaller) aims for the super-premium tier (overpriced quality), because only those two positions are sustainable once the race-to-the-bottom side drags all the economies of scale with it. So as a consumer, you get to choose between cheap low-quality garbage that's barely fit for purpose, and rare, super-expensive, professional/elite high-end products. There is no option for "good value for reasonable price".

This has been happening to everything - software, furniture, construction, electronics, vehicles, food, you name it.


I'm using AI for training videos for my startup. Never going back to voice actors outside of primary marketing videos. The sheer convenience of the write/listen/tweak cycle on scripts is insane. In minutes you can do a voiceover which would previously have taken hours, plus days of delay.

Sure, the final result sounds slightly robotic. 99% of people wouldn't care, and you can get more training videos done, faster, for a fraction of the cost.

[Edit] And I'll add that the difference between 6 months ago and today is noticeable. I imagine every 6 months we can just re-download updated voiceovers, and every 6 months they will sound just slightly more polished.


I wonder which will happen first - AI evolves to work well at the high-end, or high-end humans retire and there’s nobody left in the low-to-mid end to fill their shoes…


Given the modern trend of on-screen actors doing voice work, I think there will be a supply of talent for at least a few more generations.


It will absolutely replace high-end talent. Anything that a human can do will be able to be done 10x better by a model -- especially in such a narrow and well defined domain.


Did you hear the output examples? Yeah, I think not. I mean, definitely on the way, but there's no way if you need quality acting in your dub that you're going with this.


These are models specially tuned and sized for near real-time, instant translation. It would be naive to think that there aren't technical creatives building and training models tuned for expressiveness and nuance in a more controlled environment.


Maybe not in the current state of the model, but judging by the rate of improvement we’re all seeing it’s just a matter of time (and data+compute+research obv).


That's what they gave us plebs. To think they don't have a superior one they can sell...


I think the key word is will.

A few more years of improvements, if they happen, could be disruptive.


What did she pivot to? I don't think any currently existing job is really safe in the medium-to-long term.


My wife is paying our mortgage teaching English on Preply. I'm extremely worried about where we'll be in 10 years.


If "toxic word hallucinations" isn't a cyberpunk phrase I don't know what is.

(quote from the video presentation in the link)


I am sorry Dave, "merde" is not in the pre-approved word list


I wonder if it doesn't understand the common colloquial usage of "geil" in German. This sounds like it is going to mess up natural language


I thought he misspoke

I thought it was meant to be "toxic words, hallucinations, etc" in the script


Oh god they’re gonna censor the output. Time for musk to make a non-censored version lol…


And just the other day StyleTTS[0].

Just text to speech has gone too far. Audio books would be mainly generated on the fly like this?

I think some RPGs in some 5 years time might have something like this:

- A text file that outlines characters and a loose plot/story line. Human written.

- 3D Mesh Generation based on character description via Transformers based models. Auto generated.

- Dialogues for each NPC via LLM.

- This TTS engine again based on such models.

Result - almost unlimited replayability. Or even edit text file, have a new world based on a new story line with characters having different personas.

[0]. https://news.ycombinator.com/item?id=38335255



How has TTS gone too far?


Came a long way, that is. From the days of, if I recall correctly, the Windows 98 screen reader.


I don't see how realtime voice translation can ever be possible; to properly translate the first half of my sentence, you need to hear the whole sentence first. I don't know how simultaneous translators can translate from verb-at-the-end languages like German, until they know the verb.

It's not just where the verb is; sometimes I say something ambiguous, and my next utterance is supposed to acknowledge and remedy that. But if that ambiguity doesn't exist in the target language, I don't see how a simultaneous translator can convey the ambiguity, without knowing how the next utterance is going to refer to it.

Maybe that's why human simultaneous translators often seem to stumble or backtrack. I've never met someone whose job was simultaneous translation. It must be very difficult.

I'm impressed by this effort to convey non-linguistic elements of speech in translation. It's quite an achievement, and a very ambitious goal.

Aside: I wish I knew how speakers of tonal Chinese dialects express feeling, when tonality is supposed to convey semantics. When I hear Chinese speakers, I can "hear" the feeling, but I don't know how they do it - it can't just be down to emphasis. (I learned some Mandarin 50 years ago, at school. I learned the tones, but they didn't teach expression; and I was never taught by a native speaker, although there were language-lab tapes.)


There was a good Chinese Pod episode on this but I can't find it now. One of the things I recall was to express emphasis, make the tones really obvious (almost like a beginner might, consciously speaking them).


I had pretty terrible results when I tried English -> Swahili using the Hugging Face M4T V2 Space. It pretty much doesn't work most of the time, and I just get English back with a different voice. Expressive, on the other hand, only has a few languages it seems.

It would be nice if they could lay out what exactly is missing in terms of data to make a language work better. While the actual AI bit is out of reach for most of us, maybe we could provide more data.

There is also a 60 sec limit, and I wonder if this is a Hugging Face limitation or Seamless?


> maybe we could provide more data.

If you want to contribute by recording yourself speaking Swahili, https://commonvoice.mozilla.org/sw is the place to go. Although Meta has access to much larger data sets, they nonetheless use Common Voice as a "known good" source. E.g. the paper on their SONAR speech encoder reports experiments on Common Voice data, coincidentally involving Swahili https://ai.meta.com/research/publications/sonar-sentence-lev...


I'm thrilled to see the progress made in the last 30 years.

As a student in the mid-90s I worked on a system called Verbmobil at the German Research Center for AI, and it did speech-to-speech for English, German and Japanese in a very limited domain.

This was done via "classical" NLP: You had to model the domain with concepts, you needed sentence parsers, semantic engines, speech-to-text hand-crafted for 3 languages etc.

As it turns out, this approach is/was a dead-end.


Wow, after trying out the demo, I'm floored by how high quality this is. The translations worked perfectly, the voice cloning was "good enough", and the emotions conveyed in my voice were retained pretty accurately.

I don't think this would fool anyone that I was a real native speaker of the target language, but for casual conversation this would work pretty much perfectly. It basically avoids all of the traditional pitfalls of machine translation, like the unnatural robotic voice that it outputs, the slow translation speed and huge latency for realtime conversation, and the loss of emotion.


Yet again, Hindi (the major language in India) is not even in the samples. India is the largest user base of Facebook (and probably 1/3rd of the engineers working there are Indians), but Facebook will never put enough effort into contributing back. They only use the DAU from India in investor calls.


By "samples" do you mean examples on the marketing/landing page? It sure looks like the model supports many major Indian languages like Telugu, Tamil & Kannada. https://huggingface.co/facebook/seamless-m4t-v2-large

Yeah, I kinda agree with the spirit of your comment; it sure would be nice to see a major Indian language like Telugu on their landing page for sure. But that's just my Indian-person bias speaking.


Slightly new to this area.

Is there an example notebook that shows how we can use this model with our own sample audio and text? Thanks!


I work on seamless and you can find sample code here: https://github.com/fairinternal/seamless_communication or in the HuggingFace space.
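If you prefer the Hugging Face transformers integration, a minimal sketch along the lines of the facebook/seamless-m4t-v2-large model card ("sample.wav" is a placeholder for your own audio) might look like this:

    # Sketch of text->speech and speech->speech translation with SeamlessM4T v2
    # via transformers; the generated waveform is 16 kHz.
    import torchaudio
    import scipy.io.wavfile as wavfile
    from transformers import AutoProcessor, SeamlessM4Tv2Model

    processor = AutoProcessor.from_pretrained("facebook/seamless-m4t-v2-large")
    model = SeamlessM4Tv2Model.from_pretrained("facebook/seamless-m4t-v2-large")

    # Text in, translated speech out (English -> French).
    text_inputs = processor(text="Hello, how are you?", src_lang="eng", return_tensors="pt")
    audio_from_text = model.generate(**text_inputs, tgt_lang="fra")[0].cpu().numpy().squeeze()

    # Speech in, translated speech out; the model expects 16 kHz input.
    audio, orig_freq = torchaudio.load("sample.wav")
    audio = torchaudio.functional.resample(audio, orig_freq=orig_freq, new_freq=16_000)
    audio_inputs = processor(audios=audio, return_tensors="pt")
    audio_from_audio = model.generate(**audio_inputs, tgt_lang="fra")[0].cpu().numpy().squeeze()

    wavfile.write("translated.wav", rate=16_000, data=audio_from_audio)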


The lack of focus shows up in the results. The models never perform as well on Indian languages as they do on French or Spanish. This goes for Google, too.


We can’t be that far off from almost perfect real-time translation. There is some latency, of course, to hear and process.


Differences in verb-subject-object word order will always add latency. If you want to translate from German, with the verb at the end, to Welsh, where the verb goes at the start, you'll have to wait for the complete sentence before you can begin.


Not necessarily true. For the first few sentences you won’t be able to do it, but afterwards, once the context is established, you don’t really need to wait for the verb; you can predict it. For example, if you are speaking about cleaning the house and you detail that you have cleaned the kitchen, the stove and so on, you can predict the verb with only the start of the sentence. I don’t have any source to back this up, but it sounds plausible.


What if the predicted verb was incorrect, but the model has already translated the incorrect prediction? How does it tell you about a mistake?


A good approach might be to start with how top notch, ultra-experienced human translators handle corrections for real-time scenarios, for example, the expert translators that do the ear monitors at the United Nations. I've worked with a few such real-time translators when preparing keynote speeches and they seem to have rigorous processes that appeared quite deep. Probably a ton of domain expertise to be captured there.

That said, I suspect that real-time language translation is always going to be somewhat imperfect due to its nature. Non-real-time translation of literature is still a subjective art form even at the very high-end of human expertise.


Once you start predicting what someone is going to say you are no longer translating their speech


Yeah but then you're just introducing branch mispredictions which will cause latency and potential confusion down the line.

It's all a trade off.

Either way it's extremely exciting that we get to even discuss this stuff as real possibilities.


It's very impressive what simultaneous interpreters can do. They don't wait for the end of the sentence.


Even they struggle with jokes though.

This may be apocryphal but I’ve heard that in formal settings (e.g. UN) they won’t translate it and will instead give instruction on when to laugh.


Yeah they backtrack on branch prediction failures.


What kind of heartbleed that must introduce.


You mean meltdown/Spectre?


probably, but you got the gist anyways


Although that's true, and considering what “mrob” also replied, this will never mean full translation every time, all the time. It will work within specific environments and linguistic expectations.

I’ve been learning german since 8 years, and the amount of expressions and different ways to say things around the country is impressive. There’ll be a “interpretative” real-time translation, but it won’t guarantee fully understanding in so many cases, maybe ever.

Another thing, and we have this in common across all languages, is context, and this is difficult to address, I believe.

Nevertheless, it’s impressive how far we’ve come, and I acknowledge the usability of these tools. However, human knowledge will always be crucial and primordial if we want to guarantee full understanding.


>I’ve been learning german since 8 years,

"Since", as used here, would lead me to guess you are not a native English speaker?


>Automatically filters out toxic speech

>Watermarking

So it can't be trusted at all then


Does the spanish expressive sample sound muffled for others too? And the french sounds super mechanical. Hopefully, it's more impressive the other way.

Also: "This research demo is not open to residents of, or those accessing the demo from, the States of Illinois or Texas"


The "expressive" example in french exhibits a thick accent which bothers me more than the mechanical aspect of the non-expressive french example.

It's not dissimilar to some kind of "ch'ti" / "chtimi" accent, heard in some parts of the north of France, or a Belgian French accent (which is itself not dissimilar to the French ch'ti accent): "Ne partez pooooo" (with a drawn-out "a" which sounds nearly like an 'o'; that's not proper French at all) instead of "Ne partez pas".

That said, I'll take the non-expressive accent any day over subtitles when watching video in a language I don't understand: it's clearly good enough.


Yes, they all have significant 'ghosting' artifacts where the harmonics are a bit fuzzy if you listen closely. AFAIK all of the recent neural speech engines have this, from SoundStream to EnCodec, especially in low latency causal setups. Wavenet was a bit better in that regard but has fallen out of style due to complexity and the lack of a bottleneck. It seems like something diffusion post processing would be able to clean up.


Illinois is possibly because they don't allow storage of biometric data without express permission and I believe explicit usage restrictions. So I bet they're keeping all of your utterances, which would violate that law.


Can anyone help demystify the licensing?

Besides the ACCEPTABLE_USE_POLICY, there's a CC BY-NC 4.0 (NonCommercial) license, a 'SEAMLESS_LICENSE' (NonCommercial), but also an MIT license? It would seem these other licenses contradict the MIT license, could somebody help clarify how these all interact in practice?


The license details are listed on the project GitHub

https://github.com/facebookresearch/seamless_communication#l...


MIT for the code, NonCommercial for the trained models I bet.


Try the demo here, you record a video of yourself and it does voice cloning and a comparison:

https://seamless.metademolab.com/expressive/?utm_source=meta...


> This research demo is not open to residents of, or those accessing the demo from, the States of Illinois or Texas.

Interesting mix.


It’s because of https://www.ilga.gov/legislation/ilcs/ilcs3.asp?ActID=3004&C...

Facebook has had to pay out hundreds of millions of dollars in settlements for related class-action lawsuits, and rather than trying to get informed consent, they’re deciding not to collect biometrics from residents of those states.


Illinois has a facial recognition / cloud biometrics ban. Familiar face detection for doorbells etc. isn't allowed there. Wonder if Texas has something similar?


Ah, that makes sense.

In Texas it seems to be part of AG Paxton's culture war stuff. https://www.texastribune.org/2022/05/12/texas-face-filters-i...


Likely related to biometrics laws. I know Illinois has restrictions on the collection of biometrics, not sure about Texas. Facebook in particular paid out a significant amount of money in a class action in Illinois, I know because I got a chunk of change from it.


By which you mean someone took a dime and carved off a piece of it, and then sent you a piece of paper with postage that cost more than the value of that chunk? Yeah, we all got hosed by that one too, I'd imagine.


https://www.nbcchicago.com/news/local/illinois-facebook-user...

> According to the Settlement Administrator, payments to class members between $200 to $400 started going in the mail May 9.

I got a $0.19 check from an iTunes settlement once, but this wasn't one of those cases.


As someone working in tech and following along with the progression of AI, I believe I have the right expectations. But it still feels surreal seeing myself speaking a foreign language in my own speech style.


And that demo is now overloaded and fails to translate the input :D


Well, that was spectacularly bad. It failed to translate a single word from English -> Spanish. Admittedly I was using George Carlin's favorites, but if you're trying to have an expressive language translator that refuses to translate "fuck", then what you've got is bullshit.


I like how easy it is to get going, but you need to download about 20GB, and S2ST needs 40GB of GPU RAM!

It runs, but for any audio input I tried (20s/40s/300s; you will need to provide WAV, not MP3s), I get just one short sentence returned in the target language that seems not related at all to my audio input (i.e. "Tous les humains sont créés égaux").

It seems like some default text, but it runs on the full GPU for 10 minutes. Tons of bug reports in GitHub as well.

Text translation works, but I'm not sure what the context length of the model is. Seems short at first glance (I haven't looked into it).

Oh, and why is Whisper a dependency? Seems unneeded if FB has their own model?


Hello, I work on seamless.

> It runs, but for any audio input I tried (20s/40s/300s; you will need to provide WAV, not MP3s), I get just one short sentence returned in the target language that seems not related at all to my audio input (i.e. "Tous les humains sont créés égaux").

You might want to open an issue on GitHub for that one. The model is made to work on short utterances; if you have long speech, you'll want to segment it first. I've tried "tous les humains sont créés égaux" on the demo: https://seamless.metademolab.com/expressive (which runs the same code as in the repo) and the output was correct. Maybe there is something wrong going on in the conversion of the input audio?

> Oh, and why is Whisper a dependency? Seems unneeded if FB has their own model?

Whisper is a dependency as it's used as a baseline for evaluation. You can check out the paper for explanations.


I tried as short as 10s and it still produces just something random. How short does the audio need to be? Text works fine, but I can’t get audio-to-audio to work.


Any more info about the watermarking? Only Meta can make the determination?

Edit: I can’t find the weights but if I’m reading the paper right anyone could train their own detector.


Hey! a RS from Meta seamless team here.

Yes, we chose not to release the watermark detector to safeguard against adversarial attacks. This decision helps prevent any attempts to erase the watermark by malicious users.

The watermark generator and detector are trained together, one can use the information in our paper to train your own generator and detector model, however in this case the watermark signature created will be distinct from the one we use to protect our seamless translation models. This approach ensures each model maintains its unique security features.


Thanks for clarifying, and seems like a completely reasonable approach. Thanks for the great work.


This tech from Google seems similar, but doesn't have a fancy demo: https://blog.research.google/2023/12/unsupervised-speech-to-...


How will Meta put these models into practice? I understand why Google and Apple have models for their mobile OS users, but I don't understand where users for Meta speech models come from. Are they planning to show Instagram videos with English narration in French or what?


They have arguably the most diverse userbase of any company, with users from pretty much every single country + language across all their services & apps. I could easily imagine a handful of use cases having a high performing universal translation model would be incredibly useful.


Ads and Reels (their TikTok competitor) I imagine would be the primary use-case. Imagine spreading the "wonders" of TikTok-like videos to non-$native_language speaking world.


but isn't that a TikTok shtick to use the obviously fake voice in your video?


The metaverse will not have any language barriers...


Ads in any language!


Every video on this page is a bit out of sync with the audio. Combined with the blandness of the facial expressions and the whole mood in general, I kept waiting for the moment when the video would disclose that everything in it was created by AI.


I've been trying (and mostly failing) at setting up a pipeline to get system audio into Whisper and feed that transcription into a Seamless M4T text-to-text translation model. It seems like Seamless Streaming is going to solve most of my issues, and should significantly reduce latency!

My ultimate goal is to have realtime translations of video conferences. I've moved to a new country, and while I'm super privileged that most of my colleagues speak English, we still have a number of "all hands" meetings that I get lost in pretty easily.


What challenges have you faced? I’ve been planning do the same thing (whisper for ASR, and something else for text to text translation).


Mostly issues with plumbing it all together. So far what I've accomplished is using blackhole to create a virtual speaker / microphone, so system audio is available as a microphone input device. Then, take that input and feed it into Whisper. Out of the box, Whisper seems designed to work on wav files, not a stream of audio. There are some "hacks" that chunk 2 or 3 seconds of audio and feed that into Whisper. Of course, it needs to be more advanced than just chunking audio, because what if a word is split in half? There are some projects that are attempting to do this intelligently (split chunks between words). So far this is as far as I've got, besides feeding that into seamless m4t, which also doesn't seem to support "streaming" in text and instead wants a txt file or a command line argument.
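For what it's worth, a minimal sketch of that naive fixed-size chunking (assuming the BlackHole loopback device is set as the system default input; chunk boundaries will still split words, as noted above):

    # Capture system audio from the virtual loopback device in fixed 3 s chunks
    # and run each chunk through Whisper. Words split at chunk edges get mangled.
    import sounddevice as sd
    import whisper

    SAMPLE_RATE = 16_000   # Whisper expects 16 kHz mono float32
    CHUNK_SECONDS = 3

    model = whisper.load_model("base")

    while True:
        chunk = sd.rec(int(CHUNK_SECONDS * SAMPLE_RATE),
                       samplerate=SAMPLE_RATE, channels=1, dtype="float32")
        sd.wait()  # block until the chunk has been recorded
        result = model.transcribe(chunk.flatten(), fp16=False)
        print(result["text"])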

Of course this is all before the new seamless v2 drop, so I'm hoping it will become easier! There's a lot of interesting things in this new seamless release, among them ggml inference and the streaming model. Poking around the code, there is even some whisper binding in there too, so it seems like a possible integration is already in use (I haven't had time to really dive in).


I was hoping to find out, that the actor's voice in the demo video was generated, or that he had recorded the video speaking in another language or something.

That would have been the knockout punch.


As a French native speaker, I am surprised by the low quality (frankly ridiculous) voice of the French translation example.

Especially because the head of AI at Meta is a French guy AFAIK (Yann Lecun).


They are optimizing for speed (low latency)


Besides the obvious good news about making it easier for people to communicate with each other across languages, it's also exciting to me that we're trending towards a world where I can tap into all the knowledge that only exists on the non-English web. I'm sure there are vast troves of programming knowledge in the Japanese-only web for example. The Chinese-only and Russian-only web are obvious candidates too but presumably those are harder to access for other reasons.


How far from a real-time Star Trek translator? Whisper is fast enough and light enough, LLMs are getting there, so it’s close isn’t it?


Seems like there will always be latency, because it's not possible to easily stream over languages that have different structure. You need to wait a bit before you can start faithfully translating the meaning.

They also mention it in one of the videos about the streaming variant of their translator. But I guess the 2s delay (or whatever it is they mention) is close enough for practical purposes.

I feel like for personal relationships where true real-time is required, having a computer intermediary would be weird anyway and you have to learn the language, at least for the time being and as long as personal relationships are still relevant (in the post-AI world they might not be).


> You need to wait a bit before you can start faithfully translating the meaning

I guess it's possible that the AI learns about a specific person over time? That way it can be confident about what's being said as soon the person starts saying it


How does this compare to whisper-large-v3 on STT?


I work on seamless. You can see the results in the paper. M4Tv2 is significantly ahead (Whisper Large v3: 16.9 BLEU vs. M4Tv2: 26.6). These are averages over 81 X->English directions.


The demo is so much fun to use. I can't wait for all these technologies to start integrating into filmmaking / games.


Next step is combining the output with few-sample speech synthesis so the output is in the original speaker's voice!


This does that already. At least, to a first approximation. Voice cloning is not that great in general right now.


The voice cloning worked pretty well for me. From english to spanish I noticed that the first few words sounded more like me than the last few words. Also it doesn't sound like how I speak in spanish but that's expected.


Voice cloning works pretty well already but not necessarily on one 10 sec sample as the source data. If you can give it some hours of data it’ll work much better


Do you have examples of it working well? I haven't heard anything that really impressed me. Nothing close to a good human impersonator. We're a long, long way from replacing voice actors, even considering the rapid rate of progress.


The near-realtime aspect of this is so promising -- we're getting closer and closer to IRL babelfish!

What I would love to see is an ability to add my own voice (yes, at the risk of deepfakes) so that the model could "speak" in any language and sound more like me, not some random voice actor it was trained on.


Neat. How translatable are tones of voice for intent across languages? Like does a person trying to do a "nerdy" voice(nasally, whiny, etc.) in English translate to the "nerdy" stereotype for a French speaker. Seems to do very good on whispers which made me wonder what could be next.


If you don't speak the language into which these models translate your inputs, how do you know if or why the model has generated, without being commanded to do so, a campy American gay male sociolect, or an African American regional accent, or some other thing that may convey unintended meaning to native listeners?


It really sucks that a company so irresponsible with all your data is one of the leading AI companies now.


I want this as a channel in our discord.

Would allow more interactions of people that don’t speak the same language


How did that page get camera access without my permission?

Edit: by the upvote I guess it wasn't just me?


Currently Steam bans games from using AI-generated assets (for good reason). I wonder if they'll back track on this or carve exceptions because this tech seems really useful for indie devs to add voice work to their otherwise silent games.


Very speculative amateur opinion: My understanding is that Valve didn't exactly ban AI; they banned AI that was fed copyrighted works that could possibly make the results copyright infringement ( https://www.theverge.com/2023/7/1/23781339/valve-steam-ai-ar... ). (Side note: Regardless of individual views on whether AIs are just copyright regurgitators or not, I can understand Valve being cautious until courts have actually decided.) So if speech models can be made purely from assets that their creators can prove they have the rights to use, it would probably be easy enough to get approved.


LICENSE

Attribution-NonCommercial 4.0 International

https://github.com/facebookresearch/seamless_communication/b...


Took me 2 minutes to find the Github.


Can this do speech-to-text English -> English? I get strange results if I do a translation to the same language. It would be an interesting alternative to Whisper if it could.


It’s funny, all the humanities types try to push the proliferation of languages, but the engineering types keep trying to reduce the language barrier.


Did anyone compare this to nllb (also meta) yet?


We're the same team! We have some comparisons in the paper.


In the paper, the reported results show a very similar level of quality.


I wonder how well this will perform for automatic comic's translation. Current local models are pretty bad.


I hope all these AI products will have privacy focused alternatives quicker than when web2 happened.


Can this also do straight TTS, or is it translation only? It isn't quite clear to me from the site.


Any ideas on what kind of hardware this would require to run S2ST?


This is so world changing! Exactly how I wanted to speak so confidently!

Thank you Meta!


make a .llamafile and we'll use it.


RiP elevenlabs?


I tried to do Japanese -> English for multiple audio snippets using the Seamless huggingface demo and all of them output complete gibberish. Really makes me question how many of the languages they claim to "support" are actually usable. ElevenLabs at least produces a result that resembles the input, so they still have the edge in some places.


    "The Babel fish is small, yellow, leech-like, and probably the oddest thing in the Universe. It feeds on brainwave energy received not from its own carrier, but from those around it. It absorbs all unconscious mental frequencies from this brainwave energy to nourish itself with. It then excretes into the mind of its carrier a telepathic matrix formed by combining the conscious thought frequencies with nerve signals picked up from the speech centres of the brain which has supplied them. The practical upshot of all this is that if you stick a Babel fish in your ear you can instantly understand anything said to you in any form of language. The speech patterns you actually hear decode the brainwave matrix which has been fed into your mind by your Babel fish.
    "Now it is such a bizarrely improbable coincidence that something so mind-bogglingly useful could have evolved purely by chance that some thinkers have chosen to see it as a final and clinching proof of the non-existence of God.

    "The argument goes something like this: 'I refuse to prove that I exist,' says God, 'for proof denies faith, and without faith, I am nothing.' 'But, says Man, the Babel fish is a dead giveaway, isn't it? It could not have evolved by chance. It proves you exist, and, by your own arguments, you don't. QED.' 'Oh dear,' says God, 'I hadn't thought of that,' and vanishes in a puff of logic."


If you are multilingual but have young children and plan to continue residing in your current English speaking country for the foreseeable future, are you opting to teach your children those additional languages or are you adhering to the idea that they can always learn those languages later if necessary, considering it might not be essential (esp with models like this)?


It is easier to learn multiple languages when you are young.


There isn't a lot of good evidence behind this popular conception.

If anything, the evidence is that it isn't true, see https://journals.plos.org/plosone/article?id=10.1371/journal...

Any apparent causality of age of acquisition seems to be a proxy of hours of exposure. It may well be that it is easier for young people to rack up a lot of exposure to a second language, but not much evidence that age plays much of a factor for people of different ages who had the same degree of exposure.


> we argue that the late learners resort to computationally less efficient processing strategies when confronted with (lexically determined) syntactic constructions different from the L1.

> we show that the ERP signal in response to grammatical violations depends on the AoA of an L2 learner, as well as on the regularity of the structure under investigation. In (lexically determined) syntactic constructions different from the L1, we found a gradual change in processing strategies that varies by AoA, with a native-like effect for early learners and a less efficient neural processing strategy for later starters.

Although they do clarify that these effects could be confounded with age of acquisition instead of it being the cause.


Does "reduce toxic words" and "promoting safer communication" mean that if you say something wrong about LGBTQIA+ people it will 'correct' what you say?

I'm not sure I want the latest twitter trend to be involved in the design of my translator...


Hi, I work on seamless. What this refers to is added toxicity mitigation. We try to detect the level of toxicity in the input and make sure that the output toxicity level is not higher. This protects the model from doing egregious errors in the translation.

There are more details in the paper, and the mitigation code is all open source if you want to check what it actually does.
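For intuition only, a toy sketch of the kind of check being described; this is not Meta's released mitigation code (which uses curated per-language word lists and constrained re-decoding), and "translate" here is a hypothetical user-supplied callable:

    # Toy illustration: a translation only trips the filter if it contains more
    # "toxic" terms than the source did; in that case we re-translate while
    # asking the (hypothetical) translator to avoid the offending words.
    def toxicity_count(text: str, toxic_words: set[str]) -> int:
        return sum(1 for token in text.lower().split() if token in toxic_words)

    def translate_with_mitigation(src, translate, src_words, tgt_words, max_retries=3):
        baseline = toxicity_count(src, src_words)
        candidate = translate(src, banned=())
        for _ in range(max_retries):
            if toxicity_count(candidate, tgt_words) <= baseline:
                break  # no *added* toxicity: accept the translation as-is
            banned = tuple(w for w in candidate.lower().split() if w in tgt_words)
            candidate = translate(src, banned=banned)
        return candidate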


That's an awesome feature. I think one of the worst possible outcomes of machine translation is something that ends up being accidentally offensive, and this is a smart way to mitigate that.


> one of the worst possible outcomes of machine translation is something that ends up being accidentally offensive

The Hitchhiker's Guide To The Galaxy claims the opposite:

"Meanwhile, the poor Babel fish, by effectively removing all barriers to communication between different races and cultures, has caused more and bloodier wars than anything else in the history of creation."


Or maybe we'll finally come around to the idea that being offended by words doesn't make a lot of sense.


This will happen at the same time we stop being uplifted by words, or moved by them, or brought to tears by them, or fall in love over them.


I'm sure you can understand why translating "I love you" to "I love you, bitch" is probably undesirable.


How do you account for colloquial (non-English) language which could be naively misconstrued as toxic?

e.g. "geil" (either cool or horny depending on usage) in German

It's not fundamentally different than e.g. "wicked" in English, but the biggest bias that potentially all these ML models exhibit is predisposition towards Anglophoneism


Our goal is to have good recall, sometimes to the detriment of precision, so for words with multiple meanings, it might consider them toxic when, in the actual context they are used in, they are not. The toxicity mitigation algorithm will search for alternative translations that have the correct meaning but not the potentially toxic word, so that there is no added toxicity in the output. This means that sometimes the model might prefer a less colloquial phrasing than what a human would.

You can find details on how the multi-language creation of the toxicity lists was done in section 7.3 of the NLLB paper: https://arxiv.org/pdf/2207.04672.pdf. TLDR: it's not just a translation of a base English list, even if we started from that, each language has a curated list that was built by professional translators.


That's significantly less myopic than I pessimistically assumed. Thanks!


Is there an ability to turn it off? If you're translating an R rated movie with criminals who swear a lot, is it possible to get non-toxic filtered output to make sure it's being translated properly?


It only kicks in if the output is more "toxic" than the input. If the input has a lot of swear words and the output has the same amount, then it will be left alone.


What about the inverse?

Can it make sure that the output toxicity level is not lower than the input?

If not (which I strongly suspect is the case), then that is unacceptable. We cannot fight toxic narratives with ignorance.


> What this refers to is added toxicity mitigation.

Oh, well that clears it up! </snark>

I don't see any definition of 'toxicity' on the landing page - it seems to be one of those 'I know it when I (hear) it' kind of words... unless there's some widely-accepted definition in this area of study?


Sorry if I wasn't clear, internally we've been talking about it a lot, but I forgot that it doesn't have such a solid definition outside of our work. Thankfully, we try to define it in section 7.3 of the NLLB paper: https://arxiv.org/pdf/2207.04672.pdf

The tldr is that if you say: "Thank you for this job offer." you wouldn't want it to be (mis)translated as "Go F*k yourself.". But if you do say "Go F yourself", you still want it to be translated as that.


Their video said it was to reduce toxic word hallucinations, which does seem admirable/useful. I'm testing real-time translation in a church setting, and I've witnessed whisper hallucinating profanity, which is quite undesirable.


“Toxic word hallucination” would be a great punk rock band name


It also happens to be quite hilarious.


Your comment seems to imply LGBTQIA+ is just a Twitter trend, versus people's lived experience and lifelong identity. This is as unnecessarily judgmental as small identities claiming that straight people must self-identify as cis.

There is no moral superiority to deny or force label other people's identities. You're an attack helicopter? Great, roger dodger, let's go get coffee Seahawk.

No one is seriously asking for litter boxes in school bathrooms or helicopter refueling stations.


> No one is seriously asking for litter boxes in school bathrooms or helicopter refueling stations.

This feels a bit out-of-nowhere.

My read on parent comment was that "Twitter trends" are fast-changing norms about what language is (un)acceptable. They were not saying that LGBTQIA+ identity itself is a trend.


Perhaps so. In light of yesterday's Russia announcement for labeling the "international LGBT public movement" as terror extremists, I think we should be careful what we label as fads or (worse) insidious activity. Source: https://www.themoscowtimes.com/2023/11/30/russia-bans-intern...


You seem to me to be arguing against points no one is making. You're taking the word "trend" and extrapolating it to "fad" and "insidious activity" - both of which have very different meanings and connotations to the phrase "Twitter trend".

The original comment you replied to made the point that they don't want their own personal expression curtailed or modified according to someone else's opinion of acceptable speech.

As someone who repudiates Russia's policies, I support and agree with their point.


Thinking more directly about the subject in hand, what if we took their comment text as an example, and input it into a "responsible translation model"?

Taking what they wrote as harshly as possible, a translation model's output might include narrative elements from the transphobic judgement you are concerned about. That would be a problem, because it would amplify transphobic narratives.

Taking what they wrote as favorably as possible, a translation model's output might rephrase what was written, such that a pro-LGBTQIA+ inclusion narrative is more eloquently expressed than the author actually intended. That would be a problem, because hiding the reality of transphobic narratives would remove our ability to recognize and talk about them.

To make this even more complicated, what if we are using this model for real-time dialogue? What happens when someone says something vaguely transphobic, their words get translated to an inclusive narrative, and you continue that inclusive narrative in your reply? Should the translator alter your words to be transphobic? If it doesn't, then will the entire conversation go off the rails, or will both parties continue, oblivious of each others' ideological subtleties?

---

I don't believe for a second that a model could be trained to avoid toxic narrative and translate accurately.

Hallucination is a feature, not a limitation. The sooner "AI" narratives can accept this reality, the better.


The site makes it pretty clear in multiple places that they're talking about "added" or "hallucinated" toxicity. Maybe your culture-war outrage is misplaced?


Ok so I know nothing about how this works. It seems like if the model was able to properly detect words in the first place, it would never hallucinate 'toxicity'; if it can't recognize the word with high probability, how will it know whether the speaker actually said $toxicWord or whether it should print something else?

Perhaps it's taking a Big List of Naughty Words and weighting them so that the system must be "extra sure" that's what the speaker said, or else fall back to a G-rated word?


Meta employee here. The system is not perfect, or it would not "hallucinate"; while it's pretty good, it does sometimes make errors (not just hallucinations, but also mistranslations due to noise in the training data). What we want is to avoid these errors introducing toxicity (think swear words) that wasn't in the input, as this could be very bad for the user. There is a separate system that double-checks the output (compared to the input) and tells the translation model to try again if it's too bad.


Maybe it's for preventing unwarranted fucks[1]? Translation is more than just concatenating dictionary definitions, and machine translations routinely make this kind of out-of-place and technically correct lookups.

1: https://www.google.com/search?q=engrish+fucking+sign&tbm=isc...


Your framing of basic respect as being a "twitter trend" is... bizarre.


> Please don't use Hacker News for political or ideological battle. That tramples curiosity.

From the hackernews guidelines


[flagged]


Good. It's a company not a person


.


The Google Translate app has a conversation mode.


I feel like naming something "seamless" is not dissimilar to calling the Titanic unsinkable.


Marketing has been heavily involved in this page...there's at least one coloured person for every white photo..


"We need access to your microphone and camera to record your voice and translate it with your expressions."

None of the videos shows any modified/lip-synced footage. There doesn't seem to be a reason for this thing to need access to my camera.

Also, using it with tape over the camera doesn't seem to work either. (Perhaps it needs to see facial expressions in order to work?)



