
It's infuriating when I ask Siri to play a song and it decides to pick an obscure remix of the song one day, the actual song another day, then another remix another day.

Did they test this at all? Why would you ever pick a more complex/verbose option from the results list?




One of the most infuriating things about any voice assistant, IMO, is the absolute clunkiness with which you have to try and control music.

I use Android Auto in my car for safety. God forbid I'm listening to a song and want to queue up another one right after. I don't think I've ever gotten that behavior to work without either skipping the current song or adding the second song to the end of the queue. And even that's assuming that it even recognized the proper song out of my library and didn't try to go to Youtube or something.

Meanwhile, if my buddy is riding shotgun, I can just say "Hey, put on Peace of Mind next"


I switched from Apple Music to Spotify and have yet to figure out how to get it to play a playlist. So it’s either “tap on the screen while hurtling down the highway” or “awkwardly tell it to play each individual song after each one completes”.


I wound up in a screaming match with my Homepod recently.

“Siri, play Beethoven”

<plays a Beethoven-derived hiphop track>

!!!


Hahah yeah I said “Siri play classical music on Spotify” and Siri said “I’m afraid I can’t do that” and then started playing Sabaton (which is power metal and the exact opposite of classical music).

Then Siri had the gall to claim NO MUSIC WAS PLAYING as this super loud music was assaulting our eardrums. My wife thought my exasperated struggles were the funniest thing, it felt like HAL-9000 with the “I’m sorry I can’t do that Dave” moment.


I've had the exact same experience with Google Home.

"Play music on dining room speaker." (dining room speaker starts blaring music at high volume) "Turn down dining room speaker." (No response) "HEY, GOOGLE. Stop music on dining room speaker." (dining room speaker music volume decreases) (from the dining room speaker) "Can't find dining room speaker." (dining room speaker volume increases, blaring music)

I did follow the advice here to rename all my speakers lowercase, since google home's VOICE interface seems case-sensitive:

https://www.reddit.com/r/googlehome/comments/jsadkp/i_unders...

And since the Google Home android app is, literally, the worst and least-reliable mobile application I've ever used, the voice interface is pretty much all I've got.


> Hahah yeah I said “Siri play classical music on Spotify” and Siri said “I’m afraid I can’t do that” and then started playing Sabaton (which is power metal and the exact opposite of classical music).

As a Sabaton fan, I laughed out loud. Technically they have songs that sound (roughly) like classical music, e.g. Christmas Truce, but yeah, that's a massive fail.


Had the same experience at a friend's house trying to get Alexa to play Bach. I gave up after a half dozen attempts.

These products are just hilariously bad.


Names are a hard problem though. How would it know how "Bach" is pronounced? It seems you would need pretty advanced multimodal AI, some sort of GPT which is trained both on text and audio.


That's kind of my point, it's a product that doesn't work. It's a language interface that can't understand language.

FWIW I tried many alternatives like "Bach classical music" etc to no avail.


How do you pronounce Bach?


The problem I've had with Siri and music has nothing* to do with parsing individual words. What I've found recently is that if you don't give an exact match, Apple just puts in random shit. "Hey Siri, play the album Rubber Soul by the Beatles on Apple Music" gets me random songs by the Beatles because apparently I have Rubber Soul named "Rubber Soul [some edition info]". "Hey Siri, play songs by the band Duran Duran" literally just plays the eponymous album because reasons. You don't need AI, machine learning, GPT, LLM, or whatever fucking buzzword is all the rage, you simply need to revert to behavior that was standard in iOS 15 and earlier. The upgrade to iOS 16 completely nerfed Siri on my phone, starting with some mandatory trial subscription bullshit.

It's to the point where I've given up trying to use Siri while driving.

* Almost nothing. I still have to say "play underground eight zero s on soma fm" because reasons.
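
A minimal sketch of the kind of tolerant album lookup described above, assuming the mismatch really does come from bracketed edition info in the stored title. The function names and the `[2009 Remaster]` suffix are made up for illustration, and none of this reflects how Siri or Apple Music actually match titles:

```python
import re
from typing import List, Optional


def normalize_title(title: str) -> str:
    """Lowercase a title and drop bracketed or parenthesized edition info."""
    stripped = re.sub(r"\s*[\[(][^\])]*[\])]", "", title)
    return " ".join(stripped.lower().split())


def find_album(requested: str, library: List[str]) -> Optional[str]:
    """Return the first library album whose normalized title equals the request."""
    wanted = normalize_title(requested)
    for album in library:
        if normalize_title(album) == wanted:
            return album
    return None


# Hypothetical library; the edition suffix is an invented example.
library = ["Rubber Soul [2009 Remaster]", "Revolver", "Duran Duran"]
match = find_album("rubber soul", library)
```

Comparing normalized titles on both sides means "Rubber Soul" still resolves to the stored album, while a request for something not in the library returns `None` instead of a random substitute.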


It's one of those products that makes me think: "Do the eng directors in charge of it even use this thing??"


Yeah, some others also had the impression that it got worse over time in some aspects. I wonder whether this was some kind of tradeoff with other abilities. Or perhaps they rewrote the code, which had unintended side effects.


GPT doesn't need to know, I told it to convert possible mistranscribed text to commands, "Play me some back" => "Play me some Bach"
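
For comparison, a rough non-LLM sketch of the same idea: snap each transcribed word to the closest entry in a small artist vocabulary using Python's `difflib`. This is a swapped-in technique, not the GPT prompt mentioned above, and the vocabulary, function name, and cutoff are assumptions. It would also wrongly "fix" a legitimate use of the word "back", which is the kind of context an LLM handles better:

```python
from difflib import get_close_matches

# Hypothetical vocabulary; a real assistant would draw this from the user's library.
KNOWN_ARTISTS = ["Bach", "Beethoven", "Sabaton"]


def repair_transcript(text, artists=KNOWN_ARTISTS, cutoff=0.7):
    """Replace words that closely resemble a known artist name with that name."""
    by_lower = {name.lower(): name for name in artists}
    repaired = []
    for word in text.split():
        # get_close_matches is case-sensitive, so compare lowercase forms.
        close = get_close_matches(word.lower(), list(by_lower), n=1, cutoff=cutoff)
        repaired.append(by_lower[close[0]] if close else word)
    return " ".join(repaired)


command = repair_transcript("Play me some back")
```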


Yes, GPT is particularly good at understanding, even when you misspell things. It would make a good front-end interface to something like Siri: take the human input and turn it into something that makes sense to the dumb computer.


Yes, but large language models are very compute intensive and require a ton of RAM. So they wouldn't be able to run locally (this is currently possible with Siri), would be relatively expensive and possibly slow. So they might still be a while off.


“Bach” is a word like any other, and unambiguous in English pronunciation.


There is no "English" here, it's a name of a concrete person which has only one correct pronunciation.

https://upload.wikimedia.org/wikipedia/commons/f/fb/De-Bach....


The accurate pronunciation in German is irrelevant. Siri is always constrained to one language (Settings > General > Language & Region), and when set to English you get English pronunciations.

Same way Siri understands the English “Los Angeles” even though the G sound is completely different from Spanish.

The English IPA for Bach is straightforward.


Lots of Americans seem to try to pronounce Bach the composer "correctly", which leads to batch, bash, buck, ... which is fine, the German hard "ch" is very hard to form for English throats, and it's always better and more polite to at least try than to simply pretend foreign words and names are just weirdly spelled English ones, but it's not as straightforward as with a John Bach from Ohio.


So how would you have to pronounce "Chopin" in "English"? This doesn't make sense. There aren't even consistent pronunciation rules for many genuine English words, like "ead" in "read" and "thread". It's not even straightforward for English speakers to correctly pronounce "Eliezer Yudkowsky". Which means it's even harder for Siri.


> Siri is always constrained to one language

Not only Siri — the whole iOS. You can’t type a sentence switching languages in the middle, without changing the keyboard language all the time, if you have autocorrect enabled. It will change what you type into utter gibberish, even though without the “correcting” what you type is perfectly correct. This system is quite visibly designed by people who speak only one language and don’t understand that people may want to use multiple languages at the same time. The keyboard should support a mix of languages, instead of making a XOR between languages, because otherwise when it starts, it’s almost always in the wrong mode, and if it isn’t, it will almost certainly be wrong by the end of what I write.


You’re talking as if there is an accepted standard English pronunciation of Bach. The only one I know is the German one, which I would use when speaking English. Perhaps I would soften the ending.


My point is that there's nothing particularly special about this name of foreign origin, compared to any other word. Every word has lots of variations in how they're pronounced.

The audio clip the person posted was for a true German pronunciation, which happened to be very different than how 99% of English-speakers would say it.


I've had multiple teachers teach me different languages (other than english), not one called me by my english pronunciation. It seems that it's just people who speak English that try to do this.


> it's a name of a concrete person which has only one correct pronunciation.

This is an insane standard. The [x] at the end of the German word doesn't exist in English; most English speakers wouldn't be able to pronounce it if they wanted to. When the demands you're making are literally impossible, the problem is you.


So just because the "th" sound doesn't exist in many languages, like German, they should pronounce "Heath Ledger" or "Anthony Hopkins" or "The Beatles" incorrectly? That seems to me a way more "insane" standard. By the way, the Scottish are perfectly able to pronounce "Loch Ness", which has the same sound for "ch" as "Bach".


> So just because the "th" sound doesn't exist in many languages, like German, they should pronounce "Heath Ledger" or "Anthony Hopkins" or "The Beatles" incorrectly?

They're going to use the sounds that exist for them, yes.

> That seems to me a way more "insane" standard.

I hope you never get to make any decisions. Dave Barry once wrote about someone thinking "What an idiot I am! Here I am, a Japanese person, in Japan, and I can't even speak English!"

But then again, Dave Barry was joking.

> By the way, the Scottish are perfectly able to pronounce "Loch Ness"

The population of Scotland is 5 million; if you want to talk about "most English speakers", the Scottish aren't even worth noticing.


> > So just because the "th" sound doesn't exist in many languages, like German, they should pronounce "Heath Ledger" or "Anthony Hopkins" or "The Beatles" incorrectly?

> They're going to use the sounds that exist for them, yes.

That wasn't the question I asked. They will at least try to pronounce "Heath Ledger" or "Chopin" correctly, they won't act as if there was a correct German way to pronounce those names.


I lived in Japan for a while. My name contains sounds that just didn't work for them. No one pronounced it correctly.

I was not upset, annoyed, or confused. It's just the way language acquisition works. You learn the sounds you need and the rest are hard to acquire later in life.

Be strict in what you send, forgiving in what you receive.


> It's just the way language acquisition works. You learn the sounds you need and the rest are hard to acquire later in life.

As a point of interest, this is actually backwards. You're born recognizing all the sounds; what you learn is to ignore the difference between sounds that aren't distinct in your language.

You do keep that ability for the rest of your life, but it isn't helpful when you try to learn to recognize foreign sounds.


Also you want your voice assistant to play you music not hassle you about correct pronunciations.


But that's exactly the issue here: people use the correct pronunciation, which happens to be different from how normal words in their language are pronounced, but the voice assistant assumes normal language, which leads to absurd misfirings. The issue is not people not knowing how to pronounce something, the problem is that it's a hard problem for "dumb" AIs to know how a certain name is pronounced, as long as they are not multimodal LLMs.


I think there's something about sounds that you learn early on in language acquisition - maybe your brain develops differently.

'th' is the obvious one that non-english speakers struggle with. I remember a dutch guy laughing at my attempts at various dutch words - I literally could not hear the difference between his pronunciation and mine.

And 'ch' (as in Loch or Bach) is a sound in Scottish english but not in English english.

I lived in Scotland till I was 4, then moved to England and all traces of my previous Scottish accent are long long gone. But my friend, whose surname is Donnachie, says I'm the only English person she's met who pronounces her name correctly - I guess because I learnt that sound early on.

Similarly, my dad, who learnt english in India, still struggles with a "j" sound (he says "zudge" instead of "judge"), despite living here for 50 years and having a posh middle-class English accent that sounds just like a "native" english speaker.


> And 'ch' (as in Loch or Bach) is a sound in Scottish english but not in English english.

You need to talk to a Scouser! Back and Lock will be pronounced Bach and Loch.


Well, sure.

I don't know if "th" exists in Polish or not, but a common (perhaps dominant) spoken way to refer to "The Beatles" is[0] "Bitelsi", which not only loses "th", but also like half the other sounds in the name[1].

Thing is, we understand it just fine. More than that, if you overheard me saying to someone, "puść teraz Bitelsów" ("put on the Beatles now"), there's a good chance you'd identify the name from context. If you didn't, you could always ask to verify (well, not if you were actually overhearing me...).

----

[0] - Or at least would look like that written down. Polish is mostly a "you say it as you see it" language, but with foreign names, often enough people write the correct form but use localized pronunciation.

[1] - I'm sorry, I'm not a phonetician.


Want to flummox the Japanese tongue? Try a sentence like "Darth Vader is Luke's father". It hits most of the highlights: interdentals, labiodentals, and that weird 'r' sound English has that Japanese sometimes tend to conflate with 'l'. Even a competent Japanese English speaker is likely to render it as "Dāsu Bēdā izu Rūkusu fazā". Depending on the region they may mess up the 'f'; the syllable 'fu' is actually 'hu', but pronounced with very pursed lips in Tokyo Japanese (not so much in Kansai).

Unless they're bilingual from childhood, most people are not able to pronounce sounds outside their milk tongue without difficulty. That you expect English sounds to be perfectly pronounceable by non-English-speakers is probably more reflective of the fact that quality English education is widely available where you live than anything.


That is completely wrong. People have many names in practice, especially historical persons. Even living people often present themselves differently in different languages.

For some examples:

- the famous Romanian/French modern sculptor is known as Constantin Brîncuși (the name uses a vowel that has no direct correspondent in either French or most dialects of English, and it palatalizes the ending sh, so that it's pronounced in two syllables, brîn-cush with a slightly pronounced ee at the end), but also as Brancusi (in French, roughly bran-cu-see).

- in Japanese, since Japanese speakers have relatively few syllables they are familiar with, almost all foreign names are expected to be Japanized; for example, if your name is "Stephen", you would be expected to present yourself as, roughly, "su-tee-ve-n", and write your name with the corresponding katakana characters in certain official documents


Cumberbach


> “Bach” is a word like any other, and unambiguous in English pronunciation.

There is another pronunciation if you want a holiday in New Zealand.

https://en.m.wikipedia.org/wiki/Bach_(New_Zealand)


Reminds me of the doublewide that served as a motel suite I stayed at in Australia. It was dirt cheap and better than you might think -- very cozy.


You do have to say “Johann Sebastian Bach”. Siri will work with that.


Soundtrack to that dog movie https://www.youtube.com/watch?v=85A2rWA5O3o


I take your point but really like the mirth of a man who died 30 years before audio recordings being represented by a hip hop version of one of his pieces! He never recorded a canonical version of anything!


He did lots of canonical midi-recordings on paper though.


Funny, my Android Auto playlist for "Bach" is actually Wendy Carlos, a CD rip of Switched-on Bach 2000, which she did on an early 90's Mac II using MIDI sequencing.

In general I find that 80's and 90's era CD's ripped directly to FLAC still sound really good.


You get the same issues on Android. There is zero intelligence, it's embarrassing. You can have a song in your library that you've listened to every day of your life, but it'll still decide you actually want to listen to some weird track you've never heard of just because the names are similar.
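
A small sketch of what "prefer the song I actually listen to" could look like, assuming the assistant has a list of candidate titles and per-title play counts. The function name, the 0.6 threshold, and the sample data are all hypothetical, and this is not a description of how Google Assistant or Siri rank results:

```python
from difflib import SequenceMatcher


def pick_track(query, candidates, play_counts, threshold=0.6):
    """Among titles similar enough to the query, prefer the most-played one."""
    def similarity(title):
        return SequenceMatcher(None, query.lower(), title.lower()).ratio()

    # Keep only candidates that plausibly match what was said.
    shortlist = [title for title in candidates if similarity(title) >= threshold]
    if not shortlist:
        return None
    # Rank by listening history first, then by closeness to the query.
    return max(shortlist, key=lambda t: (play_counts.get(t, 0), similarity(t)))


candidates = ["Peace of Mind (Obscure Remix)", "Peace of Mind"]
choice = pick_track("peace of mind", candidates, {"Peace of Mind": 250})
```

Putting play count ahead of similarity in the ranking key is the design choice here: a track you play daily wins over a similarly named one you have never heard, and similarity only breaks ties.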


It gets real bad if you listen to music in a language other than the language you have Siri set to. Its attempts at deciphering Japanese punk or Bollywood song titles are terrible.

It also has a habit of invoking whenever I say "Hi sweetie" to my neighbor's dog.


Ok Google doesn't work well either.

20% of the time it can't play songs on Spotify, and I have no idea why.


I assume it’s because as the number of features increased, the accuracy of recognising specific phrases went down. The eggcorn in the machine.


I think part of this might be Spotify’s fault. I moved from Apple Music to Spotify Premium and my biggest complaint (aside from literally one SPECIFIC song I really love not being on Spotify at all) is that there is no HomePod support (you have to airplay the music to the HomePod) and Siri support is shoddy.


I understand this isn’t necessarily Apple’s fault, but I bought a HomePod (which are very expensive smart speakers!) because I assumed it’d be the most convenient “smart” speaker with good audio quality. It does deliver on audio quality, but despite supporting literally every other smart speaker under the sun Spotify has no support for HomePod and instead you have to AirPlay from your phone. So my kid can’t listen to Spotify, and the HomePod is useless to anyone but me or my wife.


I think this is Spotify's fault. They're in a weird feud with Apple and won't properly support their app on all of Apple's platforms. I don't know who they think they're winning over but it just makes me dislike them more, I'll pick the Apple ecosystem over their app if push comes to shove.


I stopped using Siri because it does recognise the artist/song I want, and then it plays a completely different thing, not even a related genre.



