elipollak's comments

Maybe a silly question but could you use this to recognize phrases or words in a language other than English?


We haven't actually tried that yet. I imagine that if you customized your model with words from another language and then pronounced them with an English accent, the API might be able to recognize them OK. Would be a fun experiment to try at least!


If I understand correctly, "customizing the model" essentially adds new words to the vocabulary and adjusts the language model to change the probability of some phrases, but does not require any information about pronunciation, let alone audio samples.

But isn't having just the English text really error-prone, especially when you are dealing with terms of art and proper names that might even have roots in foreign languages? E.g. some people pronounce SQL as "sequel", and the English pronunciation of French words varies between "French pronunciation with English accent" and "French orthography interpreted as English orthography". (I'm guessing your model would tend towards the latter?)

So what I'm interested in is whether you have encountered examples of this during your testing, and whether you have some way to work around it (I would try phonemic transcriptions in addition to English); or whether this is not relevant for the use-cases you are trying to cover and the convenience of just using English text trumps the accuracy loss due to just using English text.


Hey! Great question. Our system is actually able to handle transcribing "sequel" as "SQL" automatically if you were to "customize the model" for phrases like "what was my latest SQL query". It can also get words like "colonel" pronounced "kernel". In both cases, without needing the explicit pronunciation of the word. We have some customers who've uploaded thousands of proper names, for example, and we're able to transcribe all of them without needing the explicit pronunciation. This is possible because our ASR implementation is pretty different than traditional setups like Kaldi. You're right that there are some edge cases, especially with foreign words, but we're working hard on smoothing those out.


Sounds amazing! Now I'm really interested in how your setup can do that. Will you publish anything about it, or is this the kind of secret sauce you'd rather keep secret?


You can create phonemic transcriptions as a back-off for unknown words (at least in WFST-based setups), but with things like "sequel" this won't help much.
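To make that concrete, here is a rough sketch of a lexicon lookup with a spell-it-out back-off. Everything in it is an assumption for illustration: the `LEXICON` and `LETTER_NAMES` tables are hypothetical, the phonemes are ARPAbet-style approximations, and this is not how Kaldi or AssemblyAI generate pronunciations. It only shows that a letter-by-letter back-off for "SQL" produces something unlike "sequel".

```python
# Hypothetical hand-written lexicon (ARPAbet-style phonemes, approximate).
LEXICON = {
    "query": "k w ih r iy",
    "sequel": "s iy k w eh l",
}

# Hypothetical letter-name pronunciations used as a naive back-off.
# Only the letters needed for this example are listed.
LETTER_NAMES = {"S": "eh s", "Q": "k y uw", "L": "eh l"}


def pronounce(word):
    """Return a phoneme string: lexicon entry if known, else spell the word out."""
    known = LEXICON.get(word.lower())
    if known is not None:
        return known
    # Back-off: read the unknown word letter by letter.
    # Raises KeyError for letters missing from the toy table.
    return " ".join(LETTER_NAMES[ch] for ch in word.upper())
```

With these tables, `pronounce("SQL")` spells the word out and does not match the lexicon entry for "sequel", which is the mismatch described above.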


AssemblyAI is apparently using their own TensorFlow implementation, not weighted finite-state transducers like e.g. Kaldi.

Speaking about WFSTs, why wouldn't it work for "sequel"? I have only done the "Kaldi for Dummies" tutorial (i.e. digit recognition), but from what I understand, you could add an utterance "s iy k w eh l"/"SQL" and add phrases like "SQL query" to the corpus and this would make it more likely than "sequel query".
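A toy version of that idea, as I understand it: one pronunciation maps to two spellings, and a bigram language model trained on the extra phrases picks between them. This is a plain-Python sketch with add-one smoothing, not Kaldi and not a WFST; the `LEXICON` table, the corpus sentences, and the function names are all made up for illustration.

```python
import itertools
import math
from collections import Counter

# Hypothetical lexicon: the same phoneme string has two candidate spellings.
LEXICON = {
    "s iy k w eh l": ["sequel", "SQL"],
    "k w ih r iy": ["query"],
}


def train_bigrams(corpus):
    """Count unigrams and bigrams, with <s> marking the sentence start."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in corpus:
        words = ["<s>"] + sentence.split()
        unigrams.update(words)
        bigrams.update(zip(words, words[1:]))
    return unigrams, bigrams


def log_prob(words, unigrams, bigrams, vocab_size):
    """Log-probability of a word sequence under an add-one smoothed bigram model."""
    total, prev = 0.0, "<s>"
    for word in words:
        total += math.log((bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size))
        prev = word
    return total


def decode(phoneme_words, corpus):
    """Pick the spelling sequence the bigram model scores highest."""
    unigrams, bigrams = train_bigrams(corpus)
    vocab = set(unigrams) | {w for ws in LEXICON.values() for w in ws}
    candidates = itertools.product(*(LEXICON[p] for p in phoneme_words))
    return max(candidates, key=lambda c: log_prob(c, unigrams, bigrams, len(vocab)))


corpus = [
    "the sequel was great",
    "what was my latest SQL query",
    "run the SQL query",
]
best = decode(["s iy k w eh l", "k w ih r iy"], corpus)
```

Here `best` comes out as `("SQL", "query")`, because "SQL query" appears in the corpus and "sequel query" does not. A real decoder would also weigh acoustic scores, which this sketch ignores.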


Really nice post. We've run into similar challenges using Slack with our remote team and switched to Basecamp, which has been much better.

What's your sense of how Twist and Basecamp compare?


Twist is just about communication. Basecamp has a bit of everything, but it isn't particularly good at any one thing.

We wanted to create a product that just does one thing really well (and that's team communication).


Great question! In this case, you should think of us as a highly trained personal assistant who is helping you cancel your Comcast. We help manage all the crappy parts of the process behind the scenes, but you're still the person actually doing the cancellation.


How do you verify identity, e.g. could an attacker use this to cancel a target's service? When you call customer service, cable companies usually have a few security questions to confirm the customer's identity. A letter usually has a customer signature. How do you close the identity loop with the person keying data into your web form?


We charge their credit card. We don't think it's likely that people will use this to maliciously cancel people's Comcast, but we'll absolutely cooperate with authorities in the event of identity theft.


Malicious? Cancelling a random person's Comcast might be a random act of kindness!


What an amazing story. Thanks for checking out our site and sharing that.

Pretty crazy to imagine the incentives Comcast must put in place to get sales reps to act like that.


Or punishments for failure


The incentive is that you get to keep your job.


Uh oh! Should be fixed now. Thanks for letting us know :)


Just fixed that! Thanks for letting me know

