elipollak's comments

elipollak · on Aug 5, 2017

Maybe a silly question but could you use this to recognize phrases or words in a language other than English?

dylanbfox · on Aug 5, 2017

We haven't actually tried that yet. I imagine if you customized your model with words from another language and then pronounced them with an English accent, the API might be able to recognize them OK. Would be a fun experiment to try at least!

yorwba · on Aug 5, 2017

If I understand correctly, "customizing the model" essentially adds new words to the vocabulary and adjusts the language model to change the probability of some phrases, but does not require any information about pronunciation, let alone audio samples.

But isn't having just the English text really error-prone, especially when you are dealing with terms of art and proper names that might even have roots in foreign languages? E.g. some people pronounce SQL as "sequel", and the English pronunciation of French words varies between "French pronunciation with English accent" and "French orthography interpreted as English orthography". (I'm guessing your model would tend towards the latter?)

So what I'm interested in is whether you have encountered examples of this during your testing, and whether you have some way to work around it (I would try phonemic transcriptions in addition to English); or whether this is not relevant for the use-cases you are trying to cover and the convenience of just using English text trumps the accuracy loss due to just using English text.

dylanbfox · on Aug 5, 2017

Hey! Great question. Our system is actually able to handle transcribing "sequel" as "SQL" automatically if you were to "customize the model" for phrases like "what was my latest SQL query". It can also get words like "colonel" pronounced "kernel". In both cases, it does this without needing the explicit pronunciation of the word. We have some customers who've uploaded thousands of proper names, for example, and we're able to transcribe all of them without needing the explicit pronunciation. This is possible because our ASR implementation is pretty different from traditional setups like Kaldi. You're right that there are some edge cases, especially with foreign words, but we're working hard on smoothing those out.

yorwba · on Aug 6, 2017

Sounds amazing! Now I'm really interested in how your setup can do that. Will you publish anything about it, or is this the kind of secret sauce you'd rather keep secret?

woodson · on Aug 5, 2017

You can create phonemic transcriptions as a back-off for unknown words (at least in WFST based setups), but with things like "sequel" this won't help much.

yorwba · on Aug 5, 2017

AssemblyAI is apparently using their own TensorFlow implementation, not weighted finite-state transducers like e.g. Kaldi.

Speaking about WFSTs, why wouldn't it work for "sequel"? I have only done the "Kaldi for Dummies" tutorial (i.e. digit recognition), but from what I understand, you could add an utterance "s iy k w eh l"/"SQL" and add phrases like "SQL query" to the corpus and this would make it more likely than "sequel query".
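
To make that idea concrete, here is a toy sketch in Python. It is not Kaldi and not a WFST, just an illustration of the same reasoning: a pronunciation lexicon in which "sequel" and "SQL" share one phoneme string, plus an add-one-smoothed bigram model estimated from a small corpus. The lexicon entries, the corpus and every function name are assumptions made up for this example.

```python
import math
from collections import Counter

# Hypothetical lexicon: one phoneme string can map to several spellings.
LEXICON = {
    "s iy k w eh l": ["sequel", "SQL"],
    "k w ih r iy": ["query"],
}


def bigram_counts(corpus):
    """Count word bigrams and their left-context words in the corpus."""
    bigrams, contexts = Counter(), Counter()
    for sentence in corpus:
        words = ["<s>"] + sentence.split()
        for prev, cur in zip(words, words[1:]):
            bigrams[(prev, cur)] += 1
            contexts[prev] += 1
    return bigrams, contexts


def log_prob(words, bigrams, contexts, vocab_size):
    """Log-probability of a word sequence under an add-one bigram model."""
    seq = ["<s>"] + words
    return sum(
        math.log((bigrams[(prev, cur)] + 1) / (contexts[prev] + vocab_size))
        for prev, cur in zip(seq, seq[1:])
    )


def decode(pronunciations, corpus):
    """Pick the spelling sequence the language model finds most likely."""
    bigrams, contexts = bigram_counts(corpus)
    vocab_size = len({w for ws in LEXICON.values() for w in ws})
    candidates = [[]]
    for pron in pronunciations:
        # Expand every partial hypothesis with every spelling of this word.
        candidates = [c + [w] for c in candidates for w in LEXICON[pron]]
    return max(
        candidates, key=lambda c: log_prob(c, bigrams, contexts, vocab_size)
    )


heard = ["s iy k w eh l", "k w ih r iy"]
with_sql = decode(heard, ["what was my latest SQL query", "SQL query"])
without_sql = decode(heard, ["a sequel query"])
```

With the "SQL query" phrases in the corpus, the decoder prefers the "SQL" spelling; with a corpus that only contains "sequel", it falls back to "sequel". A real WFST setup composes the lexicon and grammar with acoustic scores, which this sketch leaves out entirely.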

elipollak · on June 19, 2017

Really nice post. We've suffered similar challenges using Slack with our remote team and switched to Basecamp, which has been much better.

What's your sense of how Twist and Basecamp compare?

amix · on June 19, 2017

Twist is just about communication. Basecamp has a bit of everything, but it isn't particularly good at any of those things.

We wanted to create a product that just does one thing really well (and that's team communication).

elipollak · on Oct 2, 2015

Great question! In this case, you should think of us as a highly trained personal assistant who is helping you cancel your Comcast. We help manage all the crappy parts of the process behind the scenes, but you're still the person actually doing the cancellation.

walterbell · on Oct 2, 2015

How do you verify identity, e.g. could an attacker use this to cancel a target's service? When you call customer service, cable companies usually have a few security questions to confirm the customer's identity. A letter usually has a customer signature. How do you close the identity loop with the person keying data into your web form?

estsauver · on Oct 2, 2015

We charge their credit card. We don't think it's likely that people will use this to maliciously cancel people's Comcast, but we'll absolutely cooperate with authorities in the event of identity theft.

rrauenza · on Oct 2, 2015

Malicious? Cancelling a random person's Comcast might be a random act of kindness!

elipollak · on Oct 2, 2015

What an amazing story. Thanks for checking out our site and sharing that.

Pretty crazy to imagine the incentives Comcast must put in place to get sales reps to act like that.

mnx · on Oct 2, 2015

Or punishments for failure

ghughes · on Oct 2, 2015

The incentive is that you get to keep your job.

elipollak · on Oct 2, 2015

Uh oh! Should be fixed now. Thanks for letting us know :)

elipollak · on Oct 2, 2015

Just fixed that! Thanks for letting me know