
We have been a pretty large user of this feature within Watson for the last 6 months... While it is pretty good, it lacks the ability to take external inputs such as stereo recordings with channel markers. I've been working on migrating our solution to VoiceBase, which in my opinion has a much more robust solution than IBM with respect to speaker diarization, specifically because it includes a feature for channel markers. The result is a conversational transcription that is much easier to read. Prior to this we used the LIUM project to attempt diarization on a single-channel recording, with mixed results. Without a doubt, speech-to-text has rapidly improved in the last 12 months.
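(For anyone curious what "channel markers" buys you in practice: if each speaker is recorded on their own channel of a stereo file, diarization reduces to splitting the channels and transcribing each one separately. A minimal sketch using only the Python stdlib `wave` module, assuming 16-bit PCM stereo input; the function name and file layout are my own, not any vendor's API:)

```python
import wave

def split_stereo(path, left_out, right_out):
    """Split a 16-bit stereo WAV into two mono WAVs, one per speaker channel."""
    with wave.open(path, "rb") as src:
        assert src.getnchannels() == 2 and src.getsampwidth() == 2
        framerate = src.getframerate()
        frames = src.readframes(src.getnframes())
    # Samples are interleaved as L0 R0 L1 R1 ... (2 bytes each);
    # de-interleave by taking alternating 2-byte slices.
    left = b"".join(frames[i:i + 2] for i in range(0, len(frames), 4))
    right = b"".join(frames[i + 2:i + 4] for i in range(0, len(frames), 4))
    for out_path, data in ((left_out, left), (right_out, right)):
        with wave.open(out_path, "wb") as dst:
            dst.setnchannels(1)
            dst.setsampwidth(2)
            dst.setframerate(framerate)
            dst.writeframes(data)
```

(Each mono file can then be sent to the speech-to-text service independently, with the channel identity telling you who spoke.)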



Why migrate to another service in 2017 when open source toolkits like Kaldi give you better results, more features, and no vendor lock-in?


Cool - well, we are hiring, so if you'd like to do this, reach out. We have lots of neat projects like this going on all the time.


As someone unfamiliar with the terminology: are the speakers isolated on single tracks, or is each channel a mix, with the system distinguishing speakers by differences in relative volume? The latter seems tremendously valuable, if difficult to accomplish.


For the evaluation in the paper, speakers are on separate channels (each side of the telephone conversation is its own mono channel). Generally there are solutions for separating speakers on a single channel that can work fairly well (assuming your training data is similar to the target domain) if you know the number of speakers beforehand, but it's tremendously hard if you don't (think transcription of large meetings).
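(When speakers really are on separate channels, turning the per-channel transcripts back into one readable conversation is just an interleave by start time. A minimal sketch; the `Segment` record and its fields are my own assumption about what a transcription service returns, not any particular API:)

```python
from dataclasses import dataclass

@dataclass
class Segment:
    speaker: str   # channel identity, e.g. "A" / "B"
    start: float   # utterance start time in seconds
    text: str

def interleave(*channels):
    """Merge per-channel transcript segments into one conversational
    transcript, ordered by utterance start time."""
    segments = sorted((s for ch in channels for s in ch), key=lambda s: s.start)
    return "\n".join(f"[{s.start:6.1f}] {s.speaker}: {s.text}" for s in segments)
```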


In fairness, sorting out speakers in a conference call is hard for a human.



