Watson now brings cognitive speech capabilities to developers (ibm.com)
94 points by pesenti on Feb 9, 2015 | 41 comments



Hmm ... a couple of friends and I spent the last two days at the DeveloperWeek hackathon trying to explore Watson's capabilities. IBM's PaaS solution is called BlueMix, and all of Watson's capabilities are available as services for you to use.

We tried using the "Tradeoff Analytics" service for the project, and I must say, the tools and help available around it, the API, and its documentation are pretty bad: convoluted and unusable. This is true of other services available through Watson too.

We looked into IoT (Internet of Things) as well, and once again ran into a ton of dead ends without being able to proceed. The API documentation and examples just suck. If you are used to playing around with well-documented APIs/tools/languages, this is going to be frustrating.

If the OP is the person who actually wrote the article, please, please, please go back to Watson Dev Cloud or BlueMix and try out your services and APIs as a consumer.


I signed up for Bluemix a while back, when the Watson beta was first announced. IBM have been very nice; they keep in touch over email and phone to see what I'm working on (and I don't even have a paid plan). I'd really like to use the platform, but I just can't, for exactly the reasons you specified.

In my last conversation over the phone, while trying to set up the tools to work with Bluemix, several "Getting Started" pages for important steps led to 404 errors.


We really appreciate the comments and will try to fix the problems. To be honest, sometimes we developers can't see even the most obvious flaws in our own documentation. If you can highlight even one incomprehensible point, it would help a lot in accelerating the revision process.


Overall, the BlueMix and Watson documentation is very green and fair to call alpha-stage. The problem extends in all directions, but if a specific example would be helpful: trying to discover what load-balancing options exist within BlueMix was something I tried and failed to learn last week.

For a Watson-specific example: at least as of a few months ago, putting together a full-featured client implementation for Q&A required poking around several obscure webpages and then plenty of runtime experimentation on top.


This talk at PyTexas is really good for laying out specific questions to ask yourself that can guide you to better docs: http://www.pyvideo.org/video/3167/keynote-developer-experien...


Hi, sorry you didn't have a good experience with the API and documentation. I'd love to get some more details on what the stumbling blocks were. Thanks, jsstylos@us.ibm.com


Hmm, OK. Are there external-facing portals to file bugs against the services?


https://developer.ibm.com/answers/smartspace/watson/ is our forum for issues with the services.


In addition to filing bugs, I'd love to get suggestions about how we could improve the documentation. Did you find the API reference material lacking, was the problem in the procedural documentation, or was it both? Is the problem with our service docs or with figuring out Bluemix? As @jsstylos pointed out, the forum is the way to let us know what you think - not just bugs, but thoughts on what's missing and how to improve things. We'd really like you to be successful ;-)


Hey, I would love to talk to you and others and find out a bit more about the issues. I am in SF for the next few days.


I'm in SF and spent a few hours today trying to get the speech-to-text API working, with very little success. I'm happy to meet with you and go through this in person so you can get helpful feedback. Short of that, the feedback would be to:

1) Decouple your examples from Bluemix. It's needlessly complicating your efforts to get people using the APIs, which is why people are showing up; they aren't here for a PaaS, at least not right off the bat. Fewer moving parts = better.

2) Create GOOD examples/API libraries for all the languages you're creating these examples for. The Ruby on Rails example is lacking: it doesn't demonstrate how to call the APIs at all. I'm using the REST API docs, and the only headway I've made there is that I'm able to authenticate and create a session. Trying to upload audio using your documented approach (passing the session ID as a path param) returns a 401. Passing the u/p via Basic Auth in the same request doesn't work either. As of right now I have no idea how to get an audio POST to work, and there's very little in the way of debugging information/docs.


Hi, are you free on Thursday morning? I can meet you at a place of your choice and work through it with you.


I am free. Email me at kovacs@gmail.com

I did get things working and am building a Ruby gem that I hope to release for general consumption. In the meantime, for folks using Ruby, here's a snippet that will get the simple case working once you've set up a Bluemix app and bound the Speech to Text service to it:

https://gist.github.com/kovacs/72ad554a773a62818d46
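
For anyone not on Ruby, here's a rough Python sketch of the same flow, based only on what's described in this thread (authenticate, create a session, then POST the audio to it). The host, paths, and field names are my guesses for illustration, not the documented endpoints; the exact fix for the 401 is in the gist above.

    import requests

    # Assumed base URL and the credentials Bluemix binds to the app.
    BASE = "https://stream.watsonplatform.net/speech-to-text/api/v1"

    # One HTTP session, so any cookies set when the Watson session is
    # created get sent back on the audio upload (a guess at what the
    # plain per-request approach above was missing).
    http = requests.Session()
    http.auth = ("service-username", "service-password")

    created = http.post(BASE + "/sessions").json()

    with open("sample.wav", "rb") as audio:
        result = http.post(
            BASE + "/sessions/" + created["session_id"] + "/recognize",
            headers={"Content-Type": "audio/wav"},
            data=audio,
        ).json()

    print(result)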


PM sent. I can meet during lunch today and would be happy to meet up.


In 1999 IBM released a free version of ViaVoice (http://en.wikipedia.org/wiki/IBM_ViaVoice). IBM sold ViaVoice in 2003 and all distribution functions passed to ScanSoft, now called Nuance (http://www-01.ibm.com/software/pervasive/viavoice.html). Does IBM still own the whole stack, or is it based on Nuance code?

Are there plans to open up parts of the older voice technology and contribute it e.g. to CMU Sphinx?


We only gave up the shrinkwrap product; the core technology stayed with IBM.


Sorry, I missed the question at the bottom. We were very proud of ViaVoice at the time but to make an obvious point, the technology has moved on a lot over the past ten or so years...


The old ViaVoice can't compete with Watson Voice & Nuance, but it would be a good alternative to existing open-source voice technology, which is years behind. It's highly unlikely that IBM would release such tech; nevertheless, it would be appreciated.


We are flattered by the interest and will look into it; obviously, given that no one has looked at this for years, it is not a likely possibility.


The ability to create your own models is very important for using this, as the existing ones do a bad job of processing my normal speech. To test, I just tried reading a few simple phrases, and the error rate is pretty high. The Web Speech API did a great job with the same phrases: https://www.google.com/intl/en/chrome/demos/speech.html


Just to rule out a known issue with some laptop built-in microphones (e.g. https://developer.ibm.com/answers/questions/174176/speech-to... ) - could you let us know if you tried with an external close-talking microphone, and if that mattered at all?


It seemed a bit better with my Bose headset, but it was still quite lacking.


That is understandable; different people may see different performance depending on which speech service is used. In addition, please keep in mind this is just a beta service; we have been out there less than a week and know we still have some tuning to do. We also recognize the need for customization and personalization abilities; keep watching our website for continuing developments.


Well, I haven't tried the Speech service myself, but as devs we noticed pretty high error rates with the 'user modeling service' as well.


Live demos that you can play with:

Speech to Text: https://speech-to-text-demo.mybluemix.net

Text to Speech: http://text-to-speech-demo.mybluemix.net


The speech synthesis is impressive. It's still clearly a computer, but the prosody is a step up from Google and Bing. I threw some random comments I've written at the English Female voice model. It seems capable of contrasting clauses via rising and falling (and handles patterns like "On the other hand, ..." and "She was either ..., or .... when ..." well), makes little dramatic pauses after noun clusters to allow the listener to catch up, inserted a little mental comma into stuff like "greater than x [,] half the time", put emphasis on an "and" after a comma ("[...] something I still haven't gotten used, and am not sure I want to") etc., lots of traits of an aware speaker. Heck, I almost felt like it picked up speed and layered in an ounce of incredulity when it was reading a rant I wrote, but it might just be good enough that I can project into it on that one.


I copy-pasted your paragraph into the TTS. I like it very much, but it might just be on the level of Alex from Mac OS.

> it might just be good enough that I can <project> into it on that one.

Funny. It doesn't stress <project> as a verb, but as a noun, making the whole phrase mean something else.

Alex has a list of words like that too: live (to live) and live (live concert), progress (both verb and noun), record, suspect, and a bunch of other words that have multiple pronunciations based on the surrounding words (they are called heteronyms).

They should add heteronym disambiguation - probably solvable with a classifier based on features extracted from surrounding words plus POS tagging.
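
A toy version of that idea in Python, using NLTK's POS tagger (my choice of library; the pronunciation table is made up for illustration):

    import nltk  # needs the punkt and POS tagger models downloaded first

    # Hypothetical table: heteronym -> {coarse POS tag prefix: pronunciation}
    HETERONYMS = {
        "project": {"NN": "PRAH-jekt", "VB": "pruh-JEKT"},
        "record":  {"NN": "REH-kerd",  "VB": "rih-KORD"},
        "live":    {"JJ": "lyve",      "VB": "liv"},
    }

    def disambiguate(sentence):
        tagged = nltk.pos_tag(nltk.word_tokenize(sentence))
        picks = []
        for word, tag in tagged:
            options = HETERONYMS.get(word.lower())
            if options:
                # Match on the tag prefix: NN* noun, VB* verb, JJ* adjective.
                pron = next((p for prefix, p in options.items()
                             if tag.startswith(prefix)), None)
                picks.append((word, tag, pron))
        return picks

    print(disambiguate("We project that the project will slip."))

A real system would use more context features than the tag alone, but even this catches the noun/verb split that trips up Alex and this demo.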


Has anyone used Bluemix past the 30-day trial? It looks like you get 375 GB-hours, which sounds like quite a lot of time. It seems that as a developer I can mess around with their beta services and not worry about paying anything.
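
For scale: a single 512 MB instance running nonstop for a 30-day month is 0.5 GB x 720 h = 360 GB-hours, so 375 GB-hours roughly covers one small always-on app. (That's my back-of-the-envelope math, not anything from IBM's pricing page.)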


You should give it a try. I have a 30-day trial account and spun up a couple of services. The UI is cool, and there are a lot of templates (boilerplates) that you can use to get your app up and running. But you might struggle with the documentation, depending on which services you use. A lot of the Watson stuff is in beta (you can see this when you log in to Bluemix), and you might have trouble with those.


Beta is certainly beta, but if you find problems, we will try to fix them as quickly as we can. Real users of technologies tend to find issues much faster than the actual developers...


The ASR seems to be a bit immature, but the TTS sounds very nice. Any plans to add more voices?


I had to input my credit card once I exceeded the 30-day trial, but I haven't been charged for anything because I'm only using the beta services. And even the Bluemix production services allow free usage up to a certain level, which varies by service.


It looks like IBM is getting serious about Watson, but still not serious enough. To create an ecosystem and incentivize developers to work through all the issues, IBM should probably create an investment fund for startups that build their products on Watson. Any such plans?


They do have an investment fund for Watson products / startups -- http://www.tefunds.com/


There is something else I would like to point out as well; perhaps the IBM folks on this thread can answer this one. It's regarding the "user modeling service" of Watson. I spoke to a couple of IBM folks and asked what some of the coolest apps built using Watson were, and someone mentioned the following MSNBC article: Watson's perception of the State of the Union speech. What the user modeling service does is take text as input and analyze its sentiment (and give outputs around it).

http://www.msnbc.com/msnbc/how-supercomputer-sees-the-state-...

What the folks @ MSNBC did was pass the last 10 SOTU speeches to Watson and collate the results into a graph.

But here is why I have trouble believing Watson's perception. Try passing the following input to Watson (or any other gibberish):

"jkldsjglkfdsjgdfls kg;jsf g dsfg fdg jsdfjgdfskg dfsgj df gfgdflg;dfg g;fkgljsdfgk;gjldfg dgkjldfgdhfgkjdfhjg fkldskf;ksdlf;ksdlfks jkdhfkhsdjkfhksdhj ljfsdjfhdsjkfjskdhfkjsdhf sdfkls;dkfl; dkfl;sd;fsk roweruoweuroiweuroiwe uweoruweoruweo ruweuro kjgsfgjkldfsjgs klfgjfdsl gjfdlkg fd jkldsjglkfdsjgdfls kg;jsf g dsfg fdg jsdfjgdfskg dfsgj df gfgdflg;dfg g;fkgljsdfg kjgsfgjkldfsjgs klfgjfdslgjfdlkg fd jkldsjglkfdsjgdfls kg;jsf g dsfg fdg jsdfjgdfskg dfsgj df gfgdflg;dfg g;fkgljsdfg kjgsfgjkldfsjgs klfgjfdsl gjfdlkg fd jkldsjg lkfdsjgdfls kg;jsf g dsfg fdg jsdfjg dfskg dfsgj df gfgd flg;dfg g;fkglj sdfg kjgs fgjk ldfsj gs klfgjfds lgjfdlkg fd jkldsjg lkfdsjgdfls kg;jsf g dsfg fdg jsdfjg dfskg fsgj df gf gdflg;dfg g;fkglj sdfg kjgsfgjkldfsjgs klfgjfd slgjfdlkg fd jkldsjglk fdsjgdfls kg;jsf g dsfg fdg jsdfjgdf skg fsgj df gfgdflg;dfg g;fkgl jsdfg kjgsfgjk ldfsjgs klfgjfdslgjfdlkg fd jkldsjg lkfdsj dfls kg;jsf g dsfg fdg jsdfjgdfskg dfsgj f gfgd flg;dfg g;fkgljsdfg kjgsfgjkldfsjgs

fkldskf;ksdlf;ksdlfks jkdhfkhsdjkfhksdhj ljfsdjfhdsjkfjskdhfkjsdhf sdfkls;dkfl; dkfl;sd;fsk roweruoweuroiweuroiwe uweoruweoruweo ruweuro kjgsfgjkldfsjgs klfgjfdsl gjfdlkg fd jkldsjglkfdsjgdfls kg;jsf g dsfg fdg jsdfjgdfskg dfsgj df gfgdflg;dfg g;fkgljsdfg kjgsfgjkldfsjgs klfgjfdslgjfdlkg fd jkldsjglkfdsjgdfls kg;jsf g dsfg fdg jsdfjgdfskg dfsgj df gfgdflg;dfg g;fkgljsdfg kjgsfgjkldfsjgs klfgjfdsl gjfdlkg fd jkldsjg lkfdsjgdfls kg;jsf g dsfg fdg jsdfjg dfskg dfsgj df gfgd flg;dfg g;fkglj sdfg kjgs fgjk ldfsj gs klfgjfds lgjfdlkg fd jkldsjg lkfdsjgdfls kg;jsf g dsfg fdg jsdfjg dfskg fsgj df gf gdflg;dfg g;fkglj sdfg kjgsfgjkldfsjgs klfgjfd slgjfdlkg fd jkldsjglk fdsjgdfls kg;jsf g dsfg fdg jsdfjgdf skg fsgj df gfgdflg;dfg g;fkgl jsdfg kjgsfgjk ....

=====

And Watson rates it as follows.

Big 5

Openness 100% (Adventurousness 100%, Artistic interests 2%, Emotionality 1%, Imagination 100%, Intellect 100%, Authority-challenging 100%)

Conscientiousness 93% (Achievement striving 94%, Cautiousness 57%, Dutifulness 1%, Orderliness 1%, Self-discipline 81%, Self-efficacy 3%)

Extraversion 1% (Activity level 1%, Assertiveness 1%, Cheerfulness 1%, Excitement-seeking 2%, Outgoing 1%, Gregariousness 1%)

Agreeableness 1% (Altruism 1%, Cooperation 1%, Modesty 1%, Uncompromising 1%, Sympathy 1%, Trust 1%)

Emotional range 11% (Fiery 1%, Prone to worry 10%, Melancholy 34%, Immoderation 24%, Self-consciousness 6%, Susceptible to stress 9%)

Needs

Challenge 61%, Closeness 84%, Curiosity 51%, Excitement 66%, Harmony 65%, Ideal 54%, Liberty 75%, Love 23%, Practicality 86%, Self-expression 25%, Stability 60%, Structure 57%

Values

Conservation 78%, Openness to change 5%, Hedonism 15%, Self-enhancement 76%, Self-transcendence 11%

========

It's plain gibberish, and you still get results. I tried passing other text transliterated into English, and Watson still gives results like this. I would expect it to at least call it out as gibberish.


Sorry for the slow response, it's been internet years 8-)

We have an update coming for User Modeling (to be announced soon). After that update, such a gibberish post will return an error.

User Modeling is based on word counting, so users should ensure that their input actually comes from a human and is intelligible. The service looks for certain words in the input and will reject input that doesn't have enough of those words to estimate characteristics. In the upcoming release, the documentation will explain how this works and what the relevant words are.
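
Roughly, a check like that looks like this sketch (illustrative vocabulary and thresholds, not our real ones):

    # Sketch of the gibberish gate: reject input unless enough tokens are
    # in-vocabulary, then derive a crude confidence from the word count.
    def score_input(text, vocabulary, min_fraction=0.7, min_words=100):
        tokens = [t.lower() for t in text.split()]
        known = [t for t in tokens if t in vocabulary]
        if len(tokens) < min_words or len(known) < min_fraction * len(tokens):
            raise ValueError("not enough recognizable words to estimate traits")
        # More familiar words -> higher confidence, saturating at 1.0.
        return min(1.0, len(known) / 1000.0)

The gibberish post above has essentially zero in-vocabulary tokens, so it would be rejected outright.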

Also, we will provide a measure of how accurate our results are based on the number of words in the input. This should let users judge the reliability of the results in the context of their application (e.g., a casual movie recommender might be OK with very low confidence, while an application that makes more critical recommendations might require higher confidence).


Yeah, you are right, we should be filtering this sort of thing out. The algorithms are robust in that they ignore words not in the system's vocabulary (rather than, say, crashing), but we did not trap the case in which none of the "words" are familiar.


By cognitive speech, do they mean 'understanding' natural language? Can somebody please explain how it is different from other solutions?


What are the plans to release native versions of the API that can be plugged into iOS and Android apps?


It's an obvious extension of what we put out; keep watching the Watson Developer Cloud announcements.


Most developers I work with already have quite good cognitive speech abilities...



