Siri creator shows off first public demo of Viv (techcrunch.com)
224 points by bbrunner on May 9, 2016 | 105 comments



Maybe I'm not the exact target market, but I always pick my search parameters carefully and research my decisions. Sending flowers just by asking is impressive, but there are always minor details that break the experience: the shop is closed, or I can't pay via X service. The Hotels.com booking is great, but not everybody is rich enough to book a villa :)

As I see it, Viv aims to be the Google of service providers, which is a pretty neat goal; nobody has done that successfully yet, so I'm rooting for them. The platform approach is also a good marketing strategy: they want to make it embeddable, for example letting people ask basic things about a monument in a remote national park. The applications are really limitless if it fits on a Raspberry Pi-like board.

As someone who likes to break things: Viv, please process this for me:

"VIV, please find me a cheap flight to Vienna, check Ryanair and Wizzair on the 21st of may for two adults" "order the results by price" "show details for the second one" "show the previous one" "are luggage included in the price?" "book it, but skip all marketing offers from the company" "find AirBnB apartments for that day that has wireless internet" "order results by price" "ok, then search in the range of 10 to 50€ per night" "select the first place" "message host: looking forward to meet you" "book it." "Viv, bring me a beer"


I feel the same way. I actually like dealing with the details and making sure I get the absolute best experience.

Wifi? Best price/date compromise? Probability of on-time departures? Opportunities on connecting flights? In-flight alcohol? In-flight chargers? Free meals? Expected leg room? Luggage price? Carry-on limits? That's a 5 minute conversation right there... 20+ minutes if I want to explore results. I'd call a human if I wanted a conversation.

Voice UX is about commands. Fulfill my command if you can give me good results. If you can't, send me to a website pre-filled with data from the command and let me do the rest myself. Anything in between is just a reinvented phone tree.


> If you can't, send me to a website pre-filled with data from the command and let me do the rest myself. Anything in between is just a reinvented phone tree.

That's a really neat idea. Unfortunately it seems to be completely unscalable. For instance, booking on an airline site usually involves navigating and filling out a complex multi-page form. Some of the fields ask for privacy-sensitive information. The form layout and process vary from airline to airline. Those are just two of the biggest concerns.


How is it unscalable?

"Book me a flight to London on Monday" goes through a speech to text recognition service, then NLP translates that text into service queries. At some point in that chain you're querying a flight aggregation service (ie Kayak).

Just slap together an SPA that calls the same aggregation service, and prefill it with the recognized query.
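
A minimal sketch of that prefill handoff in Python (the endpoint and field names here are invented, not a real API):

  from urllib.parse import urlencode

  def handoff_url(intent):
      # "intent" is whatever the NLP layer extracted from the speech, e.g.
      # {"action": "book_flight", "dest": "LON", "date": "2016-05-16"}
      params = {"destination": intent["dest"], "depart": intent["date"]}
      # example-aggregator.com is a placeholder, not a real service
      return "https://example-aggregator.com/flights?" + urlencode(params)

  # Hand the user this URL when the command can't be fulfilled end-to-end
  handoff_url({"action": "book_flight", "dest": "LON", "date": "2016-05-16"})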

Amazon's Alexa app is already pretty close, although its queries are less sophisticated (third-party integrations just get the literal speech-to-text output; no deep query transformations/context at play). Google Now is even closer... ask it to "Set an alarm tomorrow morning" and it will give you a partially filled out form with the option to finish it vocally or visually.


There are several reasons.

1. The general pattern of an aggregation service covering only the lowest-common-denominator features among the services it aggregates.

2. I'm not sure of Kayak's terms specifically but accessing via third party aggregator API vs. direct API can affect who gets credit for the referral (i.e., $$$).

3. At the aggregator level, _someone_ has to write the glue code for all of the services. See IFTTT for an example; Pinboard's stand against IFTTT [1] is a good foreshadowing, by analogy, of the problems here.

4. "Just slap together an SPA that calls the same aggregation service, and prefill it with the recognized query." Who's going to build the SPA? For each service vertical? For each company?

[1]: https://blog.pinboard.in/2016/03/my_heroic_and_lazy_stand_ag...


1. That is just convention; services are free to filter on features that are present in only a small subset of the data.

2. I don't know how Kayak specifically works, they're just a general example of a customer-facing website that also vends an API that could be integrated with Voice UX.

3 / 4. Everyone has a different approach here and I'm not sure how Viv's would work... but Alexa's ownership model is app-based. 3rd parties build services that process the user's command however they want. They build the backend service, the Alexa voice integration endpoint, and (optionally) a companion website. Amazon owns the marketplace and physical interface. In this model, Uber owns the Uber integration, Lyft owns the Lyft integration, and whoever manages to search both Uber and Lyft would own that meta-service integration.


Viv is being presented as a platform for other developers to build on top of, and these third-party developers will be the ones responsible for fleshing out the specifics of how queries will work in the domain they are implementing for. They will be the ones who need to anticipate all the different ways a user can ask for something in particular, and to cover all the edge cases that might trip up the API.

In the end I think queries for certain types of well-established services, like asking about the weather, are probably going to yield better results than more obscure/complex things, like ordering custom-built furniture from a small workshop.


I'm aware of that, but will people realize it? They will just say Viv sucks, even when it was the Netflix app that was sloppy.

It's also not clear what went wrong in certain cases. In a single request, part of the parameters can be handled by Viv's core, part by Netflix, and part by another app.


Yes, but who is in control of whether "find me a hotel" searches hotels.com, or Kayak, or Expedia, ...


I think 80% of commands will be handled by Viv's core app, then the third party apps take control seamlessly. It will also learn that you always prefer AirBnb over Hotels.com.


Whatever the default search target for travel stays is, if there is one, will probably capture a large percentage of users, who will simply stick with it without bothering to change it to another preference, since they may not know better. This default leaves an opportunity for some companies to have an edge over others, and potentially to pay to make it so.


Not if there's only one integration.

The problem with third-party integration is that you're going to end up with a bunch of duplicate "skills", to use Alexa terminology, of varying quality and capability. It's the App Store problem, and the bot store problem, all over again.


Impressive demo, but I can't help thinking that the "big" question still remains largely unsolved for these services.

Two points:

1/ Viv keeps mentioning a "breakthrough" in computer science because they managed to create a program that scans a network of services and fills in the parameters from the query. Now, these guys aren't jokers, so I presume there is definitely something fantastic behind it, yet I can't help thinking that once you've "understood" the intent, it sounds quite close to an SQL/GraphQL query planner.

2/ Which leads to the big question: guessing the intent. Every time I hear people mention "intelligence" or "understanding", I show them this: "hey Siri, please do NOT set a timer for 4 minutes, I beg you". The fact is, it's just a trick. It recognizes words, but it doesn't understand anything; it doesn't have concepts, knowledge, or any experience. It's a dumb program that never had any life or senses with which to understand anything.

If i say "find me a good ticket for Chicago tonight". How will it know that i'm talking about the rock band that's playing tonight in Paris, France, since there's absolutely not a chance that a human being asks for a plane ticket for such a big trip just a few hours in advance ?

This, to me, is the big and interesting question that online assistant makers need to solve. Now of course, Viv is aiming for a product release this year, so they're building an intermediate solution where developers manually associate keywords with services, in an "easy" way. And it will sort of work.

Yet I'm still waiting for the real "big" advance.


  If i say "find me a good ticket for Chicago tonight". How 
  will it know that i'm talking about the rock band that's 
  playing tonight in Paris, France, since there's absolutely 
  not a chance that a human being asks for a plane ticket for 
  such a big trip just a few hours in advance ?
I don't even need an assistant that can figure out from context which one I'm talking about: all I want is one that can simply ask clarifying questions. "Did you mean Chicago the city, or Chicago the band?"

I had an amusing Siri interaction a few months ago where I asked "what language is spoken in Shanghai?" (I can never remember if it's Mandarin or Cantonese). Siri came back with a very well-reasoned, completely out-of-place answer of "Hindi": https://twitter.com/claymill/status/690386752683577344


  > out-of-place answer of "Hindi"
It's using WolframAlpha, which interprets 'Shanghai' as a movie made in India. This is probably because it doesn't have a language field for cities but does have one for movies.


"If i say "find me a good ticket for Chicago tonight". How will it know that i'm talking about the rock band that's playing tonight in Paris, France, since there's absolutely not a chance that a human being asks for a plane ticket for such a big trip just a few hours in advance ?"

Funny, your "absolutely not a chance" interpretation was exactly mine. Maybe there was a death in the family. Maybe you need to visit a friend on short notice. Maybe I'm being a bit pedantic, but I was with you until that point. :)


> If I say "find me a good ticket for Chicago tonight", how will it know that I'm talking about the rock band that's playing tonight in Paris, France, given that there's absolutely no chance a human being would ask for a plane ticket for such a big trip just a few hours in advance?

I think you discount how often the plane ticket to Chicago might happen. On total number of requests spread across the planet, I think the plane ticket might be the more common one. The point is, we don't care about the rest of the planet, at least not nearly as much as we do about our local surroundings. The way people answer that query is by using their continually updated knowledge about the state of the local system. If the band Chicago isn't playing around your current location soon, then the correct way to interpret that query would be that you needed a plane ticket (as anyone around you would probably think). This is why I think Google is really in a better position to answer these queries, because the knowledge graph allows the context to provide the nuance you are looking for.


>This is why I think Google is really in a better position to answer these queries, because the knowledge graph allows the context to provide the nuance you are looking for.

I agree completely. That's why the Viv presentation left me a bit unsatisfied. I had the exact same thought in the middle of their talk: "with Google's database of concepts, you could do so much more".

But even Google still needs to break the last barrier, and it's related to language itself and culture. Every translator and linguist knows that a language conveys much more than plain information. It is a mental construct that evolved over thousands of years, keeps evolving, and is full of ambiguities that only culture, experience (and a bit of common sense) can resolve.

The best example of this is to notice how bad automatic translators are. The guys at Google tried to brute-force the problem with a gigantic database of text, yet the result is absolutely miserable.


> The best example of this is to notice how bad automatic translators are. The guys at Google tried to brute-force the problem with a gigantic database of text, yet the result is absolutely miserable.

I suppose. I am fluent in two languages (the other being Spanish), so I'm in a position to judge the quality of a translation, and I would say that pasting in blocks of text to translate them these days yields a result that's quite a bit better than miserable, though indeed not flawless yet.


If you're at the point where being fluent in both languages is necessary to assess the quality of a translation, then it's already very high quality. Most poor translations are easily judged as such by anyone fluent in only the target language.


I don't think they are trying to answer your big question. On one hand, as mentioned in previous replies, even a human cannot always just understand what you are talking about and would ask further questions; on the other hand, it's not about trying to answer all questions but about finding interesting, frequent cases that products like Viv can solve for you.

Maybe we should mostly think of AI products/tools as just another (weird) friend who is good at answering certain questions for us. In the end, we don't even expect any one of our friends to be able to answer all of our questions; we pick the right friend for the right question.


"it sounds quite close to a sql / graphql query planner."

It probably is. Some representation of the query goes into that module, and a strategy for answering it comes out. That's probably going to look something like what comes out of an SQL query planner, but with more references to external data sources, and possibly queries followed by more queries.

You should be able to say "Viv, explain what you just did" and get back a natural language representation of that program. Then tell it what you meant.
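
As a toy illustration of what I mean (entirely my own guess at the plan shape, in Python):

  # A "computed plan" as ordered steps the runtime can both execute and narrate
  plan = [
      ("resolve", "Chicago",      {"hint": "destination"}),
      ("call",    "FlightSearch", {"dest": "$1", "when": "tonight"}),
      ("sort",    "results",      {"by": "price"}),
  ]

  def explain(plan):
      # Walk the plan back into rough natural language, step by step
      return "\n".join(f"{i + 1}. {op} {target} {args}"
                       for i, (op, target, args) in enumerate(plan))

  print(explain(plan))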

This would be a useful feature on a Bloomberg terminal.


> This would be a useful feature on a Bloomberg terminal.

Why? What type of queries can you perform on those terminals that would be useful to analyze in this way?


A lot of Bloomberg "data" screens (this is also true in the APIs) are also interfaces to models. So, for example, if you pull up a bond model, you can override an input and it will reprice the asset, say, or back out some other parameter.

The Bloomberg interface is horrible and insane, and even relatively simple things are painful, like "OK, well, if I change the vol on this European equity option and you give me a new price... I still need to know which method you used to solve it..."

There are plenty of other models in there that use much more complex pricing algos, and it would be useful to be able to say "Trinomial blah blah, assume vol calculated as daily variance over the last 30 days".


I'm going to copy-paste a similar comment I made on a related thread [0]. That demo was very cool. But:

1. do we really want a centralised service for everything that has access to a lot of data within a walled garden?

2. Siri only works well with North American accents AFAIK (or maybe all Western accents?). Alexa's voice recognition is the best, and I hope Viv does a similar job.

3. This is mostly a personal preference, but for online shopping I prefer using a lot of filters, and this UI doesn't really help with that. Also, HN probably knows better, but is the "computed plan" shown in the demo different from an execution plan created by an RDBMS?

[0]: https://news.ycombinator.com/item?id=11660916


> maybe all western accents?

Apparently not[1], but I imagine it's gotten quite a bit better since then.

1: http://www.dailymail.co.uk/news/article-2053684/Scottish-iPh...


To be fair, if you asked me for a good ticket for Chicago tonight, I'd be looking up first-class airfare, and asking if you had a funeral to attend. It's possible to expect too much even from humans, let alone AI.


...it will know your location, and it will know that many people have already gotten tickets for it.

I think the key here is 1) evolution: as he said, it will get better with time. Just as Wikipedia was dismissed by many intelligent people in its infancy, this may not feel so revolutionary now, but it will become something big. And 2) modularity/good design, so it's easy to add "improvement" layers.

Some nodes/areas can be backed by deep AI; some don't need to be. In the end you just wrap the whole thing in AI at the top to improve itself, and we can slowly start calling Viv the 21st century's god.

...or evil, as some people will probably call it. As a grandpa you'll tell your grandchildren how cool it was when you had to go to the market to pick out flowers for your girlfriend; now it's all automated ordering from robots that grow the flowers.


But this depends on the context. We can assume that bsaul's context is being in Paris (context is not a requirement of intelligence), and now is where some intelligent processing appears: it is very unlikely that anyone would actually want to fly that quickly, so checking whether Chicago the band is playing in Paris (or close enough) would be a reasonable try before going to a flight planner website.


That’s why AI needs to be taught like a kid.


Body cams on kids, striped across each grade. It's the best way.


I was expecting theater tickets http://chicagothemusical.com/


>SQL query planner

Viv has to disambiguate the intent using the user's profile and context, normalize the schema across partner databases, rank partners that support the same verbs, and build a partner system that lets them add verbs and domain-specific NL training… there's a lot of grunt work to make it work.
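
A crude sketch of just the verb-ranking piece (every name below is invented for illustration):

  partners = {
      "ProFlowers": {"verbs": {"order_flowers"}, "rating": 4.4},
      "FTD":        {"verbs": {"order_flowers"}, "rating": 4.1},
  }

  def rank_partners(verb, user_prefs):
      matches = [name for name, p in partners.items() if verb in p["verbs"]]
      # Prefer whatever the user has picked before, then fall back to rating
      return sorted(matches,
                    key=lambda n: (user_prefs.get(n, 0), partners[n]["rating"]),
                    reverse=True)

  rank_partners("order_flowers", {"FTD": 3})  # -> ['FTD', 'ProFlowers']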


That's pretty impressive!

Maybe this isn't as big a concern as I think, but the ease of these commercial demos relies on a crucial assumption: that I am nearly completely price-insensitive. I'm not a cheapskate, but I don't usually purchase the first product a vendor puts up, because there's usually something better and slightly less expensive if you put an additional minute or two into the search. But in the demos given (ordering flowers, booking hotels, etc.) it's usually a sample of 2-3 different options, and from what I've seen, usually higher-end items.

Will such assistants still maintain their usefulness when I don't have the monetary pleasure and freedom of just saying "Yes" to the first option offered?


Given the open decision process (with a visual representation of the exact information and logic that went into a decision) it seems like someone could fairly easily add a "reverse sort by price" or other more complex model. Of course that is assuming a single "price-sort" function could be applied across multiple domains. If each price-sort algorithm has to be implemented and delivered as a separate model it would take a lot longer and be a lot more tedious for both developers and users.
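
A toy sketch of such a domain-agnostic price sort, assuming every domain's results expose a common "price" field:

  def by_price(results, reverse=False):
      # One sort function shared by flowers, hotels, flights, ...
      return sorted(results, key=lambda r: r["price"], reverse=reverse)

  by_price([{"name": "villa", "price": 900}, {"name": "inn", "price": 80}])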

I think the real question is: how easy is this development process? A GUI representation of the program is great, but a lot of the time it is easier to deal with text if you have a well-designed library. They mentioned it being like an app store, so what is in it for the developers? Selling apps? Directing traffic through your own store / affiliate links?

It does look pretty cool, but when there are at least 5 established competitors, implementation is everything.

EDIT: I would like to add that this looks like a great combination of the logical analysis of SoundHound, the mobility of Siri, and the extensibility of Facebook and Alexa. It could definitely be a breakthrough. Still -- implementation.

EDIT2: Follow-up questions start at 16 min. Launching at the end of the year; working with select developer partners until then. "What makes it something that makes our lives better, rather than something that makes us talk to things we don't want to talk to?" A: "Developers motivated by self-interest to make things work smoothly." What is that self-interested motivation? They get to sell you stuff? What do they want to sell you? The most expensive thing that fits your query -- or maybe the thing that's $5 more expensive, that should sell for $20 but will sell for $25, or $30... I agree: based on that response, it sounds like even with developer interest in the platform, it won't be developer interest in giving you the best product, but developer interest in making themselves the most money.


Yeah, but things can be more complex than that. If I'm ordering flowers for my girlfriend, I will never buy the cheapest ones or the most expensive. Hotels are even more complex: location matters, as well as the specific room type and a lot of other things. No one wants to stay at a crummy motel. How do you handle "I am willing to pay up to $30 a night more if I'm closer to the beach", etc.?


In an ideal world, it will maintain your price preferences across sessions.

Something like, for example, knowing my preferred seat on a flight is "frontmost available (non-premium) aisle seat unless the layover is very tight, in which case, frontmost seat period"... that would be amazing and truly useful.
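
Even that rule is mechanically encodable; a minimal sketch, with the tightness threshold and field names assumed:

  def pick_seat(seats, layover_minutes):
      tight = layover_minutes < 45   # assumed cutoff for "very tight"
      candidates = [s for s in seats
                    if tight or (s["aisle"] and not s["premium"])]
      # "frontmost" = lowest row number
      return min(candidates, key=lambda s: s["row"], default=None)

  seats = [{"row": 7, "aisle": True,  "premium": False},
           {"row": 2, "aisle": False, "premium": True}]
  pick_seat(seats, layover_minutes=30)   # tight layover: row 2 wins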


I don't think my price preferences (for flights or really much of anything else) can be boiled down to an algorithm. There are so many factors. What other travel do I have planned around the same time? Have I had a slew of crappy flights lately and want to treat myself? What am I going to be doing the day I arrive and depart (e.g. do I need to be rested enough to go right into meetings, or do I have a while to get settled?)? Am I flying on business or for pleasure?

I'm not sure I can even articulate all of the factors that go into my decision on what flight to purchase, but I wouldn't be surprised if there are hundreds of them over the course of a year...


> I don't think my price preferences (for flights or really much of anything else) can be boiled down to an algorithm

You'd be surprised how many people are trying, see e.g.: https://www.kaggle.com/c/expedia-hotel-recommendations

You could capture one or more ML features for every one of your questions above, plus hundreds more, and add the features that define who you are within their customer population. Then the challenge becomes getting the right model and enough data to train it so that 98% of the time it chooses what you would have chosen. Just keep in mind that if I have enough information to predict which flight you will take, I have enough information to predict who you are voting for, when you will join a protest, the lowest salary you are likely to accept, hell, maybe even which opportunities to commit a crime you are likely to take... so here's to hoping that figuring out which flight we want to take remains unreliable.
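
For instance, each of those questions could plausibly be captured as a feature (names here are invented, not from the Expedia competition):

  features = {
      "overlapping_trips_next_30d": 1,     # other travel around the same time
      "recent_flight_satisfaction": 2.5,   # slew of crappy flights lately?
      "hours_until_first_meeting": 3,      # need to arrive rested?
      "is_business_trip": 0,               # business vs. pleasure
  }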


I always imagine that these services are meant to be the "automation put me out of a job" counterpart to the full-time personal assistants employed by the wealthy.

In that role, it's usually the assistant's job to present a selection sorted by optimality on the axes of both quality and price (and arrival time, reliability, packaging options, etc.). Properly done, this means that you should almost always be "defaulting" to the first option presented, rarely switching to the second or third (and there only really need to be two or three) when something in particular stands out about them despite their optimizing lower.

When described this way, it's clear that this is a much harder job than just doing fancy language recognition. An AI personal assistant has to know what you care about—what weights you put on various optimization criteria—without you having to explicitly specify them. Presumably it would be an online learning system, and screw up a bit at first.

I think the key in such systems is how they'd behave before being trained by lots of individual preference data over time, though. Hopefully it could "guess" at some initial preferences in a good way. (Maybe by taking all the preferences of every other user, putting your sparse preference-set as a point in that preference space, and then seeing how you cluster: effectively "stereotyping" you based on its other "employers.")
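
That cold-start "stereotyping" could look something like this (a rough sketch, purely my own construction):

  import numpy as np

  def initial_prefs(sparse_user, population):
      # population: rows of other users' preference weights
      # sparse_user: one row, NaN where the preference is unknown
      known = ~np.isnan(sparse_user)
      # Find the nearest neighbours on the dimensions we do know
      dists = np.linalg.norm(population[:, known] - sparse_user[known], axis=1)
      neighbours = population[np.argsort(dists)[:10]]
      # Fill the unknown weights with the neighbourhood average
      guess = sparse_user.copy()
      guess[~known] = neighbours[:, ~known].mean(axis=0)
      return guess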

On another tangent, one thing I haven't seen built into any such system yet is the "active academy model" mentioned here (https://scifiinterfaces.wordpress.com/2014/06/24/course-opti...). One of the criteria you'd expect a PA to optimize for in their employer is patience. Rather than staying with an initial optimization-problem solution, if the answer is not time-sensitive, one can continue to search the solution-space for new, better solutions that might crop up (involving things that maybe didn't even exist at the time the question was first asked.) If given two weeks to book a hotel room, there might be a good hotel room on the market now, but a better one on the market later. There are very complex questions involving the costs of reserving and then cancelling reservations to re-book, or holding off on reserving and maybe never finding anything better and running out of time.

For physical goods, there is even the possibility of an "indefinite optimization problem"—if I tell my PA to "buy me a top-of-the-line computer", then presumably I want my computer to continue being top-of-the-line unless otherwise noted, and this will require constant weighings of various components' or configurations' depreciation-rate, market-liquidity, employer's opportunity-cost for maintenance time, etc. All the questions a major corporate IT "buyer" employee considers, behind the facade of a single innocent request.


I guess this would still work if you can just say "give me a hotel in the price range of... with a good rating". Surely the result won't be as good as manually digging up not-yet-discovered great hotels, but it might still be satisfactory.


EDIT: added intent in the conversation

I think one thing missing from these interactions is the idea of a conversation. Let me elaborate: distilling a complex task into a single query yields complex queries, while drilling down from a simple initial query requires manual input based on the provided results. If there were some short-term memory as part of the conversation, we could potentially arrive at a much clearer intent (see the sketch after the example dialogue below).

e.g. "Viv I want to go to San Diego this weekend. Find me some cheap flights." (Travel, San Diego, Saturday, air fare)

"No problem. How long are you planning to stay?" (return trip)

"I want to come back on Sunday night" (return trip Sunday night)

"Will you be requiring accommodations?" (air fare, hotel combo lookup)

"Yes I want to stay near the waterfront" (area for hotel, duration 1 night based on previous reply)

"Sure no problem. Will you require a car rental?" (air fare, hotel, car combo lookup)

"No that's fine" (air fare, hotel lookup only)

"Ok here are some good deals for flights and hotels on the waterfront for Saturday"

"What are my options for a morning flight?" (Morning flight on Saturday)

"These flights are in the morning" (Subset of the flights found earlier)

"How about late Friday flights?" (Friday evening flights to San Diego)

"Here are your options for a late Friday departure with the same hotel options" (new departure schedule but same hotel options, new duration of 2 nights of hotel stay)

"Show me just the flights for Friday"

"Here are just the flight options for Friday"

"Nah let's see the Saturday options again" (go back to previous criteria)

"Here are the Saturday flight and hotel options"


It's easier and faster to type all of this into expedia, without actually having to talk out loud. I don't understand the point of these systems.


Agreed, if Expedia is your only source for travel plans. Rinse and repeat the process above for Travelocity, Kayak, Orbitz, Bookit, etc. Compare that to specifying it once and saying "Viv, let's try to find this on Orbitz" if Viv's default choice doesn't have what you want, with Viv remembering the whole conversation and issuing the refined intent to Orbitz.


Agreed. It seems to me like these people are tripping real hard on their own Kool-Aid. They see their logo being everywhere next to the wifi logo? Really?


Bingo. The only thing they showed that came close to this was the tulips query, but I honestly couldn't tell if it was a query refinement or simply a new query based on the keyword "tulips".


It was most definitely a query refinement, not a new query. It remembered the details about the mom and so forth.


I've seen the demo in their office. It's conversational.


That's great. Would have loved to see more of that in the public demo. Do you have a link to a video that shows that?


"This is software writing itself". Welcome to the 70's.

The speaker comes across as a poor presenter, in my opinion.

He's doing marketing, not informing the audience, and he's not even that interesting.

He's saying a ton of stuff which is completely unsupported by reality.

Assistants are the next paradigm shift? Have you tried to use Siri in a car? Or at all? It's a joke.

Weather examples? Show me something I use more than once a week.

Show me something actually hard that requires intelligence. Show me a recommendation service which does not suck. Show me an assistant that figures out something nice to do this weekend, based on location inferred by my plane tickets, weather, my taste, and what's available.

Show me an assistant that shops for the cheapest flowers, not force me to use the service of the app maker's choice.

Please, let's stop the "wow I can speak to my telephone, and it tells me the weather" bore. Please.


So, is this an island parser on top of a Bayes net?

Haven't read the papers for a few decades, but I wonder how the auto generated code differs from General Problem Solver: https://en.wikipedia.org/wiki/General_Problem_Solver

Hope the above doesn't sound dismissive. I was working at a company incubated at SRI, and we were hand-coding similar workflows. My guess is that Viv is a 10x improvement in workflow.

Be that as it may, it still seems to be constrained by humans creating ontologies, so it may suffer scaling issues. Somewhat akin to Yahoo's human-curated directory vs. Google's auto-generated BackRub.


Yes but they let third parties create the ontologies. If they're smart they'll do auto query optimization.


Fair point about UGC; Viv has as good a chance as all the other competitors to make it work. What do you mean by "auto query optimization"?

[EDIT] removed idle speculation


Like search engines do: see a failed query followed by a successful one, and map the first to the second.
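
Sketched under the assumption that you can log (query, succeeded) pairs per session:

  from collections import Counter

  corrections = Counter()

  def observe(session):
      # session: ordered list of (query, succeeded) tuples
      for (q1, ok1), (q2, ok2) in zip(session, session[1:]):
          if not ok1 and ok2:
              corrections[(q1, q2)] += 1   # candidate rewrite: q1 -> q2

  observe([("flighs to londn", False), ("flights to london", True)])
  # The most common rewrite for a failing query becomes its suggested mapping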


Sort of concerned about this:

Who decides which service to use for what thing? What if I don't want to use Uber for my cab but a new Uber-like service that my friend just started? If my friend adds an experience to Viv for his service, how does Viv decide which one to use? What about neutrality, etc.?


I suspect that will get fleshed out during their "wait and see" period of observing how developers use Viv. There are a couple of scenarios:

1. Viv's "experience" is integrated into existing apps. So it knows you like your friends new service because that's the App you're in. (Unlikely based on the goals discussed in the video)

2. There are Viv "add-ons" that let you prefer one over another. This sounds like the most likely scenario if Viv is to stay neutral.

3. It's commerce-driven: Viv decides which services are preferred based on who pays the most for "cab/car service" searches. Basically, the current search advertising model.

#3 seems the most likely to me, because it also opens up a host of other interesting scenarios related to the existing SEO business. However, I'd love to be wrong and for it to stay neutral (#2).


Viv's model of seller interaction seems to be the same as for the pre-Apple version of Siri. Viv cuts a deal with sellers to get an API so they can talk to the seller in some coherent way. They don't try to work through apps intended for humans.


Viv tries to get partners to register their API, domain specific verbs and NL training terms in Viv's back end.


In (3), I wonder who will collect the referral commission.

I suppose we could see a revenue split like what Apple does for in-app purchases.


I bet that depends on branding. If Viv ends up being the "front end", they probably collect the commission. If they end up white-labeling the Viv "product(s)", then whoever has their name on it does.

Could be both.


We're in the middle of building out something similar with Playa - http://getplaya.com/ (or track our progress here: https://baqqer.com/projects/playa/) - An Open Service Exchange for Autonomous Intelligent Agents

I think it comes down to a little training from the user, finding out where the user is price sensitive, etc.


If you look at the screen during the presentation, the home screen includes every app that's used via Viv. Hopefully this means that you use whatever you install.

And it makes sense given the embedded app-specific widgets. (For example, Uber had the very specific map widget, which I'm sure is not embedded in Viv itself.)


It can work the way web address bars work now. Type "G" and Google shows up, because that's the most-used service that starts with G. It will learn it from you.


You can make your own "PowerShiri" in PowerShell by tapping into the .NET Speech Recognition library [0].

Having built out a novel bot this weekend [1], I found this to be an obvious next step. The PowerShiri demo linked lets you set up a key/value table, and when Speech Recognition matches a key, it executes the PowerShell command held in the value. In this way, you can issue the same command, e.g. "What is the time?", and receive a dynamic response. The voice output is generated by the ubiquitous Out-Speech function.

Dynamic program generation, or really dynamic query generation, is the obvious next step. Instead of generating a massive key/value table for every possible combination of commands, you treat each word as a key, then have each value generate the next set of keys.
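
A Python sketch of that word-by-word idea (the real thing is PowerShell; this is just the shape of it):

  commands = {"what": {"is": {"the": {"time": lambda: print("It is noon")}}}}

  def dispatch(utterance):
      node = commands
      for word in utterance.lower().strip("?").split():
          if not isinstance(node, dict) or word not in node:
              return print("No match")
          node = node[word]   # each word keys into the next set of keys
      return node() if callable(node) else print("Incomplete command")

  dispatch("What is the time?")   # -> It is noon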

It's both annoying and gratifying to learn a weekend project is facing the same technical challenges as a well-funded corporation.

[0] http://stackoverflow.com/questions/9361594/powershell-can-sp...

[1] https://github.com/Gilgamech/PowerShiri/


>The live demo went off without any major glitches

At 3:54 - "The other thing I want to note is that Siri... err ummm Viv's audible voice is something we're still working on."


Whoever owns the number 201-844-9120 would probably also disagree that there were no major glitches, although that glitch lies more with TechCrunch's video distribution team. These are really minor glitches for a product demonstration. Major glitches are when something in the product fails, like this one during a Windows 98 demonstration:

https://www.youtube.com/watch?v=TGLhuF3L48U


I'd bet it's the Uber driver trying to get in touch for the pick-up.

Great to see a "live" demonstration of a new technology, but there are some things that should be scripted to avoid this kind of thing.


Pretty sure that phone call was staged to make it look more live.


By "no glitches" they meant no "Dear aunt, let's set so double the killer delete select all".


Who decides what "app" answers when I want to get flowers? I bet there are tens of apps doing it currently, and users have their own preferences. Same for anything weather-related. What would be my chances as a developer to compete with the more mainstream providers? While it isn't huge now, it seems impossible with this future, where the user practically doesn't even chooses the tools she wants to use.

(sorry if the question was answered in the video and I missed it)


We're tackling a similar problem at Playa. I think it's a matter of collating services and letting the user pick what is important. It's half training and half availability.

I think it's great that services like Playa and Viv are working to allow smaller services both physical and digital to compete and become more discoverable.


Sounds similar to IBM Watson that generates Prolog code on-the-fly. I wonder if Viv is using Prolog behind the scenes too (behind the visual AST representation).


I was a bit pissed at the shiny marketing talk about dynamically generated programs. I wonder how new this really is, or if it's just a way to appear fancy by hiding behind USPTO curtains.


I don't know much about the process of getting a patent, but at least once you have it, the whole idea is that you can't hide behind curtains: in exchange for time-limited exclusivity, you have to publish how your invention works.


You do have to describe your thing, but it's not that precise.


AFAIK Watson doesn't generate Prolog. It does use Prolog rules:

"Specifically, decomposition for Puzzle questions is performed by matching manually developed Prolog rules against the question’s predicate-argument structure to identify its subparts.",

"Relation detection patterns are written in Prolog and apply unification pattern matching over PAS"

"Most of our rule-based question analysis components are implemented in Prolog"

(All from the IBM system journal issue on Watson).

I'm pretty sure I've read all the published papers on Watson (the QA system). It's possible I've missed something, so a citation would be great if I have.


The missing puzzle piece for your understanding is how Prolog works. Prolog is a declarative programming language: the program logic is expressed in terms of relations, represented as facts and rules, and a computation is initiated by running a query over these relations.


Yes. And?

Watson doesn't generate its own rules; they are hard-coded.


Well, booking hotels their way has always been easy. Just open the website, look at the first three search results, and book the one that 'looks' best in the photos (assuming you have all the money in the world). In reality, it's not that easy; there are so many parameters to consider that they haven't: price, smoking preference, number and size of beds, cleanliness, wi-fi availability and speed, other amenities, parking, and so much more to weigh before deciding on a hotel. This personal assistant doesn't seem to do any of that. The problem is, I don't even know how we can automate or predict that, because these are very personal preferences that change with each trip.


Great to see they finally unveiled Viv. They've been working in stealth for a few years. Six months ago, I interviewed with the team and was offered a position as Mobile Architect. I declined the job for reasons unrelated to Viv. I met the technical cofounders, Adam Cheyer & Chris Brigham. It was just awesome talking to the bunch of guys who made the original Siri and personally worked with Steve Jobs. You can read up on Adam: not only is he an AI god, but he's a super nice guy. If I were to bet on one AI company, they would be it. As Kittlaus alluded to, it's going to take some time to see what use cases ordinary people find for it, but it will be exciting. Best of luck to them.


I can't help but feel bearish on any company like this that tries to exist outside of either Android or iOS integration. It's a desert without OK Google/Siri.


I think Hound (SoundHound) is in a similar position. http://www.soundhound.com/hound


The end of the article talks about probable acquisition by Google or Facebook. So this is probably not going to be in the desert very long.


I feel like this would be a major blow and failure on Apple's part if this team that they already acquired was bought by a competitor. I wonder what happened that led to this not being a new version of Siri.


They really hated Apple's slowness and its canceling of existing partner deals. So they want to be ubiquitous on their own, if they can hack it.


This is interesting to me because it really pulls together a number of disparate technologies to get toward the goal of being "the AI". Nothing he talks about is overly complicated on its own, but integrating it all together is the key.

Once you have the voice-to-text (ASR) from Nuance (or any of the other companies in this space), it's a question of properly recognizing the contextual intent. Not a trivial task, but certainly getting easier with the technology available. I think the visual models displayed in the demo are fairly telling as to how they handle this. Keywords for a domain (e.g. weather) become useful for determining the intent of the speech, as well as the variables used in queries, or the data returned for display to the user.

For example, if I ask a question that mentions temperature, and a time, but not a length of time, it's fairly obvious I'm asking for something in a weather domain rather than a recipe domain (and vice versa).
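
A toy version of that keyword-based domain scoring (my own construction, not Viv's):

  domains = {
      "weather": {"temperature", "rain", "tomorrow", "forecast"},
      "recipe":  {"temperature", "minutes", "oven", "bake"},
  }

  def guess_domain(tokens):
      # Score each domain by keyword overlap; highest score wins
      scores = {d: len(kw & set(tokens)) for d, kw in domains.items()}
      return max(scores, key=scores.get)

  guess_domain(["temperature", "tomorrow"])          # -> weather
  guess_domain(["oven", "temperature", "minutes"])   # -> recipe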

I'd guess the developer integrations he mentions then become a matter of defining those data points/variables for the different data sources so the "AI" can build the application to execute.

It's like an enhanced API integration model. You need to know all of the input/output parameters to integrate with an API. The intent/contextual piece also uses the individual data points for the contextual intent recognition in the voice-to-text area.

The other interesting aspect is actually storing the preferences for the commerce side of things. Airlines do this already for saving preference of aisle vs window seats. They're taking things a step further to remember those types of "qualifying data" for interactions you have so they can be saved across areas (read: API calls).

I suspect there will be a ton of work on behalf of the APIs to handle data in this way too; that's what he means when he says it will take some time to see the direction things go in. If I opened up my existing airline travel APIs to this today, it's unlikely anyone could interact in a way that would provide all of the information needed to actually book a ticket. So, there will need to be some back-and-forth communication about those missing items. If someone finds a ticket they want to book and says "order it", then Viv will need a way for the API to communicate "That's great, but you also need to provide your TSA Known Traveler ID number". Then, because it's something an API has asked for, Viv will know it's a data point it should save for later.
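
That round trip might look something like this (field names and helpers are illustrative only, not Viv's API):

  REQUIRED = {"passenger_name", "dob", "known_traveler_id"}

  def ask_user(prompt):              # stand-in for a voice prompt
      return input(prompt + ": ")

  def book(ticket, profile):         # stand-in for the partner API call
      return {"status": "booked", "ticket": ticket}

  def try_book(profile, ticket):
      for field in REQUIRED - profile.keys():
          # Ask once, then remember the answer for every later API
          profile[field] = ask_user(f"I also need your {field}")
      return book(ticket, profile)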

Let's hope Viv has un-breakable encryption and security with all of those "personal preferences"...


I pitched a very early version of the Playa idea (http://getplaya.com/) at the Launch IoT Conference in front of the audience, and was met with Jason and a panel of judges who had no idea what I was talking about.

After the pitch, I had representatives from NTT Docomo, Cisco, GE, and a few others come up to me, interested in chatting about ambient intelligence and staying in touch.

This is telling of how things are now changing and moving towards the Ambient Intelligent future.

We're still building towards this future, and would like to meet up with anyone in SF that wants to help out.

We want to put souls inside your devices. :)

[EDIT] Luckily, Robert Scoble was kind enough to give us some outstanding attention on his feed, and also hook us up with RackSpace's startup partnership program. Huge thank you to him for that.

[EDIT] We just set-up a baqqer account to be as transparent as possible as we build out the product. https://baqqer.com/projects/playa


Am I missing something, or was the demo on the Mac just a pre-recorded QuickTime video? The menu title bar said "QuickTime Player" and the title of the window was "Movie Recording".

I don't doubt it works for real, but isn't it a bit weird he didn't acknowledge the "live demo" was actually pre-recorded?


You're missing something: The way to project an iPhone onto your Mac screen with OSX is to use Quicktime movie recording mode and select your iPhone.


Ah, that makes sense. Thanks!


Isn't that just a workaround to get a live view of the device on the mac to show on the projector?


I think there is an option for screen recording from a real device in QuickTime. It is mostly used by developers to create promotional videos for an App Store page.


I think "Movie Recording" literally means that the movie is currently recording, i.e. he's using the QuickTime movie record feature as a trick to get the iPhone screen to show on his Mac, and thus the projector.


The developer experience looks interesting; it looks like some kind of flow-based, quite restricted coding environment? Getting the ecosystem and incentives for devs right will be the key to their success.

I've been using Amazon Alexa for a month and really struggling to find 3rd party apps or "skills" for it that I would want to use. Also, namespace pollution quickly becomes an issue when you start having thousands of apps that you didn't install but are there to use.

I think Google will dominate the voice assistant space soon as they have so many crucial services already there.

Good to have competition though!


This sounds a lot like SoundHound.


Yeah, Hound has been doing interesting stuff like this for some time now. Not exactly the same, but they both get me super excited about our future devices. I can't wait for the movie Her to be real.


Any ideas on what is used to power the graph demo used about 9 minutes in (https://youtu.be/MI07aeZqeco?t=539)? I haven't seen something so fast and seamless in a browser.


+1 This. Keen to know too.


Live demo starts at 9:00. I am very very impressed.

One of my top complaints with Siri is the inability to chain commands. If you want to modify a command you have to start over from scratch, which often results in it misunderstanding another word. It's very, very frustrating.


I always cringe when someone says 'software writing itself'. It looks to me more like software constructing a query, much like an SQL query; not sure that's a 'computer science breakthrough'. Still, the whole thing is pretty neat.


I live by myself, so the idea of talking when no other human is around to hear my voice is sort of off-putting. It's cool to see this technology evolve, but I can hardly put Siri (an amazing technology) to good use!


Siri is often powerless. It's impossible to make Siri open a certain video on YouTube; it doesn't even know about the existence of YouTube, and if you don't have the music locally on your device, it says it didn't find it.

That's why I am excited for more services integrated into personal assistants. I think both Apple and Google have been stagnating. How many years have passed, and Siri still only knows how to send an SMS, make a phone call, set an alarm, and look up trivia. And all that only with very simple phrases that don't combine multiple requests into one or ask a more nuanced search question with filtering criteria.

Even a regular person knows it should be able to perform such tasks (like opening an app with a specific request); they are not AI, just basic integration.

Another example: go to Google Now and say "Search Beatles on Youtube". OK, it searches for Beatles in the YouTube app, but doesn't start playing. Then I say "Play the first result". It goes to web search and displays pages containing the text "play the first result". Stupid! I was already in the YouTube app, asking it to "play".

So what is this? Laziness on the part of Google and Apple, that's what it is, not a difficult AI problem that can't be solved. And the API is closed, so I can't add any new behavior to it. I hope Viv kicks their butts and launches them into action.


What is their business model? What about privacy for such an app that has to know everything in order to provide a smooth experience?


But will it be open source?


Given that it's a VC backed company, no.



