Start building Actions on Google (googleblog.com)
204 points by bajames on Dec 8, 2016 | 62 comments



API.AI is really great but I really don't like this UX.

From using and developing for the Amazon Echo for the last year or so, this "Conversational" setup just sounds super unnatural.

You want bus times:
Amazon: "Alexa, ask [nextbus] [for the next bus]"
Google: "Hey Google, let me talk to [nextbus] <wait> when is the next bus?"
Perfect: "[keyword], when is the next bus?" (context set in advance for the specific bus you care about)

You want to control the lights:
Amazon: "Alexa, turn off the [kitchen lights]"
Google: "Hey Google, let me talk to [the house] <wait> [turn off] the [kitchen lights]"
Perfect: "[keyword], turn off the lights" (contextually turn off the lights immediately nearest the source of the voice; optionally "turn off the [named] light")

You want to activate a scene:
Amazon: "Alexa, [turn on|activate|start] [scene name]"
Google: "Hey Google, let me talk to [the house] <wait> activate [scene name]"
Perfect: "[keyword], [time for a movie|it's bedtime|good morning]"

I haven't seen or heard of a single "Conversational" app for the Echo that has real usage. I just don't think that's the killer app (at least until they get smarter). The current killer apps are being able to:
- Set timers/alarms
- Ask quick questions (measurements, weather, time)
- Control home automation

I just don't think the experience of talking back and forth to a bot is that enjoyable (again, yet, maybe in a Her future).


From the documentation, they also support Amazon Echo-style actions such as "Tell Personal Chef to find me chicken recipes" or "Ask $name to $action_phrase".

See https://developers.google.com/actions/distribute/invocation-...
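
As a rough illustration (the patterns and names below are invented, not Google's actual grammar), splitting an invocation like that into an app name and an action phrase is essentially pattern matching:

    import re

    # Hypothetical patterns mirroring the documented "Tell <app> to <phrase>" /
    # "Ask <app> for <phrase>" invocation style; not Google's actual grammar.
    INVOCATION_PATTERNS = [
        re.compile(r"^(?:tell|ask) (?P<app>.+?) (?:to|for) (?P<phrase>.+)$", re.IGNORECASE),
        re.compile(r"^(?:hey google, )?let me talk to (?P<app>.+)$", re.IGNORECASE),
    ]

    def parse_invocation(utterance):
        """Split an utterance into (app name, action phrase or None), if it matches."""
        for pattern in INVOCATION_PATTERNS:
            match = pattern.match(utterance.strip())
            if match:
                groups = match.groupdict()
                return groups["app"], groups.get("phrase")
        return None

    print(parse_invocation("Tell Personal Chef to find me chicken recipes"))
    # ('Personal Chef', 'find me chicken recipes')
    print(parse_invocation("Hey Google, let me talk to nextbus"))
    # ('nextbus', None)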


Spot on! At the core of the bot trend is the fact that users do not want to navigate through menus anymore. They are tired of searching for that one icon on screen after screen so they can enable a function. So they just want an interaction at the root that feels natural and gets them to the goal quickly.


I was pretty impressed the other day by what hobby roboticists are now able to build leveraging stuff like api.ai:

http://imgur.com/a/ue4Ax


I don't think that the newly released Actions always expect a conversation. In scenarios where it makes sense to just reply with an answer, you can still do that. I am guessing you just have to use something like "Hey Google, can you check when the next bus to SFO is on <app name>".


I think that, over time, both paradigms are required. For the quick lookup/command, "ask" is best. For longer interactions, like driving directions, or cooking instructions, etc... a "conversation" makes more sense. Given that we're at the early stages of conversational UIs, I think the 'ask' case is more helpful, but I'm more excited to see what will evolve through conversations.


This is being made into a platform for external developers, unlike Google search for instance. So what makes more sense for users only matters on large averages, and only to the extent that users don't get mad enough to abandon it.

What matters for action developers?

Having people buy shit

Keeping people occupied longer (for commercials)

(for companies that do other things) Preventing people from trying to communicate with a real person


IIRC, the perfect examples where you say neither "Alexa" nor "Hey Google" aren't going to happen any time soon - those recognition phrases are hard-coded for performance reasons, and adding custom ones will be difficult.


That's not true. Most conversational platforms do use a separate "trigger word" subsystem which has a much smaller vocabulary and is optimized for the low-power, always-on scenario. But the trigger words aren't hard-coded - for example on the Motorola X series you could (can?) customize the trigger phrase: https://motorola-global-portal.custhelp.com/app/answers/inde...
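
To sketch the split being described (all class and function names here are placeholders, not any vendor's real API): a tiny, always-on trigger-word detector gates a much heavier full recognizer, roughly like this:

    import random

    # Sketch of the two-stage design above. A real trigger-word stage would be a
    # small keyword-spotting model running on a low-power core or DSP.
    class TriggerWordDetector:
        """Tiny always-on classifier with a small (sometimes user-settable) vocabulary."""
        def __init__(self, phrase, threshold=0.8):
            self.phrase = phrase
            self.threshold = threshold

        def score(self, audio_frame):
            # Placeholder scoring: pretend to estimate how likely the frame
            # contains the trigger phrase.
            return random.random()

    class CloudRecognizer:
        """Heavyweight full-vocabulary recognizer, only invoked after the trigger fires."""
        def transcribe(self, audio):
            return "<full transcription of the captured utterance>"

    def assistant_loop(audio_frames, detector, recognizer):
        for frame in audio_frames:                     # cheap path, runs continuously
            if detector.score(frame) > detector.threshold:
                yield recognizer.transcribe(frame)     # expensive path, gated by trigger

    detector = TriggerWordDetector(phrase="ok google")
    for text in assistant_loop([b"frame"] * 5, detector, CloudRecognizer()):
        print(text)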


I think "[keyword]" is meant as a placeholder for either "Alexa" or "Hey Google".


Privacy reasons as well. Every ounce of what you say after "Alexa" or "OK, Google" is submitted to Amazon/Google.

I don't see them changing this ever, so having a recognition phrase is perfectly fine by me.


Performance may be a small part of it, but it's more about branding. The trigger word is the brand, the functionality is the product.


It's a nod to Siri requiring the app name; the Siri API was punted from iOS 9 in order to add that, since it was considered a privacy risk.

I don't agree FWIW.


Amazon lost the phone and tablet wars, so they shifted their focus to the voice assistant market, and are now ahead of competitors with their Alexa skills SDK. It's great to see Google now step up too. I expect Microsoft and Apple to follow suit.

It's pretty easy to imagine how one could port simple 'voice command' apps between platforms ('hey, order a pizza/uber/etc.'). Over time, however, these platforms should evolve into substantial AI. To be useful they will need to remember and understand more subtle contexts. "Hey, what's the score of the game" will have to remember that I like the Seahawks and Sounders, but only the Sounders are playing live right now, etc...

I like Google's choice to include a "conversational" model in the application design. "Let me talk to <x>" is pretty natural and allows <x> to then have full control of the interaction. Alexa's "ask <x> to <command>" model makes it easier to fire off simple commands, but awkward for deeper ones.


> I expect Microsoft and Apple to follow suit.

Apple has had SiriKit since iOS 10. I believe you can order an Uber ride with Siri; never tried it, I don't use Siri.

https://developer.apple.com/library/prerelease/content/docum...


Google Now is about 95% of the way there on the sports thing (it is always showing me information about a couple of teams it knows I am interested in, including game status).


I'm finding the opposite -- it's always showing me information for teams I'm not interested in.

I'm interested in games that are significant cultural events -- my city's team making the playoffs, the Cubs winning the World Series, etc. Outside of those times, I have no interest. Yet despite ignoring stories, I get frequent updates about players, teams, and events I have zero interest in.

Also, despite searching regularly for e-sports events, I see no updates on the teams in the LCS, or in StarCraft II's various tournaments. Or EVO. The sports events I do care about seem to have no support.


There's an option you can select in Google Now to specifically disregard these teams, either through the action menu on the card, or in the preferences section of the Google Now app. Merely ignoring or swiping away the cards doesn't change Google's estimation of your preference of the teams.


Pretty neat, but from a user's perspective I'm a little disappointed that the current implementation seems to require me to explicitly tell Google what app I want to interact with. I'd rather be able to just ask Google "Find me a movie ticket for Rogue One on Thursday" and have it be able to infer that I want to use the movie tickets app installed on my phone, than for me to have to explicitly say "Let me talk to Fandango" first.

That said, it looks like deeper integrations are coming:

> We'll continue to add more platform capabilities over time, including the ability to make your integrations available across the various Assistant surfaces like Pixel phones and Google Allo. We'll also enable support for purchases and bookings as well as deeper Assistant integrations across verticals.


The problem with protocol activation is that there are a million different endpoints and you either

* Keep a list of every single one that apps can opt into. This is how you end up with "midsized_sedan_carwashing" in an enum list. It's a developer drain for Google/Microsoft/Amazon and creates "wait for the next release for 'largetruck_carwashing' so you can release your app".

* Let apps run wild and claim any protocol (in this case, the word "movie ticket") they like. This is how you get malware claiming the .pdf or http:// protocols.

* Let certain special apps run wild. This is how you end up with "Buy Tickets With Fandango" advertised on the side of the box while Brown Paper Tickets or another startup is left in the cold with "ask brown paper tickets for..."

No solutions here; it's a hard space if you want to protect users but also enable developers equally. In the first two cases you also need to create a confusing UI for the user to select which app they want for movies vs. dry cleaning.
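
To make the first two options concrete, here's a toy sketch (the categories, phrases, and app names are just invented for the example) of a curated registry versus a free-for-all claim model:

    # Option 1: curated registry - the platform owns the category list, so apps
    # can only register for what has already shipped.
    CURATED_CATEGORIES = {"movie_tickets", "ride_hailing", "midsized_sedan_carwashing"}
    registry = {}

    def register_curated(app, category):
        if category not in CURATED_CATEGORIES:
            return False                      # wait for the next platform release
        registry.setdefault(category, []).append(app)
        return True

    # Option 2: free-for-all - any app can claim any phrase, so the platform
    # needs review and conflict resolution instead of a fixed enum.
    claims = {}

    def claim_phrase(app, phrase):
        claims.setdefault(phrase.lower(), []).append(app)

    print(register_curated("Fandango", "movie_tickets"))           # True
    print(register_curated("TruckWash", "largetruck_carwashing"))  # False: not in the enum yet
    claim_phrase("Fandango", "movie ticket")
    claim_phrase("Brown Paper Tickets", "movie ticket")
    print(claims["movie ticket"])   # two claimants -> a collision someone has to resolve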


Maybe Assistant could somehow monitor what kind of things users are asking a particular app to do and use that to determine defaults for specific actions in future interactions. Imagine:

"Okay Google, get me movie tickets for Rogue One on Tuesday"

"I'm sorry, I don't know how to do that yet. Try telling me to ask another service like Fandango or MovieTickets.com."

"Ask Fandango"

"Okay. [Switches to Fandango.] What theater do you want to see Rogue One at?"

---

Later when I ask for movie tickets again...

"Okay Google, get me a movie ticket for Moana."

"I'm not sure how to do that. Would you like me to ask Fandango?"

"Sure"

"Okay. [Switches to Fandango.] What theater do you want to see Moana at?"

---

Later still...

"Okay Google, get me a movie ticket for Fantastic Beasts and Where to Find Them."

"Sure, let me ask Fandango... [Switches to Fandango.]"

---

You get the idea. And if a user ever wants to use a different service, they can always ask for it explicitly.
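
A minimal sketch of that idea (purely hypothetical, not how Assistant actually works): keep a per-user tally of which service handled each kind of action, and once one service clearly dominates, offer it as the default:

    from collections import Counter, defaultdict

    # Hypothetical preference learner for the behavior described above.
    class ProviderPreferences:
        def __init__(self, min_uses=2):
            self.history = defaultdict(Counter)   # action category -> provider usage counts
            self.min_uses = min_uses

        def record(self, category, provider):
            """Call whenever the user completes an action through a given provider."""
            self.history[category][provider] += 1

        def default_for(self, category):
            """Return a provider to suggest by default, or None if none has been learned."""
            counts = self.history[category]
            if not counts:
                return None
            provider, uses = counts.most_common(1)[0]
            return provider if uses >= self.min_uses else None

    prefs = ProviderPreferences()
    prefs.record("movie_tickets", "Fandango")
    print(prefs.default_for("movie_tickets"))   # None -> "Would you like me to ask Fandango?"
    prefs.record("movie_tickets", "Fandango")
    print(prefs.default_for("movie_tickets"))   # 'Fandango' -> "Sure, let me ask Fandango..."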


I think the Google voice recognition service (triggered by "Ok Google") on my phone already does something similar.

When I first started asking it to call a particular person, it would ask "which number, home or mobile?", for example. Over time, since I almost always ask for the same number for a particular contact (mobile), it learned that when I ask for the contact's first name, I usually mean that specific contact.

So whereas initially I had to say the full name and pick a number, now all I say is "Call <first name>" and it starts dialing the exact contact I wanted. It also knows that this first name is my wife... so I can even say "call my wife" and it also works.

This lends credence to the model you're describing, and maybe that's exactly how it will work in the future. I think machine learning sorts this out quite well.


That's probably step 2 (or 8). The current situation seems like a good step one, though.


Yup! We are thinking along these lines. Please see the baby step on discovery that you can use today, called "Deep Link Discovery": https://developers.google.com/actions/distribute/invocation-...


Cool, thanks! What are you doing for conflict resolution? In Windows we would just show a list of apps that registered for png or jpg, for instance, but over a low-bandwidth signal like speech that's kinda hard.


That's why I personally want a fully configurable system in which I do #3 myself. But this ain't gonna happen, because nobody is building anything for advanced users anymore :<.


This problem is reminiscent of indexing the early web.


There are two kinds of actions - conversation actions (where you have to say "Tell Uber I want a cab") and direct actions ("get me a cab").

Somewhat understandably, the direct actions are limited to certain domains that Google supports. It isn't a free-for-all.


Yeah, I guess considering that there's no installation process for actions, that's kind of a necessity.

Maybe as the technology matures Google will start getting smarter about being able to infer which provider I want to use for particular types of actions, but I recognise that's likely to be a rather difficult problem to solve.


This looks pretty cool. The api.ai interface looks much more approachable (and logical) than Alexa for someone who is just getting started. I played around with Alexa a little bit, but creating a bunch of JSON and text files to map utterances, intents, schemas, etc. felt way too clunky and unintuitive.
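
For context, the Alexa setup being described looked roughly like this at the time - an intent schema in JSON plus a flat sample-utterances file; the shapes below are reconstructed from memory and only approximate:

    import json

    # Approximate shape of a 2016-era Alexa Skills Kit custom skill definition:
    # a JSON intent schema plus a plain-text file mapping intents to sample
    # utterances (with {slots} inline).
    intent_schema = {
        "intents": [
            {
                "intent": "GetNextBus",
                "slots": [{"name": "Route", "type": "AMAZON.NUMBER"}],
            }
        ]
    }

    sample_utterances = """\
    GetNextBus when is the next bus
    GetNextBus when does route {Route} arrive
    GetNextBus next bus please
    """

    print(json.dumps(intent_schema, indent=2))
    print(sample_utterances)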

The api.ai UI for building interactions looks way more approachable and developer friendly. On top of that it looks like they are building out support for multiple platforms (Slack, FB, Alexa, Telegram) so you can (maybe) have a central integration point for all your chat apps. Some of the integrations look very basic though.

The biggest downsides are that I have never heard of these api.ai guys before, and that the pricing beyond their free tier (1 query/sec) requires you to contact them, yuck.

Edit: Looks like I spoke too soon. Apparently they were acquired by Google recently[1]. I was wondering why Google was making this startup so central to their product. Now they just need to be more transparent on pricing, but I guess they are testing the waters right now.

1. https://api.ai/blog/2016/09/19/api-ai-joining-google/


Hmm, the lack of an installation step for skills is interesting and an obvious contrast to Alexa/Echo. Should we be expecting an arms race here for sensible invocation names?


Seems like names on the "App Store" [are going to be regulated by Google][1] to prevent that:

> There will be app store-like policies in place to prevent things like keyword camping — one developer using another company’s name for their action or another developer trying to jump on owning important keywords like “shopping.” Douglas says that “this is a large reason for there being a review process. Much of the policy compliance is this process of name selection. “

[1]: http://www.theverge.com/2016/12/8/13878444/google-home-devel...


Great question - we are under NDA, but I can share that Google is going to make priorities 1 -> 10 the consumer experience, and priority 11 how to improve visibility through SEO. Think SEO to get the lion's share of traffic.


One thing I am interested in is this: will there be a way for users to create aliases for their apps, or to set preferences for which apps they want to use?

For example, say a user had a specific recipe app he wanted to use. Instead of referring to its exact name, or hoping that he gets the right app when asking what recipe he should make, would he be able to go into his settings (either online or in an app) and assign that app as his preference, or rename it to just "chef" or something simple?


Will be interesting to see how that pans out. If Google makes a different search-ordering decision for me it's OK, because I get to choose on the results page, whereas with this it's like pressing "I'm Feeling Lucky" every time.


What, so if I say "Ask Lyrics for the lyrics to Yesterday by the Beatles", it will do some kind of ranking on all the skills that have 'Lyrics' as their invocation trigger and choose the 'best' one?


Why won't they release Assistant for all Android phones? Nobody uses Allo, and Pixel is a small minority of the market.


So what's the difference between Google Now, the Google voice search thing at the top of my Android Galaxy S5, the Pixel assistant, and the Allo assistant?

Because they are all different, to my eyes. They all have different commands they can do. It's just obscene.


There is/was no such thing as an assistant called "Google Now". That was an illusion/poor branding.

There's "Voice Search", typing into the Google search box with your voice.

Versus Assistant, which is Google's chatbot interface.


> Pixel is a small minority of the market

Exactly that. They're trying to promote Pixel phones by making Assistant an exclusive feature.


Then why would you spend any time developing for Assistant?


I got excited too soon; it seems very limited. I thought they would at least start with Android, seeing as that's where most users would be.

Imagine as a developer spending 20 hours building Actions only for there to be no audience.


I was very impressed by the demos at the last Google IO, but decided to not go all-in on Google for privacy reasons. That said, their integrated platform is looking great.

Some Google services I wouldn't want to live without: Play Music, Play Movies, Google Cloud Platform, and their neat One a Day charity platform. Otherwise, I am doing a 180 degree flip and getting all Apple gear after many years of mostly using Linux.

I like the competition between Google/Apple/Microsoft/Amazon because even though I would like to see more than four major players, four is enough to drive feature and price competition.


I was looking at the page earlier today, wondering when it's being launched. Great timing as my days at work are over for the rest of the year.

I'm working on a public-transport chatbot in South Africa, planning on making it available on Twitter, Facebook, and Allo (though fewer users here). Going to be a fun December and January!


Why not add South Africa to https://github.com/schildbach/public-transport-enabler ? That way, the eventual voice/chatbot support could support other networks too.

I already made an IRC chatbot for this API: https://github.com/ysangkok/console-transit-trip-planner


Good luck! Do you have a progress blog or something?


Our website is movinggauteng.co.za; I'll post about progress on our blog over the coming weeks.


Any info on the pricing? Amazon always has its pricing info along with any announcements. And yes, I like this chatbot API from Google, to compete with Amazon Lex/Polly. Competition will bring out the best for developers.


Their basic unsupported tier (up to 1 query/sec on avg) is free. You have to contact them if you want pricing for a supported plan.

This type of opaque pricing normally annoys me, but they were recently acquired by Google so I expect the pricing to eventually become more transparent and most likely be fairly reasonable.

https://api.ai/blog/2016/09/19/api-ai-joining-google/


It's free.


Just registered and got an email saying "for pricing please email sales@api.ai", so I guess it is not free - and it doesn't need to be, there's just not enough straightforward info compared to Amazon's announcements.


It's not, though, is it: https://api.ai/pricing/


Just tried to submit the application form [1] but it seems to be broken. It says "please complete all required fields" despite the fact that I completed all of them (double checked).

1. https://assistant.google.com/developer/eap/


I am sorry you had a problem with the form. I just double-checked and it seems to work now. Can you please try again, and if not, drop me a line: http://twitter.com/brada


This is awesome. This was the biggest drawback of Alexa for building any meaningful apps. It's like they read my comment on HN from 3 days ago :D https://news.ycombinator.com/item?id=13073236


With Lex and Polly, anyone can essentially build their own Alexa-style chatbot nowadays. Alexa will just be Amazon's version of it. I expect more apps and more devices will compete with Alexa as different products.

API.AI from Google can also be integrated into many popular apps (Twitter, Facebook Messenger, etc.), but I'm not sure if it's app-agnostic - i.e. if I write an Android app, can I use Google Assistant directly, or do I need a special API key or something from Google first?


Google says you don't need to enable skills or actions, you can just say "Tell <invocation trigger> XXX". Can anyone work out what happens if there are multiple actions registered with the same invocation name?


My guess is they simply won't allow that. Every action will require a unique name.


Why isn't open-source catching up in this field?


I'd guess because you need a lot of data and a LOT of work to get anywhere close to where literally all the competition is already. Besides Apple, Amazon and Google, no other company really has a decent voice assistant. Even Microsoft doesn't really manage to get up to the same level.

Besides that, a voice assistant is never just a voice assistant. It's only useful if it already has a big range of services to integrate. Apple, Amazon and Google already had market-dominating services they could integrate to provide a decent starting point.

You could see it in how each assistant evolved. Siri was always very much aimed at interacting with your phone. Messaging, playing music, local things. Things Apple already did well without a voice assistant. Alexa's main selling point at the start was its integration with Amazon for shopping. It's still very important and probably the main reason I don't expect to get Alexa in Belgium any time soon, since we don't even have an Amazon store. Google Now then couldn't really do much for you, but it's incredible at providing you with the right information, because it's powered by the world's biggest search engine (by quite a margin). Cortana, for example, doesn't really have any killer features, because Microsoft still doesn't have any market-dominating services besides Windows (which doesn't really benefit from a voice assistant interface).

So, in short, it's not about the voice assistant. It's about the ecosystem. And those are incredibly hard to build and probably impossible to open source.


To be honest, considering Google's 0-fucks-given attitude toward its non-paying users, which are the majority, I hesitate to touch or build on any of its tech/APIs.

Someone correct me if I'm wrong; constructive criticism welcome.


I've gotten great service from their Play Store customer help. I can't recall a single bad experience I've had with them. They've not only solved my issues but exceeded my expectations with some of them. I'd say that qualifies as "giving a fuck" about their non-paying customers.



