The first thing the article says is that users are probably aware Google stores all of these recordings. I, for one, was not aware. Am I really in the minority?
I was surprised and dismayed to discover that Skype does something similar.
Skype stores your voice mails and video messages forever[1]. This is something that they started doing 2-3 years ago and few people seem to be aware of it. Unlike Google, they don't provide any way to delete them.
[1] Details: Clicking on Preferences -> Privacy -> Delete history (OS X) or Options -> Privacy Settings -> Clear history (Windows) pretends to delete the voice/video messages but it merely hides them from your view. If you re-install Skype on the same computer or run Skype on a different computer, all those "deleted" voice mails and video messages re-appear.
Google, Siri, Cortana and Amazon Echo all store your voice recordings. They are explicit about this in the fine print. Google and Amazon allow you to listen to these recordings yourself[1][2]. Not sure about Siri or Cortana.
Siri as of 2013: http://www.wired.com/2013/04/siri-two-years/
"Whenever you speak into Apple’s voice activated personal digital assistant, it ships it off to Apple’s data farm for analysis. Apple generates a random numbers to represent the user and it associates the voice files with that number. This number — not your Apple user ID or email address — represents you as far as Siri’s back-end voice analysis system is concerned.
Once the voice recording is six months old, Apple “disassociates” your user number from the clip, deleting the number from the voice file. But it keeps these disassociated files for up to 18 more months for testing and product improvement purposes.
“Apple may keep anonymized Siri data for up to two years,” Muller says “If a user turns Siri off, both identifiers are deleted immediately along with any associated data.”"
Google asks you if you want to store your data to improve their model. Presumably they augment their model of you with the voice data and corrections you supply (voice-typed words are blue in Android, and selecting one brings up a dropdown of alternate words that would be stupid not to record...).
I guess I always knew this data was being recorded, but honestly I would rather that Google not have released it via this interface. My Google account is not as well-secured as Google's backend, and if someone compromises my account I don't want them to be able to download recordings of my voice saying things (I probably have enough google searches on voice to make a soundboard for anything).
But, I can also appreciate that few users will appreciate the privacy implications of choosing to personalize the google voice model without a service like this, so...
> I guess I always knew this data was being recorded, but honestly I would rather that Google not have released it via this interface. My Google account is not as well-secured as Google's backend, and if someone compromises my account I don't want them to be able to download recordings of my voice saying things (I probably have enough google searches on voice to make a soundboard for anything).
I would argue, though, that the majority of people want to keep control of their data as much as possible. The responsibility to take care of it and its security is just a corollary of that.
> I would argue, though, that the majority of people want to keep control of their data as much as possible
This seems like a nonsensical statement in 2015. What basis do you have for it? Most people hand over control of their data without a care in the world.
> This seems like a nonsensical statement in 2015. What basis do you have for it? Most people hand over control of their data without a care in the world.
Have you watched the news lately? At least in Europe that is most definitely what users want and what the laws are giving them the right to and the courts are enabling them to do.
Unless of course, you are trying to derail this argument by meaning whenever something isn't saved on a hard drive in my safe it's out of my control. Which would be complete BS and irrelevant to the discussion (which is about "is Google trying to give control over their data to the users").
Skype, Google, YIM, AIM, etc. all store large portions of your communications far longer than you realize, and their "clear" mechanisms frequently are local-only. [e.g. Skype: reinstall, things reappear]
> The first thing the article says is that users are probably aware Google stores all of these recordings. I, for one, was not aware. Am I really in the minority?
Nah. Most people are convinced the deletion stuff actually works and/or people wouldn't engage in this sort of behavior and are surprised when they find out.
The reason people don't care about privacy is they don't really comprehend what they've lost and how extensive it is.
I was not aware of this until a couple days ago, when I stumbled upon that (OP) article. Surprisingly, though, my privacy settings were already configured to not keep these recordings.
I was not aware of Google's policy for this issue, but I knew that Apple keeps all recordings for at least 2 years... so I was thinking that they did about the same.
Anonymized voice recordings? I can't recall if the last article that I read about that a while ago mentioned it, but either way, I would say that it is pretty much impossible to completely anonymize a voice recording.
Nor would anonymising a voice recording (presumably by doing some sort of frequency-domain transform) be of much use to them for the sort of thing they're using them for - training speech recognition on a wide variety of different voices.
Well, I don't know. A typical HN reader should not be surprised by this. You know technology, you know how Google operates, it should be fairly obvious that they store everything, forever.
When I saw the article, my reaction was "hey, thanks for letting me know what the link actually is". I was not surprised at all by it. I sort of expected it.
I assumed they did something to improve matching, but I never imagined they stored the 'raw' (as in actual voice, albeit probably compressed) recordings.
If you've kept up with internet privacy news the last few years from things like the Snowden or Ashley Madison scandals, how is it not your assumption that everything that is exposed to the internet is stored and indexed somewhere forever?
There are laws against retaining user data that the user has explicitly requested be deleted, and Google spends a lot of engineering time making sure that is really the case (which is non-trivial, e.g. with failing drives that you can't access anymore: you still have to delete the data from them, meaning you must know, at every point in time, the physical drive that a user's data is saved on). I think 30 days is the guarantee they make.
As per my previous examples: with the Ashley Madison website, users would pay to get their information "deleted" and it ended up not being deleted at all, just hidden.
Perhaps that's a fair point if you are the kind of user that keeps up with all that you've detailed. To say that users of Google's voice-controlled features are probably aware of it is another thing entirely, and highly unlikely.
I'm a bit surprised myself, more at the potential storage cost than the creepiness. Even at a low bit-rate, that's 10 MB/hr, which would add up quickly if many people are actually using Google voice search.
Google says Google was doing a bit over 3 billion queries a day. Let's say 1% are voice (which is prolly way high). 30M voice queries at 5 seconds each is 42K hours. At 10MB per hour that's 420GB per day. Seems rather trivial.
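For anyone who wants to play with those numbers, here is a rough sketch of the same back-of-envelope estimate. The inputs are just the guesses from the comments above (3 billion queries a day, 1% voice, 5 seconds each, 10 MB per hour), not measured figures, and the function name is made up for illustration.

```python
def daily_voice_storage_gb(
    queries_per_day: int = 3_000_000_000,  # "a bit over 3 billion queries a day"
    voice_fraction: float = 0.01,          # guessed share of voice queries
    seconds_per_query: float = 5.0,        # guessed average clip length
    mb_per_hour: float = 10.0,             # low bit-rate audio, per the parent comment
) -> float:
    """Estimate GB of voice audio stored per day under the stated guesses."""
    voice_queries = queries_per_day * voice_fraction
    hours_of_audio = voice_queries * seconds_per_query / 3600
    return hours_of_audio * mb_per_hour / 1000  # MB -> GB (decimal)


if __name__ == "__main__":
    print(f"{daily_voice_storage_gb():.1f} GB per day")
```

With the default guesses this comes out to roughly 417 GB per day, in line with the 420GB figure above (which rounds 41.7K hours up to 42K). Even bumping the voice share to 10% keeps it in the low terabytes per day.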
Every time I use voice search on my mobile (the only place I use it) I get a toast popup that says 'saving audio to $email_address'. So I knew about this.
I use an Android phone. I have, however, noticed that voice search on the desktop doesn't issue this warning.
Do people generally like that this UI exists, or not?
To everyone who says "wow that's creepy, I wish I had known, God knows what else they have-" do you feel better or worse seeing this UI available, and having the option to turn off services or remove specific records?
I'm honestly curious, because Google is obviously trying to react to the "transparency" criticism. I also wonder who would be offended by a UI that gives you a method to stop using the services you find creepy. Seriously, you can opt out, it explains how pretty easily.
Google knowing a bit about me & my voice is a little scary, but hearing my 8yr old daughter ask about the cosmos & other scientific questions on Google is, to me, priceless.
Any chance you are willing to share the content of some of those questions (not the recording, just the questions)?
I've seen little kids ask me and other adults about what a cordless phone was (even "why does it have so many buttons?" from the young owner of an iPhone who failed to grasp the idea of a screen without touch input), why they can't interact with the cover of a magazine by poking at it, and why we can't just take pictures of bacteria with a smartphone camera instead of looking through a microscope. I'm curious what a child would ask Google and what the answers would be given the vastly different expectations each generation has of every day technology.
I have had really bad experiences with voice recognition going all the way back to early Dragon NaturallySpeaking versions, and I can't imagine asking Google voice search a question, so instead I always just text search for relevant topics or go on the science exchanges/forums.
@akiselv:
Some of the questions my daughter has asked:
> how does an earthquake happen
> when was the first Christchurch earthquake
> what are the well known constellations
> how big is cat VY Canis Majoris
> what is the biggest star
> how old is Pluto
> what is Fusion
> how was the sun formed
And your daughter is only 8? Damn, nice job on the parenting. Might I ask what you've done as a parent to help get her to the point where she's asking these sorts of questions?
Early on I did show her how to use Google Now to ask questions, like how far is the Moon and the like. That's about all the parenting I have done in this regard. She has been watching way too many How Stuff Works videos lately - I want to see if that spills over into her being more curious about things.
Other comments say that these recordings are unlinked from your account after some time (6 months for apple) or you turn this feature off. In a previous HN [1] post we learned that third parties can listen to these anonymized recordings to improve the system, and there are actual humans listening to it...
What just jumped to my mind is that, leaving aside how unique your voice can be, or how identifiable your behaviour and patterns are (hint: a lot), there are a lot of searches that reveal unmistakable and straight personal info. What about:
"Hey phone, text [name of my special other] [naughty and quirky stuff]"
I wouldn't care too much if someone listened to me ask for a Starbucks, but I certainly don't want anyone to listen to that.
Playing with Maps one evening led me to find that at one point I had enabled some kind of tracking, and every single day of my movement was graphed in Maps. Where I went and how long I was there was recorded by Google.
I just had a look at the information Google has been keeping on me, and noticed that when I visited the visitor center for Zion National Park earlier this year, a photo of my car's license plate was associated with the event. Any ideas what's going on here?
If the photo was taken with an Android device, and uploaded to Google Photos, most likely Google associated your location with the GPS coordinates in the EXIF data of the photo.
I was at Zion a few weeks ago, and did not see any equipment on the road or in the parking lot that was automatically capturing license plate data, if that's what you're asking.
The title is misleading; as the article reveals, you can't delete the recordings, all you can do is ask Google to not tie them to your identity. (and hope that they actually do that...)
You can delete them (or at least, request that Google delete them); either by selecting the checkbox next to each recording, or by choosing "Delete options" from the vertical-line-of-three-dots icon in the top right. What it looks like you can't do is ask Google not to keep anonymized recordings of voice searches etc that you make in the future.
Even then, you don't really delete the recordings, you just delete the link between your account and the recording. As the article points out, the only way to stop Google from having any voice data on you (anonymized or not) is to stop using their voice features.
Remember that as the AOL search history leaks showed us, the anonymization is a fig leaf and it's not terribly challenging to re-associate a complete search query history with a person -- even without identifying the voice.
"Turning voice Activity off doesn’t stop Google storing your recordings, but it means they get kept with an anonymous identifier, and can’t be easily linked back to your account."
"Off" is the new "on". The Direct Marketing Association's definition of "not tracking" is similar.
Mine all say "paused" under "Activity Controls" [1]
Out of interest, why does Google use the term "paused" rather than "off"?
It makes me suspect that they are still saving it, but just hiding the fact that they are doing so from me. Or am I just paranoid?
Update:
The end of the article outlines this a bit better:
Turning voice Activity off doesn’t stop Google storing
your recordings, but it means they get kept with an
anonymous identifier, and can’t be easily linked back to
your account. If you want to stop Google recording your
voice at all, well, there’s only one solution: stop
talking to it.
I suspect a bit of it is psychology, too. If it's "Paused", it implies that it's usual to be enabled, both as peer pressure and for features/usability.
"This call may be monitored or recorded for quality assurance or training purposes. Your information is confidential and protected by the law."
Call centers have been doing that for a long time, and our voices are stored and used globally. I wish they had a delete button too.
Delete? There's this thing called soft deletes. Things which get stored on Google (and Facebook for that matter) will never ever be deleted just because you, the person who added the content, decide to delete them.
They can, of course, decide to free up storage space by actually deleting stuff which was already soft-deleted -- but given the price per gigabyte these days I bet this will never ever happen...
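For readers who haven't run into the pattern, here is a minimal sketch of what a soft delete usually looks like. Everything in it (`Recording`, `RecordingStore`, the `deleted_at` field) is hypothetical and only illustrates the general idea. It says nothing about how Google or Facebook actually store data.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Recording:
    recording_id: int
    transcript: str
    deleted_at: Optional[float] = None  # None means "not deleted"


class RecordingStore:
    """Hypothetical store where 'delete' only sets a flag."""

    def __init__(self) -> None:
        self._rows: Dict[int, Recording] = {}

    def add(self, rec: Recording) -> None:
        self._rows[rec.recording_id] = rec

    def soft_delete(self, recording_id: int, now: float) -> None:
        # The row stays in storage; it is merely marked as deleted.
        self._rows[recording_id].deleted_at = now

    def visible(self) -> List[Recording]:
        # What the user sees: rows without a deletion timestamp.
        return [r for r in self._rows.values() if r.deleted_at is None]

    def stored_count(self) -> int:
        # What is actually kept, including soft-deleted rows.
        return len(self._rows)

    def purge(self, deleted_before: float) -> int:
        # The optional hard delete: drop rows soft-deleted before a cutoff.
        doomed = [
            rid for rid, r in self._rows.items()
            if r.deleted_at is not None and r.deleted_at < deleted_before
        ]
        for rid in doomed:
            del self._rows[rid]
        return len(doomed)
```

After `soft_delete`, the row vanishes from `visible()` but still counts in `stored_count()`. Only a separate `purge` step removes it, and nothing forces anyone to run that step.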
Decided to listen to some of my recordings. What seemed off to me was that it also records myself saying "OK Google". Meaning that the mic is always on. Do they store audio outside of me calling for Google?
They buffer the audio in a ring buffer constantly, so that the audio can be checked for a wake word. If the wake word triggers, then they send the contents of the buffer. The buffer shouldn't be super big (e.g., one or two seconds). They use it to train their wake word model.
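A toy sketch of that ring-buffer idea, in case it helps. It assumes audio arrives as small frames (plain strings here, standing in for audio chunks) and that some detector function decides whether the buffered frames end in the wake word. The class name, the detector, and the frame format are all invented for illustration and are not taken from any real assistant.

```python
from collections import deque
from typing import Callable, Deque, List, Optional


class WakeWordListener:
    """Keeps only the last `capacity` frames; older audio is overwritten."""

    def __init__(self, capacity: int,
                 detector: Callable[[List[str]], bool]) -> None:
        # deque with maxlen behaves as a ring buffer: appending to a full
        # deque silently drops the oldest element.
        self._buffer: Deque[str] = deque(maxlen=capacity)
        self._detector = detector

    def feed(self, frame: str) -> Optional[List[str]]:
        """Add one frame; return the buffer contents if the wake word fired."""
        self._buffer.append(frame)
        snapshot = list(self._buffer)
        if self._detector(snapshot):
            self._buffer.clear()
            return snapshot  # this is what would be sent off the device
        return None


def ends_with_ok_google(frames: List[str]) -> bool:
    # Stand-in detector; a real one would run an acoustic model.
    return frames[-2:] == ["ok", "google"]


if __name__ == "__main__":
    listener = WakeWordListener(capacity=3, detector=ends_with_ok_google)
    for word in ["we", "had", "donuts", "ok", "google"]:
        sent = listener.feed(word)
        if sent is not None:
            print("sent:", sent)
```

In this toy run, the snapshot that gets sent is `['donuts', 'ok', 'google']`: the wake word itself plus whatever still fit in the buffer just before it. That would explain why the stored recordings include "OK Google", while anything older than the buffer length has already been overwritten.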
As someone generally critical and untrusting of Google, I assumed if you asked Google to delete your voice recordings, they actually did delete it. Kinda amazed that they don't.
I did not know they were keeping this crap, but it's kind of neat I can hear myself getting directions to the bakery on my wedding day to pick up the cake (donuts).