bkbridge's comments

bkbridge · on Feb 24, 2017

If you are doing AI, machine learning, it's all Python. Other language support yes, but python is it. Alexa does not support Go.

> https://www.tensorflow.org/

StreamBright · on Feb 24, 2017

You mean the API that you are calling is in Python. I think the heavy lifting is mostly C/C++.

kstrauser · on Feb 24, 2017

Not to mention that Jupyter (was IPython) is a gorgeous way to work through a dataset while describing your process to your peers.

vegabook · on Feb 24, 2017

unfortunately this is true. For one thing Go doesn't have a REPL - major problem for data science/AI/scientic programming. And it's very committed to old-style procedural/imperative paradigm.

Numpy is carrying Python, and its competitors are thin. There's Julia, but that's not really ambitious enough compared with Numpy to be worth moving to. There's Scala though then you need to love the JVM. Lua had promise as a nice and tight, pragmatic and fast language, but it seems to have missed its window of opportunity. What else? I'd genuinely like to know. Ideally I'd like something that's got a functional programming flavour but that also data-parallel, almost R-style, by default, so as easily able to target the GPU. I took a look at futhark but it seems too experimental, as does Ocaml/spoc combo. I am taking a serious look at the probabilistic programming area with Edward or Stan, but then we're back to Python front ends.... Actually I did look at Swift which does c-class speed with an official REPL but an apple-supported language may not attract the science/ML crowd. Meantime rust is REPL-less for now...

bkbridge · on Feb 1, 2017

This stuff is complicated. No one gets that, no one.

bkbridge · on Jan 15, 2017

Sounds OK by me :-)

> Paypal founder Peter Thiel becomes marijuana's first big investor

> https://www.theguardian.com/business/2015/jan/08/cannabis-in...

bkbridge · on Jan 10, 2017

And that is why San Francisco and LA are going to EXPLODE with tourist. Snapchat central, here we come. The Kids are running the show now. Amsterdam was cool, but that's kind of last year.

LA is rocking! :-)

NamTaf · on Jan 11, 2017

Question from someone who hasn't been to the US in 2 decades:

Is the MJ culture of the US where it's legalised (Colorado, LA, etc.) such that you get coffeeshops like the Netherlands, or is it more a dispensery style arrangement where you buy it for consumption in a private premises such as your home, or both?

passivepinetree · on Jan 11, 2017

Dispensary style. Public consumption is almost universally outlawed AFAIK.

bkbridge · on Jan 10, 2017

Hmmm, going from someone with a pretty big Wikipedia page, to someone with just a linkedin entry. Guess it's fine. But Apple seems to be in a bit of a tail spin as of late.

Sure they'll do amazing.

hk__2 · on Jan 10, 2017

Having a Wikipedia means you’re quite known. Not having one doesn’t mean you’re incompetent.

bkbridge · on Jan 5, 2017

Money goes where it's treated the nicest. So where is it going now?

bkbridge · on Jan 5, 2017

Well, that's the coolest thing today for sure. WOW! :-)

Seems to make building VR worlds super simple.

bkbridge · on Jan 4, 2017

Was at an Alexa meetup. One of the senior Alexa developers was speaking. What was said blew my mind:

We want you to use conversational UI, on any device, Apple's, MSFT, Google, anyone you like. We think in the end, you'll have the best experience with Echo.

How often does a mega tech company encourage you to use the competitions products?

Think Amazon hit out of the park. Hey Google, just does not cut it for me.

Alexa, and that sultry voice, just no comparisons. (IMHO).

bkbridge · on Dec 27, 2016

Alexa only waits for the wake word "Alexa." It would cost more money than God in hardware to store every thing Alexa ever heard. She saves you're requests. And shows you those request in the Alexa App. And you can also delete it if you wish.

There is no data for the police to have, because beyond requests, there is no data.

Unless someone knows more about this than Amazon is telling us?

cryptarch · on Dec 27, 2016

It would cost about $30m a year if you tailor the system to flagging specific data for storage and don't naively store every moment (e.g. you scrap silent moments and use VBR encoding).

Storing a year's worth of 96kbps audio costs 380GB. If you don't record silence and you assume the people around an Alexa are only speaking for at most 4 hours a day on average, that goes down to 76GB a year.

So if you then assume 5m Alexa's are active at any given point in time that works out to 380k PB. Ok, that doesn't work yet.

However, if you then layer on a flagging system, where only certain users' full record is stored, or only "suspicious incidents" are stored, and you get this down to only flagging 0.1% of all data, you arrive at 380PB of storage.

Amazon Glacier costs about $88.000 a year per PB, but there's a profit margin included in that, so I'll assume it costs Amazon just $75k a year.

In conclusion, it would cost Amazon about $28.5m a year to run such a system. That's certainly within the realm of possibility and of what LE/SIGINT clients would pay; I assume the NSA would gladly pay that sum x100 for that capability. Sounds like it'd be booming business for Amazon.

dannypgh · on Dec 27, 2016

I think you're orders of magnitude off in marginal cost estimates for glacier users. Datacenters are being built out for a small number of commercial users (e.g. Amazon's core business) and the size of modern HDDs would lead me to estimate that storage is about free in a modern datacenter, the scarce resource is disk-time for read/write operations. That is, projects like glacier let Amazon sell disk that would have otherwise been stranded.

It is also the case that a consumer level service like glacier presumably has more redundancy than what might be needed for best-effort storage of these recordings, where losing any fraction of them wouldn't really be a problem.

cryptarch · on Dec 27, 2016

I'm not in the datacenter business, so I've been conservative for lack of experience with storage at PB scale.

I've chosen to err on the side of estimating it to be more expensive, because I think that makes the end result more convincing:

30m is chump change for parties like Amazon, and in reality it'll cost significantly less. 1m might well do. Maybe it's less still. You could combine flagging users with flagging low-certainty or keyword-containing transcriptions.

Either way, you don't need collusion with intelligence parties, just an unscrupulous or naive exac at Amazon that thinks the data might be worth a lot for training future learning models. Of course the more sinister but legal reselling to government agencies is a financially attractive option as well.

tclancy · on Dec 27, 2016

I really like the math here, but isn't this a bit pointless? The system wants to parse meaning from audio; storing just the text it parsed is a lot smaller. Store just the text and whatever machine learning score of how probable the text is correctly parsed and that sounds like something prosecutors would love to bring into court: "Please read this line and let's see what score you get . . . "

kgwxd · on Dec 27, 2016

For improvements they'd store the raw input so that when a mistake happens they can manually try to figure out why the machine got it wrong (e.g. a hi-hat was hit while they were saying "deuce" so it sounded like "douche").

GordonS · on Dec 27, 2016

The raw speech would still be very useful as training data for new and improved models

CamperBob2 · on Dec 28, 2016

It could also store compressed voice waveforms in such a way that any reproduction from the compressed data would sound horrible but would be at least somewhat intelligible to human listeners.

1200 bits per second is almost enough for toll-quality speech -- and I'm referring to the state of the art a few years ago. Speech codecs are probably better now. But let's stick with 1200 bps. That's enough to store continuous speech in the vicinity of the device for a year, using only about 5 GB.

My guess is that if you cared only about intelligibility and not fidelity, you could do the job with 10%-20% of that space.

So yes: Alexa could easily be collecting and storing a vast amount of data that isn't immediately transmitted or used.

darkstar999 · on Dec 27, 2016

96kbps is pretty high for voice. You could get away with 48 or less.

funkyy · on Dec 27, 2016

64kbps would be sufficient to hear back noise and whisper, so that would make more sense.

lozf · on Dec 27, 2016

Way less with modern speech codecs. Even Opus at 32kbps would be overkill for the required quality.

throwaway7767 · on Dec 30, 2016

Opus is quite usable for speech data down to 8kbit/s, even 6kbit/s is mostly understandable. At 10-12kbit/s you have good quality voice recordings.

chrishacken · on Dec 27, 2016

Based on the costs of disks alone, their cost per PB is actually around $35k. They likely get a volume discount, so we can lower that estimate even more and say $25-30k. Bandwidth is essentially free even though they charge ridiculous amounts of money for it on their services. You can get a 10Gbps link for as low as $2,000 if you buy it on-net and in bulk, Amazon probably gets it even cheaper. So ~3PB/month for $1,000/month.

trevyn · on Dec 27, 2016

Pointer? My understanding was that economy 10Gbps transit would be closer to $4-5k/month.

chrishacken · on Dec 27, 2016

http://www.he.net/

"Get BGP+IPv6+IPv4 for $0.25/Mbps!"

I thought it was HE, but it must have been someone else that had a 10Gig deal for $2,000. Either way, that's for a single 10Gig link. If you're buying 100Gb - 1Tbps like Amazon is, you're probably getting an even better deal.

We just signed a contract with Level 3 for slightly more than the price you mentioned, but they had to build into us; which costs them ~$120k out of pocket, thus the higher price.

coding123 · on Dec 27, 2016

Actually you really only need to store about 1 month of data. If the police request something Amazon can lock the account from auto-deletion.

cryptarch · on Dec 27, 2016

How so?

What about abduction cases, inside trading, tax fraud, drug and human smuggling? There it could help to have data from months ago, so any newly discovered targets instantly come with a bunch of evidence.

Flammy · on Dec 27, 2016

> It would cost more money than God in hardware to store every thing Alexa ever heard

Depends, first of all storing compressed audio isn't that space-expensive, especially in some long term data storage like s3. Additionally they could only be storing the transcriptions, but not the voice behind them, which would be a lot less data.

We don't know as Amazon hasn't been very forthcoming about the privacy aspects of Alexa. I personally suspect they are keeping some voice information so they can use it to improve their NLP. I hope they are doing so in a way that is detached from accounts / IDs, but you never know.

Additionally, you can indeed delete a record of the query from the app, but who knows if the voice data or even the query itself is still stored after deletion, just not visible to us end users.

troncheadle · on Dec 27, 2016

> but who knows if the voice data or even the query itself is still stored after deletion, just not visible to us end users.

Almost definitely yes. I've never known a tech company that truly deletes anything

api · on Dec 27, 2016

"Never really delete" is actually standard advice. There are loads of reasons, mostly non-nefarious, why you may want or need that data.

Sometimes deleted stuff is archived offline or in slow warehouse databases that are not live, etc.

wccrawford · on Dec 27, 2016

If it stored everything (and not just requests after the watch word) then it would end up trying to store audio or transcriptions of so many hours of tv and random conversations that it would be ridiculous. And that's just my house. I imagine most people have one somewhere near a TV, and it would do the same.

Flammy · on Dec 27, 2016

I'll point out Facebook is using this always on recording for advertising purposes and one of those is to fuel a nielsen-like TV/movie/audio popularity business.

Basically, Facebook's always-on audio listening on their mobile app (Messenger I believe, but might be both these days) was giving this data. I can't remember the name of the company, but here is another tech company doing the same:

> Symphony uses just one: an app, downloaded to the cellphones of its more than 15,000 panelists. Audio recognition software then picks up whatever people are tuning into, wherever they’re tuning into it: their TV sets, their laptops, or their smartphones. “[It] measures everything you want to measure from one approach,” says Bill Harvey, a media research consultant who’s worked with Symphony

https://theringer.com/tv-ratings-streaming-nielsen-symphony-...

mysterydip · on Dec 27, 2016

I would think it would be possible and even beneficial to dedupe the data (15m homes x NFL broadcast, for example). Link a list of each echo's text conversion given similar data but perhaps different background noise. Or maybe getting data from multiple echos in different homes at the same time allows for "noise" filtering (people asking different things while the same background noise is present).

_Codemonkeyism · on Dec 27, 2016

Can't vote you up enough.

praseodym · on Dec 27, 2016

Speech can be encoded with less than 10Kbps [1], which means a maximum of 108MB of data per day. I don't think that's an impossible amount of data.

[1] e.g. https://en.wikipedia.org/wiki/Adaptive_Multi-Rate_audio_code...

_vya7 · on Dec 27, 2016

In fact that's just 40gb per year, which is pretty doable even on a local SD card that Alexa could be fitted with. Or even if it stores one month and deletes the oldest one every day, that's still 3gb. Very doable.

agentgt · on Dec 27, 2016

It is also feasible the device could just wait and transmit only voice like data and drop other sound data... (Even my crummy baby monitor can detect the difference.)

Speaking of which I wonder what the net traffic usage of the Echo is?

amelius · on Dec 27, 2016

You could even convert it to text before sending.

Nadya · on Dec 27, 2016

Especially since Alexa already converts it to text! It wouldn't be outside the realms of possibility.

Would it be possible to test this? Check the battery life of the Dot in a completely silent room vs the battery life of a Dot listening to an audiobook played on repeat. If it is actually listening and transcribing it should have a higher power consumption and thus die faster - right?

ethbro · on Dec 27, 2016

For NLP research, you'd want something that preserved more information than text.

Questions for anyone in the field: how much is preserved? Is there a < audio but > text form that allows for iterative testing? Maybe the output of a first-pass pheneme decoder? If so, what kind of space requirements?

visarga · on Dec 27, 2016

> Speech can be encoded with less than 10Kbps [1], which means a maximum of 108MB of data per day.

People only speak a few hours per day and "interesting" conversations could be sampled from time to time and some Alexa stations flagged for full upload, if they want to know.

MiddleEndian · on Dec 27, 2016

Https://ubrp.arizona.edu/study-finds-no-difference-in-the-amount-men-and-women-talk/

Average man uses 15,669 words/day, woman uses 16,215. Let's say average household has two people and half those words are spoken at home, 31,884/2 = 15,942. Let's say 8 bytes per word on average, just under 128k/day. That's a little under 47 MB/year. Not too expensive.

Edited with better source/numbers.

9NRtKyP4 · on Dec 27, 2016

> There has never been any "study" showing that "women talk almost three times as much as men", although this non-existent "research" has been cited by dozens of science writers, relationship counselors, celebrity preachers, and other people in the habit of claiming non-existent authoritative support for their personal impressions;

MiddleEndian · on Dec 27, 2016

Well shit my bad. That was the top result from Google with just those numbers quoted when I searched for average number of words spoken per day. I should've been more thorough. Let me see if I can find better numbers.

Edit: Done

tyho · on Dec 27, 2016

> It would cost more money than God in hardware to store every thing Alexa ever heard

About $2.55 / year [0][1][2], or, storing in AWS S3 "Infrequent access storage", about $180 after 5 years of recording [3][4].

[0]: https://www.amazon.com/dp/B00684XVFS/

[1]: https://wiki.xiph.org/Opus_Recommended_Settings#Recommended_...

[2]: https://www.wolframalpha.com/input/?i=0.027+USD%2FGB+*+24+Kb...

[3]: https://aws.amazon.com/s3/pricing/

[4]: https://www.wolframalpha.com/input/?i=(1%2F2)*(0.0125+USD%2F...

clavalle · on Dec 27, 2016

Unless the guy said "Alexa, how do I get rid of a body?" or "Alexa, play 'The Sound of Silence'" just after the time of death, I think you are right.

I could imagine some sort of log data being used to refute an alibi but what is implied by what is missing from the article, that it could be used as an after the fact witness, is not really feasible.

bsharitt · on Dec 27, 2016

Even if Amazon doesn't store all audio by default now, I would if it could become a thing(if it's not already) for law enforcement to basically get a warrant for Amazon(and Google and everyone else who gets into this space) to store everything and send it to law enforcement for specific individuals.

MichaelGG · on Dec 27, 2016

A year of audio at 32kbps (more than enough for speech) is only 120GB. That's only a couple bucks of month on S3 -- at retail pricing. So it's technically and financially feasible to store everything if Amazon wanted to.

jsjohnst · on Dec 27, 2016

If it's just spoken voice, the customary value is actually 8kbps, mono.

sixothree · on Dec 27, 2016

> It would cost more money than God in hardware to store every thing Alexa ever heard.

Actually it would be less than 60 GB per device per year for 24 hour recording at 15 kilobits per second CBR. This includes all silent hours.

My guess would be less than 5 GB per device per year to record all spoken words.

hueving · on Dec 27, 2016

Don't confuse what you are allowed to see in the app with what Alexa captured and sent to Amazon. There is no reason to think they are equivalent.

visarga · on Dec 27, 2016

> Alexa only waits for the wake word "Alexa."

Maybe it also wakes on "bomb" and "infidel".

api · on Dec 27, 2016

How many other words and sounds sound a bit like Alexa?

bkbridge · on Dec 14, 2016

As a New Yorker, all we're hearing here is LA, LA, LA. Just a heads-up.

Smoke a joint in NYC, spend a night in jail. Smoke a joint in LA, change the world. How I look at it all.

We got Brooklyn, and that's about it.