Google Edge TPU Devices

walterbell · on Feb 11, 2019

In Nov 2018, Google engineers who designed the TPU gave a presentation about the use of open-source Chisel for designing the ASIC, https://youtube.com/watch?v=x85342Cny8c

From https://techcrunch.com/2018/07/25/google-is-making-a-fast-sp...

> Google will have the cloud TPU ... to handle training models for various machine learning-driven tasks, and then run the inference from that model on a specialized chip that runs a lighter version of TensorFlow that doesn’t consume as much power ... dramatically reduce the footprint required in a device that’s actually capturing the data ... Google will be releasing the chip on a kind of modular board not so dissimilar to the Raspberry Pi ... it’ll help entice developers who are already working with TensorFlow as their primary machine learning framework with the idea of a chip that’ll run those models even faster and more efficiently.

seldridge · on Feb 11, 2019

If you're interested in playing with Chisel, the "Chisel Bootcamp" is now hosted on Binder meaning you can run through a fair amount of learning content in a browser [1,2].

As a longer, elaborating point: Chisel is much closer to the LLVM compiler infrastructure project than a new hardware description language. Chisel is a front end targeting the FIRRTL circuit IR. There's a FIRRTL compiler that optimizes the IR with built-in and user-added transforms. A Verilog emitter then takes "lowered" FIRRTL and emits Verilog.

Consequently, Chisel is the tip of the iceberg on top of which the Edge TPU was built. The speakers in the video mention this explicitly when explaining the "Chisel Learning Curve" slide and doing automated CSR insertion.

As a further elaboration, Chisel is pedantically not High Level Synthesis (HLS). You write parameterized circuit generators not an algorithm that is optimized to Verilog.

[1] https://mybinder.org/v2/gh/freechipsproject/chisel-bootcamp/...

[2] https://github.com/freechipsproject/chisel-bootcamp

mrlambchop · on Feb 11, 2019

This was an amazing sunday night rabbit hole to go down - thanks!

fizixer · on Feb 11, 2019

So my guess is that Chisel is one of the many responses to the two horrors that are VHDL and Verliog.

Unfortunately, Chisel is built on Scala, and I have no interest in learning Scala. Though I'm intrigued by the claim of using generators and not instances, and would be interested in a white paper that explains it in PL-agnostic terms (PL: programming language).

Also have on my to-do list MyHDL [1], a Python solution to the same problem. (has anyone tried it and found to be better than VHDL/Verilog?)

[1] http://www.myhdl.org/

pedroaraujo · on Feb 11, 2019

I get the impression that people who talk about the "horrors" that are VHDL and Verilog for hardware design are software developers who have little to no knowledge about hardware design processes.

There are reasons why VHDL/Verilog are still in use in the industry and why high-level synthesis hasn't taken off.

VHDL/Verilog for hardware design is not broken. I won't claim that there isn't space for improvement (because there is) but there isn't anything fundamentally broken in them. They are fit for the purpose and they fulfill all of the needs we have.

What could be massively improved is actually the functional verification languages we use, SystemVerilog for verification is in serious need of an overhaul.

dnautics · on Feb 11, 2019

OK. I'll bite. I only have experience with verilog, but it's basically uncomfortable to work with in the sense that there are absolutely no developer ergonomics. We're well into the 21st century and you'd think that our HDLs would learn from everything that the software world has learned.

1) the syntax is very finicky (slighttly more so than C, I'd say). Most software languages (thanks to more experience with parsers and compilers) have moved on from things like requiring semicolons, verilog has not.

2) writing tests is awful. Testbenches are crazy confusing. Much better would be some sort of unit testing system that does a better job of segregating of what constitutes "testing code" versus the "language of the gates". You would have a hard time doing something like, say, property testing using verilog.

3) there isn't a consistent build/import story with verilog. Once worked with an engineer that literally used perl as a verilog metaprogramming language. His codebase had a hard-to-find perl frankenbug which sometimes inserted about 10k lines of nonsense (which somehow still assembled a correct netlist!) but caused gate timings to severely miss and the footprint to be bloated. It took the other hardware developers one week to track down the error.

None of these things have anything to do with the fundamental difference between software and hardware development.

For chisel: at least to some degree, you can get some developer ergonomics from the Scala ecosystem, and do most of your unit, functional, in integration testing outside of verilog, in chisel, and last-minute autogenerate verilog and do a second round of testing to make sure chisel did everything right. It's the same reason why people do things like "use elm to develop frontend, compiling down to javascript" and it's a perfectly valid strategy.

bem94 · on Feb 11, 2019

> 2) writing tests is awful. Testbenches are crazy confusing. Much better would be some sort of unit testing system that does a better job of segregating of what constitutes "testing code" versus the "language of the gates". You would have a hard time doing something like, say, property testing using verilog.

SystemVerilog makes this distinction between RTL (language of the gates) and verification environment code (wrt. testing, they are different things in my experience) very clearly. SystemVerilog inherits much of what people dislike about Verilog, but it makes writing large verification environments much easier. Again, not without lots of potential pain points, but you can do an awful lot that way.

On a slight aside - it worries me (though perhaps unreasonably) about the different approaches to functional verification which come from the software world v.s. the hardware world.

Software verification seems to (generally) be a much more continious affair, while for hardware, there is an extremely intense period of verification before the product is delivered to a customer (as IP) or physically manufactured. This arises because fixing software bugs is cheap by comparison to fixing hardware (again, please accept my generalising!).

It makes me shiver a little to hear people applying software "testing" strategies and terms to verifying actual hardware. I don't know if this is reflected by their actual practice ofcourse. There is a lot of potential for the hardware community to make use of so many software development practices in their verification environments (Big systemverilog testbenches are giant class hierarchies which are far more akin to straight up software), but I'm yet to be convinced about hardware itself. The development constraints are so different, and the possiblity for continuous development is hindered by the hard cut off point (manufacture).

jsolson · on Feb 11, 2019

I am a software engineer who's been involved in the tapeout of a few ASICs (although none of the TPUs). Particularly when you plan to build a series of chips, the continuous approach taken by software is massively preferable. X v2 does what X v1 did, plus some additional things, and with all of the errata fixed. Also, you find the errata in X v1 after tapeout but before your driver team does, saving them an enormous amount of work trying to track down a driver bug that's actually a HW bug (maybe even one with a simple workaround).

bem94 · on Feb 11, 2019

> Particularly when you plan to build a series of chips, the continuous approach taken by software is massively preferable.

For sure. I think continuous integration and cataloging of things like coverage collection is something hardware development really benefits from.

The things that hardware development can learn best from the software world are (in my opinion) mainly down to developing and maintaining verification environments, because they are (mostly) just big software projects. The constrained random variety are anyway.

pjmlp · on Feb 11, 2019

I really don't get why some people get so much hyped up on typing semicolons.

Maybe we should write in our native tongues without any kind of punctuation.

throwaway2048 · on Feb 11, 2019

When we are talking hardware design languages, semi colons or not seem pretty damn far down the list of things that actually matter.

This feels like as shallow of a dismissal as "lisp uses too many parens"

dnautics · on Feb 11, 2019

Lisp does use too many parens. There's a gunning fog associated with debugging lisp, and that's one reason why I don't code in it even though professionally I have my choice in languages and scheme was one of the first I learned.

dnautics · on Feb 11, 2019

"I really don't get why some people would want sub-10-second completion of their unit tests. Why not just wait a 30s to a minute to test everything?"

pjmlp · on Feb 11, 2019

I really don't get what typing semicolons has to do with unit tests.

yjftsjthsd-h · on Feb 11, 2019

And yet, you typed the punctuation marks in your comment even though all of us would understand you without them.

dnautics · on Feb 11, 2019

I didn't type out the word "second"

pedroaraujo · on Feb 11, 2019

1. Verilog requires semicolons almost everywhere. Can you point to a specific example?

2. Are we talking about Verilog or SystemVerilog? Verilog is not suitable for functional verification, people usually use SystemVerilog and methodologies like UVM for verification.

3. It's hard to tell what your college did exactly but it sounds like he over-engineer something himself.

You are talking about the advantages of Chisel for functional verification, not for hardware design, which was exactly the point I was trying to make.

uxp100 · on Feb 11, 2019

RE: #3 I think Perl for Verilog metaprogramming is pretty common, but I'm not really sure, have rarely written the stuff myself. But I've seen it before.

dnautics · on Feb 12, 2019

It's an intel thing, apparently.

jfries · on Feb 11, 2019

(Hi Pedro!)

Maybe nitpicking, but languages like Chisel and MyHDL aren't really HLS. Here there is a straight-forward mapping between the written language and the rendered result, and there should be little surprise in what logic is actually generated.

I am convinced that some specimen of this class of languages will eventually overtake verilog. One feature I'm eagerly waiting for is an equivalent of Option/Maybe types, which makes it impossible to access some signals unless they are signaled as valid by a qualifier signal.

I'm curious about what improvements you would like to see in SystemVerilog?

fizixer · on Feb 12, 2019

It doesn't look like you're open to any serious criticisms of the two Vs, but the readers of your comment and mine deserve to look at the arguments and make up their own mind. Therefore, I'm linking the pages regarding rationale for some of the recent HDLs:

- Chisel: https://github.com/freechipsproject/chisel3/wiki/Frequently-...

- MyHDL: http://www.myhdl.org/start/why.html

- SpinalHDL: https://spinalhdl.github.io/SpinalDoc/regular_hdl

aseipp · on Feb 11, 2019

Chisel isn't high level synthesis. It's not a good name to describe it when the overwhelming majority of "HLS" projects are C/C++ compilers and are completely different beasts in design and theory. Honestly, almost every experienced hardware engineer I meet who's only heard of these languages thinks this, so I partially think it's a marketing failure, but I also get the impression HW engineers think literally anything that is not Verilog is "high level" which is just simply untrue. (If I had any say in the matter, probably the only real "high level synthesis" language that isn't just a tagline for "Compile C++ to Hardware" I've experienced is BlueSpec Verilog.)

I haven't used Chisel personally, but from my experience with Clash -- it is better to think of them as structural RTLs that that have vastly better abstraction capabilities than VHDL/Verilog have. And I don't mean whatever weird things hardware designers think up when they say "abstraction" and they chuckle about software programmers (before writing a shitload of tedious verification tests or using Perl to generate finite state machines or some weird shit but That's Cool And Good because most don't know the difference between a "macro" and a "preprocessor" and no I am not venting), I mean real abstraction capabilities -- for example, parametric types alone can drastically reduce the amount of boilerplate you need for many tedious tasks, and those parametric types inline and are statically elaborated much in the same way you expect "static elaboration" of RTL modules, etc to work. Types are far more powerful than module parameters and inherently higher order, so you get lots of code reuse. In Clash, it's pretty easy to get stateful 'behavioral' looking code that is statically elaborated to structural code, using things like State monads, etc, so there's a decent range of abstraction capabilities, but the language is generally very close to structural design. The languages are overall simply more concise and let you express things more clearly for a number of reasons, and often can compare favorably (IMO) even to more behavioral models (among others, functions are closer to the unit of modularity and are vastly briefer than Verilog modules, which are just crap, etc). Alternative RTLs like MyHDL are more behavioral, in contrast.

The biggest problem with these languages are that the netlists are harder to work with, in my experience. But the actual languages and tools are mostly pretty good. And yes, they do make verification quite nice -- Clash for example can be tested easily with Haskell and all Clash programs are valid Haskell programs that you can "simulate", so you have thousands of libraries, generators, frameworks etc to use to make all of those things really nice.

(This is all completely separate from what a lot of hardware designers do, which is stitch together working IP and verify it, as you note with the verification comment. That's another big problem, arguably the much more important one, and it is larger than the particular choice of RTL in question but isn't the focus here.)

Cyph0n · on Feb 11, 2019

> Unfortunately, Chisel is built on Scala, and I have no interest in learning Scala.

That's a strange reason for not wanting to reap the benefits of Chisel. Care to explain your rationale?

There is another compile-to-HDL "language" called SpinalHDL[1], so I would actually argue that Scala's metaprogramming features seem to be a good fit for this usecase.

[1] https://github.com/SpinalHDL/SpinalHDL

zozbot123 · on Feb 11, 2019

Rust has extensive metaprogramming features as well, so hopefully we'll be able to build a comparable framework starting from that.

Cyph0n · on Feb 11, 2019

I love Rust just as much as the next guy, but not everything needs to be rewritten in Rust...

xvilka · on Feb 11, 2019

This is why the suggestion to create language/tooling neutral intermediate representation, based on FIRRTL[1].

[1] https://github.com/SymbiFlow/ideas/issues/19

dnautics · on Feb 11, 2019

if you'd like something on a to-do, I built this once, but it was a long time ago, julia has gotten a lot better, and it only does combinatorial logic (not sequential logic). I had an idea of how to do sequential logic using lambda closures in julia, but I never got around to it.

https://github.com/interplanetary-robot/Verilog.jl

pjmlp · on Feb 11, 2019

VHDL is quite nice as Ada influenced language, not sure what is horror about it.

zavi · on Feb 11, 2019

> ...and I have no interest in learning...

Single biggest red flag when hiring engineers.

ralusek · on Feb 11, 2019

Good thing they're not asking for a job.

lambda · on Feb 11, 2019

Am I missing something, or are they not only not documenting the chip, they are also not even releasing the compiler, but requiring you to use a cloud based compiler:

> You need to create a quantized TensorFlow Lite model and then compile the model for compatibility with the Edge TPU. We will provide a cloud-based compiler tool that accepts your .tflite file and returns a version that's compatible with the Edge TPU.

This seems like a new low in software freedom, and pretty risky to depend on as Google is known to shutter services pretty often and could just decide to turn off their cloud-based compiler at any time they feel like.

jacquesm · on Feb 11, 2019

Chips without a public toolchain are not worth investing your time in. It is bad enough if your work is tied to specific hardware for which there may at some point not be a replacement but to not even have the toolchain under your control makes it a negative.

digitalzombie · on Feb 11, 2019

Seriously Intel is already struggling after buying Nervana.

I went to their shing dig and they were working their butt off to wow the developers who were invited. When I asked for hard number they were very mum about that and very evasive.

The timeline for Nervana chip have been always seemingly in this mystical horizon that is never solidified to a real date but over yonder.

Google is going to pull this crap? They got better software expertise than Intel though they may be able to do it. But after that fiasco with Angular 1 to 2 I wouldn't trust Google with any early version number.

Traster · on Feb 12, 2019

Nervana had a lot of other issues. It was trying to produce an ASIC with 50 employees. When they got aquired by Intel the first step they had to tackle was hiring the engineers necessary to actually produce an ASIC which innevitably slows down production, and then on top of that they got caught in the Intel 10nm bear trap.

narrator · on Feb 11, 2019

AI is too powerful a technology to let it out there to the masses. People might use it for killer drones after all. All users of AI must be tightly controlled and registered with the authorities!

This is the problem with certain kinds of technology that are bumping up against the edge of innovation. They're too powerful and if these technologies get in the hands of the DIY set, governments will lose control so they have to DRM and regulate everything. Heck, it's a problem with old technology. Many weapons aren't that complicated technologically, but their production and use are tightly regulated.

Edit: I'm not saying this is a good thing, I'm just deconstructing their though process for tight control over AI tech going forward.

imtringued · on Feb 11, 2019

>People might use it for killer drones after all.

For some reason drones are perceived to be completely different from all weapons that have existed before them. Those killer drones have existed for half a century. They are called missiles. Also the reason why UAV based fighter jets are not viable is because a cruise missile can be launched from 1000 miles away and for the cost of a global hawk you can send out more than a hundred of them.

If terrorists have access to explosives then it doesn't matter how they deliver them because most lucrative targets (= lots of people in a small area) are stationary or predictable. A simple bagpack filled with explosives was more than enough to injure hundreds of people during the Boston Marathon.

adrianN · on Feb 11, 2019

I can buy a drone on Amazon for relatively little money. I can't do the same with a rocket.

heyjudy · on Feb 12, 2019

You can make an unguided, explosive-filled rocket that can harm people for cheap from scrap. Insurgents throughout the world have done so for the past 40 years. That may not be as simple as Add To Cart, but it is well within the economic means of almost everyone.

flyingpenguin · on Feb 11, 2019

But how much is a backpack?

dr_zoidberg · on Feb 11, 2019

So the idea is to let Google do the right thing? Or Amazon, Facebook, Microsoft, Apple, ..., etc?

The "right thing to do" is to open up these technologies, so that everyone can harness its power, not hide them under the wing and discretion of the (already too) powerful.

heyjudy · on Feb 12, 2019

Except technology is amoral, it's up to the engineer and others to use it ethically and morally. The internet can organize hate groups and it can organize voters. smh.

dekhn · on Feb 11, 2019

it doesn't require a cloud based compiler; the quote above shows that you use TF-Lite, which is an open source project, or a cloud-based tool for people who don't want/need/have the ability to work with TF-Lite.

[UPDATE] I misread and assumed the previous case (where no cloud tool was required) was still true (I worked with previous versions of this device).

lambda · on Feb 11, 2019

Do you have a source for this, or are you just reading the statement I quoted differently than I am?

The way I read the quote, you use TF-Lite to produce a quantized TF-Lite model, and then use a cloud based compiler to compile it for the actual chip.

This is why I asked "am I missing something." Do you have a reference for where the compiler exists in the open source TensorFlow project?

Mostly, what I'm interested in is learning what capabilities their TPU provides, to see if it would be useful for other similar kinds of kernels like DSP (which, like machine learning kernels, also involves a lot of convolution).

So I'm interested in looking at what the capabilities of the chip are, seeing what could be compiled to it. But I haven't found those docs, or found a compiler that could be studied. But maybe I'm not looking in the right place.

Here's an overview of the architecture of their Cloud TPUs, which has some good architectural details but doesn't documet the instruction set:

https://cloud.google.com/blog/products/gcp/an-in-depth-look-...

xkapastel · on Feb 11, 2019

It does require a cloud based compiler, the accelerator doesn't run TF-Lite models directly. The cloud based tool is what gives you the actual executable.

royal_ts · on Feb 11, 2019

Which paid Services are shutdown regularly?

seanp2k2 · on Feb 11, 2019

There are even dedicated sites for the products they've killed off over the years:

https://www.wordstream.com/articles/retired-google-projects

https://gcemetery.co/

https://killedbygoogle.com/

https://en.wikipedia.org/wiki/Category:Discontinued_Google_s ervices

https://www.businessinsider.com/discontinued-google-products...

ijpoijpoihpiuoh · on Feb 11, 2019

Very few of which, if any, appear to be paid, which is what the parent poster is asking about. (Not sure if they edited that bit in after you responded?)

076ae80a-3c97-4 · on Feb 11, 2019

https://en.wikipedia.org/wiki/List_of_Google_products#Discon...

diminoten · on Feb 11, 2019

Google's reputation for shuttering services is vastly overstated, and beating this particular dead horse every time Google creates something new just wastes people's time.

mlindner · on Feb 11, 2019

Well they decided to not only make it closed source but also lock it up behind a http frontend so you can't even reverse engineer it. Criticizing Google for shuttering things suddenly that companies are depending on is rightly justified.

yalogin · on Feb 11, 2019

Google has mastered the art of using open source to crush competition like they did with Chrome and Android. They ever revealed their main moneymakers ever. For example they opened up their mapreduce technique after they have moved on from it.

delroth · on Feb 11, 2019

What is your source for Google having moved on from MapReduce in 2004 (date when the MR paper was published)?

potatofarmer45 · on Feb 11, 2019

This is a long time coming. I'm normally not a big fan of large companies building products in the embedded space that could potentially destroy competition and future innovation but this is needed.

Nvidia's embedded boards are EXPENSIVE. So expensive it limits the applications dramatically. They also require a different skillset in people to set up which drives up the cost.

We did an analysis for a security project that required visual inference. It turned out all the extra costs to setup with TX boards meant it actually made more sense to have mini desktops with consumer gtx cards.

I am excited to see the performance of the inference module. If it's decent at a good price, that opens up so many pi/beagle/arduino applications that were limited by both cost and form factor of existing options.

Eridrus · on Feb 11, 2019

Note that these chips only support TFLite, which is still pretty spartan atm.

tsbinz · on Feb 11, 2019

What are you missing? Not involved with the project, just using it, and so far we've been able to work around the limitations (for computer vision).

lanevorockz · on Feb 11, 2019

Not so sure how much this kit will cost but I wouldn’t put my hopes it will be cheaper per compute than nvidia tegras. It could be a good alternative for lower end compute though.

arthurcolle · on Feb 11, 2019

Which of Nvidia's boards are you referring to?

lanevorockz · on Feb 11, 2019

Nvidia provides a line of embedded systems for accelerated compute called tegra. It’s pretty awesome kit but costs from 150-500, depending of the compute necessary. Probably a new one will be announced in a months time, hence google is trying to get ahead

newfocogi · on Feb 11, 2019

The new Tegra Xavier kits costs ~$1300 and were first released to developers a few months ago

TorKlingberg · on Feb 11, 2019

Tegra is Nvidia's line of embedded SoCs. Are you talking about the Jetson boards and modules?

joshvm · on Feb 11, 2019

This looks cool.

Currently the only real options for amateur off-the-shelf (accelerated) edge ML are the Nvidia boards (but small carrier boards for the TX2 cost more than the module itself) or the Intel NCS which inexplicably blocks every other USB port on the host device due to its poorly designed case. There is the Movidius chip itself, but Intel won't sell you one unless you're a volume customer. The NCS also does bizarre things: the setup script will clobber an existing installation of opencv with no warning, for example.

There are various optimised machine learning frameworks for ARM, but I'm only counting hardware accelerated boards here. I'm also not including the various kickstarter or indiegogo boards which might as well be vapour ware.

There are no good, cheap, embedded boards with USB3 that I can find. There are a few Chinese boards with USB3, but none of them have anywhere near the quality of support that the Pi has.

Then camera support. The Pi has a CSI port, but it's undocumented and only works with the Pi camera. The TX2 is pretty good, but you need to dig through the documentation to figure things out. USB is fine, but CSI is typically faster and frees up a valuable port.

Finally another issue is fast storage. It's difficult to capture raw video on the Pi because you can't store anything faster than about 20MB/s. There are almost no boards that support SATA or similar (the TX2 does), so the ability to use USB3 storage would be welcome too.

If this is offered at a reasonable price point, it could be a really nice tool for hobbyists. It looks like they're trying to keep GPIO pin compatibility with the Pi too.

walterbell · on Feb 11, 2019

> If this is offered at a reasonable price point

Hopefully it will, since the voice and vision kits on the same AIY page are sold for $50 at Target.

> There is the Movidius chip itself, but Intel won't sell you one unless you're a volume customer

Single units are listed on this page, e.g. mini-PCIe board with Movidius VPU for $79: https://up-shop.org/25-up-ai-edge

joshvm · on Feb 11, 2019

Good find, although they're not out yet by the looks of it? And also pushing the price of a TX2 if you want the dev board plus a vision carrier (though it does have 3 VPUs).

I was referring to the Movidius Dev Kit which exists, but seems impossible to buy as a consumer.

zozbot123 · on Feb 11, 2019

The Raspberry Pi provides documentation for their GPU architecture, so it would be possible to provide support for that within open source machine learning frameworks. It would involve quite a bit of work, though, and the RPi is not really competitive with modern hardware in performance-per-watt terms, even when using GPU compute.

dividuum · on Feb 11, 2019

I believe, Idein did that. At least they regularly post impressively (for the Pi) fast examples to /r/raspberry_pi like https://redd.it/a5o6ou. It seems the result isn't available individually or as open source but only in the form of a service (https://actcast.io/)

joshvm · on Feb 11, 2019

There are some well optimised libraries, for example a port of darknet that uses nnpack and some other Neon goodies. You can do about 1fps with tiny yolo. Not sure if it used anything on the gpu though.

zozbot123 · on Feb 11, 2019

NEON is the CPU SIMD feature, it has nothing to do with the vc4 GPU.

joshvm · on Feb 12, 2019

Yes, I know. My point was that CPU-only deep learning is possible on the Pi if you don't need real-time inference. What I wasn't sure of is whether that specific port does anything on the GPU at all, or if it's only using NEON intrinsics.

bpye · on Feb 11, 2019

The Pi CSI connector can be used with any camera so long as you don't care for the ISP of the VideoCore GPU. If you want ISP then yeah, you need one of the devices the Foundation supports.

makomk · on Feb 11, 2019

I think there's also a few boards out there based on the RK3399Pro (not sure how it compares performance-wise though).

joshvm · on Feb 11, 2019

The rock960 boards do look nice, and tick a lot of boxes in terms of peripherals. My only concern is documentation.

There's a review here though: https://fossbytes.com/rock960-review-affordable-six-core-arm...

imtringued · on Feb 11, 2019

That review is disappointing because he doesn't even run a benchmark to test if it's suitable for machine vision.

joshvm · on Feb 13, 2019

I just bought one, largely for machine vision. What benchmark would you be interested in?

justinjlynn · on Feb 11, 2019

> You need to create a quantized TensorFlow Lite model and then compile the model for compatibility with the Edge TPU. We will provide a cloud-based compiler tool that accepts your .tflite file and returns a version that's compatible with the Edge TPU.

I seriously hope that's not the only way they're expecting people to compile models for this particular TPU.

dejv · on Feb 11, 2019

As a person with access to their documentation I can confirm that this is currently the only way to compile model for this TPU. Also list of supported network architectures is very very short.

Well this is just begining I am sure they are going to expand its capability.

justinjlynn · on Feb 11, 2019

I'd rather they enable independent development of models on the hardware they're selling. This is about as useful as a high performance electric car you can only charge at authorised dealerships. Dealerships which have the unfortunate habit of closing down after a few years or so.

dejv · on Feb 11, 2019

I bet there is some technical reason behind allowing just few architectures and I hope they will fix it in future releases. Currently you can't even run ResNet type of networks on it.

justinjlynn · on Feb 11, 2019

Well, one can hope.

gtm1260 · on Feb 11, 2019

This has been around for a while but has been stuck at 'Coming Soon' forever. Does anyone know what the status of this project actually is? I suspect that it has been stalled for some reason or the other.

mondoshawan · on Feb 11, 2019

One of the Google engineers on the project here. No, we haven't stalled. Keep an eye out. :-)

alvar0 · on Feb 11, 2019

What kind of performance can we expect, for example compared to the intel ncs2?

Aduket · on Feb 11, 2019

do you have any price estimation?and performance wise, what is its equivalent in the market?

alvar0 · on Feb 11, 2019

75 / 150 usd. usb module / full soc board

lanevorockz · on Feb 11, 2019

Google has been marketing the TPUs for a long time but they were not even using them much as they were still on Nvidia stack. Not sure if it changed on the past 9 months but my guess is that they are able to run tensorflow on them.

theDoug · on Feb 11, 2019

This is nonsense.

TaylorAlexander · on Feb 11, 2019

Yep. Google definitely uses TPU internally.

theDoug · on Feb 11, 2019

We do sell access to a huge variety of NVIDIA GPU via Google Cloud Platform[1], which is maybe where the poster's confusion came from. Shrug.

1: https://cloud.google.com/gpu

dwrodri · on Feb 11, 2019

A related edge computing AI accelerator: https://www.crowdsupply.com/up/ai-core-x

pilooch · on Feb 11, 2019

What is the use of 100fps vision models other than being the input to a controller (e.g. driving, flying etc) ? A raspberry can hold up to 3fps with standard open source frameworks, and this is enough for many applications, e.g. construction site surveillance etc... Not criticizing, rather a genuine interest to understand the edge ML vision market.

dejv · on Feb 11, 2019

I did work on optical sorting machine: you have stream of fruits on very fast conveyor belt and then machine vision system scans the passing fruits, detects each object and reject (by firing stream of air) those that don't pass: those can be molded fruits, weird color or foreign material like rocks or leaves. 100 fps might be enough, but faster you go, faster your conveyor belt could be.

ovi256 · on Feb 11, 2019

The 100fps model is also much more efficient in W/Flop, or J/Flop, or W/fps which is very important for embeded and mobile applications. You can design your construction site surveillance system to record 10 frames a minute while sleeping the ML accelerator and then process 100 frames all at once in a few seconds, which reduces the duty cycle tremendously, improving battery / device life.

michaelt · on Feb 11, 2019

Barcode scanners routinely perform 100 scans per second.

Of course, some would say machine learning is needless complexity if you just want to scan barcodes :)

Rafuino · on Feb 11, 2019

How would someone compare the Edge TPU Accelerator with a Movidius Neural Compute Stick? https://software.intel.com/en-us/movidius-ncs

m0zg · on Feb 11, 2019

Movidius is not a TPU. It's more like a GPU, but with SIMD, DSP and even VLIW capabilities and with a _very_ wide memory bus (and massive throughput). It's rather impressive actually, but probably serious overkill for what really needs to be done during inference: https://en.wikichip.org/wiki/movidius/microarchitectures/sha.... Whereas a TPU is highly specialized for just, you guessed it, processing tensors, which basically means matrix and vector multiply. It's a systolic architecture, so it also (purportedly, since I don't have insider knowledge) stores the weights for the computation for the duration of the computation.

TorKlingberg · on Feb 11, 2019

As far as I understand "TPU" is Google's brand name, so of course competing products are not TPUs. There is an overlap in what you can do with the devices, so a comparison of their strengths would be useful.

sanxiyn · on Feb 11, 2019

We hardly know anything about Edge TPU. Yes, TPU uses systolic array, but there is really no reason to believe Edge TPU will use it. Edge TPU is not TPU.

m0zg · on Feb 11, 2019

It has "TPU" right its name. If that's not a dead giveaway, I don't know what is. :-)

bpye · on Feb 11, 2019

I don't know enough about this but how do these devices compare to the Sipeed MAIX devices [1] I saw mentioned on HN the other day? They seem to both support TensorFlow Lite but that's where my ability to understand their capabilities end.

[1] - https://www.indiegogo.com/projects/sipeed-maix-the-world-fir...

joshvm · on Feb 11, 2019

The risk with crowd funded chips is support and longevity. The reason that the Raspberry Pi wins every time against technically superior competition is that it's well supported and there are reasonable supply guarantees.

The same goes for big companies of course. Intel has a habit of releasing Iot platforms and then killing them. Let's hope the TPU lasts a bit longer.

As for a comparison, it's impossible to say until Google releases benchmark information on the edge TPU, or some kind of datasheet for the SOM.

ocdtrekkie · on Feb 11, 2019

Given Google's tendency to kill products and shift priorities rapidly, I think building a product or service dependent on a supply of their hardware is probably a pretty risky choice.

I definitely have been shocked how fast Intel maker boards have come and gone though. It feels like Intel has written them off before anyone's tried to build a project using one. I have one sitting around here somewhere that's never so much as been powered on.

joshvm · on Feb 11, 2019

It's very hard to beat the traction that the Pi has. I think because it's explicitly targeted towards people without any embedded experience, there's been a lot of pressure to make things work and to make the documentation somewhat organised.

Intel made some nice little boards, but there wasn't much publicity and actually getting started with them wasn't easy at all because the docs were buried. They were usually modules designed for integration, not standalone devices.

With the Pi you can buy a kit, plug in the SD card and boot ot desktop in minutes.

AndrewKemendo · on Feb 11, 2019

Are they going after the Movidius [1] with this?

[1]https://software.intel.com/en-us/movidius-ncs

petra · on Feb 11, 2019

So they will sell develoent boards, without selling IC's.

Seems like a nice way to gather ideas and data about new products.

xrisk · on Feb 11, 2019

How expensive is this likely to be? Feasible for a hobbyist to purchase?

Odz86 · on Feb 11, 2019

The Dev Board is $149.99, the USB Accelerator is $74.99. (I was invited for Beta Program.)

neuromancer2701 · on Feb 11, 2019

The NXP® i.MX 8MQuad board is available for 150$ and has USB3 and PCIe. The TPU would probably be attached through one of those buses. I would bet around 250$ with the TPU which is pretty good and puts it around half the Jetson Tx2, 1/5 the Xavier. I wonder if the TPU could be used for SLAM not just object identification, now that would be useful

voltagex_ · on Feb 11, 2019

Do you know where I can buy the board itself? (without the TPU), I only saw "SOM" versions which I'm not equipped to use.

lanevorockz · on Feb 11, 2019

Plenty of affordable options on the market already. For a hobbyst I would highly recommend to use tegra instead. It is compatible with all frameworks and won’t lock you down on google stack

polskibus · on Feb 11, 2019

Is this anything worth considering, if it cannot be used for training?

0xbadcafebee · on Feb 11, 2019

When I see products developed by Google, I imagine them as Replicants. Developed by an advanced tech company, full of futuristic technology and amazing potential, and destined to die in ~4 years.

jrockway · on Feb 11, 2019

The competitors don't really keep chips around for longer. Intel isn't manufacturing Skylake anymore. Nvidia isn't manufacturing Maxwell GPUs anymore. (Incidentally, Apple did appear to be using their 4-year old A8 SoC in the first HomePods, released in 2018, though.)

Hardware and software are different things. We are all sad that Google Reader doesn't exist anymore, but every silicon product has basically been a flash in the pan. They make it, you buy it, and by the time it's shipped to you, it's announced as obsolete. That's the pace of that industry. Maybe with Google's attention span, they should have been a hardware company all along. They will fit right in.

0xbadcafebee · on Feb 11, 2019

My example was inaccurate, because at least dead Replicants can be replaced with newer models, whereas dead Google products have no follow-up model and require completely replacing what you had created around that product. That's something seemingly unique to either vaporware start-ups, or Google.

jacquesm · on Feb 11, 2019

At least you can still write software for those. If Google decides your particular flavor of chip is no longer supported then good luck. Besides that, good luck to acquire those chips in the first place, 'coming soon' without a stated delivery date may well translate into 'never'.

I'll stick to the usual suspects before I get roped into some cloud based development system. Why does Google need access to my IP to begin with?