> Here at Google Cloud, we want to provide customers with the best cloud for every ML workload and will offer a variety of high-performance CPUs (including Intel Skylake) and GPUs (including NVIDIA’s Tesla V100) alongside Cloud TPUs.
We fundamentally want Google Cloud to be the best place to do computing. That includes AI/ML, and so you’ll see us both invest in our own hardware and provide the latest CPUs, GPUs, and so on. Don’t take this announcement as “Google is going to start excluding GPUs”, but rather that we’re adding an option that we’ve found internally to be an excellent balance of time-to-trained-model and cost. We’re still happily buying GPUs to offer to our Cloud customers, and as I said elsewhere, the V100 is a great chip. All of this competition in hardware is great for folks who want to see ML progress in the years to come.
Any plans to support AMD GPUs and the Radeon Open Compute project? The AI/ML community really needs viable alternatives to NVIDIA, otherwise they will continue to flex pricing power. Google, via TensorFlow, is in a phenomenal position to promote open source alternatives to the proprietary Deep Learning software ecosystem that we see today with CUDA/CuDNN.
"32-bit floating-point precision math units for scalars and vectors, and 32-bit floating-point-precision matrix multiplication units with reduced precision for multipliers."
ATI demonstrated FP24 was frickin' awesome over a decade and a half ago. It wouldn't surprise me in the least if you went somewhere like that, but it perplexes me why you think that's secret sauce in any way, long after ATI nearly destroyed NVIDIA with FP24 back in the early days of DirectX 9 and NV3x.
This isn’t exactly correct.
ATI pulled a “fast one” and went with 24-bit, even though the initial DX9 spec called for 16/32-bit floats, which NVIDIA followed.
Once DX9 was split into DX9b and DX9c, that “advantage” went away and NVIDIA proved that 16/32-bit was better, something ATI also had to adopt once MSFT told them enough was enough.
24-bit is only better as long as it can do everything 32-bit can do and it’s advantageous to build hardware with 24-bit FPUs instead of 32-bit FPUs that can also do 2x 16-bit ops per cycle.
Basically, only if the silicon cost allows you to fit far more 24-bit FPUs than 32/16-bit ones.
And history proved that this isn’t the case.
For gaming, eventually even 2:1 FPUs went away, since they are costlier than plain 32-bit FPUs that simply promote 16-bit values.
Maybe in the future we’ll have a 24-bit FPU that can also do three 8-bit ops or a 16-bit + 8-bit op per cycle, if that proves more beneficial than the current 2:1 16/32-bit model.
I personally would stick to FP32 across the board for my ML efforts, but we have an entire cottage industry of people coming up with approximations to drive up perf and perf/W, all of which will prove irrelevant until Moore's Law runs out IMO. And even then, I'll still stick to FP32 personally. Speaking from direct experience, bulletproof mixed precision is tough.
I don't think it is secret sauce. If you're gonna let customers send operations to these TPUs, one could figure out what kind of multiplier is used almost immediately upon inspection of a few inputs and outputs.
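As a purely illustrative sketch of that kind of inspection (`run_mul` here is a hypothetical stand-in for shipping a single multiply to the remote device, not any real API), the classic trick is to keep halving an epsilon until (1 + eps) * 1 rounds back down to 1:

    import numpy as np

    def effective_mantissa_bits(run_mul):
        # run_mul(a, b) is assumed to return the device's result for a * b.
        bits, eps = 0, 0.5
        # Keep halving eps until the device can no longer distinguish (1 + eps) from 1.
        while run_mul(np.float32(1.0 + eps), np.float32(1.0)) > 1.0:
            bits += 1
            eps /= 2.0
        return bits

    # Sanity check against host float32: should report 23 mantissa bits.
    print(effective_mantissa_bits(lambda a, b: float(a * b)))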
We’re always exploring the best hardware for the dollar. We’re a founding member of OpenPOWER and to your question about AMD parts, we’ve previously (publicly) run Opterons when they were the best choice. At this time, we don’t have any announcements to make :).
But I’d like to note that even if we were to use parts internally at Google (or not!), that for Cloud what matters is market demand. If there really was enormous customer demand for say ARM64, then we would look into it, even if the rest of Google wasn’t interested.
That $6.50/hr rate might be the big deal here. Amazon does offer instances with a V100 GPU (https://aws.amazon.com/ec2/pricing/on-demand/, the P3 instances), but if you're training something like ImageNet, you'll want the biggest instance (p3.16xlarge) at $24.48/hr.
Attaching a VM of similar power to a TPU on Google Compute Engine is much cheaper (https://cloud.google.com/compute/pricing, n1-highmem-64, +$3.78/hr to the TPU cost for $10.28/hr total).
Per recent benchmarks for training ImageNet (https://dawn.cs.stanford.edu/benchmark/), training ImageNet on a p3.16xlarge cost $358, while this post claims it'll cost less than $200. (EDIT: never mind; the benchmark uses ResNet-152, and Google compares TPU performance on ResNet-50.) Interesting.
Back of the envelope, a TPU costs a little more than 2x as much as a Volta on AWS P3, and delivers a little less than 2x the performance (180 TOPs for the TPU, 100 for Volta). On a raw performance/$ metric, I'm not sure the TPU is that interesting.
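For anyone who wants to redo that envelope math, here's the arithmetic using only the quoted list prices and peak numbers (which, per the replies below, is exactly the kind of comparison to take with a grain of salt):

    # Quoted list prices and peak ops, nothing measured.
    tpu_price_hr, tpu_peak_tops = 6.50, 180          # Cloud TPU (4-chip board)
    v100_price_hr, v100_peak_tops = 24.48 / 8, 100   # p3.16xlarge has 8 V100s -> ~$3.06/hr per GPU

    print(tpu_price_hr / v100_price_hr)    # ~2.1x the price of one V100
    print(tpu_peak_tops / v100_peak_tops)  # 1.8x the peak ops
    print(tpu_peak_tops / tpu_price_hr)    # ~27.7 peak Tops per $/hr
    print(v100_peak_tops / v100_price_hr)  # ~32.7 peak Tops per $/hr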
It might be worth it if I were willing to pay a huge amount to get back results from an experiment faster, by using lots of TPUs- distributed learning on GPUs doesn't seem easy yet.
Peak ops/second isn’t the only thing that matters though. You have to be able to feed the units. The V100 does lots of finer-grained matrix multiplies, which can make it harder to keep them fed.
Don’t get me wrong, the V100 is a great chip. And we’re all looking forward to more (preferably third-party) benchmark results, to tease out when one is the better choice for a workload. But don’t just compare ops/second or any other architectural number.
TPUv2 has 600 GB/s per chip × 4 chips, so 2400 GB/s [1].
As we've discussed elsewhere [2], comparing TPUv2 to V100 on a per chip basis doesn't make much sense. Who cares how many chips are on the board? If Google announced tomorrow that TPUv3 is coming out, which is identical to TPUv2 but the four chips are glued together, nobody would care.
The questions that we should instead be asking are, how fast can I train my model and how much does it cost?
Per elsewhere in thread [3], Volta gives you 900 GB/s of memory bandwidth per 100 Tops/s of peak compute, i.e. about 0.009 bytes per op, whereas TPUv2 gives you 2400 GB/s over 180 Tops/s, about 0.013 bytes per op. This means that TPUv2's memory-bandwidth-to-compute ratio is roughly 0.013/0.009 ≈ 1.5x higher than Volta's.
We can do a similar comparison for memory capacity. V100 has 16 GB per 100 Tops; TPUv2 has 64 GB per 180 Tops. So the memory-to-compute ratio for Volta is 16 GB / 100 Tops/s = 0.16 milli-bytes per op/s, while for TPUv2 it's 64 GB / 180 Tops/s ≈ 0.36 milli-bytes per op/s, or roughly 2.2x higher on TPUv2.
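Spelling that arithmetic out (peak/marketing numbers only, so the usual caveats apply):

    # Memory bandwidth and capacity per unit of peak compute, from the figures above.
    v100_bw, v100_mem, v100_ops = 900e9, 16e9, 100e12     # bytes/s, bytes, ops/s
    tpu2_bw, tpu2_mem, tpu2_ops = 2400e9, 64e9, 180e12    # 4-chip Cloud TPU board

    print((tpu2_bw / tpu2_ops) / (v100_bw / v100_ops))    # ~1.48x more bandwidth per op
    print((tpu2_mem / tpu2_ops) / (v100_mem / v100_ops))  # ~2.22x more memory per op/s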
Does any of this matter? Does it translate into faster and/or cheaper training? Do models actually need and benefit from this additional memory and memory bandwidth?
My guess from working on GPUs is yes, at least insofar as bandwidth is concerned, but it's just a guess. I'm excited to find out for real.
(Disclaimer: I work at Google on XLA, and used to work on TPUs.)
I responded to your other comment to disagree, and I'll do so again here.
Nobody is comparing DGX1-V to a single TPUv2 chip, because it doesn't make any sense to do so. they are totally different kinds of machines. But for some reason everyone is comparing a cluster of 4 TPUv2 chips to a single V100 chip.
It only makes sense to compare 4xTPUv2 to 1xV100 if they are equivalent in some meaningful metric, like total die size, power, etc.
In the absence of any available data, I'm going to continue to assume that each TPUv2 chip is roughly comparable in terms of power & die size to each V100 chip. If that were grossly wrong, I would expect that all four would be condensed into a single chip, which would dramatically increase the performance of the interconnects.
We could resolve this rapidly if there were any data available about die size, TDP, or anything else about TPUv2.
> But for some reason everyone is comparing a cluster of 4 TPUv2 chips to a single V100 chip.
I agree that some people are doing that. Marketing, I suppose. But that comparison is explicitly not the point of my parent post. I'm comparing the "shapes" of the chips -- specifically, the compute/memory and compute/memory-bandwidth ratios. These ratios stay the same regardless of whether you multiply the chips by 4 or by 400.
The point I was trying to make is that V100 has a higher peak-compute-to-memory(-bandwidth) ratio than TPUv2. This much seems clear from the arithmetic. Whether this matters in practice, I don't know, but I think it is relevant if one believes (as I do, based on the evidence I have as an author of an ML compiler targeting the V100) that the V100 is starved for memory bandwidth.
> In the absence of any available data, I'm going to continue to assume that each TPUv2 chip is roughly comparable in terms of power & die size to each V100 chip. If that were grossly wrong, I would expect that all four would be condensed into a single chip, which would dramatically increase the performance of the interconnects.
I'm sure Google's hardware engineers operate under a lot of constraints that I'm not aware of; I'm not about to make assumptions. But more to the point, as we've said, things like die size and TDP don't directly affect consumers. The questions we have to ask are, how fast can you train your model, and at what cost?
Just as you don't like it when people (incorrectly, I agree) insist on comparing one V100 to four TPUs, because that's totally arbitrary (why not compare one V100 to 128 TPUs?), I don't like it when people insist on comparing TPUv2 to V100 on arbitrary metrics like die size, or peak flops/chip, or whatever. So I disagree that we could resolve anything if we had more info about the TPUv2 chip itself. None of that matters.
Well, if you ignore power consumption because "it doesn't matter to the end user," you're talking about economic comparisons, not technical comparisons.
BTW, I absolutely agree that memory bandwidth is the bottleneck. I've built my company around that assertion, and the data for it exists (Mitra's publications come to mind).
That's... a skewed... comparison. NVLink is a board-to-board connection, whereas you're talking about TPU-to-TPU on-board communication, if I understand correctly?
That's sort of the point though! We're actually selling these as the "board". So the right way to compare things is sort of DGX-1 style "deep learning rig" versus a board of four TPU units (or several connected). The on-chip network is a big part of its overall efficiency.
It's not the point though. You're comparing whole-board TPU FLOPs (4 × 45), but then comparing single-chip TPU chip-to-chip communication with NVIDIA board-to-board communication.
Yes, when training DNNs memory bandwidth is the only figure you need to look at. That's why the 1080Ti is far and away the best bang for the buck right now (ignore the EULA nonsense). It has about 55% of the memory bandwidth of the V100 for 10% of the price.
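Rough numbers behind that claim (the V100 figure below is an assumed rough list price, not an official per-chip number):

    gtx1080ti_bw_gbs, gtx1080ti_price = 484, 700   # GB/s, ~USD launch MSRP
    v100_bw_gbs, v100_price = 900, 8000            # GB/s, assumed rough list price

    print(gtx1080ti_bw_gbs / v100_bw_gbs)    # ~0.54 -> about 55% of the bandwidth
    print(gtx1080ti_price / v100_price)      # ~0.09 -> roughly 10% of the price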
I know people don't know what to expect from TPU performance, but does anyone actually get 100 Tops out of Volta? I thought you'd have to keep the tensor cores spinning and never touch memory, which is... not realistic.
I know you hedged by saying "back of the envelope", but I'd much rather compare on real benchmarks than based on cited peak performance numbers, which are kind of meaningless.
This is true of the TPU as well, check out their paper's utilization numbers. If you ignore one outlier at ~90% utilization, their utilization plummets. I'm glad people are finally looking past the b.s. "peak" numbers for once though.
Note that the post says “less than $200” not $200. There are lots of values between 0 and 200. What we’d love is for third-party folks like yourself to do the comparison (which I know you can, Max!)
I didn’t downvote you, but presumably people disagree with “Here’s an FPGA” as comparable to being given a working piece of hardware. That is, would you have said that the best comparison to a V100 is this same FPGA box?
I (and others) get what you were trying to say: TPUs are ASICs that aren’t general purpose at all, so an FPGA is a better comparison than a more general-purpose GPU. As an end user, that just isn’t true though. If someone hands you an f1.16xlarge, you have to build your own pseudo-chip for machine learning. While with this offering, TensorFlow handles the acceleration / offload for you.
... for inference. I don't know of anyone who takes training on FPGAs seriously. They tend to get crushed by GPU/TPU/other ASIC in throughput, perf/watt, and perf/$.
This is exciting. There are lots of specific reasons to choose Google Cloud over AWS (and vice versa), but proprietary hardware is surely an advantage that is going to be hard to replicate / compete with. If TPUs hold up to the hype, GCloud may become the de facto choice for ML/AI startups.
Having had the chance to attend a fireside chat with leadership from Google and SAP, I get the sense that the hype is likely to hold up. There are a lot of big bets happening in the Enterprise space around this notion of efficient, easy to implement ML.
I don't know what qualifies as novel for you but some use cases I've seen:
On the retail side: Using computer vision to deliver alerts about shelf condition.
For farming: Using computer vision + ML to devise and track health monitoring for crops.
For manufacturing: Predictive maintenance of equipment has been a very popular area of focus.
There have been countless use cases on the finance side of things. For instance, anomaly detection techniques help with reconciling accounts and detecting fraud.
The energy industry seems to never run out of use cases for tracking commodities and/or helping predict load.
In HR, predicting turnover and education demands are some of the early use cases being approached but I expect a lot more over time.
Logistics is another area that will have a seemingly endless supply of use cases. Things like loss tracking, warehouse optimization, raw material allocation and sourcing. I don't think I've ever been involved in a logistics/manufacturing project that couldn't have used some ML to add efficiency to the process.
I am curious if DL really can deliver good results in such spaces.
We all see success stories for very refined and well-defined problems with huge amounts of training data, with models created by the top 1% of engineers. But for the average business such conditions may not be achievable: to train a model to recognize various shelf conditions across different situations, buildings, etc., you need a nontrivial set of training data, and you will have unclear expectations about model performance.
Most businesses will probably not develop and train their own systems, but rather implement solutions developed by the folks with the expertise and training data.
From a media standpoint: frontline comment moderation. It would take a lot of the legwork out of filtering for advertisements, uncivil discussion, attacks, off topic posts, and trolling.
I believe NYT does this already, with light human oversight to catch any edge-case misses or false positives.
Presently there aren’t many suitable options for large media companies that build these systems in house. At the same time, media tends to prefer not to invest too heavily in hardware if they don’t have to. Convincing leadership to use a cloud service to train an AI/ML model sounds leaner and lets them tick off even more buzzwords for the executives, etc. That said, results from efforts in the aforementioned application sound promising.
Thanks! Coming from a company that isn't currently implementing anything like this (you'll find many do not as of yet), it would help a great deal to improve the quality of the content, which is an obvious precursor to ad impressions and subscriptions, especially for media companies that do not introduce [hard/any] paywalls.
>If TPUs hold up to the hype, GCloud may become the de facto for ML/AI startups.
Don't startups want to win a big exit though? Google won't need to buy the startup for billions, because the TOS already grants them permission to use all the models and training data for free. Seems like a Faustian bargain to me.
Cloud TPU product manager here. As I said in another thread:
The TOS you are quoting only refers to the information you provide in the survey. Here are the Google Cloud TOS: https://cloud.google.com/terms/ if you're interested in what Cloud does with customers data.
5.2 Use of Customer Data. Google will not access or use Customer Data, except as necessary to provide the Services to Customer.
It's worth noting that Google Cloud has its own terms of service that is very different from what you may be thinking of: https://cloud.google.com/terms/
Regardless of whether the TOS says that or not (I haven't read them), I can think of at least two reasons why your statement doesn't hold:
1) AI startups usually don't have a lot of value to potential acquirers based on their data, but based on other things (e.g., talent, customers, business model, platform, brand). That's like saying you shouldn't use AWS because Amazon can just steal and commercialize all your data.
2) There are other companies besides Google that acquire startups.
Having said that, I highly doubt that Google can just use all the training data on GCloud to launch their own products. They can surely look at it and maybe do stuff with them internally, but I am pretty sure that they can't use them commercially.
>They can surely look at it and maybe do stuff with them internally, but I am pretty sure that they can't use them commercially.
How would you ever know if they did? People who worked at Google have been accused, by Google, of stealing the entire self driving car program and taking it to a competitor.
It's also vastly different. Of course someone working at Google on a project has access to that project. It doesn't mean they have access to your stuff.
First of all, that's wrong (as another comment pointed out). Of course, the probability of them stealing your stuff is non-zero, but it's very rare. Even if you use all your own hardware and software, people can still steal your stuff :-)
I can be hacked by malware which can leak secrets from air gapped, Faraday caged machines. Therefore, I should put my billion dollar idea on the public cloud and just trust Google.
Because you do not stay in business if you operate in such a manner. Plus it is not good for employee retention. Most people prefer to conduct themselves in an ethical manner.
Hard to get employees to not steal from you if you are stealing from your customers.
Interestingly, GCP now appears to be available to individuals in Europe. It wasn't like that before; no idea when that policy changed. Before, GCP wasn't even a consideration compared to AWS (which has always handled that).
"You can’t change the tax status of your Google Cloud Platform billing account."
I think this is what tripped me up before. I closed my business years ago, but it was completely impossible to get Google to fix this. Now it got fixed "by itself".
Just a warning to everyone before signing up with your main Google account :-)
A "single TPU" is 4 ASICs. It is not clear if it makes sense to compare a "single TPU" to a "single GPU."
As a point of reference, NVIDIA's numbers are 6 hours for ResNet-50 on ImageNet when training with 8xV100. From a naive extrapolation, 4xV100 would probably take ~12 hours and 1xV100 about two days.
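The extrapolation, spelled out (perfect linear scaling is optimistic, so treat the single-GPU figure as a lower bound):

    hours_on_8 = 6.0
    for n_gpus in (8, 4, 1):
        print(n_gpus, hours_on_8 * 8 / n_gpus)  # 8 -> 6h, 4 -> 12h, 1 -> 48h (~2 days)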
Google has previously only compared TPUs to K80, so it will be interesting to see some benchmarks that compare TPUs to more recent GPUs. K80 was released in 2014, and the Kepler architecture was introduced in 2012.
> A "single TPU" is 4 ASICs. It is not clear if it makes sense to compare a "single TPU" to a "single GPU."
Why does the number of chips matter?
Put another way, suppose Google tomorrow announced Cloud TPU v3 which was one ASIC identical in all ways to four v2 ASICs glued together. Would that be notable in any way? Seems like it would be a nop to me.
I think what matters is, how fast can you train a model, and at what cost? Doesn't really matter if it's one chip or 10,000 behind the scenes.
It doesn't matter in the ways you are considering. The ultimate comparisons are going to be time, cost, and power to complete some benchmark, just as you say.
I only mention the number of chips because loads of people are comparing the "single TPU" to a single V100 with the assumption that it is meaningful. I don't know the TDP, die size, etc. of the TPUv2 chip, so it may well make more sense for ballpark comparisons to compare "single TPU" to 4xV100.
For example, a "single TPU" has 64 GB of memory, whereas a "single GPU" has 16 GB (V100). Is this meaningful? I don't know.
It just seems like something worth noting. I could buy a DGX1-V with 8xV100, rebrand it as the TWTW TPU, and then go around and tell everyone how my TPU is 8x faster than GPUs. It appears that everyone is normalizing by marketing unit until benchmarks come out, which is potentially flawed.
It matters when defining parallel work distribution. Unless memory bandwidth is homogeneous across the whole board (i.e. each TPU on a board gets 600 GB/s to its peers), we can't do model parallelism across ASICs efficiently, and must fall back to data parallelism. Which is fine, until you run into limits on maximum batchsize (e.g. up to 8192, as FAIR was able to manage [1] with some tweaks to SGD).
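As a toy illustration of that data-parallel fallback (the linear learning-rate scaling is the main tweak from the FAIR paper cited above; the numbers here are purely illustrative):

    def data_parallel_plan(global_batch, n_chips, base_lr=0.1, base_batch=256):
        # Split the global batch across chips; gradients get averaged across replicas.
        per_chip_batch = global_batch // n_chips
        # Linear scaling rule: grow the learning rate with the global batch size.
        scaled_lr = base_lr * global_batch / base_batch
        return per_chip_batch, scaled_lr

    print(data_parallel_plan(global_batch=8192, n_chips=4))  # (2048 per chip, lr 3.2)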
The comparison was with the first-generation TPU, not the second generation, which is what these are.
But ultimately it comes down to the cost to complete some amount of work. Google also offers Nvidia GPUs in their cloud for training and should be able to compare the cost of using one over the other as both are supported by TF.
That is the ultimate measure of how good the TPUs really are.
Google claims[0] the TPU is many times faster for the workloads they've designed it for.
> On our production AI workloads that utilize neural network inference, the TPU is 15x to 30x faster than contemporary GPUs and CPUs.
As far as I know this will be the first opportunity for the public to prove those claims, as until now they've not been available on GCP. I don't mean to sound skeptical–I'm quite confident they're not exaggerating.
Keep in mind that what you linked refers to TPUv1, which is built for quantized 8-bit inference. The TPUv2, which was announced in this blog post, is for general purpose training and uses 32-bit weights, activations, and gradients.
It will have very different performance characteristics.
The reserve TPU button has been available on the dashboard for the last few months. But I assume instances have been prioritized for large customers such as Two Sigma.
From the paper:
"Despite low utilization for some applications, the TPU is on average about 15X - 30X faster than its contemporary GPU or CPU, with TOPS/Watt about 30X - 80X higher. Moreover, using the GPU's GDDR5 memory in the TPU would triple achieved TOPS and raise TOPS/Watt to nearly 70X the GPU and 200X the CPU."
In-Datacenter Performance Analysis of a Tensor Processing Unit
It will be interesting to see some benchmarks that compare TPUs to V100, since all previously published comparisons from Google compare TPU to K80 (3 GPU architectures ago).
They're closely related though: if the perf per watt is better, Google can charge you fewer dollars per unit of performance. The price they charge you is ultimately tied to the operating cost.
I would imagine that (by design) they're not directly comparable.
I suspect that we'll see more information about the ASICs over time, but it'll take time to really understand their characteristics versus NVIDIA GPUs, which are at least right now a bit better understood.
It is. TPUs perform calculations on weights using low-precision floating point and integer types. This saves a ton of computation, but doesn't matter much for training models.
GPUs are much more complex (general-purpose) and therefore cannot be optimized beyond a certain point due to timing requirements and PVT (process, voltage, temperature) variations. In other words, the more stuff you have on an ASIC, the more careful you have to be to ensure a margin of tolerance for variations.
So, a way to think of this is: The speed (and therefore, cost) of training a machine learning model depends on (a) the ML techniques (how rapidly the model converges and to what accuracy); and (b) how quickly the processor executes the operations involved in the ML techniques.
The TPU is only an improvement in (b). It's not going to result in a big-O style speedup, because the same training algorithms and architectures will run on it that we run on CPUs & GPUs today.
I'm not sure what counts as "breaking new ground" - is that 10%? 100%? 1000%? :-) The things to watch out for in benchmarks will be:
(a) Perf/$. This is actually a big deal - one of my students recently blew through $5000 of Google Cloud credits running ImageNet experiments in a week. And we didn't finish them! As this cost really drops, it enables things like Neural Architecture Search, which uses tons of compute capability to explore architectural variants automatically.
(b) Absolute perf.
(c) Performance scaling. To what degree will the fast, 2D toroidal mesh allow a full pod of Cloud TPUs to scale nearly linearly? Absolute training times matter from a user productivity standpoint. Waiting 30 minutes for a result is very different from waiting 12 hours (you can do one of these while you sneak out to go running! :-).
Is this just go-faster-juice for TensorFlow code, or does it have other implications? If you train on TPUs, can you still run the model efficiently elsewhere?
Very low. A lot of the performance on GPUs comes from Nvidia's optimizations in CuDNN -- it's mostly a matter of making sure TensorFlow feeds the right formats/etc. to CuDNN for core NN ops. TF should run well on CPUs, GPUs, TPUs, and likely future embedded accelerators (via tensorflow lite, which already supports the Android Neural Networks API).
(I'm part time on Brain, but, of course, this isn't some kind of Official Statement(tm)).
TF funds one of my teams explicitly just to optimize CPUs and GPUs.
Every discussion i've had with them tells me they care about making customers succeed, period.
I don't like it. Google is mixing too many things. No way to buy a TPU. No competition from other cloud providers. Proprietary hardware and vendor lock-in.
Presumably, there's a whole server behind that address that has all the right drivers and libraries: details you don't need to care about.
The only partial lock-in is that not all ops are supported, and you need to figure out whether any parts of the graph on the critical path will run on the CPU instead. There's a tool for that:
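(The tool itself isn't linked above. As a generic, TF 1.x-style way to eyeball where ops actually land, you can also just ask the session to log device placement and look for anything that falls back to the CPU:)

    import tensorflow as tf

    # Trivial graph; in a real model you'd care about the ops on the critical path.
    a = tf.random_normal([1024, 1024])
    b = tf.random_normal([1024, 1024])
    c = tf.matmul(a, b)

    # log_device_placement prints every op's assigned device (CPU, GPU, ...) to stderr.
    with tf.Session(config=tf.ConfigProto(log_device_placement=True)) as sess:
        sess.run(c)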
Amazon is reportedly looking into building their own. NVIDIA not only added tensor cores to the Volta series (impressively quickly, might I add), but they're also creating the NVIDIA GPU Cloud. Intel has been acquiring DNN hardware startups left and right (Nervana, Movidius, Mobileye) and trying to roll those into their product lines.
The hardest part in DNNs is the model and data. That's basically platform-independent. My students mix and match TensorFlow and Caffe, for example, on several different models.
The next part is getting the model implemented in a framework (TensorFlow? Caffe? MXNet? PyTorch?). That's work to change, particularly if you're in a production environment. But it's not the same amount of work as collecting data and building a model.
The final part is running training - CPUs, GPUs, TPUs, etc. This is really fungible. The platform-specific optimizations are relatively small here.
Looking at it from a customer perspective:
- Can a trained model be exported (weights included) for use on another platform? (yes; see the sketch after this list)
- Can the code written for training be used on the customer's own hardware? (yes, aside from any small tweaks needed for TPU, and they're *small*).
- Might the customer not want to leave because of ease-of-use, particularly at scale, or performance, or total cost of ownership? (yes, and I think that's what the sales pitch is).
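As a rough sketch of what that export looks like in practice (TF 1.x style; the variable names and path here are just placeholders), a checkpoint plus the graph definition is enough to reload the weights on your own CPUs or GPUs:

    import tensorflow as tf

    # Placeholder model: one weight matrix. Real models just have more variables.
    w = tf.get_variable("w", shape=[1024, 10])
    saver = tf.train.Saver()

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        # ... training happens here, on whatever accelerator is attached ...
        saver.save(sess, "/tmp/model.ckpt")  # weights are now portable

    # Later, anywhere TF runs (laptop CPU, your own GPU box, another cloud):
    #   saver.restore(sess, "/tmp/model.ckpt")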
(disclaimer: I worked on part of the Cloud TPU stuff. I'm funded academically by Intel. I have friends at NVidia and own a lot of their GPUs. I love everyone. :)
Why are proprietary drivers a blocker? As long as you expose the same gRPC interface, your customers don't need to know what happens behind the scenes. You could have an FPGA or a Beowulf cluster of Raspberry Pis hiding behind it.
Reading the TOS it seems like this is a really great deal for Google:
"When you upload, submit, store, send or receive content to or through our Services, you give Google (and those we work with) a worldwide license to use, host, store, reproduce, modify, create derivative works (such as those resulting from translations, adaptations or other changes we make so that your content works better with our Services), communicate, publish, publicly perform, publicly display and distribute such content. The rights you grant in this license are for the limited purpose of operating, promoting, and improving our Services, and to develop new ones."
All your training data are belong to us.
We can use your models to improve ours.
The terms will prevent me from using it. I can't grant Google permission to redistribute HIPAA PHI.
The TOS you are quoting only refers to the information you provide in the survey. Here are the Google Cloud TOS: https://cloud.google.com/terms/ if you're interested in what Cloud does with customers data.
5.2 Use of Customer Data. Google will not access or use Customer Data, except as necessary to provide the Services to Customer.
This URL isn't on the TPU beta signup page. The Google TOS is. Perhaps you can see the confusion? I would be reluctant to trust a random 37-karma guy on a Hacker News message board on this particularly important consideration.