Hey peeps, full disclosure: I work as one of Linode's R&D engineers. I'll try to get to as many of these questions as I can.
One of the biggest questions is: why the Quadro RTX 6000? A few things:
1. Cost: it has the same performance as the 8000. The difference is 8 more GB of RAM that comes at a steep premium. Cost is important to us because it allows us to offer a more affordable price point.
2. We have all heard of or used the Tesla V100, and it's a great card. The biggest issue is that it's expensive. So one of the things that caught our eye is that the RTX 6000 has fast single-precision, Tensor, and INT8 performance. Plus, the Quadro RTX supports INT4.
https://www.nvidia.com/content/dam/en-zz/Solutions/design-vi... https://images.nvidia.com/content/technologies/volta/pdf/tes...
Yes, these are manufacturers' numbers, but they gave us pause. As always, your mileage may vary.
3. RT cores. This is the first time (TMK) that a cloud provider is bringing RT cores into the market. There are many use cases for RT that have yet to be explored. What will we come up with as a community?!
Now, with all that being said, there is a downside: FP64, aka double precision. The Tesla V100 does this very well, whereas the Quadro RTX 6000 does poorly in comparison. Although those workloads are important, the goal was to find a solution that fits the vast majority of use cases.
So, is the marketing true? To get the most out of ML/AI/etc., do you need a Tesla for the best performance? Or is the Tesla starting to show its age? Give the cards a try; I think you'll find these new RTX Quadros with the Turing architecture are not the same as the Quadros of the past.
If you really want low-cost compute for deep learning, need lots of it, and don't want to pay for V100s, then the AMD Vega R7 is the card for you: $700, 16GB of RAM, 1 TB/s of memory bandwidth (higher than the V100!), works with TensorFlow (pip install tensorflow-rocm), and about 60% of the performance on ResNet-50. FP64 is not fully gimped (it is halved, I think - so still quite good). Put lots of them in servers with PCIe 4.0, and you can do great things. Here's a recent talk on it:
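Separately, for anyone wanting to kick the tires, here is a minimal sanity-check sketch, assuming a recent tensorflow-rocm build (2.x-style API) and a ROCm-supported card; the point is that the ROCm build exposes the standard TensorFlow API, so existing model code runs unchanged:

    # Minimal sanity check, assuming `pip install tensorflow-rocm` (2.x-style API)
    # on a ROCm-supported card; only the backend build differs from stock TensorFlow.
    import tensorflow as tf

    # Should list the Radeon card as a GPU device.
    print(tf.config.list_physical_devices("GPU"))

    # Tiny smoke test: one ResNet-50 forward pass on random data.
    model = tf.keras.applications.ResNet50(weights=None)
    images = tf.random.uniform((8, 224, 224, 3))
    print(model(images, training=False).shape)  # (8, 1000)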
Two of my colleagues use high-end AMD GPUs to train RNNs and transformers with tensorflow-rocm. There are still some nasty bugs (e.g. [1]), so it is currently not for everyone. However, given how far they have come compared to 1-2 years ago, it is very likely that in a year or so they will be a real competitor to NVIDIA for compute. That competition was long needed.
Agreed, it is not quite prime time yet. They are trying to upstream all the ROCm stuff into TensorFlow, and when it gets into mainline and stabilizes, I agree that it has great potential to take off, particularly among price-sensitive researchers and large companies who need huge GPU farms.
This is a terrible suggestion/comparison. AMD has nowhere near the software support in the ML/AI space that Nvidia has. I wish that AMD would invest in a CUDA competitor and break Nvidia's monopoly, but that is not even close to being a reality, unfortunately.
> The difference is 8 more GB of RAM that comes at a steep premium
This is incorrect. The RTX 6000 has 24GB of VRAM and is $4000, and the RTX 8000 has 48GB of VRAM (double the amount) and is $5500. Is it worth the price increase? For a lot of people I know it is.
Also, the RTX Titan is $2500 and is identical to the RTX 6000 (at the chip level) and also with 24GB of VRAM, with the only difference being software enabling of additional H.264/5 encoding features on the Quadro. Definitely not worth the cost increase, especially for anyone doing ML.
If you reason as a consumer, the RTX Titan makes a lot more sense than the RTX 6000; however, Nvidia forbids datacenters from using consumer cards [1], so their choice makes sense.
Except "datacenter" is not defined by NVIDIA in their EULA at all, and plenty of large and small datacenters continue to use "consumer cards" regardless of NVIDIA's fear mongering. I know that Tesla, OpenAI, Microsoft, Apple, and many others all continue to primarily buy 2080 Tis, RTX Titans, and Titan Vs since the EULA change.
Companies make unenforceable claims all the time. That's why we've got courts. They're almost certainly never going to take anyone to court, because if they did, it would get tossed out. They can't pull the same "it's a license to a product" BS media services do, though they still try with the driver. I think for now they've just run the numbers and figured out it gives them slightly higher datacenter card sales.
What incident are you referring to? (genuine question)
As far as standards go, we use Linode and all of our customers (some of them quite demanding about internal security details) have been satisfied with the various acronyms they are accredited with... Although I understand this does not necessarily guarantee anything about response behavior, so interested to hear about past incidents.
We've also been making ongoing improvements to our application security and security infrastructure through the implementation of a DevSecOps culture. This is something we take very seriously.
A GTX 1080 for $100 a month. Granted, it is older, but it still works for DL. Let's say you do 10 experiments a month at ~20 hours each; that's $0.50/hour, and I don't think the RTX 6000 is 3 times faster.
If you then do even more training, the effective price goes down further.
//DISCLAIMER: I do not work for them, but I used it for DL in the past and it was definitely cheaper than GCP or AWS. If you have to do lots of experiments (more than a year's worth), go with your own hardware, but do not underestimate the convenience of >100 MByte/s downloads if you pull in many big training sets.
For traditional floating point workloads, the RTX 6000 will probably not be 3x faster. For workloads that can use the tensor cores (low-precision matrix multiplies, basically: FP16/INT8/INT4), the RTX 6000 may be as much as 10-100x faster.
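As a rough illustration of what "using the tensor ops" means in practice, here is a hedged sketch using the Keras mixed-precision API (TF 2.4+); plain FP32 code paths stay on the ordinary CUDA cores, while FP16 matmuls and convolutions can be routed to the tensor cores:

    # Sketch: letting TensorFlow route matmuls/convs to the tensor cores by running
    # them in FP16 (Keras mixed-precision API, TF 2.4+). Plain FP32 models do not
    # benefit from the tensor cores at all.
    import tensorflow as tf

    tf.keras.mixed_precision.set_global_policy("mixed_float16")

    model = tf.keras.Sequential([
        tf.keras.layers.Dense(4096, activation="relu", input_shape=(4096,)),
        tf.keras.layers.Dense(10),
        # Keep the final activation in FP32 for numerical stability.
        tf.keras.layers.Activation("softmax", dtype="float32"),
    ])
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")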
It is not a server card; however, it is much faster than any old AWS instance at $1k/month (if you happen to be an AWS user and did not want to upgrade because of the price going up 3x).
TBH, $100 per month is practically free. Most researchers do not have $1k/month for a server; at that point it is cheaper to buy hardware and put Linux on it.
There are of course other options and Linode is kinda late to the party, but I am happy they made this move.
>There are of course other options and Linode is kinda late to the party, but I am happy they made this move.
Considering their main competitors, DO, Vultr, UpCloud, none of them offers any GPU instances, I don't think they are late at all. If anything, they may be the first for their market segment.
How does data in/out work in practice with them? I see this 4 Tbit bandwidth but do you happen to know what that translates to and what happens if you exceed that?
Also, checking availability shows a current wait of up to 5 days:
“EX51-SSD-GPU for Falkenstein (FSN1): Due to very high demand for these server models, its current setup time is approximately up to 5 workdays.*”
Or maybe there are other regions/dcs.
I have like 18 of their auction servers that are unmetered at 1gbps and really make that bandwidth sweat. I've never had issues honestly, and they've never tried to dreamhost me. I love it.
So far I have not reached that limit (I used it to train networks for image segmentation), since I had mostly ingress and only downloaded large amounts to the machine, not from it (ingress is free, like with most providers).
But you can just ask them.
I have to say that not everything was 100% smooth. Sometimes the proprietary NVIDIA driver (you have to use the right CUDA and driver combination) crashed my Linux instance and hung the system, so I had to hard-reboot it (which is supported via their admin console), which takes a few minutes. However, that's not their fault, as I hear the driver is a big pile of crap anyway because NVIDIA is too embarrassed to post it to the LKML.
Technically speaking, it's NVIDIA's GeForce driver that restricts datacenter usage, not the card itself.
I haven't deep-dived into it, but maybe using nouveau instead of the GeForce driver works around that restriction.
You are allowed to use the driver in datacenters for cryptocurrency mining. The EULA's datacenter restriction hasn't really been challenged in court yet, and both sides would have an argument. NVIDIA is using the EULA to limit an activity that a user would be allowed to do if the location of that activity were different (and I'm not even talking about type of industry here, though that's probably in the EULA too). On the other hand, it's NVIDIA's software; they are free to license it how they like.
I've not deep-dived into nouveau for a while, so I wasn't sure if they had added CUDA support in the couple of years since I last played with it, which is why I only said "maybe".
It's a flat fee of $100/month, correct? What would be the best option if the amount of training you do is rather "occasional" (but simply using colab doesn't cut it anymore)?
I don't have specific experience with ML, but AWS spot pricing was by far the best deal for GPUs last time I checked. You can get something much more powerful than a GTX 1080 and get your task done more quickly. The downside is that your instance can be shut down at any time, with only a short warning signal to back up your progress, so it may or may not be suitable for what you're doing.
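If you go the spot route, the usual pattern is to poll the instance metadata endpoint for the interruption notice (AWS documents it as appearing roughly two minutes before the instance is reclaimed) and checkpoint when it shows up. A rough sketch, assuming IMDSv1, with save_checkpoint() as a placeholder for your own code:

    # Sketch: watching for the EC2 spot interruption notice so training can checkpoint.
    # The endpoint returns 404 until an interruption is actually scheduled.
    import time
    import urllib.request
    import urllib.error

    NOTICE_URL = "http://169.254.169.254/latest/meta-data/spot/instance-action"

    def interruption_pending() -> bool:
        try:
            with urllib.request.urlopen(NOTICE_URL, timeout=1):
                return True          # 200 response: a stop/terminate has been scheduled
        except urllib.error.HTTPError:
            return False             # 404: no interruption scheduled yet
        except urllib.error.URLError:
            return False             # metadata service unreachable (e.g. not on EC2)

    def save_checkpoint():
        ...  # placeholder: write model weights / optimizer state to S3 or EBS

    while True:
        if interruption_pending():
            save_checkpoint()
            break
        time.sleep(5)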
Does the price actually depend on whether you are using the GPU, or simply on the instance you choose? Let's say you need to do some work that will require a GPU, so you spend 5 hours setting up an environment, doing some light programming/experiments in a Jupyter notebook, downloading datasets, looking at the data. Then you train for an hour, then one more hour looking at the data, drinking coffee, stuff like that. Then train again.
So you were using the environment for 10 hours, but only 3 of them in total were using the GPU. Will you pay for 10 hours of GPU usage, or will only the 3 hours be expensive and the other 7 cheap?
If you use a GPU instance, you pay the cost for it whether or not you use the actual GPU. If the GPU time is short relative to the other stuff you are doing (like data cleanup), it might make sense to do your non-GPU setup on a different instance first.
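To put numbers on that, a back-of-the-envelope split for the 10-hour/3-GPU-hour scenario above, using the p3.2xlarge rate quoted elsewhere in the thread and an assumed ~$0.10/hr small CPU instance for the setup work:

    # Illustrative only: $3.06/hr GPU instance (p3.2xlarge), assumed $0.10/hr CPU instance.
    GPU_RATE, CPU_RATE = 3.06, 0.10

    all_on_gpu_instance = 10 * GPU_RATE                # $30.60
    split_instances     = 3 * GPU_RATE + 7 * CPU_RATE  # $9.88

    print(all_on_gpu_instance, split_instances)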
Still way too much money when a 2x 2080Ti comparably specced machine under my desk costs less than 2.5 months of their billing rate, and 4x 1080Ti servers in my garage cost about 1 month of their 4-GPU machine _and_ have more SSD storage. This pricing is totally insane, especially if not billed per-minute (which in Linode's case it is not) and if there are no cheaper preemptible/spot instances.
I'm starting to think one can adopt a simple rule: switch to a DIY build whenever there is enough work to keep a GPU busy for 2 months; otherwise, if the workload is intermittent, the better strategy is leasing, especially considering that the purchase cost per unit of performance is constantly dropping.
What's the cost for power? Serious question, and I'm not suggesting that this cost should account for a large percentage of the price, but genuinely curious. If your GPUs are working every hour of the month for you, how much is it costing you in electricity?
Quad-GPU machines draw about 1.3 kW each on average when under load. That's about $100/mo where I live, assuming they're 100% loaded 24x7 for the entire month. Realistically it's less. So it's not free, but it's not a crazy amount either.
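For anyone who wants to rerun that estimate with their own numbers, the arithmetic is just draw x hours x rate; the ~$0.11/kWh below is an assumption:

    # Worked check of the estimate above; the electricity rate is an assumption.
    draw_kw = 1.3        # quad-GPU box under load
    hours   = 24 * 30    # fully loaded for a month
    rate    = 0.11       # $/kWh, varies a lot by region

    print(draw_kw * hours * rate)  # ~103 dollars/month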
Looks amazing. Linode has worked really well for me over the years.
One thing I noticed when recently trying to get a GPU cloud instance, the high core counts are usually locked until you put in a quota increase. Then sometimes they want to call you.
So I wonder if Linode will have to do that or if they can figure out another way to handle it that would be more convenient.
I also wonder if Linode could somehow get Windows on these? I know they generally don't do anything other than Linux though. My graphics project where I am trying to run several hundred ZX Spectrum libretro cores on one screen only runs on Windows.
That pricing isn't too bad. They come with decent SSD storage too, which is key for the large datasets that make a GPU instance worthwhile.
Linode skews more towards smaller-scale customers with many of their offerings, so I think the GPUs here make sense. The real test will be how often they upgrade them and what they upgrade them to.
Interesting to see another cloud provider go with Quadro chips. NVIDIA repackages the same silicon under several different brands (GeForce, Quadro, GRID, Tesla) and we (https://paperspace.com) have found Quadro to offer the best price/performance value. Despite minor differences in performance characteristics, such as FP16 support in the Tesla family, Quadros can run all of the same workloads, e.g. graphics, HPC, deep learning, etc. If you're interested in a similar instance for less $/hr, check out the Paperspace P6000.
The RTX 6000 is significantly faster than a P100 outside of FP64, and is the fastest or second-fastest GPU for non-FP64 work [1] (the GV100 is sometimes faster, sometimes slower than the RTX 6000, but costs more). For FP64, GV100-based GPUs are quite a bit faster than P100s.
Also, you should really ignore pretty much all of the comparison sites that show up when you search for computer component comparisons, as they're nearly all awful. The one you posted doesn't show a single benchmark comparison between them, and compares numbers like clock speed, which isn't comparable between architectures, or memory clock speed instead of memory bandwidth, leading to the laughable conclusion that the RTX 6000 has "9.9x more memory clock speed: 14000 MHz vs 1408 MHz" vs the P100, when the P100 uses HBM2 and has 732.2 GB/s of actual memory bandwidth vs the RTX 6000's 672.0 GB/s.
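The actual bandwidth figures fall straight out of effective data rate times bus width, which is exactly why comparing raw memory clocks across GDDR6 and HBM2 is meaningless; a quick check using the publicly listed specs:

    # Bandwidth = effective data rate (MT/s) * bus width (bits) / 8.
    def bandwidth_gbs(effective_mts, bus_bits):
        return effective_mts * 1e6 * bus_bits / 8 / 1e9

    print(bandwidth_gbs(14000, 384))   # Quadro RTX 6000, GDDR6: ~672 GB/s
    print(bandwidth_gbs(1430, 4096))   # Tesla P100, HBM2:       ~732 GB/s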
edit: I could be wrong, but I thought I read that AWS was $0.65 an hour for deep learning GPU use.
edit2: Did a quick look; the $0.65 doesn't include the actual instance, so it's around $1.80 an hour on the low end. I think this is cheaper.
p2.xlarge comes with an NVIDIA Tesla K80 GPU for $0.90/hr, but this is now an "old" GPU and the Quadro RTX 6000 should have much higher performance (though I was unable to find any machine learning benchmarks).
p3.2xlarge has NVIDIA Tesla V100 GPU which is NVIDIA's most recent deep learning GPU, but it's $3.06/hr.
That said, AWS is among the most expensive providers if you just need a deep learning GPU (but obviously AWS offers a lot of other useful things). For example, OVH Public Cloud has Tesla V100 for $2.66/hr. And comparable NVIDIA GPUs that are not "datacenter-grade" should be even cheaper; AWS, GCP, Azure, etc. are unable to offer them because of contracts when they buy e.g. the Tesla V100.
The K80s are super outdated now. Google used to offer them for free for 12hrs a day on their Colab platform, but they upgraded them to using the Tesla T4s. Note you can get a K80 on GCP (unreserved) for $.45/hr.
The K80 is the crappiest card I used last year. It was a good choice 2 years ago for sure, but now you are better off upgrading, as any new desktop card is better than a K80.
It depends; for full-time usage it is a bit more expensive, I think by a few hundred dollars a month, probably less. We happily migrated from AWS, as just one GPU instance cost us nearly $1k/month. BTW, the newest, and now the only available, GPU instances should be the better RTX 6000 ones, even if more expensive.
GPUEater do, I think. Right now, though, they are a viable option for an on-premise use case where you have a budget of, say, $100k or more, need a huge amount of compute, and have larger models to train. The Vega R7 gives you 16GB of RAM (vs 11GB in the 2080 Ti) and is just slightly lower performance than the 2080 Ti (322 vs 302 images/sec for ResNet-50, from here: https://www.youtube.com/watch?v=neb1C6JlEXc ). And you have servers with PCIe 4.0 support, so distributed training scales (yes, the 2080 Ti supports NVLink, but NVLink servers cost way more).
Simple math example: a PCIe 4.0 server with 256GB of RAM and 8x Vega R7 should cost around $10K. With a couple of switches and racks, you can get hundreds of GPUs for just a couple of hundred thousand dollars (note: only 2 GPU servers per rack is normal for now, otherwise you have to buy non-commodity racks with a higher power budget).
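Rough version of that math, with every price taken from the parent comment except the switch/rack line item, which is an assumption:

    # All prices rough; the $30k for switches/racks is an assumption.
    server_cost  = 10_000   # PCIe 4.0 box, 256GB RAM, 8x Vega R7
    gpus_per_box = 8
    servers      = 24       # at 2 servers per rack, that's 12 racks

    gpus  = servers * gpus_per_box            # 192 GPUs
    total = servers * server_cost + 30_000
    print(gpus, total)                        # ~192 GPUs for ~$270k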
Testing, if nothing else. If you are shipping to people running AMD GPUs, then that would be useful without having to buy a card and another machine for it to go in.
Can these be used for crypto mining at any level of efficiency? I was able to mine GRLC back in the day on AWS spot instances at a VERY mild degree of profitability.
Doubtful, since these are just fully unlocked TU102 GPUs (same as the Titan RTX; the 2080 Ti is the same TU102 GPU but partially locked, at 4352 cores vs 4608 for the Quadro RTX 6000/8000 and Titan RTX). If you could be profitable with this at $1000/month, then people would be flocking to buy 2080 Tis for $1100 and getting 90-95% of the hashrate.
They wouldn't be available if they were profitable for that. Providers usually make you do extra verification to use these instances because, at one point, people were using them for that: not because it was profitable, but because they were using stolen cloud accounts/cards.
Not really; most cryptocurrencies are at the stage where the only effective setup is a combination of custom ASICs and nearly free electricity. About twelve months ago I looked into mining Ethereum with state-of-the-art GPUs, and it would not have had a reasonable ROI unless I was literally paying $0.00 per kWh. And that was before its value per coin dropped a lot.
When the value dropped, the network hashrate dropped and difficulty went down so things actually became profitable again.
The best time to mine is during the drops, not the highs, unless you follow buy high, sell low and don't believe the market will correct for the better again (which it has).
Of course, it depends on electricity prices, but it is profitable to mine ethereum, especially if you know how to tune the cards to maximize hash/consumption.
That said, mining is competitive and difficult and unless you are going to go really large, don't bother. If you are interested in learning about it, definitely experiment though don't expect to make a lot of money.