I wonder what the ideal core count is for a development workstation. While it's obvious that going from 4 cores to 12 is useful, it's not obvious to me that going from 12 cores to 32 is similarly useful.
Sure, if I'm compiling a huge project from scratch, every core will be loaded, but that's a job better suited to a CI machine.
For everyday development people usually use incremental compilation, and in my experience that touches only a few files. So something like 12 cores looks like the sweet spot between single-core and multi-core performance (high-core-count processors usually come with lower frequencies).
I have codebases which scale almost linearly with cores. I built a 56-thread Xeon machine; one of my projects compiles in two minutes with single-threaded make and in two seconds with all 56 threads going.
Not the OP, but we have big projects written in C containing hundreds or thousands of source files and builds scale almost linearly, at least up to 96 cores (I tested on ARM servers).
You have to be careful about how you write Makefiles: don't recurse into subdirectories with sub-makes, write a single non-recursive Makefile instead. And you sometimes have to split up large source files.
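A minimal sketch of the non-recursive pattern (the file names here are hypothetical): every object file becomes a node in one top-level dependency graph, so `make -j` can schedule all compiles at once instead of serializing at each sub-make boundary.

```make
# One top-level Makefile, no $(MAKE) -C subdir.
SRCS := src/core/a.c src/core/b.c src/net/c.c   # hypothetical source list
OBJS := $(SRCS:.c=.o)

app: $(OBJS)
	$(CC) -o $@ $(OBJS)

%.o: %.c
	$(CC) $(CFLAGS) -c -o $@ $<
```

With a recursive layout, `make -j96` can only parallelize within whichever subdirectory it happens to be in; with one graph it sees every pending compile at once.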
It's usually a good idea to do a test build with `make -j1 2>&1 | ts -i '%.s'` (the "ts" utility from moreutils timestamps each line), then sort by which step takes longest and try to break it up or parallelize it.
An example is when you are developing/testing distributed systems. Many tests which used to require a cluster to run can now be run on a single machine, which helps a lot with debugging.
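A toy sketch of that idea, assuming nothing beyond the standard library: each "node" is just a local process listening on an OS-assigned localhost port, and one test process talks to all of them, so no real cluster is needed. The echo protocol here is made up purely for illustration.

```python
import multiprocessing as mp
import socket

def node(ready: "mp.Queue") -> None:
    """One 'cluster node': a TCP echo server on an OS-assigned localhost port."""
    srv = socket.socket()
    srv.bind(("127.0.0.1", 0))        # port 0: let the OS pick a free port
    srv.listen(1)
    ready.put(srv.getsockname()[1])   # tell the test which port we got
    conn, _ = srv.accept()
    conn.sendall(b"ok:" + conn.recv(64))
    conn.close()
    srv.close()

def run_cluster_test(n_nodes: int):
    """Start n_nodes 'nodes' as local processes and ping each one."""
    ready: mp.Queue = mp.Queue()
    procs = [mp.Process(target=node, args=(ready,)) for _ in range(n_nodes)]
    for p in procs:
        p.start()
    ports = [ready.get(timeout=10) for _ in procs]  # wait until all are listening
    replies = []
    for port in ports:
        with socket.create_connection(("127.0.0.1", port), timeout=10) as c:
            c.sendall(b"ping")
            replies.append(c.recv(64))
    for p in procs:
        p.join()
    return replies

if __name__ == "__main__":
    print(run_cluster_test(3))  # three 'nodes', one machine
```

The nice part for debugging is that every "node" is an ordinary local process you can attach a debugger to, which a remote cluster doesn't give you.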
Personally I am very excited for a couple years from now when prices on used dual socket epyc servers come down because I want a workstation with 128 physical cores. It will be glorious.
The reasoning is that your code, when it's ready and in production, will run on machines with more cores than were available last year. If your code already faces its future runtime environment today, you'll have fewer surprises later.
A second reason is that machines with many cores degrade more gracefully under high load: you may be compiling across a dozen cores, but you'll still be able to search Stack Overflow or answer your email.
You may also want to consider higher memory bandwidth: 8x8GB DIMMs populate more channels than 2x32GB, so they deliver more bandwidth (assuming the platform actually has that many channels).
Machine learning will eat all the CPU/GPU/memory/disk you can possibly throw at it. Right now I'm training a deep learning model with a lot of preprocessing, and all my Threadripper and Titan RTX cores are at 100%. If you use Docker a lot, with multiple independent microservices/containers running, or VMs, or you just compile large codebases, the more cores the merrier (unless some CPU bug prevents them from scaling properly).
Making programs concurrent is hard, especially if you don't have good support from the language. I, too, would hesitate before attempting to make a highly concurrent application in PHP, Python, or Visual Basic. And even then you're looking at considerable investment rewriting stuff that already works.
Maybe it works out if the cost savings from being able to purchase fewer machines outweighs the rewrite and the risk of new concurrency bugs.
That's the trick: they can't make single cores faster so they just throw more cores at the problem and call it a day. Obviously that doesn't work but who cares? It looks good in benchmarks.
Are you sure they sacrifice speed for security? It's not obvious to me. Hardware fixes are supposed to be fast; that ~20% slowdown comes from the software mitigations (and you can actually disable those).
And yet, GPUs have hundreds to thousands of cores, and we find them very useful...
There are many, many useful tasks which are embarrassingly parallel, or nearly so. For those tasks, doubling the core count doubles the performance. Single-core gains have stagnated, so a performance doubling is a huge win; there's no other way to just up and double performance for any class of problem.
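The doubling claim is easy to demonstrate for an embarrassingly parallel task. A toy sketch (the `burn` workload is made up for illustration):

```python
import multiprocessing as mp
import time

def burn(n: int) -> int:
    """Stand-in for one independent chunk of work (hypothetical workload)."""
    total = 0
    for i in range(n):
        total += i * i
    return total

def run(chunks, workers):
    """Process every chunk with `workers` processes; return (seconds, results)."""
    start = time.perf_counter()
    # Processes rather than threads: CPython threads share the GIL,
    # so CPU-bound work only scales across processes.
    with mp.Pool(workers) as pool:
        results = pool.map(burn, chunks)
    return time.perf_counter() - start, results

if __name__ == "__main__":
    chunks = [500_000] * 8          # eight fully independent chunks
    t1, r1 = run(chunks, 1)
    t4, r4 = run(chunks, 4)
    assert r1 == r4                 # parallelism changes nothing but the wall clock
    print(f"1 worker: {t1:.2f}s  4 workers: {t4:.2f}s")
```

With four free physical cores the second run typically lands near a quarter of the first; past the physical core count the curve flattens, which is exactly why "double the cores" only means "double the throughput" for this class of task.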
And even outside those tasks, more cores means more simultaneous heterogeneous workload, which means deeper and richer information pipelines. If you're live editing audio or video, for example, your core count determines the number of plugins/tracks you can work with simultaneously - and it's not unusual to have hundreds of tracks and dozens of layered plugins.
That's not exactly true. The GPU equivalent of a CPU core is the Streaming Multiprocessor (in NVIDIA terminology), and its current GPUs top out at 72 of them (the full TU102, as in the Titan RTX).
Each of these SMs can run hundreds of threads, but they run the same code in lock-step.
That has never been true: on NVIDIA hardware only the 32 threads of each warp run in lockstep, certainly not all threads in an SM. And starting with Volta, each thread even got its own program counter and call stack.
Ah good, a response instead of an unhelpful downvote or 2.
I guess it's from experience: when there's a bottleneck in my work, and in many other areas I've seen, it usually comes down to a lack of RAM pushing the machine into the virtual-memory system (paging).
A lack of CPU is less crippling in a way: the CPU just divides n ways and things run proportionately slower. If paging happens, it can easily be far worse, because disk is so slow (OK, SSDs may ameliorate that these days; I have no experience there).
So why? Experience suggests lack of RAM is more common than lack of CPU, and its effect is worse. Of course the best thing to do is examine your own system before taking anyone's advice, including mine.
I would take 64 GB of memory and 4 cores over 16 GB and 12 cores any day. The idea of many cores is simply less preposterous than it was even a few years ago.
Oh heck yes, at least IME. I work in DBs, and a company I worked for was renting 16GB machines for the service they hosted for their clients. I almost literally arm-twisted them into upgrading to a 64GB dev machine. When they saw how that ran, within 3 days they'd started rolling out 64GB machines to their clients. The perf difference was huuuge, because the hot dataset could finally fit into RAM.
As ever, it depends on one's needs, but most often that need is greater for memory. IME. YMMV. Measure first as always.
If you are a DB developer, you can now have a machine with close to the number of cores on your prod/staging instance. A DB will use all available cores unless you restrict it.
I'm curious about the 280 watt TDP, especially for the 16 core part. The 16 core Threadripper 2 had a 180 watt TDP, so what are they doing with the extra hundred watts on a smaller process? Could these chips be running at much higher frequencies?
We might really be at the brink of no-compromise super high end workstation computing!
The Ryzen 3900X (and probably the 3950X) will aggressively limit all-core boost to fit in the TDP.
A Threadripper with a higher TDP limit (and better cooling) will be able to boost all cores to the same levels as a single core, which is ideal for workstation workloads.
Some of that TDP is also spent driving additional memory channels and PCIe 4.0 lanes. There were rumors of Threadripper having 8 channels now. And judging from the active cooling on X570 chipsets, PCIe 4.0 runs hot.
As a Threadripper user, this makes me happy. I don't care about the TDP too much, as long as I can compile large C++ codebases as fast as possible (for which a short-term all-core boost is very useful).
True, you'd need a pretty good water cooler to run at max all the time, but I would gladly trade electricity for performance. A lot of the stage gates in my compilation are single-threaded, so a highly-threaded chip that consistently boosts to high frequencies would be worth the initial and ongoing costs to me.