"Some workstation users will need ECC memory, and up to 512GB of it. When memory has an error rate of 1 error per GB per year, using 512GB ensures almost two bit errors per day: something that a 60-day simulation would find catastrophic."
Just to offer a partial counterpoint to the popular "ECC is a must for real work" sentiment, I've run many 60-day simulations where a few bit errors would be completely irrelevant. Many physics simulations include internal checks (that must be there regardless of ECC, for algorithmic reasons) that would catch virtually all such issues. While a bit error could certainly crash the simulation, long running simulations can be resumed from checkpoints with minimal loss of time and the probability of a crash is still very low for most real workloads.
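To make that concrete, here is a minimal sketch of the kind of invariant check plus checkpoint/redo loop I mean; the state layout, the conserved "mass" field, the tolerance, and the chunk size are all made up for illustration.

```python
import copy

def advance(state, step, n):
    """Run n timesteps of a user-supplied step() function."""
    for _ in range(n):
        step(state)
    return state

def run(state, step, total_mass, n_chunks, steps_per_chunk=100):
    """Advance the simulation in chunks, re-running any chunk whose result
    violates a conservation check the solver needs anyway. A bit flip big
    enough to matter will usually break this invariant too."""
    for _ in range(n_chunks):
        trial = advance(copy.deepcopy(state), step, steps_per_chunk)
        while abs(sum(trial["mass"]) - total_mass) > 1e-9 * total_mass:
            # Roll back to the last good checkpoint and redo the chunk;
            # a transient memory error won't repeat on the re-run.
            trial = advance(copy.deepcopy(state), step, steps_per_chunk)
        state = trial   # commit the new checkpoint
    return state
```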
In truth, there are plenty of good reasons, both for data integrity and security, to have ECC. However, there are real costs to ECC too - a 10-30% immediate performance overhead (depending on your application), much more than that for memory-bound workloads (due to lower frequencies and looser subtimings), not to mention the hardware cost. Instead of oscillating between "ECC for everything" and "ECC is useless," people should rationally compare the costs and benefits for their specific workload and choose accordingly.
The performance penalties I'm referring to are memory performance penalties. If your workload isn't memory-bound (and most aren't), you'll measure a difference of 2% or less, sure, but that's not really relevant when talking about memory performance. In extreme cases (such as, for instance, lattice Boltzmann simulations, where the streaming step is completely memory bound) I've seen over 50% performance degradation, but much of it can be attributed to the notoriously loose timings on most ECC DRAM.
I feel I have the relevant real world expertise to have a say about that.
I am one of the principal engineers working on a game that was able to support over 1.2 million players concurrently, so I feel that I have some merit behind what I'm about to say.
ECC memory does not have such an extreme overhead. Our game server is memory bandwidth bound. To put that into perspective, a change from DDR3 to DDR4 saw a jump in performance of ~34%. Using non-ECC memory (which is, of course, readily available to the development team) only made a difference of roughly 3% in the worst case (both DDR4 systems).
Why are we memory bandwidth bound? Because we allocate 250GB of "game world" and phases and make teeny tiny modifications to them when you run around or pick up loot and shit. It's about as taxing on memory bandwidth as you can get, since you can't batch any of the loads or stores.
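As a rough illustration of why that access pattern hurts (scaled way down from 250GB, with a made-up array standing in for the game world): tiny scattered updates pay full DRAM latency per touch instead of streaming bandwidth.

```python
import time
import numpy as np

world = np.zeros(200_000_000, dtype=np.uint8)            # ~200 MB stand-in "world"
idx_seq = np.arange(1_000_000)                           # cache-friendly touches
idx_rand = np.random.randint(0, world.size, 1_000_000)   # scattered touches

for name, idx in [("sequential", idx_seq), ("scattered", idx_rand)]:
    t0 = time.perf_counter()
    world[idx] += 1                                       # tiny modification per index
    print(name, time.perf_counter() - t0)
# The scattered pass is typically several times slower even though it touches
# exactly the same number of bytes.
```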
I think what he is trying to say is that there isn't a lot of ECC DDR4 out there at 3200 MHz with tight timings the way there is with non-ECC memory, not just that ECC vs. non-ECC at the same specs is 10-30% slower.
That's right, I'm sorry if I worded it ambiguously in my previous posts. ECC RAM is considerably slower than non-ECC RAM, but nowadays it's mostly because of looser timings and lower frequencies. I'm well aware that there are people who claim that frequencies beyond the ECC standard are liable to cause data loss (for reasons usually left unspecified), but I've never seen any reliable data to support that claim, as long as the frequencies and timings tested by the manufacturer are maintained (not to be confused with people who overclock RAM way beyond its tested specification and experience instability).
On a further note, while the implementation of ECC itself is very fast for modern RAM, the same is not true of VRAM on GPUs. At least as implemented in NVidia Tesla GPUs, enabling ECC has a huge performance penalty by itself. [1]
I think some of this breakdown comes from motherboard support. You can get a motherboard that supports a single CPU and 128GB of DDR4-4400, which is significantly faster than what's available with ECC.
When you start looking at multi-CPU computers with 1+ TB of RAM, the story changes.
PS: Some of this stuff is also really hard to test. Without ECC it's hard to get data on stability for petabytes of RAM over years. And no, extrapolations don't work very well.
[Citation needed] With otherwise identical memory, ECC vs. non-ECC is <2% actual memory overhead.
Now, non-ECC memory might have better specs / a higher overclock rate etc., but either of those increases the rate of errors. And sure, you might not notice the problem directly, but that does not mean things are OK.
The main causes are altitude/background radiation, temperature, overclocking, faulty parts, etc. But memory can have 100+x the normal failure rate without it being particularly noticeable in day-to-day operations.
The value of ECC is often more to detect errors than to correct them. Remember, if you have 10 errors an hour, you're unlikely to see them in a 20-second memory test.
In practice, out of 10,000 machines you might have half your errors on 10 of them. Critically, if you have ECC memory and you both detect and replace those 10 machines, the apparent failure rate then drops.
PS: This is a little dated (2009) and only from a single data set, but still an interesting introduction: https://www.cs.toronto.edu/~bianca/papers/sigmetrics09.pdf "Around 20% of DIMMs in Platform A and B are affected by correctable errors per year, compared to less than 4% of DIMMs in Platform C and D. ... About a third of all machines in the fleet experience at least one memory error per year (see column CE Incid. %) and the average number of correctable errors per year is over 22,000."
As I understand it, the situation is even better (as in, fewer errors) than that with DDR4. Unfortunately, all the published data I could find is woefully out of date. Still, even if you generously assume something on the order of one correctable memory error per machine-year, how many of those errors do you actually care about? Unless you have 100% memory utilization for 24/7 critical tasks, most of those errors probably don't even matter.
EDIT: the least out of date reference I could find is a blog post by Jeff Atwood, where he concludes that based on the available literature ECC-correctable errors are quite rare. [1]
The median non-ECC machine has very few errors. The problem is that the worst case is hard to detect and much, much worse than that.
"The median number of errors per
year for those machines that experience at least one error
ranges from 25 to 611." "We find that for all platforms, 20% of the machines with errors make up more than 90% of all observed errors for that platform."
A minority of that 20% again have even more issues.
The worst case is literally 10,000x more than 1 bit error per year on a single machine. Without ECC you really can't tell; you just get a vague "not sure what's wrong, just replace it."
> Many physics simulations include internal checks (that must be there regardless of ECC, for algorithmic reasons) that would catch virtually all such issues.
Hmm, like what? I've run many simulations too and had the opposite experience. It's hard to implement checks because your algorithm defines how the simulation should evolve over time.
> While a bit error could certainly crash the simulation, long running simulations can be resumed from checkpoints with minimal loss of time and the probability of a crash is still very low for most real workloads.
Not really. Your algorithm determines how the simulation plays out, so you can't really notice small errors until much later. By then it's hard to roll back to an earlier state unless you're saving 64GB snapshots.
I don't know about checks, but any algorithm which works by iteratively converging on a solution, like Newton-Raphson and its bigger relatives, has quite a lot of bit-error tolerance built in.
That's right: any iterative solver (which already covers most of the low-level numerical solvers used in physics today), or more generally any method that algorithmically guarantees a particular mathematical constraint is satisfied after each timestep, has a lot of built-in tolerance. Say you enforce fluid incompressibility at each timestep: most* single bitflips will not cause errors that accumulate over time.
[*] There are exceptions in specific circumstances most of which would produce obviously wrong results or crash the simulation outright. However, I'm not claiming that single bitflips are never an issue for simulations, only that it happens much less often than people sometimes think.
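To make the Newton-Raphson point concrete, here's a toy sketch (the injected bit position and the tolerance are arbitrary): a corrupted intermediate value just costs the solver a few extra iterations, because each step pulls the iterate back toward the root.

```python
import struct

def flip_bit(x, bit):
    """Flip one bit of a float64's binary representation."""
    (i,) = struct.unpack("<Q", struct.pack("<d", x))
    return struct.unpack("<d", struct.pack("<Q", i ^ (1 << bit)))[0]

a, x = 2.0, 1.0
for it in range(60):
    x = 0.5 * (x + a / x)        # Newton step for x^2 - a = 0
    if it == 3:
        x = flip_bit(x, 40)      # simulate a single-bit memory error mid-run
    if abs(x * x - a) < 1e-14:
        break
print(it, x)                      # converges to 1.4142135623730951 anyway, just a bit later
```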
If a one-bit error causes the sign bit to flip, your position can go from 129,121.34 to -129,121.34. Obviously this is rare, but it seems mistaken to say that the simulation's parameters can prevent errors arising from this.
Also, it's not just physics simulations that are at risk. Voxel renderings are especially susceptible to one-bit errors.
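For a concrete feel of the sign-flip case (and its neighbours), here's what a single flipped bit does to that double, depending on which bit it hits:

```python
import struct

def flip_bit(x, bit):
    """Flip one bit of a float64's binary representation."""
    (i,) = struct.unpack("<Q", struct.pack("<d", x))
    return struct.unpack("<d", struct.pack("<Q", i ^ (1 << bit)))[0]

pos = 129121.34
print(flip_bit(pos, 63))   # -129121.34         (the sign bit)
print(flip_bit(pos, 60))   # ~1.5e+82           (an exponent bit)
print(flip_bit(pos, 10))   # 129121.34000...    (a low mantissa bit: off around the 8th decimal)
```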
Not sure exactly what simulation you're referring to with the position example, in most of my code that would either be fixed in the next timestep or make the whole thing blow up immediately.
Anyway, I haven't argued against using ECC memory in any of these comments. I've argued that you should know your needs and choose accordingly, and that ECC memory isn't free and therefore cannot be a universally optimal default for everyone.
For much of my physics work I've found ECC memory not to be worth the cost at all. For other things, like most database servers (and perhaps voxel rendering, I don't have enough experience there to comment) ECC memory is clearly worth it. I just think people should invest some thought into the question before evangelizing for or against ECC in the general case.
Thank you! That sentiment on HN about how "Everything should have ECC" really grinds my gears. There is no need for ECC in your iPhone or on your MacBook. Your average emoji-laden WhatsApp Conversation with your friends does not require Bit Error Correction. People who really need ECC do exactly what you just mentioned: Evaluate, and often enough they will find little to gain from ECC.
If ECC was the normal product that went into everything, and someone invented "unverified" RAM to get a tiny speed boost and slightly decreased cost, how much do you think the actual price difference would be? I expect it would be tiny.
It's a political decision by Intel to make ECC a Xeon-only feature, and that is responsible for the vast majority of the price difference. The silicon die area required really doesn't cost that much more - often the area is present on the consumer dies but simply disabled, so they can charge a lot more to the enterprise customers who actually require it. And political decisions like that make a lot of HN users unhappy.
> Working my way backwards, this business about segmenting? It pisses the heck off of people. People want to feel they’re paying a fair price.... And God help you if an A-list blogger finds out that your premium printer is identical to the cheap printer, with the speed inhibitor turned off.
Everyone knows that ECC parts are nearly identical to the cheap parts, with the ECC module turned off. But there's no boycott, because there are (were?) no other options, like there might be for a printer company. ECC segmentation is just something everyone knows and resents.
I imagine bit errors are highly unwanted in any domain when they occur in pointers, especially pointers that are written through. Some languages make that more likely than others, of course, with heavy use of references.
Bitsquatting refers to registering a domain name one bit different from the target domain name. The intent is that a memory error will cause someone to get the bitsquatted site instead of the legitimate one.
"During the logging period there were a total of 52,317 bitsquat requests from 12,949 unique IP addresses. When not counting 3 events that caused extraordinary amounts of traffic, an average of 59 unique IPs per day made HTTP requests to my 32 bitsquat domains."
But when you need ECC, you really need it. My FreeNAS server has like 32GB of ECC memory because it runs ZFS, which basically doesn't work unless you have ECC memory. However, my desktop computer and my laptop don't need it. When you need ECC, use it, but if you don't need it, then no biggie.
> [...] because it runs ZFS, which basically doesn't work unless you have ECC memory.
That's a misconception, to quote Matthew Ahrens, cofounder of ZFS: "There's nothing special about ZFS that requires/encourages the use of ECC RAM more so than any other filesystem."
ZFS doesn’t need ECC. Obviously it’s better for data integrity to have ECC, but ZFS without ECC is still better than any other file system without ECC.
Traditional filesystems will let files just sit on the hard drive, untouched.
ZFS is a lot more active... checksumming and caching mean that important information is spending time in RAM. It's not good if that information gets corrupted.
Doesn't ZFS perform checksumming and error correction on that data to make sure it isn't affected by memory or disk issues? I thought that was a main reason to use it.
Well right. It protects against silent bitrot on the hard drives. But it does this by caching and checksumming in RAM, so now you need to defend against bitflips in RAM.
It's not that ZFS without ECC is completely unsafe... it's just not as safe as it could be. ECC RAM becomes the next biggest concern once you've addressed on-disk bitrot.
Thanks for catching that - my statement was definitely too strong in implying data invulnerability.
I was hinting that you still get a bit of extra protection with ZFS over other filesystems, both on non-ECC RAM. There is a chance with a stuck/corrupt bit that the checksum will fail and you will get a read error. I interpreted OP as saying ECC is a must for ZFS, but I don't think you are more prone to corruption than with any other FS?
Yes-and-no. In the most general case the checksum of data block A lives in the parent block B. B's checksum in turn lives in its parent block. And so forth all the way to the root of the Merkle tree.
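A stripped-down sketch of that layout, with made-up block names and fields rather than real ZFS structures, just to show where the checksum of a child block lives:

```python
import hashlib

def cksum(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

block_a = b"file data ............."                                  # leaf data block "A"
block_b = {"ptr_to_a": "disk:0x1000", "cksum_of_a": cksum(block_a)}   # parent block "B"
uberblock = {"ptr_to_b": "disk:0x2000",
             "cksum_of_b": cksum(repr(block_b).encode())}             # root of the tree

# On read, A is verified against the checksum stored in its parent B,
# never against anything stored alongside A itself:
assert cksum(block_a) == block_b["cksum_of_a"]

# A flipped bit in A is caught at the parent level:
corrupted = bytearray(block_a); corrupted[3] ^= 0x10
assert cksum(bytes(corrupted)) != block_b["cksum_of_a"]
```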
These checksums are fixed exactly once, in the zio pipeline, as a dirtied buffer is being passed through it during a write operation. The checksum then stays on disk in that form in block B for as long as block A is reachable.
In the read case, checksum validation is done at the vdev_{raidz,mirror,...}.c level and repaired there if possible e.g. from another element of the mirror; recovery is also possible in the case of copies=N (N > 1). Generally, the checksums of read data and metadata are not used once a buffer is successfully in the ARC. (However, compare arc_buf_{freeze,thaw}(), which one can enable, and which will happily cause panics if ARC buffers are corrupted whether through physical problems (e.g. in memory, in a bus, in the CPU) or software error in the kernel.)
Block B may already be in cache for some other reason, in which case the in-memory checksum is used (this is the "yes" part of "yes-and-no"); otherwise, block B will have to be fetched before block A, because block B is where A's checksum lives. Block B may be evicted from ARC long before block A is evicted; any read() or similar call that needs data from A will not result in B being read back into memory.
A buffer kept in ARC and used as the source of subsequently-dirtied data will have its checksum generated anew, just like the fresh write above. Obviously if the data it is dirtied with is bad, it will go to disk and its parents will have a checksum reflecting the bad data. However, the source of the badness need not be within the transient dirtied buffer, nor in a long-resident ARC buffer; userland memory can be corrupted too.
If one has bad memory, then bad data may end up on disk, checksummed in such a way that the badness will not be detected by the ZFS subsystem. But this is really no different from any other RAID-like data-validating system or a traditional one-disk filesystem which doesn't checksum. If bits are flipped in userland data that is ultimately subjected to a write(2) call (etc., think mmap()), no filesystem can reasonably be expected to do anything other than write out the data handed to it via the system call.
Like any other filesystem, ZFS has its own metadata for tracking allocations, object metadata, and in generating the branch-to-root Merkle tree. The wrong memory corruption (and this has happened in software during ZFS development) can result in data loss or even a pool too damaged to be imported (go to backups). ZFS is not especially more exposed to that than any other filesystem.
Since the arrival of compressed ARC and crypto, only a small fraction of ARC buffers are kept uncompressed or unencrypted in memory at any time (this is controlled by the dbuf cache tunables). Bitflips in any encrypted-in-memory ARC buffer will be caught (noisily) if the buffer is subsequently read or modified, since a decryption will be done at that time and would fail under a single modified bit. Many possible corruptions in compressed-in-memory data will also be caught depending on the type of compression used when the buffer was first written out to disk. In neither case is the ZFS checksum involved in the catching of such corruptions.
Finally, this still won't protect against corruptions in userland, corruptions which hit the various data structures that point to buffers in ARC, or corruptions in the text segments in the zfs subsystem or elsewhere in the kernel. However, ZFS is not realistically more fragile to such corruptions than any other filesystem.
Bitsquatting, rowhammer and silent corruption. Basically, ECC should be the universal standard in all network gear, servers and endpoint (user) devices. Not having it creates an attack surface and an opportunity for data loss.
ECC is a market segmentation tool. Keeping ECC away from consumer hardware means you can charge more to make it available for industrial, professional etc. use.
Many AMD CPUs support ECC. It will take a bit longer to find a motherboard that supports it, but they are out there. Consumer ECC is an option, but few seem to know about it or opt for it.
I want to say I had ECC in my old AMD K6-350, but that was a lot of years ago. Either way, I'm pretty sure they have supported it for a long time and in many of their CPUs, even the CPUs aimed at consumers.
Here's a comment I made last night, if you want to dig in and find some current offerings:
Obviously ECC requires more resources and is therefore intrinsically more expensive. That does not preclude it from being used for market segmentation to get higher margins.
Same reason ECC is not supported on consumer processors from Intel. Here it would be almost trivial to add, but they don't.
You're probably aware, but the desktop i3 does support ECC. Well, at least through the 7th generation. It looks like now that it's added 2 cores in the 8th generation, ECC has been dropped (NB: I didn't look at all models).
So, they're okay with it on consumer parts, as long as you're not looking to do anything resembling professional work I guess.
In most previous generations, all Core iX dies have had support for ECC, selectively disabled through fusing. Obviously, all internal busses and caches use error correction/detection independent of what the memory controller does.
Can you expand on this? Memory already includes controller logic, and adding ECC logic does not need a separate process. All you need is more silicon real estate for the extra bits the ECC requires. Silicon is cheap.
Not really. Most people don't care about ECC and gamers actually prefer the faster non-ECC ram. Since ECC is more expensive to make, the industry is obviously not going to sell ECC to people not willing to pay more for it.
And in a lucky coincidence it turns out that those who want ECC are willing to pay the moon for it, which of course gatekeepers like Intel exploit.
Well I don't think there's anything about ECC, or at least unbuffered ECC, that should make memory slower than non-ECC memory. "Fast" memory it seems to me is also just a marketing thing. If ECC was considered mainstream as it should be then I think we'd have overpriced '4000MHz Gaming X Raptor' ECC modules all the same. I'm also not convinced that unbuffered ECC makes memory modules significantly more expensive.
Basically there's no evidence that ECC memory is significantly slower or more expensive than non-ECC memory. Obviously it's slightly more complex than non-ECC memory, but it's not a particularly high-tech addition. For all its supposed complexity/slowness, Samsung uses it in the caches of its Exynos SoCs.
Everything can be perfectly explained in terms of market segmentation.
Well, ECC memory uses eight extra bits on the data bus, backed by an extra chip or chips (depending on the module's organisation); ECC memory modules effectively store one extra eighth of redundant data.
However, in many applications we find that using (forward) error correction almost always increases data density (for storage) or bandwidth (for transmission), simply because a FEC stream does not require a nearly-perfect channel any more. This is the way hard disks, SSDs, WiFi, LTE, DSL, satellite communications and so on are able to cram incredible amounts of data into very noisy channels. Thus, ECC significantly lowers cost in many dimensions (be it frequency spectra, storage prices, or not having to re-cable entire countries).
(And if you don't use the extra noise margin to increase density/bandwidth, then you can use it to increase reliability, like we usually do with ECC memory)
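For anyone curious what the check bits actually buy you, here is a toy single-error-correct / double-error-detect code over 4 data bits. Real ECC DIMMs apply the same Hamming-style idea to 64-bit words with 8 check bits, which is where the extra eighth comes from; this sketch is illustrative only, not how a memory controller implements it.

```python
def encode(d):
    """Hamming(7,4) plus an overall parity bit (SECDED). d = [d1, d2, d3, d4]."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4                       # covers codeword positions 1,3,5,7
    p2 = d1 ^ d3 ^ d4                       # covers positions 2,3,6,7
    p4 = d2 ^ d3 ^ d4                       # covers positions 4,5,6,7
    code = [p1, p2, d1, p4, d2, d3, d4]     # positions 1..7
    p8 = 0
    for b in code:
        p8 ^= b                             # overall parity for double-error detection
    return code + [p8]

def decode(c):
    s = 0
    for i, b in enumerate(c[:7], start=1):  # syndrome = XOR of positions holding a 1
        if b:
            s ^= i
    overall = 0
    for b in c:
        overall ^= b
    if s == 0 and overall == 0:
        status = "ok"
    elif overall == 1:                      # parity mismatch: a single error
        if s:
            c[s - 1] ^= 1                   # syndrome points at it; correct in place
        status = "corrected"
    else:
        status = "uncorrectable (double error)"
    return [c[2], c[4], c[5], c[6]], status

word = encode([1, 0, 1, 1])
word[5] ^= 1                                # flip one bit "in flight"
print(decode(word))                         # -> ([1, 0, 1, 1], 'corrected')
```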
Thinking about it for a few minutes, the memory bus will most likely be the only bus in your computer that has no error correction/detection. USB, SATA, PCIe, all of them require it. The main memory will also most likely be the only storage that doesn't use it (apart from firmware flash chips and the like, but these often use a checksum at least).
> However, in many applications we find that using (forward) error correction almost always increases data density (for storage) or bandwidth (for transmission), simply because a FEC stream does not require a nearly-perfect channel any more.
Do you mean that ECC memory has those benefits, or that other applications of error-correcting codes have them?
The reliability boost that ECC DRAM gives you could be reinterpreted as extra headroom for overclocking the DRAM before it becomes too unstable. Since the parity bits are carried on extra data lines, they aren't subtracting from your usable memory bandwidth so the net effect may be a substantial performance advantage when operating at equivalent reliability levels. The main concern is whether the memory controller can correct errors without a severe latency penalty. The ECC used for DRAM is far simpler than the LDPC used for things like SSDs, so it's probably not an issue. (However, systems halting on the detection of a double bit uncorrectable error would be an inconvenience.)
Gaming GPUs have different qualification targets anyway. They will error fairly frequently compared to e.g. your CPU; something that doesn't matter when most errors become invisible after about 16 ms, but not really something you want to see e.g. in a simulation.
Theoretically going with ECC Memory should add about 20% to your memory budget, but due to artificial chipset limitations and "enterprise pricing" it is often more like 500%.
Personally, I've always found this to be very annoying. The only people using non-ECC memory should be Xtreme Gamer type people who don't care about their data and only care about squeezing that last 2% out of their system to land at the top of the penis size chart. Non-ECC should be a feature like liquid N2 cooling.
ECC is often used without being advertised. For example, hard drives aren't advertised as using these kinds of codes, yet all of them do. Similarly, switches will often use ECC memory without making it an explicit point in the advertising. Industrial computers tend to have ECC, yet you will have to look closely at the data sheet to see this. And of course all internal busses and caches in pretty much any desktop/server/workstation CPU use error correction/detection.
The moral of this story is that ECC is only ever advertised explicitly, when it is special to have, i.e. because there are segments in the market where it is not used.
(A somewhat curious case are small microcontrollers. Often these control relatively important aspects of machinery, for example, drives [say a bit-flip in the drive controller turns STOP into RUN, which can get rather ugly on e.g. a lathe]; these usually don't have ECC. However, they are manufactured in very coarse processes and generally don't use DRAM.)
Current market prices for ECC RAM are approximately twice as high. I just built a new Ryzen system and considered using ECC, but I couldn't find any ECC DDR4-3000 RAM, and the DDR4-2666 ECC RAM was just under twice as much as the DDR4-3000 non-ECC I ended up with. Opportunity for a future upgrade, though. (My motherboard does support ECC :)
The real question though for a large scale consumer product manufacturer is how much margin they could convince the RAM makers to give up on ECC. Once it became standard, it probably would not cost much more than non ECC. Maybe 15% more? (1/8th extra memory, plus some extra for other expenses.)
I found that, at least for DDR4-2400, the ECC price premium is about $50 per 16 GB stick. I sprang for it anyway, but still winced at the cash register racking up the numbers for my own Ryzen upgrade.
It is nice to see "EDAC amd64: DRAM ECC enabled." in my dmesg output. :) That, and the NVMe SSD is screaming fast.
I'm debating whether to try my hand at overclocking my R7-1700, but even leaving it at the stock clocking it's pretty speedy, and it runs cool with the stock cooler.
Depends on your simulation. Are you using your workstation to do molecular dynamics? Or turbulent fluid simulations? A little extra randomness in two bits every day won't hurt you.
See e.g. this paper by Walker and Betz from the San Diego Supercomputer Center (paper from XSEDE '13) on the effect of ECC on GPUs for MD simulations:
https://dl.acm.org/citation.cfm?id=2484774
Except if the bit error happens in a pointer and you end up reading from the wrong part of memory. Or in your file system driver and you corrupt the file system. Yes, it will most likely be in the 500GB of float arrays, but why would you risk it?
Yeah but you'll notice, and resume the simulation from a previous checkpoint or something. The risk is with errors that you don't notice, but that nonetheless change the outcome in a meaningful way. That seems to me to actually be a pretty small space of situations.
I am in the market for an upgrade to my home PC. I use it for Compiling and Gaming.
I am finding the sheer variety of CPUs and their weird naming conventions utterly confusing. Then combine that with the almost-as-confusing choice of motherboards and the slightly-less-confusing choice of RAM, and I am completely lost.
Any tips out there for finding a path through this maze, how do I upgrade my PC without requiring an advanced degree in Intel/AMD marketing speak?
http://www.logicalincrements.com/ is a handy site that recommends PC builds that give you a good bang-for-buck at various price points. On a given horizontal line, the CPU, motherboard, and RAM will all be compatible.
The site is tilted for gaming, but GPUs are mostly interchangeable - so if you want more compute power, you can drop to a lower GPU and spend the money on the CPU instead.
Oh, and the #1 upgrade you can make for everyday use is an SSD. I was recently impressed at just how fast an old Core 2 Duo system was once I installed an SSD. It won't do much for gaming (aside from improving load times), but it will make everything else you do on the computer more pleasant.
Honestly, Anandtech's linked "Best X" articles are one of the best resources for this on the web.
Compare their "Best CPUs for Workstations" and "Best CPUs for Gaming" articles, and choose a platform. Probably going to be Ryzen 7 vs. X299, and you'll probably get at least one more CPU upgrade from either of these motherboards.
Get an NVMe Samsung 960 Evo or Pro because fast storage is super important and these are currently blowing everything else out of the water. Get enough RAM for now, and leave a slot or two free for when the prices for DDR4 drop.
Pick your graphics card from the spectrum, or honestly, just keep using whatever you have right now because cryptocurrency is currently severely distorting the market and limiting availability. I don't see a "Best graphics card" summary article right now, but I think this chart is somewhat current:
The most important thing to understand is that getting the exact ideal processor doesn't really matter. The difference between competing processors will only be discernible with a stopwatch and carefully controlled tests.
Looking at the ~$375-$400 price point, the new 6-core Intel 8700K will compile large C projects ever so slightly faster[1] and might be faster or slower for games depending on which one and other details. [2][3] I'd tend to go with AMD right now just because they don't play tricks with fusing off certain functions for market segmentation reasons. Basically, look at independent benchmarks close to what you're doing.
For choosing everything else many websites put together system guides laying out a set of compatible components they think are good at the same part of the market. Those tend to be very gaming focused but I think that's a good starting point. Intel just released a new top end chip so they're all out of date but the recommendations here[4] are where I'd advise you to look if you want a new system now. The prices have come down a bit since the guide was written, though.
Or there's the Build a PC subreddit wiki as an information resource. [5]
IIRC it started as a "compiled wisdom of 4chan gaming boards" and has morphed into affiliate links, but you might still find the Logical Increments list useful (http://www.logicalincrements.com/)
Not sure about SeanDav, but I do a lot of compiling and wonder if anyone has any suggestions about the best CPU to buy. I want short wait times for a compile, but I still care about price. I don't want to go over $500 but would prefer less.
My experience with GCC and compiling C++ has been that performance scales nearly linearly with core count. In other words, compiling C++ gets great benefits from parallelism - provided you configure your build to take advantage of it. (For make use the -j N option where N=num cores + 1; for Code::Blocks it is a preferences option. Not sure about Visual Studio.)
Things that do not parallelize: the final linking step (especially with mingw), compilation if all your code is one giant CPP file (prefer small files).
Assuming you have an SSD, your project is configured for parallel build, and your code is organized into many small CPP files, more cores == more better. I look for the highest core count CPUs with the lowest power. Last year I built a dual-CPU Xeon workstation / home server using 1.8 GHz 14-core chips (mostly from second-hand used hardware to save money). I feel that the low clock speed is actually an advantage, because even with 2 sockets and 8 sticks of 8 GB RAM, it idles at only 75W (at the outlet, as measured with a Kill-a-Watt). With 56 total threads, it builds my C++ project in 2.0 seconds flat, something which takes almost 2 minutes single-threaded. (I think the fact that there are 60 seconds in a minute, that 56 ~= 60, and that the build time went from 2 minutes to 2 seconds is not coincidental.)
Unfortunately a hardware problem with one of the used components seems to be causing the machine to lock up under Linux, and I never had the time to troubleshoot it, and moved on to a different codebase which doesn't require as much compilation so haven't yet had motivation to track down the problem.
Look at the Ryzen 1700X. It's a full-fledged 8-core and down to $299 (https://www.amazon.com/exec/obidos/ASIN/B06X3W9NGG/55467-20/...). I don't think Intel has anything that comes close to that for compile times at that price point (if someone has a counterpoint on this, I would love to be corrected).
It's worth noting that not all 1700s can hit the 1700X or 1800X speeds. There is a degree of luck in overclocking. Also, it's only a useful suggestion if you don't mind losing your warranty.
Basically, IPC in an Intel core is a bit better than in an AMD core. Keeping this in mind, you can compare core counts, frequency, power draw, PCIe lines or whatever floats your boat.
To compensate for a lower IPC, AMD will give you more cores (sometimes a lot) for the same price.
It all depends on what you need your processing power for.
Remember that one-bit errors aren't necessarily small errors. A single one bit change when working with 64 bit floating point can result in a very large difference in value.
How often does a program load an FP value at the absolute extreme of the range, though? I mean, if my value is there, I probably already have a problem.
Yes, but the point is that flipping a single bit can change a value dramatically. With 64-bit ints, for example, programs are likely using a limited range of values, and flipping a bit in, say, the middle of the int will change the value by plus or minus 4294967296. Not many programs will handle that gracefully.
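A quick check of that arithmetic, assuming the flip happens to land on bit 32 of the integer:

```python
value = 1_000
corrupted = value ^ (1 << 32)      # flip bit 32
print(corrupted - value)           # 4294967296, i.e. exactly 2**32
```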
Why does the image on this article show rackmount equipment while talking about workstations? I find their selections to be, at least for my work, much more server-oriented than workstation.
Do a lot of people really run >512GB RAM in their workstations rather than running a "thin" workstation and running simulations on servers or EC2?
I just built a new workstation a couple months ago: i7 7700 (not the K), 64GB RAM, one of those closed-loop CPU water coolers... As a workstation, I wanted it quiet, and performance has been great. Long-running jobs I run on our dev/stg cluster (4 machines, 512GB total RAM, 48 total cores).
I second that, in most cases, 64GiB is enough for workstation clients.
I can think of some reasons to have twice, four times, or eight times as much though. You can almost never have enough block cache, after all. Local testing databases come to mind. Also, tracing/profiling Elixir programs takes up enormous amounts of memory. My colleagues at my last company did not have enough RAM to generate meaningful profiles of our system, 64GiB was just barely enough to get a signal out of it. Rather than spending a month working on the profiler, it was nice to be able to get some data out of it as-is.
Depends on what you're doing. With video or 3D work, 64GB can often be a bottleneck. With After Effects and Maya rendering we've gone over our 96GB RAM limit using dual Xeon 6-core (i.e. 24 logical) on multiple occasions, especially rendering 4K+ scenes, forcing us to set limits on RAM/thread usage. I want to upgrade them to 128GB but getting bigger ECC RAM sticks in Austria (and EU in general) was (and still is) difficult and stupidly expensive.
I wonder how the EPYC 7401P would compare with the 1950X in terms of performance/price? It has 24 cores v. 16, but with a lower clock speed, and is $51 more. If you can actually use all the cores, I suspect better?
And do the motherboards support the features you would expect from a consumer board? They often only have a couple of USB ports and video outputs, with very few other features.
There are claims online that the 7401P has a Cinebench score of 4,200+; if that claim is true, then it is much faster than the 1950X for CPU-intensive, embarrassingly parallel workloads.
The problem is that you can't easily buy a 7401P; the fact that there is no video online verifying the claimed Cinebench score is a pretty good example of how hard it is to get access to one.
The new AMD CPUs look otherwise good, but it seems they still don't run rr. Giving up rr is not a reasonable option if the purpose of the workstation is developing software in the kind of language that gdb can make sense of (C, C++, Rust).
A workstation will be used, or at least turned on, 24/7. The one I just sat down at sure is. So when calculating costs one should include daily power consumption (power used + cooling needs). Some of these new Intel chips run very hot.
28 cores sure, but they run a base frequency of 2.50GHz. For $1469 you can have the i9-7940X 14 cores at 3.1GHz probably more suitable for a workstation.
I'd go for the Threadripper 1950X if that is a requirement: 16 cores, 3.4 GHz, and ECC support, all for under $1000. Intel doesn't really have anything that meets those requirements at a reasonable price.
Money may not be an object, but choosing the first option seems a little stupid. Even rich people don't throw away money like that, because if they did, they probably wouldn't have gotten rich in the first place.
It depends on the workload but for most cases those extra cores will just sit idle (that Intel is for things like webservers serving many concurrent requests), for most general workstation tasks you'll get better performance from the AMD offering. It's sort of like buying a half million dollar tractor trailer to go fast instead of a hundred thousand dollar sports car.
So the total performance (to first order) is almost twice as good for the 28 core part. Sure you need to use all cores for that comparison to hold, but usually software either scales up to 8 cores or less or it scales up to N cores.
This gets muddied by the use of boost clocks, and this part does 3.8 GHz in that mode. What would be nice to see is a processor aimed at long stretches of interactive use with few cores (1-4) and then long stretches of offline use with many cores (16-32).
A lot of "workstation" use is long stretches of interactive use on a few cores followed by periods of offline use (NN training, graphical rendering, code compilation/testing, etc.). The boost clocks on these higher-end CPUs should really be able to go even higher, even if it means the base clocks would have to go down somewhat. I'd rather have a 28-core CPU that runs 20 cores entirely off and the rest at 4.5 GHz than one that runs everything at 2.5-3.8 GHz.
I don't have any precise requirements, but I would like to be able to pretty much hook up an oscilloscope or logic analyzer anywhere and in principle be able to understand what's going on.
I'm looking to get into hardware by way of tinkering and my mental model is of open software. Currently, any time I have a burning question about one of my tools, I can just open a man page as well as start digging into the source. I think it would be cool to do the spiritual equivalent with hardware.
For example, I am currently digging into the bootup process -- everything after CPU POST until a fully ready userspace. However, the details of what happens between pressing the power button on the power supply and CPU POST are completely opaque to me, and some are completely opaque even in principle on my current system. I'd like to have the ability to fully grok the electrical underpinnings of what's going on, or at least get as close as possible.
Somewhat off-topic, but I need to replace my ancient workstation and would love a recommendation. I prefer to max cores and do not care much for clock speeds (I work mostly on algorithm development and can parallelize experiments relatively easily but often run multiple unrelated tests). 10-15k budget.
I was thinking of dual-CPU Z840; are there better options?
You can get 64 cores, 128 threads on a 2P EPYC7551 with 256GB of ECC RAM in a rackmount server with redundant power supplies for that much money. e.g. SiliconMechanics.com
Thank you for the suggestion, I will look at SiliconMechanics. I need a tower, not rack-mounted, but I see capable systems with 24-28 cores there. The site seems to be in disarray, at least for high-performance workstations (the two rightmost options of the four, which I would assume to be the most capable, show up as discontinued when I click on them, etc.).
I was one step from getting a Threadripper... but then I opted for an 1800X; I will invest more on the GPU side when there's hardware that can properly drive a 4K/8K VR headset, eh :) I wonder, though, if there's a way to restore the disabled cores on an EPYC chip... as according to AMD they are all born with 32 cores...
They still are required, for at least three reasons - UX, cost, and security.
1. For a lot of workloads, it's more convenient to run them on a desktop. Sure, with the right tooling you can bridge your cloud and desktop pretty seamlessly, but a lot of people can't do that or it's not possible with their tooling. Sure, it doesn't scale, but it often doesn't need to.
2. A Threadripper CPU may cost $1000, but look at what that buys you in the cloud. The machine pays for itself within a year if used extensively.
3. Sometimes it's not easy to just upload code and data (!) to the cloud. Sure, there's all sorts of security layers and certifications and yes, clouds are very often more secure than desktops, but still, some people can't afford to let their data leave their machines.
Two of these were a thing at my former employer (but we still used the cloud for other use cases). So no, I wouldn't say these workstations are obsolete.
First of all, no one has really solved the problem of running AutoCAD etc. in the cloud. Secondly, per-core performance is generally pretty poor on cloud machines compared to high-end workstations, and for many workloads 16 really fast cores beat 64 slower cores.
Finally, cloud computing is still really expensive compared to dedicated hardware. A halfway decent EC2 'workstation' like the p2.xlarge is over $5000 a year (plus storage) if you pay for the whole year up front, and that 'only' gets you 4 cores and 60 GB of RAM (and a pretty good GPU). $5000 will buy you a really nice workstation with much better specs, and you get to keep it at the end of the year.
Yes, most engineering applications end up doing all sorts of simulation that can be summed up as 'run as much math as fast as possible.' There's everything from routing traces on a PCB to heat or stress simulations on a part built with constructive geometry, plus a lot of other things that let an engineer simulate/try something interactively rather than simply 'draw a part.' This is where good workstations can definitely make a difference. An example product that is definitely going to need the local beef: https://www.autodesk.com/products/cfd/overview
Hmm; for a workstation like the one I'm using now, with 32GB of RAM and 8 cores, EC2 has an m4.2xlarge as an equivalent.
$0.4/hour * 24 hours * 365 days = $3,504 per year. You can build the same machine for a fraction of that cost and you'll get more than a year's use out of it. Reserved instances would bring the price to rent down a bit further, but I still don't think you can beat the economics of owning a workstation if you're going to be using it most of the time.
Microsoft's desktop licence is more restrictive than their server license. Desktop OSes can't be virtualized alongside server OSes. This is why AWS Workspaces are Server 2008R2/2012 with the Windows 7/10 Experience, respectively.
Then there's still latency, and the need for some kind of local client for the remote session.
It's also a move away from "micro-computing" and back to the "mainframe-client" model that is so popular with Cloud-and-Phone.
Running a small VM on Azure full-time for a year will buy a decent desktop. Running something equivalent to a $1000 desktop machine will cost wildly more.
No, it hasn't. I'm not saying we won't reach a stage at some point where compute power is a utility like electricity or water, but we are definitely not there yet. At the very least, I can't go to my local PC vendor and purchase a slow terminal with some compute credit allowance at the nearest EC2 center or whatnot. When we get there, we can ask this question again.
$999 for the best choice seems really high to me. I understand that they're trying to generalize a broad range of applications, but still.
They mention the Ryzen 5 is better bang for the buck, but dismiss it for having "low overall performance." I guess I'm wrong in thinking that, for 98% of people, a $250 CPU would still be wasteful.
But this is about workstations where CPU bound workloads are running, like rendering or compilation.
Most people will either buy a laptop or desktop (if their smartphone is not sufficient) and not spend nearly as much money, they only need to run a browser and a couple of other small programs.
It's about picking a CPU for workstations, though, not normal desktop machines. It's assumed you will need serious computational power, and that's not free. That's also why they highlight the presence of so many PCIe lanes.
Since this article is about "workstations", it should have only considered ECC-enabled systems. Since it didn't (hardly mentioned ECC in fact) it's a useless article IMHO.
AnandTech used to be an okay source, but it is getting worse pretty quickly, and nonsense articles like this are speeding that up. The Threadripper 1950X is not the overall best workstation processor; it is more like a toy for kids into gaming/overclocking.
The 1950X is slow for multithreaded applications - it has a Cinebench score of only ~3,000 - you pay serious money for a fancy motherboard full of LED lights, and you don't get officially verified ECC support.
As a comparison, you get the same multithreaded speed at a much lower system cost if you just buy second-hand dual 2696v2 processors with real ECC support. If you need real processing power in a single box and have a tight budget, there is a flood of E5-2696v4 processors on the market at $950-1,100 each; for a much nicer budget, you can always go dual/quad Intel 8180.
Nice one, comparing second-hand prices to brand-new prices. AnandTech's goal is to guide you in buying chips brand new, not hunting them on eBay. Also, what they say is that the 1950X is the best compromise between number of PCIe lanes, processing power, maximum memory, etc. If you didn't notice, performance is not their only criterion.
Sure, but as you say yourself, the 8180 isn't the same budget at all, so I excluded it from your price comparison. Also, it's not a competitor to the 1950X. I guess in the end it all depends on what you need to do. Your need seems to be maximum performance, so why go for the best "compromise" CPU and then criticize their choice? Clearly you should go for one of the 3 recommended CPUs for performance (which are also better for PCIe lanes and memory, but have a poor performance/price ratio).
If it is about budget, or performance per dollar, surely you'd be buying Xeons from eBay. You don't lose anything, as the likelihood of having a dead processor 18 months from now is pretty much zero; no warranty required.
If it is about PCIe lanes, you should at least be buying dual-socket systems; that gives you more PCIe lanes than the 1950X.
If it is about memory, well, the 1950X is limited to 8 DIMMs.
That's according to a 21-year long-term AMD user. Compared to my dual 2696v4s' multithreaded performance, the 1950X is sloooow. Please don't bother mentioning 2P EPYC - you simply cannot order it and expect a reliable ETA; most models were released only on paper.
What was your first AMD processor? Mine was an Am486 DX4-120.
DX4-120? Listen here, sonny, I'm not sure that first AMD CPU you had conveys authority to speak about today's multi-threaded performance. And this opinion comes from someone whose first AMD CPU was a DX2-66, so clearly I'm less wrong. :-D