Intel Xeon D 12 and 16 core parts launched: first benchmarks (servethehome.com)
132 points by rbanffy on Feb 17, 2016 | 107 comments



> If you are using a compute optimized AWS c4.4xlarge instance: you will be able to purchase the Intel Xeon D-1587 system for about the same price as the “Partial Upfront” up front AWS fee, then colocate the box saving over $1500/year per instance while getting better performance.

I hear this type of argument a lot, but it's so important to include the cost of the engineering and talent necessary to keep your datacenter humming.

If you're a small shop with a small future growth expectation then sure, forget Amazon and start racking your own boxes. Just be ready to hire ops staff competent to run your operation. You need to be realistic about both aspects of cost.


Having done this computation a number of times, with these machines it would flip over to 'colo' at about 150 machines. The really interesting thing is that Amazon costs scale linearly with size while owned infrastructure scales sublinearly. So the advantage just keeps growing and growing.

Bottom line, if you can host your app and resources on fewer than 150 machines in Amazon, you win; more than that and you're leaving money on the table. Once you get to the point where you are deploying new datacenters to support your customers, you get a huge boost in operational efficiency.
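
As a toy model of that crossover (every number below is hypothetical; plug in your own quotes): cloud spend grows roughly linearly with machine count, owned infrastructure has a large fixed base (cage, network gear, ops staff) plus a smaller per-machine cost, and the lines cross somewhere around the scale the parent describes.

  # Toy model of the cloud-vs-colo crossover described above. All numbers
  # are made up for illustration -- substitute your own quotes.

  def cloud_cost(machines, per_machine_month=450.0):
      """Monthly cloud bill: linear in machine count."""
      return machines * per_machine_month

  def colo_cost(machines, base_month=45_000.0, per_machine_month=150.0):
      """Monthly owned-infrastructure bill: fixed base + cheaper per-machine cost."""
      return base_month + machines * per_machine_month

  def crossover():
      """First machine count at which colo becomes cheaper than cloud."""
      n = 1
      while colo_cost(n) >= cloud_cost(n):
          n += 1
      return n

  if __name__ == "__main__":
      for n in (10, 50, 150, 500):
          print(f"{n:4d} machines: cloud ${cloud_cost(n):>10,.0f}  colo ${colo_cost(n):>10,.0f}")
      print("crossover at ~", crossover(), "machines")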


Still not convinced about this. I can get a pretty beefy server from Hetzner for EUR 60 per month: Skylake 6700, 64 GB RAM, 500 GB RAID 1 SSD. There's a one-time setup fee of about EUR 100, but it can be cancelled at month's end. On AWS, an equally specced machine would easily run more than $300 per month. So it really only makes sense if you can scale your usage up and down a lot during the day, which doesn't apply to most smaller projects. Most of the companies I worked with had an EC2 setup of 3-10 instances that were almost always up, which is way more expensive than doing the same thing on Hetzner. Basically they just did it because of the free AWS credits for startups.


Slight digression here, but "RAID 1 SSD" might not be a good idea, especially if you bought two SSDs at the same time: they are probably from the same lot. My experience with 3,000 SSDs in use is that they seem to fail with a combination of age and rewrites. So I would be concerned that two equivalently aged SSDs with an equivalent set of rewrites (which is what RAID 1 produces) would be quite likely to fail at the same time; or, more precisely, when one fails you will probably have a statistically harder time recovering the mirror from the other.

That is just statistics of course, but one of the things I tried to do in the Blekko infrastructure was to mix the ages of the SSDs to mitigate this risk.


This is good advice even for spinning disks. I once met an unlucky RAID-5 array of three identical disks from the same brand, model, batch and with close serial numbers. When one of them failed, I immediately ordered 3 replacements.

They didn't arrive on time.

Good thing I had backups.


I've heard there's a strong correlation between failures and lot number. Can you confirm this?

What's the practical way to avoid this? Staggering your SSD purchases?


This is a false dichotomy. There's no need to flip to colo when dedicated hosting exists. The provider takes care of the network, hardware, and potentially much more depending on the service agreement. You can even have the provider set up a cloud environment for you, if you want to be able to spin up and down VMs. This eliminates the capex costs, while having much lower opex costs than colo at smaller scale.

Considering that you can easily do dedicated for 1/7th the cost of Amazon for the equivalent amount of resources, it doesn't take anywhere near 150 machines to realize cost savings; it's more on the order of one complete physical machine as a dedicated server. You can purchase a second one for redundancy and/or always-available additional capacity, so that you can burst to double your usual peak and still save 5/7th of the cost of Amazon. If you expect to scale much higher than 2x, you can also purchase a bunch of servers for just one month, or even less depending on the provider, or take a hybrid approach if you really need an extreme amount of dynamic scalability. The reality, though, is that such extreme scaling is the minority of cases.

I'm not sure how you arrived at the 150-machine number, but that sounds like 3-4 full cabinets, which is about the right scale for colo to make sense.


Note: I manage dedicated-server hosting and all web services on Titanfall @ Respawn.

Surely your calculation doesn't / can't include bandwidth charges? And if you have Windows hosts... cloud screws you too.

You're not wrong that it takes a lot more machines than most shops need, and if you can make your scale elastic, 150 machines probably goes a long way! And to your point, developers undervalue their time a lot, and dealing with colocation and sourcing of hardware and backups can be a real drain to save some on CapEx.


Our transit costs out of our colo in Santa Clara are $6K/month for two dedicated gigabit lines. That translates roughly to 518 TB of data transfer a month if we could keep the pipes fully lit 24/7. Cogent has offered to make one of our gigabit lines burstable to 10G if we would like, at the same cost as long as we don't exceed a monthly average use of 1000Mbps.

And we use Linux so no Windows hosts charges.

There are more and more tools that are force multipliers for your Site Reliability Engineer (SRE) equivalents. You can't avoid swapping out dead drives, but with the right systems architecture you can make it pretty painless.
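
For anyone checking the arithmetic, the raw ceiling for a fully lit committed line works out as below; the 518 TB figure quoted above corresponds to running the two gigabit lines at roughly 80% average utilization.

  # Rough ceiling on monthly transfer for a committed line.
  # 1 Gbps = 1e9 bits/s; a 30-day month has 30 * 86,400 seconds.
  # Real-world usable transfer is lower: protocol overhead, diurnal traffic
  # patterns, and headroom mean you never keep a pipe at 100%.

  def monthly_transfer_tb(gbps, days=30, utilization=1.0):
      seconds = days * 86_400
      bytes_total = gbps * 1e9 / 8 * seconds * utilization
      return bytes_total / 1e12  # decimal terabytes

  if __name__ == "__main__":
      print(f"1 Gbps fully lit : {monthly_transfer_tb(1):.0f} TB/month")
      print(f"2 Gbps fully lit : {monthly_transfer_tb(2):.0f} TB/month")
      print(f"2 Gbps at 80%    : {monthly_transfer_tb(2, utilization=0.8):.0f} TB/month")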


Wow, you're getting ripped off.


Indeed, I've seen Cogent go for $800 - $1000 for a 1G commit on a 10GE port.


It's possible to pay only the lower end of that range for a 2Gb 90th-percentile commit on a 10Gb connection out of most major markets. Cogent pricing fluctuates wildly depending on your ability to negotiate.

Ignore the whole song and dance where they bring a VP into the call to specially approve the pricing they're pitching you. Those are standard transit sales tactics. Or avoid Cogent's sales entirely by working with a reseller.
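
For reference, a rough sketch of how an Nth-percentile ("burstable") commit is typically billed: sample utilization every few minutes, sort the month's samples, drop the top (100 - N)%, and bill the greater of the commit and that percentile. The sampling interval, price, and traffic data below are illustrative assumptions, not any specific Cogent contract.

  # Minimal sketch of Nth-percentile ("burstable") transit billing, the model
  # behind the "2Gb 90th percentile commit on a 10Gb port" above. Carriers
  # typically sample utilization every 5 minutes, sort the month's samples,
  # discard the top (100 - N)%, and bill max(commit, percentile) * $/Mbps.
  # Sample data and the price here are made up.

  def percentile_bill(samples_mbps, commit_mbps, usd_per_mbps, pct=90):
      ordered = sorted(samples_mbps)
      # Index of the Nth-percentile sample (simple nearest-rank method).
      idx = min(len(ordered) - 1, int(len(ordered) * pct / 100))
      billable = max(commit_mbps, ordered[idx])
      return billable, billable * usd_per_mbps

  if __name__ == "__main__":
      import random
      random.seed(1)
      # A month of fake 5-minute samples: mostly ~1.5 Gbps with occasional bursts.
      month = [random.gauss(1500, 400) for _ in range(30 * 24 * 12)]
      billable, bill = percentile_bill(month, commit_mbps=2000, usd_per_mbps=0.50)
      print(f"90th percentile billable: {billable:.0f} Mbps -> ${bill:,.2f}/month")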


Speaking with a VP to negotiate transit prices? Sounds like a PITA. AWS knows that you're an engineer who just wants to write code, and has made that as frictionless as possible.

This goes back to my first point, in that going into the DC requires staff with very different skill sets. It's not that it's not worth it, but there are a lot of costs involved apart from the per-hour instance cost.


Can you elaborate? Discussions with details about things like this are always interesting.


He's paying 2-3x market (although 2Gbps isn't huge). Cogent is pretty much bargain-basement, too.

I wish the industry would stop quoting prices the way we do (USD/Mbps, so e.g. $1.20/Mbps); Gbps is probably the right unit now, and it may make more sense to invert.

But there are lots of terms involved in a transit contract; unless you're buying tens of gigs, cross-connect, port cost, commit, etc. may be more meaningful than the per-Mbps cost. And pricing often depends greatly on exactly where in a city you're buying (an IX or carrier-neutral facility will be cheapest; an off-net or monopoly building will be most expensive) -- and of course Cogent transit vs., say, Level(3) transit are only superficially the same.
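
To see where the "2-3x market" estimate comes from, just divide the monthly bill by the committed Mbps; the figures below are the ones quoted in this thread.

  # Back-of-envelope $/Mbps from the figures quoted upthread: 2x 1 Gbps
  # dedicated lines for $6K/month, versus roughly $1000 for a 1G commit.

  def usd_per_mbps(monthly_usd, committed_mbps):
      return monthly_usd / committed_mbps

  if __name__ == "__main__":
      quoted = usd_per_mbps(6000, 2000)   # the Santa Clara colo setup described above
      market = usd_per_mbps(1000, 1000)   # ~$800-$1000 for a 1G commit, per the thread
      print(f"quoted : ${quoted:.2f}/Mbps")
      print(f"market : ${market:.2f}/Mbps  ({quoted / market:.1f}x difference)")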


Exactly, transit cost is the big killer.

AWS/GCE/Azure are all really expensive compared to even good multi-homed transit. If you push more than 10TB and you don't need multi-homing you are already ahead.

If you do need HA multi-homed transit you are looking more at 50-100TB or so but still, there are a lot of places that easily do that every month, especially if you are doing cross-DC storage replication for example.


Yes, but a huge reason to go with Amazon is to be able to take money back off the table. Sure, if you have a predictable 24/7 load, then you may be better off racking your own at 150 machines. BUT, if you're not sure how much you'll need, and your app is built to scale up and down with demand, then that 150 number starts to slide upwards.

Risk mitigation is part of the cost of infrastructure that needs to be factored in, and if your business has a downturn, or your compute needs turn out to be lower, then you can scale down your OPEX pretty quickly. If you had bought the machines, you'd be stuck with them. (See Zynga.)

The operating cost of doing it yourself is always hard to pin down and includes many things: OS/hypervisor licenses, staff, hardware, cooling, redundancy, disaster recovery, power, colo fees, etc. The advantage with AWS is that this is just a few numbers on a few services, and it is very predictable.


Yes, I guess the 150-machine figure is more of a guideline. It depends very much on your workload. If you need 150 beefy machines running at 90% CPU all the time, then yes, running your own colo may come out ahead. On the other hand, if you run a system with an online fleet of, say, 20 web servers, a couple of beefy databases, and some offline analytics workload, then AWS will definitely come out ahead.

And that is not considering other AWS offerings. I've come to really like DynamoDB. It takes some time to get used to, but I've found it solving more and more problems for me, without having to scale up ops engineers. There is a danger of lock-in, but since I am a paying customer, I guess it's not going away anytime soon, and Amazon is probably not jacking up the price either.


Yet Netflix migrated to Amazon, and I think it has way more than 150 machines.


Yes they did. And that is a really great exercise in thinking about what their profitability would be like if they were structured more like YouTube in terms of their server infrastructure.

But to the point of many other folks here, if you're all cloud you can really shrink fast if you need to. Lose a million customers? No problem, just drop some machines. Any business that can price the marginal monthly cost of AWS to support a customer into that customer's monthly fee can keep its costs flat when customer counts vary. Someone with dedicated hardware is still paying the bills for it, even when their customers leave.


I am not sure I'd agree that Netflix could've been more profitable if they went with their own DC. I don't think Netflix utilizes that much hardware. Their streaming is on their own CDN, which I hear is placed directly in the big cable companies' DCs. Tracking and storing user profiles and other metadata would not require a huge amount of hardware. I think Netflix also does quite a bit of video encoding and analytics, but those workloads fit well with EC2.


That would be the thing, right? Clearly they can compute their marginal cost per subscriber, and they know what the lifetime value of a subscriber is, so they budget what to spend on EC2 infrastructure and movies at Sundance and still meet their net income goals. They would also know how much difference it would make if they needed to add to or reduce their EC2 footprint.

Sadly I don't expect them to share what those costs are. I have always felt that Amazon could, in theory, out-compete Netflix with Amazon Prime Video, because they would only have to charge the marginal rate for the hardware. However, to date it seems like the content contracts have kept Netflix on top.


I have to think they're getting a deal.


Plus, they're the number one poster-child for AWS. Netflix running on AWS makes AWS safe for CTOs everywhere. So, yeah, they're definitely getting a deal.


Which should be expected when buying in bulk.


that would be fun to calculate


Bulk rates are always going to be unlisted and cheaper.

Also, having 150 machines in a single location is rarely optimal.


One can always do a hybrid approach, where baseline load is handled on purchased machines while renting AWS time on an as-needed basis. Also, ops staff is useful, and often has skill sets that augment development. Letting someone else worry about building daemons, tuning parameters, etc., means that your devs can spend more time actually writing code. Interrupting your devs because nginx has a memory leak can be a much bigger waste of time.


> I hear this type of argument a lot, but it's so important to include the cost of the engineering and talent necessary to keep your datacenter humming.

I'm surprised how often I see arguments like this as well. If it takes even a few days of your employees' time to manage the hardware and you're only saving a few hundred dollars a year buying the hardware yourself, you're already making a loss.


IMO it is not that easy. AWS does not manage itself, so you most probably need a DevOps person anyway, so you don't waste your developers' time configuring environments and keeping them running. I would say that the hardware part of DevOps is much smaller when you rent boxes and basically run them stateless.


I'm not saying it's easy, but it's easier. Different cloud services let someone else take care of replacing faulty hardware, upgrading hardware, adding new servers, software updates, backups, etc. It's very difficult and time-consuming to replicate all this reliably with a couple of DevOps people.


If you rent dedicated boxes you don't have to worry about replacing hardware or adding new servers. And the cost savings are huge: as someone else noted, a dedicated box can be 1/7th the monthly price of an equally specced AWS instance, so you can easily have one or two DevOps people on call and still save a lot of money. I don't see where buying/colo makes sense, though, unless you have a really huge operation going on.

Hetzner offers an i7 Skylake 6700 quad-core, 64GB RAM, 512GB SSD RAID 1 for EUR 60/month; an equally specced AWS instance (or multiple smaller ones) would easily be more than EUR 400 even with upfront payment and reserved instances. That difference is huge if we are talking about 10-50 of these machines. The only benefit I see is when you can scale your app up and down a lot during the day to save costs, but I'd still like to see a real-world case where that actually saves money. I'm pretty sure most companies use AWS for the convenience and just don't care about the cost because it's minor in the grand scheme of things, like super popular startups with huge funding and years to figure out and optimize profitability.


I wish I had an excuse to play with one of these - I wonder what their colo costs are. I suspect you could run quite a few build boxes for all sorts of FOSS projects with a system like this.


It's unfortunate they didn't give more details on the per-year costs in colo. I'm pretty interested in that metric as well, and how it compares to the other Intel Xeon chips.


Colo cost varies based on the facility, uplink, power density, and various other factors. So it doesn't quite make sense to talk about the "colo cost" of a specific device, or to compare the colo cost of one chip versus another... unless what you're really comparing is power efficiency.

Let's say you're renting half a rack with 100Mbps unmetered bandwidth and 20 amps of 120V for $500 per month. So you have 16 amps / 1920 watts of continuous power you can draw. These Xeon Ds are very low power, although strangely Pat doesn't report the actual idle and full-load draw as measured by a Kill-o-Watt meter. With the 10GbE I can't imagine budgeting much less than 150 watts per machine. If you can put, let's say, a dozen of these in that half rack, then in my contrived example the "colo cost" would be $41.66/month, or just $2.60/core.
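
The same arithmetic spelled out; all inputs are the contrived assumptions above (half a rack for $500/month, a 20A/120V circuit derated to 80% continuous, ~150W per box).

  # The contrived half-rack example above, spelled out.

  RACK_USD_MONTH = 500.0
  CIRCUIT_AMPS   = 20
  VOLTS          = 120
  DERATING       = 0.8          # continuous draw limited to 80% of breaker rating
  WATTS_PER_BOX  = 150          # budget per Xeon D box with 10GbE
  CORES_PER_BOX  = 16           # Xeon D-1587

  usable_watts = CIRCUIT_AMPS * VOLTS * DERATING      # 1,920 W
  boxes        = int(usable_watts // WATTS_PER_BOX)   # 12 machines
  per_box      = RACK_USD_MONTH / boxes               # ~$41.67/month
  per_core     = per_box / CORES_PER_BOX              # ~$2.60/core/month

  print(f"{boxes} machines, ${per_box:.2f}/machine/month, ${per_core:.2f}/core/month")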


Hi zaroth - We do not use Kill-o-Watt meters since we use calibrated Extech TrueRMS units for our "desktop" testing.

We also have a few racks in Sunnyvale, California, where we use APC metered-by-outlet PDUs that we calibrated by testing against the Extech units; we found two of over a dozen that gave us consistent readings across PDUs and ports.

This unit is in a SC515 chassis with redundant PSUs. Even with 4x 7200 rpm Seagate 2.5" drives, a 64GB mSATA drive and a Samsung SM951 m.2 SSD it is still pulling 0.24A on 208V (so 0.48A on 208V). If you were to gain a bit of efficiency running a single PSU, you could easily fit these in 1A on 120V power envelope.

What we have seen several folks do with similar D-1540 machines is actually use 1U 1A/ 120V hosting (which is very inexpensive) to deliver cheap local PoPs for their applications. We have been strongly considering decommissioning our Las Vegas 1/2 cab DR site and moving to this sort of distributed model as there are a lot of benefits with this.

We published more complete benchmark results of the D-1587 yesterday ( http://www.servethehome.com/intel-xeon-d-1587-benchmarks-16-... ); expect more in terms of reviewing the overall platform next week. The Xeon D is very platform dependent for power, and the particular unit we have does have the additional LSI/ Avago/ Broadcom SAS 2116 (16-port SAS2) HBA onboard.


Thanks Pat! Long-time reader of STH, I really find your articles incredibly useful and informative!

I assume you meant to write "(so .48A on 120V)". 50W idle with 4 spinning drives, an M.2, and the redundant PSU really is very impressive. As you say, keeping full load under 120 watts to fit in a 1U/1amp colo is an incredible value.

The hardware's not even that expensive! And for $50 - $60 / month in colo cost [1], the reliability of systems now with SSDs, and the amazing orchestration tools that are available, the likes of AWS, DigitalOcean, etc. start to look really expensive.

[1] http://www.webhostingtalk.com/forumdisplay.php?f=131


Awesome! Great to hear. You got me on the typo. Should have been 0.24A per PSU (we meter by outlet on multi-outlet systems). Still 1A on 120V in a 1U is very easy to achieve.


The Xeon D-1540 runs at 73 watts under load according to AnandTech: http://www.anandtech.com/show/9185/intel-xeon-d-review-perfo...

Across all the SKUs in the OP, the TDP is 35-65 watts: 20 more watts for the new 1587 SKU compared to the 1540 benchmarked in the AnandTech article. So a rough estimate would be maybe ~95 watts under load? Assuming the board is the same and only the CPU has changed, that should be a pretty good estimate.

You're right that it mostly doesn't make sense to compare machines in that way, due to all of the variables surrounding colo costs; max-load wattage is the metric I should be interested in.



"128GB is now starting to become a limitation. With 16 cores that is only 8GB/ core."

This comment gave me pause; I feel like I'm falling out of step with the evolution of commodity server systems. Have we gone from being network-bound (10/40GbE) and disk-IO-bound (fixed with SSD, PCIe NVRAM and Fibre Channel) in the near past to swinging towards being memory-capacity bound?


I think that's more of a reaction to the "fixes" to disk IO still not being fast enough.

The more you can keep in memory, the better off you're going to be, even with the fastest SSDs in the world.


See this article: https://queue.acm.org/detail.cfm?id=2874238

In a high-end server, with PCIe flash or NVRAM, you're increasingly unlikely to be IO bound.


My takeaway was that the single-JBOD approach won't work in next-generation systems, as a JBOD of SSD/NVMe/PCIe SSDs will be greatly limited by the controller, and that infrastructure complexity will increase, presumably by having many JBODs with smaller numbers of disks.

Which is silly, because controllers can simply be designed to handle more data, and likewise they can be made more power efficient. So far the only real limitation has been the interface, which of course can be changed.


I'm not sure why that is your main takeaway from that article. The point is that relatively slow disk controllers designed for rotating magnetic storage are a bottleneck when you connect fast enough flash storage. Connecting the storage via PCIe alleviates this bottleneck, to the point that you are no longer bound by the traditional constraints of small caches and large, slow permanent storage.


That was a great read, thanks for sharing!


I wonder how soon we will see the inevitable convergence of RAM and "storage". XPoint is still slower than RAM, but is a path towards unification.


XPoint still wears down; it's said to be 1,000x better than NAND flash, but that's still very much finite.


A couple more iterations and wear-down suddenly won't matter anymore!


Isn't 8 GB per core still reasonable? Especially if you can share indices and buffer caches across cores, for example. There must still be diminishing returns for caching.


Depends on your dataset. Systems like databases perform much, much better the less you have to touch a storage controller. One of the things going into Intel's 2017/2018 platform is support for six channels of memory, and not all of that needs to be DRAM. There's a new standard, NVDIMM, that lets you use non-volatile memory as RAM and take advantage of its much higher densities. So if you have something like a database, you can map a much larger dataset into actual memory, instead of only having the indexes in there.


I do a load of work with high-throughput genetic sequencing. One of the programs that processes the data for a sample is single-threaded but requires about 40GB of RAM. I currently have a server with 384GB RAM and 64 CPU threads, so 6GB/thread. RAM capacity is a definite bottleneck, since I can only run 9 at a time instead of 64.

It all depends.


"reasonable" is entirely context-dependent. Some workloads are CPU or network bound. Others are memory bound.


That is correct.

NVMe-backed storage, 10GigE-backed network IO, and a 16-core CPU get you pretty far nowadays for ~$5K USD, and RAM starts to be the limiting factor. Disk IO and network IO can be saturated pretty easily, and that kind of work usually involves a lot of RAM, so here we are.


Keep in mind, the Phi offers an additional 8 GDDR controllers at 2 lanes apiece. If you really ran into that 128GB wall, you could basically use your Phi as an off-loader and make those pages addressable. Profile your program and find the fetch patterns, then chuck in a cheap Phi as a memmapper for the LRU and/or older-generation stuff. (I'd imagine you can borrow a lot from the last 20 years of GC research.) It's a non-trivial (but certainly doable) task.


That quote from the article makes sense if the author is thinking about virtualization. He is imagining running 16 VMs on it.


Honestly, our production systems are 256GB RAM and 16 GB/Core.

So no, I don't think so.


What's going on with this 128GB number? What's the actual limit here? I thought we had 40 bits of physical address space to work with.


The board has 4 RAM slots, and 32GB DIMMs are generally available (64GB is still in 'soon' territory, I think). Xeon E5 v3 has four RAM channels, versus two for the processors in the article, and can handle three DIMMs per channel (whereas the board in the article only has two slots per channel).

I assume there's some market segmentation happening here, but it's also a way to reduce board size and pin count.


That makes sense. I didn't realize these had so few channels. What I was concerned about was possibly baseless market segmentation engineering, in the same way that they elide ECC from the "Core" line of CPUs.


Samsung has announced they will have 128GB stacked-DRAM DIMMs ready soon, so 4 DIMMs could fit 512GB of memory.

So I guess the problem is the address space, as mentioned above.


The Xeon Ds have had a max of 128 gigs, or 37 bits of addressing, since they came out last year. For Skylake, they have bumped the addressability up to 36 bits from 35 bits, which is where it had been for a number of processor generations. For the E5 Xeons, the amount of physically addressable RAM is 46 bits, and has been for a couple of generations. Intel has used addressable RAM for a while as one of its segmentation methods to push people towards the higher end if their usage is RAM-heavy.
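
A quick sanity check of how those address widths map onto maximum RAM:

  # Physical-address width -> maximum addressable RAM, using the figures above.

  def max_ram_gib(address_bits):
      return 2 ** address_bits / 2 ** 30   # bytes -> GiB

  for label, bits in [("Desktop Skylake", 36), ("Xeon D", 37), ("Xeon E5", 46)]:
      gib = max_ram_gib(bits)
      pretty = f"{gib / 1024:.0f} TiB" if gib >= 1024 else f"{gib:.0f} GiB"
      print(f"{label:15s} {bits} bits -> {pretty}")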


it's market segmentation, sure, but also every bit of address line that you don't use translates directly into watts wasted.


Every couple of years Intel hardware keeps changing my mind from "I'll just rent everything in the cloud" to "I should build a server from this and keep everything in house!". And vice versa.


I can't stress enough how game-changing this is for anyone in super-computing. (The whole direction Intel's been taking for the last ~5 years has been poised for this.) I went into a bit of detail here[1] (n.b. this was pre-Xeon D, the addition of which will only increase performance) for those who are curious.

Right now we're living in a golden age of financially accessible super-computing. We don't need to deal with MPI anymore; we just use RDMA and have InfiniBand-speed fetches to any machine joined to the cluster [see: MS Paper]. I was working on RELION as a favor to my father and got deeper and deeper into it because my docket was pretty open and it was fascinating. Long story short, I sketched up a test setup with Phi, RDMA (which admittedly wasn't used that much, as the problem set would be characterized by anyone as "embarrassingly parallel") and 10Gbit. This is all commodity stuff, and we effectively never touch disk -- I cobbled together 18650s in the rack [see: batt] instead of a UPS, with a uC which fires off a 'persist state to disk' message on any power interrupt, but other than that it's all RAM and blazing fast. (Side note: anyone who read the Nature Methods special on all the cryo-EM stuff may be seeing a paper or two with a few new sets of computational methodologies in the near future ;))

Buyer beware though -- sacrificing clock speed for more cores can end up costing you a lot more overall[2]. Licensing policies have a tendency to change from version to version, so your perfectly licensed Oracle 11g (let's say it was licensed by the physical chip you throw into the socket) might not have a straightforward upgrade path (now licensed by the number of vCPUs). A lot of clients of mine got burned trying to move from iron to AWS, only to realize that Oracle licensed per _available CPU_. Fines galore. They ended up downgrading to a previous-revision Xeon (i.e. 4th-gen E5 at MSRP $x,xxx instead of 5th-gen E5 at MSRP $0,xxx with better performance but more cores). Anyone encountering this problem should keep that trick up their sleeve and find some one-year-old off-lease equipment with fewer cores but a higher clock speed, to keep license compliance and still meet your computational demands. Anyways, yeah, the latest and greatest might not be the most economic choice for this reason[licensing].

[1] https://news.ycombinator.com/item?id=10805087

[2] This has as many variables as one can imagine - you can't just go by what people did in 1975, i.e. "this instruction takes x cycles, we can physically count how long our computation will take with a sheet of graph paper", because of pre-fetching, pipelining, cache patterns, the use (or lack of use) of the AVX(2) registers, etc.

[3] https://software.intel.com/en-us/articles/intelr-xeon-phitm-...

[MS Paper] http://sigops.org/sosp/sosp15/current/2015-Monterey/printabl...

[Batt] As you probably have guessed, I have no formal EE training. I did however rigorously read all of the safety datasheets, use very high quality Panasonics, and strictly conform to the CC-CV guidelines so my father's lab would not catch on fire. This is probably not the best idea, but it's electrically safe and isolated. It'd pass UL certification... at least I think ;)

[licensing] If you're a corporate entity that is moving into any sort of cloud or onto new hardware, my firm has quite a bit of expertise in licensing compliance when either a) shifting to new hardware, or b) shifting to the cloud, as well as getting your best bang for buck with existing licenses.


>I cobbled together 18650's in the rack[see: batt] instead of a UPS with a uC which fires off a 'persist state to disk' message on any power interrupt

How did your tests go with this? I'd totally use a system like this - properly packaged up of course. Li-ion and Li-poly make me nervous.


I trust it as much as commercial grade APCs I've used in DCs. It definitely cost more in my engineering time than buying an off-the-shelf component, since I spent a lot of time researching it. But I'm a hardware nerd at heart and rarely get to putz around with this kind of stuff, so it didn't really seem like labor to me.

The precautions I took were probably overkill, but since it wasn't my lab I chose to go pretty overboard. In addition to conforming to the strict charge/discharge guidelines Panasonic thoroughly delineates in their sheets, I 1) used "protected 18650s" (actually designated 19760), which have internal circuitry to keep, say, cheap Chinese eBay charge units from setting people's houses on fire, and 2) sourced the batteries from a vendor I've trusted for ages (since the Panasonic 'greens' have such a popular reputation, and knock-offs make it into all sorts of markets including first-tier shops like Amazon and Digikey, I asked my vendor to source directly from Panasonic JP, which he did, and he provided me with a packing slip). Fear of Li-* is reasonable, and I figure risking permanent damage to one's body and/or life isn't worth the 7 bucks you save buying no-names - go with the 19760s and live life to the fullest! (Pro-tip: the form factor of the 760s is about a mm or 1.2 mm longer than unprotected 850s due to the added circuitry, which I presume is just a PTC thermistor + kill switch; I haven't looked it up though. I'm sure there's a mass difference too, if you have access to lab-quality scales or are friendly with your local cannabis vendor.)

Thrown into the case, I have a live K-type thermocouple in all 8 of the 1Us (a pair in each rack) with conservative failure policies should temperatures exceed the parameters I set. I also put in some nice fusing (HRC -- why? Because I was already going overboard, and since this wasn't an off-the-shelf component and I'm not an EE with extensive PDU experience, I decided to play it safe). I also consulted with a post-doc buddy of mine who's actively in lithium battery research and who, after a good chuckle, told me the precautions I took were satisfactory. It came in at around $5K USD for all 8 units with everything, including ~$115 apiece for the Hammond 1Us. Each unit can sustain a little under 4.5 kVA, which gives more than enough buffer time for a graceful shutdown.


Passmark has some benchmarks, including the E5 v3s, updated today:

http://www.cpubenchmark.net/high_end_cpus.html

"The King of the Hill" is only 2.3 GHz ?


The article explains why:

  What one will see, very quickly, is that the new SKUs generally offer
  slightly lower clock speeds to maintain a 45w TDP. Maintaining this
  figure while adding 50% to 100% more cores and cache is no small feat
  and it makes sense that clock speeds suffer. We also see the TDP
  figures rise to 65w in order to accommodate more cores and higher
  clock speeds.

  We introduced the Core * Base GHz and Thread * Base GHz figures just
  to show how much of an improvement this is. The new chips represent
  double the cores but up to about 62% more clock cycles in aggregate
  over what we had as the previous fastest chip, the Intel Xeon D-1541.
  We also now, in the same TDP figure, have 23% more raw compute.


Given that dynamic power consumption scales with voltage squared times frequency, and voltage generally has to rise with clock speed, small clock reductions mean relatively big power and heat savings. Respect to them anyway; I'm sure it wasn't a cakewalk either.
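
A minimal sketch of that effect, assuming the usual first-order CMOS model P_dyn ~ C * V^2 * f and a made-up linear voltage/frequency relationship (the constants below are illustrative, not measured values for these parts):

  # First-order CMOS dynamic-power model: P_dyn ~ C * V^2 * f. In the DVFS
  # range, supply voltage roughly tracks frequency, so a modest clock cut
  # saves power super-linearly -- the effect described above. The constants
  # and the linear V(f) assumption below are illustrative, not measured.

  def dynamic_power(freq_ghz, base_freq=2.7, base_volt=1.0, volt_slope=0.25, c=40.0):
      """Relative dynamic power at a given clock, assuming V scales ~linearly with f."""
      volts = base_volt + volt_slope * (freq_ghz - base_freq)
      return c * volts ** 2 * freq_ghz

  if __name__ == "__main__":
      p_high = dynamic_power(2.7)
      for f in (2.7, 2.3, 2.1):
          p = dynamic_power(f)
          print(f"{f:.1f} GHz: {p / p_high * 100:5.1f}% of the 2.7 GHz dynamic power")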


With the "23% more raw compute" figure, say I've got a program running on an older system - do I need to recompile that binary (with new optimisations / assembly instructions) to see that improvement or is that an improvement in efficiency for existing instructions?


If your program is single-threaded, it will run slower. If it parallelizes well (e.g. ray tracing) then you will see that improvement.


40M Cache, 16 cores, 32 threads in a single CPU. Clockspeed has been mostly irrelevant in terms of CPU benchmarking for a long time now.


Didn't Intel recently juice up their Skylake i7-6700K desktop CPUs to 4GHz, though? More cores and more cache have certainly been the frontier for pushing the CPU. I'm wondering if they can't resume pushing the clock speeds up, too.


Higher clock speeds are absolutely still possible, just not in CPUs with high numbers of cores. More cores = more heat, and dealing with heat is one of the biggest challenges when designing a CPU.

For servers, doing more things slightly slower is (usually) better than doing fewer things faster, so Intel usually puts core count over clock speed for their Xeon CPUs. For a desktop system that's unlikely to be doing more than 3 or 4 things at once, you can prioritize single core performance and higher clockspeeds.

It's also considerably more efficient to have more slower cores than fewer faster cores. Reducing power consumption and reducing heat is a win-win in a datacenter.


Desktop applications tend not to be very optimized for multithreading; games will still usually use 1-2 cores max, which is what those CPUs mostly end up being used for anyhow, and beyond that, professional applications also tend to be optimized for around 2-4 threads/cores. For the few desktop apps that do scale well, 6- and 8-core i7s are available, but at those prices you might as well get a mid-range Xeon with a desktop socket, since it's unlikely you'll be overclocking your CPU in that case anyhow.


Intel will release the Core i7-6950X later this year. It has ten cores and will apparently cost around $1500. Not sure which use case they're targeting other than one-upmanship and bragging rights, but it will probably fly with the right workload. Video encoding and 3D rendering are the only things I can think of that will scale easily to that kind of CPU, although you'd think people with those needs would buy Xeons.


Well the "X" (Well technically E as in Broadwell-E, Haswell-E etc.) is part of their Extreme line it's meant for "enthusiasts" that really like dumping money on high end CPU's and overclocking the hell out of them this isn't a workhorse, at that price you can build a full dual socket Xeon for about the same price as the CPU (if indeed it will be 1500$, I'm betting on more like a 1000$ like their previous top of the line Extreme series).

Video encoding isn't that well multi-threaded either, as it's somewhat linearly dependent (you can't just encode random frames without having the previous frames/key frames done as references, which means you can usually only do 2-4 frames at a time, so run-off becomes an issue when you have more cores than frames to encode). 3D rendering is also a mixed bag: depending on what type of rasterization and post-processing you use, you will get substantially different scaling between number of threads and pure clock speed.

In any case, video encoding and 3D rendering (at least the kinds one does on a desktop) benefit much more from multiple high-end GPUs than from additional CPU cores. The rough conversion is that additional GPUs give almost 1:1 scaling, whereas additional CPU cores give you 0.3-0.5 on average in the best-case scenarios (there are a few cases where the CPU scales almost as well as the GPU, but they aren't that common).


That depends on your use case. Intel Xeon E5-2602 V4 is rumored to be clocked at 5.1 GHz, with four cores.


The trend lately has been wide, not tall. Lots of pipelines mean that a stall in any one isn't going to affect overall performance too much, even if those individual pipelines aren't quite as fast. The trend is going to be continuing into the future, too -- the E5 v4 Xeons, which are supposed to be released Any Week Now, are going to be topping out at 22 cores per processor.


There's a single-thread score you can see by clicking each link.

(I don't know what this measures, though. It seems to be about 1,900 to 2,100 for the newer chips - even the one in my laptop. The score doesn't vary as much as the clock speed, though clock speed seems to be a factor.)


E5 v3 CPUs are old; the Intel Xeon E5-2698 v3 at the top of their charts was released back in Q2 2014 - nearly 2 years ago. The new Intel Xeon E5-2698 v4, when it is released, should blitz that.


"Leaks" say its going to have a 3.6GHz boost clock. [ http://wccftech.com/intel-broadwellep-xeon-e52600-v4-skus-le... ] Do you know what this is about?


But with turbo it clocks at 3.6 GHz. Plus it has 16 cores.


Isn't "turbo" in Intel's multi-core parlance the all-out speed a single core can achieve? If more than one core is maxed, they both run at a reduced "turbo" speed, dependent on how many more than one are maxed, right?


The boosted frequency depends on multiple factors, including the number of cores in use, power consumption, current, temperature, and even the type of instructions executed (e.g. AVX instructions trigger a drop in frequency on Xeon cores; not sure about Xeon D, though).

The boost levels are typically defined as "bins", i.e. the max clock if 1, 2, 3, etc. cores are active. Can't find Xeon-specific info, but this page shows some general information on the topic: https://www-ssl.intel.com/content/www/us/en/support/processo...


AFAIK, it depends on TDP. So, yea, you won't get all of them running at max turbo. (Even if you did, they could still be running at a lower p-state and just reporting a higher frequency. Frequency and performance aren't really the same thing anymore)


Is there some cloud for developers where one can try out new hardware from Intel / Nvidia / AMD? This is turning out to be a great year for hardware, and there's too much new stuff coming out for my home lab.


I am working on http://www.peachpy.io (an academic project) which lets performance tuning experts optimize assembly kernels for various x86-64 microarchitectures. We currently have Intel Nehalem, Intel Ivy Bridge, Intel Haswell, Intel Broadwell, Intel Skylake, AMD Piledriver, AMD Steamroller, Intel Bonnell (1st-gen Atom) and AMD Bobcat in the pool.

Source code:

- https://github.com/PeachPy

- https://github.com/Maratyszcza/PeachPy


PeachPy is great, thank you @Marat for this project.


Opened an issue that I ran into on Ubuntu - https://github.com/Maratyszcza/PeachPy/issues/38.

Turns out this was an error on my part. All set up now.


On the site, the "About PeachPy" link at top right corner seems to do nothing.


Peachpy.io is a work-in-progress...


Just spent the last hour looking at this - this looks awesome! Great work!


Will check that out.


I wonder how much it will cost to have 2 of these in a server running Windows Server 2016.


These can only support a single socket per box. Also, Windows Server has charged per physical processor for the past several editions, instead of per core, so the $400 version would do. Now, if you wanna see sticker shock, ask to see what Oracle will charge to run on the box.


Server 2016 will be charging per core.
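
For what it's worth, my understanding of the 2016 scheme is that core licenses are sold in two-core packs, with minimums of 8 core licenses per processor and 16 per server; a sketch of the counting (edition prices vary by agreement, so they're left out):

  # Windows Server 2016 per-core license counting as I understand it:
  # minimums of 8 core licenses per processor and 16 per server, sold in
  # 2-core packs. Check your own agreement for actual prices and terms.

  import math

  def core_licenses(sockets, cores_per_socket):
      per_proc = max(cores_per_socket, 8) * sockets   # 8-core minimum per processor
      licenses = max(per_proc, 16)                    # 16-core minimum per server
      return licenses, math.ceil(licenses / 2)        # licenses, 2-core packs

  for sockets, cores in [(1, 16), (1, 12)]:           # Xeon D is single-socket only
      lic, packs = core_licenses(sockets, cores)
      print(f"{sockets} x {cores}-core: {lic} core licenses ({packs} two-core packs)")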


They quoted some basic prices in the article... also, it depends on which model.


I think he was referring to the software's per-core licensing costs.


We are trying out a few Intel Xeon D-1520 machines as the primary servers for https://Clara.io. They seem to be working fine at a low cost.


> ahead of the Xeon E5 V3 series

I'm confused (probably because I don't understand Intel's Xeon lineup). Isn't that Haswell, and thus now two generations back?


The server chips are pretty much always a generation back, with an extra development cycle to fix any errata that have popped up, as well as to rearrange cores and cache, add a good amount of IO, and ditch things like the onboard graphics.

Intel's supposed to be releasing the E5 v4s in the next few weeks, which will be Broadwell-based. Edit: these are also Broadwell-based, supplanting the Haswells they released last year.


E5 v3 is Haswell; it looks like these are Broadwell (not Skylake, as in consumer land). Broadwell was delayed in the consumer space and didn't have many SKUs; not sure what we'll see with server SKUs...


Intel's desktop/laptop chips have gotten far ahead of Xeons, so Haswell Xeon is still "current" (for a few weeks).


I wonder what the 'D' stands for, as the next earlier Intel CPU with 'D' in its model name was the Pentium D, where it meant "dual core".


It stands for "full employment for Intel's branding department". Calling it Xeon E4 would have been way too simple.



