Intel Declares War on GPUs at Disputed HPC, AI Border (nextplatform.com)
151 points by jonbaer on Nov 21, 2016 | 83 comments



Intel's roadmap is worthless, since they only seem to actually produce a given Xeon Phi SKU for about two years. Nobody -- and I mean nobody -- is happy with Intel's complete lack of availability of hardware replacements as the MIC add-in cards fail. I can only hope the self-booting Knights Landing product is ... more stable.

I am aware of about fifteen different sites that are enacting significant architecture changes as the 5110 supplies dry up and the 7120s are clearly next.

It does not matter which product is technically superior when one of them is literally unsupportable.


So... speaking as someone who does some deep learning (but quite a bit of ML more broadly):

Intel are correct that the memory size on GPUs is increasingly becoming a problem in some complex deep learning models. Having access to cheap(ish) DDR4 RAM using the same programming model would be a big win.

Interestingly, AMD is going after a related market (visualization) which is also very RAM hungry by building GPUs with M.2 SSDs onboard[1]. AFAIK no one has tried these for deep learning.

But within the deep learning community there's a fair bit of cynicism about Intel's claims. Over the last few years they have frequently claimed parity with GPUs... any day now. Yet they keep falling further behind in any useful benchmark.

[1] http://www.anandtech.com/show/10518/amd-announces-radeon-pro...


More memory would indeed be a big deal for ML applications.

Found this comment useful, as it's not clear to me whether Knights Landing will actually be comparable to GPUs in terms of pure grunt.

Then I wonder if Nvidia will respond with more memory on their cards. From my totally uninformed perspective, that would seem easier than the transition Intel are trying to make.


The programming model for GPUs isn't great: since they aren't general-purpose platforms, you have to keep thinking about moving data around between GPU memory and main memory.
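
To make that concrete, here's roughly what every GPU-accelerated routine ends up looking like on the host side (a minimal sketch using the CUDA runtime and cuBLAS; the function name and the lack of error handling are mine, not any particular library's):

  // Stage data in, compute on the device, copy the result back.
  // Compile with nvcc, or a host compiler linked against -lcudart -lcublas.
  #include <cublas_v2.h>
  #include <cuda_runtime.h>
  #include <vector>

  void gemm_on_gpu(const std::vector<float>& A, const std::vector<float>& B,
                   std::vector<float>& C, int n) {
    const size_t bytes = size_t(n) * n * sizeof(float);
    float *dA, *dB, *dC;

    // 1. Explicitly move the inputs from main memory into GPU memory.
    cudaMalloc(&dA, bytes);
    cudaMalloc(&dB, bytes);
    cudaMalloc(&dC, bytes);
    cudaMemcpy(dA, A.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, B.data(), bytes, cudaMemcpyHostToDevice);

    // 2. Do the actual work on the device: C = A * B.
    cublasHandle_t handle;
    cublasCreate(&handle);
    const float alpha = 1.0f, beta = 0.0f;
    cublasSgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N, n, n, n,
                &alpha, dA, n, dB, n, &beta, dC, n);

    // 3. Explicitly copy the result back before the CPU can touch it.
    cudaMemcpy(C.data(), dC, bytes, cudaMemcpyDeviceToHost);

    cublasDestroy(handle);
    cudaFree(dA); cudaFree(dB); cudaFree(dC);
  }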

Intel could bypass that and end up with an easier programming model. But... believe it when you see it.

Note that this is almost entirely about deep learning, though. Non-DL machine learning is mostly done on CPUs at the moment (with a few exceptions) and generally performs adequately. More speed is always nice, but you don't hear a lot of complaints about the speed of XGBoost, or fastText, or Vowpal Wabbit, or the various FTRL regression algorithms implemented on CPUs.


Yes, but deep learning is clearly the future. I'd love to fully unroll my RNNs.


Intel seems to be taking a long time getting SKUs to market that the general developer can buy. This is critical for gaining mindshare for the programming model. Both Knights Landing and the new chips from Altera have failed to materialise into general availability in a SKU priced under $500. Part of what has made Nvidia so successful here is that the CUDA ecosystem scales up and down their entire product line.


> the general developer can buy

Having a non-integrated GPU (i.e. NVIDIA or AMD) in a machine is not a stretch of the imagination these days. This means that any curious dabbler can grab Tensorflow and start learning ML. Over the past few months, I've been seeing exactly this on HN: many developers with ML Show HNs. In my opinion, it is precisely the general availability of ML that has caused this explosion.

As you pointed out, the above scenario falls away entirely under an expensive SKU. I strongly believe that having to buy any specific SKU in order to approach this stuff will spell doom for the tech. If it's not on every SKU down to the Core 2, I really can't see it taking off.


Why would an expensive SKU be a problem? As long as they offered it as a service and the ROI was significantly better, it would be fine.


Because library developers want to have things available without paying by the hour.

Once you have the software, a service is fine. But the ecosystem of tools around NVidia CUDA and CuDNN is there because you can buy a gaming card for a few hundred dollars and hack up libraries you find useful.


It's significantly easier in academia to teach courses about hardware that is ubiquitous.


That, and getting mindshare in the academic compute space. Where Supermicro is cleaning out all of Nvidia's higher-end SKUs to build these academic compute clusters, Intel needs to insert itself and try to take the market. Currently they pair a mid-range Xeon with a couple of GPUs; Intel needs to change that model if they want to win the market that has kept Nvidia in business.


Interesting article presenting a "war" between Intel CPU and Nvidia GPU.

I'm curious where AMD falls in this war. While it's true GPUs can provide really nice theoretical TFLOPS, the cost of moving data in and out of the GPU's memory is a well-known issue. This renders GPUs much less attractive for real applications.

The reason I mention AMD is that I read some time ago about their Heterogeneous System Architecture [1]. One of HSA's goals is to get rid of this particular issue of moving data. I wonder if it has been adopted in any HPC cluster or anywhere else.

[1] http://developer.amd.com/resources/heterogeneous-computing/w...


Right now AMD is fighting for its life, it's in no position to take on Intel and Nvidia in the HPC space. If their consumer Zen processors flop then AMD will likely be heading for a buyout and/or bankruptcy. Only if both Zen and Vega (their upcoming high-end GPU) are huge successes will they have the resources to mount a serious effort to capture some of the HPC market.


Eh, AMD has been doing exactly that, with semi-custom cores as their bread and butter: seeing as they design the chips that power all the >$100 consoles from Sony, Nintendo and Microsoft, they have a good niche that provides a decent market to keep the company viable.

I'm not sure cracking a new market space would be a good idea either. Just as they aren't trying to go toe to toe with Allwinner, which offers quad-core H3 SoCs with H.265 hardware decoding for $1 each, it might not make sense to get into a three-way war with Intel, Nvidia, etc.

Retaking a market, or getting into another one that currently has a single vendor or none, would likely be more profitable -- just as VIA did with their point-of-sale motherboards with crypto accelerators and tons of serial ports.

AMD only has so much money and talent; getting into a pissing contest is not something they are looking to do. Intel can piss harder and longer than them (just look at the contra-revenue they did with the HP Stream 7 and the $50 Walmart Android tablet: Intel charged the manufacturer $X for the chip and paid them $X+Y for using said chip, to compete with ARM).


The console business has razor-thin margins, there's not enough money being made there to sustain a company the size of AMD. Where that business might save them is if say Microsoft decided to buy them out to secure their supply of chips for the Xbox. The fact that it would force Sony to look elsewhere for their next-gen console wouldn't hurt I'm sure.


Sure, I'm not saying it's something with huge margins, just like how Allwinner isn't getting rich off their dirt-cheap SoCs. That being said, they have a few markets like that, and that is what sustains a company.

I highly doubt Microsoft or anyone will buy AMD, as it would not make financial sense, AMD already builds these chips cheaper than any of them could build them on their own, and any kind of merger would revoke AMD's x86 license, and Intel's x86-64 license, thus destroying the ecosystem.

On another note, a razor-thin margin might not be glamorous, but you can still run a business on it.


Please fact check at the very least.

> and any kind of merger would revoke AMD's x86 license, and Intel's x86-64 license, thus destroying the ecosystem.

is known to be materially false.


I'm not a lawyer, nor have I studied the contract, but a superficial reading would imply that complete revocation is indeed the outcome of acquisition of either party. Do you have evidence to the contrary?

https://www.sec.gov/Archives/edgar/data/2488/000119312509236...

  5.2 (c)	Termination Upon Change of Control. Subject to 
  the terms of, and as further set forth in, Sections 5.2(d) 
  and 5.2(e), this Agreement shall automatically terminate as 
  a whole upon the consummation of a Change of Control of 
  either Party.

  (d)	Effects of Termination.
  	(ii)	In the event of any termination of this   
  Agreement pursuant to Section 5.2(c), and subject to the 
  provisions of Section 5.2(e), the rights and licenses 
  granted to both Parties under this Agreement, including 
  without limitation the rights granted under Section 3.8(d), 
  shall terminate as of the effective date of such   
  termination.


This is true but irrelevant - it's Mutually Assured Destruction unless they reaffirm the contract, since both AMD and Intel would literally have to stop producing x86_64 chips without it. There is no way that Intel would allow that, their business would go into freefall.

I mean, even if they did manage to wreck AMD, it doesn't get them anything particularly useful if it costs them their x86_64 duopoly/monopoly.


Not to mention that AMD's current state is absolutely perfect for Intel - too weak to be a serious threat, yet viable enough that Intel is not a completely obvious monopoly that might attract unwanted regulatory intervention.

Even if the licensing problems went away, I'm not sure it's in Intel's political/legal interest to lose their only plausible competitor. AMD can't be costing them a lot of money (the real risk is slow irrelevance if ARM usage grows any more), and the risks, were AMD to cease operations, would be considerable.

That is, until some ARM-based system can at least appear competitive enough to keep justice departments (and powerful negotiating partners like Apple) at bay.


He doesn't, hence why he did not link to an article where AMD's, Intel's or VIA's lawyers say their cross-licensing agreement allows them to be bought by another company.


Uhh, have fun with that belief, AMD's lawyers say otherwise:

http://www.kitguru.net/components/cpu/anton-shilov/amd-clari...


> Right now AMD is fighting for its life,

I thought so too, but looking at its stock recently it seems to be doing much better (especially in the last year). Not sure how to interpret that. Wonder if it was mostly investors being happy they divested from that ARM microserver company, or whether there is a genuine upswing and increased market share.


> looking at its stock recently it seems to be doing much better (especially in the last year). Not sure how to interpret that.

One interpretation: NVIDIA has been doing a very good job of A) pursuing deep learning and B) talking that up on earnings calls. Even if they're not seeing much money from it right now, investors are excited that they might in the future.

This leads some investors and algos to think, "Hey, NVIDIA is doing well, and AMD is nominally in the same business as NVIDIA, so let's buy AMD because it's cheaper and therefore must represent a better value." This drives up the price of AMD stock, which is itself already prone to more volatility because of their lower market cap.

However, they are two fundamentally different companies. I hear a lot about founder-led vs non-founder-led businesses, and AMD vs NVIDIA is a good example of non-founder-led vs founder-led at the moment. Also, they've done a very poor job with their ATI acquisition. The ATI acquisition is why they even have a leg to stand on against NVIDIA in the first place, but that train has basically left the station at this point.

Intel is in a weird position because their market cap is 10x NVIDIA's, so a market needs to promise literally 10x more money to interest Intel than it does to interest NVIDIA. Victims of their own success, innovator's dilemma, etc. Personally, I think that will hinder them, because everything they want to try won't be "big enough", while NVIDIA doesn't suffer from that problem and AMD just has no idea what it's doing besides professional supply-chain management.

Respect where it's due, Nvidia will win the hardware side of deep learning and fast matrix multiplication for the foreseeable future.


Right now the only thing I can see holding AMD back is their lack of vision to drop a bit of cash on a CuDNN equivalent (and the ML framework support for it), especially fast convolution kernels optimised for their hardware. They are waiting for the open source community to do the work for them, but no one is picking it up because it's a lot of work.

A couple of good AMD engineers should be able to knock out fast Winograd kernels pretty quick.


Their best bet is to increase support for OpenCL rather than invent something new. And then improve or create OpenCL support for TensorFlow and other popular toolkits.


Yes, that would be OpenCL/SPIR-V. But you still need to put in the work to optimise convolution kernels for specific hardware.

TF has OpenCL support in the works, AFAIK outsourced to Codeplay - https://www.codeplay.com/portal/tensorflow%E2%84%A2-for-open...

Caffe has an OpenCL branch; it works on AMD but at about a quarter of the speed of equivalent NVidia hardware using CuDNN 4/5, mostly due to unoptimised kernels - https://github.com/BVLC/caffe/tree/opencl


Is it really an innovator's dilemma for Intel?

If they can achieve a big improvement over Nvidia, they can retain their margins.

As for the market being small: as long as they can run a self-sustaining, high-margin business in it, and there's a big possibility of future growth, why not? It's not like those resources are needed for some other big growth opportunity they're chasing.


> If they can achieve a big improvement over Nvidia, they can retain their margins.

That's part of my point--they can't achieve a big improvement over Nvidia. Nvidia hires from the same pool that Intel does. Intel doesn't magically get better engineers. If anything the smarter ones are at Nvidia because it's easier to double the stock price of a $13B company than a $150B company, so more upside potential on stock options, which matters to those seeking long and lucrative careers at bigcos. Salaries are about the same at both companies.

So with Nvidia's head start, and the knowledge that a bigger team does not lead to better engineering performance (as per The Mythical Man-Month), how does Intel magically achieve a "big improvement" over Nvidia?

> As for the market being small: as long as they can run a self-sustaining, high-margin business in it, and there's a big possibility of future growth, why not?

I agree logically, but I don't think Intel is quite there as an organization. Once an organization is being run by committee, it's too easy to throw cold water on an idea to kill it. Why take the risk of backing a new initiative?

Also, let's say they were on-board as an organization... See my first point: they can't run a self-sustaining high margin business because there isn't one. There isn't even an especially high margin business for Nvidia; enterprises aren't nearly as stupid or rich as everyone thinks.

If anything, Intel has made the classic big company mistake of waiting too long in the first place, then realizing they're losing, then overcompensating via acquisition. Historically, it's not a good strategy.

My whole point being: I'm glad to see competition, Nvidia (or quantum computing, etc.) will probably win this.


Their semi-custom segment basically stabilised the company and saved them from bankruptcy, their new graphics cards have been good, and from what we know about Zen it's going to be a winner for them (somewhere around Broadwell levels clock for clock).

If you keep in mind that they only need to capture something like 10% to 20% of the server market to double their revenue you can see why people are willing to bet on them.

Of course if they've been fudging the Zen numbers as they've done in the past then they'll probably be done as a company, but it seems very unlikely this time around.


AMD is losing less money than analysts predicted. They've gone from "hopeless" to "it's a longshot" and the stock has repriced accordingly. Long-term prospects depend very much on how these new architectures do in the market.


Thanks, based on what I've seen this makes sense.


The stock market can remain irrational longer than you can remain solvent.


AMD is too important for China to let it fail.


Huh? AMD is too important to Intel to let them fail, if AMD ever went broke, Intel would give them money just so they didn't have to deal with every country going at them for anti-trust law violations.


China bypassed US export laws by licensing from AMD and creating a partnership which re-capitalized AMD. China wants to have the fastest and best of everything to show up the United States that China is the new world power.

http://www.eetimes.com/document.asp?doc_id=1329517 http://www.pcworld.com/article/2109703/amd-moves-desktop-pc-...


I've personally met with quite a few AMD employees over the past few years (hell, they bought me quite a bit of alcohol at times :P). They are not trying to compete in the high performance desktop or server arena anymore, they view it as an area that they will never win big marketshare in.

That being said, any market where they can compete with Intel, keep them from 100% market share, and still make money is a market they are in or want to be in if they aren't already.

Intel literally has a half decade of chips that they could release right now that perform better and are ready to be released, but won't be until years down the road due to a desire to remain top dog in the event they have another Pentium 4 happen.

Perhaps that joint venture will go as well as Intel Israel did (bringing out the Core series, which replaced the failed Pentium 4 and with it Intel Oregon's lead), but that is an iffy bet. The Israelis are particularly good at critical thinking, whereas that is generally not encouraged in China. I might be wrong, but 2 to 3 years from now, when we see first silicon for their modified Zen, we'll know.


I found myself thinking about that a few times these last years.

Would states really sue Intel for anti-trust when it's not really their fault if there was no competition?

I mean you can't expect a new CPU founder company to just open and compete directly...


There's nothing illegal about being a monopolist.

However, being a monopolist makes certain things illegal for you to do, and some of those things are things that are easy to suspect you of, but hard to prove you're not doing.

As such, being a monopolist constrains you, as well as invites long, expensive, frustrating probes into your business that creates costs and risk (even if you're not intentionally violating anti-trust law, doesn't mean some business unit isn't inadvertently doing so).

E.g. IBM was under an anti-trust probe from 1969 until 1982 over their mainframe business. The probe ended with the DOJ concluding the case was without merit, but it is widely considered to have had a big effect on IBM's decision-making for more than a decade.


> I mean you can't expect a new CPU founder company to just open and compete directly...

That's exactly what the Mill CPU folks are trying. I have hopes for a RISC-V vendor to do this too.


Nintendo chose Nvidia for their next-generation console. Going to be ARM, of all things.


I realise it's not the most reliable source, but it shows something interesting regardless: a server-grade APU, combining a CPU and GPU on a single socket, with HBM for huge amounts of bandwidth.

They also have HIP for converting CUDA code into performant cross-platform code.

http://wccftech.com/amd-exascale-heterogeneous-processor-ehp...


Intel doesn't have much natural room to expand. They completely missed mobile, and Moore's law is slowing down, so their edge slowly erodes.

They've been talking about killing the GPU for many years, and going at it from both technical and not-so-technical angles. E.g. in 2011 Intel paid $1.5B to NVIDIA, because Intel had killed NVIDIA's chipset business through not-so-fair licensing practices.

NVIDIA, meanwhile, invested heavily in its GPU computing ecosystem, CUDA. For many of the early years it was a niche, but a quickly growing one. Then they hit the jackpot with deep learning. I wouldn't be surprised if the majority of self-driving cars end up with an NVIDIA chip in them.


The only problem with Nvidia, if you look at the entire system, is that they rely on Intel's CPUs for most of the computer systems that they are a part of. Sure, they have their ARM core CPUs that have proven to be decent, but I don't see them blowing anyone out of the water with them. Furthermore, there is no interchanging GPUs, so the only people using Nvidia CPUs are the ones that buy into Nvidia's entire ecosystem, which is necessarily going to be fewer people/groups.

AMD, on the other hand, while fighting Intel and Nvidia for the past decade and coming up short, may be on the verge of a breakthrough. Only AMD seems ready to find the synergy between CPUs and GPUs. A Zen and Vega combination of architectures could essentially be a top-of-the-line Intel i7 plus a top-of-the-line Nvidia GTX. Add some special sauce, and they could really do it.

Nothing that Intel has done in a long time has been impressive, but Nvidia is on fire. AMD's CPUs may help them leap ahead instead of merely catching up (which Vega may do anyway).


AMD has been trying to find a synergy between CPUs and GPUs for over a decade. Heck they were the ones who bought ATI! They've been unable to deliver because "a synergy" between GPU and CPU doesn't mean very much.

In the best case, the architecture will be able to avoid memory copies, but although inconvenient, existing applications have been able to pipeline host-device/device-host memory copies, effectively negating the problem.

Even on this limited promise, AMD hasn't been able to deliver.


> They've been unable to deliver because "a synergy" between GPU and CPU doesn't mean very much.

Contrary to what you may believe, that "synergy" does mean a lot, considering that one of the world's fastest supercomputers runs on Opterons and NVIDIA Tesla cards.

The thing is, high-performance computing is a niche market, yet typical consumer hardware needs aren't CPU-driven but GPU-driven. Computer games don't take advantage of (or need) multi-core CPUs, so AMD's bet on the 6-core and 8-core AMD FX line didn't pay off as expected. Yet everyone needs a good GPU to run games, and the "synergy" between what the CPU delivers and what the GPU processes ends up being the bottleneck in consumer hardware.


Except you haven't told me what synergy means. Also, the CPUs mentioned share FP units between cores, meaning their cores are more comparable to Intel's hyperthreads. They often failed to perform at i5 levels, and any small performance increase was offset by higher energy consumption.


> Sure, they have their ARM core CPUs that have proven to be decent, but I don't see them blowing anyone out of the water with them

What? Their first iterations were somewhat meh (Tegra K1 with Denver 1 in the Nexus 9), but the later iterations and reoriented focus (Jetson K1, K1 with A15s, Denver 2) have been scoring some excellent design wins.


NVIDIA had about 5 different generations of Tegra before the K1 [1]. The A15-based K1 (T124) came before the Denver-based K1 (T132) AFAIK. Both are very nice and a clear improvement over the previous generations. Denver is a bit inconsistent in its performance, but on average it dominates A15s and A57s. Nexus 9, however, has very poor thermal design and can only dissipate a small fraction of the heat output of the T132.

[1] https://en.wikipedia.org/wiki/Tegra


More to the point, those previous 5 generations were generally disappointing: slow chips with decent but not mind-blowing graphics, and extremely short support windows that left early adopters flapping in the breeze. Nvidia had quite a bit of ground to make up, but from what I've seen of their latest chips they made it. I think they're the fastest mobile chips you can buy this side of Apple's monsters.


Tegra2 was notoriously one of the few ARMv7 chips without NEON.

I actually agree that Denver 1 in the K1 is really fast, but it hasn't scored many design wins, perhaps exactly because of the thermal constraints. The A15-based design seems to have less of that problem.


> The only problem with Nvidia, if you look at the entire system, is that they rely on Intel's CPUs for most of the computer systems that they are a part of.

Sure, Intel is part of every Nvidia system, but they're becoming a smaller and smaller part in dollar terms. It's not uncommon for the GPUs in a CUDA setup to cost 4-10 times as much as the CPUs. And at the end of the day it's much easier to replace Intel CPUs with AMD CPUs in a system than it is to replace Nvidia GPUs with AMD GPUs. Plus, Nvidia feels a lot closer to coming up with a CPU good enough to replace Intel in these systems than Intel is to coming up with something to replace Nvidia.


Nvidia makes a pretty mean ARM SoC. I'm curious to see where this goes. Is there any precedent for ARM in the server room?


It's like every year one reads that this time AMD beats Intel or NVIDIA, but when it comes to showing the goods, AMD products are not much more than eccentric room heaters.


> AMD products are not much more than eccentric room heaters.

You can buy an "eccentric room heater" from AMD with 8 CPU cores for around 100€, which in some benchmarks runs equivalent to Intel's 900€ processors.

Intel fanboys usually take jabs at AMD but they seldom back them up with evidence.


That probably only makes sense for people with low budget. If you really need performance, not only "in some benchmarks" then there is no alternative to Intel.


> That probably only makes sense for people with low budget.

Deciding whether you should spend 900€ or 100€ to get precisely the same performance is not a decision dictated by budget. Only a moron spends 9 times the cash to get the same performance.

> If you really need performance, not only "in some benchmarks" then there is no alternative to Intel.

The fact that AMD's 100€ offering gets you precisely the same performance as a 900€ cash-waster is the kind of stuff that makes or breaks the performance of your workstation.

After all, for the same cash some idiots waste on an i7 processor, anyone can purchase an AMD FX processor with 6 or 8 CPU cores and then spend the remaining cash on high-end GPUs.

Again, Intel fanboys usually take jabs at AMD but they seldom back them up with evidence.


FX-8350, mentioned by you, costs $149.99 according to cpubenchmark, and for extra $35 you can get E5 1620 that blows AMD out of the water. You can also install up to 384 GB of memory, whereas AMD cannot go above 64 GB. You can also get more PCI-E lanes, and squeeze more bandwidth out of the GPUs. Not to mention you cannot upgrade your AMD system to anything more powerful, without replacing everything with Intel platform ;)


> FX-8350, mentioned by you, costs $149.99 according to cpubenchmark

It's on newegg for $139.99. Close.

The AMD FX 8300 is also for sale at $115, and the AMD FX 8320 can be had for $119.

> and for extra $35 you can get E5 1620

It sells for $308 at newegg. Nearly 3 times the price you pointed out.

Intel's recommended price for the Xeon E5-1620 is $294.00.

http://ark.intel.com/products/64621/Intel-Xeon-Processor-E5-...

Get your facts straight.

> and for extra $35 you can get E5 1620

Benchmarks state exactly the opposite of your claims.

> You can also install up to 384 GB of memory, whereas AMD cannot go above 64 GB.

You've tried to compare a server-oriented CPU with a desktop CPU.

Even so, you failed to account for the fact that there are motherboards for the AMD FX CPU line that support up to 256 GB of RAM.

Again, Intel fanboys usually take jabs at AMD but they seldom back them up with evidence.


> If you really need performance, not only "in some benchmarks" then there is no alternative to Intel.

Different microarchitectures perform better on different workloads, nothing new here. AMD's CMT works very well on control / integer bound workloads. Their main disadvantage is that the Piledriver CPUs are 32nm and therefore lose on energy efficiency.


Exactly. Speaking from experience, I've had the opportunity to run the same machine learning job on a 12-core Intel Xeon E5-2650 server and on a workstation with an 8-core AMD FX 8350 that cost less than 400€ to put together, and the AMD FX 8350 workstation finished the job slightly faster (by some minutes on a 2-hour job). Granted, it's only anecdotal evidence, but the Xeon server cost 5 times more than the AMD workstation and gets the same performance.

Sometimes people need to pay attention to real-world performance, and price/performance ratios. Blindly parroting that Intel's offering is best is just idiocy, and one which is tied to a hefty price tag.


Every February I buy a new machine that is top of the line, mostly to reduce build times. I also compare it to my previous machines and see where I am at and how big my gains actually are.

I was replacing a 2nd-gen i7 (affectionately named MasslessC) with an AMD FX-8350, which ought to be faster, so I named it Tachyon. Both have a RAID-1 of high-performance SSDs (Linux software RAID; bonnie++ had the throughput on both at more than 1.2GB/s), both maxed out the motherboard RAM (32GB in the Intel and 64GB in the Bulldozer, with better timings on the RAM), and my 4.5 MLOC codebase took about 3 minutes and 50 seconds to build on MasslessC while Tachyon with her 8350 did it in 3:30. Those were not acceptable gains...

I started by overclocking Tachyon and her 8350, but I could only add 200MHz even after exceeding what the community considered safe voltage bumps -- until I learned about spread spectrum clocking. I disabled that, fixed all the cores at the same speed, and got it stable at 4.5GHz, 500MHz more than the "cap". 3:15 or so. Also, my definition of stable is an hour of Prime95 without crashing or exceeding 60C, followed by success on all my benchmark tests.

Well, I decided I needed to do better than that, so I learned about CPU lapping. I got some static-safe foam and mounted the 8350 to it, then got a small glass pane and mounted 1200-grit sandpaper to it. I did 50 strokes in each of the 4 directions to pull off the nickel. With the rough copper exposed underneath, I replaced the sandpaper with 1500 grit and did another 200 passes, then 2000 and 3000 grit, each with 200 passes of 50 in the 4 directions. The top of that CPU was a copper mirror the likes of which Archimedes could have used to set ablaze a thousand invading ships. I remounted it and managed to get to 4.6GHz with another voltage bump. This is because voltage bumps raise the temperature exponentially with respect to clock speed; the extra cooling allowed by removing the nickel let that heat be wicked away just a little faster.

Then I dropped $400 on a water cooling setup and got temperatures to not exceed 29C after 60 minutes of Prime95. This was insane: the largest pump, an extra 2 pints in the reservoir, careful calculation of the mineral content of the water so that as it aged silver was deposited inside my copper parts, simultaneously sterilizing the fluid. Everything that touched the water was surgical-grade polypropylene, 100% pure copper, or the antimicrobial silver I added specifically so that it would slowly be electrolyzed away. The water has been changed only once, nothing has ever grown in it, and it is an odd milky white that conducts heat marvelously (better than the antifreeze in your car, unless you own a Lamborghini). Tachyon is now a marvel of liquid flow control, with hoses and radiators showing that my aesthetic skills are lacking.

This was the limit of Tachyon's CPU. Any further clock or voltage increase, by the tiniest margin, prevented a stable boot of even a minimalistic custom kernel. My build times were down to 3:00, plus or minus 2 seconds. I was satisfied; this $1,900 was worth the time of enjoyment I spent building it and the reduced build times. Fun, practical, and the machine is stable to this day.

This took me all the way to the following February and time to buy another machine. I decided to get a slim laptop with lower performance for the sake of mobility, so I bought a System76 Gazelle (or whatever African wildlife) Pro, their i7 ultrabook. It was attempting to compete with the MacBook Air: less than 1.8cm thick, with a mobile 4th-gen i7 clocked at 1.6GHz that could clock all 4 cores up to 2.5GHz or one core up to 4GHz, and that tends to idle around 600MHz. I bought 16GB of RAM and a mismatched pair of SATA3 and M.2 SSDs on Newegg and RAIDed them.

Before deciding on its name I benched it for shits and giggles. It built my codebase in 2:30, beating Tachyon and its Bulldozer by 30 whole seconds, and with all the new changes to the codebase the 2nd-gen i7, MasslessC, was up to 4 whole minutes. It beat my desktop soundly in every other test I threw at it save for GPU benchmarks and disk IO, though it was within a few percent there. So the machine was dubbed Inflation (referring to the initial expansion of the universe, which occurred at many times the speed of light and faster than any named tachyon I was aware of). It cost about $1,400 grand total, with only the first $1,000 affecting the performance of builds.

This year I bought their 6th-gen i7 desktop replacement. SleeperService similarly has maxed-out RAM and 4 disks of SSD nonsense. SleeperService's build times are well below 2 minutes, versus above 3 minutes on Inflation and nearly 4 minutes on Tachyon, because of changes to use more templates and other costly compile-time stuff.

Intel is winning by leaps and bounds in every test and game I throw at it (there were many, but this is already too long). I did everything I could to make Tachyon's Bulldozer fast, and a slim and comparatively cheap notebook beat it on its first boot.

I dub you a biased AMD fanboy by virtue of my comprehensive experimentation.


A lot of wishful thinking along those lines about a company that consistently fails to deliver better products than the competition (not that it wouldn't be great for us all, I'm just tired of seeing the same missed expectations for the last 10 years at least).


AMD is a red mage. No one uses red mages.


"Second, with the above notes in mind, remember that Intel’s strength even with its own accelerator, is that there is no funky programming model—it’s CUDA versus good old X86."

Now, please show us the non-trivial use cases where recompiling "good old X86" for the Knights Mill architecture yields satisfactory performance.

To get good performance you need to ensure your code makes good use of Knights Mill vector instructions (AVX-512*).
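
For reference, "making good use of AVX-512" means something like this hand-vectorized saxpy (sketch only; assumes n is a multiple of 16 and an AVX-512 compile target such as -xMIC-AVX512 or -mavx512f):

  #include <immintrin.h>

  // y = a*x + y, 16 floats per iteration using 512-bit registers.
  void saxpy_avx512(int n, float a, const float* x, float* y) {
    const __m512 va = _mm512_set1_ps(a);
    for (int i = 0; i < n; i += 16) {
      __m512 vx = _mm512_loadu_ps(x + i);
      __m512 vy = _mm512_loadu_ps(y + i);
      _mm512_storeu_ps(y + i, _mm512_fmadd_ps(va, vx, vy));
    }
  }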


Most likely you call BLAS routines for matrix multiply, which use the right instructions. So any code using BLAS (which includes numpy) will run fast.

Using a GPU is much more complex, because it has a separate memory space and you have to copy arrays back and forth. It's easy to lose all your performance gains if you get the array allocation wrong.
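
Concretely, the BLAS path has no second memory space to manage; numpy's dot() ultimately bottoms out in a call shaped something like this (a sketch assuming MKL or OpenBLAS via the cblas interface), with the arrays sitting in ordinary RAM the whole time:

  #include <cblas.h>

  // C = A * B for n x n row-major matrices; no device allocation, no copies.
  void gemm_on_cpu(int n, const float* A, const float* B, float* C) {
    cblas_sgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                n, n, n, 1.0f, A, n, B, n, 0.0f, C, n);
  }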


If you use KNL as an accelerator you've got the same problem as a GPU.

If you're using it in standalone mode then keep in mind that every part of the code that is not using AVX runs on slow Atom-like cores. You have to vectorize the code yourself or find a library that has done it for you.

Neither of the above is close to "recompile good old X86 and run".


The argument is that you should be vectorizing your code anyway. You won't see as much of a boost on a Skylake as you would on a KNL, but you are still going to see a pretty good boost.

Obviously if your code is completely serial (and arguably bad), you have a lot of work ahead of you. Whereas, if you have been taking advantage of AVX already (or relying on libraries that do), you are basically just going to need to recompile.
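
A sketch of what I mean: the loop is written once with a portable vectorization hint, and only the compile target changes (the exact icc flags here are my assumption, check Intel's docs):

  //   icc -qopenmp -xCORE-AVX2   ...   # an ordinary Xeon
  //   icc -qopenmp -xMIC-AVX512  ...   # Knights Landing
  void scale_add(int n, float a, const float* x, float* y) {
  #pragma omp simd
    for (int i = 0; i < n; ++i)
      y[i] = a * x[i] + y[i];
  }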

Are you going to get peak on a KNL? God no. But if it runs well on a "normal" processor, it is likely to run pretty well on the KNL. Which is a MUCH better foundation to start from than if you are trying to port a code to a GPU.

And those tweaks you make to better utilize the KNL? They will probably improve normal perf as well.

It is obviously not as simple as the marketing led us to believe, but it never is. GPUs are a load of toss without a lot of work (and often some creative presentations). But the underlying principle of "Use the same code on both types of processor" does actually hold true as of KNL.

As for performance numbers: Check the presentations from SC as they become available. Multiple outlets have begun to publish, or at least present, these numbers and their experiences using KNLs. I would personally argue that we still have a long way to go, but there is actually light at the end of the tunnel and it does look like there is an actual path now.


Presumably even on good old X86 Xeon you cared about performance and figured out how to vectorize your code, so there's no change there.


The advantage is in existing, proven, stable tooling.

Then again, CUDA is NVidia only.


I'm pretty sure Nvidia is cleaning house with Intel when you compare something like say a K80 vs a Phi. The number of people who can do amazing and wonderful things (and frameworks to support them) with an Nvidia GPU vs a Phi is also dramatically different.


nVidia has really hustled to make that happen.

I expressed some vague interest in using CUDA for something and we were offered a demo account on one of their boxes, invited to apply for an academic hardware grant, and pointed at tons of training material (books, a MOOC, conferences, and even some 1-2 day sessions), libraries, and other resources.

On the other hand, my institute apparently has some Phis. I discovered this because there's a small queue with that name on our cluster. I don't think I've ever been contacted by an Intel rep or offered training material--or even marketing material--from Intel, let alone a Phi of my own to experiment with.


Nvidia's stock price has tripled from $30 to $93 in the last 12 months, while Intel's share price stayed at $34, exactly the same price a year ago. Although Intel is still a lot bigger, they want a piece of that pie too. That said, the chance for them to win a significant market share is slim. They have failed a few times in entering standalone GPU market. Will this time be different?


Nvidia has taken over a newer segment, cars, and they have a stronghold in the HPC market due to the sheer number of academic compute clusters that have based their setups on Nvidia hardware.

Intel has forced Nvidia out of the mobile and low-end/mid-range desktop space, eliminating what used to be their bread and butter. So this pivot by Nvidia to markets where they either control the whole stack with their custom ARM cores, or where Intel can't lock them out by getting rid of PCIe lanes as on laptops/desktops (AMD would get subbed in, because PCIe lanes are all that matter), has definitely put them in a stronger position, justifying that valuation.


The article mentions that to write for Intel's Xeon Phis, it's x86. I imagine this is mostly accurate, but is writing for this as simple as creating another thread? That seems naive.

Looked into it [0]: it looks like you add some compiler directives before the code you want to outsource to the coprocessor. It doesn't look extreme, mostly naming which variables need to be sent and to where. I've never worked with CUDA so I don't know if it's a pain or not; looking into it, it seems that instead of annotated x86 code it is CUDA-prefixed functions that get offloaded. Granted, there are speed advantages to having direct access to memory, but I don't think the API is the biggest selling point.
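
Roughly what those directives look like, as far as I can tell from [0] (treat the exact syntax as my paraphrase rather than gospel):

  // Offload one loop to the Phi, naming what gets copied in and out.
  // Needs Intel's compiler; other compilers just ignore the offload pragma.
  void scale(const float* a, float* b, int n) {
  #pragma offload target(mic:0) in(a : length(n)) out(b : length(n))
    {
  #pragma omp parallel for
      for (int i = 0; i < n; i++)
        b[i] = 2.0f * a[i];
    }
  }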

I'm quite interested in seeing whether Nvidia thinks this is a threat and how they will respond. New systems may stay with Nvidia just because the experience and the known failure modes are already there. Though if the speed gains are even 25%, I would bet there would be a migration.

[0] https://software.intel.com/sites/default/files/managed/ee/4e...


Here is the main problem IMO: to get reasonable performance (i.e. comparable to a GPU), you need to make use of vector instructions as well. And it's still not easier to vectorize your x86 code than it is to just port it to CUDA. With CUDA you get two parallelisms handled in one programming paradigm: both vector and multiprocessor parallelism just get mapped to the kernel launch configuration, and the kernel itself is written as scalar operations plus some index calculations. In terms of readability / ease of use it's [naive x86] > [CUDA] > [performance-portable parallelized + vectorized x86]. And if you don't have that last version of your x86 code, which almost no one has, then Xeon Phi will be just as much porting work as a GPU, if not more (because Nvidia has a considerable lead in tooling support, and has actually extended that lead since Intel's Phi came out, because Intel is just not very agile).
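
For anyone who hasn't written CUDA, this is what "scalar operations plus some index calculations" means in practice; the vector and multiprocessor parallelism both live in the launch configuration (the standard saxpy kernel, shown only as a sketch, with allocation and copies omitted):

  __global__ void saxpy(int n, float a, const float* x, float* y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // the index calculation
    if (i < n)
      y[i] = a * x[i] + y[i];                       // plain scalar code
  }

  // launch: one thread per element, 256 threads per block
  // saxpy<<<(n + 255) / 256, 256>>>(n, a, d_x, d_y);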


> And it's still not easier to vectorize your x86 code than it is to just port it to CUDA.

This a million times! The SIMT programming model that the GPUs (except Intel's) support so well is way easier to program and parallelize for than the explicit SIMD that x86 requires.

PS. Are you the person that wrote gogui?



> The basis of the Intel MIC architecture is to leverage x86 legacy by creating a x86-compatible multiprocessor architecture that can utilize existing parallelization software tools. Programming tools include OpenMP, OpenCL, Cilk/Cilk Plus and specialised versions of Intel's Fortran, C++ and math libraries.

https://en.m.wikipedia.org/wiki/Xeon_Phi

It's not threads, it's driver and compiler support for existing multiprocessing frameworks.


Actually, there are several modes of operation. In one mode it acts like a 60-core machine running your favorite x86 Linux applications; in another mode you can run OpenCL kernels on it. The former appears to give poor performance, see: https://cug.org/proceedings/cug2015_proceedings/includes/fil...


Though I'm not an expert in this area: could Intel turn hardware (FPGAs) into a software-like commodity by making FPGA designs open source or the equivalent, unlike other existing vendors? If Intel can do this, I think Intel can regain its competitiveness in the long term.


> big plans to pull the mighty GPU down a few pegs in deep learning...will offer a 100X reduction in training time compared to GPUs...loosely laid down plans to eventually integrate...hesitant...on when

"Please don't get too far into the OpenCL/SPIR/etc software ecosystem, guys. We will totally lap these GPUs Real Soon Now. And the best part is that it will all be x86!"

I really wish Intel would produce a real GPU with fixed-function parts like the competition (and GDDR5 or whatever's next!). Their most attractive quality is their commitment to the open source world.


> 100X reduction in training time compared to GPUs

Sounds great, when will it..

>Intel loosely laid down plans to eventually integrate the unique Nervana architecture with Xeons this week, but was hesitant to put a year stamp on when that might happen.

So their theoretical eventual integration is hypothetically 100x faster than current-gen GPUs?

If they can deliver, great news for deep learning - but this seems like a lot of speculation.



