Intel Initiates EOL for the VCA2: Three Xeons on a PCIe Card (anandtech.com)
32 points by rbanffy on May 12, 2020 | 35 comments



Full-blown Xeons on a PCI-e card?

Lately, I've been wondering why we don't see computers physically built more like old workstations, with processors on cards plugging into I/O backplanes. The modern incarnation might be CPU+RAM on a PCI-e x32 card, and a simple "motherboard" with power and PCI-e. (Does 32-lane PCI-e even exist? I've never even seen photos.)


Look up the Intel NUC 9 Extreme (aka Ghost Canyon); it's a very small form factor gaming PC in the arrangement you're describing. The motherboard is one PCIe card, the graphics card is another, with a dumb I/O plate connecting the two.

As a concept it's interesting. However, the 238 x 216 x 96 mm size is just too big when the Velkase Velka 3 is 227 x 187 x 97 mm and uses an off-the-shelf ITX motherboard, the same PSU, and the same graphics card to achieve the same result for much less overall.

A very powerful and quiet Flex ATX PSU can be had at https://www.sfftec.com/product-page/enhance-enp-7660b-pro so that's not a problem either. The same unit is sold as taobao item 533119895425 (use a proxy like Superbuy). Note that modular PSUs with the modular plugs sticking out are longer than the Flex ATX standard allows, so they aren't compatible with the Velka 3, but there are plenty of variants of this item that are.


You see this all the time in industrial settings, based on https://en.wikipedia.org/wiki/CompactPCI which looks like this https://www.kontron.com/products/boards-and-standard-form-fa...

Some discrete GPUs in laptops/notebooks, and also in rackable systems, use a similar arrangement when the GPU is pluggable/"built to order" on a common board rather than hard-soldered on.


The Intel NUC Compute Element[1] is somewhat like that. I have no reason, however, to think this will last beyond this generation of the NUC.

1: https://www.intel.com/content/www/us/en/products/boards-kits...


See the standards proposals for various next-gen links.

https://www.openfabrics.org/images/eventpresos/2017presentat...


More like 3 iGPUs with some extra baggage. The CPUs on those aren't directly accessible in any way; they're only used for whatever needs to be offloaded from the fixed-pipeline encoders/decoders, and even that might end up offloaded to the main CPU.


Not directly accessible? There are Ethernet ports on the card, and the article mentions you can SSH into them.


The SSH is there to manage the images running on the VCAs; you aren't adding CPUs to the host.


They aren't beefy Xeons; they look more like desktop-class E3s with some HBM. Still nice, though.

I never saw one of these, but they may present themselves as one or more computers the same way a Xeon Phi coprocessor did.


They aren't using HBM; they're using DDR4 SODIMMs.

They don't present themselves as CPUs to the host. They run a self-contained Linux image, and the only thing accessible to the host is the streaming interface.


I think OP was referring to the on-package eDRAM. It helps quite a bit with bandwidth and latency.


> they run a self contained Linux image the only thing that is accessible to the host is the streaming interface.

That's more or less how the Phi coprocessors presented themselves, with the exception that you could ssh into the coprocessor and install software on it.


Because, outside of a few niche markets, many people are now using workstation-level laptops, and stuff like M2M or PCMCIA never got much adoption.


"AVC transcoding at 30 FPS" sounds a lot less useful than what you can get from an NVIDIA consumer or Quadro card via NVENC. Very weird, and multiple Xeons doesn't sound cheap.

Am I misunderstanding the product?

I know Intel is coming out with a discrete GPU, which will probably have plenty of video encode hardware to compete with NVIDIA (esp since game streaming at 4K is quite popular).


It can do 40+ 1080p streams in real time per “card” and Intel provides the software.


Each card has 3 E3-1585L CPUs, each with 4 cores (8 threads).

Intel's white paper (https://www.intel.com/content/dam/www/public/us/en/documents...) says up to 12 streams per CPU.

The card uses 235W.

A consumer security DVR can record 16 channels for $100.

I found a TI chip from 2011 that could encode 6 streams at 1080p30, about 10W for $100.

The biggest advantage I see, as you pointed out, is that Intel did all the software work and it's practically drop-in.
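Putting the thread's numbers side by side, a rough back-of-the-envelope (no card price is quoted anywhere here, so this only compares power efficiency; I'm using the 44-streams-per-card figure cited further down, though Intel's own numbers vary between that and 12 per CPU, and as a reply below notes the TI part likely only encodes while the VCA2 transcodes):

    # Rough streams-per-watt comparison using the figures quoted in this thread.
    vca2_streams, vca2_watts = 44, 235   # 1080p30 transcodes per card, card power
    ti_streams, ti_watts = 6, 10         # 2011-era TI chip, encode only

    print(f"VCA2: {vca2_streams / vca2_watts:.2f} streams/W")  # ~0.19
    print(f"TI:   {ti_streams / ti_watts:.2f} streams/W")      # ~0.60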


Quality is a big factor, especially at broadcast bitrates. QuickSync has superb quality even at low bitrates; it's better than even Turing NVENC, which is top notch and much, much better than Pascal's.

I doubt the Texas Instruments chip you found can do transcoding; it can probably only encode or decode.

44 streams of 1080p30 for H.264-to-H.264; of course you can select any resolution and frame rate you like.

https://www.intel.com/content/dam/www/public/us/en/documents...


Gamers everywhere wish you wouldn't consider 30FPS "real time" :)


This wasn’t for gaming


Regardless, calling 30 FPS real-time is still pretty misleading.


It's not. It can transcode up to 44 streams of 1080p@30fps in real time, as in there is effectively no delay, which makes it suitable for broadcasting live video.

Real time means that there is no delay between frames going in and going out.
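For a concrete sense of the budget implied by the 44 x 1080p30 figure (a quick sketch; the card's three iGPUs work in parallel, so the real constraint is keeping each stream's latency under one frame interval while sustaining the aggregate rate):

    # Real-time budget for 44 simultaneous 1080p30 transcodes.
    streams, fps = 44, 30
    frame_interval_ms = 1000 / fps            # 33.3 ms: latency budget per stream
    aggregate_fps = streams * fps             # 1320 frames/s the card must sustain
    avg_ms_per_frame = 1000 / aggregate_fps   # ~0.76 ms average per frame, card-wide

    print(f"{frame_interval_ms:.1f} ms budget per stream")
    print(f"{aggregate_fps} frames/s sustained across the card")
    print(f"{avg_ms_per_frame:.2f} ms average per transcoded frame")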


> Real time means that there is no delay between frames going in and going out.

As long as you don't put in a >30 FPS source. This mindset seems prevalent throughout the modern broadcast industry, as evidenced by most modern "web" releases of old TV shows being 30 FPS progressive, under the misunderstanding that it's somehow equivalent to 60i, despite throwing away half the motion information found in the interlaced fields. Take an old Simpsons (or whatever) DVD, run an episode through QTGMC to get 60 progressive frames per second, and compare it to the same episode on a streaming service. You'll see what I mean.
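For anyone who wants to try that comparison, a minimal VapourSynth sketch (assuming the ffms2 source plugin and the havsfunc script are installed; the file name is a placeholder, and TFF field order is typical for NTSC DVDs but worth verifying for your disc):

    import vapoursynth as vs
    import havsfunc as haf

    core = vs.core
    clip = core.ffms2.Source(source='episode.vob')  # 29.97i NTSC DVD rip

    # QTGMC turns each field into a full frame: 29.97i in, 59.94p out,
    # preserving the motion a naive 30p web master throws away.
    clip = haf.QTGMC(clip, Preset='Slower', TFF=True)
    clip.set_output()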


That has nothing to do with anything discussed here. There is no hard frame-rate limit; you can do 240fps if you want, you just cut your maximum number of streams.


“Real-time” has nothing to do with frame rate and everything to do with latency and simultaneous throughput.


This card is old (maybe Quick Sync was better than NVENC in those days), and Intel will always recommend their own chips over a competitor's.


This thing is sweet, and I wish the idea were still around, but it's obvious these specific cards are awful now and were just produced as a symptom of Intel fucking the market on core counts.

Tangentially, I've been asking for a modular Mac Pro for years, and as far as right ideas go, this card comes closer than the current Mac Pro.


How did these work? The article says you could ssh into them, so they all had their own RAM and ran their own individual OS? I don't think these CPUs were multi-socket capable.

How did they boot? Network? So the PCIe connection was essentially just for networking?


Google: PCIe Non-Transparent Bridge

They're independent systems with a special PCIe switch that looks like an end device to all participants and allows shuffling data around at PCIe speeds.

(the NTB function is integrated on some Intel processors AFAIK, but it may also be a separate chip)
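To make the "looks like an end device" part concrete: from the host's side, the NTB's memory window shows up as an ordinary PCI BAR. A rough Python sketch of the idea only (the device address and BAR number are placeholders, and real deployments go through the kernel's ntb/ntb_transport drivers rather than raw mmap):

    import mmap, os

    # Hypothetical BAR exposed by an NTB, as surfaced by Linux sysfs.
    BAR = "/sys/bus/pci/devices/0000:03:00.0/resource2"

    fd = os.open(BAR, os.O_RDWR | os.O_SYNC)
    win = mmap.mmap(fd, 4096)   # map the first page of the shared window
    win[0:4] = b"ping"          # writes land in the peer system's memory
    os.close(fd)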


This is what I'm curious about as well. My poorly educated guess is that maybe these are just meant to process specific tasks sent to them from the host machine over SSH.

But then why have connectivity through both Ethernet and PCI-E?


If one needed more CPUs, the advantage of this card over "just more servers" is, I'm guessing, the dramatically lower latency and higher bandwidth available via PCIe vs. some other interconnect?
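Roughly, yes. The theoretical link bandwidths, assuming the card sits in a PCIe 3.0 x16 slot (I haven't checked the actual slot width):

    # Theoretical link bandwidth: PCIe 3.0 vs common Ethernet speeds.
    pcie3_lane_gbps = 8 * 128 / 130                              # 8 GT/s, 128b/130b encoding
    print(f"PCIe 3.0 x16: {16 * pcie3_lane_gbps / 8:.1f} GB/s")  # ~15.8 GB/s
    print(f"10GbE:        {10 / 8:.2f} GB/s")                    # 1.25 GB/s

Latency favors PCIe by an even wider margin, since there's no network stack in the path.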


These are used for live video encoding and editing mostly for broadcast.

I don’t think they’ve been in wide use outside of the sports broadcast industry but I might be wrong.

The main component being used isn't the actual CPU but the iGPUs; Intel has one of the best video encoder cores out there right now.


It has a power input and Ethernet ports so one can ssh into it; does it need the host on the PCIe bus at all?

So, can I just plug power and ethernet into it and boot linux?


It may need some support from the host if it doesn't have any local persistent storage.


netboot + tmpfs root would be good enough for me


Judging by the prices Xeon Phi coprocessors fetch on eBay, I'm not really optimistic about finding these floating around.



