What ARM needs to do is start a multi-core revolution. Give me 64 cores, 128 cor...

DannyBee · on Dec 28, 2012

For video encoding, chips like OMAP et al. already give you hardware video encoding next to the ARM cores, usually driven by an ARM "DSP" such as a Cortex-M3/M4.

Given that this is far cheaper than using general-purpose cores, why not just build yourself an army of Cortex-M3/M4s and use them for video encoding now?

ARM doesn't really need to do anything to make this happen. Since they license you all the chip-related IP necessary to build your own, just go and do it. If you want 64 M3s, go make it.

graue · on Dec 28, 2012

Parallella, featured on HN a few months ago, is aiming to do exactly that: http://www.kickstarter.com/projects/adapteva/parallella-a-su...

rorrr · on Dec 28, 2012

It's not clear what the instruction set is. Is there any software that can run on it (rather than on the ARM A9, as in their demos)?

It's also not clear whether it's a SIMD solution. If that's the case, they are basically reinventing GPUs.

vidarh · on Dec 28, 2012

All the architecture documentation is available. The instruction set is custom, and there's a GCC port. The OS is hosted on the ARM, but several of their demos offload calculations to the Epiphany chip; otherwise there wouldn't be much point.

It is not SIMD. Each core is totally independent of the others. The cores are arranged in a grid, and each core has 32 KB of in-core static RAM but can also access the memory of every other core (at the cost of a few cycles' delay) as well as main memory.
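To make the "every core can read every other core's SRAM, just a bit slower" idea concrete, here is a toy model of such a grid. It is only a sketch: the address layout (6-bit row, 6-bit column, 20-bit local offset) is loosely modeled on how the Epiphany docs describe the global address map, and the cycle costs (`LOCAL_CYCLES`, `REMOTE_BASE_CYCLES`, `CYCLES_PER_HOP`) are made-up placeholders, not vendor figures.

```python
# Toy model of a mesh of cores, each owning 32 KB of local SRAM that any
# other core can reach through a global address.
# ASSUMPTIONS: the bit layout and all cycle costs are illustrative only.

LOCAL_SRAM_BYTES = 32 * 1024
LOCAL_CYCLES = 1          # assumed cost of a load from the core's own SRAM
REMOTE_BASE_CYCLES = 2    # assumed fixed overhead for leaving the core
CYCLES_PER_HOP = 1        # assumed extra cost per mesh hop


def global_address(row: int, col: int, offset: int) -> int:
    """Pack (row, col, offset) into a 32-bit global address."""
    if not (0 <= row < 64 and 0 <= col < 64):
        raise ValueError("row/col must fit in 6 bits")
    if not 0 <= offset < LOCAL_SRAM_BYTES:
        raise ValueError("offset outside the 32 KB local SRAM")
    return (row << 26) | (col << 20) | offset


def decode(addr: int) -> tuple[int, int, int]:
    """Split a global address back into (row, col, offset)."""
    return (addr >> 26) & 0x3F, (addr >> 20) & 0x3F, addr & 0xFFFFF


def load_latency(src: tuple[int, int], addr: int) -> int:
    """Estimate the cycles core `src` spends loading from global address `addr`."""
    row, col, _ = decode(addr)
    hops = abs(row - src[0]) + abs(col - src[1])  # Manhattan distance on the mesh
    if hops == 0:
        return LOCAL_CYCLES
    return REMOTE_BASE_CYCLES + hops * CYCLES_PER_HOP
```

With these placeholder numbers, core (0, 0) reading its own SRAM costs 1 cycle, while reading from core (2, 3) costs 2 + 5 = 7 cycles. The point is only that remote access is uniform in addressing but not in latency, so data placement still matters.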

You could certainly use them much like a SIMD machine if you wanted to, but then you'd likely get better performance out of a GPU.

We'll see what kind of performance people actually get out of it once they deliver on their Kickstarter project...

mtgx · on Dec 28, 2012

Nvidia intends to do that with Project Denver/Boulder (2014): 64-bit chips and, supposedly, a new Maxwell-based GPU. I think they will be making 16-core dies, but they'll also help their customers build 512- to 1024-core racks. They already did this with Tegra 3 for a customer in Barcelona.

Applied Micro also intends to introduce a 128-core SoC in 2014:

http://www.apm.com/products/x-gene

mirashii · on Dec 28, 2012

Actually, for most codecs video encoding doesn't lend itself to parallelism because of temporal (inter-frame) encoding. Because each frame depends on the previous one, you really don't gain much from going to multiple cores (at least not yet).
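A simplified way to see the problem: treat "I" frames as self-contained and "P" frames as depending only on the frame right before them, then measure the longest chain that has to be encoded in order. This sketch assumes one reference frame and no B-frames, and it ignores parallelism inside a single frame; it only bounds parallelism *across* frames.

```python
# Simplified dependency model: "I" frames start a new chain,
# "P" frames extend the chain of the frame just before them.

def critical_path(frame_types: str) -> int:
    """Length of the longest run of frames that must be encoded sequentially."""
    longest = current = 0
    for t in frame_types:
        current = 1 if t == "I" else current + 1
        longest = max(longest, current)
    return longest
```

For `"IPPPPPPP"` the whole stream is one chain of 8 frames, so extra cores can't work on different frames at once; for `"IPPPIPPP"` the longest chain drops to 4.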

halter73 · on Dec 28, 2012

But couldn't you just have each core start at a different key frame?
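Roughly what that would look like, as a sketch: split the stream at I-frames and hand each segment to a worker. It assumes the keyframe positions are already known and that segments are closed (no frame references across an I-frame); `encode_segment` is a hypothetical stand-in for a real encoder. `ThreadPoolExecutor` keeps the sketch simple; CPU-bound Python code would need `ProcessPoolExecutor` or native threads to actually use multiple cores.

```python
from concurrent.futures import ThreadPoolExecutor


def split_at_keyframes(frame_types: str) -> list[range]:
    """Split frame indices into segments that each start at an I-frame."""
    starts = [i for i, t in enumerate(frame_types) if t == "I"]
    if not starts or starts[0] != 0:
        raise ValueError("stream must start with an I-frame")
    ends = starts[1:] + [len(frame_types)]
    return [range(s, e) for s, e in zip(starts, ends)]


def encode_segment(segment: range) -> str:
    """Hypothetical stand-in for a real encoder working on one segment."""
    return f"encoded[{segment.start}:{segment.stop}]"


def encode_parallel(frame_types: str, workers: int = 4) -> list[str]:
    segments = split_at_keyframes(frame_types)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in input order, so segments can be concatenated as-is
        return list(pool.map(encode_segment, segments))
```

For `"IPPIPPPI"` this produces three independent jobs covering frames 0-2, 3-6 and 7.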

miahi · on Dec 28, 2012

To have each core start encoding at a different point, you would need really fast (read: expensive) storage. Every core needs a steady stream of frames to compress, so with 64 cores you need storage that can serve 64 parallel requests very quickly, or a lot of RAM to cache everything, and the chips have to handle really heavy I/O. The current many-core ARMs only have a small amount of dedicated RAM per core.

Also, in many encoders keyframes are not known from the start. A keyframe is inserted when the difference between one frame and the next is large; this can only be detected while encoding, or with multi-pass encoding.
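A crude version of that detection, as a sketch: compare each frame with its predecessor and mark a keyframe when the mean per-pixel difference exceeds a threshold. Frames here are flat lists of 8-bit luma values, and the `threshold` of 30 is an arbitrary assumption; real encoders use more elaborate heuristics. Note that it has to scan the whole video before you know where the segments start, which is the extra pass mentioned above.

```python
def mean_abs_diff(a: list[int], b: list[int]) -> float:
    """Average per-pixel absolute difference between two equal-sized frames."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def find_keyframes(frames: list[list[int]], threshold: float = 30.0) -> list[int]:
    """First pass: frame 0 plus every frame that differs sharply from its predecessor."""
    keyframes = [0]
    for i in range(1, len(frames)):
        if mean_abs_diff(frames[i - 1], frames[i]) > threshold:
            keyframes.append(i)
    return keyframes
```

For four tiny frames with values 10, 12, 200 and 201, only the jump from 12 to 200 crosses the threshold, so frames 0 and 2 become keyframes.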

rorrr · on Dec 28, 2012

Even cheap modern SSDs can handle 400-500 MB/s reads. You can load 500 MB chunks of video into RAM and eliminate the I/O bottleneck completely.

Video encoding is CPU-bound, not I/O bound.
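Whether storage keeps up comes down to simple arithmetic. This sketch assumes raw 8-bit YUV 4:2:0 input (an already-compressed source needs far less bandwidth), and the per-core encoding speed `fps_per_core` is a hypothetical parameter, not a measured figure.

```python
def raw_frame_bytes(width: int, height: int) -> int:
    """One 8-bit YUV 4:2:0 frame: full-resolution luma plus two quarter-resolution chroma planes."""
    return width * height * 3 // 2


def input_bandwidth(width: int, height: int, fps_per_core: float, cores: int) -> float:
    """Bytes per second needed to keep `cores` encoders fed with raw frames."""
    return raw_frame_bytes(width, height) * fps_per_core * cores
```

A 1080p frame is 3,110,400 bytes (about 3.1 MB). If each of 64 small cores manages 1 frame/s, they need about 199 MB/s in total, well within a 500 MB/s SSD; at 5 frames/s per core the requirement is about 995 MB/s. So the answer depends on how fast each core encodes.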

aemerson · on Dec 29, 2012

It would be nice, but maintaining cache coherence is a huge challenge at 100+ cores.
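One back-of-the-envelope way to see why, using two textbook coherence schemes: with broadcast snooping, every write miss has to reach every other core, and with a full-map directory, each cache line carries one presence bit per core. Both costs grow linearly with the core count. The 64-byte line size is an assumption, and the sketch ignores state bits and the sparse or limited-pointer directories real designs use to cut this overhead.

```python
def full_map_directory_overhead(cores: int, line_bytes: int = 64) -> float:
    """Presence-bit storage per cache line, as a fraction of the line's data bits."""
    return cores / (line_bytes * 8)


def broadcast_messages(cores: int) -> int:
    """Invalidations a snooping protocol broadcasts per write miss (all other cores)."""
    return cores - 1
```

At 16 cores the directory bits add about 3% on top of each 64-byte line; at 128 cores they add 25%, and a broadcast write miss touches 127 other cores.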