> I'd rather target 4 cores with 16-way SIMD than 64 cores each with scalar
You're assuming problems that are suitable for SIMD. If you have problems suitable for SIMD, use a GPU. Lots of problems are NOT suitable for SIMD.
If those 64 data streams all regularly take different branches, for example, your 4x 16-way SIMD is going to be fucked.
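To make that concrete, here's a minimal CUDA sketch (hypothetical kernel, not from anything in this thread): on NVIDIA hardware the 32-wide warp is the SIMD analog, and when lanes of the same warp take different sides of a data-dependent branch, the hardware executes both paths back to back with the inactive lanes masked off, so the wide-SIMD advantage largely evaporates.

    // branchy.cu -- hypothetical sketch, not anyone's real code.
    // Lanes of a warp that disagree on this data-dependent branch force the
    // hardware to run BOTH paths serially, masking off the inactive lanes
    // each time -- exactly the case where wide SIMD loses to scalar cores.
    __global__ void branchy(const int *data, int *out, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i >= n) return;

        int v = data[i];
        if (v & 1) {                        // lanes with odd data go this way...
            int acc = 0;
            for (int k = 0; k < 64; ++k) acc += v * k;
            out[i] = acc;
        } else {                            // ...lanes with even data go this way,
            int acc = 1;                    // and the two paths execute one after the other
            for (int k = 1; k < 64; ++k) acc ^= v + k;
            out[i] = acc;
        }
    }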
> Besides, this is 28 nm technology and 15x15 mm, no?
Where did you get that idea? Their site states 2.05mm^2 at 28nm for the 16-core version, which works out to roughly 0.13mm^2 per core.
So by your math, more like ~26M transistors, or ~1.6M per core. Your estimated die size is about 70% larger than what they project for their future 1024-core version...
Source: http://www.adapteva.com/products/epiphany-ip/epiphany-archit...
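For what it's worth, here's the back-of-the-envelope check behind those numbers (host-only C++ in a .cu file; the ~131mm^2 figure is my assumption that the 1024-core projection just scales the per-core area linearly, with no extra uncore -- the thread doesn't actually quote that number):

    // envelope.cu -- hypothetical numbers check, not Adapteva's data.
    #include <cstdio>

    int main()
    {
        const double sixteen_core_mm2   = 2.05;                   // figure from Adapteva's site
        const double per_core_mm2       = sixteen_core_mm2 / 16;  // ~0.13 mm^2 per core
        const double projected_1024_mm2 = per_core_mm2 * 1024;    // ~131 mm^2 (assumes linear scaling)
        const double parent_estimate_mm2 = 15.0 * 15.0;           // the 15x15 mm guess = 225 mm^2

        printf("per core:         %.3f mm^2\n", per_core_mm2);
        printf("1024-core scaled: %.0f mm^2\n", projected_1024_mm2);
        printf("15x15 mm guess is %.0f%% larger\n",
               100.0 * (parent_estimate_mm2 / projected_1024_mm2 - 1.0)); // ~71%, i.e. the "70% larger" above
        return 0;
    }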
> it'll get smashed by Amdahl's Law
This is a ludicrous argument to make while arguing for a GPU architecture instead. A GPU architecture gets hit far worse for many types of problems, because what parallelizes across 64 general-purpose cores may degenerate to 4 parallel streams on your example 4-core, 16-way SIMD setup.
There are plenty of problems that do really badly on GPUs because of data dependencies.
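A hypothetical example of the kind of dependence I mean (made-up kernel): a nonlinear recurrence where each element needs the previous one, so only a single thread can make progress no matter how many SIMD lanes or cores the chip has.

    // recurrence.cu -- hypothetical sketch of a loop-carried dependence.
    // x[i] depends on x[i-1] through a nonlinear function, so the chain is
    // inherently serial: one GPU thread (or one scalar core) does all the
    // work while the rest of the machine sits idle.
    __global__ void serial_recurrence(const float *a, float *x, int n)
    {
        if (blockIdx.x == 0 && threadIdx.x == 0) {  // only one thread can run this
            x[0] = a[0];
            for (int i = 1; i < n; ++i)
                x[i] = sinf(x[i - 1]) + a[i];       // each step needs the previous result
        }
    }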
> when I can have ~900 brawny cores
Except you can't. Not at that transistor count and die size, anyway.
> NVIDIA and AMD have been exploring this space for almost a decade now and to start over without considering what they may have gotten right and what they have learned while doing so seems a little daft to me.
Have they? Really? They've targeted the embarrassingly parallel problems with their GPUs, rather than even trying to address the multitude of problems their GPUs will simply run mostly idle on, leaving those to CPUs with massive, power-hungry cores and low core counts. I see no evidence they've tried to address the type of problems this architecture is trying to accelerate.
Maybe the type of problem this architecture is trying to accelerate will turn out to be better served by traditional CPUs after all, but we know that problems that only rarely execute the same operation across a wide data path are not well served by GPUs.
That said, this is where the R&D done by AMD and NVIDIA has expanded what is amenable to running on a GPU. Specifically, instructions like vote and fast atomic ops can alleviate a lot of the branching in algorithms that would otherwise be divergent. It's not a panacea, but it works surprisingly well, and it's causing the universe of algorithms that run well on GPUs to grow, IMO.
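For anyone who hasn't seen the pattern, this is a hedged sketch of what vote + atomics buy you (made-up kernel; it uses the real __ballot_sync/__shfl_sync/atomicAdd intrinsics from the newer _sync API -- the 2012-era equivalents were __ballot()/__shfl() without the mask -- and assumes block sizes that are a multiple of 32). Instead of every thread branching on its own and hammering a global counter, the warp votes once, elects one lane to do a single atomicAdd, and the remaining lanes compute their output slots from the ballot:

    // warp_compact.cu -- hypothetical illustration of vote + atomics taming divergence.
    __global__ void compact_positive(const float *in, float *out, int *count, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        float v = (i < n) ? in[i] : -1.0f;                 // keep all 32 lanes of the warp active
        bool keep = (i < n) && (v > 0.0f);

        unsigned mask = __ballot_sync(0xffffffffu, keep);  // one warp-wide vote
        if (mask == 0) return;                             // whole warp skips the slow path together

        int lane   = threadIdx.x & 31;
        int leader = __ffs(mask) - 1;                      // lowest lane that voted "keep"
        int base   = 0;
        if (lane == leader)
            base = atomicAdd(count, __popc(mask));         // one atomic per warp, not 32
        base = __shfl_sync(0xffffffffu, base, leader);     // broadcast the reserved offset

        if (keep) {
            int rank = __popc(mask & ((1u << lane) - 1));  // my position among the "keep" lanes
            out[base + rank] = v;
        }
    }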
What I worry about with Parallella is that by having only scalar cores, and lots of them, it has traded branch-divergence issues for potential collisions when reading from and writing to memory. The ideal balance of SIMD width versus core count is a question AMD, Intel, and Nvidia are all investigating right now. But again, ~26M transistors - no room for SIMD...