Note that this core runs at ~100 MHz to get equivalent performance (cycle-accurate timings) to the original 4.77 MHz 8088. The big deal is the relatively small number of LUTs used, leaving plenty of room for more stuff.
The result is the MCL86, which is basically a 7-instruction, 32-bit micro-sequencer. Some of the micro-sequencer's instructions are specialized so as to allow it to rapidly decode instructions as well as nest function calls. With these seven instructions, I was able to microcode all of the 8086 opcodes in a relatively small number of micro-sequencer clocks.
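To make the idea concrete, here is a toy sketch of what a micro-sequencer loop looks like. The MCL86's actual seven instructions aren't documented here, so this invents a hypothetical 3-instruction subset (jump, call/return for nesting, and an ALU step) purely to illustrate the pattern: a tiny fixed engine stepping through a microcode table.

```python
# Hypothetical micro-sequencer sketch -- NOT the real MCL86 instruction
# set, just an illustration of a small engine driven by a microcode table.

JMP, CALL, RET, ALU = range(4)

def run(microcode, start=0, max_steps=100):
    pc = start
    stack = []        # supports nested microcode "function" calls
    acc = 0           # stand-in for shared execution-unit state
    for _ in range(max_steps):
        op, arg = microcode[pc]
        if op == JMP:
            pc = arg
        elif op == CALL:
            stack.append(pc + 1)   # remember return address, then branch
            pc = arg
        elif op == RET:
            if not stack:
                return acc         # top-level return: microprogram done
            pc = stack.pop()
        elif op == ALU:
            acc += arg             # the one shared adder does the work
            pc += 1
    raise RuntimeError("microcode did not terminate")

# A micro-routine at address 3 adds 5; the main program calls it twice.
prog = [
    (CALL, 3),   # 0: call add5
    (CALL, 3),   # 1: call add5 again
    (RET, 0),    # 2: done
    (ALU, 5),    # 3: add5: acc += 5
    (RET, 0),    # 4: return to caller
]
print(run(prog))   # -> 10
```

The point is that the engine itself stays tiny and fixed; all the per-opcode complexity lives in the microcode table, which is exactly why it can fit in so few LUTs.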
A video of this running 8088 MPH would be awesome; they already have a number of videos of this running other stuff on a PC: https://www.youtube.com/channel/UC9B3TaEUon-araO2j7tp9jg EDIT: There is a video of it running 8088 MPH that polpo linked!
Exactly. This is an "old is new" kind of design. Back when logic was so expensive as to make computers almost impossible, designs were built around a small number of execution units on a small number of buses, with a comparatively compact microcode table that could share them sequentially. So you only needed one adder, because the IP increment and the ALU operation could share it on different clocks.
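The adder-sharing idea can be sketched in a few lines. This is a toy model (not any real machine): a single adder "execution unit" time-multiplexed between the ALU operation and the IP increment on alternate clock phases.

```python
# Toy illustration of resource sharing in early microcoded machines:
# one adder serves two jobs on alternate clock phases.

def shared_adder(a, b):
    return a + b          # the one and only adder in the "machine"

ip, acc = 0, 0
program = [3, 4, 5]       # operands for an imaginary ADD-immediate opcode

for phase in range(2 * len(program)):
    if phase % 2 == 0:
        acc = shared_adder(acc, program[ip])   # even phase: ALU op uses the adder
    else:
        ip = shared_adder(ip, 1)               # odd phase: IP increment reuses it

print(acc)   # -> 12
```

Each instruction costs two clocks instead of one, which is exactly the trade: half the throughput for half the adders.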
Then once there was space to put all the stuff needed for a single instruction on the die, we found ourselves clock-limited by long logic depth and started splitting the functions out across "pipeline" stages, which begat RISC, and we've never looked back.
But all that being said: this design is totally cheating. Sure, the logic takes only 308 LUTs. But the microcode is stored in 4 block RAMs, which a quick Google tells me are 36 kbit apiece. That's a much more significant chunk of chip resources than is implied in the linked article.
Perhaps in terms of gates, but LUTs represent a much larger fraction of resources on an FPGA than the block RAMs. In other words, if you wanted to pack a lot of these onto an FPGA you'd run out of LUTs before block RAM.
I'm too lazy to look up numbers for the Kintex-7 part in question, but I'm almost certain you're wrong on this. A block RAM is a big chunk of die, and there are comparatively few of them to go around. A LUT is a tiny object (comparable in computation power to 10-50 dedicated transistors) and there are hundreds of thousands of them on the FPGA.
I'm willing to bet lots that 4/n_block_ram > 308/n_lut.
Yeah, I went and looked it up. The details are hairy, because Xilinx. But an XC7K410T part, which is roughly their mid-range offering, has 63550 slices, where IIRC a slice has two LUTs (pay no attention to their "logic cells" number -- that's a normalized figure scaled so as to be linear with the old 4-input LUT design from long ago) and 28620 kbit of block RAM, where each block is 36 kbit.
So that design uses 308/(2*63550) =~ 0.2% of the logic resources on the FPGA, but 4/(28620/36) = 0.5% of the RAM.
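The arithmetic is easy to reproduce from the figures quoted above (63550 slices, two LUTs per slice as recalled, 28620 kbit of block RAM in 36 kbit blocks):

```python
# Recomputing the resource-fraction percentages from the figures above.
luts_used, slices, luts_per_slice = 308, 63550, 2   # 2 LUTs/slice, as recalled above
bram_used, bram_total_kbit, bram_block_kbit = 4, 28620, 36

logic_frac = luts_used / (luts_per_slice * slices)
ram_frac = bram_used / (bram_total_kbit / bram_block_kbit)

print(f"logic: {logic_frac:.2%}")              # -> logic: 0.24%
print(f"ram:   {ram_frac:.2%}")                # -> ram:   0.50%
print(f"ratio: {ram_frac / logic_frac:.1f}x")  # -> ratio: 2.1x
```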
Not nearly as imbalanced as it sounded to me originally, but still: the LUT numbers are spun by more than a factor of two. The design is more closely equivalent to "640 LUTs". Which interestingly is very comparable to the equivalent transistor count on the original part from Intel.
Block RAMs are dual-port. You can get 4 effective ports by doubling their clock rate, if the rest of the design is merely crawling along at 100 MHz. But each core would eat up one port reading its microcode every clock cycle, so you can only have up to 4 cores per group of 4 BRAMs holding the microcode.
The same trick -- a big microcode table resulting in small use of logic resources -- is also often used in software emulation. For example, this classic, Bisqwit's NES emulator in C++, with extensive explanation:
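The software version of the trick looks like this: the "logic" is one tiny fetch/dispatch loop, and all the per-opcode behaviour lives in a big data table, the analogue of a microcode ROM. A minimal sketch (an invented 3-opcode machine, nothing to do with the NES specifically):

```python
# Table-driven dispatch: a fixed core loop plus a big opcode table.
# In a real emulator the table has hundreds of entries; the loop never grows.

TABLE = {
    0x01: lambda st, imm: st.__setitem__("a", st["a"] + imm),  # ADD imm
    0x02: lambda st, imm: st.__setitem__("a", st["a"] * imm),  # MUL imm
    0x00: None,                                                # HALT
}

def emulate(mem, st):
    while True:
        op, imm = mem[st["pc"]], mem[st["pc"] + 1]   # fetch opcode + operand
        st["pc"] += 2
        handler = TABLE[op]                          # dispatch via the table
        if handler is None:
            return st["a"]
        handler(st, imm)                             # execute

print(emulate([0x01, 5, 0x02, 3, 0x00, 0], {"a": 0, "pc": 0}))   # -> 15
```

Adding an opcode means adding a table entry, not more control logic, which is the same economy the microcoded FPGA core exploits.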
Yes, I was thinking the same thing: couldn't this be made into, say, a 500-core 8086? And if so, would this thing then be parallel-programmable? I mean, sort of fascinating, though probably in an idle sort of way, since the speed penalty of using cores that slow would almost certainly wipe out the parallel gain?
Routing remains an issue when you scale up to that many IP cores. It's one of those aspects of FPGA design that doesn't really have an analog in software.
Yes, that's the most impressive bit. My best efforts in building a heavily microcoded tiny 16-bit core resulted in a minimum of 700 iCE40 cells. Now I know that it's possible to go even further down.
There's more from the creator at http://www.eetimes.com/author.asp?section_id=216&doc_id=1328... including this nice tidbit: