"No FPGAs, no microcontrollers, just discrete logic"
Very hard to believe that... How? Is it a SUBLEQ-style Turing tarpit that uses a very high clock and simple hardware to run RISC-V "in software" at a slower clock? Do the discrete logic gates get woven into an ad-hoc FPGA?
Considering the breadboard CPUs I saw a few years ago, it just doesn't look like enough board space... I could be wrong.
A few years ago I considered trying to build a simple 32-bit RISC processor out of 74xx series logic chips. I had just taken parts 1 and 2 of the MITx MOOC 6.004x, "Computation Structures".
Part 1 covered CMOS, combinational logic, sequential logic, FSMs, and performance considerations, and culminated in designing a 32-bit ALU that did add, sub, all 16 two-input logic ops, compares, and shifts (logical or arithmetic) of 0 to 31 bits in either direction.
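An aside, in case "all 16 logic ops" sounds like a lot of hardware: every two-input Boolean function can be handled uniformly by treating a 4-bit function-select code as a truth table and using each pair of operand bits as a mux select, i.e. one 4-way mux per bit slice. A minimal C sketch of the trick (mine, not necessarily how 6.004x built it):

```c
#include <stdint.h>
#include <stdio.h>

/* All 16 two-input logic ops from one 4-bit "function code": bit
   ((a << 1) | b) of fn is the output for input bits a and b, so
   fn = 0x8 is AND, 0xE is OR, 0x6 is XOR, 0x9 is XNOR, and so on.
   In hardware this is one rank of 32 four-way muxes. */
static uint32_t logic_op(uint32_t x, uint32_t y, unsigned fn) {
    uint32_t r = 0;
    for (int i = 0; i < 32; i++) {
        unsigned a = (x >> i) & 1, b = (y >> i) & 1;
        r |= (uint32_t)((fn >> ((a << 1) | b)) & 1) << i;
    }
    return r;
}

int main(void) {
    printf("%08x\n", logic_op(0xF0F0F0F0, 0xFF00FF00, 0x8)); /* AND: f000f000 */
    printf("%08x\n", logic_op(0xF0F0F0F0, 0xFF00FF00, 0x6)); /* XOR: 0ff00ff0 */
    return 0;
}
```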
In part 2 the students design a 32-bit processor and implement it in the simulator. The design is all at the gate level, except we were given two things as black boxes: (1) a register file of 32 32-bit registers, and (2) a ROM with 64 18-bit words.
Part 3 adds caching and other performance topics.
Here was the parts list I came up with for my design, not counting whatever it would take to do the 32x32 register file and the 64x18 ROM. In what follows, the name of a logic element (NOR, MUX, etc.) followed by a number means an element with that number of inputs. E.g., NOR2 is a 2-input NOR gate, and MUX4 is a 4-input multiplexer. DREG is something that can store 1 bit, which would probably be a D flip-flop.
That came out to around 350 chips. My biggest breadboard could hold about 32 chips, so I'd need around 11 or 12 of those, plus whatever more would be needed for the register file and ROM.
353 of those 563 MUX2s are in the shifter in the ALU, which can handle a left or right, arithmetic or logical, shift by 0 to 31 in one clock cycle. If I added a new instruction that just does a logical right shift by 1 and made the old shift instructions all trap so they could be handled in software, that would get rid of most of those 353 MUX2s.
That would cut the chip count to around 270. That was still more than I was willing to deal with so that was the end of that.
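To make the mux blow-up concrete: a one-cycle 32-bit shifter is normally a logarithmic barrel shifter, five ranks of 32 MUX2s for a single direction, with more muxes on top for direction and arithmetic fill, which is roughly how you arrive at a number like 353. A rough C model of the structure (my sketch, not the actual 6.004x design):

```c
#include <stdint.h>
#include <stdio.h>

/* Logarithmic barrel shifter: five mux stages, each of which either
   passes the word through or shifts it by 1, 2, 4, 8, or 16 bits.
   Each loop iteration models one rank of 32 MUX2s in hardware. */
static uint32_t shift_right_logical(uint32_t x, unsigned amount) {
    for (unsigned s = 0; s < 5; s++) {
        uint32_t shifted = x >> (1u << s);
        x = ((amount >> s) & 1) ? shifted : x;  /* one rank of MUX2s */
    }
    return x;
}

int main(void) {
    printf("%08x\n", shift_right_logical(0x80000000u, 31)); /* 00000001 */
    return 0;
}
```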
Given that with a PCB instead of a breadboard you'd get higher density, I think mine would have easily fit (even with the full shifter) in the amount of space that it looks like they are using with plenty left over, if I'm estimating the size of their boards right from the photos, so I don't see anything obviously implausible about their project.
> If I added a new instruction that just does a logical right shift by 1 and made the old shift instructions all trap
You can use microcode for that. Microcode can also be used to implement multiply/divide by adding/subtracting in a loop. It's very bad for performance, but in a DIY project it can save a lot of chips. Microcode was used in computers of the '60s and '70s.
And the simplest microcode sequencer is made of 1 (one) register and 1 (one) ROM. Although, for a RISC-V you would probably need more than one ROM (microcode is usually very wide, 16 or more bits).
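To make that concrete, here is a toy software model of the register-plus-ROM sequencer. The 16-bit microword layout (8 control bits plus an 8-bit next-address field) is invented for illustration, not taken from any particular machine:

```c
#include <stdint.h>
#include <stdio.h>

/* Toy register+ROM microcode sequencer. Each microword holds control
   signals and the address of the next microword; the lone register
   simply latches that next address every clock. */
#define CTRL(w) ((uint8_t)((w) >> 8))  /* upper byte: control signals   */
#define NEXT(w) ((uint8_t)(w))         /* lower byte: next microaddress */

static const uint16_t urom[256] = {
    [0] = 0x0101,  /* assert control 0x01, go to microaddress 1 */
    [1] = 0x0202,  /* assert control 0x02, go to microaddress 2 */
    [2] = 0x8000,  /* assert control 0x80, loop back to 0       */
};

int main(void) {
    uint8_t uaddr = 0;                     /* the single register */
    for (int cycle = 0; cycle < 6; cycle++) {
        uint16_t w = urom[uaddr];          /* the single ROM */
        printf("cycle %d: controls=%02x\n", cycle, CTRL(w));
        uaddr = NEXT(w);                   /* latch the next address */
    }
    return 0;
}
```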
I think they use it now because they can implement the bajillion x86 instructions in terms of much simpler and more efficient micro-ops, without implementing every single x86 instruction completely uniquely.
For counting chips, remember that each 74xx chip typically has multiple gates. For example, the 7408 has four AND gates, and the 7404 has six inverters.
Sometimes, many gates! The 74xx series shows the progress of semiconductor technology over time; later 74xx series chips are MSI. Oh, that term is quite antiquated now, isn't it? Medium-scale integration. (Just about everything is far beyond VLSI, very-large-scale integration, these days.)
The 7400, the first device in the series, was just four NAND gates in one IC. NAND is the easiest function to implement in TTL (and most other logic families). The somewhat later 7476 is a dual JK flip-flop. It takes eight NAND gates to make a JK flip-flop, so a 7476 is about four times denser than the 7400. And devices like the even later 74161 4-bit synchronous binary counter have many dozens of gates [1], being roughly four times as much integration again over the 7476.
The 7400, 7476, and 74161 were at the cutting edge of integration in 1963, 1966, and 1969, respectively.
I built a simple 16-bit RISC processor with a custom instruction set during my engineering coursework...inside a Xilinx Artix-7 FPGA. It was still a lot of work.
> Given that with a PCB instead of a breadboard you'd get higher density...
I can see the appeal of wanting to do this with individual gates on a PCB instead of using Verilog (you can draw bare gates and let Vivado synthesize that too), but you'd have to be crazy to want to do this with breadboards! I can only imagine the frustration of dealing with signal integrity across the rat's nest of individual wires. Which connection looks good but has a loose spring terminal? Yikes! Soldering on perfboard or old-school wire wrap would seem much more reliable, and PCBs are so cheap now that I'd go in that direction for sure.
There's a Logisim simulation of the uarch. Assuming that matches the boards, it looks like a relatively straightforward unpipelined implementation of a RISC.
And it'd be much larger on a breadboard. Breadboards don't really allow your wiring and connections to be nearly as dense as even a two-layer board. This project seems doable to me, with relatively high-integration chips handling a lot of the heavy lifting, and maybe an EEPROM or two as a poor man's PLD for certain kinds of combinational logic.
> and maybe an EEPROM or two as a poor man's PLD for certain kinds of combinational logic.
The entire ALU and Control Unit of the CPU are implemented in EEPROMs. The ALU uses 7 ROMs [1], the Control Unit uses 3 ROMs [2], the program counter uses 5 ROMs [3], and the bit shifter uses another ROM [4], so I see 16 EEPROMs in total. Thus, hundreds if not thousands of logic gates are replaced by bitstreams in EEPROMs. This CPU is certainly based on the design philosophy of "EEPROMs as a poor man's FPGA" (not meant in a discouraging way; I still find the design rather interesting).
Yeah, that is cool, because I hear that the tooling for FPGAs is still very proprietary. And it doesn't do much good if one's CPU amounts to "click the CPU button in the FPGA GUI".
EEPROMs, I assume you can build your own programmer pretty easily?
> EEPROMs, I assume you can build your own programmer pretty easily?
Programming an EEPROM by bitbanging it from a microcontroller is a basic programming exercise. First, send the special unlock sequence from the datasheet (usually involving a lot of 0x55 and 0xAA bytes). Next, put the address bits on the address lines and strobe them in via a control line. Finally, put the data bits on the data lines and strobe those in via a control line. You only need around 100 lines of code for a basic programmer (although making a universal one capable of programming every model on the market would be non-trivial, as it requires a massive look-up table similar to the one in "flashrom").
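For a flavor of what those ~100 lines look like, here's a compilable dry-run sketch for a 28C256-style part. The GPIO helpers and pin map are hypothetical placeholders (here they just log, so the sketch runs as a simulation); the 0xAA/0x55 software-data-protection sequence shown is the one common on 28C256-type chips, but your part's datasheet is the authority:

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-ins for a real microcontroller GPIO layer. */
static void gpio_write(int pin, int level) { printf("pin %2d <- %d\n", pin, level); }
static void delay_us(unsigned us) { (void)us; }

enum { PIN_WE = 0, PIN_ADDR0 = 1, PIN_DATA0 = 16 }; /* made-up pin map */

static void write_byte(uint16_t addr, uint8_t data) {
    for (int i = 0; i < 15; i++)                    /* address lines */
        gpio_write(PIN_ADDR0 + i, (addr >> i) & 1);
    for (int i = 0; i < 8; i++)                     /* data lines */
        gpio_write(PIN_DATA0 + i, (data >> i) & 1);
    gpio_write(PIN_WE, 0);   /* address latched on /WE falling edge */
    gpio_write(PIN_WE, 1);   /* data latched on /WE rising edge */
}

/* Typical 0xAA/0x55 unlock dance before each protected write. */
static void program_byte(uint16_t addr, uint8_t data) {
    write_byte(0x5555, 0xAA);
    write_byte(0x2AAA, 0x55);
    write_byte(0x5555, 0xA0);
    write_byte(addr, data);
    delay_us(10000);         /* crude wait for the internal write cycle */
}

int main(void) { program_byte(0x0000, 0x42); return 0; }
```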
I recently contemplated using a CPLD for a project and discovered that CPLDs (at least the ones I was able to find suppliers and documentation for) are even less open-source friendly than FPGAs; none of the ones I found seemed to have any open-source tooling.
I would love to be corrected on this, because other than the proprietary tooling that only supports Windows … a CPLD seemed perfect for this project.
From the pictures, the logic chips are all in SOIC packages. The use of surface-mount components on a 4-layer PCB should already boost routing density significantly compared to a breadboard with DIP chips; all the chips can be tightly packed together.
Furthermore, both the ALU and the Control Unit are entirely in EEPROMs. The ALU uses 7 ROMs [2], the Control Unit uses 3 ROMs [3], the program counter uses 5 ROMs [4], and the bit shifter uses another ROM [5], so I already see 16 EEPROMs in total. This means the discrete components needed for random logic are largely eliminated, consolidating possibly hundreds (or thousands?) of gates into just a few chips and some lookup tables to program. In fact, another maker has already demonstrated that it's possible to build a functional CPU entirely from RAM and ROM, with just 15 chips in total. [6]
Programmers usually think of ROMs as data storage devices, but they are also the most rudimentary form of programmable logic: they transform x bits of address input into arbitrary y bits of data output, so they can implement arbitrary combinational logic. In fact, lookup tables are the heart of modern FPGAs. You could argue that this makes any ROM-based design an ad-hoc FPGA (especially since EEPROMs became so large after the 1980s: 64 K words for 16-bit-wide chips). But the use of Mask ROMs and PLAs in Control Units has always been a legitimate and standard way to design CPUs, even back in the '70s, so I won't call it "cheating" (and using ROMs for the ALU or Control Unit isn't really much different from using a pre-made 74181 or AMD Am2900 anyway).
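To illustrate "ROM as combinational logic", the whole design effort reduces to computing a truth table and burning it. Here is a sketch that generates a ROM image for a toy 4-bit ALU slice; the address layout and opcodes are made up for illustration:

```c
#include <stdint.h>
#include <stdio.h>

/* Generate a 1 KiB ROM image that *is* a 4-bit ALU: the 10 address
   bits are a 2-bit opcode, 4-bit A, and 4-bit B; each data byte holds
   the 4-bit result plus a carry bit in bit 4. */
int main(void) {
    static uint8_t rom[1 << 10];
    for (unsigned op = 0; op < 4; op++)
        for (unsigned a = 0; a < 16; a++)
            for (unsigned b = 0; b < 16; b++) {
                unsigned r;
                switch (op) {
                case 0:  r = a + b; break;  /* ADD, bit 4 = carry out */
                case 1:  r = a & b; break;  /* AND */
                case 2:  r = a | b; break;  /* OR  */
                default: r = a ^ b; break;  /* XOR */
                }
                rom[(op << 8) | (a << 4) | b] = (uint8_t)(r & 0x1F);
            }
    fwrite(rom, 1, sizeof rom, stdout);  /* pipe into your programmer */
    return 0;
}
```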
Thanks for pointing this out.
In the olden days it was fairly common to use EEPROMs (or just PROMs) with a few latches as a state machine. This way it was possible to accomplish many of the things you'd need a CPU for, without needing one at all.
It's not too far a stretch to add an ALU and some control flow after that, hehe.
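Something like this toy model, where the ROM maps {state, input} to the next state and a latch holds the state between clocks (the encoding, a 2-bit up/down counter, is invented for illustration):

```c
#include <stdint.h>
#include <stdio.h>

/* PROM-plus-latch state machine: address = state (2 bits) plus an
   up/down input (1 bit); data = next state. In hardware the printf
   would be output bits decoded from the same ROM. */
int main(void) {
    static const uint8_t rom[8] = {
        3, 1,   /* state 0: down -> 3, up -> 1 */
        0, 2,   /* state 1: down -> 0, up -> 2 */
        1, 3,   /* state 2: down -> 1, up -> 3 */
        2, 0,   /* state 3: down -> 2, up -> 0 */
    };
    uint8_t state = 0;                  /* the latch */
    const int up[] = {1, 1, 1, 0, 0};   /* input per clock */
    for (int i = 0; i < 5; i++) {
        state = rom[(state << 1) | up[i]];  /* one clock edge */
        printf("state = %d\n", state);      /* prints 1, 2, 3, 2, 1 */
    }
    return 0;
}
```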
The use of microcode in CPUs began like this too. Originally, people replaced the random logic in a CPU's Control Unit with a Mask ROM or a PLA. The goal was simply to avoid the trouble of building decoders from gates one by one. Although it's a form of programmable logic, few would describe these simple ROM lookup tables as "programs". Later, people pushed the idea further and built a small state machine to control the ALUs, with microcode in ROM used in turn to control that state machine. This marked the birth of "micro-sequencers". Beyond their internal use in integrated circuits, they also found applications as universal building blocks for custom discrete CPUs in the 1980s; examples included the AMD Am2900, Am29100, and Am29300 series chips. Building a CPU became as easy as connecting these logic blocks together and writing some microcode for the micro-sequencer. Push the development of microcode far enough and you eventually reach its final form, a CPU within a CPU, at which point you may argue that the CPU is not really hardware anymore but is driven by software.
At which point does the use of microcode make a CPU software rather than hardware? There's really no clear boundary, so microcode has always existed on the blurry line between hardware and software. This was also a heavily contested subject in several lawsuits involving cloned CPUs, since hardware is not covered by copyright, but software is.
>Do the discrete logic gates get woven into an ad-hoc FPGA?
Like literally? Wouldn't that require even more chips?
Or are you saying that the act of combining logic chips itself constitutes a 'buggy, poorly specified [FPGA]'? In that case, aren't you erasing the distinction between an FPGA and the alternative?
There is no way it implements the entire RV32I base ISA. Maybe it only implements the instructions that are relevant for CS students taking a computer architecture class; that is typically what these breadboard CPUs implement.
RV32I is pretty tiny, particularly when you squint hard enough at the spec: FENCE can be a NOP, and EBREAK and ECALL can be something like a halt of the machine.
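For a sense of how little top-level decode that leaves, here is a skeleton in C. The major opcode values are from the RISC-V spec; treating FENCE as a NOP and ECALL/EBREAK as a halt is exactly the squinting described above (execution bodies omitted):

```c
#include <stdint.h>
#include <stdlib.h>

/* RV32I top-level decode is just the low 7 bits of the instruction. */
void execute(uint32_t insn) {
    switch (insn & 0x7F) {
    case 0x37: /* LUI    */ break;
    case 0x17: /* AUIPC  */ break;
    case 0x6F: /* JAL    */ break;
    case 0x67: /* JALR   */ break;
    case 0x63: /* branches: BEQ/BNE/BLT/BGE/BLTU/BGEU */ break;
    case 0x03: /* loads  */ break;
    case 0x23: /* stores */ break;
    case 0x13: /* OP-IMM: ADDI, shifts, compares, logic ops */ break;
    case 0x33: /* OP: ADD/SUB, shifts, compares, logic ops  */ break;
    case 0x0F: /* FENCE: treat as a NOP */ break;
    case 0x73: /* ECALL/EBREAK: just halt */ exit(0);
    default:   /* illegal instruction */ exit(1);
    }
}

int main(void) {
    execute(0x00000013);  /* ADDI x0, x0, 0: the canonical NOP */
    return 0;
}
```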
Arguably, RV32I is "only the instructions that are relevant for CS students taking a computer architecture class", judging by the history in its earliest published spec [0].
Where were these kids when Tom West was forming the team for what became the Data General MV/8000?! Not many people are building 32-bit CPUs out of discrete logic these days (and not many were even in those days!).
(For those of you who don't get the reference, I highly recommend "The Soul of a New Machine" by Tracy Kidder.)
They were here all along, just didn't get the chance. This is why we need universally accessible education, an open Internet, IP laws that encourage and not hinder experimentation/innovation, etc.
Along the lines of building a full computer out of discrete logic parts, slu4 created the Minimal 64, described on his YouTube channel: https://www.youtube.com/@slu467
The video embedded in the page says over 230 discrete logic and memory devices, all SMD.
So the key, as someone else commented, is that there is a _lot_ of stuff going on in the EEPROMs. I don't think that makes this any less of an amazing result.
Is a vertical stack like that really the best approach? Even if you are using large discrete chips like NAND gates or multiplexers, those connectors and 0.1" headers obviously take up a lot of space.
And then the parasitics... Connectors like that have resistance, capacitance, and inductance that grossly complicate the flow of electricity.
I mean, PCB traces also have parasitic elements, but we humans are better able to understand a microstrip transmission line than... well... whatever is going on in that vertical stack.
It's impressive nonetheless and a show of good work. My immediate revision would be to mount the boards vertically and run fewer layers. Some connectors and 0.1" pin headers are useful, but you really shouldn't have this many, IMO.
At a 500kHz clock rate, with reasonable rise/fall times, the electrical effect of some 0.1" headers is minuscule. "Microstrip transmission lines" are enormous overkill.
On a PCB, you "can" or you "may" perform advanced transmission-line analysis on your traces. I don't think a project like this needs much more than a 2-layer board (so microstrips are definitely overkill), but that doesn't change the fact that PCB parasitics analysis is basically a solved problem today, available in a lot of software packages.
Anyway, my overall point is that large contiguous blocks of PCB are easier than thinking about connector issues (or, for that matter, being forced to solder and manage all those connectors and headers).
Not only is it easier to go header-free (and use a larger PCB), it's better engineering, thanks to the tighter specs and analysis available.
------------
Now, a few connector doodads and having fun is probably fine and all. But a stack-up of 9 PCBs is reaching the point of absurdity. I can't think of any good engineering reason to have so many connectors.
That's 9 separate PCBs in this design, meaning, at a minimum, 9 separate ground planes. Is that... good engineering? I don't think so. It's horribly complex for seemingly no appreciable reason. Maybe sound/analog circuits would justify a separate board, but I'm not seeing any sound here...
------------
That being said: I agree that at 500kHz maybe this level of analysis is not needed. But... on the other hand... these 74xx chips all have ~30ns rise/fall times, so it's not outside the realm of reality to run this computer at 5MHz or 10MHz, 10x to 20x faster. At those higher speeds you'd possibly need to think about these issues a bit more (or, perhaps more appropriately, the author of this project "could have" aimed at a higher-speed target if they so chose).
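For a rough sense of the bandwidth involved, using the common 0.35/t_r rule of thumb for the spectral content of a digital edge (my back-of-envelope, not anything from the project):

    f_knee ≈ 0.35 / t_rise = 0.35 / 30 ns ≈ 12 MHz

In other words, it's the ~30ns edges, not the 500kHz clock, that set the spectrum the interconnect has to carry, and that figure stays the same whether the clock runs at 500kHz or 10MHz.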
Unless you want to get a beautiful spectrum on an EMI/EMC test receiver, but that's certainly not relevant in this project... Building a discrete CPU at some MHz and passing modern EMI/EMC tests sounds like a fun (and somewhat expensive [1]) nerd-sniping project to do.
[1] The expense of EMI/EMC testing at an actual lab is well known. Doing it in a home lab at much lower cost is possible with pre-compliance tools like a TEM cell, a broadband antenna, and a spectrum analyzer, but that equipment still costs a few thousand dollars.
If he uses HC logic, is careful to keep traces pulled back from the edge of the PCBs, and also carefully deals with the interconnects (those might be the killer), then he could probably pass EMC. Maybe some series termination on the clock tree.
At the speed it's running, the edge rates of the IO drivers and the input/output of the entire thing are the problem.
A simple improvement to the interconnect is using two rows of pin headers, one row for signals and another row for dedicated grounds. This should significantly reduce the loop area. I once did a quick simulation and found that even controlled impedance is somewhat possible with 1.27 mm headers, although I haven't done any experiments with it (yet)... Another possible modification is converting all I/Os to differential signaling before they leave the board; a classic use is eliminating the radiation from long ribbon cables. Ribbon cables with ground planes also exist, if one's willing to pay...
That's interesting, but this board stack is more interesting to me because I want to do the same to build my own 8086 system. How can that handle the even higher frequencies of a RISC-V chipset? Oh... it can't... 500kHz. I see. How much better can we do, practically?
Everyone commenting here, please note that a lot of this is based on the work of a high school kid from the Czech Republic. That alone is worth upvoting!
I am currently reading "Gaming the Iron Curtain" (https://ironcurtain.svelch.com/). It's about the incredible innovation happening in Czechoslovakia before the fall of the Wall and what people there did to participate in the computer revolution. This kid seems like he comes from that heritage!
I think it’s fair enough in this case to say “modern” probably means “modern programming languages can compile to and run on it”
There’s some question in the comments here about how much of the ISA is actually implemented, but it should theoretically be possible to write Rust code and run it on this thing, for example. There are many other toy CPU designs out there that are much more limited in terms of what can compile to run on them.
Architecture-wise, even a regular modern CPU isn't that hard. For example, Berkeley has the BOOMv3 core[0] which is performance-competitive[1] with commercial chips taped out in the last few years. I think commercial chips are faster because of improvements in analog design, and not some super special architecture sauce (although I'm sure there's some special sauce -- it's probably not the defining factor).
I disagree. Modern refers to using the learnings and improvements that have been made over the last few decades of designing CPUs. To me, including two features doesn't necessarily make a design modern.
My point is to separate the absolute toys (CPUs which are on a level with the 6502) from the somewhat more complicated designs that can be used to demonstrate fundamental concepts beyond just decoding and executing instructions.