What made the 1960s CDC6600 supercomputer fast?

dwheeler · on Feb 15, 2020

I used and programmed the 6600, including in assembly language. They were incredibly fast for the time at numerical calculation. I used them for electronics simulations in SPICE, and they were great for that.

However, they had 60 bit words and no way to address data directly within a word. By convention, characters were six bits long, stuffed in 10 characters to a word. So while this machine was incredibly fast for its time for numerical calculation, it was painful to do text manipulation. You had to pack and shift characters into words, and unshift and unpack. You could do interesting things with great cleverness, but it took a lot of work to do simple things.

Thanks for the trip down memory lane.

tpmx · on Feb 15, 2020

Cool!

This expands on the text processing issues:

https://www.museumwaalsdorp.nl/en/history/computerhistory-ba...

"There was no byte addressability. If you wanted to store multiple characters in a 60-bit word, you had to shift and mask. Typically, a six-bit character set was used, which meant no lower-case. These systems were meant to be (super)computing engines, not text processors! To signal the end of a text string, e.g. a sentence, two different coding techniques were invented. The so-called 64 character set was the CDC-default. A line end comprised of two (or more) null-“bytes” at the end of a word followed by a full zero word. The 63 character set, quite popular in the Netherlands and the University of Austin, Texas, signalled the line termination by two (or more) null-“bytes” at the end of a 60-bit word.

The Michigan State University (MSU) invented a 12-bit character set, which was basically 7-bit ASCII format with five wasted bits per character. Other sites used special shift/unshift characters in a 6-bit character set to achieve upper/lower case."

mark-r · on Feb 15, 2020

I had my own text library where I converted to/from ASCII internally. There was nothing special about the 6 bit boundaries, so you could use any number of bits per character that you wanted until it was time to interact with the rest of the system. By the time I used it they had extended the character set to include lower case by using a special prefix character.

tpmx · on Feb 15, 2020

Why were you doing text processing on a "scientific computer" in the first place? :) Sounds like there may be an interesting story there, maybe.

dwheeler · on Feb 15, 2020

I do have a story, I hope it's interesting.

The CDC 6600 was used as the student mainframe for George Mason University (GMU) in Northern Virginia in the 1980s. For engineering classes (e.g., electronics engineering) the 6600 was still excellent; it could run simulations far faster than many systems that had been built later, and it was certainly faster at that task than the personal computers of that early time (Apple //es or the original IBM PC). People also used the 6600 for writing text, compiling, etc. The computer was a terrible match for that, but it was fast enough that the mismatch of capabilities still made it useful for the purpose.

Oh, and a quick aside: Today's computers are much faster, but much of that speed is taken away by software that's more functional and less efficient. I once shared an IBM PC (4.77MHz) among 16 users. If your computer software is generally efficient, and you're willing to reduce what it does, a single computer can serve a remarkably large number of users. Nobody worried about font selection, for example. So it was with the 6600; it could serve many users, if carefully managed.

Now for the story, which I can't confirm but I believe to be true. At the time GMU was a scrappy new local university. It was in a great location (Northern Virginia, which was expanding rapidly). However, although it was a state university, it had little support from the capital (power is divided among counties, and the rural counties easily outvoted Northern Virginia).

GMU's president, President Johnson, had much bigger plans for GMU. So Johnson arranged for a hand-me-down supercomputer (the 6600) for nearly nothing. This computer was considered obsolete by then, and it was terrible at text processing. Even so, it was still fast enough to be useful, and more importantly, it was a supercomputer, even though it had been obsoleted. My understanding is that Johnson sold the pitch of "GMU has a supercomputer now" to all the local businesses, asking them to help fund GMU & cooperate in various ways. The local businesses did, greatly so.

I suspect most of the businesses knew the "supercomputer" wasn't a current speed champion, and as I far as I know no lies were told. But that wasn't the point. The pitch "we have a supercomputer" was good enough to make it much easier for some people in various businesses to justify (to their cohorts) donating to GMU & working with GMU. Many businesses in Northern Virginia wanted GMU to succeed, because they would greatly benefit long-term with a good university in their backyard... they just needed a good-enough story to win over those who might veto it internally. This "we have a supercomputer" pitch (and other things) worked. Once GMU got "real" money, they invested it, e.g., they upgraded (and kept upgrading) to better equipment. GMU grew quickly into quite a powerhouse; it's now the largest state university in Virginia. One of GMU's distinctives is the huge number of connections it has to local businesses and government organizations. Originally this was because GMU couldn't rely on state funding, but the need to interconnect with local organizations led over time to an emphasis on applying academic knowledge to real-world problems. It's interesting to see how problems + scrappiness can lead to long-term cultures within an organization. Johnson passed away in 2017 ( https://www2.gmu.edu/news/427091 ), but his legacy continues.

mark-r · on Feb 15, 2020

The 6400 at the University of Minnesota could time-share over 300 simultaneous users.

flomo · on Feb 16, 2020

Gopher, not csci. The University certainly did promote their Cray supercomputing connection for local business support, just as GP describes. Cray computer was the cover model for the coursebook, while us plebs really got to timeshare on a BSD VAX :)

tpmx · on Feb 15, 2020

Love it. :)

dwheeler · on Feb 15, 2020

Thanks! I think it's important to note that to my knowledge, no lies were told.

Those in organizations who delved into the details found that yes, it's a supercomputer. It's a really obsolete one. But it is more capable than the PCs. More importantly, it showed that the university was very resourceful, and imagine what it could do if it got real money! In addition, having a good university next door was attractive, but only if there was a plausible path to get there.

But that was a complicated story to tell, so this whole thing provided a simpler story: "They have a supercomputer". All big organizations have bureaucratic inertia, and this simpler story was useful for people who didn't want to go into the details but might veto something.

My wife calls this "theater", and that's a good word. In this case, it was theater that helped counter bureaucratic inertia.

GMU took the few resources it had, and did a lot with them. People saw that, gave them real resources, and GMU quick grew into a powerhouse. I think that's an interesting story, and the 6600 played a part in it.

monocasa · on Feb 15, 2020

To be fair a 6600 was a great choice too to have students learn on at the time. It's basically a Cray-0, and would be representative of the architecture of supercomputers up through the mid/late nineties.

Hell, at the time, given the choice between a Cray and two 6600s, for students I'd lean two 6600s.

mark-r · on Feb 15, 2020

It was the general student computer at the University of Minnesota, and the uses it were put to were all over the map. Despite being optimized for number crunching it was an amazing general purpose computer.

The most interesting architectural feature was that all I/O was relegated to peripheral processors so that the main CPU could run unimpeded.

Gibbon1 · on Feb 16, 2020

I think UCLA had a CDC 6600 being used as a time share system. My memory is very hazy though. We used it remotely via 150 baud terminals. On hot days occasionally bits would get scrambled on the way there and back.

10 PRINT "YOUR MOMMA" came back as 10 PRINT "KOUR IOMMA"

BadThink6655321 · on Feb 15, 2020

Supercomputers need compilers, et. al. And now I need to go back and revisit the Pascal compiler for those machines...

dfox · on Feb 15, 2020

The toolchain does not have to run on the supercomputer itself. Most supercomputer architectures have self-hosting toolchains there are also supercomputers that do not. Also compiling or even debugging programs directly on the machine is in most cases plain waste of (expensive) computing resources and it is not that one would ever have only the supercomputer and not any other computers (in fact, many traditional supercomputers cannot boot on their own and have to be booted by some kind of frontend computer).

kps · on Feb 15, 2020

> many traditional supercomputers cannot boot on their own and have to be booted by some kind of frontend computer

CDC went all in on this. Their large computers had ‘peripheral processors’ (for the CDC6600, based on the CDC160) that essentially ran the OS, leaving the main processor free for what it was good at.

monocasa · on Feb 15, 2020

You'd be surprised how true that is today too.

The Wii and WiiU run most of the "OS" on an external ARM core "Starlet"/"Starbuck". All I/O, network access, encryption for SSL, booting the main cores, the USB stack, etc. is on that ARM core, not the main PowerPC cores so those can be dedicated to running "game code".

The Cell in the PS3 is a SPI slave that gets booted by an external processor.

The PS4 is the same way, and that external core holds most of the filesystem (how game updates happen with the console "off").

And then most SoCs (including most AMD and Intel chips) boot system management cores (ME/PSP/etc.) that then is responsible for initializing the rest of the core complexes on the chip. Pretty much every ARM SoC sold these days will talk about how they have a CortexM3 in addition to they CortexA cores; that's what it's for. SiFive's Linux capable chip has one of their E series cores in addition to their U series cores for the same purpose on the RISC-V side of things.

sjburt · on Feb 16, 2020

> Pretty much every ARM SoC sold these days will talk about how they have a CortexM3 in addition to they CortexA cores; that's what it's for.

Usually the advertised-on-the-datasheet M cores are available for user code and you'll get a virtual serial port or some shared memory to go between them and the big core. I don't doubt that there are additional hidden cores taking care of internal power management, early boot etc.

At least, this is how it is on the STM32MP1 and the TI Sitara AM5 socs.

mark-r · on Feb 16, 2020

That overstates the case a little. The peripheral processors didn't run any user code, in particular the compilers still used the main processor.

BadThink6655321 · on Feb 15, 2020

You are confusing theory with practice. Back then, computers were expensive and rare. The general student population at my university had two choices: the CDC 6400, or an HP time-sharing system that ran BASIC. A friend and I actually wrote a complete toolset in BASIC that allowed students to learn HP-2100 assembly language. (I did the editor and assembler, he did the emulator and debugger). But writing a PASCAL cross-compiler in BASIC, that output a paper tape of COMPASS, or binary? No way. Or FORTRAN, SNOBOL, Algol, ...

mark-r · on Feb 20, 2020

I learned FORTRAN on a HP 2000C timesharing system, using a FORTRAN emulator written in BASIC. It was dog slow, but it worked. I have no idea where the emulator came from.

tpmx · on Feb 15, 2020

Did they also have that at launch time(s), back in the 60s/70s?

B1FF_PSUVM · on Feb 15, 2020

I believe so, the comp. arch. textbooks were pretty emphatic on the description of the CDC 6600 as "full of peripheral processors", e.g. for I/O and printing, etc. Deliberately, not something tacked on later as an afterthought.

dfox · on Feb 15, 2020

I cannot find any information about whether one of the peripheral processors in CDC 6600 (which were full-blown CPUs, not glorified DMA engines as in Cray-1 or System/360) has some kind of system management role. On the other hand Cray-1 needs not one, but two frontend computers to work (one is DG Nova/Eclipse supplied by Cray which actually boots the system and second one has to be provided by customer and is essentially an user interface)

vajrabum · on Feb 17, 2020

The peripheral processors were integral to the CDC 6600 and it's successors (6400,6200,6700,7600, and Cyber 70 series) built inside the same mainframe cabinet. In the 6000 and Cyber 70 series There were '10 of them' that shared the same ALU with a barrel shifter that would shift 12 bits after each instruction. That shift would load the registers for the 'next PP' in a round robing fashion. They were pretty primitive. There were no index registers so self modifying code was a regular thing and polling was the only method of IO supported at least at first. I think the later models did support some sort of DMS. The PPs did have access to the 60 bit main memory and there was an instruction exchange jump or XJ which would load the register block and switch between user and supervisor modes.

pinewurst · on Feb 16, 2020

What do you mean? The CDC OSes actually ran on the PPs and for all intents and purposes managed the system. The two-headed video console was hardwired to a PP as well, and used to managed the system.

tpmx · on Feb 15, 2020

That makes a lot of sense. I bet a lot of these "scientific" machines ended up primarily being used for software development...

http://www.lysator.liu.se/hackdict/split2/pascal.html

> Pascal n. An Algol-descended language designed by Niklaus Wirth on the CDC 6600

Someone · on Feb 15, 2020

”The 63 character set, quite popular in the Netherlands and the University of Austin, Texas, signalled the line termination by two (or more) null-“bytes” at the end of a 60-bit word.”

So, did Dijkstra invent the null-terminated string? (What else links Austin and the Netherlands in computing at that time?)

eesmith · on Feb 15, 2020

I can safely conclude that the content at the site cannot lead us to conclude that Dijkstra invented the nul-terminated string.

https://www.museumwaalsdorp.nl/en/history/comphistory/comput... says:

> A Control Data 6400 system was installed in May 1974 ...

> The Laboratory, like all other computer centres in the Netherlands, had opted for the so-called 63 character set. Control Data only tested systems with the 64 character set in America. Unsatisfactory code or code from “new” programmers yielded one or more errors with almost every new release, which we corrected at the TNO Physics Laboratory and made public with a lot of misgivings through the Problem Reporting System mechanism (PSR). Every two weeks a set of microfiches with all complaints and solutions collected worldwide was sent to all computer centres by Control Data. At every release level, it was exciting whether the errors we found were the first to report or that our colleague from the University of Arizona was going to blame

That gives a mismatch between "University of Austin, Texas" and "University of Arizona".

https://en.wikipedia.org/wiki/Edsger_W._Dijkstra says he joined the University of Texas at Austin in 1984. In the 1970s he lived in the Netherlands.

https://en.wikipedia.org/wiki/Null-terminated_string says that the PDP-11 in 1970 had NUL-terminated strings ("Null-terminated strings were produced by the .ASCIZ directive of the PDP-11 assembly languages and the ASCIZ directive of the MACRO-10 macro assembly language for the PDP-10.")

This was when Dijkstra was at the Eindhoven University of Technology.

Therefore, whatever is described at the link cannot be used to conclude that Dijkstra had anything to do with nul-terminated strings.

Someone · on Feb 16, 2020

Thanks. I searched a bit further, and found http://www.cs.utexas.edu/users/EWD/transcriptions/EWD04xx/EW..., where Dijkstra critizises how line endings were stored in the 64 character set:

”Niklaus told a terrible story about CDC-software. With 10 six-bit characters (from an alphabet of 63) packed into one word, CDC used the 64th configuration to indicate "end of line"; when for compatibility reasons a 64th character had to be added, they invented the following convention for indicating the end of a line: two successive colons on positions 10k+8 and 10k+9 —a fixed position in the word!— is interpreted as "end of line". The argument was clearly that colons hardly ever occur, let alone two successive ones! Tony was severely shocked "How can one build reliable programs on top of a system with consciously built-in unreliability?". I shared his horror: he suggested that at the next International Conference on Software Reliability a speaker should just mention the above dirty trick and then let the audience think about its consequences for the rest of his time slice!”

I don’t think that

jhayward · on Feb 15, 2020

> What else links Austin and the Netherlands in computing at that time?

It could be related to the fact that UT Austin wrote and maintained their own OS (UT-2D) for the CDC/CYBER architecture, and maintained it well in to the late 1990's/early 2000's.

tpmx · on Feb 15, 2020

> So, did Dijkstra invent the null-terminated string?

Seems plausible to me. Or maybe some grad student working with him, and noone really cared who "invented it", since it seemed like such an obvious idea....

pvg · on Feb 15, 2020

it was painful to do text manipulation

Another way to think of it is that it was fully emoji-and-variant-selector-ready way ahead of its time.

protomyth · on Feb 15, 2020

The T. J. Watson Jr memo gives a bit of insight into what the competitors thought of the CDC6600 https://www.computerhistory.org/revolution/supercomputers/10...

tomohawk · on Feb 15, 2020

Cray's response was priceless:

"It seems like Mr. Watson has answered his own question."

rbanffy · on Feb 17, 2020

Interesting. It looks typewritten, but it has proportional spacing. I'm not aware of any typewriter of the time that could do that. The Selectric Composer could, but it was released in 66 and the 6600 predates it by two years.

The different weights also make it look like a mechanical typewriter rather than an electric one, which would be an odd choice for the office of TJW.

stonogo · on Feb 22, 2020

All IBM Executive-series typewriters made after World War II featured proportional spacing.

exmadscientist · on Feb 15, 2020

Nice work! Any article that features results from a test PCB is a winner in my book!

> Let me know if you find similar devices.

The Rohm high-frequency BJTs look promising: 2SC3838K is the fastest (fT wise), but there are several others (see page C28 of Rohm's 2019 catalog). I only checked the datasheet for the one, but it's got a very nice Figure 10, showing that it'll probably switch fastest around 20mA collector current, and should be about 3x faster there than the MMBTH10L at 4mA.

blattimwind · on Feb 15, 2020

The thing is that if you want logic to go fast proper you don't use saturation logic (DTL/RTL/TTL), but current steering logic (ECL). That way your usable clocks gets much closer to the fT of the transistors involved, instead of being limited to a tiny fraction of it.

That's how Cray built supercomputers after the CDC6600/7600.

TheOtherHobbes · on Feb 15, 2020

At the cost of insane power/heat budgets - although one of the nice things about ECL is the power draw is relatively constant because the transistors don't saturate, and you don't get the spikes and PSU hash you get with TTL etc.

ECL was amazing for its time, but I'm honestly more impressed by modern PC/phone electronics. PCBs and chip designs are mass-produced commodity products clocked at microwave frequencies - sometimes with battery power.

This is incredibly impressive compared to the state of the art in the 60s and 70s. And it's taken for granted as an everyday thing.

fanf2 · on Feb 15, 2020

I recently found a 1980s Cray installation guide which has a lot more detail on the power, cooling, and other physical requirements. https://news.ycombinator.com/item?id=22284518

dfox · on Feb 15, 2020

One notable thing about the constant power draw of ECL and Cray-1 in particular is that power draw is constant enough that the logic supply is unregulated, just rectified and filtered output of 400Hz transformers (placed physically in the "bench part" of the chasis). What is regulated is the 208V/400Hz supply for the thing (produced by the motor generator shown in the article), but regulation of that IIRC involves manual turning of physical knobs during installation/maintenance and is more about compensating for unstable mains on the input side.

thedance · on Feb 15, 2020

It seems like the power consumption of this computer must have been pretty spectacular anyway. 60mW per gate is no joke.

neonate · on Feb 15, 2020

"Optimize one basic thing very well, replicate it, and use it as a hierarchical building block."

mark-r · on Feb 15, 2020

I think Cray kept that basic philosophy with all his subsequent designs.

sitkack · on Feb 15, 2020

All successful systems are self similar.

GnarfGnarf · on Feb 15, 2020

I learned FORTRAN in 1965 on a CDC 3100. Real core memory, tape drives, vacuum drum card reader, Calcomp plotter. Super slick.

I took a computer science course at Dalhousie University in 1970, used a CDC 6400. My professor was obsessed with pseudo-random numbers, the 60-bit word size was a godsend.

larusso · on Feb 15, 2020

Oh I love these detailed runs into the past. Also the fact that physics and chemistry where the main driver. Software became the most prominent figure in computing or at least that is how I perceived this growing up. I learned very late what the advances in semi conductor development really meant and how important they were.

Gibbon1 · on Feb 15, 2020

It was amusing reading that the transistor used int he CDC6600 was gold doped. Turns out that common switching diodes like the 1n4148 are gold doped for the same reason. Increases switching speed. You pay for it though, they leak like a sieve esp at higher temps.

blattimwind · on Feb 15, 2020

1N4148 are also excellent photodiodes if you don't want that.

todd8 · on Feb 17, 2020

What I remember about assembly language programming the CDC6600 was how beautifully simple the machine's principles of operation were at the register and instruction level.

In 1974 I learned CDC6600 assembly language in grad school. In comparison to IBM360 assembly language programming I had previously done, the CDC 6600 was so straightforward. It took perhaps one day to learn all of it. I still have the small book that I learned from, Assembly Language Programming for the Control Data 6000 Series and the Cyber 70 Series by Ralph Grishman.

Addressing: The machine had unusual data layout, it was word, not byte addressable and each word was 60 bits long, not 16 or 32 bits long.

Data Format: Text was stored in six-bit fields, ten per word and so the characters weren't directly addressable. Furthermore, integers were stored in 1's complement, not the more common 2's complement or sign-magnitude format. There was a single 60-bit floating point format (1-bit sign, 11-bit exponent, and 48-bit coefficient).

Speed: Floating point multiplication took 1000ns, but the 6600 could do two floating point multiplies, a floating point add, and an integer add simultaneously if coded carefully.

Memory: The memory of the 6600 was stored in a ferrite core memory and it had a maximum size of 128K words. Later, there were slower, larger memory tiers as options.

I/O: This was handled by peripheral processors that had access to the main memory and could offload data transfers to devices so that the central processing unit didn't have to handle expensive interrupts (expensive because the out-of-order execution of instructions meant that saving and restoring the state of the CPU was relatively time-consuming).

Registers: There were 8 X-registers. These are 60-bit registers that are used as the operands in the assembly language instructions. There are also 8 18-bit A-registers that are used for addressing and an additional 8 18-bit B-registers for use as loop indexes, etc.

Instructions: Opcodes are always 6 bits, there are only 71 instructions (one of the 64 possibilities in the 6-bit op code is further divided into 8 instructions). The instructions were simple:

    IX4 X5+X6   ; Integer sum of X5 plus X6 goes into X4

Such an instruction takes 15 bits, six for the opcode, three each for the three registers.

This was all a lot less to understand than the intricacies of the IBM360 principles of operation. The IBM360 of the time had instructions like TRANSLATE-AND-TEST or the SHIFT-AND-ROUND-DECIMAL. Here for example are the first four paragraphs explaining the TRANSLATE-AND-TEST instruction from the IBM 360 Principles of Operation:

> The eight-bit bytes of the first operand are used as arguments to reference the list designated by the second operand address. Each eight-bit function byte thus selected from the list is used to determine the continuation of the operation. When the function byte is a zero, the operation proceeds by fetching and translating the next argument byte. When the function byte is nonzero, the operation is completed by inserting the related argument address in general register 1, and by inserting the function byte in general register 2.

> The bytes of the first operand are selected one by one for translation, proceeding from left to right. The first operand remains unchanged in storage. Fetching of the function byte from the list is performed as in TRANSLATE. The function byte retrieved from the list is inspected for the all-zero combination.

> When the function byte is zero, the operation proceeds with the next operand byte. When the first operand field is exhausted before a nonzero function byte is encountered, the operation is completed by setting the condition code to O. The contents of general register 1 and 2 remain unchanged.

> When the function byte is nonzero, the related argument address is inserted in the low-order 24 bits of register 1. This address points to the argument last translated. The high-order eight bits of register 1 remain unchanged. The function byte is inserted in the low-order eight bits of general register 2. Bits 0-23 of rcgister 2 remain unchanged. The condition code is set to 1 when the one or more argument bytes have not been translated. The condition code is set to 2 if the last function byte is nonzero.

> ... [There's a lot more]

kgran · on Feb 15, 2020

Interesting to find this on HN as I'm reading a book on Seymour Cray and his supercomputer adventures (Charles J. Murray's "The Supermen: The Story of Seymour Cray and the Technical Wizards behind the Supercomputer"). Not too much technical intricacies there, but still an interesting read from a general/histori perspective.

dieselerator · on Feb 15, 2020

The article shows us transistor level logic circuits to study, but I think the quick answer is as the architect Seymour Cray gets the credit.

qubex · on Feb 15, 2020

Definitely.

For an insight into that great man’s life, I highly recommend reading The Supermen: The Story of Seymour Cray and the Technical Wizards Behind the Supercomputer by Charles J. Murray. I remember reading it back when I was in high-school so it’s more than twenty years old by now, but it’s still an amazing account of how those amazing people built those stunning machines.

jacobwilliamroy · on Feb 15, 2020

I hope these people are all dead now so you never have to meet them and accept that they're not as amazing as the book makes you think they are.

CodeWriter23 · on Feb 15, 2020

I met Al Marshall, inventor of Token Ring Networking, some 20 years after. I had never heard of him until the guys I was working with ran into him at NetWorld, and we went to dinner. So he wasn’t one of my heroes. I also was more a fan of non-deterministic Ethernet.

We went to dinner and the conversation started around the Buzz at the show: Shell Oil was deploying a 500 megabit network in the Dallas Area. Like a half hour and a dozen topics later, Al just blurts out “they need that bandwidth to ship around the imaging data they collect from their surveys”. It came across like he had two brains, one that was engaged in the conversation, and the other figuring out what in the hell are these guys doing with all that bandwidth. Which was a lot in the day of 14.4k modems and T1 lines. After dinner when we parted ways with Al, our conversation was all about how we thought his brain might work.

kken · on Feb 15, 2020

Some of Crays speeches are available on Youtube

https://www.youtube.com/watch?v=8Z9VStbhplQ

The Q&A section is quite interesting. His knowledge of technical details is very impressive and you can feel the tremendous amount of respect that was bestowed on him by the audience.

mark-r · on Feb 15, 2020

I met Seymour Cray, and he was definitely as amazing as you can imagine. He's also very much dead.

dboreham · on Feb 16, 2020

I held a door open for him in 1989. Unfortunately I didn't realize who he was until later.

mark-r · on Feb 18, 2020

Reminds me of a story I heard on the radio.

A guy went out to eat in a New York restaurant. A couple came in, a pale white dude accompanied by the most beautiful black woman this guy had seen. They were seated close to him. It was immediately obvious to him that they were getting much better service than him - for instance they got their food almost immediately after ordering, while he was still waiting for his. So he started to heckle them.

Fast forward to a year later. David Bowie has died, and his picture is on the front cover of every magazine. This guy finally realizes who it was he'd been heckling.

protomyth · on Feb 15, 2020

Killed by a drunk driver in 1996 just as he was starting to develop his version of a massively parallel machine. Friggin irresponsible people ruin the world.

dang · on Feb 16, 2020

Care to say more? How did you meet him? what was it like?

mark-r · on Feb 17, 2020

I was part of a small group of students from the University of Minnesota who were invited to the Cray plant in Chippewa Falls WI. The highlight was a visit with him in his office which didn't last long. He was very gracious. I don't remember much of what he said, except for the pumpkin. He pointed to a pumpkin on his desk and proclaimed that it was the Cray-3. His daughter had grown it in the garden, and knowing he was already working on a Cray-2 project decided that it should be called the Cray-3.

To give you an idea of when this was, I think they were just finishing the final tests of serial #5 of the Cray-1. They were very proud of the cooling system and invited us to touch the panels.

qubex · on Feb 15, 2020

Good morning sunshine!

kragen · on Feb 15, 2020

Burton Smith is still around, I think. My friend Norm, who worked on Stretch, died a couple of years ago. He was pretty fucking amazing.

convolvatron · on Feb 15, 2020

Burton passed a couple years ago :(

kragen · on Feb 15, 2020

Thanks. Sorry to hear it.

interrealmedium · on Feb 15, 2020

>10 MHz

>1964

That's insane. What's even more insane is that a bit over 20 years later homecomputers reached that frequency. And in the next decade they reached over 100 MHz. - Pure lunacy.

Posted from my 5 GHz homecomputer.

dwheeler · on Feb 15, 2020

> That's insane. What's even more insane is that a bit over 20 years later homecomputers reached that frequency. And in the next decade they reached over 100 MHz.

The CDCs still had a good run. This line was originally released in 1964. It started at 10MHz, but that was 60 bit words, and special floating-point systems. IIRC floating-point multiplies were only one clock cycle; if that's correct, it took 100 nanoseconds.

The Apple II came out in 1977, 1MHz, 8 bit CPU and no floating-point circuits. You had to use many cycles to do any floating point, and typically you only used 32-bit floating point (because it was painful enough there). A single 32-bit floating point multiply took 3-4 milliseconds according to: https://books.google.com/books?id=xJnfBwAAQBAJ&pg=PA26&lpg=P...

The original IBM PC came out in 1981. Its clock was 4.77 MHz. But again, that was misleading. Internally the 8086 was a 16-bit CPU but its memory I/O was only 8 bits wide. It didn't normally come with a floating-point processor. There was one, the 8087, and I think the original IBM PC had a socket for it, but it cost big $$$ and the 8087 wasn't actually available for purchase until ~6 months after the PC's release. That one could go 4-10MHz. If you bought a coprocessor, you were finally getting to somewhat similar speeds for numerical calculations... but that was 16+ years later.

abbeyj · on Feb 15, 2020

> Internally the 8086 was a 16-bit CPU but its memory I/O was only 8 bits wide.

Did you mean 8088 here?

dwheeler · on Feb 15, 2020

Yes, 8088. Thanks for the fix. The 8088 was a 16-bit CPU with an 8-bit bus. The 8086 was a 16-bit CPU with an actual 16-bit bus.

Accujack · on Feb 17, 2020

Interestingly, the original "sx" designation for Intel 386 chips (80386sx) meant the same sort of thing... the 386sx was a 32 bit chip with a 16 bit bus. The dx was 32/32.

A product generation later, Intel changed what this meant to indicate whether or not the CPU had an on-chip FPU.

B1FF_PSUVM · on Feb 15, 2020

> If you bought a coprocessor,

Those were the days you looked at the TRW multipliers (packed in 64+ pin DIPs) advertised in trade magazines like auto buffs might at V8 engines ...

Intel sold their own 8087, until it was digested into the vast expanses of CPU chips opened by scaling.

tpmx · on Feb 15, 2020

Seymour Cray had a number of fantastic early designs:

CDC-6600, 1964: 60-bit processor @ 10 MHz, 2 MIPS, 1 MFLOPS

CDC-7600, 1967: 60-bit processor @ 36 MHz, 15 MIPS, 36 MFLOPS

Cray-1, 1975: 64-bit processor @ 80 MHz, 80 MIPS, 160 MFLOPS

I think this means that if a corresponding development in video coprocessors had been taking place (there wasn't really a recognized need for them back then, as far as I can tell), the CDC-6600 could have been running Wolfenstein 3D decently well in 1964, 28 years before the launch in 1992.

And the CDC-7600 could have been running DOOM decently well in 1967, 26 years before the launch in 1993.

kragen · on Feb 15, 2020

The first commercial GPU might have been the LDS-1 from Utah in 1969, which was indeed used for FPS gaming: https://en.m.wikipedia.org/wiki/LDS-1_(Line_Drawing_System-1...

But it built on the history of “display processors” already in use: http://www.cap-lore.com/Hardware/Wheel.html

I had thought that computers of the time didn't have enough RAM for a framebuffer, but evidently the CDC6600 did, with what we would call 982 kilobytes today.

RantyDave · on Feb 15, 2020

Dedicated matrix multiply in 1969. Damn! people in the past, they were nuts.

tpmx · on Feb 17, 2020

I get this realization all of the time when reading about and exploring past innovations (it's one of my primary hobbies). The really smart people back then were just as smart as the really smart people today. It's just they had crappier tech and less knowledge to work with.

"On the shoulder of giants" is an old and very true statement.

Accujack · on Feb 17, 2020

It was originally meant as a dig at the height of some fellow scientists, really.

tpmx · on Feb 15, 2020

Yeah, I guess bitmapped "GPUs" were considered an extreme luxuary/waste of money back then.

kragen · on Feb 17, 2020

I suspect they were considered an inefficient use of memory. The TX-0 and PDP-1 had screens of about a million points. For uses like plotting data, playing video games, or drafting mechanical linkages, this resolution is highly desirable — I've done CAD on a 320×200 CGA screen, and the screen resolution was extremely limiting. (For those who haven't used one, this resolution is such that you can fit 25 lines of 40-column text on it before readability starts to suffer.)

You could imagine putting something like a 320×200 CGA on a CDC 6600, taking what we would now call 16000 bytes of memory, at a cost of around US$40 000 in 1965 (https://jcmit.net/memoryprice.htm says). But it seems like it would be hard to justify that as a good alternative to adding those 16000 bytes to the machine's main memory, supplying an extra 1.6% of its address space with storage and allowing it to tackle problems that were, say, 5% larger.

(There may also have been a question of memory speed. The memory cited above that cost US$2.50 a bit was core, and maybe it had a 300 ns cycle time; a 320×200 display at 50 Hz minimally requires a 312.5 ns dot clock, and probably quite a bit faster than that because of the blanking intervals, so you might have needed two to four memory banks just to satisfy the RAMDAC's hunger for pixels. Of course they wouldn't have called it a "RAMDAC" or "dot clock" at the time.)

RantyDave · on Feb 15, 2020

There still are GPU's that work by rendering the entire display list for every line. Yes, ie thousands of times a second. https://www.ftdichip.com/EVE/EVEProducts.htm

tpmx · on Feb 17, 2020

That's fascinating! Can you tell us more about their typical use cases?

I mean.. I did see https://www.ftdichip.com/EVE/EVEApplications.htm .. but, in reality?

Are you based out of Taiwan?

krallja · on Feb 15, 2020

Well, now I know what story is going to show up on HN next month

“Wolfenstein 3D on my basement CDC 6600”

tomohawk · on Feb 15, 2020

The Cray's were hard 64 bit machines - smallest addressable unit was 64 bits. This had a lot of implications for software and computer languages that assumed you could easily access bytes.

tpmx · on Feb 17, 2020

Ahh..

blattimwind · on Feb 15, 2020

ECL logic systems reached effective clock frequencies in excess of 500 MHz in the late 60s or so. It was extremely fast compared to contemporary RTL/TTL logic.

ajross · on Feb 15, 2020

Exactly. It was MOS that lagged, not "transistors" or "computers", really. Transistorized microwave circuits (analog, not digital) in the GHz range were operating as early as the 1960's too.

MOS is just hard. It's hard to fabricate, it's hard to design (i.e. you need computers to make computers!), it's hard to scale. It required new chemistries be invented and new circuit design techniques to manage the (really tiny!) components. So for a long time you could have a big circuit in discrete bipolar components which ran fast or a tiny one in MOS which was slow.

But MOS was tiny and MOS was cheap. So by the early 90's it had caught up in speed with all but the fastest discrete technologies (GaAs was "right around the corner" for like two decades as it kept getting lapped by routine process shrinks -- now no one remembers it) and the market abandoned everything else.

kken · on Feb 15, 2020

>MOS is just hard

I think the important point is that "MOS scales". All the bipolar technologies never had anything like Dennard scaling, which was the backbone of Moores Law.

blattimwind · on Feb 17, 2020

Case in point, you could get bipolar ECL RAM in the 80s with access times of around 1-2 ns (which is at least four times faster than the fastest DDR-SDRAM in 2020). Except those things would have a few kilobits at most and burn a Watt or two; an entire 16 GB stick of DDR4 doesn't require much more than that. (This is SRAM of course; you can't build good DRAM on a bipolar process, and MOS SRAM is much faster than DRAM as well. However, MOS SRAM in the 80s would have access times of 20-150 ns; it's typically the suffix of the part number, e.g. 62256-70)

mark-r · on Feb 15, 2020

I worked at the company Jim Thornton founded after he left CDC, Network Systems. Each network adapter box had a custom processor implemented in ECL.

cptnapalm · on Feb 15, 2020

I had not heard of ECL before. Thank you for popping in with this.

bonzini · on Feb 15, 2020

These days ECL lives on as CML, which is similar and mostly used for signal transmission. It's very fast so HDMI uses it. Some crypto circuits use CML logic too, because it's less susceptible to side channel attacks.

dfox · on Feb 15, 2020

Almost all modern fast serial interfaces are descended from ECL. Often with a twist that while receiver is ECL/CML-style long-tailed pair (which is the obvious implementation of comparator), the transmit side is normal CMOS totem-pole output stage coupled with some passive network to produce right voltage levels (and right output impedance).

analognoise · on Feb 16, 2020

Wait, what? You can do a MOS long-tail pair just fine, can't you? Similarly for a totem pole with BJTs.

They're topological circuit constructions that work for either family - aren't they?

rbanffy · on Feb 16, 2020

> The design of the machine is well documented in a book by James Thornton, the lead designer

The type on the twin CRTs of the console is very interesting. The cover of the book shows a sample of it and the slight imprecisions of the beam deflection give it a whimsical quality, as if the fastest computer of its time used Comic Sans to communicate with its operator.

burlesona · on Feb 15, 2020

Does anyone know why 60 bits and not some power of two? How did that work? Or am I just being silly and it doesn’t matter? :)

retrac · on Feb 15, 2020

At the time, many computers were decimal, or had word lengths like 18, 24, 36 or 48, or even 72 bits. Characters were usually 6 bits. The power-of-two standard based around an 8-bit byte didn't exist yet.

Whoever picked 60 bits (Cray?) was almost certainly thinking in octal, not hex. 60 bits is multiple of 3, it fits in 20 octal digits, and it holds 10 characters. Most importantly, a 60-bit floating point number is precise enough for just about any calculation.

AnimalMuppet · on Feb 15, 2020

IIRC, some early mainframes (IBM and maybe Sperry?) had 36-bit words. 36 bits was enough for 12 (decimal) digits of accuracy with fixed-point arithmetic, or 10 digits for floating point. It was good enough for atomic calculations (where the difference between an atom's mass and the masses of the two atoms it breaks into is a very small fraction of the initial mass).

ScottBurson · on Feb 16, 2020

The DEC PDP-10 was a 36-bit machine.

ThomasBHickey · on Feb 16, 2020

With an interesting and functional set of 'byte' instructions where you could specify the number of bits per chunk. IIFRC 6 or 9 bits were typically used for characters, but I think there was a 5 bit character set in use as well.

burlesona · on Feb 15, 2020

That makes sense, thanks for sharing!

baybal2 · on Feb 15, 2020

As transistors are getting smaller, it's said that when transistors will approach <10nm gate sizes, RCL may reappear again because at these sizes semiconductors will begin to lean so much, that FET based logic will no longer have advantage in the current draw over current based logic families.

dfox · on Feb 15, 2020

I would not expect return to RTL/DTL but iwould not be surprised by use of NMOS-style pull-up transistors/resistors in combination with traditional CMOS logic. You can make NMOS gate in CMOS process quite easily and it comes out significantly smaller. Doing DTL in CMOS process seems somewhat pointless given the fact that simplest way to make diode-like thing in CMOS is transistor. And then there is the issue of small fanout of RTL/DTL.