I remember reading some story way back about some ... Sony exec? Someone saying that there were people writing engine code in Excel to be able to track the coprocessors and get the timing right.
Does anyone know whether there's any truth to that story? I'm learning FPGA design, and even there, where timing is fairly controlled, people end up recommending design patterns where you gate stuff much like you would in general software design (expecting one branch to be faster than the other, etc.).
I bet they were writing Vector Unit assembly in Excel. The VLIW design meant that you literally packed 2 assembly instructions together into a 64-bit word. Each side had its own subset of execution units it could use and its own separate latency for each operation. So, to keep track of everything going on, you pretty much had to write out two columns of code containing your own no-op notation to represent the delay cycles between instructions. That would be much more convenient to do in Excel than in a linear text editor.
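For anyone who hasn't seen a VLIW encoding before, here's a minimal Rust sketch of the packing idea; the exact bit positions and the example encodings are assumptions, just to show the shape:

```rust
/// Sketch of packing a VLIW pair: two 32-bit instruction halves
/// ("upper" and "lower") combined into one 64-bit issue word.
/// Assumption: upper half in the high 32 bits, lower half in the low 32 bits.
fn pack_vliw_pair(upper: u32, lower: u32) -> u64 {
    ((upper as u64) << 32) | (lower as u64)
}

fn main() {
    // Hypothetical encodings, only to demonstrate the packing.
    let upper = 0x4BE0_02BC_u32; // e.g. a vector multiply on one side
    let lower = 0x8000_033C_u32; // e.g. a load/branch/EFU op on the other
    let word = pack_vliw_pair(upper, lower);
    assert_eq!(word >> 32, upper as u64);
    assert_eq!(word & 0xFFFF_FFFF, lower as u64);
    println!("packed issue word: {word:#018x}");
}
```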
The N64 RSP is similar. I've been playing with an N64 RSP assembler that understands how instructions can dual issue and gives warnings when two instructions on the same line can't dual issue, so I can definitely see myself using Excel back in the day to help manage the complexity.
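For a rough picture of that check (the RSP can pair one scalar-unit op with one vector-unit op per cycle), here's a minimal Rust sketch of that kind of lint; the mnemonic-prefix classification is a toy simplification, not how a real decoder would do it:

```rust
/// Which RSP execution unit an instruction targets (simplified).
#[derive(Debug, PartialEq)]
enum Unit {
    Scalar, // SU: loads/stores, branches, integer ALU
    Vector, // VU: the "v"-prefixed vector computational ops
}

/// Toy classifier: a real assembler would decode the opcode properly;
/// here we just look at the mnemonic prefix.
fn unit_of(mnemonic: &str) -> Unit {
    if mnemonic.starts_with('v') { Unit::Vector } else { Unit::Scalar }
}

/// Warn when two instructions written on the same line can't dual issue,
/// i.e. when they would contend for the same unit.
fn check_pair(a: &str, b: &str) {
    if unit_of(a) == unit_of(b) {
        eprintln!("warning: `{a}` and `{b}` target the same unit and won't dual issue");
    }
}

fn main() {
    check_pair("vmudn", "lqv");   // vector op + vector-register load (goes through the SU): can pair
    check_pair("vmudn", "vmadh"); // two vector ops: warns
}
```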
Is that the bass assembler, or something else? I've been playing around with the default assembler setup in libdragon, but I'd be curious about trying something more intuitive.
It's an assembler I'm writing as a Rust proc_macro. Lets you do fun stuff like share constants between Rust running on the main CPU and the RSP, and handles some runtime code generation pretty nicely
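To give a rough idea of the "shared constants" bit, here's a stripped-down sketch (not the actual macro, and the names here are made up): define the value once in Rust and splice the same value into the RSP source that gets assembled, so the CPU side and the microcode can't drift apart.

```rust
// Stripped-down sketch (not the real proc_macro; names are hypothetical).
// The core idea: one Rust `const` is visible both to CPU-side code and to
// the RSP assembly source that gets assembled, so they can never disagree.
const DMEM_VERTEX_BUF: u32 = 0x0040; // hypothetical DMEM offset

/// CPU side: use the constant directly.
fn cpu_side() {
    println!("CPU writes vertices to DMEM offset {DMEM_VERTEX_BUF:#06x}");
}

/// RSP side: splice the same constant into the microcode source before it
/// is assembled. A proc_macro does this at compile time rather than runtime.
fn rsp_source() -> String {
    format!("li $t0, {DMEM_VERTEX_BUF}    # same offset, shared with Rust\n")
}

fn main() {
    cpu_side();
    print!("RSP source:\n{}", rsp_source());
}
```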
The story I heard was of a very, very heavily macro-based Excel sheet that would automatically highlight possible problems, including pipeline stalls in one of the VLIW units.
Been looking for confirmation or just a screenshot of that monster since :D
I can't find any evidence of actual spreadsheets. They probably would have been set up by each programmer to their own liking and maybe shared between programmers on the private Sony SDK message board.
Or maybe I have an overactive imagination.
But what I have heard is that the leaked PlayStation 2 SDK from 2005 does contain a Windows program called "VUEditor.exe". It's a custom IDE for writing VU assembly, and it acts more like a spreadsheet than a text editor, letting the programmer shift highlighted columns of instructions up and down.
It does show stalls and highlight issues. I couldn't find any screenshots, but one could find the SDK and try it out if they wanted.
Interestingly enough, it could be that said VUEditor embeds Excel. These days it seems like a forgotten trick, but all of MS Office on Windows provides components (via OLE/COM) that allow you to embed its features inside your own programs.
It might have made things simpler. From memory there was also a VU compiler that would take a single stream of instructions and then try to interleave the 'left/right' instructions of the VLIW in the most efficient order. It was never quite as good as hand coding - which is kind of satisfying to do once you've memorized the stats of the entire ISA.
One half of the instruction word was mostly devoted to vector instructions, which each had a 4-cycle latency and single-cycle throughput, to, say, multiply one 4-word vector of floats by another. The other half had the kitchen sink, like transcendental functions (sin/cos/tan/etc.), divide (x/y, 1/x), and so on. However, an instruction would stall if it depended on a register whose result was still in flight.
So if an instruction has a latency of 4 cycles, you could interleave 4 similar calculations on more of the dataset in parallel. In 4 cycles you could start 16 float calculations, and by the 8th cycle they would all be available. Though likely you'd be looping with something like Duff's device to maximize throughput. It would also mean you might use the vector calculations with a Taylor series in preference to the sine instruction, which had a latency of something like 15 cycles.
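Here's a tiny Rust sketch of the interleaving shape (no real VU code, and the latency only shows up as dependency structure): four independent 4-float multiplies are kept in flight per loop iteration, so no result is consumed before its 4 cycles are up.

```rust
/// A 4-float vector, standing in for one VU register.
#[derive(Clone, Copy, Default)]
struct Vec4([f32; 4]);

/// Component-wise multiply, standing in for a 4-cycle-latency VU op.
fn mul(a: Vec4, b: Vec4) -> Vec4 {
    Vec4([a.0[0] * b.0[0], a.0[1] * b.0[1], a.0[2] * b.0[2], a.0[3] * b.0[3]])
}

/// Process 4 vectors per iteration, mirroring how you'd interleave four
/// independent multiplies on the VU so none of them stalls waiting on a
/// previous result. Assumes the slice lengths are a multiple of 4.
fn mul_all(dst: &mut [Vec4], a: &[Vec4], b: &[Vec4]) {
    for i in (0..dst.len()).step_by(4) {
        // Four independent operations "in flight" at once: 16 floats started.
        dst[i]     = mul(a[i],     b[i]);
        dst[i + 1] = mul(a[i + 1], b[i + 1]);
        dst[i + 2] = mul(a[i + 2], b[i + 2]);
        dst[i + 3] = mul(a[i + 3], b[i + 3]);
    }
}

fn main() {
    let a = vec![Vec4([1.0, 2.0, 3.0, 4.0]); 8];
    let b = vec![Vec4([0.5; 4]); 8];
    let mut out = vec![Vec4::default(); 8];
    mul_all(&mut out, &a, &b);
    println!("first result: {:?}", out[0].0);
}
```

On a real VU this is instruction scheduling rather than data layout, but the dependency picture is the same.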