I remember reading some story way back about some ... Sony exec? Someone saying that there were people writing engine code in Excel to be able to track the coprocessors and get the timing right.
Does anyone know whether there's any truth to that story? I'm learning FPGA design, and even there, where timing is fairly controlled, people end up recommending design patterns where you gate stuff much like you would in general software design (expecting one branch to be faster than the other, etc.).
I bet they were writing Vector Unit assembly in Excel. The VLIW design meant that you literally packed 2 assembly instructions together into a 64-bit word. Each side had its own subset of execution units it could use and its own separate latency for each operation. So, to keep track of everything going on, you pretty much had to write out two columns of code containing your own no-op notation to represent the delay cycles between instructions. That would be much more convenient to do in Excel than in a linear text editor.
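For anyone who hasn't seen a VLIW encoding before, here's a minimal Rust sketch of the packing idea; the exact bit positions and the example encodings are assumptions, just to show the shape:

```rust
/// Sketch of packing a VLIW pair: two 32-bit instruction halves
/// ("upper" and "lower") combined into one 64-bit issue word.
/// Assumption: upper half in the high 32 bits, lower half in the low 32 bits.
fn pack_vliw_pair(upper: u32, lower: u32) -> u64 {
    ((upper as u64) << 32) | (lower as u64)
}

fn main() {
    // Hypothetical encodings, only to demonstrate the packing.
    let upper = 0x4BE0_02BC_u32; // e.g. a vector multiply on one side
    let lower = 0x8000_033C_u32; // e.g. a load/branch/EFU op on the other
    let word = pack_vliw_pair(upper, lower);
    assert_eq!(word >> 32, upper as u64);
    assert_eq!(word & 0xFFFF_FFFF, lower as u64);
    println!("packed issue word: {word:#018x}");
}
```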
The N64 RSP is similar. I've been playing with an N64 RSP assembler that understands how instructions can dual issue and gives warnings when two instructions on the same line can't dual issue, so I can definitely see myself using Excel back in the day to help manage the complexity.
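For a rough picture of that check (the RSP can pair one scalar-unit op with one vector-unit op per cycle), here's a minimal Rust sketch of that kind of lint; the mnemonic-prefix classification is a toy simplification, not how a real decoder would do it:

```rust
/// Which RSP execution unit an instruction targets (simplified).
#[derive(Debug, PartialEq)]
enum Unit {
    Scalar, // SU: loads/stores, branches, integer ALU
    Vector, // VU: the "v"-prefixed vector computational ops
}

/// Toy classifier: a real assembler would decode the opcode properly;
/// here we just look at the mnemonic prefix.
fn unit_of(mnemonic: &str) -> Unit {
    if mnemonic.starts_with('v') { Unit::Vector } else { Unit::Scalar }
}

/// Warn when two instructions written on the same line can't dual issue,
/// i.e. when they would contend for the same unit.
fn check_pair(a: &str, b: &str) {
    if unit_of(a) == unit_of(b) {
        eprintln!("warning: `{a}` and `{b}` target the same unit and won't dual issue");
    }
}

fn main() {
    check_pair("vmudn", "lqv");   // vector op + vector-register load (goes through the SU): can pair
    check_pair("vmudn", "vmadh"); // two vector ops: warns
}
```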
Is that the bass assembler, or something else? I've been playing around with the default assembler setup in libdragon, but I'd be curious about trying something more intuitive.
It's an assembler I'm writing as a Rust proc_macro. Lets you do fun stuff like share constants between Rust running on the main CPU and the RSP, and handles some runtime code generation pretty nicely
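To give a rough idea of the "shared constants" bit, here's a stripped-down sketch (not the actual macro, and the names here are made up): define the value once in Rust and splice the same value into the RSP source that gets assembled, so the CPU side and the microcode can't drift apart.

```rust
// Stripped-down sketch (not the real proc_macro; names are hypothetical).
// The core idea: one Rust `const` is visible both to CPU-side code and to
// the RSP assembly source that gets assembled, so they can never disagree.
const DMEM_VERTEX_BUF: u32 = 0x0040; // hypothetical DMEM offset

/// CPU side: use the constant directly.
fn cpu_side() {
    println!("CPU writes vertices to DMEM offset {DMEM_VERTEX_BUF:#06x}");
}

/// RSP side: splice the same constant into the microcode source before it
/// is assembled. A proc_macro does this at compile time rather than runtime.
fn rsp_source() -> String {
    format!("li $t0, {DMEM_VERTEX_BUF}    # same offset, shared with Rust\n")
}

fn main() {
    cpu_side();
    print!("RSP source:\n{}", rsp_source());
}
```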
The story I heard was of a very, very heavily macro-based Excel sheet that would automatically highlight possible problems, including pipeline stalls in one of the VLIW units.
Been looking for confirmation or just a screenshot of that monster since :D
I can't find any evidence of actual spreadsheets. They probably would have been set up by each programmer to their own liking and maybe shared between programmers on the private Sony SDK message board.
Or maybe I have an overactive imagination.
But what I have heard is that the leaked PlayStation 2 SDK from 2005 does contain a Windows program called "VUEditor.exe". It's a custom IDE for writing VU assembly, and it acts more like a spreadsheet than a text editor, letting the programmer shift highlighted columns of instructions up and down.
It does show stalls and highlight issues. I couldn't find any screenshots, but one could find the SDK and try it out if they wanted.
Interestingly enough, it could be that said VUEditor embeds Excel. These days it seems like a forgotten trick, but all of MS Office on Windows provides components (via OLE/COM) that allow you to embed its features inside your own programs.
It might have made things simpler. From memory there was also a VU compiler that would take a single stream of instructions and then try to interleave the 'left/right' instructions of the VLIW in the most efficient order. It was never quite as good as hand coding - which is kind of satisfying to do once you've memorized the stats of the entire ISA.
One half of the instruction word was mostly devoted to vector instructions, which each had a 4-cycle latency and single-cycle throughput, to, say, multiply one 4-word vector of floats by another. The other half had the kitchen sink, like transcendental functions (sin/cos/tan/etc.), divide (x/y, 1/x), and so on. However, an instruction would stall if it depended on a register whose result was still in flight.
So if an instruction has a latency of 4 cycles, you could interleave 4 similar calculations on more of the dataset in parallel. In 4 cycles you could start 16 float calculations, and by the 8th cycle they would all be available. Though likely you'd be looping with something like Duff's device to maximize throughput. It would also mean you might use the vector calculations with a Taylor series in preference to the sine instruction, which had a latency of something like 15 cycles.
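Here's a tiny Rust sketch of the interleaving shape (no real VU code, and the latency only shows up as dependency structure): four independent 4-float multiplies are kept in flight per loop iteration, so no result is consumed before its 4 cycles are up.

```rust
/// A 4-float vector, standing in for one VU register.
#[derive(Clone, Copy, Default)]
struct Vec4([f32; 4]);

/// Component-wise multiply, standing in for a 4-cycle-latency VU op.
fn mul(a: Vec4, b: Vec4) -> Vec4 {
    Vec4([a.0[0] * b.0[0], a.0[1] * b.0[1], a.0[2] * b.0[2], a.0[3] * b.0[3]])
}

/// Process 4 vectors per iteration, mirroring how you'd interleave four
/// independent multiplies on the VU so none of them stalls waiting on a
/// previous result. Assumes the slice lengths are a multiple of 4.
fn mul_all(dst: &mut [Vec4], a: &[Vec4], b: &[Vec4]) {
    for i in (0..dst.len()).step_by(4) {
        // Four independent operations "in flight" at once: 16 floats started.
        dst[i]     = mul(a[i],     b[i]);
        dst[i + 1] = mul(a[i + 1], b[i + 1]);
        dst[i + 2] = mul(a[i + 2], b[i + 2]);
        dst[i + 3] = mul(a[i + 3], b[i + 3]);
    }
}

fn main() {
    let a = vec![Vec4([1.0, 2.0, 3.0, 4.0]); 8];
    let b = vec![Vec4([0.5; 4]); 8];
    let mut out = vec![Vec4::default(); 8];
    mul_all(&mut out, &a, &b);
    println!("first result: {:?}", out[0].0);
}
```

On a real VU this is instruction scheduling rather than data layout, but the dependency picture is the same.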