NEC’s Forgotten FPUs

trzy · on Sept 2, 2021

Cool stuff. The V60 (a 32-bit CPU that is not x86 compatible) was used in Sega’s Model 1 arcade board (Virtua Racing, Virtua Fighter, etc.)

meithecatte · on Sept 2, 2021

Wait, I don't follow. NEC had a license for the Intel chips, but then got sued for producing a chip like the 8086?

astoriafloyd · on Sept 2, 2021

Likely they had a license for the Intel chips, and then used that to make their own chips that are based on the Intel chips but technically are not Intel chips, and can therefore be sold without paying Intel royalties or whatever.

Cullinet · on Sept 2, 2021

Unless you are very explicit, yes, derivative works aren't covered by normal IP licensing, and case law testing the interpretations was established very long ago. I don't have Lexis to pull the dockets, unfortunately, but towards the end of this year I will, and I am going to try to provide as much summary research as possible via a blog. Silicon IP is entering our lives in a big way soon, as architectural complexity demands standard latency mitigation by implementing software in silicon. (Oracle has been claiming that expression as a common law trademark for a couple of SPARC generations, so even our normal terms of reference are going to be fraught.)

speed_spread · on Sept 2, 2021

Another reason silicon IP could become commonplace is the chip shortage. It's not so far-fetched to imagine a simplified computer without specialized coprocessors but with a single reconfigurable gate array allocated to the problem at hand.

Cullinet · on Sept 2, 2021

Exposure to trade secrets is the most important standard problem with derivatives. Access to even extremely detailed and informative documentation doesn't necessarily imply infringement of certain IP. But it is not a litigation lawyer's favorite pastime to devise ways of explaining to a layman the difference between a novel creation that is non-obvious to those skilled in the art and a generically intelligible deduction that can reasonably expect no protection under the law, even if both parties went to enormous expense to negotiate and conclude the terms of access to, and application of, the descriptions essential to the contested IP.

noipv4 · on Sept 3, 2021

I remember NEC Earth Simulator being the world's #1 supercomputer when I was an intern there in 2004 in Princeton. Their CPU for internal designs was the V80.

twic · on Sept 2, 2021

> In addition to expanding the register set to 32 FP registers, the ‘691 also added a complete suite of matrix math functions.

Anyone know anything about these? I can't imagine what matrix maths functions means at this point in time; is this SIMD, or functions to help compute determinants etc, or something else?

fulafel · on Sept 2, 2021

I thought this was going to be about NEC's vector supercomputer[1] processors. Anybody know about writeups regarding these?

[1] https://en.wikipedia.org/wiki/NEC_SX

kencausey · on Sept 2, 2021

Short but relevant? https://www.cpushack.com/2015/02/22/nec-sx-ace-quad-core-vec...

wmeddie · on Sept 2, 2021

I am currently working with these. Anything you'd like to know more about?

fulafel · on Sept 3, 2021

Cool. A few questions, don't feel obliged to answer all of them: Is it a custom instruction set? What are similarities / differences to desktop vector instructions like sse/avx (or tpu etc "neural processors")? What's the sw/compiler stack like, how easy is it to port software, or is sw more commonly custom written for the platform?

wmeddie · on Sept 3, 2021

All good questions.

1) It is a custom instruction set. You can read the ISA guide over at https://www.hpc.nec/documentation

2) The main difference in simple terms is that AVX instructions have a fixed vector length (4, 8, 16 etc). With the SX the vector length is flexible so it can be 10, 4, anything up to the max_vlen (up to 256 on the latest ones). Essentially the idea is you have a single instruction that can replace a whole for loop. Without a good compiler though that means you have to re-write your nested loops.

3) There are currently two options when it comes to the compiler: you can use the proprietary NCC or the open source LLVM fork NEC has. NCC is less compatible than GCC/Clang (modern C++17 in particular is problematic) but has a lot of advanced algorithms for taking your loops, rewriting them, and vectorizing them automatically. The LLVM fork currently supports intrinsics for the assembly instructions, but they are still working on contributing better loop auto-vectorization into LLVM.

4) Porting software is not terribly difficult to get working, but quite a bit harder to get performing very well depending on the type of workload. Since the Scalar core is pretty standard, you can almost always take regular CPU code and get it running (unlike GPU code in general). If you don't leverage the vector processor though, the performance you get will be nothing special, especially at 1.6GHz. Most of the software made for it starts off as being CPU code and is then modified with pragmas or some refactoring to get it running with good performance on the VE. In almost all cases the resulting code still runs on a CPU just fine. One example of a project that supports both in a single code-base is the Frovedis framework[1].
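To make points 2 and 4 a bit more concrete, here is a rough sketch in plain C. It is not NEC code and uses no real vector intrinsics: the inner `for` loops only stand in for a single vector instruction. `saxpy_plain`, `saxpy_fixed`, `saxpy_varlen` and `FIXED_W` are names made up for illustration, `MAX_VLEN` is just the 256 figure from point 2, and the `__NEC__` macro and `#pragma _NEC ivdep` spelling are written from memory and should be treated as assumptions to check against the NCC manual.

```c
#include <stddef.h>

#define MAX_VLEN 256 /* max vector length quoted in point 2 */
#define FIXED_W  8   /* hypothetical fixed SIMD width, e.g. 8 floats */

/* What you would normally write: one plain loop that runs on any CPU.
 * The guarded pragma is the kind of hint point 4 refers to. */
void saxpy_plain(size_t n, float a, const float *restrict x,
                 float *restrict y) {
#if defined(__NEC__)   /* assumption: predefined by NEC's ncc */
#pragma _NEC ivdep     /* assumption: "iterations are independent" hint */
#endif
    for (size_t i = 0; i < n; i++)
        y[i] += a * x[i];
}

/* Fixed-width style: full blocks of FIXED_W, then a scalar tail loop.
 * Returns the number of emulated vector operations and stores the
 * number of leftover scalar iterations in *scalar_tail. */
size_t saxpy_fixed(size_t n, float a, const float *x, float *y,
                   size_t *scalar_tail) {
    size_t ops = 0, i = 0;
    for (; i + FIXED_W <= n; i += FIXED_W, ops++)
        for (size_t j = 0; j < FIXED_W; j++) /* one fixed-width op */
            y[i + j] += a * x[i + j];
    *scalar_tail = n - i;
    for (; i < n; i++)                       /* scalar remainder */
        y[i] += a * x[i];
    return ops;
}

/* Variable-length style: each step picks vl = min(remaining, MAX_VLEN),
 * so there is no separate tail loop. Returns the number of emulated
 * vector operations. */
size_t saxpy_varlen(size_t n, float a, const float *x, float *y) {
    size_t ops = 0;
    for (size_t i = 0; i < n; ops++) {
        size_t vl = (n - i < MAX_VLEN) ? n - i : MAX_VLEN;
        for (size_t j = 0; j < vl; j++)      /* one op of length vl */
            y[i + j] += a * x[i + j];
        i += vl;
    }
    return ops;
}
```

With n = 10 the fixed-width version does one block of 8 plus 2 scalar iterations, while the variable-length version covers everything in a single operation with the vector length set to 10. With n = 600 the variable-length version takes three steps (256, 256, 88). All three functions compute the same result; only the way the loop is chopped up differs.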

I think the chip deserves a little more interest than it gets. It's one of the few accelerators that 1) you can buy today, right now, 2) has open source drivers [2], and 3) can run TensorFlow [3]. The lack of fp16 support really hurt it for Deep Learning, but it's like having a 1080 with 48 GB of RAM; there are still lots of interesting things you can do with that.

[1]: https://github.com/frovedis/frovedis

[2]: https://github.com/veos-sxarr-NEC/ve_drv-kmod

[3]: https://github.com/sx-aurora-dev/tensorflow

fulafel · on Sept 4, 2021

Fascinating stuff, thanks for the details! I had no idea that they made PCIe accelerator based configurations now.