Nyuzi: An experimental FPGA multicore GPGPU processor (github.com/jbush001)
88 points by jdmoreira on March 30, 2016 | hide | past | favorite | 31 comments



Just thinking about writing VHDL again gives me the heebie-jeebies. But this is very cool, and I love the possibilities enabled by FPGAs and OSS. However, we're still a ways away from having an entire open-source FPGA development stack.


I've been really wanting to try Chisel https://chisel.eecs.berkeley.edu/

They have a RISC-V implementation in it, so it can't be too bad.


Chisel doesn't help. It just converts to Verilog. The Verilog is then converted by a closed-source program to a closed file format, which is then uploaded by a closed-source program to closed FPGA hardware.


This open source stack may interest you, then. http://www.clifford.at/icestorm/


Right now for a project I'm using KiCad, Yosys, arachne-pnr and IceStorm. As a n00b to this layer (most of my exposure was through matsci and cheme required classes), it's so much fun to be able to learn along the way (and I get to save $600 instead of buying someone else's FPGA and peripherals; I get exactly what I need out of it, no more, no less). Although I wish I could get on with the rest of my project, this is a fun detour.


Oh man, VHDL was the worst. My hope is that things like this will get people to build better, open-source tooling.


At least it's better than Verilog. But there's Python to the rescue! http://www.myhdl.org/

There's still a lot of tooling missing, but my FPGA tinkering became much more enjoyable when I stumbled across MyHDL.


Ugh.

Every time I see imperative language X adapted to RTL/VHDL/Verilog I want to slap someone.

They aren't the same; gates are fundamentally different primitives. You shouldn't be using a language built around serial actions for something that's inherently parallel. You bring along all sorts of baggage that you don't need.

[edit] They don't even specify whether the generated state machine is Mealy or Moore. This is the stuff you want control over, not abstracted away by a language where you don't know how it will synthesize.
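For anyone who hasn't run into the distinction: a Moore machine's output depends only on the current state, while a Mealy machine's output also depends on the current input, which typically shows up as a one-cycle difference in when outputs appear. A minimal, purely illustrative Python sketch of a rising-edge detector built both ways (this is not MyHDL; the names are made up):

```python
def moore_step(state, bit):
    # Moore: output is a function of the state alone.
    output = 1 if state == "pulse" else 0
    if bit:
        next_state = "pulse" if state == "low" else "high"
    else:
        next_state = "low"
    return next_state, output

def mealy_step(state, bit):
    # Mealy: output is a function of state AND current input,
    # so the rising edge is reported one cycle earlier.
    output = 1 if (state == "low" and bit) else 0
    next_state = "high" if bit else "low"
    return next_state, output

def run(step, bits):
    state, outs = "low", []
    for b in bits:
        state, o = step(state, b)
        outs.append(o)
    return outs

print(run(moore_step, [0, 1, 1, 0]))  # [0, 0, 1, 0]
print(run(mealy_step, [0, 1, 1, 0]))  # [0, 1, 0, 0]
```

Same input stream, different output timing, and which one you get matters for whatever consumes the signal downstream.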


You didn't get it. MyHDL is simply a macro processor used to generate RTL. And generating a sequence of similar, say, module instances is fine even with an imperative language; even Verilog has for loops for this.

Chisel and Clash are no different.
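To make the "macro processor" point concrete, here's a purely illustrative sketch of imperative generation of repetitive structural RTL: plain Python string formatting emitting N similar instances. The `vector_lane` module and its ports are invented for the example, and this is not MyHDL's actual API:

```python
# Illustrative only: Python as a macro processor emitting N
# similar Verilog module instances, the kind of repetitive
# structural code a generate loop would also produce.
def instantiate_lanes(n):
    lines = []
    for i in range(n):
        lines.append(
            f"vector_lane lane{i} (.clk(clk), "
            f".din(bus[{i}]), .dout(result[{i}]));"
        )
    return "\n".join(lines)

print(instantiate_lanes(4))
```

The imperative loop exists only at generation time; what reaches the synthesizer is flat, fully elaborated structural code.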


You might want to re-read my post; I'm specifically railing against imperative -> RTL/VHDL/Verilog.

Verilog may have for loops, but they're largely used for simulation; they're expensive in terms of synthesized hardware (mostly toolchain/process dependent) and can only be fixed-size.

I've got no issue with Chisel; it's a DSL, which I think is a good fit. What I'm arguing against is constructs that don't have a clear mapping to hardware representations, which means you have to guess at what the compiler generates (see my Mealy vs. Moore comment above).


My understanding is that MyHDL is in fact more like Chisel/Clash; its getting started guide specifically says "It does not turn arbitrary Python into silicon" (and as sklogic says).

See also: a MyHDL UART: https://github.com/andrecp/myhdl_simple_uart/blob/master/ser...


Yeah, but that feels square-peg-in-round-hole-ish. It's like saying we've got all this stuff over here, but don't use it; really, we mean don't!

You're going to spend a ton of documentation explaining to users which parts of the language they can't use, rather than having a DSL spec that's clear about what's supported.

With a proper DSL you also don't have to massage language features that don't quite map (say, enums for state machines) into a format they're not meant for.


You're wrong: MyHDL is a proper hardware DSL in Python. You've never used it, and you have no idea what you're talking about.


MyHDL is exactly the same kind of eDSL as Chisel. And you have full control over what it generates; see the examples.


If it's Python, it's not a DSL by definition.

Since you don't seem interested in addressing even one issue (I'm sure I could come up with more) around state machines, I don't see any reason to continue this discussion.


> If it's Python, it's not a DSL by definition.

Scala is not any better. It's exactly the same kind of eDSL: no macros, nothing, just generating objects at runtime and then serialising the tree into Verilog code. Nothing fancy.

> around state machines

What state machines?!? It does not have any more features for defining FSMs than the underlying Verilog.


Actually, Verilog developers are orders of magnitude more efficient than their VHDL counterparts.

http://www.bawankule.com/verilogcenter/contest.html


Wow, 1997! But thanks for the link; it was a fun read. It'd be interesting to hear from people in the industry what their experience has been on larger multi-person projects; e.g., how long would it take to create an ASIC like http://www.jandecaluwe.com/hdldesign/digmac.html ?


Clash, Chisel, and Lambda-CCC are all big improvements over Verilog/VHDL.


Looks like he got it running on an Altera DE2-115 board [1], which has these specs:

    114,480 logic elements (LEs)
    3,888 Kbits embedded memory
    266 embedded 18x18 multipliers
    4 general-purpose PLLs
    528 user I/Os
[1]: http://www.terasic.com.tw/cgi-bin/page/archive.pl?Language=E...


Does anyone know if there's an opposite analog of this? I would very much like to run a parallel language like VHDL or Verilog on the GPU since:

1) OpenCL/CUDA have an OpenGL-inspired syntax with a steep learning curve and limited generalizability

2) FPGAs don't seem to be gaining the economies of scale of GPUs

I simply want to be able to emulate thousands of CPUs (millions of gates) for physics, AI, big data etc, in a way that's accessible, affordable and won't catch fire. I'm thinking MATLAB or Octave but with near-ideal speedup for embarrassingly parallel problems.
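For what it's worth, the array-at-a-time style described above is roughly what NumPy (and CuPy on GPUs) already provide for embarrassingly parallel numeric work; a toy sketch with made-up numbers:

```python
import numpy as np

# MATLAB/Octave-style model: express an embarrassingly parallel
# computation as one whole-array operation and let the runtime
# (CPU SIMD, or a GPU via e.g. CuPy) map it onto parallel
# hardware. Toy example: n independent particle updates.
n = 1_000_000
pos = np.zeros(n)
vel = np.linspace(0.0, 1.0, n)
dt = 0.01

# One vectorized statement updates all n particles at once; each
# element is independent, so this parallelizes trivially.
pos += vel * dt

print(pos[-1])  # last particle moved vel[-1] * dt = 0.01
```

This doesn't emulate gates, but for "thousands of independent things per timestep" workloads it gets close to the near-ideal speedup the parent is after.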


Do you already know VHDL or Verilog? Most people would not consider them simpler or more productive than OpenCL IMO.

Julia fits your last sentence.


People use Theano/TensorFlow for this.


Is this GPGPU only, or does it also support GLES or OpenGL?


I guess you could implement GLES on top of it. They implemented a renderer for Quake maps: https://github.com/jbush001/NyuziProcessor/tree/master/softw...


Looks like it runs at about 1 FPS on a 50 MHz core (with screenshots): http://latchup.blogspot.com/2015/06/not-so-fast.html

Still crazy awesome, that's a ton of work.


Well, you can apparently do 3D rendering, but it's kind of slow based on the latest information I can find: http://latchup.blogspot.co.uk/2015/06/not-so-fast.html

They are (or were) spending a huge number of instructions on stuff that would have dedicated hardware/instruction-set support on a proper GPU. Normally rasterization and texture sampling run in dedicated hardware, colour packing/unpacking is integrated into the memory access instructions (at least on Radeon), etc. Stuff that'd be one or two instructions on a commercial GPU instead took dozens or hundreds.
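As a small illustration of the packing/unpacking point: unpacking a single RGBA8 pixel in software takes a handful of shift/mask operations, where a GPU's typed loads can fold it into the memory access itself. A Python sketch assuming R in the low byte (layouts vary):

```python
# Software RGBA8 unpack: several shift/mask steps that a GPU's
# typed memory loads can perform as part of one instruction.
def unpack_rgba8(pixel):
    r = pixel & 0xFF
    g = (pixel >> 8) & 0xFF
    b = (pixel >> 16) & 0xFF
    a = (pixel >> 24) & 0xFF
    return r, g, b, a

print(unpack_rgba8(0x80FF8040))  # (64, 128, 255, 128)
```

Do that per texel, per pixel, per frame on a general-purpose core and the instruction counts add up fast.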


(author here)

Yeah. One area I'd like to investigate is adding specialized instructions. The existing renderer is not highly optimized.


Man, it'd be super sweet if we could get an OpenCL frontend for this target.


Technically it already supports OpenCL, as it has an LLVM backend and Clang port. However, it will generate scalar code that doesn't take advantage of the vector unit. To support it properly, it would need extra passes for SPMD vectorization.
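To illustrate the scalar-vs-SPMD point: a kernel compiled as scalar code does one work-item's worth of work per invocation, while an SPMD vectorizer executes the same kernel body across a whole vector of work-items at once. A rough NumPy analogy (the kernel and all names here are invented for illustration):

```python
import numpy as np

def kernel_scalar(a, b, i):
    # One work-item: operates on a single element at a time.
    return a[i] * 2.0 + b[i]

def kernel_vectorized(a, b):
    # All work-items handled by one vector operation per
    # kernel statement: the transformation an SPMD
    # vectorization pass performs.
    return a * 2.0 + b

a = np.arange(8, dtype=float)
b = np.ones(8)

scalar_out = [kernel_scalar(a, b, i) for i in range(8)]
vector_out = kernel_vectorized(a, b)

print(np.allclose(scalar_out, vector_out))  # True: same results
```

The results match; the difference is that the scalar form leaves the vector unit idle, which is exactly what the missing passes would fix.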


I think there's a lot of follow-through needed beyond just LLVM and Clang support to make a full OpenCL platform: device enumeration, etc. Plus, I don't think Clang distributes a complete front end (headers/type defns etc.). There are some open-source projects that could fill those gaps, though.



