Open Source Needs FPGAs; FPGAs Need an On-Ramp (blinklight.io)
309 points by heathjohns on March 31, 2017 | 147 comments



I am currently waiting for my MATRIX Voice to arrive.

https://www.indiegogo.com/projects/matrix-voice-open-source-...

I have a Spartan 3AN dev board, another Spartan6 board, and an Arty. I was using Xilinx ISE free version to develop for the other boards until I bought the Arty. It came with a one year license for Vivado. I did not know that activating the Vivado license locked me in to developing only for Arty. ISE will no longer synthesize for any other target. I strongly dislike the closed nature of their software licensing.

I am retired but the last 10 years I worked writing VHDL. I can kind of read Verilog and understand what it does but do not know it well enough to write in it. The systems I worked on were for oil well logging. My circuits went down a 16,800 ft. hole where it was 350° F and the pressure was over 7000 PSI. Production quantities were typically less than 100. We used no bigger an FPGA than was needed to keep power at a minimum as there was no way to dissipate heat. Also the circuit boards were quite small since they needed to fit into a housing less than 2" in diameter. Frequent design changes were needed but all ran on the same boards.

I am currently working on a processor design that I call NISC. The set of all opcodes is the null set. It's a single instruction machine that does a move instruction with two operands, Source Address and Destination Address. I have considered putting the specifications and design on the internet as open source but am not sure where I should put it. Would anybody be interested in seeing it and where do you think I should put it?


Check out the Open Source Hardware Association best practices document for some ideas of what people release: https://www.oshwa.org/sharing-best-practices/

BeagleBone Blue is a good recent example of an open hardware project using GitHub: https://github.com/beagleboard/beaglebone-blue

OpenROV is another example: https://github.com/OpenROV


> I did not know that activating the Vivado license locked me in to developing only for Arty. ISE will no longer synthesize for any other target. I strongly dislike the closed nature of their software licensing.

That is absolutely not correct. Something else is going wrong with your licensing. Have you asked around on the Xilinx customer forum?


Curious, how would one implement adding two numbers on your NISC architecture?

Or is the idea that you move the operands to the inputs of an adder circuit, and then move the result?

How would conditional control flow work?



Thanks for this link. I will read it when I have time.

I currently have an accumulator at an address.

Moving to the next address ANDs with the accumulator.

Moving to the following address ORs with the accumulator.

Subsequent addresses XOR, ADD, ADC, SUB, SBB with the accumulator.

I have a location called Z and one called NZ which may be written. Reading from either location returns what was written to Z if the Z flag is set, otherwise both read what was written to NZ. Moving either to the PC effects a conditional JUMP. Moving either to the relative register adds it to the PC (relative conditional Jump). Moving either to the Indirect register pushes the PC on the stack and writes to the PC (conditional call).
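
To make the conditional control flow concrete, here is a tiny Python model of the idea. Every address and name below is invented for illustration; it is not the actual design, just a sketch of how two-operand moves against memory-mapped locations can express an add followed by a conditional jump.

    # Toy two-operand move machine (all addresses hypothetical).
    ACC, ADD, Z, NZ, PC = 0x00, 0x04, 0x08, 0x09, 0x0F
    mem, acc, pc, z_flag = {}, 0, 0, False

    def read(addr):
        if addr == ACC:
            return acc
        if addr in (Z, NZ):                # conditional source: Z value if flag set, else NZ value
            return mem.get(Z if z_flag else NZ, 0)
        return mem.get(addr, 0)

    def write(addr, val):
        global acc, z_flag, pc
        if addr == ACC:
            acc = val
        elif addr == ADD:                  # writing here adds to the accumulator
            acc = (acc + val) & 0xFFFF
            z_flag = (acc == 0)
        elif addr == PC:                   # writing here is a jump
            pc = val
        else:
            mem[addr] = val

    def step(program):
        global pc
        src, dst = program[pc]
        pc += 1
        write(dst, read(src))

    # "acc += mem[0x20]; if acc == 0 goto 5 else goto 7", as plain moves:
    mem[Z], mem[NZ] = 5, 7                 # both continuation addresses are written first
    program = [(0x20, ADD), (Z, PC)]
    for _ in range(2):
        step(program)
    print(pc)                              # 5, because the accumulator ended up zero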

I envision the capability of using the accumulator as a floating point register and having locations which perform floating point operations in a similar manner. It could also be considered as a vector and there could be locations which perform vector operations on it.


I would have expected your Vivado and ISE licences to be separate, do you still have contacts with an FAE to ask for help?


Do it on Github!



Out of curiosity, what is the advantage of having a single instruction?

I imagine that the benefit of a simple syntax in the assembly code is counterbalanced by the complexity of implementing complex logic, say control flows?

Also, what is the performance impact of this approach? Wouldn't it cause more cycles for the same code in general, compared to multiple instructions?

Lastly, could there be optimized hardware for this instruction set? (Executing directly on the RAM chipset maybe?) How do you see it perform against more traditional hardware?


I see one of the advantages being that it is very easy to learn due to its simplicity. There is also no burden of complex instruction decoding placed on the hardware designer.

For software there may be a slight increase in complexity for control flow, because it is necessary to specify the addresses where execution continues both when the condition is met and when it fails. CISC, by contrast, specifies only the address to branch to when the condition is met and defaults to the next instruction inline when it fails. But there is also a gain in flexibility in specifying both addresses: you could, for example, call one subroutine for Z and another for NZ, and both would return to the next instruction inline.

When I first read about Move machines it appeared that they were very inefficient, and the simple ones described were. In the decades since then I have devised means to offset that inefficiency with more powerful hardware functions. Especially with an FPGA target it is possible to perform complex functions in hardware that only need to be memory mapped. Depending on your application you could have vector math, complex number math, or SIMD instructions which run in a single machine cycle. It is also possible to have a shift-and-add multiplier which takes several cycles: you load it, then perform other operations while waiting for completion. I have written shift-and-add multiply routines for microprocessors which needed the processor to do the shifts and the adds, keeping it fully occupied. While I do not claim that the design I have documented is the optimal processor, I maintain that it should have reasonable performance.
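
As a sketch of that overlap (addresses hypothetical, in the same spirit as the toy model earlier in the thread), a program fragment might look like this:

    # Hypothetical memory-mapped shift-and-add multiplier.
    MUL_A, MUL_B, MUL_RESULT = 0x30, 0x31, 0x32

    overlap_fragment = [
        (0x40, MUL_A),       # load multiplicand
        (0x41, MUL_B),       # load multiplier; the hardware starts shifting and adding
        (0x42, 0x50),        # unrelated moves proceed while the multiply runs
        (0x43, 0x51),
        (MUL_RESULT, 0x52),  # several cycles later, read out the finished product
    ]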

One of my goals in releasing this design is to draw interest and receive suggestions and recommendations for improving it from the highly intelligent community that reads HN and uses Github. I would like to see it optimized.


Something that's not really spoken to but is just as important is that FPGA development isn't software development. You're specifying hardware and usually that hardware doesn't just scale to new platforms.

Time will tell if Open Source can make the transition over to that domain. I'm cautiously optimistic, but I think there's a larger cultural divide there than is called out.


FPGAs are sexy to many people because they are exotic. Imagine a custom chip designed to do whatever you want, able to be reconfigured on the fly.

The problem is that many aspiring learners want to approach it like a software problem. That can be a valid approach as long as they are willing to learn new abstractions. Many are not and then get frustrated and blame the tool ecosystem. Granted, the tool ecosystem does suck, but people (like me and others) are doing real work with these tools. Even if you scoff at FPGA tool quality, you have to remember that almost every ASIC out there was developed with similar (maybe slightly better) quality tools.

There has to be flexibility on all sides. Tool vendors have to adapt to changing times (more open ecosystems) and software developers branching out into FPGAs have to be willing to learn how hardware works.


> The problem is that many aspiring learners want to approach it like a software problem.

That is precisely why I made Blinklight - it's an educational platform for starting at the very bottom.


Maybe it would be better to use the highest-abstraction tools possible (Chisel? Maybe DSLs for hardware generation and verification)?

Because compared to, let's say, software development or embedded systems development, real chip design, and especially the tons of verification you need, is boring.


Look up Synflow's language for HLS. It's C based and open-source.


I don't want to diminish what they and other similar projects do and actually I'm not really familiar with FPGA development, but I have one thought: what if we don't need one more language, but rather something on a different level of abstraction?

The first neural networks that ran on GPUs were written using low-level GPU primitives [1]. This was a non-trivial process that required a lot of low-level work. It required systems programming skills and time to implement new architectures. But a group of researchers at the University of Montreal developed Theano [2], a framework that allows you to define computational graphs in the Python programming language and then compile them to CUDA code that can be executed on a GPU. Instead of spending resources on developing a new language they put their effort into thinking out useful abstractions and implementing a compiler that works efficiently. It is also notable that they didn't include very high-level abstractions in Theano either, but there are libraries like Lasagne [3] and Keras [4] that introduced higher-level abstractions (neural network layers and pluggable pre-implemented models) on top of Theano. It is safe to say that Theano boosted deep learning research, making the programming of new neural network architectures quicker and more accessible.

What if we actually need just the same thing for FPGAs? Just a Python library that defines useful abstractions for building logic circuits, lets you construct arbitrary graphs using them, and then compiles these graphs to VHDL. Assuming that the basic building blocks defined in Python are well defined and tested, it would be easy to implement verification and some testing in pure Python, and tooling like visualising logic diagrams could be implemented in pure Python too.

In addition to reduced development effort (because you don't need to design and implement a new language), it would be easier for software programmers to pick up: they would not need to learn new syntax, but could concentrate on core concepts like gates/adders/other building blocks and the graphs involving them. It wouldn't be necessary to develop special-purpose editors, because it is just Python, and again, because the tests for the resulting schemas could be written and run in pure Python, it would be possible to use standard CI tools like Travis for open source development.
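
A toy sketch of what such an embedded approach could look like; every class and method here is invented for illustration and is not an existing library:

    class Signal:
        def __init__(self, name):
            self.name = name

    class Gate:
        def __init__(self, kind, inputs, output):
            self.kind, self.inputs, self.output = kind, inputs, output

    class Circuit:
        def __init__(self):
            self.gates = []

        def add(self, kind, a, b, out):
            self.gates.append(Gate(kind, [a, b], out))
            return out

        def to_vhdl(self):
            # one concurrent assignment per gate
            return "\n".join(
                "  {} <= {} {} {};".format(g.output.name, g.inputs[0].name, g.kind, g.inputs[1].name)
                for g in self.gates)

        def simulate(self, values):
            # pure-Python evaluation for testing, assuming gates are listed in dependency order
            ops = {"and": lambda a, b: a & b, "or": lambda a, b: a | b, "xor": lambda a, b: a ^ b}
            for g in self.gates:
                values[g.output.name] = ops[g.kind](values[g.inputs[0].name], values[g.inputs[1].name])
            return values

    c = Circuit()
    a, b, s, carry = Signal("a"), Signal("b"), Signal("s"), Signal("c")
    c.add("xor", a, b, s)                    # half adder: sum ...
    c.add("and", a, b, carry)                # ... and carry
    print(c.to_vhdl())
    print(c.simulate({"a": 1, "b": 1}))      # {'a': 1, 'b': 1, 's': 0, 'c': 1}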

Edit: It seems like there is already a project MyHDL [5] that does something very close to what I described above.

[1] https://hal.inria.fr/inria-00112631/en/

[2] https://github.com/Theano/Theano

[3] https://github.com/Lasagne/Lasagne

[4] https://github.com/fchollet/keras

[5] http://www.myhdl.org/


I'm one of the core devs of MyHDL(https://github.com/myhdl/myhdl). It does most of what you're describing and converts to both Verilog and VHDL.
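
For a flavor, a minimal counter in MyHDL looks roughly like this (untested here, and the decorator/conversion entry points have shifted a bit between MyHDL releases, so treat it as a sketch):

    from myhdl import block, always_seq, Signal, ResetSignal, intbv

    @block
    def counter(clk, rst, count):
        @always_seq(clk.posedge, reset=rst)
        def logic():
            count.next = (count + 1) % 256
        return logic

    clk = Signal(bool(0))
    rst = ResetSignal(0, active=1, isasync=False)   # keyword spelling differs across versions
    count = Signal(intbv(0)[8:])
    counter(clk, rst, count).convert(hdl='VHDL')    # or hdl='Verilog'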

I'll be happy to answer any questions you have.



Ha! Looking at their website, I realized a toy project I just did for a hiring process was actually designed to produce simulatable inputs to Clash!


You didn't want another language but then mention a complicated pile of languages and libraries? And people doing high-level synthesis weren't putting "efforts on thinking out useful abstractions and implementing a compiler that works efficiently?" That's the opposite of true and condescending to the HLS and HDL fields, where building useful, efficient abstractions that make hardware easy for developers has an abysmal success rate despite large brainpower and money invested. Let's look into your recommendation, though.

" Just a Python library that defines useful abstractions for logic circuits building, allows to construct arbitrary graphs using them, and then compile these graphs to VHDL. "

Because you're really describing either an enhanced HDL like MyHDL or a High-Level Synthesis tool. The first requires people to learn hardware to use it right. That's hard for developers, based on their online comments. If they do, they can make some pretty efficient designs, though. The second has shown to be easier for developers since it can be close to their way of thinking. However, turning high-level abstractions into efficient, low-level code is more like automatic programming than regular compilers, given every step is NP-hard with tons of constraint combinations. Like in software, automatic programming never happened: anything doing synthesis usually performs worse than hand-made designs on numerous attributes. This can be significant in affecting, say, the clock rate. So, CompSci is investing in both directions with numerous HDLs and HLS tools made. For HDLs, BlueSpec, Chisel, and MyHDL are probably most successful since they help hardware people handle hardware better. For HLS, I'm not sure since they don't disclose their numbers. ;) Here's a list of them, though. Bluespec is on the same list, so maybe its features fit in multiple categories. (shrug)

https://en.wikipedia.org/wiki/High-level_synthesis

Synflow's people were doing HLS research at a university, ironically on something more similar to what you brought up than most HLS. It had a parallel focus. It must have not gone anywhere or got corporate lock-in. So, they went the other route to build something easy for developers. Then they open-sourced the compiler and IDE extensions along with some cheap IP and a cheap board. So, when a developer asks about doing FPGA work I reference easy-to-learn tools like that, or NAND to Tetris if they want the hard route.

Note: There's also commercial support for OpenCL on FPGAs by Altera. Maybe others. There's also CUDA-to-FPGA work. CompSci isn't being narrow: they've been hitting every idea they can think of, with the low adoption you see being because almost none of it works. They keep trying, though.

http://cadlab.cs.ucla.edu/~cong/papers/FCUDA_SASP09_CR.pdf


The culture clash between the traditional VLSI mindset and programmer may be avoided... all we need is a REPL :)


It can and it can't. It's sort of like microcontroller code. You can define a core set of functions (like an FFT or something) that is very portable, but the peripheral mapping (which pins to output on, where resources are located, etc) is chip specific.


Yeah, even things like Block RAM, DSP units, carry and routing, and other parts vary widely from chip to chip and vendor to vendor.


I don't disagree. I'm reasonably optimistic that current and near-future FPGAs are overkill enough for the job of motherboard chipset and basic peripherals that some basic portability abstractions can be put in place (e.g. ASM -> C) without causing such a performance hit that it's not viable any more.


I think it will still be a very uphill battle to get such chips integrated anywhere for some very basic reasons: power and heat.

All those extra transistors that allow FPGAs to be reprogrammable also dissipate a lot of heat and use a lot of power (or did back when I was mounting massive heatsinks on custom networking FPGAs).

For a laptop or phone manufacturer, if the choice is between an ASIC and an FPGA that consumes 10x the power, it is an easy choice. It's not just dollar cost, but power and heating budgets.

In general I love the idea.


Things have been changing - they put FPGAs in phones now:

https://www.ifixit.com/Teardown/Samsung+Galaxy+S5+Teardown/2...

(search for "FPGA" in the page).

That's a tiny one, granted, but things are certainly getting better in that regard.


CPLDs have been pretty commonplace in complex embedded system designs for a while. They're great for consolidating a bunch of little logic bits into a single package, and they allow you to build some control logic using code rather than iterating hardware when you need to make small changes. There are huge advantages for a system like a phone where you are extremely space-constrained, and also with quick-turn engineering cycles where the hardware can't go through 6 revisions before being completed. Throw a CPLD or a low-power FPGA like the Lattice iCE40 series in there and let the HDL do the rest!


I've also seen CPLDs used to recover from board layout mistakes.


That's cool to see, although given the exploding Samsung Galaxy phone situation (probably completely unrelated) I'm not sure their engineering of power consumption/cooling is the best thing to cite. :)


A defective battery is about as far as you can get from chip design.


Maybe we could have a hardware interface designed to be open?


For inter-IC interfaces, AMBA [1] and Wishbone [2] are open standards for connecting various sections of a chip (bridging between the CPU, I/O bits, etc.). External interfaces are also pretty open. A lot of FPGAs have dedicated logic for talking with a PCI bus, I2C bus, etc. Unless you're doing something extremely interesting and specialized, hardware's pretty open as is.

[1] https://en.wikipedia.org/wiki/Advanced_Microcontroller_Bus_A...

[2] https://en.wikipedia.org/wiki/Wishbone_(computer_bus)


Good place to start this would be here:

https://github.com/wkoszek/freebsd_netfpga

https://wiki.freebsd.org/FPGA/

https://www.freshports.org/devel/xc3sprog/

It's a FreeBSD NetFPGA driver for an older 1G card that I wrote during my internship 8 years ago, and a little ecosystem to make development more functional. At the time I could program the FPGA from FreeBSD and synthesize the code on FreeBSD.

What you say is very hard, though. The speeds achieved with modern ASICs are hard to compete with, because ASICs do a lot of advanced stuff with DMA, interrupts, and checksum offloading. This might be doable with expensive hardware, but nobody in the DIY community has the money to do this.

Additionally, the tools for synthesis are proprietary, and everything touching the FPGA is pretty much proprietary too. Looking into the date/author of the synthesis is as far as I could get:

https://github.com/wkoszek/libxbf

(it can only open a file and tell you where the bitstream starts, not what it actually is). So the road to complete freedom goes through the ASIC world, in my opinion.


Great list.

I have Bunnie's awesome Novena, which came with an FPGA (as well as an SDR add-on).

But, as the article says "FPGA development has always been an industrial activity, dominated by brutalist, opaque, and proprietary tools."... I haven't found the FPGA-FOR-DUMMIES guide to help me do anything with the thing.


Do you think there's a market for FPGA-based tutorials?

I think I have a good grasp of the stuff necessary for this, but I always felt that it's a really niche business, even within the DIY/open-source community.

If you want to start doing anything, I found it's best to look at the popularity of the board. The S3E Starter Kit is by far the best, since most people can afford it, and it's nicely supported. Any other board: case by case basis.


Realistically no.

However, I doubt many folks predicted the wild popularity of Arduino or RPi either. All it takes is some cool applications. The price point isn't a huge barrier these days.

I keep hoping some post-grad will come up with a great DUMMIES like guide that covers Hello World to signal processing.


Author here, that is precisely the point of Blinklight (the site this blog post is from) :)

It starts from the absolute beginning, and the tools and tutorials are web-based and integrated with each other:

https://www.blinklight.io

The focus is from Hello World to personal computer, but I've built a DSP-based guitar pedal in the past, and signal processing is really fun - I'd love to create a parallel learning path from Hello World to DSP at some point.


Andrew Zonenberg has a whole list of open projects for FPGAs on his wiki:

https://github.com/azonenberg/openfpga/wiki

There has been a lot of exciting work in the same repo for Silego lines and general foundations for Open Source FPGA toolchains for many of the types out there.

I've been really enjoying Clifford's Project IceStorm - open source tools which I've been using to develop/test on real hardware and build up some Verilog chops:

http://www.clifford.at/icestorm/

I made a quickstart for it in a repo here for those interested in starting some Verilog adventures : )

https://github.com/gskielian/TEAM-VERILOG


For anyone interested in developing applications for FPGAs in a high-level DSL embedded in Scala, this project (https://github.com/stanford-ppl/spatial-lang) from a Stanford lab might interest you.

Disclaimer: I am part of the lab.


I noticed there are a lot of RISC-V cores today implemented in Berkeley's Chisel (https://chisel.eecs.berkeley.edu/). How would you say they compare?


Actually, Chisel is one of our main codegen targets. We aim to be more high-level than Chisel.

See here for a quick and very incomplete tour: http://spatial-lang.readthedocs.io/en/latest/tutorial.html


I was never able to convince my coworkers to program in FPGA HLLs, just because when you need to debug and simulate, you have to touch VHDL/Verilog in order to communicate with the board and understand what your high level abstraction is compiling to (not to mention the great deal of work that relies on tweaking and instantiating FPGA parameters like clock lines, buffers, DSP cells, etc). And actually this takes up the majority of the design time... so to make a software analogy, why bother writing in Scala when you have to verify and debug at the assembly level?


Is that a serious question? Because most software development is now done in high level languages.

With the proper abstractions, there's no need to debug the high level code at the assembly level; you just need to debug the abstractions.


Maybe I should have made a different comparison: you can write a kernel driver in Python, but maybe you shouldn't. Debugging your HLS compiler is not a process full of joy and happiness.


I love FPGAs in a simplistic sort of way. I first started working with them when I was in school (1994-96) and thought I was going to spend my life with them. Other than some simulations with very expensive software that had been donated by Motorola I never actually used FPGAs. But from when I first started looking at them, I thought they would be the building blocks for an AI machine that could be added on to forever. I still think so, but I thought that there would be more visibility into the internal workings of the chips.


TimVideos is an open source project using FPGAs and Python for conference (and other) video capture: https://hdmi2usb.tv/home/

So if you're interested in hardware, FPGAs or Python, it's a great opportunity to get hacking! They are also part of GSoC this year, but it might be too late to apply.


I'm not exactly clear on what the post is advocating. If it's saying that there should be an open source implementation of an FPGA, then I just have to say that I think there's no way that's happening anytime soon. There are way too many hurdles.

If the argument is just that the open source community should leverage FPGAs more as a means of creating more powerful "open source" hardware, and that there should be more resources for people to learn how to write hardware, then I guess I agree with that. But I don't think FPGAs will be the panacea the author seems to think they will. FPGA implementations will always entail a performance and/or efficiency hit compared to ASIC implementations, and I think many people won't want to take that hit, limiting the number of users who are willing to adopt the open source solutions.


Author here - I'm advocating the latter.

I agree with you to a point. However, I believe that those things that have been with us for decades: sound cards, 2D graphics adaptors, network cards, etc. can be done in FPGAs, and should be.

The speed is there, and the power used by the southbridge and peripherals is eclipsed by the processor and the screen backlight, so I don't think the power consumption is worth worrying about (I'd be interested to see evidence to the contrary, though).

Put another way: many of the foundational chips on the motherboard are no longer performance-sensitive, so we shouldn't be paying a compatibility price for them.


Why not just choose a small set of "golden" chips, create high quality drivers that abstract away incompatibilities if possible, and verify the heck out of that?


That's… sort of happening with laptops. Pretty much any modern laptop with an Intel CPU uses a small set of Intel chips for everything. My dmesg includes:

em0: <Intel(R) PRO/1000 Network Connection>

iwm0: <Intel(R) Dual Band Wireless AC 7260>

xhci0: <Intel Panther Point USB 3.0 controller>

ehci0: <Intel Lynx Point LP USB 2.0 controller USB>

ahci0: <Intel Lynx Point-LP AHCI SATA controller>

drmn0: <Intel Haswell (ULT GT2 mobile)>

Most laptops from the same generation use the exact same set of chips.


The trouble with "golden" chips is a) the manufacturer can EOL it at any time (and I don't think just the OSS people are numerous enough to keep any specific set in production), and b) it freezes things in time. For things that don't change much (e.g. sound cards), no problem, but there's still improvements to be made in other areas.

The focus for FPGAs has traditionally been performance, but I think there's a model where they can be used for both long-lived compatibility and cutting-edge devices living together on the same chip.


But a fully open-source, verified implementation could be the basis for a structured ASIC that improves performance and cost with volume, and production may start to happen at Kickstarter scale.

Although still, manufacturers and governments could introduce bugs if they want.

But what would the big difference be between that and a minimal set of "golden" commercial chips that have fully open-source drivers and a comprehensive test suite? Not much, it seems.

Maybe the way to attack this is to seek the places where an FPGA implementation (over, say, a desktop/laptop card) would be superior and offer that. One such place could be noise-cancelling algorithms - such high-quality headphones are very expensive (and highly proprietary), need very low-latency compute, and offer an interesting challenge/competition for open-source designers.


Having read through the comments, I'm surprised that no one is talking about the true benefit of a tech like FPGAs. It's that it's field programmable, with many that can be reprogrammed in fewer cycles than a cache miss. This means that ultimately, FPGAs have the potential to be an optimization flag in your favorite compiler or JIT.

I see this as an inevitable convergence, especially with rumors of FPGA transistors/mm^2 growing faster than CPU transistors/mm^2. What I can't tell is how much of what people say about FPGAs is hearsay, and how much is simply exaggerated. I know that every time I start to look into FPGAs I always feel let down compared to their potential.


Since I didn't see any other good place to post feedback, I thought I would point out that it's "sheer number" not "shear number."

Also, lowrisc.org is not accessible via HTTPS.


Thank you, fixed!


Also, now that I've had time to start going through things, I wanted to add: great job! This seems like a pretty fantastic introduction so far. Although it might be helpful to provide solutions/additional hints in case someone gets stuck. (I'll admit I'm currently a little stuck at the "boss battle" in chapter 1.)


Ah, sorry, I missed your comment! I'm currently working on getting a forum up so that people can help each other through challenges. In the meantime, if you're still stuck feel free to email me at heath@blinklight.io and I can give you some hints :)


Personally, I'd rather have the ability to create a chip for $5K.

The CAD tools are stupidly expensive (>$100K) when a wafer run is less than $20K for a very old process nowadays.


$5K wouldn't cover a single photomask, unless you're talking VERY OLD process.

Maybe there are direct-write (laser or ebeam) litho foundries... I don't know.


Multi-Project Wafer [0] services like MOSIS [1] reduce the costs significantly. Not sure what the lowest practical budget for a project is, but $5k would be in the ballpark. Access to student licenses for design software is possible, too.

[0] https://en.wikipedia.org/wiki/Multi-project_wafer_service

[1] https://www.mosis.com


What about structured ASIC companies like eASIC and BaySand? Don't they have a flow from FPGA to a 65nm/45nm structured ASIC, starting at $70K?


The trick is getting more EEs to spend their grants on open-source FPGAs and tooling:

https://www2.eecs.berkeley.edu/Pubs/TechRpts/2014/EECS-2014-...

http://opencircuitdesign.com/qflow/welcome.html


The author's example about a huge multitude of ethernet drivers doesn't really map well to the solution suggested (i.e. A big blank slate of FPGA logic with a bunch of ADC/DAC/codec fabric on the chip periphery). The wire side of Ethernet is really well specified, and tightly implemented. (Your chip won't be 802.3 compliant if it's not!) That's a result of really good specs on the IEEE's part. They put a TON of time into detailing how the Ethernet physical layer talks chip-to-chip. The driver side implementation for Ethernet, however, is basically the wild fucking West.

There are cases where the author's approach makes sense - software defined radio jumps to mind. Adding one more layer of abstraction to Ethernet drivers isn't one of them. That seems like an area where software could learn from hardware - namely, that good specs drive good implementations.


AFAIK, the digital interfaces of the physical layer (PHY) of the Ethernet stack are pretty much standardized (MII/RMII/GMII/RGMII). Most SoCs implement their own MAC layer, but are usually compatible with off-the-shelf PHY chips. The function and interface differences in proprietary MAC implementations are where the complexity of the kernel driver comes from.

Suppose that we also standardized the interface between the MAC layer and the kernel and added an FPGA inside the existing PHY chip, open to real-time programming; a ton of optimization could be done. For example, DDoS packets could be thrown away well before reaching kernel space.

Previously I had done that with some low-cost FPGA kits: with less than 1K lines of HDL code, it could pick out only the IP packets wanted and then pass the packets directly to the application layer via the memory interface between the embedded CPU and the FPGA logic.
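
Functionally, that kind of filter boils down to a couple of header-field compares per frame. A software model of the idea (field offsets per standard Ethernet II / IPv4 framing; the wanted address is made up for illustration) might look like:

    import struct

    WANTED_DST = bytes([192, 168, 1, 10])        # hypothetical address we care about

    def keep(frame):
        """Return True for IPv4 frames addressed to WANTED_DST, mimicking the HDL filter."""
        if len(frame) < 34:                      # too short to hold Ethernet + IPv4 headers
            return False
        ethertype = struct.unpack_from("!H", frame, 12)[0]
        return ethertype == 0x0800 and frame[30:34] == WANTED_DST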

I think a programmable network interface is well worth the money if we care enough for openness and efficiency.


You're right - MII and its derivatives are fairly standard. I agree that the MAC -> CPU interface is the root of a lot of the complexity that the author writes about. However, that still sort of serves to illustrate my point: the software to interface with the MAC is the chunk that introduces the most variability. Standardize that and a lot of your problems are solved.

Integrating a bunch of analog/digital/vice versa fabric onto the same die is not a tenable solution to this particular issue. I'll admit the concept is mildly intriguing, but I think you'd find quickly that the silicon demands for a DAC/ADC of the quality needed to emulate any high speed waveform are a little out of reach fab wise. Too much space, not enough drive speed.


Hi there,

we started a Kickstarter campaign a few days ago. We would really appreciate your expert feedback to our FPGA related board. Link: http://kck.st/2orXGCv

Our board connects a Raspberry Pi to the DE0 Nano. The FPGA can be programmed and reconfigured by the Raspberry Pi. The overall goal of our company is to make the entry to the FPGA world as easy as possible.

Motivation for our connector board: We try to avoid the proprietary tools as much as possible.

My experience with FPGAs so far (and only my opinions! please roast me)

CASE 1: Only FPGA, no processor: Too complex for bigger projects, because of the missing high abstraction layer. Drawbacks: Slow development process; to make something useful you need a lot of stuff in your FPGA. Very boring for beginners and in my opinion also for experts;

CASE 2: FPGA with Soft-Core processor: For my bachelor thesis I once used the SCARTS Soft-Core processor from OpenCores in my DE0 Nano. Using an arbitrary Soft-Core processor from OpenCores can't be done out of the box. You have to be quite experienced working with FPGAs. For my thesis for example I had to write my own SDRAM controller and add an additional pipeline stage to the processor. Drawbacks: Soft-Core processor quite slow; too hardcore for beginners; simulation time makes you consume a lot of coffee;

CASE 3: FPGA with processor on chip: Advantages: High speed interconnect between FPGA and processor; Fast processor; Disadvantages: Being fully dependent on the proprietary toolchains. In my opinion the "FPGA only"-tools suck, but the so-called "system builders/designers" drove me crazy;

CASE 4: FPGA with external processor: In my opinion this is by far the best compromise. With some of my colleagues at the university I once made a bitcoin hashing cluster with our student boards. We also had an atmel microcontroller and a PC for that project. We just needed two days to make a fully working system. So FPGA programming can be very easy actually.

But if you combine the Raspberry Pi and the DE0 Nano it should be even easier.

With Raspberry Pi you have a clean and maintained Linux and with the DE0 Nano you have a powerful and still quite cheap FPGA board.

Again I would really appreciate any feedback. What do you do with FPGAs and how do you approach bigger FPGA projects?

Best regards, Joe


Cool project.

As someone from the FPGA industry, bigger projects just use your CASE 3 on a board to start prototyping. A Zedboard or similar. Cheaper version could be the Zybo. It's like $500 vs $200.

When you get to production stage, you whip up your own board.

What exactly is your complaint about the proprietary tools? You are using Altera tools with the DE0 Nano.


Thanks

k. Interesting. A former employer of mine used CASE 4. The FPGA was connected to the CPU via a memory interface. In general this option should be more flexible. You can choose the exact processor and FPGA type you want. On the other hand CASE 3 may have components which are optimized for each other. But for an industry product I would feel I was in a better position with CASE 4 if one of the components were discontinued.

About the complaint: it was a bit too emotional, because I had some specific issues in mind that cost me a couple of forum searches and hours. These included GUI bugs, some non-intuitive settings, and IP black boxes that weren't working or were buggy. Not only Altera but also Lattice. The Lattice FPGA contained a hard-IP SPI. It didn't do anything. I posted on the forums, and a couple of users replied who had the same problem, but there was no reaction from Lattice at all.


I think CASE 3 is going to be much faster due to the high performance AXI ports that Zynq's have.


Yes. That's true


Well there are some things like http://papilio.cc/

I personally backed the papilio duo kickstarter. I haven't done anything with the board yet but there has been some small stuff...and some pretty complex projects too.


Are there any FPGA open hardware implementations available?


Actual chips, not as far as I know. There is a completely open source toolchain for a small FPGA, the IceStorm project.

The VPR project by the University of Toronto has several architecture models defined, but there is probably no way to turn these into a real chip.

FPGA vendors in general are extremely secretive about their designs.

(I'm a grad student working in the FPGA space)


> FPGA vendors in general are extremely secretive about their designs.

That is the part that saddens me. I had always wanted to be able to play with the internal programming. Back when I started they were not anywhere as big, and we almost had visibility.


At least with VPR you can define/modify your own FPGA architecture file! They are written in XML and parsed by the tool when doing place and route.


If you mean stuff that can run on FPGAs then https://opencores.org/


I also mean the actual hardware. :)


This is the only project I know of.

http://www.clifford.at/icestorm/


Quite sure the answer is no.

There was a student project a while ago but it was more of a proof of concept. Open source chips of any kind are few and far between, and generally not at the cutting edge, technology-wise.

Patents, as usual, will make this hard, before we even get to the economic aspects.


I'd be more than happy with 1997-era (20 year patent expiry) FPGAs if they were open. Unfortunately, I don't think anyone else would.


Me too. For some things you need to be as fast or cheap or low-power or whatever as the competition to win, other times you just want to see how far you can go.

At my hackerspace we've been mucking about with Clifford Wolf's PicoRV32 RISC-V implementation on Lattice iCE40 FPGAs lately. This'll give you something that's maybe on par with a lower-end ARM Cortex-M, plus the flexibility of the few thousand spare look-up tables. All open and hackable with open source tools (except for the actual FPGAs, mind). It's not much, but it's a start.


None that I'm aware of. It would be a great leap as it would open the way for open source tools that go beyond netlist generation, but right now all major vendors keep their architecture proprietary and you need their tools to program their FPGAs


The course at that site is pretty interesting. There is an error that doesn't affect any of the examples but might be a problem for future circuits: what are called OR are actually XOR. It took me a while to figure it out because the notation for "don't care" was not obvious to me (it was explained later in the text after table circuits are introduced).

I was able to build a circuit to test this by copying an OR to the display block and hooking up the inputs to the buttons and the output to column 1. It might be interesting to have some scheme to test sub-blocks directly.


Aside from the acceleration angle, the maker community would also benefit from a richer ecosystem around FPGAs, especially inexpensive ones.

They are very handy for input and output of signals where you need precise timing.

Things like oscilloscopes, video cards for vintage displays, driving LED billboard modules, and so on - all of which either aren't possible, or aren't optimal/scalable, on either Arduino or RPi boards.

There are also bits like drop-in clones of ATmega microcontrollers that run on the FPGA, so you can leverage some of what you already know to interface with it.


> FPGA development has always been an industrial activity, dominated by brutalist, opaque, and proprietary tools.

Brutalist tools? As in, made out of raw concrete (béton brut)?


Xilinx ISE does give the notion of raw concrete.


Question to all: is there a ycombinator for open source projects? Is it even possible to have one without profit as a motivator?


> Question to all: is there a ycombinator for open source projects.

I think it's called "YCombinator".

> Is it even possible to have one without profit as a motivator?

YC funds nonprofits, and open source, in any case, can (and often does) have a profit motive, so either way it seems YC would potentially be an option for an open source project.


YC does fund nonprofits.

Also open source does not necessarily mean non profit.


While tangential to the main point, it's weird to read about the failure of the open source phone knowing about the Android Open Source Project. What's missing from AOSP is Google Play Services aka proprietary web services.

Open source has been a huge hit on the client, it's the firewalled server that has slowed the spread of free software.


The Open-Source community needs Open-Source HDL compilers. Until this happens OSS for FPGAs will continue to be slow.


Like Clash [0]?

[0] http://www.clash-lang.org/


No. Clash is a transpiler from Haskell to other HDL languages. It still needs proprietary compilers.


Nice coincidental timing... I literally just ordered a Lattice iCE40 eval board yesterday, and am hoping to get started using IceStorm to do some FPGA design stuff. It's all brand new to me, so any good entry level stuff, tutorials, docs, etc. on any of this stuff are very much appreciated.


Author here - have a look at the main site, it was made just for that purpose: https://www.blinklight.io

It's based around the iCE40/IceStorm, but it has its own specific "dev board". I'm planning on making it more generic later on.


I am trying to teach myself CUPL on Atmel 16V8s as a cheap, super basic way to get my feet wet in the FPGA/CPLD/GAL world. I have a breadboard half wired up to blink lights based on input, etc. I like older technology, but hope that what I learn applies to more modern FPGAs.


Unrelated to the article content, but the webpage zoom is broken on mobile.


Do FPGAs have a future beyond "prototyping non-memory intensive algorithms for eventual ASIC implementation"?

It seems to me that the scale-out + scale-up approach of x86 and GPUs is still the most promising and profitable arena, besides some very niche and very particular applications?

I could and would like to be wrong. I'd go learn FPGA dev then if there were things that needed accelerating and were personally lucrative to me :)


They for sure have a future. But IMHO the main usage scenario for FPGAs is not the acceleration of things which are otherwise run on CPUs and GPUs.

I think one of the main values of FPGAs is when you need custom hardware/"glue" between other chips, because you e.g. need to connect 30 I²S links to your application processor or need to handle multiple video streams with different formats for which no custom IC exists. Those are for sure niche applications, but lots of these products reach six digits of sales. The other use case is when you need super-hard realtime behavior, which means things should work exactly synchronously to a hardware clock. E.g. I used them back at university for radar systems, but there are also lots of applications around telecoms, networking, and audio/video processing and transmission. These are often low to medium volume products, where custom ASICs are not lucrative or updateability is required. Afaik custom ASICs also got more expensive over time through higher process costs, so you need a higher volume to justify one instead of going for FPGAs.


and here is an example of exactly that sort of application:

http://accelconf.web.cern.ch/AccelConf/ICALEPCS2013/papers/t...


Ah someone that's actually used them eh :) ? My experience is the same.

People have this impression that they're being used for all this exciting stuff, when the average FPGA is there because you need some multiplexing on IO lines and you're not running an RTOS. Basically they're used in place of what would be software if your micro were better suited to your application.


They grew beyond prototyping a long time ago. They are full-featured SoCs with great flexibility that fill niches where CPUs, GPUs, and ASICs can't compete in flexibility/power consumption/development cost.

- ASICs are too rigid and require high volumes to be profitable

- GPUs are too power hungry

- CPUs are not good for massively parallel processing

FPGAs are heavily used in industrial/military/aerospace applications


If you think GPUs are too power hungry then you're in for a shock with FPGAs. Switching FPGAs are incredibly power hungry since they run on much larger processes.

FWIW most modern FPGAs use discrete DSPs anyway so you're not really getting the flexibility at that level.


The processes used for FPGAs are VERY competitive with GPUs. Stratix 10 is at 14nm, for example. Stratix V was built using a 28 nm process and that was at least 5 years ago - on par with or exceeding NVidia.

You can fuse more operations into the DSPs using an FPGA and/or you can perform fewer operations per FLOP. One example is to avoid rounding and packing/unpacking when creating deep pipelines for floating point processes.


Cheapest Stratix V I can find on digikey starts at ~$1,800 and quickly goes up to $10k for the bare chip in quantities of 24.

For open source hardware I doubt you'll see people shell out the cost of a used car to be able to match modern ~$200 GPUs.

I don't even want to know what Stratix 10 starts at.


The complete devkit solution is about $7K - https://www.altera.com/products/boards_and_kits/dev-kits/alt...

The Xeon Phi board https://www.cnet.com/products/intel-xeon-phi-coprocessor-712... is $4.2K..$5K

The Stratix V board will not consume more than 60W when used as a PCI Express card. The requirement for the Xeon Phi is at least 250W.

With a difference of ~200W, that's a difference of ~4.5 kWh per day, or ~1600 kWh and about $160 in hard cash per year (US average). Very probably more - getting rid of the heat produced, etc.


Wow. These things just scream military use. My guess is phased array radar.


I worked on a space based imaging radar which used an FPGA. It cost mid 6 figures per chip...

This was admittedly a space certified radiation hardened chip. Still alarming when you had to pick it up and carry it somewhere


> Switching FPGAs are incredibly power hungry since they run on much larger processes.

I don't think the comment about process is really true. From what I can tell, Xilinx is only a few months behind the biggest SoC makers in terms of its process adoption, and is shipping 14nm parts currently. Not sure about Altera, but they are on Intel's process, which is a bit ahead of the competitors anyway.

In terms of switching power, you definitely pay a penalty to have the reconfigurability in hardware, but on the other hand you don't have all the unused logic that you would on a GPU. I'd guess the comparative efficiency depends on the specific problem and specific implementation, but I don't have any numbers to back that up.


It's a couple of things; process is a large part. You're also dealing with 4-LUTs instead of transistors, so you pay both in switching power and leakage, since you can't get the same logic-to-transistor density that's available on ASICs.

Also there's a ton of SRAM for the 4-LUT configuration so you're paying leakage costs there as well.


Tell me more about leakage.

NVidia managed to get it right about a year and a half ago. Before that their gates leaked power all over the place.

The LUTs on Stratix are 6-to-2, with specialized adders; they aren't at all the 4-LUTs you are describing here.

All in all, there are places where FPGAs can beat ASICs. One example is complex algorithms like, say, ticker correlations. These are done using dedicated memory (thus aren't all that CPU friendly - caches aren't enough) and logic, and change often enough to make the use of an ASIC moot.

Another example is parsing network traffic (deep packet inspection). The algorithms in this field utilize memory in interesting ways (compute a lot of different statistics for a packet and then compute the KL divergence between a reference model and your result to see the actual packet type - histograms created in a random manner and then scanned linearly, all in parallel). GPUs and/or CPUs just do not have that functionality.


The Arria 10 (previous high-end Altera series) was at 20nm. The new Stratix 10 is at 14nm. UltraScale+ is in the 14-20nm range, I think, and Xilinx got there first.

(I don't know if you can publicly get Stratix 10 devkits yet, but you can get an Arria at least.)


Achronix had production Speedsters at 16nm in 2016. They should be on that list given they're 1.5GHz or something like that.


The unused logic part isn't exactly true. The way FPGAs are built doesn't allow for unused sections to be completely shut off. Instead of dark silicon it's more like grey silicon. The unused parts of the chip still use substantial power, unlike an ASIC, where these unused portions simply wouldn't exist.


Not my personal/professional experience. Even with heavy DSP usage with nodes >14nm you can still create designs with lower than 30 W power consumption, since you can control the frequency and use low power design techniques. There are vendors like Microsemi/Lattice that specialize in low power FPGAs where you can do even better


So there might be specific products that hit certain market segments (FWIW I really like Lattice's offerings). It's just that, watt for watt, GPUs should be more efficient since they can use the latest process and don't have to carry around 4-LUTs + SRAM.

On the DSP side, you're using an ASIC DSP (you can't change the width, for instance) anyway on most modern FPGAs, so you're comparing ASIC to ASIC at that point.


Designing an ASIC with low power in mind will always beat an FPGA, no question there. But the design cost is prohibitive for many applications (not to mention the lack of flexibility to re-iterate/patch your design with little to no cost). But compared to GPUs I'm fairly certain you can do much better power-wise going with an FPGA. You have finer control over your frequency, over which sections to power down, how often to switch, etc.

Of course you can get better price/GFLOP with GPUs + quicker time to market


And yet they (GPUs) aren't (better for FLOP/watt than FPGA) everywhere.

GPUs have very particular cache and computation hierarchy which is not necessarily a best/good fit for all problems that are being thrown at them.


FPGAs are typically used as some of the first non-test chips for a new process due to their fairly regular structure.


And it's a valid use-case for putting a lot of SRAM on a chip.


That's really interesting


Agreed!

Just being a SoC has another advantage: reduced part count, so you can fit more functionality onto a smaller PCBA.

Another commenter mentioned the Cyclone V and Zynq SoCs. My day job is in the telco industry. The equipment vendors we sell to are always pushing for higher and higher densities in their chassis and line cards, and it drives a lot of our design decisions. Chips like the Cyclone V and Zynq help a lot in achieving those aims. Our PCBAs have shrunk dramatically (with increasing functionality) over the last decade.

The continual pressure to reduce costs and power consumption also leads to choices like minimizing the other available system resources (e.g. clocks, memory bandwidth, RAM, flash), which can end up reducing the effectiveness of a GPU.

As with everything in engineering, there are many different problem domains, each with unique constraints dictating different solutions.


You would need to compare performance per watt for your particular application. If you just read the Box, a GPU uses 250 Watts, while an FPGA is tens of Watts. But then consider that you are getting a large amount of very high speed memory and much higher clock speeds. Then once you take that into account, you can begin to start analyzing which is better for your project.


Sometimes they're good for "need ASIC-like results in a low-volume design" but that's more a special case of what you give, and of course the 'upgradable glue logic' case also is in play.

I think the killer application would be to have programmatically reconfigurable FPGAs that reconfigure on the fly to fit the needs, which has been discussed in various overhyped ways in the past, but I don't think the tooling is there. This effort might or might not make that possible -- imagine a chip where, say, the regex you're going to use to scan 1GB of data could be turned into FPGA logic and then it just shovels the whole thing through a "HW" FIFO that detects matches much faster than a processor could. Nifty... but is it worth the effort to do?
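
A rough software analogue of that idea, with a hand-built DFA for one fixed pattern standing in for the generated hardware (the pattern and the fallback logic are made up for illustration):

    # Stream bytes through a tiny hard-coded DFA for the pattern b"GET ",
    # one byte per "cycle", the way a hardware FIFO matcher would.
    PATTERN = b"GET "

    def build_dfa(pattern):
        # state = number of pattern bytes matched so far; crude fallback on mismatch
        table = {}
        for state in range(len(pattern)):
            for byte in range(256):
                if byte == pattern[state]:
                    table[(state, byte)] = state + 1
                else:
                    table[(state, byte)] = 1 if byte == pattern[0] else 0
        return table

    def scan(stream):
        table, state, hits = build_dfa(PATTERN), 0, []
        for pos, byte in enumerate(stream):
            state = table[(state, byte)]
            if state == len(PATTERN):        # accepting state: report a match
                hits.append(pos - len(PATTERN) + 1)
                state = 0
        return hits

    print(scan(b"xxGET /index GET /foo"))    # [2, 13]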


This is what I've used them for.

Low volume design as in thousands of chips a year rather than millions. Or cases where reprogrammability greatly improves the ability to provide fixes and improvements in the field as a part of very expensive support contracts.


You can kind of do this now if you want. SoCs like the Cyclone V or Zynq have an ARM core or two surrounded by FPGA fabric. The only barrier right now would be the dynamic reloading of the FPGA, which I haven't seen done, but it would be simple in principle (as opposed to having a soft core that would likely have its state interrupted if you re-flashed everything).


Most newer Intel (Cyclone and beyond) and Xilinx (7 series) parts support various levels of partial logic reconfiguration. If you plan your device layout right, you can simply reprogram a region of fabric that is hardwired to the 'shell' of your system (which is connected to the peripherals). If you have setups like Vivado's SDSoC/high-level synthesis, you can do things like dynamically reconfigure OpenCL kernels, etc.

Of course, the problem is synthesis, verification, testing etc is all rather difficult in practice and takes a long time. The boards themselves also can't do synthesis/compilation, obviously, since they're fairly limited and it's a time-consuming process. You'd have to make it act more like an OTA update system, but that comes with a lot of its own problems...


This is possible today. Check out the Digilent PYNQ (www.pynq.io). I just bought a dev board, I'm pumped to try a few things out.


I really wanted to get that board - but the price difference between academic and regular users was a bit too much given this was coming out of my own pocket (I ended up buying used FPGA dev boards from ebay, etc.).

I recall there was a lot of excitement around the Pynq. Has anything notable been done with it yet?


FPGAs are a useful means of accelerating any algorithm where you want more performance than you would get from executing it on a CPU, but can't afford to create your own ASIC because the volume is relatively low. For example, it may be used in DSP applications for things like radar. This use as a reprogrammable accelerator is also why Amazon has introduced instances with on-board FPGAs to AWS.


Also worth noting that for real time control you can't beat the speed. Essentially, everything you do on an FPGA happens at the same time. You could see a 1000 line Verilog file that completes a huge operation in a handful of cycles.


Also note that "speed" includes latency, and not just low latency but also deterministic. You can do things like adjust a timestamp or tweak a control equation knowing that a sample has been delayed through an N-stage pipeline/filter that advances at M hertz.


Yea, I misspoke there. %s/speed/latency/

Certain industries (esp nuclear) really like them because they can "prove" the set of attainable states in the FPGA.


Exactly.

There is an algorithm for detecting bursts of unusual activity which is extremely well suited for FPGAs: http://www.cs.nyu.edu/cs/faculty/shasha/papers/burst.d/burst...

Basically it is a tree of registers with checks. It can compute burst position and duration in O(log(window size)) time (clocks). You are looking here at 2.5..5 ns (200MHz..400MHz) multiplied by log2(window size) - 25..50 ns for a window with 1024 samples. You just cannot get that kind of connectivity with a CPU/GPU. Processing these samples on a CPU will get you into several hundreds of ns, if not more.
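
A rough software analogue of the register-tree idea (window handling and thresholds heavily simplified here; on the FPGA every level of the tree updates in parallel each clock, which is where the O(log) latency comes from):

    # Build sums over aligned blocks of 1, 2, 4, ... samples and flag any
    # level whose newest aggregate exceeds that level's threshold.
    def detect_bursts(samples, thresholds, levels=10):
        pending = [[] for _ in range(levels + 1)]   # pending[k]: sums of 2**k samples awaiting a sibling
        alarms = []
        for t, x in enumerate(samples):
            value, k = x, 0
            while k <= levels:
                if value > thresholds[k]:
                    alarms.append((t, k, value))    # burst of duration ~2**k samples ending near t
                pending[k].append(value)
                if len(pending[k]) < 2:
                    break                           # wait for a sibling before summing upward
                value = pending[k].pop(0) + pending[k].pop(0)
                k += 1
        return alarms

    # A single spike trips the alarm at several durations:
    print(detect_bursts([1] * 6 + [50] + [1] * 9,
                        thresholds=[10, 15, 25, 40, 70] + [10**9] * 6))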


>I'd go learn FPGA dev then if there were things that needed accelerating and were personally lucrative to me

HFT


Heat (and as a result: size), power consumption, and cost.

There are plenty of problems where, sure, you could get an x86 or a powerful GPU and do it faster, but you'll be paying for it with power usage.

E.g. take [1] - they're making a modern M68k compatible CPU in an FPGA. They're getting performance that's beating ASIC ColdFire CPUs (Motorola/Freescale's M68k descendant) and ASIC PPCs with several times the clock, and beating the fastest "real" M68k systems by a factor of 3x-10x.

You could beat that with a software emulator on a powerful enough x86, sure. Easily. Especially if investing enough time in dynamic translation and the like.

But this thing instead fits on a far less power hungry FPGA that gives off far less heat, and fits on a board that'll fit in one of the real small-case Amigas - try to do that with an x86 with a heat sink...

[1] http://www.apollo-core.com/


The Apollo core only has a market because Amiga fans are willing to pay a premium in price, complexity, heat and reduced performance to use something other than ARM or x86. I wouldn't be surprised if software emulation on a low-power, cheap ARM core could beat it; JIT on a powerful x86 is apparently 5-7x faster. It's also the exact opposite of open source, being a proprietary core tied to single-source commercial boards.


> The Apollo core only has a market because Amiga fans are willing to pay a premium in price, complexity, heat and reduced performance to use something other than ARM or x86.

Heat? Really? A modern x86 with a heatsink won't physically fit in the cases these boards go in. To be viable these cards must run cooler than the alternatives - that's part of the point.

> I wouldn't be surprised if software emulation on a low-power, cheap ARM core could beat it; JIT on a powerful x86 is apparently 5-7x faster.

The current boards are kept cheap - they're nowhere near using the fastest available FPGAs, and the cores keep making massive performance leaps iteration to iteration and have been tested working on FPGAs more than twice as fast as the ones on the current Vampire boards. The publicly released cards are now beating ASIC PPC cores clocked 7x as fast on some benchmarks. If JIT on a powerful x86 is only 5-7x faster now, odds are they'll be able to reach parity with the UAE JIT soon.

> It's also the exact opposite of open source, being a proprietary core tied to single-source commercial boards.

Which is entirely irrelevant to the point I was making.


They are better than GPUs at machine learning inference, so there's that. Ask those guys for some benchmark results, you'll be impressed: http://mipsology.com


FPGA hardware is vendor locked and incredibly proprietary. The current business model sells chips at cost or below and makes it up with software licensing.

This is a huge problem for open source adoption, and unless manufacturers change the business model we will never see widespread use of FPGA's.


As an FPGA designer, I can't disagree more with your statement of their business model. They make money on chips. That is indisputable.

My last 3 designs used the vendor's free software, except for some IP we bought (IP=Intellectual property, a specific core). You might think of the IP as a library you would buy as a software engineer.


This is entirely my experience also. The chips themselves are very expensive and it's exactly why you move to an ASIC for volume as soon as you can. Assuming you can... If you are not shipping in volume then maybe the toolchain licensing becomes an issue. And if you are not shipping because you're a hobbyist, student, etc. then some of them grant you free licenses; they used to at least.


bcarlton0, what area of business are you in? I'm a young engineer working with FPGAs in the medical space and I'd love to hear about what else is out there.


Ductapemaster, I have done designs in networking (e.g. packet processing and 10 Gb Ethernet), wireless (e.g. baseband part of a modem), glue logic, and other more specialized areas. FPGAs have a wide variety of uses other than ASIC prototyping and small glue logic.


The difference with software is that I don't need to buy the library to use the computer. Not sure which FPGA's you're using since all the big players force you to use their shitty tools and charge massive license fees for the privilege


thowayedidoqo, I can still use them without the libraries. For example, I could write my own FFT or compression routine. It just may be cheaper to license them from the vendor or a third party. I have done plenty of designs without paid IP. I have used IP from Xilinx, Altera (pre-Intel), and third parties.



