Nyuzi – An Experimental Open-Source FPGA GPGPU Processor (github.com/jbush001)
165 points by peter_d_sherman on Feb 14, 2021 | 68 comments



There are people keeping OpenVGA alive[1]. With the failure of the Open Graphics Project[2], are there any known promising projects besides libregpu[3]?

[1] https://github.com/elec-otago/openvga

[2] https://en.wikipedia.org/wiki/Open_Graphics_Project

[3] https://libre-soc.org/3d_gpu/


>> are there any known promising projects besides libregpu?

I think the most useful thing right now would be a high-quality version of the "easy" parts of a GPU: basic scan-out, possibly overlays, color space conversion, buffer handling. This would allow ANY open processor project to have framebuffer graphics and run LLVMpipe for basic rendering and desktop compositing. This may be slow, but it is required for every open GPU project, while an SoC can live without the actual GPU for some applications.

IMHO, first things first.


this is easy to chuck together in a few days, literally, from pre-existing components found on the internet.

* litex (choose any one of the available cores)

* richard herveille's excellent rgb_ttl / VGA HDL https://github.com/RoaLogic/vga_lcd

* some sort of "sprite" graphics would do https://hackaday.com/2014/08/15/sprite-graphics-accelerator-...

the real question is: would anyone bother to give you the money to make such a project, and the question before that is: can you tell a sufficiently compelling story to get customers - real customers with money - to write you a Letter of Intent that you can show to investors?

if the answer to either of those questions is "no" then, with many apologies for pointing this out, it's a waste of your time unless you happen to have some other reason for doing the work - basically one with zero expectation up-front of turning it into a successful commercial product.

now, here's the thing: even if you were successful in that effort, it's so trivial (Richard Herveille's RGB/TTL HDL sits as a peripheral on the Wishbone Bus) that it's like... why are you doing this again?

the real effort is the 3D part - Vulkan compliance, Texture Opcodes, Vulkan Image format conversion opcodes (YUV2RGB, 8888 to 1555 etc. etc.), SIN/COS/ATAN2, Dot Product, Cross Product, Vector Normalisation, Z-Buffers and so on.
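
as a concrete illustration of what one of those conversion opcodes actually computes, here is a rough software sketch in plain python (full-range BT.601 coefficients assumed; the function is purely illustrative, real hardware does this per pixel across SIMD lanes):

    # rough sketch of a YUV2RGB "opcode" in software, full-range BT.601 assumed
    def yuv_to_rgb(y, u, v):
        r = y + 1.402 * (v - 128)
        g = y - 0.344136 * (u - 128) - 0.714136 * (v - 128)
        b = y + 1.772 * (u - 128)
        clamp = lambda c: max(0, min(255, int(round(c))))
        return clamp(r), clamp(g), clamp(b)

    print(yuv_to_rgb(128, 128, 128))  # mid-grey in, (128, 128, 128) out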


Seriously? VGA with DVI outputs? And a link to a Sprite engine?

We need HDMI output, preferably 4K capable. I also mentioned colorspace conversion. I should have also said to "just throw in" a video decoder for VP9 and AV1 if one is available. The point is that the likes of SiFive and other RISC-V SoC vendors should be making desktop chips, not just headless Linux boards or ones with proprietary GPUs.

Like I said, the "easy" part should be done and available - not theoretically assemblable from various pieces.

If this were readily available, I'd be able to buy it from someone today. There IS a market for it and that will be growing fast. Add a real GPU and things look even better.


The problem with HDMI is the patent encumbrance. DisplayPort would be a better target for "libre" applications. It looks like you may even need to be actively licensing HDMI to buy a DisplayPort-to-HDMI bridge as well.


yes. HDCP has "infected" HDMI, eDP, USB-C and so on.

this can entirely be avoided with:

* TFP410a (RGBTTL to DVI/HDMI)

* SN75LVDS83b (RGBTTL to LVDS)

* SSD2828 (RGBTTL to MIPI)

* various other converter ICs

4k btw is MENTAL bandwidth. multiply 3840 by 2160, then by 4, then by 60: this gives the number of bytes per second required of the internal memory bus.

turns out to be 2 gigabytes per second, doesn't it?
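
a quick python sanity-check of that arithmetic (assuming a 32bpp framebuffer, i.e. 4 bytes per pixel, at 60Hz):

    # 4K scan-out bandwidth, 32 bits per pixel at 60 Hz assumed
    width, height, bytes_per_pixel, hz = 3840, 2160, 4, 60
    print(width * height * bytes_per_pixel * hz / 1e9)  # ~1.99 GB/s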

now check the datasheets on what affordable FPGA boards can do, what memory ICs they have, and whether they can cope with that level of bandwidth.

you'll find that there aren't any.

i do find it ironic that "incremental steps" are recommended, "to get something running", but lack of knowledge of the difficulty surrounding DRM and in ramping up to high speed leads to "disbelief". ah well :)


To be fair, I wasn't constraining myself to "affordable" FPGA boards. An expensive FPGA board is a one-time expense, but the license for a proprietary HDMI IP core and HDMI membership is at least an order of magnitude larger, and a recurring expense. You can get some fairly high-end FPGAs before needing the paid tier of the Xilinx tools.

As far as memory bandwidth goes for 4K, HDR10 content is about 2 GB/s, regular SDR content is 1.5 GB/s for normal 24 bit color, and if you're running SDR content with chroma subsampling (4:2:0), you're at 750 MB/s.

If we're considering the "affordable" tier of FPGA boards:

An Artix 7 can be found on boards <= $200, and the GTP channels are capable of HDMI 2.0 speeds (6 Gb/s per lane). HDMI uses an 8b/10b encoding, so the gearbox setup would be 600 MHz, which is doable on the -3 and -2/-2L speed tiers. You can also use 64-bit DDR3 @ 1066 MT/s (8.5 GB/s) on -3 speed grades and 64-bit DDR3 @ 800 MT/s on -2/-2L speed grades (6.4 GB/s).
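
Here is a quick sketch of where those figures come from (the bytes-per-pixel values are my assumptions: 4 for HDR10 packed into 32 bits, 3 for 24-bit SDR, 1.5 for 8-bit 4:2:0), alongside the DDR3 bus bandwidth for comparison:

    # 4K60 scan-out bandwidth per pixel format vs. 64-bit DDR3 bandwidth
    pixels_per_second = 3840 * 2160 * 60
    for name, bpp in [("HDR10, 32bpp", 4), ("SDR, 24-bit", 3), ("SDR 4:2:0", 1.5)]:
        print(name, round(pixels_per_second * bpp / 1e9, 2), "GB/s")
    print("DDR3-1066:", 8 * 1066e6 / 1e9, "GB/s")  # ~8.5 GB/s
    print("DDR3-800: ", 8 * 800e6 / 1e9, "GB/s")   # 6.4 GB/s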

You don't necessarily have to be rendering a 4K stream from RAM to want a 4K output. You could be generating the higher resolution content on demand and not be putting that much pressure on the memory. Think tiled graphics, or native resolution overlays over an upscaled input stream.


unfortunately the data still has to be in a full 32BPP framebuffer (in order for 2D or 3D Window Managers to write to it).

all the graphics software will write at that full 32BPP rate: it is unfortunately not reasonable to expect all Graphics Software to perform data compression.

yes there will be a few opportunities for compressed streaming, and funnily enough we were just thinking "what if the GPU were to do the same trick as the Amiga used to do, 40 years ago, by following the scan-lines?"

in the case of video playback you might reasonably expect that an area of the screen is "reserved" (not written to), but that when the Video-Output HDL gets to those pixels it instead reads directly from a completely different device: one that is already formatted in HDR/SDR or YUV.

effectively this is a modern "sprite" engine.
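
a rough python model of that idea (the single rectangular overlay and all the names are mine, purely illustrative): the scan-out walks the display in raster order and, inside the reserved region, fetches from a separate video surface instead of the framebuffer.

    # toy overlay-aware scan-out: per pixel, pick the source to read from
    def scan_out(framebuffer, video_surface, overlay_rect, width, height,
                 convert=lambda px: px):   # stand-in for YUV->RGB etc.
        ox, oy, ow, oh = overlay_rect
        frame = []
        for y in range(height):
            for x in range(width):
                if ox <= x < ox + ow and oy <= y < oy + oh:
                    # "reserved" region: read the already-YUV/HDR video surface
                    frame.append(convert(video_surface[x - ox, y - oy]))
                else:
                    frame.append(framebuffer[x, y])
        return frame

    fb  = {(x, y): "fb"  for x in range(4) for y in range(4)}
    vid = {(x, y): "vid" for x in range(2) for y in range(2)}
    print(scan_out(fb, vid, (1, 1, 2, 2), 4, 4).count("vid"))  # 4 overlay pixels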

... but, again, we have significantly diverged from the original "Just Make It Real Simple Why Don't You Just", yeh? :)


Yeah. I also miss the small but firm steps approach.


this is what we're doing in Libre-SOC, and using standard python software engineering practices (writing unit tests at every step of the way) to do it. some of those unit tests are actually Formal Correctness Proofs (asserts, but for hardware).

i was stunned to find that in the hardware world, test-driven development is not standard practice: people simply haven't been trained that way. they write thousands to tens of thousands of lines of unverified code and finally do an integration test.
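
a toy python illustration of what "a unit test at every step" means in practice (this is not actual Libre-SOC code; the adder and its reference model are made up for the example):

    import unittest

    def hdl_adder_model(a, b, width=8):
        # stand-in for a simulated HDL block under test
        return (a + b) & ((1 << width) - 1)

    class TestAdder(unittest.TestCase):
        def test_against_reference(self):
            # exhaustively compare the block against a trivial reference model
            for a in range(256):
                for b in range(256):
                    self.assertEqual(hdl_adder_model(a, b), (a + b) % 256)

    if __name__ == "__main__":
        unittest.main()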


I'm a total lay person here, but my understanding is that designing a new processor is very challenging these days because of the patent situation. That is, so much in hardware design is patented that you're bound to run into problems if you don't know what you're doing.

Is this true, and is it of relevance here?


Not sure how relevant it is here, but yes, GPU architectures are bound by tons of patents, so you can bet your a$$ that if you were to commercially launch your own GPU IP, you'd have Nvidia's and AMD's lawyers knocking on your door in under 10 seconds.

IIRC most companies out there selling GPU IP are still paying royalties to AMD for their patents on shader architecture which they got from their acquisition of ATI which in turn came from their acquisition of ArtX which was founded by people who worked at the long defunct SGI (Silicon Graphics).

The funny thing is, if you backtrack through all GPU innovations, most stem from former SGI employees.

When 3Dfx went under, even though Nvidia's GPU tech was already superior to anything 3Dfx had, Nvidia immediately swept in and picked their carcass clean, mostly for their patents in this space, so they would have more ammo/leverage against competitors going forward.

Regardless how you feel about patents, with their pros and cons, hardware engineering is a capital intensive business and without patents to protect your expensive R&D, it wouldn't be a viable business.


I learned GL in the 1990s on SGI systems. Shaders didn’t exist, poly counts were in the 100s, and textures were a massive processing burden. The rendering pipeline of course was quite different. And yet so much is the same! Code organization, data types, all is quite familiar, whether it’s OpenGL or DirectX or what not. The achievements of SGI engineers have literally benefited generations.


Jeff's evaluation of GPLGPU is fascinating: https://jbush001.github.io/2016/07/24/gplgpu-walkthrough.htm...

you are absolutely correct in that everything has moved on from "Fixed Function" of SGI, and how GPLGPU works (worked) - btw it's NOT GPL-licensed: Frank sadly made his own license, "GPL words but with non-commercial tacked onto the end" which ... er... isn't GPL... sigh - but everything commercially has now moved on to Shader Engines.

that basically means Vulkan.

however you may be fascinated to know, from Jeff's evaluation, that there are still startling similarities in basic functionality between not-GPL GPLGPU and modern designs targeted at Shader Engines.


To clarify what I think is the relevance, as well as to explore my own questions:

If someone were to clean-room design their own GPU chip, how likely is it that Nvidia and AMD would come down on them anyway simply by virtue of the fact that they (presumably) have patents on everything that you could think of putting in that chip?

In essence: do you now have to be an expert in what you’re not allowed to put in before you even start?


So here's what I would do if I were in this situation. I wouldn't build a graphics processing unit per se, but instead would build a highly parallel SIMD CPU organized in workgroups, and with workgroup-local shared memory. These cores could be relatively simple in some respects (they wouldn't need complex out-of-order superscalar pipelines or sophisticated branch prediction), but should have good simultaneous multithreading to hide latency effectively.

Then, if you wanted to run a traditional rasterization pipeline, you'd do it basically in software, using approaches similar to cudaraster (which is BSD licensed!). The paper on that suggests it would be on the order of 2X slower than optimized GPU hardware for triangle-centric workloads, but that might be worth it. The good news is this story gets better the more the workload diverges from what traditional GPUs are tuned for - in particular, the more sophisticated the shaders get, the more performance depends on the ability to just evaluate the shader code efficiently.
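
To give a flavour of the software approach (this is just a toy edge-function rasterizer in Python, not the cudaraster algorithm itself; a real implementation would bin triangles into tiles and evaluate many pixels per SIMD instruction):

    # toy half-space (edge-function) rasterizer: the per-pixel test a wide
    # SIMD core would evaluate across many pixels at once
    def edge(ax, ay, bx, by, px, py):
        return (bx - ax) * (py - ay) - (by - ay) * (px - ax)

    def rasterize(v0, v1, v2, width, height):
        covered = []
        for y in range(height):
            for x in range(width):
                px, py = x + 0.5, y + 0.5            # sample at pixel centre
                w0 = edge(*v0, *v1, px, py)
                w1 = edge(*v1, *v2, px, py)
                w2 = edge(*v2, *v0, px, py)
                if w0 >= 0 and w1 >= 0 and w2 >= 0:  # inside (CCW winding)
                    covered.append((x, y))
        return covered

    print(len(rasterize((0, 0), (8, 0), (0, 8), 8, 8)))  # 36 pixel centres covered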

It would of course be very difficult to make a chip that is competitive with modern GPUs (the engineering involved is impressive by any standards), but I think a lot would be gained from such an effort.

I should probably disclaim that this is definitely not legal advice. Anyone who wants to actually play in the GPU space should plan on spending some quality time with a team of topnotch lawyers.


(project author here) That is pretty close to the approach this project has taken, although my motivation was not so much avoiding IP as exploring the line between hardware acceleration and software.


allo jeff nice to see you're around :) thank you so much for the time you spend guiding me through nyuzi. also for explaining the value of the metric "pixels / clock" as a measure for iteratively being able to focus on the highest bang-per-buck areas to make incremental improvements, progressing from full-software to high-performance 3D.

have you seen Tom Forsyth's fascinating and funny talk about how Larrabee turned into AVX512 after 15 years?

https://player.vimeo.com/video/450406346 https://news.ycombinator.com/item?id=15993848


Great to hear! I've poked around a little and can see that; in any case I wish you success and hope we can all learn from it.


To clarify further, Nvidia and AMD (and probably other small players like ARM, Qualcomm, Imagination) own the patents on core shader tech, which are the building blocks of any modern GPU design.

If you want to design a GPU IP that works around all their patents, you probably can, but unless you're a John Carmack x10, your resulting design would be horribly inefficient, not competitive enough to be worth the expensive silicon it will be etched on, and probably not compatible with any modern API like Vulkan or DirectX.

But if you just want to build your own meme GPU for education/shits and giggles, one that doesn't follow any patented designs or modern APIs, then you can, and some people already did:

https://www.youtube.com/watch?v=l7rce6IQDWs


I am not in the graphics space but I am quite familiar with tech business practices.

I think the chance you would be sued is near 100%. If you released and showed any market traction at all, you would immediately become a threat to the duopoly; they surely remember the rise of 3Dfx. Don’t bother arguing the merits of the patents because it would be a business decision, not a technical one—this is the kind of thing that’s decided at the C-level and then justified (or cautioned against) by the company’s legal team, not the other way around. Patents are merely leverage to effect the defense of the business, and you can be sure they’ll be used.


I agree with you (and definitely a conversation worth having) but for the sake of this thread let’s pretend that legal action would only be taken when a patent was actually matched with what was put in the chip.


I’m sad you’re being downvoted because I would also have been interested in the discussion, but I think the unfortunate reality is it just doesn’t matter that much.


I appreciate you voicing that you'd also be interested in that discussion. These days it feels to me as though patents and litigation are such a large battleground that they dwarf conversations like these, but underneath it all patents do still work the way they used to, and to me that means these discussions are still important.


if it were done, say, as a Libre/Open processor, say, with the backing of NLnet (a Charitable Foundation), where the "Bad PR ju-ju" for trying it on was simply not worth the effort

if it were done, say, as a Libre/Open processor, say, with the backing of NLnet (a Charitable Foundation), where NLnet has access to over 450 Law Professors more than willing to protect "Libre/Open" projects from patent trolls by running crowd-funded patent-busting efforts

if it were done as a Libre/Open Hybrid Processor, based on extending an ISA such as ooo, I dunno, maybe OpenPOWER, which has the backing of IBM with a patent portfolio spanning several decades, who would be very upset if tiny companies like NVidia or AMD tried it on against a Charitably-funded project.

that would be a very interesting situation, wouldn't it? i wonder if there's a project around that's trying this as a strategy? hmmm, hey, you know what? there is! it's called http://libre-soc.org


I don’t see how patents acquired from SGI could possibly still be protected and require licensing.


> I don’t see how patents acquired from SGI could possibly still be protected and require licensing.

It isn't those patents specifically, but the IP developed on top of the safe harbor that those additional patents provided at the time.

It is difficult at any particular point in time to develop IP in this space without infringing patents that are still in force, because even if you go back and base your work on patents that have expired, unless you are very careful and clever you will be infringing on newer patents that are themselves also based on those expired patents.

You may be able to show that some of the key claims of those newer patents were "obvious to a person having ordinary skill in the art" and/or come up with prior art to give yourself some wiggle room, but that's a lot of effort with an uncertain result.

A surer strategy is to patent new developments on top of the current IP: you can't use them without licensing the original IP, but neither can the holder of the original IP use them without licensing yours, and if your new stuff is in their critical path and engineering around it would be annoying enough, you may be able to get them to negotiate a cross-licensing arrangement or something similar.

IOW, if someone's moat is stopping you or their tolls are too high, start digging a moat around their moat to get them to cut you a deal.

It's still tricky though. You can't patent stuff too far ahead of where the original IP holder is going, because you may guess wrong about where they (or the industry) will have to go; even if you're right, your patents may expire before they become important (and you run out of money to keep building more IP on top, because your innovation wasn't implementable yet or you were too early to market). And if you aren't far enough ahead, you run the risk of them patenting whatever innovation you are working on before you do, or even of the "obvious to a person with ordinary skill in the art" shoe ending up on the other foot (and their legal department is bigger than yours).


Aren't patents supposed to expire?

Isn't that the idea: you have a patent for 10-20 years, build your business (which AMD/Nvidia did, very successfully), and then everyone is free to use it, possibly leading to innovation?

I'm poorly versed in this, so if anyone with more knowledge could share some thoughts, that would be appreciated.


Yes, and for example 3dfx's patents for the original Voodoo architecture expired a few years ago. But a fixed-function graphics pipeline is not terribly useful these days.

Technology moves so fast that perhaps tech patents should be given shorter terms. As it stands, the big companies just build up huge patent portfolios which discourage competition.


only if the patent holder does not create an "improvement" on the old one. then the older (referenced) patent is extended. Bosch have done this specifically so that they can hold on to the original CAN Bus patent.


> only if the patent holder does not create an "improvement" on the old one. then the older (referenced) patent is extended. Bosch have done this specifically so that they can hold on to the original CAN Bus patent.

The original patent still expires. The problem is when the patented improvement is obvious enough that anyone who wants to build on the expired patent is going to want to do it the way the new patent does it (ie. patent N+1 is "just" a modernized reimplementation of patent N), but un-obvious enough (to a "person of ordinary skill in the art", superficially at least) that it is still patentable.

Alternatively, the original patent holder throws money and people at the problem and patents every variation on their original patent they can conceive of, and every N*M combination with their other patents, even ones they have no intention of reducing to practice, maybe even ones that seem nonsensical, just in case. IBM used to be notorious for this.


Correct, patents in the US expire after 20 years.


And if I remember correctly (I wrote my last patent 10 years ago), there are annual fees, and the patent rights lapse if they aren't paid.


>Nvidia immediately swept in and picked their carcass clean, mostly for their patents in this space, so they would have more ammo/leverage against competitors going forward.

That's surely not a healthy situation either. Courts should never be a central part of competition among businesses.


Yes, but there are a lot of profitable applications which don't need to be advertised. You run it in-house and make money on the output, e.g. ML farms or mining. You don't take preorders for the hardware at all and just have boutique custom units, and nobody knows the architecture, even if you offer some remote rental/SaaS tool.


> Yes, but there are a lot of profitable applications which don't need to be advertised. You run it in-house and make money on the output, e.g. ML farms or mining. You don't take preorders for the hardware at all and just have boutique custom units, and nobody knows the architecture, even if you offer some remote rental/SaaS tool.

That can work, but only up to the point that a disgruntled former employee blows the whistle.


As if they would even know the issue, and risk breaching all their civil contracts and criminal trade-secret protections on the basis of guesswork.


From my quick look, this processor seems to be a barrel processor architecture, so not an entirely new approach.

https://en.wikipedia.org/wiki/Barrel_processor
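
For anyone unfamiliar with the term: a barrel processor issues from a different hardware thread on a fixed rotation each cycle, which hides memory and pipeline latency. A toy Python model of the scheduling idea (the thread contents are made up):

    # toy barrel ("round-robin") scheduler: one instruction per cycle, each
    # cycle from the next hardware thread, with no stall logic needed
    threads = {0: ["load r1", "add r2", "store r3"],
               1: ["mul r4", "sub r5", "load r6"],
               2: ["xor r7", "add r8", "mul r9"]}

    for cycle in range(9):
        tid = cycle % len(threads)
        instr = threads[tid][cycle // len(threads)]
        print(f"cycle {cycle}: thread {tid} issues {instr}")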


Yes, IP cores are very expensive to license, if they’re even available for licensing at all. This is part of the appeal of RISC-V - an open-spec, royalty-free processor architecture that is free of charge for chip designers to implement.


RISC-V is not an IP core, just an instruction set architecture.

Any implementation of it has the exact same patent minefield to navigate as any other ISA. Most of the patents are around implementation techniques, not the instruction set.


The RISC-V instruction set was carefully designed not to require the use of any currently valid patents to do an implementation. It is up to each processor designer to not violate any patents in their project.


this is unfortunately not true (that the RISC-V ISA was designed not to require currently-valid patents). people may believe that to be the case, but it's not. from 3rd hand i've heard that IBM has absolutely tons of patents that RISC-V infringes. whether IBM decide to take action on that is another matter. they're a bit of a heavyweight, so there would have to be substantial harm to their business for the "800 lb gorilla" effect to kick in.


See this report about the origin of each basic RISC-V instruction:

https://riscv.org/technical/specifications/risc-v-genealogy/

Of course, the various extensions might violate current patents (I would guess that packed SIMD and cryptography extensions are particularly at risk). But the basic ISA does not use anything that was not already widely adopted by 2003.


It's not actually possible to do this, though: it's up to the other side's lawyers to decide whether they're going to sue you, and the answer is yes if they can afford it. You don't have a jury on hand to evaluate every patent in existence.

Besides that, engineers in large companies are told to explicitly not look up any patents so they won't be accused of willful infringement.


I did not mean to imply that RISC-V implementations are patent/royalty/license free, only that the spec is. Not having to define a spec of your own is a significant time and money savings.


unfortunately, if you make modifications and you want them to be "upstreamed" (using libre/open project terminology as an analogy) you cannot do that without participating in the RISC-V Foundation. you can implement APPROVED (Authorized) parts of the RISC-V specification. you cannot arbitrarily go changing it and still call it "RISC-V": that's a Trademark violation.


Related:

Ben Eater - Let’s build a video card!

https://eater.net/vga

Embedded Thoughts Blog - Driving a VGA Monitor Using an FPGA

https://embeddedthoughts.com/2016/07/29/driving-a-vga-monito...

Ken Shirriff - Using an FPGA to generate raw VGA video: FizzBuzz with animation

http://www.righto.com/2018/04/fizzbuzz-hard-way-generating-v...

Clifford Wolf - SimpleVOut -- A Simple FPGA Core for Creating VGA/DVI/HDMI/OpenLDI Signals

https://github.com/cliffordwolf/SimpleVOut

PDS: Also, this looks interesting, from SimpleVOut:

>"svo_vdma.v

A video DMA controller. Has a read-only AXI4 master interface to access the video memory."


Yeah, but these people aren't doing GPGPU computation


Or even anything resembling even 2D graphics acceleration.


The following processor, if it can properly parallelize 3D Math tasks (or if it could be adapted to do so):

"World’s First 1k-Processor Chip, Powered by a Single AA Battery (2016)"

https://news.ycombinator.com/item?id=26140957

...Might make a good candidate for a future GPU!

...As might any massively-parallel "CPU" architecture, for example, the Adapteva Parallella:

https://en.wikipedia.org/wiki/Adapteva

>"The 16-core Parallella has roughly 5.0 GFLOPs/W, and the 64-core Epiphany-IV made with 28 nm estimated as 50 GFLOPs/W (single-precision),[27] and 32-board system based on them has 15 GFLOPS/W.[28] For comparison, top GPUs from AMD and Nvidia reached 10 GFLOPs/W for single-precision in 2009–2011 timeframe.[29]"

In other words, whenever you hear about a CPU with a "massive amount of cores" -- think "could make a great GPU", or "could make a great GPU with the right adjustments" -- from this point forward!

Put another way: any massively parallel multi-core CPU architecture is, or can become, a candidate GPU architecture, given further changes/optimizations...


> In other words, whenever you hear about a CPU with a "massive amount of cores" -- think "could make a great GPU", or "could make a great GPU with the right adjustments" -- from this point forward!

Before doing that, though, they should research the difference between SIMD and MIMD. Multi-core CPUs are generally MIMD, and GPUs are always SIMD†.

† If someone knows of a MIMD GPU architecture, please share!
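
For readers who haven't met the distinction, a rough Python caricature (entirely illustrative): SIMD runs one instruction stream across many lanes, handling divergence with a mask, while MIMD gives each core its own independent instruction stream.

    # SIMD caricature: one operation, many lanes, divergence via a mask
    lanes = [1, 2, 3, 4]
    mask  = [x % 2 == 0 for x in lanes]           # the "if (x is even)" branch
    lanes = [x * 10 if m else x for x, m in zip(lanes, mask)]
    print(lanes)                                   # [1, 20, 3, 40]

    # MIMD caricature: each core runs its own, unrelated program
    cores = [lambda: sum(range(100)),              # core 0
             lambda: max(7, 11),                   # core 1
             lambda: "hello from core 2"]          # core 2
    print([run() for run in cores])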


-- researching this a bit now, the one thing I've found is the Intel Larrabee: https://en.wikipedia.org/wiki/Larrabee_(microarchitecture), cancelled and transmogrified into the Xeon Phi.

still haven't found a consumer graphics card with a MIMD architecture. I'm guessing there might be more ongoing research in this area in the GPGPU-on-FPGA communities.


Are there open-source OpenCL-to-FPGA compilers?

If you're playing with FPGAs, you might as well compile the kernel directly into a circuit, rather than building a GPU on an FPGA and running your kernel on that.

Proprietary solutions like the Altera OpenCL compiler exist.


Very cool project!

I love GPGPU. I git cloned it and am trying to understand the code better here:

   https://www.code-scope.com/s/s/u#c=sd&uh=0f2c2fa280a2&h=afe7a329&di=-1&i=38
It looks like a 5-stage FP (FP32?) pipeline, with NUM_VECTOR_LANES = 16 and NUM_REGISTERS = 32.

Are you writing your own kernel from scratch? If so, which CPU does it run on - some embedded CPU inside the FPGA?

In the mandelbrot.c code, it has the following: #define vector_mixi __builtin_nyuzi_vector_mixi

How does that get translated to vector operations in the FPGA? Where is the code that implements the __builtin_*?
Thanks a lot - a very interesting project.


Fun fact: Intel's built-in integrated GPU (iGPU) is now standard across Intel's CPU offerings, including the low-power processors designed for embedded IoT systems, namely Atom, Pentium and Celeron [1].

This new iGPU has performance similar to a discrete Nvidia GPU from two years back, the GeForce GT 1030!

[1] https://www.notebookcheck.net/Intel-s-Elkhart-Lake-SoC-will-...


One of the (many) things that interests me is the use of CMake.

Does anyone have good references on extending CMake to new tools that don't produce executables per se, or that otherwise work in non-traditional ways?


Layman here! I see a lot of posts on that subject lately so I need to ask: Can someone design a RAM chip?


Yes. Designing RAM is a lower-level activity than designing logic via an HDL, and it needs to take the fab's process (chemistry, optics, mechanics) directly into account.

https://openram.soe.ucsc.edu/


DRAM chips are fascinating.

Instead of going with the much more expensive SRAM, someone decided that refreshing billions of capacitors hundreds of times a second while performing read and write operations is an acceptable way of storing data (even if only while powered).

I wonder what the managers who first heard the idea must've thought :D

And it works so well! It's probably one of the most reliable components in a computer.


Many of the early RAM systems were non-persistent (mercury delay lines, phosphor) and some were destructive-read (core memory).

Appears to have been invented by Dennard of Dennard Scaling: https://www.thoughtco.com/who-invented-the-intel-1103-dram-c...


What do you mean specifically by "design a RAM chip"? (obviously RAM chips that you can buy are designed before they are made, so that's probably not what you are after?)

FPGAs typically do contain dedicated RAM areas, because implementing it out of FPGA logic slices is terribly inefficient.


FPGA designer here. Just wanted to point out that “efficiency” is highly context-sensitive in FPGA design. Everything is an area / speed / power trade-off. If you only need a RAM that is 8 bits wide and 64 words deep, then it might be way inefficient to waste a dedicated 18 kbit block RAM on it when it would fit better into 2 LUTs. This is why Xilinx, for one, provides pragmas such as RAM_STYLE to help guide synthesis:

    (* ram_style = "distributed" *) reg [data_size-1:0] myram [2**addr_size-1:0];

block: Instructs the tool to infer RAMB type components.

distributed: Instructs the tool to infer the LUT RAMs.

registers: Instructs the tool to infer registers instead of RAMs.

ultra: Instructs the tool to use the UltraScale+TM URAM primitives.

See: https://www.xilinx.com/support/documentation/sw_manuals/xili...

edit: formatting*


Hey there, sorry for going off topic, but as an FPGA designer, are there any learning resources you would particularly recommend for someone starting with FPGAs by themselves, with the goal of learning more about computer architecture and the way CPUs work at a lower level?


Not off the top of my head. I would definitely recommend trying to find a mentor with current experience rather than getting started by yourself if possible, depending on your background. The last time I mentored someone and looked at beginner resources was over 10 years ago, and at that time I was recommending that beginners with at least some of the requisite college coursework buy and read "The Design Warrior's Guide to FPGAs" to get started, and then buy and use at least one of Peter J. Ashenden's VHDL and Verilog texts as HDL language references. Much of The Design Warrior's Guide might still be relevant but I haven't looked at it in a while. Ashenden's books are probably still mostly relevant for the versions of the languages they covered, but these days I would try to also find a good book on SystemVerilog. All that is probably overkill for the goal of learning more about computer architecture and the way CPUs work at a lower level, though. For me, all my undergraduate and graduate coursework, including a class on operating systems, was essential to my understanding and real-world use.


Thank you very much for your reply.

Finding a mentor is very difficult for me unfortunately (from a third world country, and not particularly rich). I will check out the books and research if there are any obsoleted / old parts. Thanks for the heads-up on SystemVerilog.

Thanks again :)


Possibly, but why would you want a less efficient RAM chip that costs more compared to something that's a commodity you can buy?


There's a certain bit of magic sauce to RAM chip design. The production process (which is rather independent of the commonly discussed "node size" processes for CPU/GPU-related tech) is where the magic happens.


A guide for the impatient:

GPU = a CPU, but one that is placed specifically on the graphics card, and one that may be specifically optimized to perform instructions related to 3D calculations faster than other instructions.

As far as the 3D stuff, don't think "3D", instead think Math specifically related to 3D.

OK, so what is that Math?

Well, we could consult Stack Overflow for that:

https://stackoverflow.com/questions/1320403/math-used-in-3d-...

OK, so next question, what does a GPU do then?

Well, remember that a CPU can run any generalized algorithmic computations -- including those required for 3D graphics(!) (they are Turing-Complete, after all!), but the thing is, 3D graphics cards have their GPU's, er, CPU's, er, GPU's -- specifically optimized in various different ways (parallelism, specific instruction types, etc.) for the mathematics of 3D.

So you see, what is being searched for is as follows:

The fastest CPU, er, GPU, er CPU, that can be created on an FPGA, that can be optimized for specific 3D MATH operations, that can be created patent-free.

Now, here's the thing...

In Mathematics, TMTOWTDI (There's more than one way to do it) -- is King.

While some math algorithms (side philosophical question to self, "How does anyone patent MATH?". No really, how do they do it? That idea is utterly mind boggling and self-contradictory! It's like what the Principal said to Billy Madison: "Mr. Madison, what you've just said is one of the most insanely idiotic things I have ever heard. At no point in your rambling, incoherent response were you even close to anything that could be considered a rational thought. Everyone in this room is now dumber for having listened to it. I award you no points, and may God have mercy on your soul."), while some math algorithms implemented in hardware are patented (and possibly still under patent) -- there are many others that are not!

In other words, there are other ways to get the job done...

Some good starting points might be to read up on the 3D hardware in the original Playstation (1994) https://en.wikipedia.org/wiki/PlayStation_technical_specific...

>"Geometry Transformation Engine (GTE)

Coprocessor [PDS: aka "GPU"] that resides inside the main CPU processor, giving it additional vector math instructions used for 3D graphics, lighting, geometry, polygon and coordinate transformations – GTE performs high-speed matrix multiplications.

Well, there we go, there's the start of the 3D math part, vector math instructions and high-speed matrix multiplications.

Then there is "Fused Multiply Add", aka "Multiply Accumulate Operation":

https://en.wikipedia.org/wiki/Multiply%E2%80%93accumulate_op...

>"A fast FMA can speed up and improve the accuracy of many computations that involve the accumulation of products:

Dot product, matrix multiplication, polynomial evaluation"

PDS: Which would be used in 3D Graphics acceleration...
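
As a purely illustrative sketch, here is that same fused multiply-add pattern doing the classic 3D jobs -- a dot product and a 4x4-matrix-times-vec4 transform -- in plain Python:

    # the FMA pattern (acc = acc + a*b) underlying dot products and the
    # matrix * vector transforms that a GTE-style unit accelerates
    def fma(a, b, acc):
        return acc + a * b              # a single fused step in real hardware

    def dot(u, v):
        acc = 0.0
        for a, b in zip(u, v):
            acc = fma(a, b, acc)
        return acc

    def transform(m, v):                # m: 4x4 matrix (rows), v: vec4
        return [dot(row, v) for row in m]

    identity = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
    print(transform(identity, [3.0, 4.0, 5.0, 1.0]))  # [3.0, 4.0, 5.0, 1.0]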

Anyway, those are the starting points to look into if someone wants to "leap off" into this area...

This is an interesting topic; it sort of exists as confluence of several disciplines -- including Digital Logic Engineering, Algorithms, Law and Math...

(But seriously, how does anyone patent MATH itself? I mean, seriously! <g>)



