Developer Preview – EC2 Instances with Programmable Hardware (amazon.com)
643 points by jonny2112 on Nov 30, 2016 | 204 comments



These FPGAs are absolutely _massive_ (in terms of available resources). AWS isn't messing around.

To put things into practical perspective my company sells an FPGA based solution that applies our video enhancement technology in real-time to any video streams up to 1080p60 (our consumer product handles HDMI in and out). It's a world class algorithm with complex calculations, generating 3D information and saliency maps on the fly. I crammed that beast into a Cyclone 4 with 40K LEs.

It's hard to translate the "System Logic Cells" metric that Xilinx uses to measure these FPGAs, but a pessimistic calculation puts it at about 1.1 million LEs. That's over 27 times the logic my real-time video enhancement algorithm uses. With just one of these FPGAs we could run our algorithm on 6 4K60 4:4:4 streams at once. That's insane.
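A quick back-of-the-envelope of that scaling claim (all numbers here are the rough estimates above, not vendor figures):

  # ~27x the logic of the 1080p60 design, spread over 4x the pixels per stream
  cyclone_les  = 40_000       # LEs used by the Cyclone 4 design
  vu9p_les_est = 1_100_000    # pessimistic LE-equivalent estimate for the VU9P
  ratio = vu9p_les_est / cyclone_les              # ~27.5
  pixel_ratio = (3840 * 2160) / (1920 * 1080)     # 4K has 4x the pixels of 1080p
  print(ratio, ratio / pixel_ratio)               # ~27.5, ~6.9 -> roughly 6 streams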

For another estimation, my rough calculations show that each FPGA would be able to do about 7 GH/s mining Bitcoin. Not an impressive figure by today's standards, but back when FPGA mining was a thing the best I ever got out of an FPGA was 500 MH/s per chip (on commercially viable devices).

I'm very curious what Amazon is going to charge for these instances. FPGAs of that size are incredibly expensive (5 figures each). Xilinx no doubt gave them a special deal, in exchange for the opportunity to participate in what could be a very large market. AWS has the potential to push a lot of volume for FPGAs that traditionally had very poor volume. IntelFPGA will no doubt fight exceptionally hard to win business from Azure or Google Cloud.

* Take all these estimates with a grain of salt. Most recent "advancements" in FPGA density are the result of using tricky architectures. FPGAs today are still homogeneous logic, but don't tend to be as fine grained as they were. In other words, they're basically moving from RISC to CISC. So it's always up in the air how well all the logic cells can be utilized for a given algorithm.


Any thoughts on why AWS/Xilinx didn't go for a mid-range FPGA to help validate customer requirements?

My guess is that Amazon will have to be very careful not to price themselves out of the market for mid-range Deep Learning based cloud apps.

Wild guesstimate, but I think it'll cost more than $20/hr for each instance.


Based on my speculation, and to make a long analysis short: fewer, bigger FPGAs are better in the cloud from a user experience perspective than more, smaller FPGAs. The big applications are all going to consume as much FPGA fabric as they can (machine learning, data analysis, etc). Even "mid-range" Deep Learning will consume these FPGAs like candy. Non-deep learning will too; they can always just go more parallel and get the job done faster.

Amazon is betting on the fact that they can get better pricing than anyone else. They probably can. No one else will be buying these FPGAs in quantities Amazon will if these instances become popular (within their niche). So for the medium sized players it'll be cheaper to rent the FPGAs from Amazon, even with the AWS markup, than to buy the boards themselves. Especially for dynamic workloads where you're saving money by renting instead of owning (which is generally the advantage of cloud resources).
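To make the rent-vs-own tradeoff concrete, here's a toy comparison in Python (every number below is made up purely for illustration):

  # Hypothetical figures only -- neither the board price nor the hourly rate is known yet
  board_cost      = 40_000    # assumed one-time cost of a VU9P board
  rent_per_hour   = 20.0      # assumed F1 hourly rate
  hours_per_month = 200       # how heavily you actually use it
  months_to_break_even = board_cost / (rent_per_hour * hours_per_month)
  print(months_to_break_even) # ~10 months at this duty cycle; heavier use favors owning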

That's my guess anyway.


It would not be inconceivable that Amazon just buys Xilinx (before someone else does).


Thank you so much for these posts, fpgaminer. They've been extremely helpful to me in framing how these things could be used.

Once upon a time I thought seriously about going into hardware design. I took a couple of different courses in college (over 10 years ago now... sigh) dealing with VHDL and/or Verilog and entirely loved it. If not for a chance encounter with web programming during my co-op, my career would have been entirely different. With AWS offering this in the cloud, if it is not prohibitively expensive I'll be looking into toying with it and hopefully discovering uses for it in my work.


What can each one of those 2.5 million "logic elements" do? Last time I used an FPGA, they were mostly made up of 4-bit LUTs.

How many NOT operations can this do per cycle (and per second)? I realise FPGAs aren't the most suited for this, but the raw number is useful when thinking about how much better the FPGA is compared to a GPU for simple ops.


The 2.5 million number quoted in the article is "System Logic Cells", not Logic Elements. Near as I can tell, since I haven't kept pace with Xilinx since their 7 series, a "System Logic Cell" is some strange fabricated metric which is arrived at by taking the number of LUTs in the device and multiplying by ~2. In other words, there is no such thing as a System Logic Cell, it's just a translucent number.

Anyway, the FPGAs being used here are, I believe, based on a 6-LUT (6 input, 2 output). So you'd get about 1.25 million 6-LUTs to work with, and some combination of MUXes, flip-flops, distributed RAM, block RAM, DSP blocks, etc.

Supposing Xilinx isn't doing any trickery and you really can use all those LUTs freely, then you'd be able to cram ~2.5 million binary NOTs into the thing (2 NOTs per LUT, since they're two-output LUTs). So 2.5 million NOTs per cycle. I don't know what speed it'd run at for such a simple operation. Their mid-range 7 series FPGAs were able to do 32-bit additions plus a little extra logic at ~450 MHz, consuming 16 LUTs for each adder.
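For a rough upper bound on raw NOT throughput, using the estimates above (the clock is an assumption borrowed from that ~450 MHz adder figure; a trivial design could likely clock higher):

  luts         = 1_250_000    # estimated 6-LUTs in the device
  nots_per_lut = 2            # two outputs per LUT, per the estimate above
  clock_hz     = 450e6        # assumed clock, not a measured number
  print(f"{luts * nots_per_lut * clock_hz:.1e} NOTs/second")   # ~1.1e15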


6-input, 1-output or 5-input, 2-output. They're implemented as a 5-input, 2-output LUT with a bypassable 2:1 mux on the output.


The metrics have gotten pretty opaque since the old days when an FPGA was a "sea of LUTs" all alike; modern ones include a ton of (semi-)fixed function hardware like multiply-accumulate blocks and embedded dual-port RAM. Even the LUTs themselves can be reprogrammed into small RAM blocks or shift registers, so counting "logic elements" is mostly a marketing exercise.


While yes the architectures have become more "CISC-like", they aren't particularly convoluted or opaque. It's pretty easy to describe the architectures and come up with numbers for them. Xilinx could literally just say, "1 Million 6-to-2 LUTs" and that would be entirely transparent and helpful.

So it's not so much changes in architecture that have given rise to the translucency of these numbers. It's a measuring contest between Xilinx and IntelFPGA who believe you need to present bigger numbers in marketing material to win engineers. I can't speak for other FPGA engineers, but personally it just frustrates me and wastes my time. I don't ever take those numbers at face value, and I wouldn't hire anyone who did. Xilinx is the worst offender here. At least IntelFPGA will often quote their parts both in transparent terms (# of ALMs) and useful comparisons (# of equivalent LEs). I've never seen them pull a completely made up "System Logic Cell" out of thin air.


If you don't click through to read about this: you can write an FPGA image in verilog/VHDL and upload it... and then run it. To me that seems like magic.

HDK here: https://github.com/aws/aws-fpga

(I work for AWS)


This is so awesome, I can't even. I wrote arachne-pnr [0] to learn about FPGAs to get ready for this day. Just signed up, can't wait to play with these!

I hope the growing popularity of FPGAs for general-purpose computing will help push the vendors to open up bitstreams and invest in open-source design tools.

[0] https://github.com/cseed/arachne-pnr


Wow, Clifford, is that you? I hope this, exciting as it may be, won't make you leave the open FPGA efforts for the dark side (saw your talk at the last FOSDEM, it was very exciting).


Cotton is the author of arachne-pnr. Clifford is the author of Yosys and IceStorm, which are separate projects. They're not the same person.

FWIW, Clifford has recently started reverse engineering the bitstream of the modern Xilinx FPGA series. So, stay tuned for a Xilinx IceStorm equivalent sometime down the road (a few years, probably...)


No, Clifford is cliffordvienna on HN. He wrote Yosys (an amazing piece of software) and did the iCE40 reverse engineering (amazing work). I wrote the place-and-route tool, arachne-pnr.


And kudos for that.


I'm very curious if/how you have managed to make the developer experience sane and enjoyable. I have experience with an FPGA cluster of ~800 FPGAs, and it definitely does not get used to its full potential because of the tooling around it.


Is that repo going to be made public? It looks to be private right now.


Yup, sorry -- working on fixing that now. Check back in a bit.


Still not fixed. I'll reply here when it is. Might be a few days because of re:Invent stuff.


Thanks for the update. Been chasing that link all morning :-)


If you guys are curious about these announcements, I'll be recapping them and going into more detail on twitch.tv/aws at 12:30 Pacific.


Huh? Isn't Twitch just for gaming content?


What others have said is true, and also note that Amazon bought Twitch 2 years ago, so I'm sure Amazon can run their own product announcements through Twitch if they want :)

EDIT: updated when amazon bought twitch, woops


Nope. Twitch is excellent for all kinds of live content.


https://www.twitch.tv/p/rules-of-conduct

"All content that is neither gaming-related nor permitted under the rules for Twitch Creative Conduct is prohibited from broadcast."


I've seen many people programming on Twitch https://www.twitch.tv/directory/game/Creative/programming

While it's mainly game dev or game dev related, it's not limited to game dev stuff. From their FAQ: https://help.twitch.tv/customer/portal/articles/2176641

  Examples of what you can broadcast on Twitch Creative:
  ...
  Programming and coding  
  Software and game development  
  Web development
EDIT: It seems that re:Invent is being streamed on Twitch anyway.


This is a product announcement, though


And Twitch has had TV shows streamed on it in the past: https://www.twitch.tv/whoismrrobot

My guess is that was a sponsored deal or something. But per my edit above, it seems that re:Invent is being streamed on Twitch anyway, so I'm guessing it's all above board (and, as others have said, Amazon owns Twitch).


Amazon might just let Amazon talking about their products slide ;)


I'm guessing it is covered under the Twitch Creative Conduct, since there is an entire Creative category now that is getting more popular which involves people painting, cosplay, digital art, etc.


What's the cost?


So it's tied to the PCIe bus - how do you interact with your FPGA once you've programmed it? Are there general drivers you can use, or do you also have to create a Linux driver to talk to your FPGA?


Xilinx provides software drivers and IP for PCIe DMA and memory-mapped interfaces. These are fairly easy to integrate (probably not the best for latency, though - I've developed my own for a specific use case: low latency, but I don't care about bandwidth).


I'm not sure what you mean by the "magic" part here, can you please clarify?

[background: many years of writing VHDL specifically for FPGAs, using various dev boards and custom boards]


The magic part is the thing we have gotten used to with the cloud -- virtual hardware you never see and rent by the minute. Imagine having an FPGA idea and not needing to make a board, pay for a dev board, or even find a dev board in your lab... Like your idea and need more? Spin up 100 more right now...


Exactly what I thought. This is amazing. FPGAs are commonly used in embedded systems to perform application-specific tasks, and now application developers have access to this power too. I guess many machine learning applications might profit from that power instead of using comparatively expensive graphics hardware.


How do FPGAs compare with GPUs for the inference stage of Deep Learning algorithms? Can they accelerate it a lot?


No, but they do use less power:

To the best of our knowledge, state-of-the-art performance for forward propagation of CNNs on FPGAs was achieved by a team at Microsoft. Ovtcharov et al. have reported a throughput of 134 images/second on the ImageNet 1K dataset [28], which amounts to roughly 3x the throughput of the next closest competitor, while operating at 25 W on a Stratix V D5 [30]. This performance is projected to increase by using top-of-the-line FPGAs, with an estimated throughput of roughly 233 images/second while consuming roughly the same power on an Arria 10 GX1150. This is compared to high-performing GPU implementations (Caffe + cuDNN), which achieve 500-824 images/second, while consuming 235 W. Interestingly, this was achieved using Microsoft-designed FPGA boards and servers, an experimental project which integrates FPGAs into datacenter applications.

https://arxiv.org/pdf/1602.04283v1.pdf
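Dividing out the figures quoted above (the paper's numbers, my arithmetic):

  fpga_perf_per_watt = 233 / 25                 # ~9.3 images/s/W (projected Arria 10)
  gpu_perf_per_watt  = (500 / 235, 824 / 235)   # ~2.1 to ~3.5 images/s/W (Caffe + cuDNN)
  # FPGA is roughly 2.7-4.4x better per watt, but lower in absolute throughput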


That's hard to compare. Typically FPGAs are doing fixed-point math, so they can do more operations with less power. GPUs have traditionally done floating point. However, with the new Pascal architecture, certain cards (P4/P40) support 8-bit integer dot products, which give a massive boost in performance/W. It's still fairly high at 250W, but that's for an entire card with 24GB of memory. You'd have to compare that to an FPGA with that much memory on a PCIe card if you're doing apples to apples. Something like this is appropriate for comparison: http://www.nallatech.com/store/fpga-accelerated-computing/pc...


This is very awesome. Could you add some more thoughts on the tooling and the development workflow? Is it possible to target the Xilinx hardware using only open source (or AWS proprietary) tools? Or is Vivado still required for advanced stuff?


Vivado is required for all advanced features and programming Xilinx chips in general; like the sibling post said, there is no open FPGA toolchain implementation for Xilinx devices, especially for extremely high end ones like the ones being offered on the F1 (I expect they'd run at like, several thousand USD per device, on top of a several thousand dollar Vivado license for all the features).

It doesn't look like there's much AWS proprietary stuff here, though we'd have to wait for the SDK to be opened properly to be sure. I imagine it's mostly just making all of the stuff prepackaged and easily consumable for usage, and maybe some extra IP Cores or something for common stuff, and lots of examples. If you're already using Vivado I imagine using the F1/Cloud won't introduce any kind of major changes to what you expect.


> I expect they'd run at like, several thousand USD per device...

You're guessing about an order of magnitude too low, actually. The VU9P FPGAs Amazon is using cost between $30,000 and $55,000 each, depending on the speed grade.

Yes, this means a fully equipped F1 instance costs nearly half a million dollars. Don't count on the instances being cheap to run.
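Rough arithmetic behind that figure (the per-FPGA price range is from above; eight FPGAs in the largest instance is my recollection of the announcement, so treat it as an assumption):

  fpgas_per_instance = 8                # assumed count for the biggest F1 configuration
  unit_lo, unit_hi = 30_000, 55_000     # quoted VU9P price range, USD
  print(fpgas_per_instance * unit_lo, fpgas_per_instance * unit_hi)  # 240000 to 440000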


Do you have a source? I am curious. http://www.digikey.com/product-detail/en/xilinx-inc/XCKU040-... this surely is not the right chip then.


https://aws.amazon.com/ec2/instance-types/

Scroll down to "F1"; it says:

> Xilinx UltraScale+ VU9P FPGAs

The VU9P isn't available through DigiKey, but is listed by Avnet. I don't know which specific package and speed grade Amazon is using, but here's one:

https://products.avnet.com/shop/en/asia/programmable-logic/f...


The press release says:

"This AMI includes a set of developer tools that you can use in the AWS Cloud at no charge. You write your FPGA code using VHDL or Verilog and then compile, simulate, and verify it using tools from the Xilinx Vivado Design Suite (you can also use third-party simulators, higher-level language compilers, graphical programming tools, and FPGA IP libraries)."

So basically, buying a copy of Vivado is the minimum. There aren't any open source tools that directly output Xilinx FPGA bitstreams that I know of.


It looks like the FPGA Developer AMI includes Vivado and a license explicitly for use on these platforms (look at the PuTTY screenshot in the blog post; it has a customized MOTD). You just need to set up the license server that Vivado will use and point it to the right license.

So I guess the real question is: what exactly is granted by the Vivado license on these AMIs? Do we get things like SDSoC, SDAccel, etc, and all the libraries? [1] The blog seems to imply you can program these things with OpenCL too (AKA SDAccel), so I'm guessing that these features are all enabled, but details about the included Vivado license in the AMI would be nice.

[1]: https://www.xilinx.com/products/design-tools/vivado.html#buy


+1 for VHDL. :)


that repository is 404?


Hmm this still isn't public. Any ETA?


This is really cool. Do you think it will be possible to run MongoDB on an FPGA anytime soon?


I really hope that is sarcasm.


I'm currently working on this. Speedup around 2x for most operations. Not kidding, quite a few startups are currently trying to optimize typical data operations with special algorithms.


Aren't there other software databases that are already more than 2x faster than Mongo and don't lose data?


Maybe, I'm not talking about Mongo specifically.

You can find 'equivalents' to CPU data structures for FPGAs and speed up operations on/with them while still saving power. There's lots of trouble with how buffers are used and memory is accessed. So it's not a trivial task, but IF you can optimize generic data structures and replace the existing ones you basically have 2x the speed or half the energy consumption for any DB.


But what's the developer time/cost for that?


Totally depends on your use case.

From the blog post:

> From here, I would be able to test my design, package it up as an Amazon FPGA Image (AFI), and then use it for my own applications or list it in AWS Marketplace.

As a user of those Marketplace images, you just look at the hourly fees. Your team needs to set this up, of course, and replace the old stuff, e.g. MongoDB with a new, sped-up FPGA-MongoDB. (And you'd need to fix some new bugs.)

If time is super-critical to you, e.g. if you're working with analytics: do you really need to speed up your processing pipeline? E.g. processing stuff not once but twice per day? If yes, then you'd be better off having people on your team who understand all this and are able to fix and implement stuff themselves. The second scenario would be quite a bit more expensive, but still, FPGAs aren't rocket science and there's no way around them in the future.


Sure, but they aren't Web Scale!


> Today we are launching a developer preview of the new F1 instance. In addition to building applications and services for your own use, you will be able to package them up for sale and reuse in AWS Marketplace.

Wow. An app store for FPGA IPs and the infrastructure to enable anyone to use it. That's really cool.


>Wow. An app store for FPGA IPs

I see people making video transcoder instances on day 1, and MPEGLA bankrupting Amazoners with lawsuits on day 2


I guess the online distribution of FPGA configurations was bound to happen eventually?


This was already a thing. Plenty of marketplaces exist for FPGA IPs. It's just not that well known because high end FPGAs run $7k+ and complex IP cores can be $20k+ for a license.


So if you were to get a cheap license via a service like this, do you get access to the VHDL or equivalent or could you extract it in some way?


Probably not. Depends on how the core is distributed. Either you'll get HDL or netlists, and they may or may not be encrypted. Obviously the synthesis software has to decrypt it to use it, so like all defective by design DRM it doesn't make it impossible to get at the code, it just makes it more difficult. However, a netlist is just a schematic, so you would have to 'decompile' that back to HDL (and lose the names, comments, etc) if you want to modify it. It's also possible that you would only get the binary FPGA configuration file (this marketplace seems like one more for complete appliances and not IP cores) so you would have to back a netlist out of that somehow and then reverse-engineer it from there.


there are "encrypted" hdls, but encrypted no more than dvd discs. Find a right irc channel and ask for keys


Pretty sure the AWS EULA makes you responsible for violating any IP. I didn't read it, but if it doesn't say so already then their lawyers are crap.


> I see people making video transcoder instances on day 1, and MPEGLA bankrupting Amazoners with lawsuits on day 2

Only if they distribute it through amazon. Just put the code up in a torrent; anyone can run it without MPEGLA knowing.


Yeah, this is incredible for FPGA users. There is now a market for freelance FPGA developers.


Yes this is cool if you're already using FPGAs and yeah, there will be a market for FPGA designers.

But I also think this is FPGAs for the Rest of Us. Suddenly, FPGAs are available without having to buy some development board from Xilinx, install a toolchain, use said (shitty) toolchain ...

Me, I was thinking of FPGAs as being something I'd use down the road a few years, eventually, etc. But instead, I'm looking at this right now. This morning. Waiting for the damn 404 to go away on:

https://github.com/aws/aws-fpga

This reduces the barrier to entry. It also reduces the transaction cost (h/t Ronald Coase).


I think this is going to be well outside the pricing range for most people to use for an extended period of time, which is necessary for learning a lot. Depends on the specs of the developer AMI, too, which comes with Vivado and everything. But synthesis can be insanely CPU intensive for large designs, so who knows how they'll spec it. It might cost more due to including a Vivado license. And you'll need to do extensive amounts of testing, no matter what you're doing, so be prepared to synthesize and test on "Real World" F1 instances, on top of simulating, testing, etc.

If you truly want to get started with FPGAs, you can do it on a small chip, with an open source Verilog toolchain, and open source place/route/upload. Textual source code -> hardware upload, open source. Today, for like $50! I think this is way better as a tool for educational purposes, and a lot cheaper. Also, you don't have to deal with a giant bundle of EDA crapware.

What you want is Project IceStorm, the open source Verilog flow for iCE40 FPGAs: http://www.clifford.at/icestorm/ -- you install 3 tools (Yosys, Arachne P&R, IceStorm) and you're ready to go. It's all open source, works well, and the code is all really good, too.

You can get an iCE40 HX8k breakout board for $42, and it has ~8,000 LUTs. Sure, it's small, but fully open source can't be beaten and that's cheap for learning: http://www.latticestore.com/products/tabid/417/categoryid/59...

I think this is a much better route for learning how to program FPGAs, personally, with high quality software and cheap hardware -- just supplement it with some Verilog/VHDL books or so. There are pretty decent Verilog tutorials on places like https://www.nandland.com or https://embeddedmicro.com/tutorials/beginning-electronics for example.


You would think after 20+ years of FPGAs, commercial FPGA tools would be more usable and more productive than something a couple guys hacked together from reverse-engineering a small FPGA. But that hasn't been my (limited) experience. Hats off to the IceStorm team.


I would not count on these instances being useful for development.

The parts Amazon is using cost somewhere between $30K and $50K each. Hourly costs will be substantial -- it will likely be cheaper to buy an entry-level development board than to spin up an F1 instance every time you want to run your design.


(post author here)

You can do all of the design and simulation on any EC2 instance with enough memory and cores. You don't have to run the dev toolchain on the target instance type.


You can do some simulation on a computer, but it's much slower than real time, even for a small design. Prototyping PCIe communications is also difficult without real hardware.


Hi Jeff, a point of clarification on this: "In instances with more than one FPGA, dedicated PCIe fabric allows the FPGAs to share the same memory address space and to communicate with each other across a PCIe Fabric at up to 12 Gbps in each direction."

Does that mean you can have FPGAs running on multiple F1 instances connected via the PCIe Fabric? It's not clear if this means FPGAs within a single F1 instance, or between multiple F1 instances.


Any recommendations on an entry-level development board?


Digilent Arty [1]. XC7A35T, 256MB RAM, Ethernet, $99.

[1]: http://store.digilentinc.com/arty-board-artix-7-fpga-develop...


If you are in the academic sector, pretty much every vendor has a university program where one can get hardware/software at reduced prices or as a donation.

Personally, I started dabbling with a Zynq ZedBoard: FPGA + hard ARM cores, lots of stuff on board, plus extension capabilities.


> install a toolchain, use said (shitty) toolchain ...

Did I miss some details? Don't you still need those shitty toolchains to do design/simulation before you'd deploy it?


Deploy? No. With the F1 instance, I can sell it without said toolchains. I need them to develop but not to deploy.

Edit: This isn't magic. This is availability. FPGA development remains hard.


It's not that simple, I think. FPGA development is extremely labor intensive, both in design and especially in verification and testing. Large scale designs increase both the time needed for synthesis, verification labor, and testing, by a lot. If you don't have extremely large scale or resource intensive designs, this probably isn't for you anyway. Making the FPGAs more readily available is good for a lot of reasons, and opens the market to some new stuff, but it doesn't necessarily dramatically change the true costs of actually developing the designs in the first place.

Basically, you're not going to deploy fabric live to your customers without having extensively tested it on your real world, production hardware setup, even if it's just a separate lab with identical hardware. That's going to be going on throughout the entire development lifecycle, and for anything remotely complex you can expect that to be a long process. You're going to be using that F1 server a lot for the development, so you have to factor in that cost. If you're a solo/small place you can probably amortize it somewhat thanks to The Cloud, however (reserved instances, on demand batching, etc).


> buy some development board from Xilinx, install a toolchain, use said (shitty) toolchain

You can start only with the toolchain's simulator, and then get a board from ebay for 50 bucks.


I guess this will be a game changer for FPGA-mineable digital currencies. Maybe not for Bitcoin, because people have invested heavily into dedicated mining hardware, but I'm interested to see what it'll do for the smaller altcoins.


For any cryptocurrency that's profitably mineable on AWS the difficulty immediately increases to the point that it's no longer profitable.


Yes, this IMO is the biggest flaw in bitcoin's design. It makes miners impossible to commoditize. As soon as a new ASIC becomes widely available, it defeats itself and becomes useless at the next difficulty increase.

Consumer ASICs stopped being a thing for this reason. Now we have mega-secret custom hardware that can't be shared without destroying the investment put into their development.

I don't think Satoshi's intention was that only a handful of massive investors with their own proprietary chipsets would be able to mine, but that's the natural consequence of the difficulty mechanism, and it really undermines bitcoin's core principle of decentralization.


Perhaps not for the entire time, but bear in mind that the prices of cryptocurrencies fluctuate and you can spin up the VMs when it is profitable and shut them down when it isn't.


Would this mean that the currency would never fluctuate below that price point?


It's a price ceiling, not a floor. If prices go up high enough, it's worth turning on EC2 instances and paying for them with the mined cryptocurrency until prices fall. There's some wiggle room above the break-even point, since there is some delay between turning on miners and receiving USD, and the miner has to eat that risk.
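A toy version of that break-even logic (every number below is hypothetical, purely to show the ceiling mechanism):

  instance_cost_per_hour = 2.00     # made-up F1 hourly price
  hashrate               = 7e9      # hashes/second, the estimate from upthread
  coins_per_hash         = 1e-15    # hypothetical expected reward per hash
  coin_price_usd         = 900.0    # hypothetical exchange rate
  revenue_per_hour = hashrate * 3600 * coins_per_hash * coin_price_usd
  print(revenue_per_hour > instance_cost_per_hour)  # mine only while this stays True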


Ah, yes, that makes sense, thank you!


Not if you have a free-fall dump going on. This happens when investors lose hope in the long-term future.


> Maybe not for Bitcoin, because people have invested heavily into dedicated mining hardware

The thing is, it seems like people always invest heavily into dedicated hardware when using FPGAs. I'll be interested to see what people actually end up using this service for.


I'm surprised that no one has linked to OpenCores yet: http://opencores.org/ They've got a ton of VHDL code under various open licenses. The project's been around forever and is probably a good place to start if you're curious about FPGA programming.


OVH is testing Altera chips - ALTERA Arria 10 GX 1150 FPGA Chip

https://www.runabove.com/FPGAaaS.xml


If anyone is wondering what the FPGA board looks like:

https://imgur.com/a/wUTIp



Are they actually using that one, or is that just a board that happens to have that particular FPGA on it?


The board in the image is the retail version. But it's more than likely that Amazon is using the same one, since they don't modify the GPUs either, even though they purchase those at much larger scale.


How do we know they are using that particular board from bittware as opposed to a board from a different manufacturer or even an in-house design? The linked article does not mention bittware or the board part number.


Good lord that is beautiful. What a massive FPGA.


Here's a post by Bunnie Huang from a few months ago, saying that Moore's Law is dead and we will now see more of this kind of thing: http://spectrum.ieee.org/semiconductors/design/the-death-of-...

Pretty interesting read. Also, kudos to AWS !


For my institute this is going to be _really_ useful for genomics data processing, because we can't justify buying expensive hardware for undergrad research. Using FPGA hardware in the cloud sounds almost magical!


Wouldn't bandwidth/transfer costs basically nullify the computing gains? I know someone who used to be in genomics and cloud-anything was priced-out due to transfer costs.


You can't justify buying it but you can justify renting it? Has your department heard of amortization?


Most research finance departments are absolutely horrified at OpEx because any strange non-capital expenditure makes them look less efficient than the next research institute. This comes in handy when two labs are up for a grant, and they are equally qualified. The more efficient institute gets the grant. You can imagine asking for the lab credit card for EC2 time is not met with enthusiasm.


That's interesting. So, buying a lot of expensive, soon-to-be-obsolete hardware makes your lab more attractive?


Exactly. I haven't worked at my old lab for more than 5 years, but they are still advertising on their web page the systems I built when employed there. Woo! 2010-era blade servers!


Renting for a short period of time vs. buying the hardware are very different IMO.


The traditional EDA tool companies (Mentor, Cadence, Synopsys) all tried offering their tools under a cloud/SaaS model a few years back and nobody went for it. Chip designers are too paranoid about their source code leaking. I wonder if that attitude will hamper adoption of this model as well?


> Chip designers are too paranoid about their source code leaking.

It's more an issue of being able to reproduce an existing build later on. You can't delegate ownership of the toolchain to the "cloud" (read: somebody else's computer) if you think you'll ever need to maintain the design in the future.


I'm not so sure that is the issue. Currently you delegate ownership of the toolchain to the EDA vendor. Sure you have tools installed locally on your machines, but the tools typically have licenses that expire, so there's never a guarantee you can build it later with the exact same toolchain. Also EDA vendors end-of-life tools at some point, so even if you pay, that tool won't exist for ever, and the license will not be renewable.

I do think the issue with cloud is the concern over IP. There are not a lot of EDA vendors, so the chances that your competitor is also using that same EDA vendor is pretty high. I think companies are pretty wary of using a cloud hosted service where you could literally be running simulations on the same machines as your competitors. Can you imagine some cloud/hosting snafu resulting in your codebase being accessible by your competitors?

EDA companies also sell ASIC/FPGA IP, and VIP (verification IP), so there's also a pretty clear conflict of interest if they have access to your IP. So, if you're really paranoid, imagine the EDA vendors themselves picking through your IP and repackaging/reselling it as IP to other customers (encrypted of course so you can't readily identify the source code)?


The EDA tools need your source code (HDL) to simulate or synthesize the design. But with these F1 instances, the model potentially doesn't have that problem. You develop/design an FPGA solution (some type of acceleration), then provide it as a service. You don't expose your source code to your end customer or the EDA tool companies.

You do however, potentially expose your source code to Amazon. But possibly not, if you do your design/testing on EDA tools under your control, then deploy FPGA build packages to the F1 instances for hardware testing.


Quick question: if someone wants to learn to program an FPGA, is learning C the only way to go? How hard is it to learn and program in Verilog/VHDL without an electrical engineering background?

If anyone can suggest links or books, please do.

Thank You


I have a physics background but not an EE background. I found Verilog pretty easy to grasp. VHDL took me a lot longer.

To get some basic ideas I always recommend the book Code by Charles Petzold: https://www.amazon.com/Code-Language-Computer-Hardware-Softw...

It walks you through everything from the transistor to the operating system.

(Apparently I need to add that I work for AWS on every message so yes I work for AWS)


Thank you


I would suggest Digital Design by Morris Mano [1]. It starts with a basic intro and goes from digital gates to FPGAs themselves! And you really don't need any EE background for this book. It starts from absolute basics and will also teach you Verilog along the way. Verilog is used more in industry than VHDL (which is more popular in Europe and in the US military for some reason).

I'm surprised you got the idea of using C to program FPGAs - are you thinking of SystemC or OpenCL? (They're vastly different from each other.)

I'm really surprised a sibling comment recommended the Code book. It's really meant to be layman's reading about tech. It's a great book, but it won't teach you how to program FPGAs.

[1]: https://www.amazon.com/Digital-Design-Introduction-Verilog-H...


I thought FPGAs were programmed using low-level languages like C.


Not really. C is considered high-level in digital design. There are some tools for high-level synthesis from languages like C, but they aren't used much.

Most FPGA "programming" is a textual description of a directed graph of logic elements in a language like VHDL or Verizon (and now SystemVerilog).

Synthesis engines have gotten better over the years, allowing things like + and * to describe addition and multiplication instead of having to describe the logic graph of a multiplier.

And most FPGAs now have larger built-in primitives, like 18x18 multipliers.

You can judiciously use for-loops for repeated structures.


Verizon was meant to be Verilog. Didn't catch that autocorrect.


It's $220 new.

I am so glad I don't have to buy textbooks anymore.


Maybe I should've included this, but older versions are not much different from the new one, and the older editions go for dirt cheap by comparison.

Here's a 4th edition used book for $13.95

http://www.ebay.com/itm/Digital-Design-4th-Edition-Ciletti-M...

But yeah, the publishing industry is in pretty bad shape (for students/consumers).


C won't help you here; the Verilog/VHDL model is very different from normal languages due to intrinsic parallelism and the different techniques you need to use - you can't allocate anything at runtime, for example. There are also language quirks like '=' vs '=>' which trip up beginners.


No, you can also go the Ada way with VHDL.

One key difference to keep in mind for digital programming is that everything happens in parallel, unless explicitly serialized, which is the opposite of the usual software development most people know about.


Here's a collection of get-started resources: http://tinyurl.com/fpga-resources

You can start with the EDA Playground tutorial, practice with HDLBits, while going through a book alongside (e.g., Harris & Harris) for examples, exercises, and best practices.

Similarly to a sibling thread, I'd also go with a free and open source flow, IceStorm (for the cheaply available iCE40 FPGAs): http://www.clifford.at/icestorm/

You can follow-up from the aforementioned tutorial and continue testing the designs on an iCE40 board -- starting here: http://hackaday.com/2015/08/19/learning-verilog-on-a-25-fpga...

Here are some really great presentations about it (slides & videos) by the creator (which can also serve in part as a general introduction):

- http://www.clifford.at/papers/2015/icestorm-flow/

- http://www.clifford.at/papers/2015/yosys-icestorm-etc/

Have fun!


Honestly, when it comes to learning logic design, if you're not already a programmer you're probably better off.

A C programmer will just spend a lot of time learning why the things they already know how to do are not useful.


I actually recommend Haskell. The division between computation and I/O is strikingly similar (and equally difficult to avoid), and they're both declarative systems.



I found VHDL much easier than C or Verilog; I think it has to do with how your brain is wired.


Very interesting. I'd still like to see the JVM pick up the FPGA as a possible compile target, that way people could run apps that seamlessly used the FPGA where appropriate. I have mentioned this to Intel, who are promoting this technology (and also have a team that contributes to the JVM), but so far no one is stating publicly that they are working on such a thing.


An Intel VP mentioned it at JavaOne. He said they would provide FPGA support for OpenJDK. One central use case he mentioned would be big data & machine learning on Spark.

It was a very pleasant surprise! The JVM world usually does not have a great interface to the heterogeneous world. I think it would yield tremendous benefits. FPGA-accelerated matrix multiplication, sorting, and graph operations sound very appealing.

And then, as you mentioned, there is the possibility of JITting things: HTTP header parsing ends up on the FPGA and routes things to a message queue an actor can read. Or FPGA-based actors; does that make sense?

----

I have been unable to follow this development at all, however. Do you have any news about this project? I've been looking for a blog, a github or a mailing list, but can't find any.


Maxeler sells a compiler (and hardware) for writing FPGA apps in Java: https://www.maxeler.com


Because the model is so different there would be no benefit.


Intel already has a compression library as a proof of concept that shows a large benefit. The JVM compiler knows A) how many instructions each method is and B) how CPU-hot it is. Just as with the compression library, the compiler could identify very hot and very small methods, test them on the FPGA in parallel with normal execution, measure the performance difference, and switch to the FPGA if it was beneficial (which may be for <1% of methods). I believe the JVM already has much of the infrastructure to do such parallel method tests.
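A toy sketch of that "run both, keep the winner" idea (cpu_impl and fpga_impl are hypothetical stand-ins; a real JVM would be comparing JITted code against a synthesized accelerator, not two Python callables):

  import time

  def pick_faster(cpu_impl, fpga_impl, sample_input):
      # Time each candidate on the same sample and return the faster one.
      timings = {}
      for name, fn in (("cpu", cpu_impl), ("fpga", fpga_impl)):
          start = time.perf_counter()
          fn(sample_input)
          timings[name] = time.perf_counter() - start
      return cpu_impl if timings["cpu"] <= timings["fpga"] else fpga_impl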


That compression library offloads compression to a dedicated FPGA implementation of the compression algorithm, not a translated version of the same code that runs on the CPU.

Approaches do exist to automatically translate some types of code to run on an FPGA, though.


Seems as though you could use a tracing jit approach to offload hot loops to an FPGA very generically. Though it'd have to be a really long-running loop to outweigh the substantial overhead in synthesizing the logic gates.


You can't just find hotspots; you have to find hotspots whose data set is sufficiently disjoint between the FPGA and the CPU. With the right architecture (such as Intel's Xeon+FPGA package), you can have an incredibly fast interconnect, but it's still not the speed of the CPU's register file, so you can't hand off data with that granularity. You can get more than enough bandwidth, but the latency would crater your performance. You want to stream larger amounts of data at a time, or let the FPGA directly access the data.

For instance, AES-NI accelerates encryption on a CPU by adding an instruction to process a step of the encryption algorithm. Compression or encryption offloading to an FPGA streams a buffer (or multiple buffers) to the FPGA. Entirely different approach. (GPU offloading has similar properties; you don't offload data to a GPU word-by-word either.)

But even if you find such hotspots, that still isn't the hardest part. You then have to generate an FPGA design that can beat optimized CPU code without hand generation. That's one of the holy grails of FPGA tool designers.

Right now, the state of the art there is writing code for a generic accelerator architecture (e.g. OpenCL, not C) and generating offloaded code with reasonable efficiency (beating the CPU, though not hitting the limits of the FPGA hardware).


It's cool to know it's an area of active research. I wonder if there are also power consumption ramifications though. While e.g. AES-NI is incomparable performance-wise, my novice (perhaps incorrect) understanding is that ARM beats x86 power consumption by having a drastically simpler instruction set.

Could a simple ARM-like instruction set plus a generic "synthesize and send this loopy junk to FPGA" have power implications without a major performance impact on cloud servers? (Yeah I know this is likely a topic for hundreds of PhD theses, but is that something being investigated too?)


A Java hello world will not fit even into a 10-gigagate chip.


The funny thing is that bytecode is actually pretty dense, denser than x86. But it's everything else that makes Java images pretty huge.


Tree shaking is not difficult; the JVM just hasn't had much need for trimming its runtime OR static compilation.


This is amazing! We have been developing a tool called Rigel at Stanford (http://rigel-fpga.org) to make it much easier to develop image processing pipelines for FPGAs. We have seen some really significant speedups vs CPUs/GPUs [1].

[1] http://www.graphics.stanford.edu/papers/rigel/


Are the speedups enough to negate the much higher cost of an FPGA vs a GPU?


Given that the Amazon cloud is such a huge consumer of Intel's x86 processors, even using Amazon-tailored Xeons, it's surprising that Amazon chose Xilinx over the Intel-owned Altera.

These Xilinx 16nm Virtex FPGAs are beasts, but Altera has some compelling choices as well. Perhaps some of the hardened IP in the Xilinx parts tipped the scales, such as the H.265 encode/decode, 100G EMAC, and PCIe Gen 4?


Stratix10 (the large, Intel 14nm family) was delayed, delayed, delayed, and delayed some more. Last I heard it was supposed to be in high-prio customer hands by end of 2016, but unclear if that meant "more eng samples" or the actual, final production parts. Either way Xilinx beat them to market by approx 3-6 months AFAICT.


I'm a total FPGA n00b, so here's a dumb question: what can you do with this FPGA that you can't with a GPU?

OK, here's a concrete question: I have a vector of 64 floats. I want to multiply it with a matrix of size 64xN, where N is on the order of 1 billion. How fast can I do this multiplication, and find the top K elements of the resulting N-dimensional array?
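For reference, here's that operation in plain NumPy (a CPU baseline to pin down the question, not an FPGA answer; N is scaled down for illustration):

  import numpy as np
  N, K = 1_000_000, 10                            # scaled down from ~1 billion
  v = np.random.rand(64).astype(np.float32)
  M = np.random.rand(N, 64).astype(np.float32)    # the 64xN matrix, stored transposed as Nx64
  scores = M @ v                                  # N dot products of length 64
  top_k = np.argpartition(scores, -K)[-K:]        # indices of the K largest results (unordered)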


FPGA = Field Programmable Gate Array.

Basically, you can create a custom "CPU" for your particular workflow. Imagine the GPU didn't exist and you couldn't multiply vectors of floats in parallel on your CPU. You could use an FPGA to write something to multiply a vector of floats in parallel without developing a GPU. It would probably not be as fast as a GPU or the equivalent CPU, but it would be faster than doing it serially.

Another way to put it: you can create a GPU with a FPGA, but not vice versa.


Thanks. But what's the capacity of this particular FPGA? How much can it "do"? Surely it can't emulate a dozen Xeons; so what's the upper bound on what can be done on this FPGA?


I can't answer the GPU comparison question, but I can answer the question of what you "can" do on a FPGA. Here are some example cores for FPGAs: http://opencores.org/projects

Hopefully, by browsing that list, you can see how FPGAs aren't really directly comparable to something like a GPU.


Does this mean that ML on FPGAs will be more common? Can someone comment on the viability of this? Would there be a speedup, and if so, would it be large enough to warrant rewriting it all in VHDL/Verilog?


Yes, definitely to your first and last two questions!

It's not viable enough to result in a large-scale FPGA movement anytime soon, since industry and academia are heavily invested in GPUs. The software and libraries for GPUs, like CUDA, TensorFlow, and other open source libraries, are very mature and optimized for GPUs. There will have to be equivalent libraries in Verilog (I for one have been hoping to be a part of this movement for some time now, so I'd love it if anyone can point me to anything going on).

There are some major to minor hurdles. Although some of them might not seem like much[0], here they are:

1. Until now, deep learning/machine learning researchers have been okay with learning the software stack related to GPUs, and there are widespread tutorials on how to get started, etc. Verilog/VHDL is a whole different ball game and a very different thought process. (I will address OpenCL later.)

2. The toolchain is not open source and it's not really hackable. Although that is not that important in this case, since you're starting off writing gates from scratch, there will be problems with licensing and bugs that get fixed at a snail's pace (if ever), until there is a performant open source toolchain (if ever, but I have hope in the community). You'll have to learn to give up on customer service reps when you try to get help, unlike open source libraries, where you head to GitHub's issue page and get help quickly from the main devs.

3. Although this move will make getting into the game a lot easier, it still won't change the fact that people want control over their devices, and it will take time for people to realize they have to start buying FPGAs for their data centers and use them in production, which has to happen sometime soon. Using AWS's services won't be cost effective for long-term usage, just like GPU instances (I don't know how the spot instance situation is going to look with the FPGA instances).

This comes with its own slew of SW problems, and good luck trying to understand what's breaking what with the much slower compilation times and terribly unhelpful debugging messages.

4. OpenCL to FPGA is a mess. Only a handful of FPGAs support OpenCL, so this has led to little to no open source development around OpenCL with FPGAs in mind. And no, the OpenCL libraries for GPUs cannot be used for FPGAs as-is; more likely a from-scratch rewrite, plus a LOT of tweaking, is needed to get them to work. OpenCL to FPGA is not as seamless as one might think and is riddled with problems. This will, again, take time and energy from people familiar with FPGAs, who have been largely outside the OSS movement.

Although I might come off as pessimistic, I'm largely hopeful for the future of the FPGA space. This move isn't great news just because it lowers the barrier; it also introduces a chip that will be much more popular, so now we have a chip that libraries can focus their support on, compared to before, when each dev had a different board. So you'll have to get familiar with this one -- the Virtex UltraScale+ XCVU9P [1].

Also, what might be interesting to you: Microsoft is doing a LOT of research on this.

I think all of the articles on MS's use of FPGAs can explain it better than I can in this comment.

Some links to get you started: MS's blog post: http://blogs.microsoft.com/next/2016/10/17/the_moonshot_that...

Papers: https://www.microsoft.com/en-us/research/publication/acceler...

Media outlet links: https://www.top500.org/news/microsoft-goes-all-in-for-fpgas-... https://www.wired.com/2016/09/microsoft-bets-future-chip-rep...

I'd suggest starting with the Wired article or MS's blog post. Exciting stuff.

[0]: Remember that academia moves at a much slower pace than your average developer in adjusting to the latest and greatest software. The reason CUDA is still so popular, although it is closed source and you can only use Nvidia's GPUs, is that it got in the game first and wooed them with performance. Although OpenCL is comparably performant (with some rare cases where this isn't true), I still see CUDA regarded as the de facto language to learn in the GPGPU space.

[1]: https://www.xilinx.com/support/documentation/selection-guide...


Would love to know what that gets priced at per hour, as well as whether they plan to have smaller FPGAs available for development.


Bitcoin mining. WPA2 brute forcing.

Maybe someone will finally find the triple-des password used at adobe for password hashing.

The possibilities are endless :)


Mining is unlikely, with bitcoin at least. Bitcoin passed the FPGA stage and moved onto ASICs many years ago. There are some alt coins that are currently best mined on GPUs though and this may change that or put their claims to a real test.


The boards used for this preview do not have enough memory bandwidth to pose even a modest threat to the latest batch of memory-hard GPU PoW algos.


So now anyone can run their High Frequency Trading business on the side :-P.

So much easier than buying hardware. Deep learning sometimes works similarly: for many use cases it's easier to play with on AWS, with their hourly billing, than to buy hardware.


The latencies from AWS servers to the exchanges probably would make HFT applications unfeasible.


Not when you use Amazon's new regions, us-fin-1, that is within the exchange's datacenter. /s?


That would actually be tremendously disruptive! Superb idea.


For a moment I got super excited and thought us-fin-1 was real, then saw that trailing /s indicating sarcasm.

Maybe we'll see a High Frequency Trading For The Masses sort of situation in the future that wipes out profits for the existing guys, although it seems unlikely seeing how arbitrage opportunities are already automated by large capital holders.


Yeah, you need to be in the colo. Also, these aren't on the network card; the CPU introduces too much latency.


> Xilinx UltraScale+ VU9P fabricated using a 16 nm process.

> 64 GiB of ECC-protected memory on a 288-bit wide bus (four DDR4 channels).

> Dedicated PCIe x16 interface to the CPU.

Does anyone know whether this is likely to be a plug-in card? And can I buy one to plug into a local machine for testing?



Even if it does, this can easily sell for $10k+.


Much more. The FPGA alone will push the parts cost to ~$30K-55K+.


Yeah, the point is that you shouldn't need to buy any hardware even for development, which is the biggest win to me!


But having the hardware is vital. You have to test your design a lot. You're still going to need Vivado (which isn't cheap) and you'll need instance time to test the design on the real hardware with real workloads, along with any synthesizable test benches you want to run on the hardware.

The pricing structure of the development AMI is going to be meaningful here, because it clearly includes some kind of Vivado license. It might not be as cheap as you expect, and you need to spend a lot of time with the synthesis tool to learn. The F1 machines themselves are certainly not going to be cheap at all.

If you want to learn FPGA development, you can get a board for less than $50 USD one-time cost and a fully open source toolchain for it -- check my sibling comments in this thread. Hell, if you really want, you can get a mid-range Xilinx Artix FPGA with a Vivado Design Edition voucher, and a board supporting all the features, for like $160, which is closer to what AWS is offering, and will still probably be quite cheap as a flat cost, if you're serious about learning what the tools can offer: http://store.digilentinc.com/basys-3-artix-7-fpga-trainer-bo... -- it supports almost all of the same basic device/Vivado features as the Virtex UltraScale, so "upgrading" to the Real Deal should be fine, once you're comfortable with the tools.


I expect a Vivado license with all my development cards.

P.S. The few orders I had were SoC Zynq boards in the $400-1000 range.


For complex designs the simulator that comes with the Vivado tools (Mentor's modelsim) is not going to cut it. I wonder if they are working on deals with Mentor (or competitors Cadence and Synopsys) to license their full-featured simulators.

Even better, maybe Amazon (and others getting into this space like Intel and Microsoft) will put their weight behind an open source VHDL/Verilog simulator. A few exist but they are pretty slow and way behind the curve in language support. Heck, maybe they can drive adoption of one of the up-and-coming HDLs like Chisel, or create an even better one. A guy can dream...


Nowadays, I don't believe you need a paid-for simulator like Questa, VCS, etc. I am writing Verilog in my day job for FPGAs using Icarus Verilog (an open source simulator), which works fine for fairly large real-world designs (I am also using cocotb for testing my code) and supports quite a lot of SystemVerilog too.


As someone who has little experience with FPGAs beyond some experiments with a Spartan-6 dev board that mostly involved learning to write VHDL and building a minimal CPU, I found the simulator to be of limited use. My tiny projects were small enough that the education simulator was plenty fast. It was nice when I didn't have the board available, and occasionally, the logic analyzer was useful when I didn't understand what my code was doing to a data structure. But usually, it was just a lot easier to simply flash the board and run the thing.

What's the use of a simulator when you can spin up an AWS instance and run your program on a real FPGA?


Simulations give you better controllability and better visibility. In other words, you can poke and prod every internal piece of the design in simulation land. In real hardware, not so easy.

That being said, you are far from alone as an FPGA developer in skipping sim and going straight to hardware. Tools like Xilinx's chipscope help with the visibility problem in real hardware too.


> For complex designs the simulator that comes with the Vivado tools (Mentor's modelsim) is not going to cut it.

It's now called QuestaSim I believe. But are you sure it can't handle simulating large designs? If yes, what is the full-featured software from Mentor that can?

> Heck, maybe they can drive adoption of one of the up-and-coming HDL's like chisel

Chisel isn't a full-blown HDL from what I understand; it's only a DSL that compiles to Verilog. In other words, you'd still need a Verilog simulator to actually run your design.


Questa is the full-blown tool. ModelSim is a step down, and that's what comes with FPGA tools. Usually the version of ModelSim that Xilinx and Altera ship is crippled performance-wise.


I'd be interested in practical use cases that come to your mind (like someone who commented about genomics data processing for a university).

What could YOU use this for professionally?

(I certainly always wanted to play around with an FPGA for fun...)


Machine Learning, most likely. See this: https://news.ycombinator.com/item?id=13074021


Monte Carlo sims for options pricing? I've done this before on FPGA, might have a go at doing it for this instance as a fun exercise to test the concept!
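For anyone unfamiliar with the workload, a minimal sketch of that kind of Monte Carlo pricing in plain NumPy (a European call under standard Black-Scholes assumptions; illustrative only, not an FPGA implementation):

  import numpy as np
  S0, K, r, sigma, T, n_paths = 100.0, 105.0, 0.01, 0.2, 1.0, 1_000_000
  z = np.random.standard_normal(n_paths)
  ST = S0 * np.exp((r - 0.5 * sigma**2) * T + sigma * np.sqrt(T) * z)   # terminal prices
  price = np.exp(-r * T) * np.maximum(ST - K, 0).mean()                 # discounted mean payoff
  print(price)

The appeal of an FPGA here is, roughly, generating and reducing many such paths in parallel pipelines instead of serially on a CPU core.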


Not sure if that makes sense with the offer that Amazon has. The machines are huge, so either you're pricing a huge number of options at very high speed (which you'd probably do in-house with FPGAs that you own), or you'll be much better off cost-wise using a good machine locally. I've never found MC sims to be a time bottleneck, but YMMV I guess?


I've heard (although admittedly never seen in practice) that some places take a long time for this sort of thing (running it over a cluster of computers overnight). If you could do the same job on a single F1 instance in, say, an hour, then I think that would be compelling! Bear in mind that simple experiments I did showed an improvement of around 100x for this sort of task over a GPU.


Just wait till this gets combined with Lambda.


How would they do that? Since the FPGAs are not shared, I don't see how you could use it for very short-lived instances.


If the spin-up time is fast enough, then they could do it. Alternatively, if it's active enough, then there would be a stream of requests processed by the same, already loaded, FPGA.


Anyone have a hardware ZLIB implementation that I can drop into my Python toolchains as a direct replacement for ZLIB to compress web-server responses with no latency?

Could also use a fast JPG encoder/decoder as well.


Why stop there? Hack your kernel to deliver network packets directly to the FPGA and then implement the whole server stack in the FPGA. Why settle for response times on the order of milliseconds when you can get nanoseconds?

But seriously, I'm open to ideas for technologies that you or anyone else needs implemented for these instances. Would make an interesting side business for me.

EDIT: I should point out that I'm an experienced "full-stack" engineer when it comes to FPGAs. I've implemented the FPGA code and the software to drive them. None of this software developed by "hardware guys" garbage.


Speaking as a hardware guy, I think that's the ultimate goal as well :)

Been planning a NIC card that directly serves web apps via HDL for a while now...


Given that your EC2 Web server is limited to 20 Gbps, you're probably better off using Intel zlib and choosing the right compression level tradeoff. If you're willing to pay a fortune for 100 Gbps of zlib then the FPGA might be more appropriate.

For JPEG the GPU instances might be better.


The problem is the latency associated with software zlib, on the order of several milliseconds for a typical web response, and the CPU usage that entails, which limits web request-response throughput.
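A quick way to measure that software-zlib cost (standard library only; the payload is a made-up stand-in for a typical response body):

  import os, time, zlib
  payload = os.urandom(256) * 1024            # ~256 KB of semi-compressible data
  start = time.perf_counter()
  compressed = zlib.compress(payload, 6)      # typical web compression level
  print((time.perf_counter() - start) * 1e3, "ms per response spent in zlib")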


I'm not totally up to date on it, but the RISC-V project has a tool (Chisel) that "compiles" to verilog... Interesting times for sure!


Also check out clash-lang.org, which compiles an almost-Haskell language to VHDL, Verilog, or others.


_NOW_ things are getting really interesting!


FPGA Instances are a game changer in every way.

Let this day be known as the beginning of the end of general-compute infrastructure for internet-scale services.


Newbie question: What do verilog and VHDL compile down to, i.e. what is the assembly/machine language for FPGAs?


A logic-gate/register netlist, i.e. a digital schematic of your design. This is done by a synthesizer program. It is then mapped to the available resources of your chosen FPGA by a mapping program. Now you have the logic-equivalent schematic using the FPGA's resources. Then the netlist is placed-and-routed to fit it into the FPGA. If the design is too large/complex or the timing requirements too strict (too high a clock frequency), this phase can fail. It can also take many hours to complete, even on fast computers.


Binary FPGA configuration instructions - block RAM contents, routing switch configuration, register configuration and initial state, PLL/DCM configuration, and of course LUT contents. That's the final result of the toolchain, ready to get sent to the FPGA via JTAG or written into a configuration flash chip. It's the FPGA equivalent of machine code.

For the higher-level object file or assembly language, that would be a netlist - essentially a digital representation of a schematic. The HDL is transformed into a netlist, then the netlist is optimized and the components converted from generics to device-specific components, then the placement and routing is determined, and finally a 'bit' file is generated for actually configuring the FPGA. This process can take several hours for a large design.


logic-gate layout?


Azure will be next I guess. They're already using FPGA based systems to power Bing and their Cognitive Services.


That's just anecdotal. No one has seen it. The Wired article sounded like content marketing.


They've published papers about it -- https://www.microsoft.com/en-us/research/publication/configu... -- they're giving talks about it -- Mark Russinovich was here a few weeks ago with a very long talk. Doug Burger and Derek Chiou are leading a lot of these efforts, and they're absolutely for real.

I'm not sure I agree with them that this is the right path forward (but they're smart and know their stuff, so I'm probably wrong), but it's absolutely for real.


Wow. That's what was going through my mind reading this article, but it quickly (and sadly) dawned on me that I probably won't be able to build anything with it, as we are not solving problems that require programmable hardware. Euphoric nonetheless to see this kind of innovation coming from AWS.


Is there direct DMA access to/from the network interface bypassing the CPU?


Doesn't look like it from the article. That could be very interesting, but there could be network architecture constraints that prevent Amazon from providing that from the get-go. And it wouldn't be used in all cases, so that could burn a lot of switch ports. Seems like they're targeting more compute offload and less network appliance.


Are these custom FPGAs, or Altera or Xilinx?


Looks like they are using Xilinx UltraScale+ FPGAs.


It appears to be Xilinx.


Oh man...this is freaking awesome!


This is huge



