Developer Preview – EC2 Instances with Programmable Hardware (amazon.com)
643 points by jonny2112 on Nov 30, 2016 | 204 comments



These FPGAs are absolutely _massive_ (in terms of available resources). AWS isn't messing around.

To put things into practical perspective my company sells an FPGA based solution that applies our video enhancement technology in real-time to any video streams up to 1080p60 (our consumer product handles HDMI in and out). It's a world class algorithm with complex calculations, generating 3D information and saliency maps on the fly. I crammed that beast into a Cyclone 4 with 40K LEs.

It's hard to translate the "System Logic Cells" metric that Xilinx uses to measure these FPGAs, but a pessimistic calculation puts it at about 1.1 million LEs. That's over 27 times the logic my real-time video enhancement algorithm uses. With just one of these FPGAs we could run our algorithm on 6 4K60 4:4:4 streams at once. That's insane.
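A quick back-of-the-envelope of that scaling claim (all numbers here are the rough estimates above, not vendor figures):

  # ~27x the logic of the 1080p60 design, spread over 4x the pixels per stream
  cyclone_les  = 40_000       # LEs used by the Cyclone 4 design
  vu9p_les_est = 1_100_000    # pessimistic LE-equivalent estimate for the VU9P
  ratio = vu9p_les_est / cyclone_les              # ~27.5
  pixel_ratio = (3840 * 2160) / (1920 * 1080)     # 4K has 4x the pixels of 1080p
  print(ratio, ratio / pixel_ratio)               # ~27.5, ~6.9 -> roughly 6 streams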

For another estimation, my rough calculations show that each FPGA would be able to do about 7 GH/s mining Bitcoin. Not an impressive figure by today's standards, but back when FPGA mining was a thing the best I ever got out of an FPGA was 500 MH/s per chip (on commercially viable devices).

I'm very curious what Amazon is going to charge for these instances. FPGAs of that size are incredibly expensive (5 figures each). Xilinx no doubt gave them a special deal, in exchange for the opportunity to participate in what could be a very large market. AWS has the potential to push a lot of volume for FPGAs that traditionally had very poor volume. IntelFPGA will no doubt fight exceptionally hard to win business from Azure or Google Cloud.

* Take all these estimates with a grain of salt. Most recent "advancements" in FPGA density are the result of using tricky architectures. FPGAs today are still homogeneous logic, but don't tend to be as fine grained as they were. In other words, they're basically moving from RISC to CISC. So it's always up in the air how well all the logic cells can be utilized for a given algorithm.


Any thoughts on why AWS/Xilinx didn't go for a mid-range FPGA to help validate customer requirements?

My guess is that Amazon will have to be very careful not to price themselves out of the market for mid-range Deep Learning based cloud apps.

Wild guesstimate, but I think it'll cost more than $20/hr for each instance.


Based on my speculation, and to make a long analysis short: fewer, bigger FPGAs are better in the cloud from a user experience perspective than more, smaller FPGAs. The big applications are all going to consume as much FPGA fabric as they can (machine learning, data analysis, etc). Even "mid-range" Deep Learning will consume these FPGAs like candy. Non-deep learning will too; they can always just go more parallel and get the job done faster.

Amazon is betting on the fact that they can get better pricing than anyone else. They probably can. No one else will be buying these FPGAs in quantities Amazon will if these instances become popular (within their niche). So for the medium sized players it'll be cheaper to rent the FPGAs from Amazon, even with the AWS markup, than to buy the boards themselves. Especially for dynamic workloads where you're saving money by renting instead of owning (which is generally the advantage of cloud resources).
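To make the rent-vs-own tradeoff concrete, here's a toy comparison in Python (every number below is made up purely for illustration):

  # Hypothetical figures only -- neither the board price nor the hourly rate is known yet
  board_cost      = 40_000    # assumed one-time cost of a VU9P board
  rent_per_hour   = 20.0      # assumed F1 hourly rate
  hours_per_month = 200       # how heavily you actually use it
  months_to_break_even = board_cost / (rent_per_hour * hours_per_month)
  print(months_to_break_even) # ~10 months at this duty cycle; heavier use favors owning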

That's my guess anyway.


It would not be inconceivable that Amazon just buys Xilinx (before someone else does).


Thank you so much for these posts, fpgaminer. They've been extremely helpful to me in framing how these things could be used.

Once upon a time I thought seriously about going into hardware design. I took a couple of different courses in college (over 10 years ago now... sigh) dealing with VHDL and/or Verilog and entirely loved it. If not for a chance encounter with web programming during my co-op, my career would have been entirely different. With AWS offering this in the cloud, if it is not prohibitively expensive I'll be looking into toying with it and hopefully discovering uses for it in my work.


What can each one of those 2.5 million "logic elements" do? Last time I used an FPGA, they were mostly made up of 4-bit LUTs.

How many NOT operations can this do per cycle (and per second)? I realise FPGAs aren't the most suited for this, but the raw number is useful when thinking about how much better the FPGA is compared to a GPU for simple ops.


The 2.5 million number quoted in the article is "System Logic Cells", not Logic Elements. Near as I can tell, since I haven't kept pace with Xilinx since their 7 series, a "System Logic Cell" is some strange fabricated metric which is arrived at by taking the number of LUTs in the device and multiplying by ~2. In other words, there is no such thing as a System Logic Cell, it's just a translucent number.

Anyway, the FPGAs being used here are, I believe, based on a 6-LUT (6 input, 2 output). So you'd get about 1.25 million 6-LUTs to work with, and some combination of MUXes, flip-flops, distributed RAM, block RAM, DSP blocks, etc.

Supposing Xilinx isn't doing any trickery and you really can use all those LUTs freely, then you'd be able to cram ~2.5 million binary NOTs into the thing (2 NOTs per LUT, since they're two-output LUTs). So 2.5 million NOTs per cycle. I don't know what speed it'd run at for such a simple operation. Their mid-range 7 series FPGAs were able to do 32-bit additions plus a little extra logic at ~450 MHz, consuming 16 LUTs for each adder.
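For a rough upper bound on raw NOT throughput, using the estimates above (the clock is an assumption borrowed from that ~450 MHz adder figure; a trivial design could likely clock higher):

  luts         = 1_250_000    # estimated 6-LUTs in the device
  nots_per_lut = 2            # two outputs per LUT, per the estimate above
  clock_hz     = 450e6        # assumed clock, not a measured number
  print(f"{luts * nots_per_lut * clock_hz:.1e} NOTs/second")   # ~1.1e15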


6-input, 1-output or 5-input, 2-output. They're implemented as a 5-input, 2-output LUT with a bypassable 2:1 mux on the output.


The metrics have gotten pretty opaque since the old days when an FPGA was a "sea of LUTs" all alike; modern ones include a ton of (semi-)fixed function hardware like multiply-accumulate blocks and embedded dual-port RAM. Even the LUTs themselves can be reprogrammed into small RAM blocks or shift registers, so counting "logic elements" is mostly a marketing exercise.


While yes the architectures have become more "CISC-like", they aren't particularly convoluted or opaque. It's pretty easy to describe the architectures and come up with numbers for them. Xilinx could literally just say, "1 Million 6-to-2 LUTs" and that would be entirely transparent and helpful.

So it's not so much changes in architecture that have given rise to the translucency of these numbers. It's a measuring contest between Xilinx and IntelFPGA who believe you need to present bigger numbers in marketing material to win engineers. I can't speak for other FPGA engineers, but personally it just frustrates me and wastes my time. I don't ever take those numbers at face value, and I wouldn't hire anyone who did. Xilinx is the worst offender here. At least IntelFPGA will often quote their parts both in transparent terms (# of ALMs) and useful comparisons (# of equivalent LEs). I've never seen them pull a completely made up "System Logic Cell" out of thin air.


If you don't click through to read about this: you can write an FPGA image in verilog/VHDL and upload it... and then run it. To me that seems like magic.

HDK here: https://github.com/aws/aws-fpga

(I work for AWS)


This is so awesome, I can't even. I wrote arachne-pnr [0] to learn about FPGAs to get ready for this day. Just signed up, can't wait to play with these!

I hope the growing popularity of FPGAs for general-purpose computing will help push the vendors to open up bitstreams and invest in open-source design tools.

[0] https://github.com/cseed/arachne-pnr


Wow, Clifford, is that you? I hope this, exciting as it may be, won't make you leave the open FPGA efforts for the dark side (saw your talk at the last FOSDEM, it was very exciting).


Cotton is the author of arachne-pnr. Clifford is the author of Yosys and IceStorm, which are separate projects. They're not the same person.

FWIW, Clifford has recently started reverse engineering the bitstream of the modern Xilinx FPGA series. So, stay tuned for a Xilinx IceStorm equivalent sometime down the road (a few years, probably...)


No, Clifford is cliffordvienna on HN. He wrote Yosys (an amazing piece of software) and did the iCE40 reverse engineering (amazing work). I wrote the place-and-route tool, arachne-pnr.


And kudos for that.


I'm very curious if/how you have managed to make the developer experience sane and enjoyable. I have experience with an FPGA cluster of ~800 FPGAs, and it definitely does not get used to its full potential because of the tooling around it.


Is that repo going to be made public? It looks to be private right now.


Yup, sorry -- working on fixing that now. Check back in a bit.


Still not fixed. I'll reply here when it is. Might be a few days because of re:Invent stuff.


Thanks for the update. Been chasing that link all morning :-)


If you guys are curious about these announcements, I'll be recapping them and going into more detail on twitch.tv/aws at 12:30 Pacific.


Huh? Isn't Twitch just for gaming content?


What others have said is true, and also note that Amazon bought Twitch 2 years ago, so I'm sure Amazon can run their own product announcements through Twitch if they want :)

EDIT: updated when amazon bought twitch, woops


Nope. Twitch is excellent for all kinds of live content.


https://www.twitch.tv/p/rules-of-conduct

"All content that is neither gaming-related nor permitted under the rules for Twitch Creative Conduct is prohibited from broadcast."


I've seen many people programming on Twitch https://www.twitch.tv/directory/game/Creative/programming

While it's mainly game dev or game dev related, it's not limited to game dev stuff. From their FAQ: https://help.twitch.tv/customer/portal/articles/2176641

  Examples of what you can broadcast on Twitch Creative:
  ...
  Programming and coding  
  Software and game development  
  Web development
EDIT: It seems that re:Invent is being streamed on Twitch anyway.


This is a product announcement, though


And Twitch has had TV shows streamed on it in the past: https://www.twitch.tv/whoismrrobot

My guess is that was a sponsored deal or something. But per my edit above, it seems that re:Invent is being streamed on Twitch anyway, so I'm guessing it's all above board (and, as others have said, Amazon owns Twitch).


Amazon might just let Amazon talking about their products slide ;)


I'm guessing it is covered under the Twitch Creative Conduct, since there is an entire Creative category now that is getting more popular which involves people painting, cosplay, digital art, etc.


What's the cost?


So it's tied to the PCIe bus - how do you interact with your FPGA once you've programmed it? Are there general drivers you can use, or do you also have to create a Linux driver to talk to your FPGA?


Xilinx provides software drivers and IP for PCIe DMA and memory-mapped interfaces. These are fairly easy to integrate (probably not the best for latency, though - I've developed my own for a specific use case: low latency, but I don't care about bandwidth).


I'm not sure what you mean by the "magic" part here, can you please clarify?

[background: many years of writing VHDL specifically for FPGAs, using various dev boards and custom boards]


The magic part is the thing we have gotten used to with the cloud -- virtual hardware you never see and rent by the minute. Imagine having an FPGA idea and not needing to make a board, pay for a dev board, or even find a dev board in your lab... Like your idea and need more? Spin up 100 more right now...


Exactly what I thought. This is amazing. FPGAs are commonly used in embedded systems to perform application-specific tasks, and now application developers have access to this power too. I guess many machine learning applications might profit from that power instead of using comparatively expensive graphics hardware.


How do FPGAs compare with GPUs for the inference stage of Deep Learning algorithms? Can they accelerate it a lot?


No, but they do use less power:

To the best of our knowledge, state-of-the-art performance for forward propagation of CNNs on FPGAs was achieved by a team at Microsoft. Ovtcharov et al. have reported a throughput of 134 images/second on the ImageNet 1K dataset [28], which amounts to roughly 3x the throughput of the next closest competitor, while operating at 25 W on a Stratix V D5 [30]. This performance is projected to increase by using top-of-the-line FPGAs, with an estimated throughput of roughly 233 images/second while consuming roughly the same power on an Arria 10 GX1150. This is compared to high-performing GPU implementations (Caffe + cuDNN), which achieve 500-824 images/second, while consuming 235 W. Interestingly, this was achieved using Microsoft-designed FPGA boards and servers, an experimental project which integrates FPGAs into datacenter applications.

https://arxiv.org/pdf/1602.04283v1.pdf
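Dividing out the figures quoted above (the paper's numbers, my arithmetic):

  fpga_perf_per_watt = 233 / 25                 # ~9.3 images/s/W (projected Arria 10)
  gpu_perf_per_watt  = (500 / 235, 824 / 235)   # ~2.1 to ~3.5 images/s/W (Caffe + cuDNN)
  # FPGA is roughly 2.7-4.4x better per watt, but lower in absolute throughput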


That's hard to compare. Typically FPGAs are doing fixed-point math, so they can do more operations with less power. GPUs have traditionally done floating point. However, with the new Pascal architecture, certain cards (P4/P40) support 8-bit integer dot products, which give a massive boost in performance/W. It's still fairly high at 250W, but that's for an entire card with 24GB of memory. You'd have to compare that to an FPGA with that much memory on a PCIe card if you're doing apples to apples. Something like this is appropriate for comparison: http://www.nallatech.com/store/fpga-accelerated-computing/pc...


This is very awesome. Could you add some more thoughts on the tooling and the development workflow? Is it possible to target the Xilinx hardware using only open source (or AWS proprietary) tools? Or is Vivado still required for advanced stuff?


Vivado is required for all advanced features and programming Xilinx chips in general; like the sibling post said, there is no open FPGA toolchain implementation for Xilinx devices, especially for extremely high end ones like the ones being offered on the F1 (I expect they'd run at like, several thousand USD per device, on top of a several thousand dollar Vivado license for all the features).

It doesn't look like there's much AWS proprietary stuff here, though we'd have to wait for the SDK to be opened properly to be sure. I imagine it's mostly just making all of the stuff prepackaged and easily consumable for usage, and maybe some extra IP Cores or something for common stuff, and lots of examples. If you're already using Vivado I imagine using the F1/Cloud won't introduce any kind of major changes to what you expect.


> I expect they'd run at like, several thousand USD per device...

You're guessing about an order of magnitude too low, actually. The VU9P FPGAs Amazon is using cost between $30,000 and $55,000 each, depending on the speed grade.

Yes, this means a fully equipped F1 instance costs nearly half a million dollars. Don't count on the instances being cheap to run.
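Rough arithmetic behind that figure (the per-FPGA price range is from above; eight FPGAs in the largest instance is my recollection of the announcement, so treat it as an assumption):

  fpgas_per_instance = 8                # assumed count for the biggest F1 configuration
  unit_lo, unit_hi = 30_000, 55_000     # quoted VU9P price range, USD
  print(fpgas_per_instance * unit_lo, fpgas_per_instance * unit_hi)  # 240000 to 440000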


Do you have a source? I am curious. http://www.digikey.com/product-detail/en/xilinx-inc/XCKU040-... this surely is not the right chip then.


https://aws.amazon.com/ec2/instance-types/

Scroll down to "F1"; it says:

> Xilinx UltraScale+ VU9P FPGAs

The VU9P isn't available through DigiKey, but is listed by Avnet. I don't know which specific package and speed grade Amazon is using, but here's one:

https://products.avnet.com/shop/en/asia/programmable-logic/f...


The press release says:

"This AMI includes a set of developer tools that you can use in the AWS Cloud at no charge. You write your FPGA code using VHDL or Verilog and then compile, simulate, and verify it using tools from the Xilinx Vivado Design Suite (you can also use third-party simulators, higher-level language compilers, graphical programming tools, and FPGA IP libraries)."

So basically, buying a copy of Vivado is the minimum. There aren't any open source tools that directly output Xilinx FPGA bitstreams that I know of.


It looks like the FPGA Developer AMI includes Vivado and a license explicitly for use on these platforms (look at the PuTTY screenshot in the blog post; it has a customized MOTD). You just need to set up the license server that Vivado will use and point it to the right license.

So I guess the real question is: what exactly is granted by the Vivado license on these AMIs? Do we get things like SDSoC, SDAccel, etc, and all the libraries? [1] The blog seems to imply you can program these things with OpenCL too (AKA SDAccel), so I'm guessing that these features are all enabled, but details about the included Vivado license in the AMI would be nice.

[1]: https://www.xilinx.com/products/design-tools/vivado.html#buy


+1 for VHDL. :)


that repository is 404?


Hmm this still isn't public. Any ETA?


This is really cool. Do you think it will be possible to run MongoDB on an FPGA anytime soon?


I really hope that is sarcasm.


I'm currently working on this. Speedup around 2x for most operations. Not kidding, quite a few startups are currently trying to optimize typical data operations with special algorithms.


Aren't there other software databases that are already more than 2x faster than Mongo and don't lose data?


Maybe, I'm not talking about Mongo specifically.

You can find 'equivalents' to CPU data structures for FPGAs and speed up operations on/with them while still saving power. There's lots of trouble with how buffers are used and memory is accessed. So it's not a trivial task, but IF you can optimize generic data structures and replace the existing ones you basically have 2x the speed or half the energy consumption for any DB.


But what's the developer time/cost for that?


Totally depends on your use case.

From the blog post:

> From here, I would be able to test my design, package it up as an Amazon FPGA Image (AFI), and then use it for my own applications or list it in AWS Marketplace.

As a user of those Marketplace images, you just look at the hourly fees. Your team needs to set this up, of course, and replace the old stuff, e.g. MongoDB with a new, sped-up FPGA-MongoDB. (And you'd need to fix some new bugs.)

If time is super-critical to you, e.g. if you're working with analytics: do you really need to speed up your processing pipeline? E.g. processing stuff not once but twice per day? If yes, then you'd be better off having people on your team who understand all this and are able to fix and implement stuff themselves. The second scenario would be quite a bit more expensive, but still, FPGAs aren't rocket science and there's no way around them in the future.


Sure, but they aren't Web Scale!


> Today we are launching a developer preview of the new F1 instance. In addition to building applications and services for your own use, you will be able to package them up for sale and reuse in AWS Marketplace.

Wow. An app store for FPGA IPs and the infrastructure to enable anyone to use it. That's really cool.


>Wow. An app store for FPGA IPs

I see people making video transcoder instances on day 1, and MPEGLA bankrupting Amazoners with lawsuits on day 2


I guess the online distribution of FPGA configurations was bound to happen eventually?


This was already a thing. Plenty of marketplaces exist for FPGA IPs. It's just not that well known because high end FPGAs run $7k+ and complex IP cores can be $20k+ for a license.


So if you were to get a cheap license via a service like this, do you get access to the VHDL or equivalent or could you extract it in some way?


Probably not. Depends on how the core is distributed. Either you'll get HDL or netlists, and they may or may not be encrypted. Obviously the synthesis software has to decrypt it to use it, so like all defective by design DRM it doesn't make it impossible to get at the code, it just makes it more difficult. However, a netlist is just a schematic, so you would have to 'decompile' that back to HDL (and lose the names, comments, etc) if you want to modify it. It's also possible that you would only get the binary FPGA configuration file (this marketplace seems like one more for complete appliances and not IP cores) so you would have to back a netlist out of that somehow and then reverse-engineer it from there.


there are "encrypted" hdls, but encrypted no more than dvd discs. Find a right irc channel and ask for keys


Pretty sure the AWS EULA makes you responsible for violating any IP. I didn't read it, but if it doesn't say so already then their lawyers are crap.


> I see people making video transcoder instances on day 1, and MPEGLA bankrupting Amazoners with lawsuits on day 2

Only if they distribute it through amazon. Just put the code up in a torrent; anyone can run it without MPEGLA knowing.


Yeah, this is incredible for FPGA users. There is now a market for freelance FPGA developers.


Yes this is cool if you're already using FPGAs and yeah, there will be a market for FPGA designers.

But I also think this is FPGAs for the Rest of Us. Suddenly, FPGAs are available without having to buy some development board from Xilinx, install a toolchain, use said (shitty) toolchain ...

Me, I was thinking of FPGAs as being something I'd use down the road a few years, eventually, etc. But instead, I'm looking at this right now. This morning. Waiting for the damn 404 to go away on:

https://github.com/aws/aws-fpga

This reduces the barrier to entry. It also reduces the transaction cost (h/t Ronald Coase).


I think this is going to be well outside the pricing range for most people to use for an extended period of time, which is necessary for learning a lot. Depends on the specs of the developer AMI, too, which comes with Vivado and everything. But synthesis can be insanely CPU intensive for large designs, so who knows how they'll spec it. It might cost more due to including a Vivado license. And you'll need to do extensive amounts of testing, no matter what you're doing, so be prepared to synthesize and test on "Real World" F1 instances, on top of simulating, testing, etc.

If you truly want to get started with FPGAs, you can do it on a small chip, with an open source Verilog toolchain, and open source place/route/upload. Textual source code -> hardware upload, open source. Today, for like $50! I think this is way better as a tool for educational purposes, and a lot cheaper. Also, you don't have to deal with a giant bundle of EDA crapware.

What you want is Project IceStorm, the open source Verilog flow for iCE40 FPGAs: http://www.clifford.at/icestorm/ -- you install 3 tools (Yosys, Arachne P&R, IceStorm) and you're ready to go. It's all open source, works well, and the code is all really good, too.

You can get an iCE40 HX8k breakout board for $42, and it has ~8,000 LUTs. Sure, it's small, but fully open source can't be beaten and that's cheap for learning: http://www.latticestore.com/products/tabid/417/categoryid/59...

I think this is a much better route for learning how to program FPGAs, personally, with high quality software and cheap hardware -- just supplement it with some Verilog/VHDL books or so. There are pretty decent Verilog tutorials on places like https://www.nandland.com or https://embeddedmicro.com/tutorials/beginning-electronics for example.


You would think after 20+ years of FPGAs, commercial FPGA tools would be more usable and more productive than something a couple guys hacked together from reverse-engineering a small FPGA. But that hasn't been my (limited) experience. Hats off to the IceStorm team.


I would not count on these instances being useful for development.

The parts Amazon is using cost somewhere between $30K and $50K each. Hourly costs will be substantial -- it will likely be cheaper to buy an entry-level development board than to spin up an F1 instance every time you want to run your design.


(post author here)

You can do all of the design and simulation on any EC2 instance with enough memory and cores. You don't have to run the dev toolchain on the target instance type.


You can do some simulation on a computer, but it's much slower than real time, even for a small design. Prototyping PCIe communications is also difficult without real hardware.


Hi Jeff, a point of clarification on this: "In instances with more than one FPGA, dedicated PCIe fabric allows the FPGAs to share the same memory address space and to communicate with each other across a PCIe Fabric at up to 12 Gbps in each direction."

Does that mean you can have FPGAs running on multiple F1 instances connected via the PCIe Fabric? It's not clear if this means FPGAs within a single F1 instance, or between multiple F1 instances.


Any recommendations on an entry-level development board?


Digilent Arty [1]. XC7A35T, 256MB RAM, Ethernet, $99.

[1]: http://store.digilentinc.com/arty-board-artix-7-fpga-develop...


If you are in the academic sector, pretty much every vendor has a university program where one can get hardware/software at reduced prices or as a donation.

Personally, I started dabbling with a Zynq ZedBoard: FPGA + hard ARM cores, lots of stuff on board, plus extension capabilities.


> install a toolchain, use said (shitty) toolchain ...

Did I miss some details? Don't you still need those shitty toolchains to do design/simulation before you'd deploy it?


Deploy? No. With the F1 instance, I can sell it without said toolchains. I need them to develop but not to deploy.

Edit: This isn't magic. This is availability. FPGA development remains hard.


It's not that simple, I think. FPGA development is extremely labor intensive, both in design and especially in verification and testing. Large scale designs increase both the time needed for synthesis, verification labor, and testing, by a lot. If you don't have extremely large scale or resource intensive designs, this probably isn't for you anyway. Making the FPGAs more readily available is good for a lot of reasons, and opens the market to some new stuff, but it doesn't necessarily dramatically change the true costs of actually developing the designs in the first place.

Basically, you're not going to deploy fabric live to your customers without having extensively tested it on your real world, production hardware setup, even if it's just a separate lab with identical hardware. That's going to be going on throughout the entire development lifecycle, and for anything remotely complex you can expect that to be a long process. You're going to be using that F1 server a lot for the development, so you have to factor in that cost. If you're a solo/small place you can probably amortize it somewhat thanks to The Cloud, however (reserved instances, on demand batching, etc).


> buy some development board from Xilinx, install a toolchain, use said (shitty) toolchain

You can start only with the toolchain's simulator, and then get a board from ebay for 50 bucks.


I guess this will be a game changer for FPGA-mineable digital currencies. Maybe not for Bitcoin, because people have invested heavily into dedicated mining hardware, but I'm interested to see what it'll do for the smaller altcoins.


For any cryptocurrency that's profitably mineable on AWS the difficulty immediately increases to the point that it's no longer profitable.


Yes, this IMO is the biggest flaw in bitcoin's design. It makes miners impossible to commoditize. As soon as a new ASIC becomes widely available, it defeats itself and becomes useless at the next difficulty increase.

Consumer ASICs stopped being a thing for this reason. Now we have mega-secret custom hardware that can't be shared without destroying the investment put into their development.

I don't think Satoshi's intention was that only a handful of massive investors with their own proprietary chipsets would be able to mine, but that's the natural consequence of the difficulty mechanism, and it really undermines bitcoin's core principle of decentralization.


Perhaps not for the entire time, but bear in mind that the prices of cryptocurrencies fluctuate and you can spin up the VMs when it is profitable and shut them down when it isn't.


Would this mean that the currency would never fluctuate below that price point?


It's a price ceiling, not a floor. If prices go up high enough, it's worth turning on EC2 instances and paying for them with the mined cryptocurrency until prices fall. There's some wiggle room above the break-even point, since there is some delay between turning on miners and receiving USD, and the miner has to eat that risk.
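A toy version of that break-even logic (every number below is hypothetical, purely to show the ceiling mechanism):

  instance_cost_per_hour = 2.00     # made-up F1 hourly price
  hashrate               = 7e9      # hashes/second, the estimate from upthread
  coins_per_hash         = 1e-15    # hypothetical expected reward per hash
  coin_price_usd         = 900.0    # hypothetical exchange rate
  revenue_per_hour = hashrate * 3600 * coins_per_hash * coin_price_usd
  print(revenue_per_hour > instance_cost_per_hour)  # mine only while this stays True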


Ah, yes, that makes sense, thank you!


Not if you have a free-fall dump going on. This happens when investors lose hope in the long-term future.


> Maybe not for Bitcoin, because people have invested heavily into dedicated mining hardware

The thing is, it seems like people always invest heavily into dedicated hardware when using FPGAs. I'll be interested to see what people actually end up using this service for.


I'm surprised that no one has linked to OpenCores yet: http://opencores.org/ They've got a ton of VHDL code under various open licenses. The project's been around forever and is probably a good place to start if you're curious about FPGA programming.


OVH is testing Altera chips - ALTERA Arria 10 GX 1150 FPGA Chip

https://www.runabove.com/FPGAaaS.xml


If anyone is wondering what the FPGA board looks like:

https://imgur.com/a/wUTIp



Are they actually using that one, or is that just a board that happens to have that particular FPGA on it?


The board in the image is the retail version. But it's more than likely that Amazon is using the same one, since they don't modify the GPUs either, even though they purchase those at much larger scale.


How do we know they are using that particular board from bittware as opposed to a board from a different manufacturer or even an in-house design? The linked article does not mention bittware or the board part number.


Good lord that is beautiful. What a massive FPGA.


Here's a post by Bunnie Huang from a few months ago, saying that Moore's Law is dead and we will now see more of this kind of thing: http://spectrum.ieee.org/semiconductors/design/the-death-of-...

Pretty interesting read. Also, kudos to AWS !


For my institute this is going to be _really_ useful for genomics data processing, because we can't justify buying expensive hardware for undergrad research. Using FPGA hardware in the cloud sounds almost magical!


Wouldn't bandwidth/transfer costs basically nullify the computing gains? I know someone who used to be in genomics and cloud-anything was priced-out due to transfer costs.


You can't justify buying it but you can justify renting it? Has your department heard of amortization?


Most research finance departments are absolutely horrified at OpEx because any strange non-capital expenditure makes them look less efficient than the next research institute. This comes in handy when two labs are up for a grant, and they are equally qualified. The more efficient institute gets the grant. You can imagine asking for the lab credit card for EC2 time is not met with enthusiasm.


That's interesting. So, buying a lot of expensive, soon-to-be-obsolete hardware makes your lab more attractive?


Exactly. I haven't worked at my old lab for more than 5 years, but they are still advertising on their web page the systems I built when employed there. Woo! 2010-era blade servers!


Renting for a short period of time vs. buying the hardware are very different IMO.


The traditional EDA tool companies (Mentor, Cadence, Synopsys) all tried offering their tools under a cloud/SaaS model a few years back and nobody went for it. Chip designers are too paranoid about their source code leaking. I wonder if that attitude will hamper adoption of this model as well?


> Chip designers are too paranoid about their source code leaking.

It's more an issue of being able to reproduce an existing build later on. You can't delegate ownership of the toolchain to the "cloud" (read: somebody else's computer) if you think you'll ever need to maintain the design in the future.


I'm not so sure that is the issue. Currently you delegate ownership of the toolchain to the EDA vendor. Sure you have tools installed locally on your machines, but the tools typically have licenses that expire, so there's never a guarantee you can build it later with the exact same toolchain. Also EDA vendors end-of-life tools at some point, so even if you pay, that tool won't exist for ever, and the license will not be renewable.

I do think the issue with cloud is the concern over IP. There are not a lot of EDA vendors, so the chances that your competitor is also using that same EDA vendor is pretty high. I think companies are pretty wary of using a cloud hosted service where you could literally be running simulations on the same machines as your competitors. Can you imagine some cloud/hosting snafu resulting in your codebase being accessible by your competitors?

EDA companies also sell ASIC/FPGA IP, and VIP (verification IP), so there's also a pretty clear conflict of interest if they have access to your IP. So, if you're really paranoid, imagine the EDA vendors themselves picking through your IP and repackaging/reselling it as IP to other customers (encrypted of course so you can't readily identify the source code)?


The EDA tools need your source code (HDL) to simulate or synthesize the design. But with these F1 instances, the model potentially doesn't have that problem. You develop/design an FPGA solution (some type of acceleration), then provide it as a service. You don't expose your source code to your end customer or the EDA tool companies.

You do however, potentially expose your source code to Amazon. But possibly not, if you do your design/testing on EDA tools under your control, then deploy FPGA build packages to the F1 instances for hardware testing.


Quick question: if someone wants to learn to program an FPGA, is learning C the only way to go? How hard is it to learn and program in Verilog/VHDL without an electrical engineering background?

If anyone can suggest links or books, please do.

Thank You


I have a physics background but not an EE background. I found Verilog pretty easy to grasp. VHDL took me a lot longer.

To get some basic ideas I always recommend the book Code by Charles Petzold: https://www.amazon.com/Code-Language-Computer-Hardware-Softw...

It walks you through everything from the transistor to the operating system.

(Apparently I need to add that I work for AWS on every message so yes I work for AWS)


Thank you


I would suggest Digital Design by Morris Mano [1]. It starts with a basic intro and goes from digital gates to FPGAs themselves! And you really don't need any EE background for this book. It starts from absolute basics and will also teach you Verilog along the way. Verilog is used more in industry than VHDL (which is more popular in Europe and in the US military for some reason).

I'm surprised you got the idea of using C to program FPGAs - are you thinking of SystemC or OpenCL? (They're vastly different from each other.)

I'm really surprised a sibling comment recommended the Code book. It's really meant to be layman's reading about tech. It's a great book, but it won't teach you how to program FPGAs.

[1]: https://www.amazon.com/Digital-Design-Introduction-Verilog-H...


I thought FPGAs were programmed using low-level languages like C.


Not really. C is considered high-level in digital design. There are some tools for high-level synthesis from languages like C, but they aren't used much.

Most FPGA "programming" is a textual description of a directed graph of logic elements in a language like VHDL or Verizon (and now SystemVerilog).

Synthesis engines have gotten better over the years, allowing things like + and * to describe addition and multiplication instead of having to describe the logic graph of a multiplier.

And most FPGAs now have larger built-in primitives, like 18x18 multipliers.

You can judiciously use for-loops for repeated structures.


Verizon was meant to be Verilog. Didn't catch that autocorrect.


It's $220 new.

I am so glad I don't have to buy textbooks anymore.


Maybe I should've included this, but older versions are not much different from the new one, and the older editions go for dirt cheap by comparison.

Here's a 4th edition used book for $13.95

http://www.ebay.com/itm/Digital-Design-4th-Edition-Ciletti-M...

But yeah, the publishing industry is in pretty bad shape (for students/consumers).


C won't help you here; the Verilog/VHDL model is very different from normal languages due to intrinsic parallelism and the different techniques you need to use - you can't allocate anything at runtime, for example. There are also language quirks like '=' vs '=>' which trip up beginners.


No, you can also go the Ada way with VHDL.

One key difference to keep in mind for digital programming is that everything happens in parallel, unless explicitly serialized, which is the opposite of the usual software development most people know about.


Here's a collection of get-started resources: http://tinyurl.com/fpga-resources

You can start with the EDA Playground tutorial, practice with HDLBits, while going through a book alongside (e.g., Harris & Harris) for examples, exercises, and best practices.

Similarly to a sibling thread, I'd also go with a free and open source flow, IceStorm (for the cheaply available iCE40 FPGAs): http://www.clifford.at/icestorm/

You can follow-up from the aforementioned tutorial and continue testing the designs on an iCE40 board -- starting here: http://hackaday.com/2015/08/19/learning-verilog-on-a-25-fpga...

Here are some really great presentations about it (slides & videos) by the creator (which can also serve in part as a general introduction):

- http://www.clifford.at/papers/2015/icestorm-flow/

- http://www.clifford.at/papers/2015/yosys-icestorm-etc/

Have fun!


Honestly, when it comes to learning logic design, if you're not already a programmer you're probably better off.

A C programmer will just spend a lot of time learning why the things they already know how to do are not useful.


I actually recommend Haskell. The division between computation and I/O is strikingly similar (and equally difficult to avoid), and they're both declarative systems.



I found VHDL much easier than C or Verilog; I think it has to do with how your brain is wired.


Very interesting. I'd still like to see the JVM pick up the FPGA as a possible compile target, that way people could run apps that seamlessly used the FPGA where appropriate. I have mentioned this to Intel, who are promoting this technology (and also have a team that contributes to the JVM), but so far no one is stating publicly that they are working on such a thing.


An Intel VP mentioned it at JavaOne. He said they would provide FPGA support for OpenJDK. One central use case he mentioned would be big data & machine learning on Spark.

It was a very pleasant surprise! The JVM world usually does not have a great interface to the heterogeneous world. I think it would yield tremendous benefits. FPGA-accelerated matrix multiplication, sorting, and graph operations sound very appealing.

And then, as you mentioned, there is the possibility of JITting things: HTTP header parsing ends up on the FPGA and routes things to a message queue an actor can read. Or FPGA-based actors; does that make sense?

----

I have been unable to follow this development at all, however. Do you have any news about this project? I've been looking for a blog, a github or a mailing list, but can't find any.


Maxeler sells a compiler (and hardware) for writing FPGA apps in Java: https://www.maxeler.com


Because the model is so different there would be no benefit.


Intel already has a compression library as a proof of concept that shows a large benefit. The JVM compiler knows A) how many instructions each method is and B) how CPU-hot it is. Just as with the compression library, the compiler could identify very hot and very small methods, test them on the FPGA in parallel with normal execution, measure the performance difference, and switch to the FPGA if it was beneficial (which may be for <1% of methods). I believe the JVM already has much of the infrastructure to do such parallel method tests.
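A toy sketch of that "run both, keep the winner" idea (cpu_impl and fpga_impl are hypothetical stand-ins; a real JVM would be comparing JITted code against a synthesized accelerator, not two Python callables):

  import time

  def pick_faster(cpu_impl, fpga_impl, sample_input):
      # Time each candidate on the same sample and return the faster one.
      timings = {}
      for name, fn in (("cpu", cpu_impl), ("fpga", fpga_impl)):
          start = time.perf_counter()
          fn(sample_input)
          timings[name] = time.perf_counter() - start
      return cpu_impl if timings["cpu"] <= timings["fpga"] else fpga_impl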


That compression library offloads compression to a dedicated FPGA implementation of the compression algorithm, not a translated version of the same code that runs on the CPU.

Approaches do exist to automatically translate some types of code to run on an FPGA, though.


Seems as though you could use a tracing jit approach to offload hot loops to an FPGA very generically. Though it'd have to be a really long-running loop to outweigh the substantial overhead in synthesizing the logic gates.


You can't just find hotspots; you have to find hotspots whose data set is sufficiently disjoint between the FPGA and the CPU. With the right architecture (such as Intel's Xeon+FPGA package), you can have an incredibly fast interconnect, but it's still not the speed of the CPU's register file, so you can't hand off data with that granularity. You can get more than enough bandwidth, but the latency would crater your performance. You want to stream larger amounts of data at a time, or let the FPGA directly access the data.

For instance, AES-NI accelerates encryption on a CPU by adding an instruction to process a step of the encryption algorithm. Compression or encryption offloading to an FPGA streams a buffer (or multiple buffers) to the FPGA. Entirely different approach. (GPU offloading has similar properties; you don't offload data to a GPU word-by-word either.)

But even if you find such hotspots, that still isn't the hardest part. You then have to generate an FPGA design that can beat optimized CPU code without hand generation. That's one of the holy grails of FPGA tool designers.

Right now, the state of the art there is writing code for a generic accelerator architecture (e.g. OpenCL, not C) and generating offloaded code with reasonable efficiency (beating the CPU, though not hitting the limits of the FPGA hardware).


It's cool to know it's an area of active research. I wonder if there are also power consumption ramifications though. While e.g. AES-NI is incomparable performance-wise, my novice (perhaps incorrect) understanding is that ARM beats x86 power consumption by having a drastically simpler instruction set.

Could a simple ARM-like instruction set plus a generic "synthesize and send this loopy junk to FPGA" have power implications without a major performance impact on cloud servers? (Yeah I know this is likely a topic for hundreds of PhD theses, but is that something being investigated too?)


A Java hello world will not fit even into a 10-gigagate chip.


The funny thing is that bytecode is actually pretty dense, denser than x86. But it's everything else that makes Java images pretty huge.


Tree shaking is not difficult; the JVM just hasn't had much need for trimming its runtime OR static compilation.


This is amazing! We have been developing a tool called Rigel at Stanford (http://rigel-fpga.org) to make it much easier to develop image processing pipelines for FPGAs. We have seen some really significant speedups vs CPUs/GPUs [1].

[1] http://www.graphics.stanford.edu/papers/rigel/


Are the speedups enough to negate the much higher cost of an FPGA vs a GPU?


Given that the Amazon cloud is such a huge consumer of Intel's x86 processors, even using Amazon-tailored Xeons, it's surprising that Amazon chose Xilinx over the Intel-owned Altera.

These Xilinx 16nm Virtex FPGAs are beasts, but Altera has some compelling choices as well. Perhaps some of the hardened IP in the Xilinx parts tipped the scales, such as the H.265 encode/decode, 100G EMAC, and PCIe Gen 4?


Stratix10 (the large, Intel 14nm family) was delayed, delayed, delayed, and delayed some more. Last I heard it was supposed to be in high-prio customer hands by end of 2016, but unclear if that meant "more eng samples" or the actual, final production parts. Either way Xilinx beat them to market by approx 3-6 months AFAICT.


I'm a total FPGA n00b, so here's a dumb question: what can you do with this FPGA that you can't with a GPU?

OK, here's a concrete question: I have a vector of 64 floats. I want to multiply it with a matrix of size 64xN, where N is on the order of 1 billion. How fast can I do this multiplication, and find the top K elements of the resulting N-dimensional array?
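For reference, here's that operation in plain NumPy (a CPU baseline to pin down the question, not an FPGA answer; N is scaled down for illustration):

  import numpy as np
  N, K = 1_000_000, 10                            # scaled down from ~1 billion
  v = np.random.rand(64).astype(np.float32)
  M = np.random.rand(N, 64).astype(np.float32)    # the 64xN matrix, stored transposed as Nx64
  scores = M @ v                                  # N dot products of length 64
  top_k = np.argpartition(scores, -K)[-K:]        # indices of the K largest results (unordered)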


FPGA = Field Programmable Gate Array.

Basically, you can create a custom "CPU" for your particular workflow. Imagine the GPU didn't exist and you couldn't multiply vectors of floats in parallel on your CPU. You could use an FPGA to write something to multiply a vector of floats in parallel without developing a GPU. It would probably not be as fast as a GPU or the equivalent CPU, but it would be faster than doing it serially.

Another way to put it: you can create a GPU with a FPGA, but not vice versa.


Thanks. But what's the capacity of this particular FPGA? How much can it "do"? Surely it can't emulate a dozen Xeons; so what's the upper bound on what can be done on this FPGA?


I can't answer the GPU comparison question, but I can answer the question of what you "can" do on a FPGA. Here are some example cores for FPGAs: http://opencores.org/projects

Hopefully, by browsing that list, you can see how FPGAs aren't really directly comparable to something like a GPU.


Does this mean that ML on FPGAs will be more common? Can someone comment on the viability of this? Would there be a speedup, and if so, would it be large enough to warrant rewriting it all in VHDL/Verilog?


Yes, definitely to your first and last two questions!

It's not viable enough to result in a large-scale FPGA movement anytime soon, since industry and academia are heavily invested in GPUs. The software and libraries for GPUs, like CUDA, TensorFlow, and other open source libraries, are very mature and optimized for GPUs. There will have to be equivalent libraries in Verilog (I for one have been hoping to be a part of this movement for some time now, so I'd love it if anyone can point me to anything going on).

There are some major to minor hurdles. Although some of them might not seem like much[0], here they are:

1. Until now, deep learning/machine learning researchers have been okay with learning the software stack related to GPUs, and there are widespread tutorials on how to get started, etc. Verilog/VHDL is a whole different ball game and a very different thought process. (I will address OpenCL later.)

2. The toolchain is not open source and it's not really hackable. Although that is not that important in this case, since you're starting off writing gates from scratch, there will be problems with licensing and bugs that get fixed at a snail's pace (if ever), until there is a performant open source toolchain (if ever, but I have hope in the community). You'll have to learn to give up on customer service reps when you try to get help, unlike open source libraries, where you head to GitHub's issue page and get help quickly from the main devs.

3. Although this move will make getting into the game a lot easier, it still won't change the fact that people want control over their devices, and it will take time for people to realize they have to start buying FPGAs for their data centers and use them in production, which has to happen sometime soon. Using AWS's services won't be cost effective for long-term usage, just like GPU instances (I don't know how the spot instance situation is going to look with the FPGA instances).

This comes with its own slew of SW problems, and good luck trying to understand what's breaking what with the much slower compilation times and terribly unhelpful debugging messages.

4. OpenCL to FPGA is a mess. Only a handful of FPGAs support OpenCL, so this has led to little to no open source development around OpenCL with FPGAs in mind. And no, the OpenCL libraries for GPUs cannot be used for FPGAs as-is; more likely a from-scratch rewrite, plus a LOT of tweaking, is needed to get them to work. OpenCL to FPGA is not as seamless as one might think and is riddled with problems. This will, again, take time and energy from people familiar with FPGAs, who have been largely outside the OSS movement.

Although I might come off as pessimistic, I'm largely hopeful for the future of the FPGA space. This move isn't great news just because it lowers the barrier; it also introduces a chip that will be much more popular, so now we have a chip that libraries can focus their support on, compared to before, when each dev had a different board. So you'll have to get familiar with this one -- the Virtex UltraScale+ XCVU9P [1].

Also, what might be interesting to you: Microsoft is doing a LOT of research on this.

I think all of the articles on MS's use of FPGAs can explain it better than I can in this comment.

Some links to get you started: MS's blog post: http://blogs.microsoft.com/next/2016/10/17/the_moonshot_that...

Papers: https://www.microsoft.com/en-us/research/publication/acceler...

Media outlet links: https://www.top500.org/news/microsoft-goes-all-in-for-fpgas-... https://www.wired.com/2016/09/microsoft-bets-future-chip-rep...

I'd suggest starting with the Wired article or MS's blog post. Exciting stuff.

[0]: Remember that academia moves at a much slower pace than your average developer in adjusting to the latest and greatest software. The reason CUDA is still so popular, although it is closed source and you can only use Nvidia's GPUs, is that it got in the game first and wooed them with performance. Although OpenCL is comparably performant (with some rare cases where this isn't true), I still see CUDA regarded as the de facto language to learn in the GPGPU space.

[1]: https://www.xilinx.com/support/documentation/selection-guide...


Would love to know what that gets priced at per hour, as well as whether they plan to have smaller FPGAs available for development.


Bitcoin mining. WPA2 brute forcing.

Maybe someone will finally find the triple-des password used at adobe for password hashing.

The possibilities are endless :)


Mining is unlikely, with bitcoin at least. Bitcoin passed the FPGA stage and moved onto ASICs many years ago. There are some alt coins that are currently best mined on GPUs though and this may change that or put their claims to a real test.


The boards used for this preview do not have enough memory bandwidth to pose even a modest threat to the latest batch of memory-hard GPU PoW algos.


So now anyone can run their High Frequency Trading business on the side :-P.

So much easier than buying hardware. Deep learning sometimes works similarly: for many use cases it's easier to play with on AWS, with their hourly billing, than to buy hardware.


The latencies from AWS servers to the exchanges probably would make HFT applications unfeasible.


Not when you use Amazon's new regions, us-fin-1, that is within the exchange's datacenter. /s?


That would actually be tremendously disruptive! Superb idea.


For a moment I got super excited and thought us-fin-1 was real, then saw that trailing /s indicating sarcasm.

Maybe we'll see a High Frequency Trading For The Masses sort of situation in the future that wipes out profits for the existing guys, although it seems unlikely seeing how arbitrage opportunities are already automated by large capital holders.


Yeah, you need to be in the colo. Also, these aren't on the network card; the CPU introduces too much latency.


> Xilinx UltraScale+ VU9P fabricated using a 16 nm process.

> 64 GiB of ECC-protected memory on a 288-bit wide bus (four DDR4 channels).

> Dedicated PCIe x16 interface to the CPU.

Does anyone know whether this is likely to be a plug-in card? And can I buy one to plug into a local machine for testing?



Even if it does, this can easily sell for $10k+.


Much more. The FPGA alone will push the parts cost to ~$30K-55K+.


Yeah, the point is that you shouldn't need to buy any hardware even for development, which is the biggest win to me!


But having the hardware is vital. You have to test your design a lot. You're still going to need Vivado (which isn't cheap) and you'll need instance time to test the design on the real hardware with real workloads, along with any synthesizable test benches you want to run on the hardware.

The pricing structure of the development AMI is going to be meaningful here, because it clearly includes some kind of Vivado license. It might not be as cheap as you expect, and you need to spend a lot of time with the synthesis tool to learn. The F1 machines themselves are certainly not going to be cheap at all.

If you want to learn FPGA development, you can get a board for less than $50 USD one-time cost and a fully open source toolchain for it -- check my sibling comments in this thread. Hell, if you really want, you can get a mid-range Xilinx Artix FPGA with a Vivado Design Edition voucher, and a board supporting all the features, for like $160, which is closer to what AWS is offering, and will still probably be quite cheap as a flat cost, if you're serious about learning what the tools can offer: http://store.digilentinc.com/basys-3-artix-7-fpga-trainer-bo... -- it supports almost all of the same basic device/Vivado features as the Virtex UltraScale, so "upgrading" to the Real Deal should be fine, once you're comfortable with the tools.


I expect a Vivado license with all my development cards.

P.S. The few orders I had were SoC Zynq boards in the $400-1000 range.


For complex designs the simulator that comes with the Vivado tools (Mentor's modelsim) is not going to cut it. I wonder if they are working on deals with Mentor (or competitors Cadence and Synopsys) to license their full-featured simulators.

Even better, maybe Amazon (and others getting into this space like Intel and Microsoft) will put their weight behind an open source VHDL/Verilog simulator. A few exist but they are pretty slow and way behind the curve in language support. Heck, maybe they can drive adoption of one of the up-and-coming HDLs like Chisel, or create an even better one. A guy can dream...


Nowadays, I don't believe you need a paid-for simulator like Questa, VCS, etc. I am writing Verilog in my day job for FPGAs using Icarus Verilog (an open source simulator), which works fine for fairly large real-world designs (I am also using cocotb for testing my code) and supports quite a lot of SystemVerilog too.


As someone who has little experience with FPGAs beyond some experiments with a Spartan-6 dev board that mostly involved learning to write VHDL and building a minimal CPU, I found the simulator to be of limited use. My tiny projects were small enough that the education simulator was plenty fast. It was nice when I didn't have the board available, and occasionally, the logic analyzer was useful when I didn't understand what my code was doing to a data structure. But usually, it was just a lot easier to simply flash the board and run the thing.

What's the use of a simulator when you can spin up an AWS instance and run your program on a real FPGA?


Simulations give you better controllability and better visibility. In other words, you can poke and prod every internal piece of the design in simulation land. In real hardware, not so easy.

That being said, you are far from alone as an FPGA developer in skipping sim and going straight to hardware. Tools like Xilinx's chipscope help with the visibility problem in real hardware too.


> For complex designs the simulator that comes with the Vivado tools (Mentor's modelsim) is not going to cut it.

It's now called QuestaSim I believe. But are you sure it can't handle simulating large designs? If yes, what is the full-featured software from Mentor that can?

> Heck, maybe they can drive adoption of one of the up-and-coming HDL's like chisel

Chisel isn't a full-blown HDL from what I understand; it's only a DSL that compiles to Verilog. In other words, you'd still need a Verilog simulator to actually run your design.


Questa is the full-blown tool. ModelSim is a step down, and that's what comes with FPGA tools. Usually the version of ModelSim that Xilinx and Altera ship is crippled performance-wise.


I'd be interested in practical use cases that come to your mind (like someone who commented about genomics data processing for a university).

What could YOU use this for professionally?

(I certainly always wanted to play around with an FPGA for fun...)


Machine Learning, most likely. See this: https://news.ycombinator.com/item?id=13074021


Monte Carlo sims for options pricing? I've done this before on FPGA, might have a go at doing it for this instance as a fun exercise to test the concept!
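For anyone unfamiliar with the workload, a minimal sketch of that kind of Monte Carlo pricing in plain NumPy (a European call under standard Black-Scholes assumptions; illustrative only, not an FPGA implementation):

  import numpy as np
  S0, K, r, sigma, T, n_paths = 100.0, 105.0, 0.01, 0.2, 1.0, 1_000_000
  z = np.random.standard_normal(n_paths)
  ST = S0 * np.exp((r - 0.5 * sigma**2) * T + sigma * np.sqrt(T) * z)   # terminal prices
  price = np.exp(-r * T) * np.maximum(ST - K, 0).mean()                 # discounted mean payoff
  print(price)

The appeal of an FPGA here is, roughly, generating and reducing many such paths in parallel pipelines instead of serially on a CPU core.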


Not sure if that makes sense with the offer that Amazon has. The machines are huge, so either you're pricing a huge number of options at very high speed (which you'd probably do in-house with FPGAs that you own), or you'll be much better off cost-wise using a good machine locally. I've never found MC sims to be a time bottleneck, but YMMV I guess?


I've heard (although admittedly never seen in practice) that some places take a long time for this sort of thing (running it over a cluster of computers overnight). If you could do the same job on a single F1 instance in, say, an hour, then I think that would be compelling! Bear in mind that simple experiments I did showed an improvement of around 100x for this sort of task over a GPU.


Just wait till this gets combined with Lambda.


How would they do that? Since the FPGAs are not shared, I don't see how you could use it for very short-lived instances.


If the spin-up time is fast enough, then they could do it. Alternatively, if it's active enough, then there would be a stream of requests processed by the same, already loaded, FPGA.


Anyone have a hardware ZLIB implementation that I can drop into my Python toolchains as a direct replacement for ZLIB to compress web-server responses with no latency?

Could also use a fast JPG encoder/decoder as well.


Why stop there? Hack your kernel to deliver network packets directly to the FPGA and then implement the whole server stack in the FPGA. Why settle for response times on the order of milliseconds when you can get nanoseconds?

But seriously, I'm open to ideas for technologies that you or anyone else needs implemented for these instances. Would make an interesting side business for me.

EDIT: I should point out that I'm an experienced "full-stack" engineer when it comes to FPGAs. I've implemented the FPGA code and the software to drive them. None of this software developed by "hardware guys" garbage.


Speaking as a hardware guy, I think that's the ultimate goal as well :)

Been planning a NIC card that directly serves web apps via HDL for a while now...


Given that your EC2 Web server is limited to 20 Gbps, you're probably better off using Intel zlib and choosing the right compression level tradeoff. If you're willing to pay a fortune for 100 Gbps of zlib then the FPGA might be more appropriate.

For JPEG the GPU instances might be better.


The problem is the latency associated with software zlib, on the order of several milliseconds for a typical web response, and the CPU usage that entails, which limits web request-response throughput.
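A quick way to measure that software-zlib cost (standard library only; the payload is a made-up stand-in for a typical response body):

  import os, time, zlib
  payload = os.urandom(256) * 1024            # ~256 KB of semi-compressible data
  start = time.perf_counter()
  compressed = zlib.compress(payload, 6)      # typical web compression level
  print((time.perf_counter() - start) * 1e3, "ms per response spent in zlib")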


I'm not totally up to date on it, but the RISC-V project has a tool (Chisel) that "compiles" to verilog... Interesting times for sure!


Also check out clash-lang.org, which compiles an almost-Haskell language to VHDL, Verilog, or others.


_NOW_ things are getting really interesting!


FPGA Instances are a game changer in every way.

Let this day be known as the beginning of the end of general-compute infrastructure for internet-scale services.


Newbie question: What do verilog and VHDL compile down to, i.e. what is the assembly/machine language for FPGAs?


A logic-gate/register netlist, i.e. a digital schematic of your design. This is done by a synthesizer program. It is then mapped to the available resources of your chosen FPGA by a mapping program. Now you have the logic-equivalent schematic using the FPGA's resources. Then the netlist is placed-and-routed to fit it into the FPGA. If the design is too large/complex or the timing requirements too strict (too high a clock frequency), this phase can fail. It can also take many hours to complete, even on fast computers.


Binary FPGA configuration instructions - block RAM contents, routing switch configuration, register configuration and initial state, PLL/DCM configuration, and of course LUT contents. That's the final result of the toolchain, ready to get sent to the FPGA via JTAG or written into a configuration flash chip. It's the FPGA equivalent of machine code.

For the higher-level object file or assembly language, that would be a netlist - essentially a digital representation of a schematic. The HDL is transformed into a netlist, then the netlist is optimized and the components converted from generics to device-specific components, then the placement and routing is determined, and finally a 'bit' file is generated for actually configuring the FPGA. This process can take several hours for a large design.


logic-gate layout?


Azure will be next I guess. They're already using FPGA based systems to power Bing and their Cognitive Services.


That's just anecdotal. No one has seen it. The Wired article sounded like content marketing.


They've published papers about it -- https://www.microsoft.com/en-us/research/publication/configu... -- they're giving talks about it -- Mark Russinovich was here a few weeks ago with a very long talk. Doug Burger and Derek Chiou are leading a lot of these efforts, and they're absolutely for real.

I'm not sure I agree with them that this is the right path forward (but they're smart and know their stuff, so I'm probably wrong), but it's absolutely for real.


Wow. That's what was going through my mind reading this article, but it quickly (and sadly) dawned on me that I probably won't be able to build anything with it, as we are not solving problems that require programmable hardware. Euphoric nonetheless to see this kind of innovation coming from AWS.


Is there direct DMA access to/from the network interface bypassing the CPU?


Doesn't look like it from the article. That could be very interesting, but there could be network architecture constraints that prevent Amazon from providing that from the get-go. And it wouldn't be used in all cases, so that could burn a lot of switch ports. Seems like they're targeting more compute offload and less network appliance.


Are these custom FPGAs, or Altera or Xilinx?


Looks like they are using Xilinx UltraScale+ FPGAs.


It appears to be Xilinx.


Oh man...this is freaking awesome!


This is huge



