Designing a Programming Language for the Desert

keldaris · on June 19, 2018

How close is Futhark to being considered production-ready? How close is the generated code to hand-optimized OpenCL for real problems (I did look at the benchmarks, but only the Rodinia suite had comparison numbers, and those are trivial to beat)?

OpenCL is easy to write for anyone who knows C (far more people than fans of functional programming) and moderately difficult to write decently well. What's the value proposition of Futhark? Is it just that some people prefer writing in a functional DSL rather than C, or does it purport to enable some optimizations that are hard to write well manually?

Athas · on June 19, 2018

> How close is Futhark to being considered production-ready?

It's not 1.0 yet, and there are ways to trip up the compiler by accidentally writing irregular code (and the workarounds are nonobvious), but several people have used it productively (albeit not in production, by most standards). Addressing this is part of the research we're doing, and we are quite sure we know how to get there.

> OpenCL is easy to write for anyone who knows C (far more people than fans of functional programming) and moderately difficult to write decently well. What's the value proposition of Futhark? Is it just that some people prefer writing in a functional DSL rather than C, or does it purport to enable some optimizations that are hard to write well manually?

In principle, a skilled GPU programmer can always outperform Futhark. The value proposition of Futhark is partly that you can obtain decent code without having low-level knowledge (usually within 2x of hand-written performance), and partly that even a skilled GPU programmer can get there much faster. Most low-level GPU languages (like OpenCL and CUDA) make it very hard to write code that is both fast and modular. For example, you might write a really nice CUDA kernel that operates on vectors, but there is no obvious way to apply it to every row of a matrix. You will probably have to write a new kernel from scratch: one that is very similar to the old one, but without direct code reuse. In Futhark, you would just 'map' the old function over the matrix and let the compiler figure out how to turn it into one or more segmented operations.
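As a rough analogy in plain Python (not Futhark syntax, and with no GPU involved), the reuse described above looks like this: a function written for a single vector is applied unchanged to every row of a matrix. The function names are hypothetical; the point is only that the vector code needs no rewriting. In Futhark the same idea would be written as `map prefix_sums m`, and it is then the compiler's job to turn that nested operation into efficient segmented GPU code, which ordinary Python does not attempt.

```python
from itertools import accumulate
from typing import List


def prefix_sums(v: List[int]) -> List[int]:
    """Inclusive scan over one vector, written with no knowledge of matrices."""
    return list(accumulate(v))


def prefix_sums_rows(m: List[List[int]]) -> List[List[int]]:
    """Reuse the vector function on every row of a matrix.

    Conceptually this is a segmented scan: one independent scan per row.
    """
    return list(map(prefix_sums, m))


matrix = [[1, 2, 3], [4, 5, 6]]
print(prefix_sums_rows(matrix))  # [[1, 3, 6], [4, 9, 15]]
```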

Essentially, the Futhark compiler is not bound by the restriction that the optimised code should be maintainable by humans, so it is able to use inlining, duplication, fusion and a host of other transformations that are helpful for performance, but no sane programmer would write if they cared about the maintainability of their code.
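To illustrate one of those transformations, map fusion, here is a small Python sketch (illustrative only, not Futhark compiler output; the helper names are hypothetical). Two separate traversals that materialise an intermediate array are merged into a single traversal with no intermediate. In a purely functional language this rewrite is always safe because `f` and `g` have no side effects, and on a GPU it saves writing the intermediate array to memory and reading it back.

```python
from typing import Callable, List


def unfused(xs: List[int], f: Callable[[int], int], g: Callable[[int], int]) -> List[int]:
    # Two traversals: the intermediate list `ys` is built in full, then read back.
    ys = [f(x) for x in xs]
    return [g(y) for y in ys]


def fused(xs: List[int], f: Callable[[int], int], g: Callable[[int], int]) -> List[int]:
    # One traversal: apply f and then g to each element; no intermediate list.
    return [g(f(x)) for x in xs]


def double(x: int) -> int:
    return 2 * x


def increment(x: int) -> int:
    return x + 1


print(unfused([1, 2, 3], double, increment))  # [3, 5, 7]
print(fused([1, 2, 3], double, increment))    # [3, 5, 7]
```

A human maintaining this code would usually keep the two steps separate for readability; a compiler can fuse them without anyone having to read the result.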

In practice, Futhark usually gets beat on single primitive operations (matrix multiplication, FFT, etc), but at an application level, it tends to do well.

keldaris · on June 19, 2018

Thank you for the detailed response. I'll try to allocate a weekend to port some of my usual scientific GPGPU test cases to Futhark and see how it goes.

> Most low-level GPU languages (like OpenCL and CUDA) make it very hard to write code that is both fast and modular.

That's absolutely true, which is why a frequent solution is to achieve modularity by using simple metaprogramming to generate kernel code from higher level constraints. I understand Futhark aims to obviate at least a subset of these issues, so there's value there.

> Essentially, the Futhark compiler is not bound by the restriction that the optimised code should be maintainable by humans, so it is able to use inlining, duplication, fusion and a host of other transformations that are helpful for performance, but no sane programmer would write if they cared about the maintainability of their code.

Am I correct in assuming that you can still (easily) obtain, inspect and profile the Futhark-generated OpenCL code using the usual tools? I suppose a sensible workflow could be using Futhark to get as far as you can and then hand optimize the bottleneck kernels further.

> In practice, Futhark usually gets beat on single primitive operations (matrix multiplication, FFT, etc), but at an application level, it tends to do well.

Do you happen to know of any scientific/engineering applications (lattice Boltzmann methods, molecular dynamics, etc.) using Futhark in the wild? From a quick glance at the website it seemed like all of the applications are focused on image/video processing, any reason for that?

Athas · on June 19, 2018

> Am I correct in assuming that you can still (easily) obtain, inspect and profile the Futhark-generated OpenCL code using the usual tools? I suppose a sensible workflow could be using Futhark to get as far as you can and then hand optimize the bottleneck kernels further.

Sort of. Futhark generates pretty simple OpenCL host code, but there is not yet an obvious way to tie the generated kernels back to the original Futhark source code. As a result, it's easy enough to detect that some specific kernel is the bottleneck, and often also why, but it can be a bit of a puzzle to connect it to a part of the original program. While you can in principle edit the generated kernels yourself, it's not code that is at all nice to read or modify.

> Do you happen to know of any scientific/engineering applications (lattice Boltzmann methods, molecular dynamics, etc.) using Futhark in the wild?

No. The closest are simple things like n-body simulations[0].

[0]: https://github.com/diku-dk/futhark-benchmarks/tree/master/ac...

> From a quick glance at the website it seemed like all of the applications are focused on image/video processing, any reason for that?

Not sure. Most of these are stencils, and while Futhark does alright with stencils, it doesn't do anything particularly clever (no hexagonal tiling, for example). Maybe it's just that they are easy and satisfying to write, because you trivially get something visual at the end.
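For readers unfamiliar with the term, a stencil computes each output element from a fixed neighbourhood of input elements. Below is a minimal 1D sketch in Python: a three-point average with edges clamped, chosen purely for illustration. Image filters apply the same pattern in 2D. Because each output element depends only on the input, every iteration of the loop is independent, which is what makes stencils a natural fit for data-parallel languages.

```python
from typing import List


def blur3(xs: List[float]) -> List[float]:
    """Three-point moving average; out-of-range neighbours are clamped to the edge."""
    n = len(xs)
    out = []
    for i in range(n):
        left = xs[max(i - 1, 0)]
        right = xs[min(i + 1, n - 1)]
        out.append((left + xs[i] + right) / 3)
    return out


print(blur3([3.0, 6.0, 9.0]))  # [4.0, 6.0, 8.0]
```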

Futhark's design was originally inspired by nasty financial algorithms (the ones from the FinPar suite[1]), which tend to be a combination of Monte Carlo methods and differential equations. I'd say that is Futhark's main strength.

[1]: https://dl.acm.org/citation.cfm?id=2898354

emteycz · on June 19, 2018

Is it true that more people know OpenCL than functional programming? To me it seems like functional programming is a lot more prevalent in domains where this would be used, but I don't have any numbers. Do you by any chance have a source?

keldaris · on June 19, 2018

I didn't assert that more people know OpenCL than functional programming; I said far more people know C than functional programming. Anyone who knows C can write (bad) OpenCL after a trivial tutorial, so the initial barrier to entry is low. Writing decently optimized OpenCL is, as I said, moderately difficult and requires some domain-specific knowledge, and the difficulty scales up from there if you want to get close to optimal.

emteycz · on June 19, 2018

Sorry, I also meant C. Let me correct my question: in the verticals that would make use of this, do more people know C or functional programming?

keldaris · on June 19, 2018

Unequivocally, C. Futhark is clearly meant for reasonably high-performance computing, and I don't know of a single person in that domain who isn't a competent C programmer, even if they spend a lot of time writing code in other languages (mostly C++, but also Fortran, Julia, Python with Cython/Numba, etc.). Functional programming, on the other hand, is something many (most, I'd contend) people in this domain consciously stay away from, myself included.

tehsauce · on June 19, 2018

These DSLs for high-performance parallel computing based on purely functional algorithm descriptions are very exciting; Halide and Accelerate are similar projects. As the demand for parallel, portable code increases, I can see technologies like these becoming essential tools.

mkirklions · on June 19, 2018

What applications?

lou1306 · on June 19, 2018

Most algorithms related to computer vision and image processing come to mind. Of course there are efficient libraries written in general-purpose languages, but DSLs could lead to smaller, easily maintainable codebases.

Nzen · on June 19, 2018

tl;dr Futhark is a DSL for transpiling non-graphics parallel submodules for use in other programs. That is, if one were to write an N-queens program in Futhark, it could generate C, C#, or Python modules for use in a larger program, and those modules would leverage OpenCL to parallelize the search over board positions.

This blog post declares some cultural boundaries. Namely, the team intends to keep the DSL targeted at this domain, rather than expanding to general computation in some far future. This enables them to focus on common target language constructs and to direct their documentation efforts toward similar experts. They acknowledge that this demands more of their users (for example, all external imports must be in the same folder when compiled), but it allows for simplicity and clarity all around.

Hence, the author uses the analogy of a desert creature that survives in an environment of constrained resources.

earenndil · on June 19, 2018

Honestly, if it's not intended for general use, I don't think it's worth it to complicate the build process, and complicate the process of setting up a dev environment by installing a whole new compiler toolchain just for a couple of small functions.

pjmlp · on June 19, 2018

While I agree with your point of view regarding production code, it is this continuous effort by such language designers that eventually (with a bit of luck as well) turns a programming language from "yet another one" into something relevant.

awb · on June 19, 2018

> Thus, even were Futhark to become the indisputably best language in its area - which is certainly the goal! - the user base would still be small, and so would the development resources.

Exactly. If I'm selecting a language for any critical components, it's going to be one with a strong community, not something obscure that might be a little less verbose or a little more performant.

Athas · on June 19, 2018

What if it's a lot more performant? It's of course always a risk to incorporate another language, but sometimes performance is a critical feature, yet you have no time or inclination to write (say) low-level GPU code by hand. Futhark is more of an alternative to CUDA or OpenCL than to high-level languages.

Also, there is the important property that Futhark programs tend to be fairly small, and are semantically close to ordinary functional programming. Thus, even in a worst case situation where the Futhark compiler is abandoned, it is a reasonable task to simply rewrite all the code in another high-level language.

earenndil · on June 19, 2018

> sometimes performance is a critical feature, yet you have no time or inclination to write (say) low-level GPU code by hand

I also have no time or inclination to integrate an entirely new tool into my build chain, complicate and lengthen the build process, and increase the cognitive load required to begin working on the project.

Athas · on June 19, 2018

Sure, that's a fine judgment for most applications. That's exactly why high-performance languages (of any kind) necessarily must remain somewhat niche.

Although I will point out that the point of this post is exactly to try to minimise the build system complication and cognitive load as much as possible, so Futhark will be useful to just slightly more projects.

RasmusWL · on June 19, 2018

That's totally fine.

Futhark is an offer for people who need high-performance code but have better things to spend their time on than writing CUDA/OpenCL code by hand.

Clearly that's not everyone. Maybe you don't need high performance code. Maybe you need to squeeze every last bit of performance out of your GPUs, so you actually need to spend time writing your own CUDA/OpenCL code.

So while it might not be worth it for you, it might be worth it for people trying to solve other kinds of problems :)

crummy · on June 19, 2018

That's probably why they focussed on making the compiler as simple and straightforward to use as possible, so it does not significantly increase the complexity of your build process.

orbifold · on June 19, 2018

You would still need to wrap it in some form to make it usable from most build systems other than make. For build systems such as Bazel, this is quite a lot of work for little immediate benefit.

taneq · on June 19, 2018

> Fairly sure you don’t have to watch out for sandworms in a real desert.

Doesn't that map rather well to being acquired by $bigcorp?

erikpukinskis · on June 19, 2018

Transcendent read. Bang on.

dom96 · on June 19, 2018

So futhark is designed to be small and specific to a single class of problems. Has the author considered implementing it in another language as a DSL?

Athas · on June 19, 2018

Yes, we considered that. There were two reasons we wrote a standalone language instead:

* It gives more flexibility and simplicity, regarding both syntax and type system. The compiler can also more easily analyse the AST, because there are no artifacts of an embedding.

* An embedded DSL is practically available only to users of the host language. In contrast, Futhark is equally accessible to many languages, either via direct code generation (such as for Python), or using a fairly simple C FFI.

The best example of an embedded high performance DSL is, I believe, Accelerate[0] for Haskell. It is really good, but is ultimately only accessible to Haskell programmers, and the implementation has to solve non-trivial problems related to the embedding.

[0]: http://hackage.haskell.org/package/accelerate