If only it were that easy: change programming languages, change programming models, and poof! Magical parallelism.
But parallelism is harder than that. It's an algorithm problem, a design problem, not a language or code problem. While OpenCL might be harder to write than plain C, for anything except the most embarrassingly parallel problems, that difficulty pales in comparison to making the solution parallel to begin with.
For every problem where you can simply split the work into masses of shared-nothing tasks, there are a dozen others where you can't. Rate-constrained video compression. Minimax AI search. Emulation. All of these can be parallelized, but it requires parallelizing the algorithm and making sacrifices that are light-years beyond what a dumb compiler that isn't even allowed to change program output (let alone rewrite half the program) could do.
Modifying an algorithm -- possibly even changing its structure entirely, changing its output, or even accepting nondeterminism -- is inherent complexity, to use the terminology of Mythical Man Month. No programming language or tool can eliminate this complexity. Good tools can make it easier to implement an algorithm efficiently, but they really can't take a complicated application and figure out how to change its design, structure, and behavior. Until we have AI-complete compilers that take program specs as "code" to be compiled, that's a human job.
One significant issue is that most programming languages are inherently temporally inexpressive, having either full order (imperative/impure functional) or no order (purely functional). Ideally the language would make it easy to organize statements in such a way that a programmer could indicate a partial ordering relation without having to use function composition. Additionally, a big problem with function composition in purely functional programming is that multiple long function compositions can only be ordered as monolithic units; however, in many cases being able to order at the individual function level is desirable. If a language had an explicit notion of relative temporal index for statements (which could be adjusted) it would be a nice win for writing concurrent code. That would also let compiler writers lift a lot of the stuff they do up to the program source level as macros (which would be a HUGE win).
A futures library solves the case where, in an imperative programming language, the programmer writes a block to compute A, a block to compute B, then combines A and B. Pure FP works fine in that case, since writing (+ (compute-A) (compute-B)) does the same ordering.
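For concreteness, the futures version of that might look roughly like this in C++11 (a minimal sketch; compute_A and compute_B are just stand-ins for expensive computations):

#include <future>
#include <iostream>

// Hypothetical stand-ins for two independent, expensive computations.
int compute_A() { return 40; }
int compute_B() { return 2; }

int main() {
    // Launch both; the futures library is free to run them in parallel.
    auto a = std::async(std::launch::async, compute_A);
    auto b = std::async(std::launch::async, compute_B);

    // The only ordering constraint is here: combining waits on both.
    std::cout << a.get() + b.get() << "\n";
    return 0;
}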
It seems to me pure FP is fine unless the computations for A and B have interdependencies, which would result in duplicate computation in pure FP unless you can refactor to a different algorithm. I'm not sure I understand what you mean by wanting to control ordering at the "individual function level", unless you mean interdependencies between computations done in "monolithic units".
A->(loop B until done)->C
A->(loop D until done)->E
C-\
+-> F
E-/
Something like that. Am I on the right track?
No, because I can still express that in pure FP:
(let [(A (compute-A))]
  (let [(C (compute-C A))
        (E (compute-E A))]
    (compute-F C E)))
Trying again ...
A->(loop B until (valid B D))->C
A->(loop D until (valid D B))->E
C-\
+-> F
E-/
Is that better? Care to help me understand what you're getting at?
I was thinking more along the lines of working around data-locality issues. For instance, imagine that you have code that requires very high latency fetches, or you are working with a data set that doesn't fit in memory. Typically, you have to develop modified algorithms for these scenarios, but there is no reason that a minimal fiber scheduler couldn't adapt seamlessly. Even better, if your conditions change (for instance, mobile devices moving from low-throughput to high-throughput links) a scheduler can adapt, but the hand-rolled algorithm must be re-coded.
> No programming language or tool can eliminate this complexity.
But some languages can help get you there. Programming languages are tools and these tools come with idioms and paradigms of their own.
Certain paradigms help, often by constraining the developer to think and write in a way that makes the final result more parallelizable.
For example functional languages with immutable data structures help guide the implementation towards that. Actor models also do that.
Take a program in Erlang that someone wrote 5 years ago. They split it into 10 concurrent, CPU-bound actors. Years ago it ran on single-CPU hardware. So the code was robust and it benefited from process isolation, but didn't actually execute in parallel. It might have even been slower than the equivalent C++ or Java code.
Now all of a sudden there is a performance requirement. The site goes "viral" as they say. Now there are millions of requests per day, not hundreds. If the program was written idiomatically, all they have to do is get a big hunkin' box with 16 CPUs and lots of RAM. And all of a sudden those actors are working in parallel. No change to the code needed.
Next month the site goes "pandemic" so now they have to think about scaling beyond a single machine. No problem, they just move some of the processes (actors) to another node running on another machine. The way to send a message to an actor is still the same Pid ! Msg, just like it was in 2006 when running on a single core CPU. Still minimal code changes needed.
This is an example where languages and toolsets help guide developers in a certain way.
You brought up the example of algorithms. The thing is, I believe, not that many programmers these days design algorithms, unless they are involved in research or are in graduate school. Most programmers engineer systems by stitching together components and APIs. I think with the right frame of mind and after some practice, it is not that hard to see how to split a lot of these components into concurrent pieces that can then be parallelized if the correct language and toolsets are used. It is often not the inherent problem but the way the mind of the programmer has been molded by years of applying another paradigm (e.g. imperative, OO, mutable state).
You seem to be of the opinion that designing a parallel architecture or algorithm is really the difficult bit, and that what we need is the magical paralleliferizing compiler which, as you say, doesn't exist.
Designing a parallel approach to a problem, where such a solution is possible, is always going to be easier than trying to implement it correctly using threads and locks and mutable state.
FP can make some of the basically-impossible possible, with e.g. STM to avoid thread deadlocks and starvation, or even better, by supporting deterministic parallelism so you don't have to mess with threads and concurrency at all.
Designing a parallel approach to a problem, where such a solution is possible, is always going to be easier than trying to implement it correctly using threads and locks and mutable state.
I haven't found this to be true in my (limited) experience. In writing sliced-threading support for x264, threading issues took up only a small percentage of my time. Admittedly, this is a rather simple application: the frame is split up into independent threads which never wait on each other, communicate asynchronously without any attempt at determinism, and all finish before the program can continue.
My experience is limited, but a generalization like "always going to be easier" seems rather at odds with reality. I have never found dealing with mutexes to be difficult in any situation. What I have found to be difficult is trying to design lock-free code, using memory barriers, and other trickiness to avoid slow locks in situations where the overhead of locking is just intolerable. This is definitely an application where things like pre-made lockless data structures and the like come in handy.
I've been doing some real-time audio programming lately, which requires the same techniques. You can't take a lock or even malloc in this kind of code, so you're forced to implement things like lock-free ring buffers to communicate with the rest of the world. In my experience this is 10x harder than conventional lock-based parallelism and, unfortunately, I don't see any FP lang coming to the rescue in this domain any time soon.
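For readers who haven't run into this: a single-producer/single-consumer ring buffer is about the simplest lock-free structure there is, and even it is easy to get the memory ordering wrong. A minimal C++11 sketch (the fixed capacity, types, and names are my own arbitrary choices):

#include <atomic>
#include <cstddef>

// Single-producer / single-consumer ring buffer: no locks and no allocation
// after construction, so it is usable from a real-time audio callback.
template <typename T, size_t N>
class SpscRing {
    T buf_[N];
    std::atomic<size_t> head_{0};  // written only by the producer
    std::atomic<size_t> tail_{0};  // written only by the consumer
public:
    bool push(const T& v) {                      // producer thread only
        size_t h = head_.load(std::memory_order_relaxed);
        size_t next = (h + 1) % N;
        if (next == tail_.load(std::memory_order_acquire))
            return false;                        // full
        buf_[h] = v;
        head_.store(next, std::memory_order_release);
        return true;
    }
    bool pop(T& out) {                           // consumer thread only
        size_t t = tail_.load(std::memory_order_relaxed);
        if (t == head_.load(std::memory_order_acquire))
            return false;                        // empty
        out = buf_[t];
        tail_.store((t + 1) % N, std::memory_order_release);
        return true;
    }
};

Note the release/acquire pairing: the producer publishes data before advancing head_, the consumer reads it only after observing the new head_. Get that wrong and it works on your machine and fails on someone else's.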
> My experience is limited, but a generalization like "always going to be easier" seems rather at odds with reality.
You're probably better at reasoning about nondeterminism than I am. In my also limited experience, even trivial parallel algorithms are tricky to get right. So it seems to follow that an algorithm that took more thought (say, something using speculative parallelism) would likewise be even trickier to implement correctly.
What you say is true (http://en.wikipedia.org/wiki/Amdahls_law), but it's not really the point. Most programming is not that hard, it's just moving data from one place to another, operating on that data, and then moving it somewhere else. Using functional languages trivially allows you to parallelize these types of operations, and forces you to think about problems in a way that will allow you to parallelize other things more easily.
So you're right, good design is a human's job, but when you're just doing the grunt work that most of software development is, having tools that enforce The Right Thing is very helpful.
This isn't about Amdahl's law; this is about the fact that many problems are simply hard. Let's look at some examples.
One common example is the need to split a larger task into smaller subcomponents, but where all of the components combined need to obey some global constraints. A specific example: we're compressing a video frame, the result must fit within a certain size, and we can't parallelize over multiple frames for latency reasons. This means we need to split the frame into chunks, but somehow all the chunks have to communicate with each other in real-time, as they work, in order to ensure they obey the global constraint. Do you have a "master" thread that manages them all and makes decisions? Do you use some sort of algorithm where they act as separate agents, asking each other questions? Suddenly this is a lot more complicated than what you started with.
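To make the coordination concrete, here is a deliberately crude sketch of the "separate agents sharing one constraint" shape: worker threads each encode a chunk and draw from one shared bit budget. It is nowhere near a real rate-control scheme (the names are made up, and the hard part in a real encoder is deciding how to spend the bits, not subtracting them); it only shows where the cross-chunk communication lives.

#include <atomic>
#include <iostream>
#include <thread>
#include <vector>

// Hypothetical per-chunk encoder: pretend it spends some bits and reports how many.
int encode_chunk(int /*chunk_id*/, int bits_allowed) { return bits_allowed / 2; }

int encode_frame(int num_chunks, int frame_budget) {
    std::atomic<int> bits_left{frame_budget};   // the shared global constraint
    std::vector<std::thread> workers;
    for (int c = 0; c < num_chunks; ++c) {
        workers.emplace_back([&, c] {
            // Each chunk asks for a share of whatever currently remains,
            // so every chunk's spending is visible to every other chunk.
            int ask = bits_left.load() / num_chunks;
            bits_left.fetch_sub(encode_chunk(c, ask));
        });
    }
    for (auto& w : workers) w.join();
    return bits_left.load();                    // unspent bits
}

int main() {
    std::cout << "bits left: " << encode_frame(8, 100000) << "\n";
}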
Another example is a search algorithm. Whether you're performing a minimax search of a game tree or simplex optimization, you're implementing algorithms that are normally not parallel. Parallelizing minimax is actually incredibly difficult and requires making a lot of hard decisions about where you branch threads, where a thread gives up, how to avoid duplication, and so forth. Fancy programming tools can give you features like thread-safe hash tables that help you, but they don't solve the actual problem. See any multithreaded chess engine for an example of this problem. Note particularly that the engines don't get perfect speedup from multithreading -- but it's NOT because of Amdahl's Law! It's because the searches between threads unavoidably duplicate at least some work.
"Moving data, operating on it" would be grossly oversimplifying real-world, complex programs like these, and there's nothing a functional language would do to "trivially parallelize" them. Dependencies in calculations are often so tangling that you cannot naively parallelize them without making dramatic, possibly sacrificial, changes.
Tools like FP can be useful, but they don't solve problems of inherent complexity. There is no silver bullet.
I have a little hunch that any parallelism "answer" is going to be derived primarily from ever-more-grand exploitation of its strengths, not patching of its weaknesses.
We know that in a formal reasoning sense, an asynchronous system is less powerful than a synchronous one, because you can implement any async design in a sync system, but not the other way around. Yet in practical use, asynchronous, "hard" decoupled systems are the ones that scale better and are easier to maintain. So we keep telling ourselves, "decouple everything" and have invented a gross array of patterns and techniques that add "decoupling pixie dust."
We know this, but usually don't extrapolate it to its natural conclusions: we're using brute-force engineering to attack our needs from the wrong direction - trying to break arbitrary sequential computations into parallel ones via sufficient smartness, instead of folding parallel ones back into sequential ordering where it produces wins, and breaking out the "sequential toolbox" only if the problem really taxes our abilities of algorithm design, as in the cases you have outlined.
Of course, adhering to parallel designs from the beginning is hardly a silver bullet either, but there are real wins to be had by exploring the possibility, and a good number of them can be experienced today simply by working that style into "ordinary" imperative or functional code with abstractions like actors and message-passing.
Correct me if I'm wrong, but in a formal sense you get the synchronous system inside the asynchronous one trivially. So the two are of equivalent power.
To give a concrete example with threading systems: you can implement a multi-threaded system with your own green threads library, which is async on top of sync. You can get sync on top of async with native threads by simply using one thread. Or, if you want to be complicated, by using multiple threads with locks that serialize to a single synchronous ordering.
> We know that in a formal reasoning sense, an asynchronous system is less powerful than a synchronous one, because you can implement any async design in a sync system, but not the other way around.
As someone else pointed out as well, I just don't see it. If components of an asynchronous system can wait on an event, you can make it synchronous by driving it with deterministic synchronous events. Imagine a bunch of independent actors that always wait for a clock signal, then process one piece of data, then wait for the next, and so on.
So I actually see a synchronous system as a restricted case of an asynchronous system.
If you think about it, the world is inherently asynchronous. If you have 2 agents in the real world, they process and do things asynchronously; there is no global event or clock system that drives everything.
When you start talking about latency, you are inherently talking about Amdahl's Law, because the question that gets begged is, "can we make an amount of work X complete in less time with more processing power Y"? And in your problem domain of video coding, especially if it is for real-time face-to-face communication or telematics, latency matters. So Amdahl's Law matters because it dictates the bound on latency.
Doing more work X (making the pie bigger) in about the same time with more processing power Y is Gustafson's Law (http://en.wikipedia.org/wiki/Gustafson%27s_Law). Latency, like the speed of light, is one of those universal physical properties we won't be able to tech our way out of. Latency is inherently a serial problem. What changes when we could try every solution at the same time?
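For reference, with s the serial fraction of the work and n processors:

Amdahl (fixed problem size):      S(n) = 1 / (s + (1 - s)/n)
Gustafson (problem scaled with n): S(n) = n - s(n - 1)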
Nearly all hard problems we are solving right now don't need to have an exact solution. They can be time bounded or quality bounded or both. What we lack is clarity in how to create algorithms that are probabilistic and progressive: algorithms that refine and converge towards a solution over time. Better languages can help with that.
Here are some examples of probabilistic parallel minimax algorithms.
I don't think value of FP is that it solves the complexity of parallelism. Rather it is that once one has solved the problem (designed the algorithm), FP can help assure the solution is implemented correctly. This in and of itself is a large advantage.
I use functional programming because I lack the mental ability to hold so much state in my head. A friend of mine is a badass at programming in C++ using threads. I don't and will not have the ability he does. So I use FP and my programs are correct and by being functional the options to speed up my code are also easily proved correct.
Being able to handle large amounts of complexity and being ok with it is not necessarily a good thing. If I could reason like a compiler I would probably just use a flat memory model and address each byte individually.
Mutable state won't eat your children if you know how to handle it in a disciplined manner. Pure functions and immutable datatypes are definitely preferable in a variety of cases. However, just because you discover one day that there are better tools for driving screws than a hammer doesn't mean you have to start banging your nails in with a screwdriver and wearing a saffron robe and preaching to your friends that you have achieved liberation and that they won't be free until they also throw away their hammers.
The softwareverse (and the universe, for that matter) is still full of patterns and constructs that are inherently stateful, and need to be constructed such that they behave as if they were stateful. Heck, some constructs simply can't be made to work efficiently in a purely functional manner. Hash tables come to mind. Fortunately we already have a tool that's proven to be excellent for efficiently and effectively modeling state. No, it's not monads. It's state. And it does the job so well that even Haskell was forced to resort to it for its hash table implementation.
(Though admittedly not until after much sound and fury to the effect of, "Nobody needs hash tables; trees are Pure and just as good!" I'll say this for silver bullets - people sure can cling to them.)
Haskell is more about inverting the responsibility. Just code up whatever in Haskell and it might be slow, but there are a whole bunch of errors that code just can't have.
When you need the big unsafePerformIO hammer, it's there for you.
Anybody can write fantastic threaded code in any language if they think real hard about it. Haskell just makes you say, "Yes indeed I did think really hard about it. I'm not just checking this in on Friday afternoon so the boss doesn't bug me."
I'm not real familiar with Erlang, but IIRC it has very similar "appeal to a higher power" semantics.
This may or may not be useful, depending on the kind of work you do. I, for one, am often undisciplined but occasionally very clever. Given that, the Haskell model is very enjoyable; I've never used it to make money though. So there's that.
I don't think anyone says that mutable state should be completely eliminated; the argument is that it should (1) not be the default, and (2) be made explicit through the type system.
While you're right, I think that's kind of obscured by the language that a lot of functional evangelists use. I myself prefer FP (Lisp first, now Haskell), but most people who are coming from imperative programming languages tend to see functional languages as uselessly pure. As a matter of fact, in LYAH, the author states that Haskell is totally 100% pure. Which of course is true in a sense, but false in another, at least in that we still model and use mutable state. (It's just marked by the type system, as you mentioned, and it's done in a functional way and usually heavily sugared.) In Haskell, I'm disregarding the IO monad.
The argument is that while a good programmer can manage and perhaps reason about mutable state in a single threaded program, this becomes impossible when concurrency is added to the mix.
I'd argue the real problem here is that too many people are taught to rely on shared mutable state in concurrency. Mutable state and concurrency mix very easily as long as you stick with something like message passing for your cross-thread communication. Erlang stands out as a stellar example of how successful that approach can be, but it can just as easily be done in C. Event-driven and client/server programming, as close cousins of the Erlang model, are concurrent even when they aren't parallel, and have given us a couple decades' worth of proof that good programmers don't really have a problem with it, and even mediocre programmers can manage it. The point where everything really gets hairy is when the word 'lock' gets introduced into the mix.
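As a sketch of what that looks like outside Erlang (C++ here rather than C, just for brevity; the queue and message type are my own stand-ins): all the mutable state lives in exactly one thread, and other threads only hand it messages.

#include <condition_variable>
#include <iostream>
#include <mutex>
#include <queue>
#include <thread>

// A tiny blocking message queue: the only thing shared between threads.
template <typename T>
class MsgQueue {
    std::queue<T> q_;
    std::mutex m_;
    std::condition_variable cv_;
public:
    void send(T msg) {
        { std::lock_guard<std::mutex> lk(m_); q_.push(std::move(msg)); }
        cv_.notify_one();
    }
    T receive() {
        std::unique_lock<std::mutex> lk(m_);
        cv_.wait(lk, [&] { return !q_.empty(); });
        T msg = std::move(q_.front());
        q_.pop();
        return msg;
    }
};

int main() {
    MsgQueue<int> inbox;
    // The "actor": the mutable state (total) is owned by this one thread.
    std::thread worker([&] {
        long total = 0;
        for (int msg; (msg = inbox.receive()) != -1; ) total += msg;
        std::cout << "total = " << total << "\n";
    });
    for (int i = 1; i <= 100; ++i) inbox.send(i);
    inbox.send(-1);   // poison pill: tell the worker to stop
    worker.join();
}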
If we take, for the sake of argument, that the primary motivation for writing parallel code is some version of scalability or performance, then I don't think we should be so hasty to jettison mutability. After all, concurrency is only a means to an end. The real goal is performance, and it's generally much easier, if not more possible, to write performant code if you allow yourself some mutable state, however judiciously it is used. The hash table example from my earlier post is an elephant in the room here: Hash tables can give near constant-time performance. The closest anyone has ever gotten with a purely functional equivalent is O(log n) plus a healthy dose of cache-unfriendliness to boot. If that must be the cost of pervasive parallelism, perhaps we should be reconsidering how pervasive parallelism really needs to be.
The problem with message-passing approaches is that you have to keep multiple copies of your data structures in sync, and this can be just as tricky as handling mutable data. The kind of functional data structures you find in languages like Clojure or Haskell can simplify these issues a lot.
While I agree with some things the author says, his conclusions are bullshit.
Very few processes want to be parallel. People who primarily do parallel programming think everything needs to be parallel, so they assume everyone must be going through the pain they are going through. This is false. Most people write sequential software.
Another issue is that purely functional programming is fundamentally flawed, because it attempts to eschew state. Many functional programmers will call this a virtue, but the world is stateful, and being able to reason about necessary state is how you do anything useful. Purely functional programming is great when you want a convenient test-bed for ideas, but Haskell as a practical programming language is a horrible idea. Ultimately, what we want is to eliminate unnecessary state, while reasoning about necessary state. Modern programming language researchers are doing this under the guise of "effects systems", however most of the research I've seen so far has left me underwhelmed.
Eventually functional programming researchers are going to end up somewhere that is quite far from where they started, but just a few small steps from logic programming.
We actually don't know that the world is stateful. Quantum physics would tend to suggest that the world is a wave function that collapses to a discrete value when evaluated. If you think about it, the current 'state' of the world should entirely depend on how the previous functions collapsed to produce the current 'state'.
Perhaps like in physics there is a sort of Church-Turing duality to it all that is hard to tease apart.
Reality is stateful on a macro scale. I find the idea that the universe is actually stateful, and that we do not completely understand behavior at the quantum level, far more parsimonious than the converse.
The double slit experiment confirms the wavelike nature of the universe at macro scale. A completely stateful universe where a photon discretely moves through spacetime could not exhibit self-interference; only if the photon takes all paths (suggesting 'functions' that describe its movement) could it then collapse into a discrete position after being observed (aka evaluated).
What I'm getting at is that in between emission and observation the photon is not in any one state; at best its state could be described as a superposition.
To take the analogy one step too far, perhaps worst of all this suggests a lazily evaluated haskellian universe with a giant state monad whose NUMA architecture updates at the speed of light.
I hesitate to admit photons into the macro realm, considering they are elementary particles and essentially massless.
Given the de Broglie hypothesis, I must of course admit that macroscopic objects do have wavelengths, and thus macroscopic uncertainty (your statelessness) is a nonzero value. For all intents and purposes though, the denominator of the equation causes the wavelength to go very close to 0, as the mass value dwarfs the Planck constant and velocity is always non-zero due to Brownian motion.
Wave function collapse is interesting and I must admit when it is not induced by momentum, I don't understand it very well.
Maybe, but IMO it's often that the abstractions used to build software suck at it. Consider for example UI programming: the UI thread and event loop are hacks done because of a lack of tools, performance issues with the current ones, and legacy environment/design. There is no reason why every event cannot be executed in parallel, manipulated as push collections, and have event handlers that perform application-state transitions as transactions.
>and being able to reason about necessary state is how you do anything useful
I agree, and two parts need to be stressed in this: being able to reason, and necessary state. Immutability/values make the reasoning part transparent, and necessary state is isolated in separate abstractions that again simplify reasoning about them.
I don't know how this applies to Haskell, but from playing with Clojure I can definitely see the advantages of working with values. Even with single-threaded programming, reasoning is much simpler, and state is isolated into Var/Ref/Atom/Agent primitives that have clear usage patterns and clean semantics (no gotchas with the compiler reordering some instructions for lockless programming, no worry about lock acquisition order, etc.)
In my opinion, there isn't really a palpable benefit to be gained from totally re-engineering GUI event handling. Processes that are so CPU intensive they cause the GUI to lag are uncommon (and easily dealt with when they do occur). Even if a re-engineering of the GUI event architecture did occur, that would be a systems level issue, and most developers could remain oblivious.
I like Clojure, I think Rich did a great job with the data structures. My only gripe is that it would be a lot more elegant if the notion of transients went away completely. I think transactions with automatic commit and branching on reference creation would be much nicer (and faster). Still a very well engineered language by and large.
>there isn't really a palpable benefit to be gained from totally re-engineering GUI event handling
Sure, but you do agree that the event loop/UI thread lock is a cumbersome hack due to the system constraints. If the system was based on Clojure/functional concepts (with lightweight threads) it would be natural to implement it that way. And it would hardly be transparent; you deal with event loops and the UI thread lock explicitly.
My experience in user interface coding (though not really my favorite area) has been that event handling is pretty much code and ignore, until you want to do some big nasty thing, at which point you have to implement a callable/runnable interface (in java), queue things up and do some synchronization. This seems to be less of an issue in the browser, though I haven't looked at the implementation details.
I do agree that having some kind of multi-threaded greenlet/fiber dispatcher would be more efficient and handle corner cases better. I just don't think your average developer would notice a huge difference.
Do you have any proof of this? Because all you need to do to turn that sequential software into parallel software is run it in a web server. In fact, with emphasis on UI responsiveness on the frontend and scalability on the backend, I would be willing to bet the majority of programmers are writing parallel software. But I'm willing to let you prove me wrong with data.
If you're talking about single process cgi/wsgi/node-style-eventloops, the composition of sequential software is still effectively sequential if the data is disjoint. If you have non-disjoint data and a single process environment, then you do have to consider shared mutable information at some point in your architecture. This is mitigated by the fact that shared mutable information in the web domain is typically handled by a database or information store of some kind designed for concurrent use. I am going to make the somewhat bold claim that if you store shared, mutable data in the web application process you are doing it wrong (and for other reasons besides just this one).
I'm sorry, but I simply have no idea what you're talking about. I'm sure it's just me being an idiot, but I'm lost whenever we talk about "composition of sequential software with disjoint data". Could you explain this to me like I'm an 8-year-old?
Sorry, I'm a math person, I use math terms out of habit :)
Your initial statement was that if you take a single threaded program and run multiple instances of it in a single process (composition of sequential software), concurrency issues appear even for programmers working on sequential processes. I wanted to clarify that this is only the case when variables that are modified as part of execution are shared between them. If no variables modified as part of program execution are shared, this is equivalent to saying the data of the programs are disjoint.
To get down to web-level nitty gritty: If you modify global or module level variables, you will run into trouble. If you keep variables that will be modified in a scope specific to the request you are currently handling, and persist objects using an information store designed for concurrency, you will be fine. Avoiding mutable global variables and using specialized data stores have been best practices in coding for a long time.
The article starts by mentioning data ownership, but then diverges after saying that didn't take hold in D and discusses just FP. Yes, FP works well for concurrency. But if you're complaining about data races, letting a thread own a piece of data and only be able to share it via channels solves the concurrency problem too. Thus imperative languages can live on if they adopt memory models in which data is owned by a single thread and cannot be shared. I notice Rust has a mechanism for sharing this kind of data by transferring ownership. That looks pretty cool to me.
Also, there are still plenty of applications where high level scripting languages like Python and Ruby work great for application logic, and where single threaded event-based systems work for handling user events. It's the data processing part that needs to be put in a safe concurrency sandbox.
Can we form a club or something? I keep saying this, but it's still a relatively unpopular idea.
Kilim (lightweight actor-like concurrency for the JVM) implements what it describes as "linear ownership transfer", to where objects are owned by a single thread at a time, eliminating any potential data races.
I am a big believer in FP (reasoning about functional code is much sounder even in the sequential case) but not in parallel programming. Most algorithms just do not parallelize that well, regardless of the programming paradigm used.
To quote Donald Knuth:
'[...] So why should I be so happy about the future that hardware vendors promise? They think a magic bullet will come along to make multicores speed up my kind of work; I think it’s a pipe dream. (No—that’s the wrong metaphor! "Pipelines" actually work for me, but threads don’t. Maybe the word I want is "bubble.")'
'I won’t be surprised at all if the whole multithreading idea turns out to be a flop, worse than the "Itanium" approach that was supposed to be so terrific—until it turned out that the wished-for compilers were basically impossible to write.'
If you look at classes of problems, sure, only a few of them are easily parallelizable.
But if you look at what many programmers are actually doing on a day-to-day basis, it turns out that a huge chunk of it is really basic, straightforward SIMD. Hence the popularity of for-each loops in modern programming languages. That kind of stuff is often quite trivially parallelizable. Microsoft demonstrated that nicely with the introduction of, e.g., Parallel.ForEach() in .NET 4.
There's also something to be said for tasks which are naturally. . . let's say concurrent. ETL comes to mind - if you do ETL using purely sequential code, you're going to end up with software that's immensely slow compared to code that has enough basic time management skills to realize it could probably be doing something more productive than twiddling its thumbs while waiting on the disk controller.
That's true, but for those tasks, it's not clear you need to abandon imperative languages at all. In cases where you're just doing the same operation on all elements in an array, with no data dependency, you can do that in C without fiddling with pthreads by using OpenMP's "parallel for" construct.
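For reference, the construct in question is about as small as parallelism gets; a minimal sketch (compiles as C or C++ with -fopenmp, and is still valid sequential code if the pragma is ignored):

#include <stdio.h>

#define N 1000000
static double a[N], b[N];

int main(void) {
    for (int i = 0; i < N; ++i) b[i] = i;

    /* Same loop body as the sequential version; the pragma just splits
       the iterations across threads, since they have no dependencies. */
    #pragma omp parallel for
    for (int i = 0; i < N; ++i)
        a[i] = 2.0 * b[i];

    printf("%f\n", a[N - 1]);
    return 0;
}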
FP promises to make it less error-prone and practical to do more complex parallelization than these trivial kinds of parallelization, but the skeptical take (which I partially share) wonders whether that is a big win. In other words, are the trivial parallelizations just the 10% tip of the iceberg, and there are 90% wins left from more complex parallel programming (in which case perhaps FP is what lets us unlock that future), or is trivially parallelizable stuff more like 90% of the parallelization win?
With the question posed that way, I think I fall on yet another skeptics' side, regardless of whether it's the majority of the win or not. Suppose, for the sake of argument, that the non-trivial parallelization stuff is complicated enough that it's really not feasible for folks in general to be tackling it outside the safety of what's effectively being billed as a linguistic walled garden. If that's the case, then the previous sentence ended about 13 words too late. Difficult, error-prone tasks like that should really be handled by a library or run-time.
This is a cute idea, but your computer is almost never CPU pegged. It spends most of its time waiting for input from you. For 95% of software, faster execution (through parallelism or whatever) would have no effect.
But don't take my word for it. Look at all the applications you have open right now and check if a magical 1e100Hz processor would make any meaningful difference. For me: iTunes: no. Vim: no. Chrome with hackernews: no. VLC: no. Terminal: no.
"We need to rethink software design to make our programs run faster" – Hardly. We're actually seeing the opposite trend. People are caring less about CPU performance now than they used to. One of the effects is that real software is being written in slow languages like Ruby and Coffeescript, performance be damned.
There are obviously exceptions (games and scientific simulations). But for the other 95% of code out there, we simply don't need the parallelism of functional languages to write good software.
I think it's stretching it to say that 95% of programs won't peg your CPU. Ok, maybe 95% of all programs written for an actual desktop/laptop that aren't games are. But we live in a more complex world where mobile devices are slow, but have multiple cores. And what about games? Game developers are always trying to squeeze that last little bit of performance out.
And we haven't even gotten into server-side software, nor have we discussed concurrency due to I/O-boundedness.
On mobile devices, you can already be a bit sloppy with your code and your UI will still be responsive. The fact that Corona and Phonegap are usable is proof of that. The 'comfort margin' of mobile CPUs will only increase over the next few years.
You certainly don't need functional languages to write efficient server-side code for I/O-bound applications. People have solved the C10k problem for every imperative language I know.
Ryah (the nodejs guy) makes a pretty strong case that functional programming's parallelism doesn't help you on the server side anyway, because you'll need to scale across multiple machines. Multiple machines mean multiple processes. If you're doing that anyway, you may as well avoid threads entirely and parallelize by spawning a few processes on each server.
Games are the big exception, though I find it curious that I've never heard of any published game (big or small) that was written in a purely functional language. If functional programming really does make concurrent code easier to write, I'd expect at least a couple indie computer game developers to be using them, somewhere!
Anyway, despite how visible game development is, it makes up a small percentage of the computer industry. That we will experience 'The Downfall of Imperative Programming' because some day (I hope) game developers will use functional languages to make their games run faster is a bit of a stretch.
The ability of functional programming languages to do parallelization is really cool, but you're not solving a problem that many programmers actually have. It's like selling a car by saying it has lots of leg room for tall people. It's a great pitch to the right customer, but don't expect a revolution.
This is only true because all of the heavy lifting in those apps is done in C/C++ code that is a real pain in the ass to get right. I'd love to be able to do my low-level DSP work in a higher level language than C++.
This is a rather salient point. Relatively few tasks, even at the enterprise level, are actually CPU bound. Memory & I/O are pushed to their limits regularly, however.
I think the consideration (not addressed by the article) is that high-powered, highly multi-core systems have the potential to change the kind of computing experience we have, and harnessing the power of these systems effectively with the imperative tools available right now is really hard compared to using pure FP.
Hunch: as hardware (multi-core) improves, we will not achieve a sudden breakthrough that enables us to run conventional software approaches on it (functional, imperative or otherwise). Clever people have been anticipating and investigating multi-core for decades, without result, apart from the "embarrassingly parallelizable". In later decades, actual multi-core hardware was adopted in leading-edge then mainstream applications: in supercomputers; in server farms (esp google); in GPUs. Mainstream desktop CPUs have been multi-core for a decade now; and, today, as the article indicates, even phones are multi-core. Phones.
In the big picture, multi-core instead will slowly become good enough to solve problems suited to it, that couldn't be solved easily or cheaply before, such as massive simulation in non-government/military applications, and including problems that we did not even see as "problems" before, they seemed so insoluble.
It will be a new golden age of computation, with utterly different concepts, approaches and values from today. Our current issues of programming languages will disappear, pushed down to a lower level, and compared to it, imperative and functional programming will be as twins.
While I do agree with the author that FP has inarguable benefits when it comes to optimizing for multithreading, I'm not sure I buy the logic pronouncing the full-fledged demise of imperative languages.
The notion that dev shops will switch to fully-functional languages because a junior dev is bound to muck up the code base seems a little far fetched. That's a huge price to pay for what amounts to a warm fuzzy feeling of immutable certainty (pun intended).
I don't think it's just a matter of a junior programmer messing things up. To really maintain concurrent code, you have to fully grasp a lot of pieces in your head, like what's the correct locking order, etc. This is a huge mental debt that needs to be transferred between teammates. And if there's no testing for it, errors creep in. The largest source of heartburn in my career has been just this.
That said, I agree with you that imperative languages will be around for a long time to come.
Great article, but does it answer the question it raised: Why has software evolution lagged behind?
Also, is the focus on parallel computing the only litmus to judge this as the downfall of every imperative programming language?
To answer the first question, we have to abandon our technical computing hats in favour of the philosopher's hat.
What went wrong with the promise of reusable objects? Whatever happened to 4GL? Why are frameworks confined to isolated corners of language evolution, far-removed from the domain problems faced by end-users?
Why, oh why, is every nerd developing an Order-Entry system from scratch? Why are nerds doing the same code again and again for decades in every conceivable new language (now FP)?
You see, if we start accepting structural restrictions like those imposed by FP, then philosophically we should follow that maxim to its logical conclusion. Restrict to the point where we do not harm ourselves (by repeating).
I think this article is a really bad introduction, both to parallel and functional programming. There is just so much stuff that is badly informed, or simply wrong, e.g:
> For a functional programmer there is no barrier to concurrency, parallelism, or GPU programming.
Amdahl's law? And GPUs, while having gained branch support recently, still only perform great for problems that are highly data parallel and are not in high need of branching.
> I may surprise you that the state of the art in parallel programming is OpenMP and OpenCL.
This did in fact surprise me a lot! Erlang? Well, if you think about it, a webserver renders webpages in parallel. If that is not state of the art and parallel, then what is?
While I am very interested both in parallel and functional programming, I am actually disappointed this article has made it onto the frontpage of HN...
I guess I am kind of confused about the title because I was under the impression Occam counts as imperative. Also, if we are talking general parallelization strategies then Grand Central Station is something to look at on the Mac.
Thanks, I keep doing that with GCD and calling the "First Responder" First Listener. I am glad they ported it to iOS. It really helps to be able to use the same programming structure on both platforms.
I still think one of the agent oriented programming variants might not be a bad way to build parallel programs.
> In fact the most popular language for parallel and distributed programming is Erlang — a functional language.
Erlang achieves its parallelism not through immutability (though it has that) but through a shared-nothing architecture, like Smalltalk. You can implement shared-nothing concurrency imperatively. It circumvents the problem of two threads accessing the same memory location by restricting access to only one thread. Of course, then there's the problem when you do need to share data between threads - Erlang does this with STM (software transactional memory - like a DB).
With the niceties of C++11, it's certainly become a lot less painful; lambdas, auto (both variable and return type), variadic templates and r-value references give the language just enough flexibility to allow functional style with far more natural syntax. To be sure, it's not a perfect setup; type traits (e.g., enable_if) in function signatures are ugly and hardly a replacement for concepts.
I've been dabbling in the creation of a functional library that provides a modest subset of Prelude-inspired features from Haskell. https://github.com/jdduke/fpcpp Many of the algorithms are simply wrappers around stl algorithms, hiding some of the syntactic awkwardness attendant with the heavy use of iterators. Some day I'll get around to playing with monadic devices, and integrating the list comprehension work that I've done recently.
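Not from fpcpp, but as a flavor of the kind of wrapper I mean, hiding the iterator plumbing behind a Haskell-ish map (a C++11 sketch; the names are mine):

#include <algorithm>
#include <iostream>
#include <iterator>
#include <vector>

// map :: (a -> b) -> [a] -> [b], as a thin wrapper over std::transform.
template <typename F, typename A>
auto fmap(F f, const std::vector<A>& xs) -> std::vector<decltype(f(xs[0]))> {
    std::vector<decltype(f(xs[0]))> ys;
    ys.reserve(xs.size());
    std::transform(xs.begin(), xs.end(), std::back_inserter(ys), f);
    return ys;
}

int main() {
    std::vector<int> xs = {1, 2, 3, 4};
    for (int y : fmap([](int x) { return x * x; }, xs))  // 1 4 9 16
        std::cout << y << " ";
    std::cout << "\n";
}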
That's where it gets tricky. It's hard enough keeping up your own iron discipline; imagine a dev team composed of varied expertise and awareness levels.
Imagine the dev team who takes over the project after you leave. Iron discipline doesn't exist over long periods of time; that's why it needs to be built into the language.
Should we all take a serious look at functional programming? Absolutely. Should we start learning a functional programming language? Maybe not. C++ is a multi-paradigm language in which one can write functional programs, and it lends itself nicely to switching back to stateful imperative programming when needed.
You can not "switch" to an imperative style in C++. You are always in imperative land by default in C++. You can "switch" to a functional island in the middle of C++, but it will forever be a foreign body in the greater C++ program, and a mere moment's inattention will quickly re-introduce imperative side-effects into your purely functional code. Or you'll use some library that re-introduces it, or somebody else on your team will use your putatively functional higher-order function to do something imperative.
C++ is a particularly awful language for trying to manage side-effects in, or maintain a purely functional constraint on, your code, because it offers so very, very many ways of performing side effects without meaning to, and only the thinnest possible compiler support for maintaining those constraints in the face of temptation (const). To be sure "a = b.c(d, e)" is a purely functional statement requires understanding arbitrary amounts of code, to verify that none of the reams of code it can invoke accidentally violates your constraints.
I've only been using C++ for a few months, but it's my opinion that all function parameters should have been "const &" by default (except for maybe primitive types). Const correctness has helped me spot some nasty bugs, and it's too easy to accidentally forget the "&" and pass horrendously large structures by value, which must go through a copy constructor and destructor with every call. Plus, it gets really tedious typing "const Mat &src" a million times (guess which library I'm using, lol).
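A small illustration of the point about forgetting the '&' (a sketch; Mat here is just a stand-in for any heavyweight type):

#include <numeric>
#include <vector>

struct Mat { std::vector<float> data; };     // stand-in for a large object

// Forgetting the '&' silently copies the whole matrix on every call.
float sum_by_value(const Mat m) {
    return std::accumulate(m.data.begin(), m.data.end(), 0.0f);
}

// const reference: no copy, and the callee still can't modify the argument.
float sum_by_ref(const Mat& m) {
    return std::accumulate(m.data.begin(), m.data.end(), 0.0f);
}

int main() {
    Mat m{std::vector<float>(1000000, 1.0f)};
    // Same result either way, but the first call copies ~4 MB first.
    return (int)(sum_by_value(m) - sum_by_ref(m));
}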
That's simply not the case. In general, the "no penalty" holds true for primitive data types, and when passing an r-value instance of a class that has a move constructor defined. In the case where your object is an expensively constructed l-value, it is preferable to pass by const reference rather than by value (assuming you plan to use the object later... otherwise you could indeed pass using std::move semantics).
In C++11 you are supposed to use move semantics in your own classes. I'm not 100% sure, but I think the standard requires that the STL use move semantics.
Actually I think C++11 is a simpler language than C++03 or C++98. The problem is that we need to un-learn what we know and learn the new way of doing things.
C++11 is simpler for the casual user of classes but I'm not so sure it's easier if you're implementing a lot of classes yourself. I disable assignment and copying in all my classes by default unless I have a really good reason to implement them and I still pass by const & for types I define myself.