Given any long-running task, there are two natural things to do:
1. Do something while the long-running task is running
2. Do something after the long-running task has completed
Node.js' callback function convention simply makes it stupendously easy for you to write code for both cases, and leans towards making case (1) as natural as possible. Case (2) is naturally easy in Javascript (and even easier in Coffeescript) because of the way Javascript supports closures.
var application = function() {
  // do stuff
  database_call(options, function(err, result) { // javascript closure
    // case (2) logic: runs later, once the long-running task has completed
  });
  // case (1) logic: runs right away, while the long-running task is still in flight
};
How does Tora fare in this regard? Node.js is about parallelism through forking closures, not about solving long-running, CPU-intensive tasks.
The weak point of Node.js, in my opinion (and please correct me if I'm wrong; that would make me happy), is having to write a lot of boilerplate code for error and exception handling. Exception handling is particularly onerous because, depending on the function call, you'll never know which stack the exception will travel up. In the example above, you (probably) can't catch an exception from case (2) by wrapping the database_call in a try-catch block. Does anyone have a good solution to this?
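That is, something like the following (a sketch reusing the database_call example from above) doesn't behave the way the try-catch suggests:

try {
  database_call(options, function(err, result) {
    if (err) throw err;   // thrown later, from the event loop's stack...
  });
} catch (e) {
  // ...so it never lands here; exceptions thrown inside the callback
  // escape to the top level instead
  console.error('never reached for callback errors:', e);
}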
All in all, I like that Node.js tries to stay pure by enforcing the above convention. Noders who stay in "userland" can't shoot themselves in the foot by making an asynchronous procedure synchronous, which may lead to faster code overall.
To me, it seems the real weakness of node.js is having to manage asynchronous control flow yourself, at all. It grows exponentially more ugly as the system increases in size. I really like continuation-passing-style as a technique (especially as a compiler IR), but doing that stuff by hand is so 70s.
I got pretty far along writing a similar system in Lua (because, hey, event-driven systems are pretty nice), and unlike Javascript, Lua has well-implemented coroutines - they singlehandedly eliminate a LOT of the ugly control flow management. Still, most libraries are blocking*. I eventually decided that, for small systems, an event-loop framework wasn't necessary in Lua (it's easy enough to just do it from scratch!), and for larger systems, I was better off using Erlang, which has the ideal foundation for that sort of system. In particular, I'm really not clear how much error-handling node.js does; that was what convinced me that Erlang's approach is refreshingly sane. When I started writing process-supervision and hot update code in Lua, I realized I was just re-implementing what Erlang does best.
* Which is a funny objection, like being mad that most libraries default to using decimal numbers rather than octal. It must be a conspiracy!
> Still, most libraries are blocking*
> * Which is a funny objection, like being mad that most libraries default to using decimal numbers rather than octal. It must be a conspiracy!
Actually, there _is_ a distinction here: It is ridiculously easy to implement blocking semantics on top of non-blocking semantics - in some pseudo code:
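Something along these lines (a rough sketch; suspend(), wake() and current_task() stand in for whatever coroutine/fiber primitives the runtime provides):

function blocking_read(socket) {
  var result;
  var task = current_task();                  // the coroutine/fiber making this call
  nonblocking_read(socket, function(data) {   // the non-blocking primitive we already have
    result = data;
    wake(task);                               // resume the suspended caller
  });
  suspend(task);                              // park here until the callback has fired
  return result;                              // to the caller this looks like an ordinary blocking call
}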
Whereas implementing non-blocking semantics using an underlying blocking implementation is quite, but not entirely, like banging your head against the wall. Repeatedly. Essentially, you cannot avoid threads, shared mutable state, and a lot of other problems.
I'm looking for a language that can do everything Node.js does but also solves this control flow problem, is fast, and has support for OpenCL and beefy math libraries.
I don't find Erlang to be performant in this regard, so I'm looking at Scala but I'm constantly getting pissed.at.java.packages.and.conventions. I think it also suffers from the same problems as Node.js and doesn't support the async/control-flow magic that Haskell supposedly has.
Perhaps I should consider Haskell after all. It's that or C++.
I have some early FFI bindings for OpenCL (and OpenGL, glfw) with some simple demos. Not updated recently, but it's done using stock LuaJIT (take the latest) without writing any binding code at all (just FFI definitions, which LuaJIT understands).
You can't do callbacks (yet). That is, you can't rely on a C function to call you back in Lua land. That's why I chose glfw instead of glut or others. There is a way to set up your main loop without callbacks.
Perhaps you could try Python, numpy, Twisted, inline callbacks, and one of the Python OpenCL packages.
Inline callbacks are syntactic sugar included in recent versions of Twisted that makes your code look synchronous while still running on the event loop.
With Python you get to use a wealth of libraries, but you still need to be mindful of what is blocking.
When something you need to call is blocking, you can try deferToThread.
Don't worry, you won't need to think too much about race conditions and other concurrency issues with threading, since there is the GIL in Python. Also, Twisted makes it a cinch.
Haskell (or rather GHC) offers a blocking programming model (e.g. spawn a thread per connection and make blocking reads/writes on the socket) but uses asynchronous I/O in its implementation (one thread uses epoll/kqueue/poll to do the I/O and the CPU bound threads are scheduled on a thread pool).
Maybe, in a couple of years, Javascript (or any language like CoffeeScript that compiles to Javascript) will fit your needs, especially regarding the control flow problem.
> Having an officially guaranteed tail call mechanism makes it possible to compile control constructs like continuations, coroutines, threads, and actors. And, of course, it's useful for compiling source languages with tail calls!
Of course, there is a long way to go before one can use these features. So, for now, you are better off looking somewhere else for your specific needs.
We developed a coroutine-based event-driven networking lib in Lua (not open sourced yet, but it looks like that might change someday). One of the neat things is that we're able to run it on tiny ARM-based embedded hardware.
I think there are people who did similar stuff for classic PC hardware, and already open-sourced their work; you might want to look at nmap, among others.
The external API is a superset of Luasocket's, so the event loop can remain completely transparent for users if they choose so. Proper coroutine support really rocks!
Several commenters point out that long-running computations can be performed outside the main request thread without the server having to do anything special.
It's also possible to divide up long-running computations oneself. This can lead to very interesting designs.
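For instance, one simple shape is to do a bounded slice of work and then hand control back to the event loop before continuing (a sketch; setTimeout with a zero delay is just the most portable way to yield back to the loop):

function sumInChunks(numbers, done) {
  var total = 0, i = 0;
  (function chunk() {
    var end = Math.min(i + 10000, numbers.length);
    for (; i < end; i++) total += numbers[i];   // a bounded slice of CPU work
    if (i < numbers.length) {
      setTimeout(chunk, 0);                     // yield so queued I/O callbacks can run
    } else {
      done(total);                              // all slices finished
    }
  })();
}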
"Several commenters point out that long-running computations can be performed outside the main request thread without the server having to do anything special."
Then why am I using Node.js? Any language has been able to do that for over a decade now, without the other hoops Node.js's style forces you to jump through. As evil and bad as shared-state multithreading truly is, this is a (or possibly "the") task it can manage without blowing up.
This is all but an admission that Node.js actually has no concurrency story at all. Hardly surprising, since it doesn't, unless "We force the programmer to do all the concurrency work" counts.
"This can lead to very interesting designs."
Yes, in the "may you live in interesting times" sense of interesting, absolutely. If you're going to reduce yourself to manually scheduling everything yourself why boot an OS at all? (Yes, that's a bit of an exaggeration, but seriously, think about it for a bit, there's truth there. Runtimes/VMs/languages ought to be adding to the OS, not fundamentally subtracting from it.)
No one's saying that IPC is unique to Node. The OP's criticism was: async i/o is fine, but what if you have some CPU-intensive work to do? Isn't it bad to let that block the whole server? Of course it is. The OP's answer is a different server architecture; the Nodians' answer is to just forward that work to a different process and let Node keep doing the one thing it does well.
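As I understand it, that hand-off looks roughly like this (a sketch; worker.js and the message shape are made up, and a real version would correlate requests or use a pool of workers):

var http = require('http');
var fork = require('child_process').fork;

var worker = fork(__dirname + '/worker.js');   // the CPU-heavy work lives in this process

http.createServer(function(req, res) {
  worker.send({ n: 40 });                      // hand the job off
  worker.once('message', function(result) {    // the event loop stays free meanwhile
    res.end(JSON.stringify(result));
  });
}).listen(8000);

// worker.js, roughly (fib stands in for whatever CPU-bound function you have):
//   process.on('message', function(job) {
//     process.send(fib(job.n));               // blocking here only blocks the worker
//   });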
Why use Node? Isn't the reason that it lets you write server apps that don't block on i/o, in a high-level language?
From various comments over the last year or so, I gather that you're saying Erlang beats Node hands down at this. That may be true. Still, not everyone's going to use Erlang. What other alternatives are there? (Twisted, EventMachine, ...?)
I can see the attraction of Node's approach. First, it's conceptually simple. Second, yes, it shoves a bunch of things in your face and makes you deal with them - but they are precisely the things that make your program slow. Perhaps you want to deal with them. I can understand why someone would say: I want to manage my program's control flow explicitly so I know it won't block when it shouldn't; it makes some things annoying, but other things easier (at least I don't have to worry about other code interrupting mine); just don't make me write everything in C.
(At risk of being tedious, I'll add that I'm not being polemical. You, silentb and others know a lot more about it than I do. I'd like to get clearer about what the issues are. Also... I'm tempted to reply to the second part -- about writing programs that know how to divide up their computations, and whether this is greenspunning the OS -- but this is long. Maybe we should defer it.)
"What other alternatives are there? (Twisted, EventMachine, ...?)"
Lots, depending on what you're doing: Scala actors, Akka, STM in Haskell and Clojure, the GHC (lightweight) thread manager, F# asyncs and MailboxProcessor, Apple's GCD, Microsoft Message Queuing (MSMQ), I/O completion ports, ZeroMQ, RabbitMQ/AMQP.
(but you should spend some time looking at erlang)
"What other alternatives are there? (Twisted, EventMachine, ...?)"
For every major high-level language there is at least one Node-like library, and sometimes more than one. (Perl has POE and Event::Lib; in my experience the raw glib wrapper isn't half bad either, albeit perhaps not the fastest, and it gives you good access to anything else based on glib. In fact Perl has so many that there's an Any::Event wrapper to remove your dependence on the underlying event library!) My point isn't that Node.js is bad. I actually don't think it is.
My point is that the hype is bad. It's wrong to think it's bringing anything unique to the table, because it simply isn't doing anything that has not been done literally dozens of times, except that it's doing it in Javascript. If you want it done in Javascript instead of Python, more power to you. I'm particularly incensed by the idea that Node.js' approach to asynchronicity is the only way to do it, and by the number of people it has anti-educated into thinking Haskell and Erlang and all kinds of other languages can't possibly be asynchronous because you can't see the manually-chopped-up event handlers in the code. I'm not guessing. I've met these people online. You may know better, but a lot of people don't; whether or not it was intended, the hype is actually lying to people about the state of the programming world, comparing itself to the world of 1995.
I am also trying to speed up the education cycle that all of those other dozens of attempts have been through in which manually-compiled event-based programming inevitably explodes into unmaintainable complexity, and none of the dynamic languages, including Javascript, have the necessary constructs to truly contain it. Some of the dynamic languages are even more powerful than today's Javascript, such as Python with its generators (though ECMAScript is supposed to be getting those, I don't know if any browser has them yet) and it's still not enough. The structure of event-based programming demands such an explosion. Been here, done this.
You can see it already starting to poke out from under the hype, if you're watching carefully. This is going to get worse, not better (because there isn't a solution, just a variety of hacks long since tried and found to only slightly improve things at significant complexity cost themselves), and I'm actually trying to do the community a favor by deflating the balloon so it doesn't pop so hard.
(If you know Haskell, and you look at the implicit type signatures being put on things like callbacks, it becomes easy to see the problem. The clearest place to see it is a function that takes a callback for something, until one day you need to pass in a callback that itself has to go do something that requires a callback, and suddenly you've got a big problem. The usual callback in Node.js is actually just a relatively pure function; it is not in IO, which is handled behind the scenes for you. Then when you need to do something else, you've got some real problems. Solvable, yes, but at a fairly significant complexity cost, partially because any given issue can be addressed but you can't really address all of them simultaneously (simplicity, exception correctness, dealing with control flow across callbacks, etc.). Every time you write a callback or choose where to break a function up into a callback you're actually laying down far more restrictions on the code than you can easily see, but I don't go to this explanation very often because by the time you can understand it, it is also borderline obvious.)
What other alternatives are there? One, use Node.js with awareness of the issues. There are places where it is fine. I just would incredibly strongly anti-recommend it if you know you're going to be continuously developing whatever you're building in it, especially your core product, rather than writing "a proxy socket server for web sockets to conventional sockets" and being done at some point. The other alternative is to actually work in a language/runtime where you don't have to manually perform all this tedious work. There's a number of them coming out and one of them is probably going to go mainstream at some point; of the current lot Go would be my best guess. Google isn't pushing it, but it's still got Google's name on it, and I don't know of anything else right now with the equivalent name power. History suggests name power is necessary for a language to crack the mainstream in anything less than 15 years. It probably requires the least adaptation to a new style of the bunch, the other advantage it has from the mainstream point of view.
I friendlily (!) request less anti-hype and more explanation of the technical issues, preferably with illustrations in code. For example, in the above post there is one point at which you come close to being specific and then back off, saying it's borderline obvious. It wasn't obvious to me.
There is one point at which I somewhat follow you. You say that the complexity introduced by callback management grows nonlinearly with program complexity. I understand this to mean that logic organized into async callbacks isn't composable (you have to write new logic to implement the composition, as opposed to just applying some operator to combine them) and isn't orthogonal (if you want to call some code that's written this way, your code also has to be written this way, and each new layer gets harder to add). It's easy to see how this could rapidly get out of control. But I'm not convinced that it must. There may be designs that nip this complexity in the bud. For example, the work done by callbacks themselves could be kept to a minimum (and preferably be standardized, i.e. when i/o is received, store the result in some standard place). In this way the callback chains always return as quickly as possible. Of course then you need some parallel strategy for managing the control flow of the program itself - some sort of state machine, perhaps.
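Concretely, I'm imagining something like this (a rough sketch, error handling omitted; db.get, render and the step names are all made up): each callback only records its result and kicks the machine, so the control flow lives in one place.

// db.get and render are placeholders for your own async calls and output
var state = { step: 'start', user: null, posts: null };

function advance() {
  switch (state.step) {
    case 'start':
      db.get('user:1', function(err, user) {   // callback does the minimum:
        state.user = user;                     // store the result...
        state.step = 'user-loaded';            // ...record where we are...
        advance();                             // ...and return to the machine
      });
      break;
    case 'user-loaded':
      db.get('posts:' + state.user.id, function(err, posts) {
        state.posts = posts;
        state.step = 'posts-loaded';
        advance();
      });
      break;
    case 'posts-loaded':
      render(state.posts);                     // the actual work happens here
      break;
  }
}

advance();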
Perhaps this is greenspunning Erlang but if so I'd like to know how.
From my experience with Node.js, your interpretation is correct. I also share your optimism for finding a general solution to this problem, but that's exactly what jerf said isn't worth the complexity once you implement it.
Even if you do manage the callback complexity issue, there is still the issue of exception handling, which jerf also explains here: http://news.ycombinator.com/item?id=2150800
That said, jerf hasn't proven that trying is not a rite of passage. Sorry for the double negative.
Not a general solution. An app-specific solution. That is, I don't want a framework; I just want a consistent simple design for an individual app written in this style. That's a big difference.
That's actually quite a painful way to write async code because the control flow is now inside the functions. That makes it very hard to perform refactoring.
Might I suggest you take a look at my preferred solution for this,
I've actually been put off Node a little because I've not seen any examples using this style of code. I had (perhaps stupidly) assumed that the scope of a closure was necessary for a lot of Node functionality; generally speaking, are all the parameters required for interacting with Node passed via the function parameters?
Can anyone suggest a better architecture than that of node.js for a server side Javascript application engine?
Perhaps still using Google V8 but maybe being more intelligent about threading/multicore, maybe using something like gearman (gearman.org) to distribute tasks, and addressing some of the criticisms of node.js but still maintaining good performance.
Also, CPS is not the only option for dumping callbacks in browsers / node.js. Another would be the Functional Reactive style. See Flapjax: http://www.flapjax-lang.org/
It's got Joose under the hood and I'm generalizing all the library functions for n-ary EventStreams and Behaviors (Reactive concepts). It's very much a work in progress and the test coverage is non-existent atm, but that's owing to the fact that I'm working from an existing, working code base. As soon as I have all the core estream and behavior facilities in place, I'm planning to write some exhaustive tests that use JooseX.CPS together with the Joose3 author's Test.Run library: https://github.com/SamuraiJack/test.run
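To give a flavor of the reactive style in general (hypothetical names, not Flapjax's API or mine: a bare-bones EventStream you derive new streams from instead of nesting callbacks):

function EventStream() { this.listeners = []; }
EventStream.prototype.subscribe = function(fn) { this.listeners.push(fn); };
EventStream.prototype.emit = function(value) {
  this.listeners.forEach(function(fn) { fn(value); });
};
EventStream.prototype.map = function(f) {
  var out = new EventStream();
  this.subscribe(function(value) { out.emit(f(value)); });
  return out;
};

// derive a stream of URLs from a stream of requests
var requests = new EventStream();
requests.map(function(req) { return req.url; })
        .subscribe(function(url) { console.log('requested:', url); });

require('http').createServer(function(req, res) {
  requests.emit(req);   // feed the stream from an ordinary Node callback
  res.end('ok');
}).listen(8000);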
This reminds me of a very neat trick from Windows I/O completion ports. When you mix blocking and non-blocking tasks, you want more threads than the number of cores, so that the CPU is fully used even if a thread blocks. But when no thread blocks, overbooking introduces context-switch overhead. Windows' trick is to track the threads that got work from an I/O completion port and, when one blocks, wake up another thread waiting on the port. This way you can have 6 threads working an I/O completion port on your quad-core CPU and adapt both to mostly CPU-intensive work and to blocking work.
It looks like the compute situation in Node.js could be helped with a server-side web workers implementation, with the workers doing no I/O and only computation. Any thoughts?
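Something like this, say (a sketch assuming a hypothetical server-side Worker with the browser's postMessage/onmessage API; Node doesn't ship one today, and count-primes.js and countPrimes are made up):

var worker = new Worker('count-primes.js');   // compute only, no I/O inside

worker.onmessage = function(event) {          // the result arrives back on the event loop
  console.log('primes below 10^7:', event.data);
};
worker.postMessage(10000000);                 // kick off the CPU-bound job

// count-primes.js, roughly:
//   onmessage = function(event) { postMessage(countPrimes(event.data)); };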