Since most operations in the IMVU client are bound by disk or network I/O, we implemented an asynchronous programming model based on futures and coroutines (Python generators), which we've recently extended with queues, actors, and more explicit cancellation functionality. The system has grown intuitive enough that new engineers grasp it fairly quickly, and there are very few gotchas or common mistakes.
In our network-connected, cpu-is-millions-of-times-faster-than-disk world, asynchronous programming is really important for building a responsive desktop application.
Just to elaborate on what you're saying, I'll write a Pythonic version of the handleBlogPostRequest function described in the article. This example is a line-by-line copy of the one in the post; no real library is used.
def handleBlogPostRequest(request, response, postSlug):
    db = yield DBClient()
    post = yield getBlogPost(postSlug)
    comments = yield db.getComments(post.id)
    html = template.render('blog/post.html',
                           {'post': post, 'comments': comments})
    response.write(html)
    response.close()
The 'yield' keyword turns handleBlogPostRequest into a generator. When you "yield getBlogPost(...)", the generator hands a request for the blog post back to the framework. When the result is ready, the framework passes it back into the generator, which assigns it to post.
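A minimal sketch of the driver loop such a framework might use to pump the generator; fulfil() is a made-up stand-in for performing the requested I/O, and a real framework would resume the generator from an I/O completion callback rather than a simple loop:

def run(generator):
    # Drive the generator: fulfil each yielded request and send the
    # result back in, until the handler runs to completion.
    try:
        request = next(generator)             # run up to the first yield
        while True:
            result = fulfil(request)          # hypothetical: do the I/O
            request = generator.send(result)  # resume with the result
    except StopIteration:
        pass                                  # handler finished normally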
If we look at Django or Ruby on Rails, arguably the two most promising new web application frameworks to emerge in the past few years, both of them are written in such a way that synchronous programming is assumed.
Well, yes, but really no. Within a single process, everything is synchronous. Deployments in the real world are asynchronous via multiple processes, shared-nothing architecture, and a few forms of interprocess communication. (The database is pretty common, though I guess that makes it shared-something. Let the MySQL programmers deal with the semaphores; I can't even spell semaphore, and I've got customers to satisfy.)
In real deployment scenarios, you can achieve I-can't-believe-it's-not-asynchronous processing by layering the same programming techniques and battle-tested, well-supported libraries you already use.
One hacky solution which works well in practice: give the browser as much as you can, and have it poll via AJAX for anything you don't have ready. Since page rendering is Very Effing Slow relative to many other things you want to do, this increases the user's perception of speed considerably.
For example, say you hypothetically want to do a word count on 47 blogs and display the results to the user (I don't know why you would want to do this, but it looks like his example). No problem. Check your cache for any of the blogs you've already calculated and spit those right into the response HTML, and have the browser poll you for the rest via AJAX, as sketched below. Meanwhile, you kick off the HTTP access and subsequent word-count operation to any number of worker processes, and when they finish, write the results to the cache (or DB, etc.).
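A rough sketch of that flow, reusing the template object from the earlier example; check_cache() and enqueue_word_count() are hypothetical stand-ins for your cache layer and your worker-process queue:

def blog_word_counts(request, response, blog_urls):
    ready, pending = {}, []
    for url in blog_urls:
        count = check_cache(url)        # already computed: show it now
        if count is not None:
            ready[url] = count
        else:
            pending.append(url)
            enqueue_word_count(url)     # a worker fetches, counts, caches
    # Render what we have; the page AJAX-polls for the pending blogs
    # until the workers have filled in the cache.
    response.write(template.render('blog/word_counts.html',
                                   {'ready': ready, 'pending': pending}))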
At the point where you "kick off the HTTP access and subsequent word count operation", you are performing an asynchronous operation. The fact that you are using OS processes to handle it instead of language primitives doesn't make it any less asynchronous.
But you are using nice, chunky blocks of asynchrony (which makes it easier) rather than spreading it evenly throughout your program (often quite hard).
It Depends. I wasn't trying to lobby for or against different granularities of asynchrony, or over where you choose to branch the execution path; I was merely trying to point out that asynchrony is still involved.
One of the more confusing things I've experienced recently is that some programmers get a little fuzzy when you ask whether an HTTP request is synchronous. That is, is the body of an HTTP response considered synchronous communication with the client? In my view of the system, HTTP (sans keep-alive) supports a single synchronous message and response, whereas some people assert that the response happens after the request has been sent in its entirety, so the communication is actually asynchronous.
From that point of view, I can see how the terminology is vague. However, if we are talking about program control flow and say that blocking on a function call is synchronous and various other forms of deferred evaluation are asynchronous, then the HTTP lifecycle starts to fit the "synchronous" model more closely. (Note that I'm not talking about HTTP client libraries here.)
Anyway, this does lead into one of my favorite interview questions: "Is JavaScript multithreaded?" Many people get hung up on AJAX and setTimeout... but these are the same people that don't really get the tech at a fundamental level.
On an initial page load, the user tends to be blocking on the page response - the user is forced to treat it as a synchronous operation, though they can task switch to other pages.
On the server side, the merit of asynchronous processing depends on the goal. I'm quite happy to spin up multiple processes and have each one synchronously process a single request - my time is too valuable to debug the alternatives at the moment.
I'm quite happy to admit that my HTTP requests are processed synchronously by Apache, mod_python, and Django. Even the out-of-band AJAX requests tend to be synchronously processed. On my power dashboard, though, I handle ongoing comms totally asynchronously using Twisted, pushing data to the user through Orbited. This gives our system an extremely fast response rate.
Event-loop-driven code is the key to decent firmware performance. It is how we ensured that the responsiveness of our past hardware solutions (such as the Kiddie Kart) was very good.
As an interesting counterpoint, Paul Tyma has a really excellent presentation on how NIO in Java actually allows considerably less throughput than thread-per-connection and has a much more complicated programming model:
I've written a fair amount of asynchronous network code in my time, and it really is more difficult to write and debug; developers are generally used to being able to see a linear flow in their code. Event-driven code in general, IMO, is much harder to follow and reason about than linear code. In my current project we have a fully asynchronous multiplexing network component, and I'm about to replace it with a simpler, synchronous model, because both the client code and the code of the component itself will be much simpler. Even if I take a performance hit, the easier maintenance and fewer bugs will be worth it.
The annoying thing to me, as a guy who works on the compiler side of things, is that the transformation of linear synchronous code into continuation-passing style (CPS) suitable for asynchronous execution can be done mechanically by the compiler. There isn't a technical reason why it needs to be as hard as it is; there are lots of details to get right, especially when interoperating with stack-oriented languages, but in principle it's very possible.
The CLR (.NET runtime) kinda does this. The host can substitute threading and I/O implementations. Microsoft's SQL Server actually does this: when you build C# stored procedures, they're actually running on NT fibers (green threads) with async I/O.
How would you manage to figure out the order of execution for an arbitrary set of events firing when the program is defined as a direct, linear series of operations? Surely the paradigm is so vastly different that getting a compiler to figure it out would be impossible?
There is data tying the whole program together: the continuation, which is a closure.
Assuming a bastardised version of Python with C#-style lambdas, so that () => {} is a function taking no arguments and having no statements in its body, result => {} is a function taking a single argument, and so on:
Direct form:
def A(x):
    x = B(x)
    C(x)

def B(x):
    factor = SyncCall()
    return x * factor
CPS form:
def A(x, continuation):
    B(x, result => {
        x = result
        C(x, continuation)
    })

def B(x, continuation):
    AsyncCall(result => {
        factor = result
        continuation(x * factor)
    })
Things are actually more complicated than they look here; you may need more than one continuation for things like exceptions, but hopefully you get the gist.
Key things in CPS:
* Method calls aren't actually method calls; they are jumps with arguments. There is no call stack - the return location is itself passed to the method you're calling.
* This means that the stack is implied in the variables that the continuation closures capture. So the caller of A - the place A has to return to when it's done - is stored in the "continuation" argument, and that location needs to be captured as part of the state of the closure passed to B, so that it can ultimately be passed to C.
* Since the stack is stored implicitly in the closure, it is simple for the implementation of AsyncCall to initiate the I/O, stuff the return location (the last argument, which I've called "continuation" above) away so that it can be retrieved when the I/O completes, and then go off and do other productive work, such as checking for completed I/Os and calling their continuations (see the sketch below).
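For what it's worth, the sketch above can be written as real, runnable Python, with nested functions standing in for the C#-style lambdas; here the whole "framework" is just a list of queued completions, and the value 10 stands in for whatever the I/O produced:

pending = []   # completions the "event loop" still has to run

def AsyncCall(continuation):
    # Initiate the "I/O", stash the continuation, return immediately.
    pending.append(lambda: continuation(10))

def B(x, continuation):
    def on_result(factor):
        continuation(x * factor)
    AsyncCall(on_result)

def C(x, continuation):
    print("C got", x)
    continuation()

def A(x, continuation):
    def after_b(result):
        C(result, continuation)
    B(x, after_b)

A(3, lambda: print("done"))   # nothing runs yet; work is only queued
while pending:                # drain "completed I/Os"
    pending.pop(0)()          # prints "C got 30", then "done"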
If you have green threads, the state for the continuation is implicitly part of the stack - only now you own the data that forms the stack, and can reliably do interesting things with it. The stack itself is a kind of argument that gets implicitly passed around in imperative programs; e.g. it is passed in the ESP register on x86. The top of the stack contains the continuation pointer, and 'RET' is the jump instruction which jumps to the continuation. But using the stack for an implied kind of CPS limits some of the other things you can do with continuations, as it enforces a contiguous linear flow, rather than what you can do with call/cc etc. when the closure data that forms the continuation is stored on the heap instead.
Twisted's @inlineCallbacks decorator basically does this for you; whenever you yield a deferred (Twisted's representation of an ongoing operation), the rest of the generator is registered as the callback for when the operation is finished. Your example becomes:
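Something like this (a sketch, assuming AsyncCall returns a Deferred):

from twisted.internet.defer import inlineCallbacks, returnValue

@inlineCallbacks
def B(x):
    factor = yield AsyncCall()   # resumes here when the Deferred fires
    returnValue(x * factor)      # how an inlineCallbacks generator "returns"

@inlineCallbacks
def A(x):
    x = yield B(x)               # B itself now returns a Deferred
    C(x)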
Yes; some people have also been (ab)using C# iterators to try and make implementing this easier. But to bang on the same drum again, it shouldn't be necessary for the programmer to fiddle with all the methods on the call chain between initiating request processing and the point of any I/O (which may be deep in e.g. database-handling logic, for async DB I/O). The compiler or runtime ought to be able to help out.
Whether or not a function is atomic is part of its contract; forcing code changes up the call stack when a function's contract changes is perfectly reasonable.
In Twisted, all code in normal functions executes atomically, and code in @inlineCallbacks functions is only interrupted at yield statements. If any random function could release control to the scheduler without notice to its callers, certain classes of bug would require digging through all of the callees' code to figure out what's going on. I rely on that atomicity far more often than I change a function from non-blocking to blocking, so in my case the tradeoff is a favorable one.
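To illustrate the guarantee (a hypothetical sketch; save() is a made-up non-blocking call returning a Deferred):

from twisted.internet import defer

@defer.inlineCallbacks
def transfer(src, dst, amount):
    src.balance -= amount    # no other task can run between these
    dst.balance += amount    # two lines: they execute atomically
    yield save(src)          # other tasks may run during the yields
    yield save(dst)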
Also, if you don't care about the response, there's no fiddling necessary. Just call the function and drop the result; the operation will still happen at the right time, and the result will get dropped on the floor, just as you expected.
[EDIT: changed "blocking" to "atomic", since that's what I'm really talking about.]
You can define your coding standards such that blocking is part of the contract, but I disagree that there is a natural need to make blocking part of the contract.
For example, consider switching between a non-blocking in-memory DB (perhaps flushed in a background thread) and communicating with a database across the network. Why would a module communicating with such a DB need to advertise in its contract which DB it is communicating with?
The point being, a context switch can still occur at any moment, so it's not like you get the kind of atomicity you seem to be aiming for in general, unless you are operating in a shared-nothing, single-threaded environment.
It doesn't need to advertise which DB it's communicating with. It does need to advertise whether or not it operates atomically. There's no reason why the in-memory db can't advertise itself as non-atomic, in which case changing to the network communication shouldn't break anything. Similarly, the network DB can operate synchronously and hold control of the program counter during the entire operation.
Saying that all functions are non-atomic is certainly a reasonable policy, and one that is used in many situations. Personally, I find it more convenient to program in an environment that does have this distinction. Apparently, your tastes are different than mine.
In a multithreaded world, synchronous code storing its state on the stack, and asynchronous code derived from a CPS transform of direct style (thus storing its state in continuations), seem equivalent for the purposes of "atomicity"; since the OS scheduler can come in and reschedule the synchronous thread at any point (and will definitely context switch if the thread blocks), it seems no different to the CPS transformed case (OS may choose to context switch at any time).
Things like mutex locks do work differently in the CPS case, but it's rather that they are asynchronous too: blocking on a mutex means handing the continuation off to the mutex to continue when it's available to be acquired. And things like thread-local storage work differently - the logical thread of control may change physical threads in the CPS async case, but a runtime or compiler (which will be aware of the CPS transformation) can abstract this away.
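A sketch of what such an asynchronous mutex might look like: "blocking" means parking your continuation with the mutex, which invokes it when the lock is handed over. (A single-scheduler illustration, not a real library API.)

class AsyncMutex(object):
    def __init__(self):
        self.held = False
        self.waiters = []          # parked continuations

    def acquire(self, continuation):
        if self.held:
            self.waiters.append(continuation)  # park until release
        else:
            self.held = True
            continuation()         # uncontended: continue immediately

    def release(self):
        if self.waiters:
            self.waiters.pop(0)()  # hand the lock straight to a waiter
        else:
            self.held = False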
I mean that no outside changes will be made to the environment during the execution of the atomic block of code, and that any intermediate state of the environment during the atomic block of code will not be seen by any code outside the block. In this case "the environment" is the process's memory space.
Because I can do so much without having to worry about other threads of execution seeing my internal state, I rarely have to bother with locks at all, even when dealing with globals. I'm not using threads at all; if I need to take advantage of multiple CPUs, I run multiple distinct processes.
What do you mean, "outside changes"? Another thread could come in and make changes inside the same process's memory space - unless you are specifically talking about a single-threaded design.
From the "inside", a thread of execution, and a CPS-transformed chain of continuation calls (originally written in direct style), "feel" exactly the same - precisely because there is a mechanical translation from one to the other with no loss in semantics.
Here's what I suspect you're trying to get at: you're coming from a single-threaded perspective, where Twisted's partially-automated CPS transform is in effect emulating green threads under a cooperative scheduling model, where the "yield" represents yielding control of the current "thread" to potentially other "threads". Your concern over fully automating the transform is because it may introduce further cooperative threading "yield"s that you cannot control.
My point is that under a preemptively threaded environment, any thread could at any time hijack the CPU and mutate your state - and this is no different whether the code is synchronous or CPS-transformed. A "yield" may occur randomly at any point in your code. Hence, the requirement to annotate methods to permit CPS transformation on preemptively multithreaded direct code is a chore; the extra information in the contract doesn't have semantic value, in the same way as it does for the cooperatively-scheduled Twisted approach.
>> you're coming from a single-threaded perspective, where Twisted's partially-automated CPS transform is in effect emulating green threads under a cooperative scheduling model, where the "yield" represents yielding control of the current "thread" to potentially other "threads". Your concern over fully automating the transform is because it may introduce further cooperative threading "yield"s that you cannot control.
Yes, that's exactly the case. To scale to multiple CPUs, I use distinct processes, not threads. Shared memory and preemptive execution require a lot of programming effort to play nicely together that I don't care to expend.
The GIL in Python means that threads generally won't ever let you use more than a single CPU anyway.
Scheme is implemented via a CPS transform, so a server built around this approach ought to be quite possible on a good Scheme implementation, or a Lisp implementation that supports the moral equivalent of call/cc.
Functional languages are also frequently implemented using CPS as an intermediate form.
Smalltalk also supports continuations, as I understand it; the Seaside framework for Smalltalk is a server implementation oriented towards continuations, though I think they use it for picking up where you left off in between multiple request/responses, not necessarily for I/O optimization. I could be mistaken.
F# has a feature it calls async workflows; I haven't delved into it much, but it looks like it's trying to make writing async code as close to direct style code as possible. But the full benefit comes from the compiler being able to rewrite all functions on the call chain between you and the function that performs the I/O, so I suspect it may be limited. Ideally, the CLR itself would support CPS transformation into its Begin/End-style async support, like I advocate here:
Just as a minor addition to what barrkel says below, there's a good explanation of CPS transformation, and its use in compiling, in the first edition of EOPL.
I've heard the second and third editions had some of that material cut (too advanced?) - I don't know if that's true, but the first edition is typically much cheaper. FWIW.
I would submit that the reason we still don't do async very well is because callbacks are just not a very good technique for some reason.
If you look at Erlang (which the article mentions), I don't feel like I ever do the kind of async programming he talks about here. I break the workflow into a pipeline of processes that each do a specific job. I never have a process send a message to another process and wait for the answer. Every process receives a message, does its job, and then forwards the resulting message to another process.
I guess I just find this style of coding easier to reason about because every part of the pipeline is completely stateless and has a straightforward job.
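A rough Python analogue of that pipeline style, with one OS process per stage connected by queues; stages never wait for replies, they just forward results downstream (the stage functions here are placeholders):

from multiprocessing import Process, Queue

def shout(text):
    return text.upper()

def stage(work, inbox, outbox=None):
    # Receive a message, do one job, forward the result downstream.
    for msg in iter(inbox.get, None):   # None is the shutdown sentinel
        result = work(msg)
        if outbox is not None:
            outbox.put(result)          # forward; never wait for a reply
    if outbox is not None:
        outbox.put(None)                # propagate shutdown

if __name__ == '__main__':
    q1, q2 = Queue(), Queue()
    pipeline = [Process(target=stage, args=(shout, q1, q2)),
                Process(target=stage, args=(print, q2))]
    for p in pipeline:
        p.start()
    q1.put("a message flowing through the pipeline")
    q1.put(None)
    for p in pipeline:
        p.join()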
Wow, thanks! I hope you find good resources to draw from so it doesn't turn out disappointing to you. I would look at things that Joe Armstrong writes. Not so much erlweb and stuff like that.
However, if your async call actually happens somewhere inside a library function, you also have to yield when calling that function - it's not transparent to the caller whether the library function is synchronous or not.
This is kind of the point of Google's Go language (and many languages before it...). Most things are asynchronous calls hidden behind synchronous ones. Get the advantages of async with the development ease of synchronous.
If you have green threads, and you wrap calls that may block such that they redirect to your own scheduler, you can simply save the green thread context, initiate an async call, and when the async call completes you can restore the thread and reschedule it.
Your scheduler may be as simple as a bunch of threads using work-stealing queues to grab jobs; a job is scheduled by adding it to a queue.
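A minimal sketch of that scheduler shape: a fixed pool of worker threads pulling jobs off a shared queue. (Real work-stealing gives each worker its own deque and lets idle workers steal from the others; a single shared queue keeps the sketch short.)

import queue
import threading

jobs = queue.Queue()

def worker():
    while True:
        job = jobs.get()        # sleep until a job is scheduled
        if job is None:         # sentinel: shut this worker down
            return
        job()                   # run the job to completion

pool = [threading.Thread(target=worker) for _ in range(4)]
for t in pool:
    t.start()

jobs.put(lambda: print("hello from the pool"))   # schedule a job
for t in pool:
    jobs.put(None)              # one shutdown sentinel per worker
for t in pool:
    t.join()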
I think quite a bit of it is to do with lazy programmers.
With asynchronous programming, the effort is fairly constant. You have to think about things as you go, but once you get the hang of it, there are few surprises.
With synchronous programming you have the illusion that it is easier - fork()! Magic! So you can get a lot done quickly. But then comes the realization that as things get more complex, you'll have to deal with locking, concurrency, scaling issues, etc. By that time it may be too late to change architecture.
It's probably best, though, for programmers to work on a big synchronous system, realize just how much code they're writing to cope with the fact that they asked the OS to execute their code in random order, and then hopefully realize that telling the OS to execute the code in an explicit order is probably a better idea.
>> If we look at Django or Ruby on Rails, arguably the two most promising new web application frameworks to emerge in the past few years
I'd also disagree with that. They're probably the most hyped up, but that's something entirely different. Once again, it's likely lazy programmers who want a framework for this, a framework for that. How about just making your own framework?
I'm still not sure I follow. Where does the "ton of code" come in? The boilerplate around spawning new threads? The painfulness of that task varies widely among OS-level threading primitives, user-level thread libraries, and other programming models that let you express tasks with dependences, like Cilk (http://supertech.csail.mit.edu/cilk/) or TBB (http://www.threadingbuildingblocks.org/).
Personally, if I have a DAG of tasks to do, I prefer to encode that DAG in the structure of my code by breaking independent chains into separate tasks/threads/whatever and feeding them to some piece of software that does the scheduling for me, rather than roll my own scheduling in a potentially bug-prone and unscalable way. For the example shown, either is perfectly fine, but with hundreds of outstanding jobs and a more complex dependence structure, I'd rather not deal with the scheduling myself.
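For instance, a small DAG handed to library code (concurrent.futures here purely as an illustration; Cilk and TBB fill the same role in C/C++, and load() and merge() are hypothetical task functions):

from concurrent.futures import ThreadPoolExecutor

with ThreadPoolExecutor() as pool:
    a = pool.submit(load, "input-a")        # independent chains run
    b = pool.submit(load, "input-b")        # concurrently in the pool
    merged = merge(a.result(), b.result())  # the DAG's join point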
Without giving the game away, of course, could you possibly explain the kind of features yours has that others are lacking? I'm currently working on two different web frameworks, one a long-standing project and the other a start-from-scratch one. I'd love to include the meat of the real, needed features that the current popular frameworks are missing.
I use Twisted Python, and disagree that async is too hard. Let's sprinkle a bit of syntactic sugar over Twisted and consider this example:
@defer.inlineCallbacks  # Python 2.5 and up
def login_and_do_something(args):
    server = yield factory.login(username, password)
    rval = yield server.callRemote(args)
    print rval
It looks fairly straightforward and isn't the complete change of coding style that has mortal programmers scrambling for the hills. Doing the same without inline callbacks, I must admit, is quite a bit more to wrap your head around.
I'm not sure benchmarks of nginx v. Apache are enough to definitively settle that asynchronous programming is better than synchronous programming. This USENIX paper argues the opposite, though it admits there are some problems with current thread-based solutions: http://www.usenix.org/events/hotos03/tech/full_papers/vonbeh...
No one has created a web server that's anywhere near competing with nginx (or similar) on performance/efficiency. I think that counts as proof that in practice the argument is settled. Theoretically I'm sure it's not settled.
The post complains about the inadequacy of current language support for asynchronous programming, though, while the linked paper argues that the real problem is the inadequacy of current language support for synchronous programming.
Admittedly, they're talking about different kinds of "support": it's harder to do asynchronous programming in a lot of languages, but it's even harder than that to do synchronous programming efficiently, because the toolchains don't support the optimizations that would be necessary (but which you can do by hand in an asynchronous programming style).
Perhaps what is wanted is an easier/better threading model? If I read from a file, my code blocks while the operating system reads the file. If I want to do something else, I can create a new thread which does the reading while the original thread carries on.
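That model, as a sketch: push the blocking read onto a second thread and keep working on the main thread (data.txt and do_other_work() are placeholders):

import threading

results = {}

def read_file(path):
    with open(path) as f:
        results[path] = f.read()   # blocks, but only in this thread

t = threading.Thread(target=read_file, args=("data.txt",))
t.start()
do_other_work()                    # the main thread carries on meanwhile
t.join()                           # rendezvous once the read is done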
Unfortunately, our current threading model is a bit like a bull in a china shop, as it creates a whole host of concurrency problems in anything more complex than a trivial example. The 'select' call is a reasonable half-measure, as your 'read' call is then guaranteed not to block by the time you get to it.
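The select half-measure in miniature: wait until one of several sockets is readable, then read knowing the call won't block (handle() is a placeholder for the application logic):

import select

def serve(sockets):
    while True:
        readable, _, _ = select.select(sockets, [], [])
        for sock in readable:
            data = sock.recv(4096)   # select said it's ready: won't block
            if data:
                handle(data)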
I question the threading model, as it definitely has problems. But in terms of having multiple tasks, an event loop, and scheduling of those tasks - the operating system already has all that. Should we really re-implement it?
The OS should definitely supply some kind of user-side task scheduling framework.
Ideally your task scheduling framework will have as many threads as the machine has processors, and will use e.g. work-stealing queues to dispatch work without needing a kernel transition that a thread or process context switch would require. But if every running process implements its own task scheduling framework, and creates as many threads as there are processors, there will be too many threads running around. The OS needs to help out with balancing the load.
I'm not sure if I'm misinterpreting this, but it looks like lazy evaluation could make his "synchronous event loop" into his "asynchronous event loop" without all the worry about callbacks. A few Lispy macros could have the same effect. I guess an "asynchronous web framework" (buzzword bingo!) written in a Lisp (or any language with macros, like Lua) would be interesting.
You're usually better off using coroutines for this sort of asynchronous programming in Lua. There's a larger overlap between coroutines and continuations than you'd think.
Also, Lua doesn't have macros, at least not in the sense that Lisp and Prolog do. It just has eval.
I wondered if you meant Metalua. That's a different dialect, like OCaml vs. SML different. Standard Lua doesn't have macros, and the packages I've seen that add them (such as Luma, http://luaforge.net/projects/luma/) do so via eval.
I haven't done that much with Metalua - I don't really feel that Lua's lacking macros is a major problem, really, though I like the author's views on metaprogramming ("Make it obvious when something interesting happens", etc.). I remember looking at it a year or so ago, but didn't get into it.
I just took a quick stab at porting it, and I remembered why I hadn't: "Prerequisites - a 32 bits, little endian CPU". I tend to switch back and forth between a couple of OSs and hardware platforms, and Lua's portability is a big bonus for me. Adding a couple of ML-ish features to Lua would be tempting, though.
I haven't used it, but someone has implemented the actor model of programming for Ruby. The project is called Revactor. The main website appears to be down, but I found some docs here: http://doc.revactor.org/files/README.html
Any good JavaScript resources? There isn't really a place that I can go to find a javascript-doc tarball in the same way that I can download a tarball of Perl docs or Java documentation.
Mozilla.org has some wikipages related to JavaScript, but there is no offline documentation (I was able to find a request for an offline version from a few years ago, but the response was "we don't have the resources right now, but we will at a later point in time unless you want to do it right now").
In case you don't know it, jQuery is a fantastic JavaScript framework for website use. It makes heavy use of the event/async style, and so would be a great way to get used to it; I learned about async programming through heavy use of jQuery in websites. Take a look at http://docs.jquery.com/Tutorials:How_jQuery_Works for a start.
Douglas Crockford, whose website really opened my eyes to the capabilities of JavaScript about 4 years ago, recommends only one book: JavaScript: The Definitive Guide (5th Edition) by David Flanagan.
Since he wrote that, he's also come out with his own book, JavaScript: The Good Parts.
Disclaimer: I have not read either book, but I highly recommend his website: http://www.crockford.com
I wouldn't call it "almost". It took a while to build and shake out the canonical supervision structures in Erlang (OTP's gen_fsm, gen_server, etc.) as well as debugging/profiling facilities like match specs.
I have seen a huge uptake of Clojure, Scala, Haskell, and F# recently (and possibly OCaml - I haven't been watching - and possibly Common Lisp and Scheme, too).
One of our engineers blogged about the task system at http://thespeedbump.livejournal.com/63798.html and http://thespeedbump.livejournal.com/64053.html
The source is available at http://imvu.svn.sourceforge.net/viewvc/imvu/imvu_open_source...