
The annoying thing to me, as a guy who works on the compiler side of things, is that the transformation of linear synchronous code into continuation-passing style (CPS) suitable for asynchronous code can be done mechanically by the compiler. There isn't a technical reason why it needs to be as hard as it is; there are lots of details to get right, especially when interoperating with stack-oriented languages, but in principle it's very possible.



The CLR (.NET runtime) kind of does this. The host can substitute threading and I/O implementations. Microsoft's SQL Server actually does this: when you build C# stored procedures, they run on NT fibers (green threads) with async I/O.


How would you figure out the order of execution for an arbitrary set of events firing when the program is defined as a direct linear series of operations? Surely the paradigms are so vastly different that getting a compiler to figure it out would be impossible?


CPS transformation is mechanical; don't take my word for it, there's plenty of documentation:

http://www.springerlink.com/content/dk7n7mbjx7blbelj/
http://moscova.inria.fr/~xleroy/publi/cps-dargaye-leroy.pdf
http://muaddibspace.blogspot.com/2009/02/cps-transformation-...

There is data tying the whole program together: the continuation, which is a closure.

Assuming a bastardised version of Python with C#-style lambdas, so that () => {} is a function that takes no arguments and has an empty body, result => {} is a function taking a single argument, and so on:

Direct form:

    def A(x):
        x = B(x)
        C(x)
    
    def B(x):
        factor = SyncCall()
        return x * factor
    
CPS form:

    def A(x,continuation):
        B(x, result => {
            x = result
            C(x,continuation)
        })
    
    def B(x,continuation):
        AsyncCall(result => {
            factor = result
            continuation(x * factor)
        })
    
Things are actually more complicated than they look here; you may need more than one continuation for things like exceptions, but hopefully you get the gist.
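
For instance, handling exceptions typically means "double-barreled" CPS, where every call carries a failure continuation alongside the success one. A minimal sketch in real Python (assuming AsyncCall is extended to accept an error callback, and C follows the same two-continuation convention):

    def B(x, on_success, on_error):
        AsyncCall(
            lambda result: on_success(x * result),  # normal return path
            lambda exc: on_error(exc))              # exceptional return path

    def A(x, on_success, on_error):
        # Propagating on_error unchanged is the CPS equivalent of letting
        # an exception unwind past this frame.
        B(x, lambda result: C(result, on_success, on_error), on_error)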

Key things in CPS:

* Method calls aren't actually method calls; they are jumps with arguments. There is no call stack - the return location is itself passed to the method you're calling.

* This means that the stack is implied in the variables that the continuation closures capture. So, the caller of A, where it has to return to when it's done, is stored in the "continuation" argument; and that location needs to be captured such that it is part of the state of the closure passed to B, so that it can ultimately be passed to C.

* Since the stack is stored implicitly in the closure, it is simple for the implementation of AsyncCall to initiate the I/O, stuff the return location (the last argument, "continuation" I've called it above) away such that it can be retrieved when the I/O completes, and then go off and do some other productive work - such as check for completed I/Os and call their continuations.
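
To make that "stuff the return location away" step concrete, here is a toy sketch of what AsyncCall and the surrounding dispatch loop might look like; start_io and completed are invented stand-ins for whatever primitives actually begin an I/O and poll for finished ones:

    import collections

    pending = {}                 # io handle -> parked continuation
    ready = collections.deque()  # (continuation, result) pairs ready to run

    def AsyncCall(continuation):
        handle = start_io()      # hypothetical: begins the I/O, returns a handle
        pending[handle] = continuation

    def event_loop():
        while pending or ready:
            for handle, result in completed():  # hypothetical: poll finished I/Os
                ready.append((pending.pop(handle), result))
            while ready:
                continuation, result = ready.popleft()
                continuation(result)  # "return" into the parked caller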

If you have green threads, the state for the continuation is implicitly part of the stack, except that with green threads you own the data that forms the stack and can reliably do interesting things with it. The stack itself is a kind of argument that gets implicitly passed around in imperative programs; e.g. it is passed in the ESP register on x86. The top of the stack contains the continuation pointer, and RET is the jump instruction which jumps to the continuation.

But using the stack for an implied kind of CPS limits some of the other things you can do with continuations, as it enforces a contiguous linear flow, rather than what you can do with call/cc etc. when the closure data that forms the continuation is stored on the heap instead.


Twisted's @inlineCallbacks decorator basically does this for you; whenever you yield a deferred (Twisted's representation of an ongoing operation), the rest of the generator is registered as the callback for when the operation is finished. Your example becomes:

  @defer.inlineCallbacks
  def A(x):
      x = yield B(x)
      C(x)

  @defer.inlineCallbacks
  def B(x):
      factor = yield AsyncCall()
      defer.returnValue(x * factor)


Yes; some people have also been (ab)using C# iterators to try to make implementing this easier. But to bang on the same drum again, it shouldn't be necessary for the programmer to fiddle with all the methods on the call chain between initiating request processing and the point of any I/O (which may be deep in, e.g., database-handling logic, for async DB I/O). The compiler or runtime ought to be able to help out.


Whether or not a function is atomic is part of its contract; forcing code changes up the call stack when a function's contract changes is perfectly reasonable.

In Twisted, all code in normal functions executes atomically, and code in @inlineCallbacks functions is only interrupted at yield statements. If any random function could release control to the scheduler without notice to its callers, certain classes of bugs would require digging through all of a function's callees to figure out what's going on. I rely on that atomicity far more often than I change a function from nonblocking to blocking, so in my case the tradeoff is a favorable one.
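
To illustrate the guarantee (a sketch, with fetch and store standing in for any functions returning deferreds):

    from twisted.internet import defer

    counter = {"value": 0}

    @defer.inlineCallbacks
    def handle_request():
        data = yield fetch()  # other callbacks may run while we wait here
        # Everything from here to the next yield runs atomically in the
        # single-threaded reactor, so no lock is needed around the update.
        counter["value"] += 1
        yield store(data, counter["value"])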

Also, if you don't care about the response, there's no fiddling necessary. Just call the function and drop the result; the operation will still happen at the right time, and the result will get dropped on the floor, just as you expected.

[EDIT: changed "blocking" to "atomic", since that's what I'm really talking about.]


You can define your coding standards such that blocking is part of the contract, but I disagree that blocking naturally needs to be part of the contract.

For example, consider switching between a non-blocking in-memory DB (perhaps flushed in a background thread) and communicating with a database across the network. Why would a module communicating with such a DB need to advertise in its contract which DB it is communicating with?

The point being, a context switch can still occur at any moment, so you don't get the kind of atomicity you seem to be aiming for in general, unless you are operating in a shared-nothing, single-threaded environment.


It doesn't need to advertise which DB it's communicating with. It does need to advertise whether or not it operates atomically. There's no reason why the in-memory db can't advertise itself as non-atomic, in which case changing to the network communication shouldn't break anything. Similarly, the network DB can operate synchronously and hold control of the program counter during the entire operation.

Saying that all functions are non-atomic is certainly a reasonable policy, and one that is used in many situations. Personally, I find it more convenient to program in an environment that does have this distinction. Apparently, your tastes are different than mine.


I'm not sure what you mean by "atomic".

In a multithreaded world, synchronous code storing its state on the stack, and asynchronous code derived from a CPS transform of direct style (thus storing its state in continuations), seem equivalent for the purposes of "atomicity": since the OS scheduler can come in and reschedule the synchronous thread at any point (and will definitely context switch if the thread blocks), it seems no different from the CPS-transformed case (the OS may choose to context switch at any time).

Things like mutex locks do work differently in the CPS case, but it's rather that they are asynchronous too: blocking on a mutex means handing the continuation off to the mutex to continue when it's available to be acquired. And things like thread-local storage work differently - the logical thread of control may change physical threads in the CPS async case, but a runtime or compiler (which will be aware of the CPS transformation) can abstract this away.
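
A sketch of what such an asynchronous mutex might look like (the class and its names are invented for illustration): acquire takes a continuation instead of blocking, and release hands ownership directly to the next parked continuation.

    import collections

    class AsyncMutex:
        def __init__(self):
            self.held = False
            self.waiters = collections.deque()  # parked continuations

        def acquire(self, continuation):
            if self.held:
                self.waiters.append(continuation)  # wait without blocking a thread
            else:
                self.held = True
                continuation()

        def release(self):
            if self.waiters:
                # The lock stays held; ownership passes to the next waiter.
                self.waiters.popleft()()
            else:
                self.held = False

Twisted's DeferredLock works along these lines, with acquire() returning a deferred that fires once the lock is owned.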

So, what do you mean by "atomic"?


I mean that no outside changes will be made to the environment during the execution of the atomic block of code, and that any intermediate state of the environment during the atomic block of code will not be seen by any code outside the block. In this case "the environment" is the process's memory space.

Because I can do so much without having to worry about other threads of execution seeing my internal state, I rarely have to bother with locks at all, even when dealing with globals. I'm not using threads at all; if I need to take advantage of multiple CPUs, I run multiple distinct processes.

[Edit: clarifying single-threaded]


What do you mean, "outside changes"? Another thread could come in and make changes inside the same process's memory space - unless you are specifically talking about a single-threaded design.

From the "inside", a thread of execution, and a CPS-transformed chain of continuation calls (originally written in direct style), "feel" exactly the same - precisely because there is a mechanical translation from one to the other with no loss in semantics.

Here's what I suspect you're trying to get at: you're coming from a single-threaded perspective, where Twisted's partially-automated CPS transform is in effect emulating green threads under a cooperative scheduling model, where the "yield" represents yielding control of the current "thread" to potentially other "threads". Your concern over fully automating the transform is because it may introduce further cooperative threading "yield"s that you cannot control.

My point is that under a preemptively threaded environment, any thread could at any time hijack the CPU and mutate your state - and this is no different whether the code is synchronous or CPS-transformed. A "yield" may occur randomly at any point in your code. Hence, requiring methods to be annotated to permit CPS transformation of preemptively multithreaded direct code is a chore; the extra information in the contract doesn't carry the semantic value there that it does in the cooperatively scheduled Twisted approach.


> you're coming from a single-threaded perspective, where Twisted's partially-automated CPS transform is in effect emulating green threads under a cooperative scheduling model, where the "yield" represents yielding control of the current "thread" to potentially other "threads". Your concern over fully automating the transform is because it may introduce further cooperative threading "yield"s that you cannot control.

Yes, that's exactly the case. To scale to multiple CPUs, I use distinct processes, not threads. Shared memory and preemptive execution require a lot of programming effort to play nicely together that I don't care to expend.

The GIL in Python means that threads generally won't ever let you use more than one CPU anyway.


Wow, that's really interesting, thanks. Do you know of any production-level languages that feature this kind of approach?


Scheme is implemented via a CPS transform, so a server built around this approach ought to be quite possible on a good Scheme implementation, or a Lisp implementation that supports the moral equivalent of call/cc.

Functional languages are also frequently implemented using CPS as an intermediate form.
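
The transform itself is small enough to show by hand. A direct-style factorial and its mechanically CPS-converted twin, in plain Python:

    # Direct style: the stack holds the pending multiplications.
    def fact(n):
        if n == 0:
            return 1
        return n * fact(n - 1)

    # After the CPS transform: every function gets an extra continuation
    # argument k, and "return x" becomes "k(x)". The pending work now
    # lives in the chain of closures instead of on the stack.
    def fact_cps(n, k):
        if n == 0:
            k(1)
        else:
            fact_cps(n - 1, lambda result: k(n * result))

    fact_cps(5, print)  # prints 120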

Smalltalk also supports continuations, as I understand it; the Seaside framework for Smalltalk is a server implementation oriented towards continuations, though I think they use it for picking up where you left off in between multiple request/responses, not necessarily for I/O optimization. I could be mistaken.

F# has a feature it calls async workflows; I haven't delved into it much, but it looks like it's trying to make writing async code as close to direct style code as possible. But the full benefit comes from the compiler being able to rewrite all functions on the call chain between you and the function that performs the I/O, so I suspect it may be limited. Ideally, the CLR itself would support CPS transformation into its Begin/End-style async support, like I advocate here:

http://blog.barrkel.com/2006/07/fun-with-asynchronous-method...

(I've been banging on this drum quietly for some time now...)


Just as a minor addition to what barrkel says below, there's a good explanation of CPS transformation, and its use in compiling, in the first edition of EOPL.

I've heard the second and third editions had some of that material cut (too advanced?) - I don't know if that's true, but the first edition is typically much cheaper. FWIW.



