
Twisted's @inlineCallbacks decorator basically does this for you; whenever you yield a deferred (Twisted's representation of an ongoing operation), the rest of the generator is registered as the callback for when the operation is finished. Your example becomes:

  from twisted.internet import defer

  @defer.inlineCallbacks
  def A(x):
      x = yield B(x)
      C(x)

  @defer.inlineCallbacks
  def B(x):
      factor = yield AsyncCall()
      defer.returnValue(x * factor)



Yes; some people have also been (ab)using C# iterators to try and make implementing this easier. But to bang on the same drum again, it shouldn't be necessary for the programmer to fiddle with all the methods on the call chain between initiating request processing and the point of any I/O (which may be deep in e.g. database-handling logic, for async DB I/O). The compiler or runtime ought to be able to help out.
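For illustration, here is a minimal sketch of the mechanism such generator-based tricks rely on (this is not Twisted's actual implementation; Future and run_coroutine are invented names): a trampoline resumes the generator whenever a result arrives, so the rest of the generator body effectively becomes the callback.

```python
# Minimal sketch (invented names, not Twisted internals): drive a
# generator so that each yielded "future" resumes it on completion.

class Future:
    """A result that may arrive later; notifies a single callback."""
    def __init__(self):
        self.callback = None

    def set_result(self, value):
        if self.callback is not None:
            self.callback(value)

def run_coroutine(gen):
    """Resume gen with each result; the remainder of the generator
    body is, in effect, registered as the callback."""
    def step(value):
        try:
            future = gen.send(value)
        except StopIteration:
            return
        future.callback = step
    step(None)

# Demo: the generator "blocks" at yield until the result is set.
results = []
f = Future()

def task():
    factor = yield f           # suspends here until f fires
    results.append(2 * factor)

run_coroutine(task())
f.set_result(21)               # resumes the generator
```

The point of the drum-banging above is that this wiring is exactly the boilerplate a compiler or runtime could generate mechanically.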


Whether or not a function is atomic is part of its contract; forcing code changes up the call stack when a function's contract changes is perfectly reasonable.

In Twisted, all code in normal functions is executed atomically, and code in @inlineCallbacks functions is only interrupted at yield statements. If any random function could release control to the scheduler without notice to its callers, certain classes of bug would require digging through the code of every callee to figure out what's going on. I rely on that atomicity far more often than I change a function from nonblocking to blocking, so in my case, the tradeoff is a favorable one.
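The atomicity being relied on can be shown in a small sketch (plain generators standing in for @inlineCallbacks functions, driven by an invented round-robin scheduler): a read-modify-write on shared state needs no lock, because other tasks can only run at an explicit yield.

```python
# Sketch: under cooperative scheduling, the only interruption points
# are explicit yields, so unlocked updates to shared state are safe.
# The round-robin scheduler here is invented for illustration.
from collections import deque

counter = 0

def worker(n):
    global counter
    for _ in range(n):
        counter += 1   # read-modify-write with no yield inside: atomic
        yield          # the only point where another task may run

def run(tasks):
    queue = deque(tasks)
    while queue:
        task = queue.popleft()
        try:
            next(task)
            queue.append(task)  # reschedule after its yield
        except StopIteration:
            pass

run([worker(1000), worker(1000)])
# Under preemptive threads the unlocked increment could lose updates;
# here the final count is exact.
```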

Also, if you don't care about the response, there's no fiddling necessary. Just call the function and drop the result; the operation will still happen at the right time, and the result will get dropped on the floor, just as you expected.

[EDIT: changed "blocking" to "atomic", since that's what I'm really talking about.]


You can define your coding standards such that blocking is part of the contract, but I disagree that there is a natural need to make blocking part of the contract.

For example, consider switching between a non-blocking in-memory DB (perhaps flushed in a background thread) and communicating with a database across the network. Why would a module communicating with such a DB need to advertise in its contract which DB it is communicating with?
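A hedged sketch of that point (all class and function names invented): the caller passes a callback either way, so the same calling code works whether the backend answers immediately or only once a network reply arrives; nothing in the caller's contract reveals which it is.

```python
# Sketch: identical caller code for a synchronous in-memory backend
# and an asynchronous network-style backend.  All names are invented.

class InMemoryDB:
    def __init__(self, data):
        self.data = data

    def get(self, key, callback):
        callback(self.data[key])              # fires synchronously

class NetworkDB:
    def __init__(self, data):
        self.data = data
        self.pending = []

    def get(self, key, callback):
        self.pending.append((key, callback))  # fires when a reply arrives

    def deliver(self):
        for key, callback in self.pending:
            callback(self.data[key])
        self.pending.clear()

def lookup(db, key, results):
    db.get(key, results.append)               # same caller code for both

# Demo: only the timing of the callback differs.
r1, r2 = [], []
lookup(InMemoryDB({"a": 1}), "a", r1)         # answered immediately
net = NetworkDB({"a": 1})
lookup(net, "a", r2)                          # answered later
net.deliver()
```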

The point being, a context switch can still occur at any moment, so you don't actually get the kind of atomicity you seem to be aiming for in general, unless you are operating in a shared-nothing single-threaded environment.


It doesn't need to advertise which DB it's communicating with. It does need to advertise whether or not it operates atomically. There's no reason why the in-memory db can't advertise itself as non-atomic, in which case changing to the network communication shouldn't break anything. Similarly, the network DB can operate synchronously and hold control of the program counter during the entire operation.

Saying that all functions are non-atomic is certainly a reasonable policy, and one that is used in many situations. Personally, I find it more convenient to program in an environment that does have this distinction. Apparently, your tastes are different than mine.


I'm not sure what you mean by "atomic".

In a multithreaded world, synchronous code storing its state on the stack, and asynchronous code derived from a CPS transform of direct style (thus storing its state in continuations), seem equivalent for the purposes of "atomicity"; since the OS scheduler can come in and reschedule the synchronous thread at any point (and will definitely context switch if the thread blocks), it seems no different to the CPS transformed case (OS may choose to context switch at any time).

Things like mutex locks do work differently in the CPS case, but it's rather that they are asynchronous too: blocking on a mutex means handing the continuation off to the mutex to continue when it's available to be acquired. And things like thread-local storage work differently - the logical thread of control may change physical threads in the CPS async case, but a runtime or compiler (which will be aware of the CPS transformation) can abstract this away.
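Here's a sketch of that idea in plain callback style (a toy lock invented for illustration; Twisted's real analogue is twisted.internet.defer.DeferredLock): acquiring a held lock parks the continuation in a queue, and release hands the lock straight to the next waiter.

```python
# Toy cooperative mutex (invented; not Twisted's DeferredLock).
# "Blocking" on the lock means handing it your continuation.
from collections import deque

class CooperativeLock:
    def __init__(self):
        self.locked = False
        self.waiters = deque()   # parked continuations

    def acquire(self, continuation):
        if self.locked:
            self.waiters.append(continuation)  # park, don't block a thread
        else:
            self.locked = True
            continuation()                     # lock was free: run now

    def release(self):
        if self.waiters:
            self.waiters.popleft()()  # hand the lock straight to a waiter
        else:
            self.locked = False

# Demo: the second continuation runs only once the lock is released.
order = []
lock = CooperativeLock()
lock.acquire(lambda: order.append("first"))   # runs immediately
lock.acquire(lambda: order.append("second"))  # parked until release
lock.release()                                # wakes the waiter
```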

So, what do you mean by "atomic"?


I mean that no outside changes will be made to the environment during the execution of the atomic block of code, and that any intermediate state of the environment during the atomic block of code will not be seen by any code outside the block. In this case "the environment" is the process's memory space.

Because I can do so much without having to worry about other threads of execution seeing my internal state, I rarely have to bother with locks at all, even when dealing with globals. I'm not using threads at all; if I need to take advantage of multiple CPUs, I run multiple distinct processes.

[Edit: clarifying single-threaded]


What do you mean, "outside changes"? Another thread could come in and make changes inside the same process's memory space - unless you are specifically talking about a single-threaded design.

From the "inside", a thread of execution, and a CPS-transformed chain of continuation calls (originally written in direct style), "feel" exactly the same - precisely because there is a mechanical translation from one to the other with no loss in semantics.

Here's what I suspect you're trying to get at: you're coming from a single-threaded perspective, where Twisted's partially-automated CPS transform is in effect emulating green threads under a cooperative scheduling model, where the "yield" represents yielding control of the current "thread" to potentially other "threads". Your concern over fully automating the transform is because it may introduce further cooperative threading "yield"s that you cannot control.

My point is that under a preemptively threaded environment, any thread could at any time hijack the CPU and mutate your state, and this is no different whether the code is synchronous or CPS-transformed. A "yield" may occur randomly at any point in your code. Hence, requiring methods to be annotated before they can be CPS-transformed in preemptively multithreaded direct code is a chore; the extra information in the contract doesn't carry the semantic value that it does in the cooperatively scheduled Twisted approach.


you're coming from a single-threaded perspective, where Twisted's partially-automated CPS transform is in effect emulating green threads under a cooperative scheduling model, where the "yield" represents yielding control of the current "thread" to potentially other "threads". Your concern over fully automating the transform is because it may introduce further cooperative threading "yield"s that you cannot control.

Yes, that's exactly the case. To scale to multiple CPUs, I use distinct processes, not threads. Shared memory and preemptive execution require a lot of programming effort to play nicely together that I don't care to expend.

The GIL in python means that threads generally won't ever let you use more than a single CPU anyway.



