One thing I'd like to add to the discussion is the difference between edge-triggered and level-triggered notifications. Quoting from The Linux Programming Interface (Kerrisk, 2010):
> Level-triggered notification: A file descriptor is considered to be ready if it is possible to perform an I/O system call without blocking.
> Edge-triggered notification: Notification is provided if there is I/O activity (e.g., new input) on a file descriptor since it was last monitored.
The two types can affect the way the program is structured. For example, the same book says that with level-triggered notification, when a file descriptor is ready you generally perform only one read/write operation and then go back to the main loop, whereas with edge-triggered notification you usually perform as much I/O as possible, because you may not be notified again until there is new activity on the descriptor.
In practice, you usually want your file descriptors to be non-blocking anyway, for many reasons (for example, writing a large enough buffer can still block even when the file descriptor was initially reported as ready for writing), so even with level-triggered notification you can read/write in a loop.
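To make that concrete, here is a minimal sketch (my own, not from the book) of the edge-triggered "drain until EAGAIN" pattern using Python's select module. It is Linux-only because it uses epoll, and the host and request are just placeholders:

```python
import select
import socket

# Placeholder peer and request, purely for illustration.
sock = socket.create_connection(("example.com", 80))
sock.sendall(b"GET / HTTP/1.0\r\nHost: example.com\r\n\r\n")
sock.setblocking(False)  # non-blocking, as discussed above

ep = select.epoll()
ep.register(sock.fileno(), select.EPOLLIN | select.EPOLLET)  # edge-triggered

done = False
while not done:
    for fd, events in ep.poll():  # blocks until the kernel reports an edge
        if events & select.EPOLLIN:
            while True:  # drain everything so we don't miss the edge
                try:
                    chunk = sock.recv(4096)
                except BlockingIOError:  # EAGAIN: socket buffer is empty for now
                    break
                if not chunk:  # orderly EOF from the peer
                    done = True
                    break
                print(chunk.decode(errors="replace"), end="")

ep.unregister(sock.fileno())
ep.close()
sock.close()
```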
Personally I believe edge-triggered notifications can make program design slightly simpler, though I'm not exactly sure how much simpler. I'd appreciate it if my comment invited a more detailed and nuanced discussion of the two.
> In practice, you usually want your file descriptors to be non-blocking, for many reasons
I think I know what you're saying, but in practice this is exactly backwards.
Usually you're doing I/O for some practical reason and want to do simple, well-defined, typically sequential processing on the results. Which is to say blocking is your friend and you shouldn't be mucking with parallel I/O paradigms if you can at all avoid it.
> Usually you're doing I/O for some practical reason and want to do simple, well-defined, typically sequential processing on the results. Which is to say blocking is your friend and you shouldn't be mucking with parallel I/O paradigms if you can at all avoid it.
In principle yes, though it becomes annoying when your use-case evolves or you have to miss out on otherwise obvious optimization opportunities because they'd seriously complicate your program.
E.g., your program is mostly sequential, but in one step you'd be able to do a bunch of requests in parallel.
I think paradigms like async/await are a step ahead to give you the best of both worlds here: You can write your programs as if your requests block, but it still uses async IO behind the scenes - and you can drop the pretense of blocking when it makes sense at any time.
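As a rough illustration of what I mean (a sketch using Python's asyncio; the hosts and the fetch() helper are made up), most of the flow reads like blocking code, and the one step that benefits from parallelism drops the pretense with gather():

```python
import asyncio

async def fetch(host: str) -> bytes:
    # Reads like blocking code, but each await yields to the event loop.
    reader, writer = await asyncio.open_connection(host, 80)
    writer.write(b"GET / HTTP/1.0\r\nHost: " + host.encode() + b"\r\n\r\n")
    await writer.drain()
    body = await reader.read()  # read until the server closes the connection
    writer.close()
    await writer.wait_closed()
    return body

async def main() -> None:
    first = await fetch("example.com")  # "as if it blocks"
    # The one step where a bunch of requests can run in parallel:
    others = await asyncio.gather(fetch("example.org"), fetch("example.net"))
    print(len(first), [len(b) for b in others])

asyncio.run(main())
```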
> async/await are a step ahead to give you the best of both worlds here: You can write your programs as if your requests block
Are you thinking of “green”/M:N threading (as found e.g. in Go)?
Async/await (as found e.g. in Python) is precisely what hinders the style you describe: if your brand-new I/O routine is “async colored” to take advantage of non-blocking syscalls, you can’t easily call it from your regular “sync colored” code without “spilling the color” all around, i.e. considerable refactoring.
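A small, made-up illustration of the coloring problem (the names are hypothetical): the sync caller can't just call the async routine; it has to start an event loop or be refactored to async itself, and that requirement tends to spread upward through the call stack:

```python
import asyncio

async def fetch_v2(url: str) -> str:  # the shiny new "async colored" I/O routine
    await asyncio.sleep(0.1)  # stands in for a non-blocking syscall
    return "data from " + url

def legacy_report(url: str) -> str:  # existing "sync colored" code
    # Calling fetch_v2(url) directly only creates a coroutine object; to run it,
    # the sync caller must spin up an event loop (and this breaks again if a loop
    # is already running higher up, which is why the refactoring spreads).
    return asyncio.run(fetch_v2(url))

print(legacy_report("https://example.com"))
```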
I'm a fan of user-space threading for I/O programs that handle more than one stream. It gives you the same context you're used to in blocking programs, without the mostly needless kernel thread context switching.
My context is that most of my work in recent years has been on high-performance systems that handle thousands of streams concurrently on a limited number of cores, with rather little processing per I/O request, so the context-switching cost becomes a high percentage of the actual work done.
Yes. The default of blocking I/O is optimized for simple programs where I/O is not the main thing. That default is perfect for those programs. Another default behavior, killing the program on SIGPIPE, is also optimized for those programs.
But I'm specifically talking about those that need sophisticated strategies to deal with multiplexed I/O (which is a topic of this article you're commenting on).
> Another default behavior, killing the program on SIGPIPE is also optimized for those programs
I've never understood this choice myself. It has always seemed unhelpful given that the read/write returns an error anyhow. Why is it a helpful default?
Exactly, and that relies on every process everywhere properly handling an error return from write(). If you launch a big pipeline from the shell and just one stage goofs this up and keeps writing after the error, the whole thing will stall until you Ctrl-C or otherwise manually kill the process group, which will wreck whatever result you were trying to get from the (already successfully completed!) "head" or "cut" or whatever.
Basically it's a very sane robustness choice and one of the great ideas of classic unix. It's just surprising the first time you stumble over it.
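For what it's worth, this is also why Python scripts in shell pipelines often need a line like the one below: CPython ignores SIGPIPE at startup, so a closed pipe surfaces as BrokenPipeError instead of quietly killing the process. Restoring the classic default gives the behaviour described above (Unix-only sketch):

```python
import signal

# Restore the classic Unix default: terminate silently on SIGPIPE.
signal.signal(signal.SIGPIPE, signal.SIG_DFL)

for i in range(1_000_000):
    # With SIG_DFL, `python3 thisscript.py | head -n 5` ends as soon as head
    # exits, instead of dying with a noisy BrokenPipeError traceback.
    print(f"line {i}")
```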
I had written a program a while back that needed to load all of the customers' outgoing emails from multiple large .pst files. I got a pretty big performance gain by using a thread pool to do the I/O for all the files concurrently and blocking until they all finished with thread join() calls (as opposed to loading them one after another).
The actual runtime difference for me was 40 mins -> 10 mins
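For reference, the shape of that pattern in Python looks roughly like this (a sketch; load_pst() and the file names are stand-ins for the real blocking .pst parsing):

```python
from concurrent.futures import ThreadPoolExecutor

def load_pst(path):
    # Stand-in for the real blocking, I/O-heavy .pst parsing.
    with open(path, "rb") as f:
        return f.read()

paths = ["mail1.pst", "mail2.pst", "mail3.pst"]  # hypothetical inputs

with ThreadPoolExecutor(max_workers=len(paths)) as pool:
    # map() blocks until every task finishes, much like joining the threads.
    results = list(pool.map(load_pst, paths))

print([len(data) for data in results])
```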
You may have missed the "multi threaded vs single threaded" section where they talk about a very similar pattern:
"The way it works is simple, it uses blocking IO, but each blocking call is made in its own thread. Now depending on the implementation, it either takes a callback, or uses a polling model, like returning a Future."
It is very relevant. Both async I/O and multithreaded I/O are ways to extract parallelism; which one is more appropriate depends on the characteristics of the problem.
You also have one-shot notification (EPOLLONESHOT with epoll, EV_ONESHOT with kqueue), which means the event registration is automatically disarmed after it triggers.
It depends on the design of your concurrency model.
For event based systems, you may prefer level-triggered notifications. That's because you want to trigger the callback any time data is available.
For fiber/green-thread based systems, you may prefer edge-triggered notifications, typically one-shot (e.g. EPOLLONESHOT). That's because someone called `wait_until_readable`, and when that function returns, they are done. If they want to wait again, they will call the function again.
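A minimal sketch of that `wait_until_readable` pattern (my own, using Python's epoll bindings, Linux-only, and simplified to a single waiter per call): the fd is armed edge-triggered and one-shot, so once the event fires the registration is disarmed and the next wait re-arms it.

```python
import select

class OneShotWaiter:
    def __init__(self):
        self._ep = select.epoll()
        self._registered = set()

    def wait_until_readable(self, fd):
        flags = select.EPOLLIN | select.EPOLLET | select.EPOLLONESHOT
        if fd in self._registered:
            self._ep.modify(fd, flags)  # re-arm a registration disarmed by the last event
        else:
            self._ep.register(fd, flags)
            self._registered.add(fd)
        while True:
            for ready_fd, _events in self._ep.poll():  # blocks until something fires
                if ready_fd == fd:
                    return  # caller resumes; fd stays disarmed until the next call
            # A real scheduler would dispatch events for other fibers here.
```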