One thing I'd like to add to the discussion is the difference between edge-triggered and level-triggered notifications. Quoting from The Linux Programming Interface (Kerrisk, 2010):
> Level-triggered notification: A file descriptor is considered to be ready if it is possible to perform an I/O system call without blocking.
> Edge-triggered notification: Notification is provided if there is I/O activity (e.g., new input) on a file descriptor since it was last monitored.
The two types can affect the way the program is structured. For example, the same book says that with level-triggered notification, when a file descriptor is ready you generally perform only one read/write operation and then go back to the main loop, whereas with edge-triggered notification you usually perform as much I/O as possible, because you may not be notified again until there is new activity on the descriptor.
In practice, you usually want your file descriptors to be non-blocking anyway, for many reasons (for example, writing a large enough buffer can still block even when the file descriptor was initially reported as ready for writing), so even with level-triggered notification you can read/write in a loop.
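To make that concrete, here is a minimal sketch (my own, not from the book) of the edge-triggered "drain until EAGAIN" pattern using Python's select module. It is Linux-only because it uses epoll, and the host and request are just placeholders:

```python
import select
import socket

# Placeholder peer and request, purely for illustration.
sock = socket.create_connection(("example.com", 80))
sock.sendall(b"GET / HTTP/1.0\r\nHost: example.com\r\n\r\n")
sock.setblocking(False)  # non-blocking, as discussed above

ep = select.epoll()
ep.register(sock.fileno(), select.EPOLLIN | select.EPOLLET)  # edge-triggered

done = False
while not done:
    for fd, events in ep.poll():  # blocks until the kernel reports an edge
        if events & select.EPOLLIN:
            while True:  # drain everything so we don't miss the edge
                try:
                    chunk = sock.recv(4096)
                except BlockingIOError:  # EAGAIN: socket buffer is empty for now
                    break
                if not chunk:  # orderly EOF from the peer
                    done = True
                    break
                print(chunk.decode(errors="replace"), end="")

ep.unregister(sock.fileno())
ep.close()
sock.close()
```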
Personally I believe edge-triggered notifications can make program design slightly simpler, though I'm not exactly sure how much simpler. I'd appreciate it if my comment invited a more detailed and nuanced discussion of the two.
> In practice, you usually want your file descriptors to be non-blocking, for many reasons
I think I know what you're saying, but in practice this is exactly backwards.
Usually you're doing I/O for some practical reason and want to do simple, well-defined, typically sequential processing on the results. Which is to say blocking is your friend and you shouldn't be mucking with parallel I/O paradigms if you can at all avoid it.
> Usually you're doing I/O for some practical reason and want to do simple, well-defined, typically sequential processing on the results. Which is to say blocking is your friend and you shouldn't be mucking with parallel I/O paradigms if you can at all avoid it.
In principle yes, though it becomes annoying when your use-case evolves or you have to miss out on otherwise obvious optimization opportunities because they'd seriously complicate your program.
E.g., your program is mostly sequential, but in one step you'd be able to do a bunch of requests in parallel.
I think paradigms like async/await are a step ahead to give you the best of both worlds here: You can write your programs as if your requests block, but it still uses async IO behind the scenes - and you can drop the pretense of blocking when it makes sense at any time.
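As a rough illustration of what I mean (a sketch using Python's asyncio; the hosts and the fetch() helper are made up), most of the flow reads like blocking code, and the one step that benefits from parallelism drops the pretense with gather():

```python
import asyncio

async def fetch(host: str) -> bytes:
    # Reads like blocking code, but each await yields to the event loop.
    reader, writer = await asyncio.open_connection(host, 80)
    writer.write(b"GET / HTTP/1.0\r\nHost: " + host.encode() + b"\r\n\r\n")
    await writer.drain()
    body = await reader.read()  # read until the server closes the connection
    writer.close()
    await writer.wait_closed()
    return body

async def main() -> None:
    first = await fetch("example.com")  # "as if it blocks"
    # The one step where a bunch of requests can run in parallel:
    others = await asyncio.gather(fetch("example.org"), fetch("example.net"))
    print(len(first), [len(b) for b in others])

asyncio.run(main())
```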
> async/await are a step ahead to give you the best of both worlds here: You can write your programs as if your requests block
Are you thinking of “green”/M:N threading (as found e.g. in Go)?
Async/await (as found e.g. in Python) is precisely what hinders the style you describe: if your brand-new I/O routine is “async colored” to take advantage of non-blocking syscalls, you can’t easily call it from your regular “sync colored” code without “spilling the color” all around, i.e. considerable refactoring.
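A small, made-up illustration of the coloring problem (the names are hypothetical): the sync caller can't just call the async routine; it has to start an event loop or be refactored to async itself, and that requirement tends to spread upward through the call stack:

```python
import asyncio

async def fetch_v2(url: str) -> str:  # the shiny new "async colored" I/O routine
    await asyncio.sleep(0.1)  # stands in for a non-blocking syscall
    return "data from " + url

def legacy_report(url: str) -> str:  # existing "sync colored" code
    # Calling fetch_v2(url) directly only creates a coroutine object; to run it,
    # the sync caller must spin up an event loop (and this breaks again if a loop
    # is already running higher up, which is why the refactoring spreads).
    return asyncio.run(fetch_v2(url))

print(legacy_report("https://example.com"))
```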
I'm a fan of user-space threading for I/O programs that handle more than one stream. It gives you the same context you're used to in blocking programs, without the mostly needless kernel thread context switching.
My context is that most of my work in recent years has been on high-performance systems that handle thousands of streams concurrently on a limited number of cores, with rather little processing per I/O request, so the context-switching cost becomes a high percentage of the actual work done.
Yes. The default of blocking I/O is optimized for simple programs where I/O is not the main thing. That default is perfect for those programs. Another default behavior, killing the program on SIGPIPE, is also optimized for those programs.
But I'm specifically talking about those that need sophisticated strategies to deal with multiplexed I/O (which is a topic of this article you're commenting on).
> Another default behavior, killing the program on SIGPIPE is also optimized for those programs
I've never understood this choice myself. It has always seemed unhelpful given that the read/write returns an error anyhow. Why is it a helpful default?
Exactly, and that relies on every process everywhere properly handling an error return from write(). If you launch a big pipeline from the shell and just one stage goofs this up and keeps writing after the error, the whole thing will stall until you Ctrl-C or otherwise manually kill the process group, which will wreck whatever result you were trying to get from the (already successfully completed!) "head" or "cut" or whatever.
Basically it's a very sane robustness choice and one of the great ideas of classic unix. It's just surprising the first time you stumble over it.
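For what it's worth, this is also why Python scripts in shell pipelines often need a line like the one below: CPython ignores SIGPIPE at startup, so a closed pipe surfaces as BrokenPipeError instead of quietly killing the process. Restoring the classic default gives the behaviour described above (Unix-only sketch):

```python
import signal

# Restore the classic Unix default: terminate silently on SIGPIPE.
signal.signal(signal.SIGPIPE, signal.SIG_DFL)

for i in range(1_000_000):
    # With SIG_DFL, `python3 thisscript.py | head -n 5` ends as soon as head
    # exits, instead of dying with a noisy BrokenPipeError traceback.
    print(f"line {i}")
```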
I had written a program a while back that needed to load all of the customers' outgoing emails from multiple large .pst files. I got a pretty big performance gain by using a thread pool to do the I/O for all the files concurrently and blocking until they all finished with thread join() calls (as opposed to loading them one after another).
The actual runtime difference for me was 40 mins -> 10 mins
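For reference, the shape of that pattern in Python looks roughly like this (a sketch; load_pst() and the file names are stand-ins for the real blocking .pst parsing):

```python
from concurrent.futures import ThreadPoolExecutor

def load_pst(path):
    # Stand-in for the real blocking, I/O-heavy .pst parsing.
    with open(path, "rb") as f:
        return f.read()

paths = ["mail1.pst", "mail2.pst", "mail3.pst"]  # hypothetical inputs

with ThreadPoolExecutor(max_workers=len(paths)) as pool:
    # map() blocks until every task finishes, much like joining the threads.
    results = list(pool.map(load_pst, paths))

print([len(data) for data in results])
```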
You may have missed the "multi threaded vs single threaded" section where they talk about a very similar pattern:
"The way it works is simple, it uses blocking IO, but each blocking call is made in its own thread. Now depending on the implementation, it either takes a callback, or uses a polling model, like returning a Future."
It is very relevant. Both async I/O and multithreaded I/O are ways to extract parallelism; which one is more appropriate depends on the characteristics of the problem.
You also have one-shot notification (EPOLLONESHOT with epoll, EV_ONESHOT with kqueue), which means the event registration is automatically disarmed after it triggers.
It depends on the design of your concurrency model.
For event based systems, you may prefer level-triggered notifications. That's because you want to trigger the callback any time data is available.
For fiber/green-thread based systems, you may prefer edge-triggered notifications, typically one-shot (e.g. EPOLLONESHOT). That's because someone called `wait_until_readable`, and when that function returns, they are done. If they want to wait again, they will call the function again.
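A minimal sketch of that `wait_until_readable` pattern (my own, using Python's epoll bindings, Linux-only, and simplified to a single waiter per call): the fd is armed edge-triggered and one-shot, so once the event fires the registration is disarmed and the next wait re-arms it.

```python
import select

class OneShotWaiter:
    def __init__(self):
        self._ep = select.epoll()
        self._registered = set()

    def wait_until_readable(self, fd):
        flags = select.EPOLLIN | select.EPOLLET | select.EPOLLONESHOT
        if fd in self._registered:
            self._ep.modify(fd, flags)  # re-arm a registration disarmed by the last event
        else:
            self._ep.register(fd, flags)
            self._registered.add(fd)
        while True:
            for ready_fd, _events in self._ep.poll():  # blocks until something fires
                if ready_fd == fd:
                    return  # caller resumes; fd stays disarmed until the next call
            # A real scheduler would dispatch events for other fibers here.
```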