It's expensive enough that the first thing I do when diagnosing slow software is to break out "strace -c", because shockingly often the cause is someone who thought syscalls wouldn't be that expensive.
You're right, they're not that expensive, but once you end up triggering many syscalls in a tight loop where only a tiny fraction of them were needed, it often totally kills performance.
Especially because while the call itself is not that expensive, you rarely call something that does no work. So the goal should be not just to remove the syscalls, but to remove the reason someone made lots of syscalls in the first place.
E.g., a pet peeve of mine: people who call read() with a size of 1 because they don't know the length of the remaining data, instead of doing async reads into a userspace buffer. I've cut CPU use drastically so many times in applications that did that. The problem then is of course not just the context switch, but the massive number of read calls.
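To make the read()-with-size-1 point concrete, here is a minimal Python sketch (os.read is a thin wrapper over read(2)) comparing a byte-at-a-time line reader with one that reads large chunks into a userspace buffer and splits lines there. The function names, the 64 KiB buffer size and the pipe-based demo are assumptions for illustration; it uses plain blocking reads rather than async I/O, and it counts read() calls directly instead of relying on strace.

```python
import os

def make_pipe(data):
    """Hypothetical helper: return the read end of a pipe pre-filled with data.
    Assumes data fits in the kernel pipe buffer so the write cannot block."""
    r, w = os.pipe()
    os.write(w, data)
    os.close(w)  # closing the write end lets the reader see EOF
    return r

def read_lines_bytewise(fd):
    """Anti-pattern: one read() syscall per byte. Returns (lines, read_calls)."""
    lines, current, calls = [], bytearray(), 0
    while True:
        b = os.read(fd, 1)
        calls += 1
        if not b:  # empty result means EOF
            break
        current += b
        if b == b"\n":
            lines.append(bytes(current))
            current.clear()
    if current:  # final line without a trailing newline
        lines.append(bytes(current))
    return lines, calls

def read_lines_buffered(fd, bufsize=64 * 1024):
    """Read large chunks into a userspace buffer and split lines there."""
    lines, pending, calls = [], b"", 0
    while True:
        chunk = os.read(fd, bufsize)
        calls += 1
        if not chunk:
            break
        pending += chunk
        # Everything before the last newline is complete; keep the remainder.
        *complete, pending = pending.split(b"\n")
        lines.extend(line + b"\n" for line in complete)
    if pending:
        lines.append(pending)
    return lines, calls

if __name__ == "__main__":
    data = b"".join(b"line %d\n" % i for i in range(1000))
    for reader in (read_lines_bytewise, read_lines_buffered):
        fd = make_pipe(data)
        lines, calls = reader(fd)
        os.close(fd)
        print(f"{reader.__name__}: {len(lines)} lines, {calls} read() calls")
```

Both readers return the same lines; the bytewise one needs len(data) + 1 read() calls (the extra one is the EOF read), while the buffered one typically needs only a handful. Running the script under strace -c should show the bulk of that difference in the read row.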
FYI, shells do this when reading line by line from non-seekable streams (e.g. pipes). Buffered I/O isn't really feasible when subprocesses may inherit the same file descriptor, and you can't just ungetc(3) into an arbitrary file descriptor. It's a potential performance pitfall worth noting IMO, at least if you ever do text processing with a bunch of read commands rather than sed/awk/whatever like a normal person. Of course this doesn't apply to seekable files, but even there the shell has to seek back to just after the last newline it consumed, so there's that.
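The behaviour described above can be sketched as a single-line reader that never leaves the file offset past the newline it stopped at, so a child process sharing the descriptor continues from the right place. This is a minimal Python sketch, not how any particular shell is implemented: the names `is_seekable` and `read_line_like_shell` and the 4096-byte chunk size are hypothetical, and seekability is probed with `lseek(fd, 0, SEEK_CUR)`, which fails with ESPIPE on pipes.

```python
import os
import tempfile

def is_seekable(fd):
    """Probe seekability; lseek() on a pipe fails with ESPIPE."""
    try:
        os.lseek(fd, 0, os.SEEK_CUR)
        return True
    except OSError:
        return False

def read_line_like_shell(fd, bufsize=4096):
    """Return one line without its newline, or None at EOF.
    Leaves fd positioned just after the consumed newline."""
    out = bytearray()
    if is_seekable(fd):
        while True:
            chunk = os.read(fd, bufsize)  # read ahead in chunks
            if not chunk:
                break
            nl = chunk.find(b"\n")
            if nl != -1:
                out += chunk[:nl]
                # Rewind over the bytes read past the newline.
                os.lseek(fd, nl + 1 - len(chunk), os.SEEK_CUR)
                return bytes(out)
            out += chunk
    else:
        while True:
            b = os.read(fd, 1)  # can't un-read from a pipe, so never over-read
            if not b:
                break
            if b == b"\n":
                return bytes(out)
            out += b
    return bytes(out) if out else None

if __name__ == "__main__":
    data = b"first\nsecond\nthird"
    # Seekable case: a regular temporary file.
    fd, path = tempfile.mkstemp()
    os.write(fd, data)
    os.lseek(fd, 0, os.SEEK_SET)
    print(read_line_like_shell(fd), os.lseek(fd, 0, os.SEEK_CUR))  # b'first' 6
    os.close(fd)
    os.remove(path)
    # Non-seekable case: a pipe, read one byte per syscall.
    r, w = os.pipe()
    os.write(w, data)
    os.close(w)
    print(read_line_like_shell(r), os.read(r, 100))  # b'first' b'second\nthird'
    os.close(r)
```

In both cases whatever reads the descriptor next starts at "second", which is the property the shell needs; the price on the pipe path is one read() per byte.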