Well, they could be making some simplifications and assumptions that improve performance.
Also, speed is not the only factor to consider: you can optimize a program for speed, but also for RAM usage, or for the size of the binary itself.
For programs like Coreutils, I think keeping the binaries small matters more than anything else. Typically you call a lot of commands in a script to do trivial operations (the input to each program is usually small), so simpler programs, which have less startup time, are usually better.
After the first time the binary is executed, it should sit comfortably in the in-memory disk cache. And frankly, on a modern SSD, the sector size is large enough that the only difference is reading 3 sectors instead of 1 to run "ls". That barely matters if the reads get batched.
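To put rough numbers on the "3 instead of 1 sectors" point, here is a minimal sketch of the arithmetic. The 4096-byte block size and the two binary sizes are assumptions picked for illustration, not measurements of any real "ls", and `blocks_needed` is a hypothetical helper name.

```python
def blocks_needed(size_bytes: int, block_size: int = 4096) -> int:
    """Return how many fixed-size blocks are needed to hold size_bytes.

    block_size defaults to 4096 bytes, which is an assumption; the real
    value depends on the device and filesystem.
    """
    if size_bytes < 0 or block_size <= 0:
        raise ValueError("size_bytes must be >= 0 and block_size must be > 0")
    # Ceiling division without floating point.
    return -(-size_bytes // block_size)


# Hypothetical binary sizes, chosen only to show the calculation.
small_binary = 3_000    # fits in a single block
larger_binary = 11_000  # spills into a third block

print(blocks_needed(small_binary), blocks_needed(larger_binary))
```

Under these assumptions the difference is a couple of extra blocks on the first, uncached read, which is the case the comment above argues is negligible.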