Yeah, let me just multithread and hand-optimize the piss out of this Go program until it's marginally faster than it would have been in C.
Reminds me of the argument that venison tastes better than beef. The argument roughly being, "If you shoot the deer right, drag it home right, gut it right, skin it right, tenderize it right and cook it _just_ right, it'll be _almost_ as good as store-bought frozen beef"
Author here. I think you should go through the article again. I think it's quite readable, and there are no "hand-optimizations" as you say. Also, the single-core implementation was already faster than the C version - the multithreaded version was only done to explore different methods of concurrency in Go.
To be fair the author does say "it has since turned into a game of trying to take on the venerable wc with different languages". Of course the real message is that the original wc isn't particularly efficient and it can be beaten in many different languages - including C.