Reminds me of my first job processing radar data on a Masscomp minicomputer. The full processing was a bunch of simple C programs, all piped together with Unix pipes into a processing chain. Simple and elegant.
We appeared to have a persistent bug, though: processing left running overnight always seemed to crash. So after a couple of nights I stayed in the lab to try to see what the cause was. At 7:30pm the lab door opened and in walked the cleaner, who reached down and unplugged the computer so that he could plug in the vacuum cleaner. Problem solved.
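For the curious, a chain like that is really just a handful of single-purpose filters strung together by the shell. A minimal sketch, with hypothetical stage names standing in for the actual C programs:

    # Each stage reads records on stdin and writes on stdout;
    # Unix pipes wire the stages into a chain.
    decode_sweeps < raw_radar.dat \
      | filter_clutter \
      | detect_targets \
      | tee targets.log \
      | summarize_tracks > nightly_report.txt

The appeal is that each program stays small enough to test on its own, and the shell handles all the plumbing.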
That article totally misrepresents the normal use case for a Hadoop cluster, though. Hadoop clusters are meant for when you have multiple petabytes of data and network bandwidth becomes the bottleneck for batch processing jobs; the whole point of HDFS is to ship the computation to the data rather than the data to the computation.
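To make that concrete, the kind of workload Hadoop is built for looks roughly like this Hadoop Streaming invocation (the paths and the mapper/reducer scripts here are hypothetical; the flags are the standard ones):

    hadoop jar $HADOOP_HOME/share/hadoop/tools/lib/hadoop-streaming-*.jar \
      -input /data/clickstream/ \
      -output /out/daily-aggregates \
      -mapper parse_events.py \
      -reducer aggregate_by_user.py \
      -file parse_events.py -file aggregate_by_user.py

The framework tries to schedule each mapper on a node that already holds its input block, which is the data-locality point: at that scale you ship the code to the data, not the data to the code.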
Let me know when your command-line tools run multiple large-scale processing jobs on petabyte datasets.
His article amounted to constructing a straw man about why people use Hadoop and then attacking it.
I've worked with these "straw men" you say he constructed; they are absolutely out there. There was a time in 2014 when Hadoop/MapReduce was the hammer and every problem out there looked like a nail.
How many people have used Hadoop for a project?
How many petabyte+ datasets do you think are out there?
Unless you truly believe the answers to those two questions are the same, I think you can see why that article had to be written.
I feel it is in a similar spirit to the OP here. In both articles the important takeaway for me is focusing on an approach and tools that match the problem. It sounds like you are making a similar point.
I also appreciate the quality of the writing and the step-by-step journey.
When I first read the article, I found it timely: I felt bombarded by IaaS & PaaS offerings with “web scale” solutions to problems I couldn’t relate to, basic examples, and few case studies.