The long term maintainability is an important point that most comments here ignore. If you need to run the command once or twice every now and then in an ad hoc way then sure hack together a command line script. But "email Jeff and ask him to run his script" isn't scalable if you need to run the command at a regular interval for years and years and have it work long after Jeff quits.
Some times the killer feature of that data analytics pipeline isn't scalability, but robustness, reproducibility and consistency.
> "email Jeff and ask him to run his script" isn't scalable
Sure, it's not.
But the only alternative to that is not building some monster cluster to process a few gigabytes.
You can write a good script (instead of hacking one together), put it in source control and pull it from there automatically to the production server and run it regularly from cron. Now you have your robustness, reproducibility and consistency as well as much higher performance, for about one-ten-thousandth of the cost.
Some times the killer feature of that data analytics pipeline isn't scalability, but robustness, reproducibility and consistency.