I wrote something similar (but never really finished it), called 'gut', in Go a few years back. Funny thing is that I literally never use it. I thought splitting on regexes and that sort of thing would be super useful, but it turns out that I just use Perl one-liners instead. And Perl is available on something like 99.99% of all *nix machines, which my own 'cut' substitute isn't.
Still a good exercise for me to write it, and I assume for OP too.
It was indeed a great exercise! Part of the motivation for me was also performance. I should add some Perl one-liners to the benchmarks to see where they land as well. My experience is that they are usually a bit slower than awk.
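Something like this cut-style field selection is the kind of one-liner I'd benchmark (field choice arbitrary here; `-a` autosplits each line into `@F`):

```bash
# print fields 1 and 3 of a TSV, roughly equivalent to cut -f1,3
perl -F'\t' -lane 'print join("\t", @F[0,2])' data.tsv
```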
What tool would you recommend to someone who is starting out and wants to learn to write nifty scripts in this day and age? I’m currently studying bash, but there are so many scripting languages that I hear about, and it’s hard to know what to invest time into.
Invest time into what you need to get your job done. Easy when summarized like that, but let's dig in.
First, consider which systems you want your skills to be applicable to.
Do you need tools that work on many random Linux machines that you have little control over? Then go with the lowest common denominator: bash, and the various command line tools (sed, awk, grep) included with every system, and get good with the subset of command line options common to all of them - most likely limited by the oldest system you need to work with. (There are still Windows XP and Red Hat 4 systems out in the wild, if you're unlucky enough to have to work with them.)
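To make that concrete: sticking to POSIX-specified options is what keeps things portable. A sketch (examples mine, not an exhaustive list):

```bash
# POSIX-portable: grep -E works everywhere; GNU-only grep -P does not
grep -E 'foo|bar' file.txt

# sed -i differs between GNU and BSD (BSD requires a suffix argument),
# so a portable in-place edit goes through a temp file instead:
sed 's/foo/bar/' file.txt > file.tmp && mv file.tmp file.txt
```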
Do you need to work with OS X at all? I never learned to use Apple's outdated versions of programs; instead I heavily customized my laptop to have compatible versions of things, but this only works because there's only one OS X machine I ever deal with.
Then it's about the right tool for the right job. Do you want to process text? Awk will take you a long way, but ultimately, Perl is your friend. Do you want more structured programming (i.e., objects/classes)? Then Python is your friend. There's a certain mindset that thinks everything is better if it's all in one language, but that's a trap. With enough work you can do the same thing in any language, but each language is better than the others at some specific thing. (Working with legacy code is one of those things a language can be better at than others.)
These days, it's more important to learn what tools are available and how to use them, because you can just google 'awk print second to last column', plug that into your script, and keep working; there's less of a need to truly grok awk's language (for example). (I mean, spend the time to learn it once so it will come back to you the next time you need to do something more custom with it.)
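That particular search resolves to a one-liner like this (`NF` is the number of fields on the current line):

```bash
# print the second-to-last column of each line
awk '{ print $(NF-1) }' data.txt
```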
For a lot of tasks, POSIX-compliant Bash scripts are more than adequate. Use Perl, Python, or Ruby (your choice) if it becomes more complex (especially with state). It’s worth favoring the ones that are installed by default on most Linux distros.
There’s no reason to chase the scripting language of the month. Bash etc. are extremely well documented, and there’s a very good chance someone has already asked how to do something similar to what you’re doing on Stack Overflow, etc.
A book "Minimal Perl" used to be referred to often in these discussions but I never hear about it any more. It was teaching these kind of tricks for command line magic.
Here's `tac` in Rust, with SIMD optimizations (in the simd branch; I haven't gotten around to releasing it, although the only thing missing is dynamically detecting SIMD support instead of doing it at compile time):
I don't know whether anyone here has used Rexx. The 'parse' instruction in Rexx was incredibly powerful, breaking up text by field/position/delimiter and assigning to variables all in one line.
I've often wondered if there was a command-line equivalent. Awk is great, but you have to 'program' the parsing spec rather than declare it.
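To illustrate the difference (the Rexx line is from memory, so treat it as a sketch): Rexx declares the template, while awk makes you build the split yourself:

```bash
# Rexx: one declarative template, e.g.
#   parse var line user '@' host ':' path
# awk: program the split, then pick fields out by position
echo 'alice@example.com:/srv/data' |
  awk -F'[@:]' '{ print "user=" $1, "host=" $2, "path=" $3 }'
```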
I saw `hck` mentioned on Twitter recently and was impressed to see support for compressed files. From the current todo list, I'm really hoping complement gets implemented.
I see negative indexing is currently "unlikely". I'm writing a similar tool [0], but with bash+awk. I solved negative index support with a `-n` option, which changes the range syntax to use `:` instead of the `-` character.
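So usage looks roughly like this (tool name and field flag here are placeholders; the `-n`/`:` behavior is as described above):

```bash
# default: '-' ranges, no negative indexes
mycut -f 2-4 file.tsv
# with -n: ranges use ':', so '-' is free to mean "from the end"
mycut -n -f 2:-1 file.tsv   # fields 2 through the last
```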
My biggest trouble came with literal field separators [1], because FS can only be specified as a string in awk, and backslash is a metacharacter in both strings and regexps.
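A minimal illustration of the pain (behavior as in gawk/mawk: the `-F` argument goes through string-escape processing first, and a multi-character FS is then treated as a regexp, so a literal backslash separator needs four backslashes):

```bash
printf 'a\\b\\c\n' | awk -F'\\\\' '{ print $2 }'   # prints "b"
```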
<offtopic>
I have implemented a `_split` command that splits a line by a separator, and a `_stat` command that does basically `sort | uniq -c | sort -nr`, counting elements and sorting by frequency. Really useful operations for me.
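In sketch form, roughly (simplified; `_stat` is just the pipeline above, and the default separator for `_split` is a guess):

```bash
_split() {  # split stdin on a separator (default comma), one field per line
  tr "${1:-,}" '\n'
}
_stat() {   # count distinct lines, most frequent first
  sort | uniq -c | sort -nr
}
```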
When my one-liners become 2-3 lines long I need to switch to a regular script, but I also log all my shell commands going back years and have something a bit better than `history | grep word` to search them.
</offtopic>
Nice one, OP. It’s probably mostly due to my lack of knowledge of Rust, but the code is not easy to read, unlike Go. Does anyone feel the same? (btw, nothing to do with how OP wrote it, but rather the language itself)
I don't think even Rust fans would argue with that. Rust has roughly twice the number of reserved words, more operators, and so on. There's a larger basic set of things to learn before you can skim some code and read it.