Show HN: Hck – a fast and flexible cut-like tool (github.com/sstadick)
151 points by totalperspectiv on July 10, 2021 | 34 comments



I wrote something similar (but never really finished it), called 'gut', in Go a few years back. Funny thing is that I literally never use it. I thought splitting on regexes and that sort of thing would be super useful, but it turns out I just use Perl one-liners instead. And Perl is available on something like 99.99% of all *nix machines, which my own 'cut' substitute isn't.

Still a good exercise for me to write it, and I assume for OP too.


It was indeed a great exercise! Part of the motivation for me was also performance. I should add some Perl one-liners to the benchmarks to see where they land as well. My experience is that they are usually a bit slower than awk.
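For anyone curious, the kind of head-to-head I have in mind looks something like this (a toy illustration, not the repo's actual benchmark setup; data.tsv is a placeholder):

    # cut: fields 1 and 3 of a tab-separated file
    cut -f1,3 data.tsv
    # the awk equivalent
    awk -F'\t' '{print $1 "\t" $3}' data.tsv
    # the Perl equivalent
    perl -F'\t' -lane 'print join "\t", @F[0,2]' data.tsv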


I’ve never used Perl, but I love concise bash one-liner wizard incantations. What are some examples of things it’s handy for?



I wrote a couple of articles showing examples where Perl is more suitable than sed/awk (a quick taste follows the links):

* https://www.perl.com/article/perl-one-liners-part-1/

* https://www.perl.com/article/perl-one-liners-part-2/
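Two quick examples (my own toy illustrations, not pulled from the articles; the file names are placeholders):

    # in-place edit across many files, keeping .bak backups
    perl -i.bak -pe 's/\bfoo\b/bar/g' *.txt
    # slurp the whole file at once and strip HTML comments, even ones spanning lines
    perl -0777 -pe 's/<!--.*?-->//gs' page.html

Both are doable with sed/awk in a pinch, but the Perl versions behave the same everywhere Perl is installed.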


What tool would you recommend to someone who is starting out and wants to learn to write nifty scripts in this day and age? I’m currently studying bash, but there are so many scripting languages I hear about, and it’s hard to know what to invest time into.


Invest time into what you need to get your job done. Easy when summarized like that, but let's dig in.

First consider what systems you want your skills to be applicable for.

Do you need tools that work on many random Linux machines that you have little control over? Then go with the lowest common denominator: bash and the various command-line tools (sed, awk, grep) included with every system, and get good with the subset of command-line options common to all of them - most likely limited by the oldest system you need to work with. (There are still Windows XP and Red Hat 4 systems out in the wild, if you're unlucky enough to have to work with them.)

Do you need to work with OS X at all? I never learned to use Apple's outdated versions of programs; instead I heavily customized my laptop to have compatible versions of things, but this only works because there's one OS X machine I ever deal with.

Then it's about the right tool for the right job. Do you want to process text? Awk will take you a long way, but ultimately, Perl is your friend. Do you want to do more structured-programming type things (aka objects/classes)? Then Python is your friend. There's a certain mindset that thinks everything is better if it's all in one language, but that's a trap. With enough work you can do the same thing in any language, but each language is better than others at some specific thing. (Working with legacy code is one such thing a language can be better at than others.)

These days it's more important to learn what tools are available than to master every detail of how to use them: you can just google 'awk print second to last column', plug that into your script, and continue working, so there's less of a need to truly grok awk's language (for example). (I mean, spend the time to learn it once so it will come back to you the next time you need to do something more custom with it.)
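For the record, that particular search lands on the NF trick (file.txt being whatever you're processing):

    awk '{print $(NF-1)}' file.txt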


> instead I heavily customized my laptop to have compatible versions of things, but this only works because there's one OS X machine I ever deal with.

This is all good advice, but to be fair, "heavily customized" these days amounts to little more than:

    brew install awk coreutils findutils gnu-tar gnu-sed gnu-which gnu-time


For a lot of tasks, POSIX-compliant Bash scripts are more than adequate. Use Perl, Python, or Ruby (your choice) if it becomes more complex (especially with state). It’s worth favoring ones that are installed by default on most Linux distros.

There’s no reason to chase the script/lang of the month. Bash etc. are extremely well documented, and there’s a very good chance someone has already asked how to do something similar to what you’re doing on Stack Overflow, etc.


A book "Minimal Perl" used to be referred to often in these discussions but I never hear about it any more. It was teaching these kind of tricks for command line magic.


Love seeing these modern alternatives to coreutils! Ripgrep, fd, hyperfine, bat, exa, bottom, gdu, wc, sd, hexyl...

Yet to find a GNU 'tr' alternative though


Here's `tac` in Rust, with SIMD optimizations (in the simd branch; I haven't gotten around to releasing it, although the only thing missing is dynamically detecting SIMD support instead of doing it at compile time):

https://github.com/neosmart/tac

and `rewrite`, which I've been told is akin to GNU sponge, "rewritten" in Rust:

https://github.com/neosmart/rewrite



Ripgrep is really the one that stood out for me. It feels substantially faster to use and does seem to do sane things more often than grep.

I'd recommend anyone give it a try.


> Ripgrep, fd, hyperfine, bat, exa, bottom, gdu, wc, sd, hexyl...

Thanks for that list! Is there any place where more of these "modern alternatives to coreutils" are collected?



What would you like it to do?


It's not like anyone absolutely needs it; I was just fascinated by the recent surge of faster, more cross-platform utilities.


Get it to play nicer with iconv and we've got an improvement


Nice work!

I don't know whether anyone here has used Rexx. The 'parse' instruction in Rexx was incredibly powerful, breaking up text by field/position/delimiter and assigning to variables all in one line.

I've often wondered if there was a command-line equivalent. Awk is great but you have to 'program' the parsing spec, rather than declare it.


> Awk is great but you have to 'program' the parsing spec, rather than declare it.

You could probably turn a declarative spec into an awk program with an awk program.
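A toy sketch of that idea (the spec format here is entirely made up: named fields get printed, `_` means skip):

    $ spec='name _ uid'
    $ prog=$(echo "$spec" | awk '{
        printf "{ "
        for (i = 1; i <= NF; i++)
          if ($i != "_") printf "printf \"%%s \", $%d; ", i
        print "print \"\" }"
      }')
    $ awk -F: "$prog" /etc/passwd
    root 0
    daemon 1
    ...

The generated program is just `{ printf "%s ", $1; printf "%s ", $3; print "" }`.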


Not declarative, but Perl can do something like that.

Delimiters/Regex:

  $ perl -ne '($name,$pass,$uid,$gid,$therest)=split(/:/);print "$name $gid\n"' /etc/passwd
  root 0
  daemon 1
  bin 2
  ...
Fixed width:

  $ printf "1234XY\n5678AB" | perl -ne '($f1,$f2)=unpack("a4 a2");print "$f2 $f1\n"'
  XY 1234
  AB 5678
I believe Rexx's parse is fancier still, but this is reasonably close.


You might want to look into what the following CLI options do for you:

  perl -F':' -anE 'say $F[0]'
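(Spoiler: `-F` sets the split pattern, `-a` autosplits each line into `@F`, `-n` adds the implicit input loop, and `-E` is `-e` with newer features such as `say` enabled. In recent perls, `-F` implies `-a`, which implies `-n`.)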


That is indeed good. I used a bit of Perl a few years back but it has slipped out of my mind.


It is interesting to note how it compares to "choose" (also in Rust) in the benchmarks.

Single character:

    hck           1.494 ± 0.026s
    hck (no-mmap) 1.735 ± 0.004s
    choose        4.597 ± 0.016s
Multi character:

    hck           2.127 ± 0.004s
    hck (no-mmap) 2.467 ± 0.012s
    choose        3.266 ± 0.011s
The single-pass optimization trick[1] seems to be helping a lot in the single-character case.

Of course, doing away with a pass is supposed to give at most 2x, and I am wondering whether the regex constraint led to this "side-effect".

[1] fast mode - https://github.com/sstadick/hck/blob/master/src/lib/core.rs#... https://github.com/sstadick/hck/blob/master/src/lib/core.rs#...


I saw `hck` recently on Twitter and was impressed by the support for compressed files. From the current todo list, I especially hope complement gets implemented.

I see negative indexing is currently "unlikely". I'm writing a similar tool [0], but with bash+awk. I solved negative index support with a `-n` option, which changes the range syntax to the `:` character instead of `-`.

My biggest trouble came with literal field separators [1], because FS can only be specified as a string in awk, and backslash is a metacharacter for both strings and regexps.

[0] https://github.com/learnbyexample/regexp-cut

[1] https://learnbyexample.github.io/escaping-madness-awk-litera...


<offtopic> I have implemented a `_split` command that splits a line by a separator, and a `_stat` command that does basically `sort | uniq -c | sort -nr`, counting elements and sorting by frequency. Really useful operations for me.
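A minimal sketch of what those might look like as shell functions (my guess at the interface from the description above; the separator argument defaults to `:`):

    # one field per output line, separator given as $1
    _split() { tr "${1:-:}" '\n'; }
    # frequency table: count occurrences, most common first
    _stat() { sort | uniq -c | sort -nr; }
    # e.g. the most common login shells on the box:
    awk -F: '{print $NF}' /etc/passwd | _stat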

When my one-liners become 2-3 lines long I switch to a regular script, but I also log all my shell commands going back years and have something a bit better than `history | grep word` to search them.


> I also log all my shell commands going back years and have something a bit better than `history | grep word` to search them.

I'd be very interested to hear more about this.


The README and description shouldn't assume the reader knows what `cut` is or what it's used for. Maybe reference it and then ELI5.


Nice one, OP. It’s probably due to my lack of knowledge of Rust, but the code is not easy to read, unlike Go. Does anyone feel the same? (To be clear, this has nothing to do with how OP wrote it, but rather the language itself.)


I don't think even Rust fans would dispute that. Rust has roughly twice as many reserved words, more operators, and so on. There's a larger basic set of things to learn before you can skim some code and read it.


Yay, no more piping multiple cuts when you have multiple delimiters.
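The before/after, roughly - note the hck flags are from my reading of its README (where `-d` takes a regex), so double-check:

    # with cut, one pipe stage per delimiter:
    echo 'a:b c:d' | cut -d' ' -f2 | cut -d: -f1   # -> c
    # with a regex delimiter, a single invocation handles both:
    echo 'a:b c:d' | hck -d'[ :]' -f3              # -> c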


Heck


> hck is a shortening of hack, a rougher form of cut.



