Unexpected Interaction of Features

zimpenfish · on June 27, 2023

You can work around this by specifying the number field twice. Which makes no sense to me but hey, it works.

    > cat blah | sort -u -k 1n -k 1
    1 a
    5 which
    10 exotically
    15 aerodynamically
    15 differentiation
    20 electroencephalogram
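
A possible explanation, hedged: with `-u`, two lines count as duplicates only when every key compares equal. Adding a plain string key on the same field means lines with the same number but different text no longer compare equal. A minimal sketch with made-up input, assuming a POSIX-style `sort` and the C locale:

```shell
# Made-up sample: two lines share the number 15 but differ in text.
input='15 differentiation
15 aerodynamically
1 a'

# Single numeric key: both "15 ..." lines compare equal, so -u keeps only one.
numeric_only=$(printf '%s\n' "$input" | LC_ALL=C sort -u -n)

# Numeric key plus a string key on the same field: the two lines now differ
# on the second key, so -u keeps both.
two_keys=$(printf '%s\n' "$input" | LC_ALL=C sort -u -k 1n -k 1)

printf '%s\n' "$numeric_only" | wc -l   # 2 lines
printf '%s\n' "$two_keys" | wc -l       # 3 lines
```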

DougBTX · on June 27, 2023

Nice find. The docs say...

    -n    compare according to string numerical value
    -u    output only the first of an equal run

...so the implementation seems clear enough. With -n the comparison is numerical, with -u lines which compare equal are deduplicated. But almost certainly not as intended!

ReleaseCandidat · on June 27, 2023

Because of this I always use `uniq`.
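
A minimal sketch of that approach, with made-up input. Without `-u`, `sort -n` breaks numeric ties by comparing whole lines, so identical lines end up adjacent and `uniq` only drops exact duplicates:

```shell
# Made-up sample with one exact duplicate ("2 foo") and one line that only
# shares the number ("2 bar").
deduped=$(printf '2 foo\n2 bar\n1 one\n10 ten\n3 three\n2 foo\n' \
  | LC_ALL=C sort -n \
  | uniq)

# "2 bar" survives; only the repeated "2 foo" is dropped.
printf '%s\n' "$deduped"
```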

POSIX's documentation and Apple's man page are at least a bit better formulated:

    -u Unique: suppress all but one in each set of lines having equal keys.
    [...]
    -n Restrict the sort key to an initial numeric string, [...]

https://pubs.opengroup.org/onlinepubs/9699919799/utilities/s...

than the GNU version. It looks like the person documenting `-u` didn't know what it does (never attribute to malice ...):

    -u, --unique
              with -c, check for strict ordering; without -c, output only  the
              first of an equal run

srean · on June 27, 2023

If you want to stick with Bash then the commands split, join, cut, paste, join come in very handy.

For me there is a threshold beyond which I move to AWK. Some would prefer Perl over AWK, but I cannot hold all of Perl's syntax and keywords in my brain.

Sometimes the needed processing is either a lot for AWK, or I have to be on my best behavior -- the equivalent of meeting my girlfriend's parents for the first time. In such cases I use Python. I usually do not use Python for one-offs that I do not intend to maintain, because with AWK you can usually relieve yourself of the ceremony of setting up a read and tokenize loop. Then again, some prefer making a SQLite database out of the file and using SQL.
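
As a rough illustration of the AWK route (a sketch with made-up input, not the only way to do it): awk supplies the read and tokenize loop itself, so whole-line deduplication fits in one pattern.

```shell
# awk reads each line and splits it into fields ($1, $2, ...) on its own.
# seen[$0]++ is 0 (false) the first time a whole line appears, so the
# pattern !seen[$0]++ prints only first occurrences.
awk_deduped=$(printf '2 foo\n2 bar\n1 one\n2 foo\n' \
  | awk '!seen[$0]++' \
  | LC_ALL=C sort -n)

printf '%s\n' "$awk_deduped"
```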

TLDR: split, join, cut, paste, join, AWK.

Still TLDR: Coreutils (in particular textutils and fileutils) https://www.gnu.org/software/coreutils/manual/coreutils.html

This is not to insinuate that OP does not already know this, commenting just in the hope that it is helpful to some.

ReleaseCandidat · on June 27, 2023

> the commands split, join, cut, paste, join come in very handy.

And uniq, you obviously forgot about uniq.

Sorry, could not not respond to this ;)

srean · on June 27, 2023

Indeed!

Leaving my comment uncorrected for the unintended insider joke on why uniq would have helped.

tzs · on June 28, 2023

> For me there is a threshold beyond which I move to AWK. Some would prefer Perl over AWK, but I cannot hold all of Perl's syntax and keywords in my brain

You don't have to hold all of a language's syntax and keywords in your brain to use that language. Or even be aware of all the syntax and keywords.

It's fine to just learn the subset of a language needed for the things you are going to use it for.

two_handfuls · on June 27, 2023

Awk is great, but I don’t use it often enough to memorize the awk syntax. That’s why I use “pawk”, an awk-like tool with a Python syntax. It sets up the read and tokenize loop for me :)

barrkel · on June 27, 2023

Maybe a case for -V, aka version or natural sort.

  ~$ printf '2 foo\n2 bar\n1 one\n10 ten\n3 three\n2 foo\n' | sort -un
  1 one
  2 foo
  3 three
  10 ten
  ~$ printf '2 foo\n2 bar\n1 one\n10 ten\n3 three\n2 foo\n' | sort -uV
  1 one
  2 bar
  2 foo
  3 three
  10 ten

ReleaseCandidat · on June 27, 2023

Version sort does have its own problems:

   % echo -e "1.15 bla\n1.5 hugo\n1.05 gsfg"| sort -V
   1.05 gsfg
   1.5 hugo
   1.15 bla
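
Presumably this happens because version sort treats each run of digits as a whole number, so `15` ranks after `5`. If the first column is meant as a decimal number, plain `-n` may be closer to what is wanted. A small sketch, assuming GNU `sort` (for `-V`) and the C locale:

```shell
decimals='1.15 bla
1.5 hugo
1.05 gsfg'

# Version sort: compares the digit runs after the dot as integers (5 < 15).
version_sorted=$(printf '%s\n' "$decimals" | LC_ALL=C sort -V)

# Numeric sort: reads the leading text as a decimal number (1.15 < 1.5).
numeric_sorted=$(printf '%s\n' "$decimals" | LC_ALL=C sort -n)

printf '%s\n' "$version_sorted"
printf '%s\n' "$numeric_sorted"
```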

dwb · on June 27, 2023

nushell's sort and uniq do what you might expect, operating on the whole value (line). I copied the data to the clipboard first (hence pbpaste; substitute if necessary):

  ~: pbpaste | split row "\n" | sort -n | uniq
  ╭───┬─────────────────────────╮
  │ 0 │ 1 a                     │
  │ 1 │ 5 which                 │
  │ 2 │ 10 exotically           │
  │ 3 │ 15 aerodynamically      │
  │ 4 │ 15 differentiation      │
  │ 5 │ 20 electroencephalogram │
  ╰───┴─────────────────────────╯

Izkata · on June 27, 2023

It's not nushell doing that, those are the same sort and uniq tools that they mention at the bottom of the post.

dwb · on June 27, 2023

No, nushell has its own built-in implementations that are aware of nushell's richer type system:

https://www.nushell.sh/commands/docs/sort.html https://www.nushell.sh/commands/docs/uniq.html