Unexpected Interaction of Features (solipsys.co.uk)
60 points by ColinWright on June 27, 2023 | 13 comments



You can work around this by specifying the number field twice. Which makes no sense to me but hey, it works.

    > cat blah | sort -u -k 1n -k 1
    1 a
    5 which
    10 exotically
    15 aerodynamically
    15 differentiation
    20 electroencephalogram
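
One possible reading of why this works, as a sketch assuming GNU sort: with `-u`, two lines count as duplicates only when every key compares equal, and a second `-k 1` key spans the whole line as text, so only identical lines collapse. The sample lines and function names below are made up for illustration.

```shell
# Sketch only: the sample data and function names are invented for illustration.
sample() {
  printf '15 differentiation\n15 aerodynamically\n1 a\n1 a\n10 exotically\n'
}

# Numeric key only: lines whose leading numbers match are treated as duplicates.
by_number_only() { sort -u -k 1n; }

# Numeric key plus a whole-line text key: only identical lines are duplicates.
by_number_then_text() { sort -u -k 1n -k 1; }

sample | by_number_only       # 3 lines: one each for 1, 10 and 15
sample | by_number_then_text  # 4 lines: both "15 ..." lines survive
```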


Nice find. The docs say...

    -n    compare according to string numerical value
    -u    output only the first of an equal run
...so the implementation seems clear enough. With -n the comparison is numerical, with -u lines which compare equal are deduplicated. But almost certainly not as intended!
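
A small sketch of that interaction, assuming a sort that follows the documented behaviour (the sample data and function names are mine): `-un` keeps one line per numeric value, while sorting first and deduplicating with `uniq` afterwards only drops lines that are identical.

```shell
# Sketch only: invented sample data, three distinct lines plus one exact repeat.
sample() { printf '2 foo\n2 bar\n1 one\n2 foo\n'; }

# -u applies the -n comparison, so "2 foo" and "2 bar" compare equal.
numeric_unique() { sort -un; }

# uniq compares whole lines, so only the repeated "2 foo" is dropped.
numeric_then_uniq() { sort -n | uniq; }

sample | numeric_unique     # 2 lines: one for 1, one for 2
sample | numeric_then_uniq  # 3 lines: 1 one, 2 bar, 2 foo
```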


Because of this I always use `uniq`.

POSIX's documentation and Apple's man page are at least a bit better formulated:

    -u Unique: suppress all but one in each set of lines having equal keys.
    [...]
    -n Restrict the sort key to an initial numeric string, [...]
https://pubs.opengroup.org/onlinepubs/9699919799/utilities/s...

than the GNU version. It looks like the person documenting `-u` didn't know what it does (never attribute to malice ...):

    -u, --unique
              with -c, check for strict ordering; without -c, output only  the
              first of an equal run


If you want to stick with Bash then the commands split, join, cut, paste, join come in very handy.

For me there is a threshold beyond which I move to AWK. Some would prefer Perl over AWK, but I cannot hold all of Perl's syntax and keywords in my brain.

Sometimes the needed processing is either a lot for AWK, or I have to be on my best behavior -- the equivalent of meeting my girlfriend's parents for the first time. In such cases I use Python. I usually do not use Python for one-offs that I do not intend to maintain, because with AWK you can usually relieve yourself of the ceremony of setting up a read and tokenize loop. Then again, some prefer making a sqlite database out of the file and using SQL.
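
To illustrate the "no ceremony" point, here is a minimal sketch (the function name and sample data are hypothetical): AWK's implicit per-line loop makes whole-line deduplication a one-liner, and the numeric sort can then run without `-u`.

```shell
# Sketch only: dedupe_lines is a made-up helper name.
# AWK reads each line for us; seen[$0]++ is 0 (false) the first time a line
# appears, so !seen[$0]++ prints only the first occurrence of each line.
dedupe_lines() { awk '!seen[$0]++'; }

printf '15 differentiation\n15 aerodynamically\n1 a\n1 a\n' | dedupe_lines | sort -n
```

Unlike `uniq`, this does not need sorted input, at the cost of holding every distinct line in memory.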

TLDR: split, join, cut, paste, join, AWK.

Still TLDR: Coreutils (in particular textutils and fileutils) https://www.gnu.org/software/coreutils/manual/coreutils.html

This is not to insinuate that OP does not already know this, commenting just in the hope that it is helpful to some.


> the commands split, join, cut, paste, join come in very handy.

And uniq, you obviously forgot about uniq.

Sorry, could not not respond to this ;)


Indeed!

Leaving my comment uncorrected for the unintended insider joke on why uniq would have helped.


> For me there is a threshold beyond which I move to AWK. Some would prefer Perl over AWK, but I cannot hold all of Perl's syntax and keywords in my brain

You don't have to hold all of a language's syntax and keywords in your brain to use that language. Or even be aware of all the syntax and keywords.

It's fine to just learn the subset of a language needed for the things you are going to use it for.


Awk is great, but I don’t use it often enough to memorize the awk syntax. That’s why I use “pawk”, an awk-like tool with a Python syntax. It sets up the read and tokenize loop for me :)


Maybe a case for -V, aka version or natural sort.

  ~$ printf '2 foo\n2 bar\n1 one\n10 ten\n3 three\n2 foo\n' | sort -un
  1 one
  2 foo
  3 three
  10 ten
  ~$ printf '2 foo\n2 bar\n1 one\n10 ten\n3 three\n2 foo\n' | sort -uV
  1 one
  2 bar
  2 foo
  3 three
  10 ten


Version sort does have its own problems:

   % echo -e "1.15 bla\n1.5 hugo\n1.05 gsfg"| sort -V
   1.05 gsfg
   1.5 hugo
   1.15 bla
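
Version sort compares each run of digits as its own integer, so the 15 in 1.15 ranks after the 5 in 1.5. If the values are meant as decimal numbers, plain `-n` may fit better. A sketch, with `LC_ALL=C` added as my own assumption so that `.` is treated as the decimal point:

```shell
# Sketch only: the same three lines as above, wrapped in a made-up helper.
versions() { printf '1.15 bla\n1.5 hugo\n1.05 gsfg\n'; }

# Numeric sort reads the leading decimal number on each line.
versions | LC_ALL=C sort -n
# 1.05 gsfg
# 1.15 bla
# 1.5 hugo
```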


nushell's sort and uniq do what you might expect, operating on the whole value (line). Copying the data to the clipboard (hence pbpaste; substitute if necessary):

  ~: pbpaste | split row "\n" | sort -n | uniq
  ╭───┬─────────────────────────╮
  │ 0 │ 1 a                     │
  │ 1 │ 5 which                 │
  │ 2 │ 10 exotically           │
  │ 3 │ 15 aerodynamically      │
  │ 4 │ 15 differentiation      │
  │ 5 │ 20 electroencephalogram │
  ╰───┴─────────────────────────╯


It's not nushell doing that, those are the same sort and uniq tools that they mention at the bottom of the post.


No, nushell has its own built-in implementations that are aware of nushell's richer type system:

https://www.nushell.sh/commands/docs/sort.html https://www.nushell.sh/commands/docs/uniq.html



