Unexpected Interaction of Features (2018) (solipsys.co.uk)
27 points by ColinWright on Feb 17, 2019 | 4 comments



Technically speaking, this may be a case of undefined behavior. From my man page:

  -u, --unique
      Unique keys.  Suppress all lines that have a key that is
      equal to an already processed one.  This option, similarly
      to -s, implies a stable sort.  If used with -c or -C, sort
      also checks that there are no lines with duplicate keys.
  ...
  -n, --numeric-sort, --sort=numeric
      Sort fields numerically by arithmetic value.  Fields are
      supposed to have optional blanks in the beginning, an
      optional minus sign, zero or more digits (including
      decimal point and possible thousand separators).
When you use -n without a key field specification, the key is the whole line, and the whole line does not meet the requirement for numeric sorting.

This sort does give me the intended output:

  $ sort -k1,1n -k2 -u ~/tmp/sort.txt
  1 a
  5 which
  10 exotically
  15 aerodynamically
  15 differentiation
  20 electroencephalogram
Whether or not deduplication on keys is ideal behavior, it is what is specified here. What is not explicitly specified is what is considered to be the key when you try to sort a non-numeric line numerically.
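To make the difference concrete, here is a minimal sketch using two of the lines from the output above. It assumes a sort that compares the leading number under -n (GNU and BSD sort both do, as far as I know); the variable names are only for illustration.

```shell
# Two lines that share the numeric prefix 15 (taken from the output above).
input='15 aerodynamically
15 differentiation'

# Whole-line numeric sort: both lines compare equal as the number 15,
# so -u treats them as duplicates and keeps only one of them.
numeric_only=$(printf '%s\n' "$input" | sort -n -u)

# Explicit keys: field 1 is compared numerically, and the rest of the line
# is a second key, so the two lines are distinct and both survive.
with_keys=$(printf '%s\n' "$input" | sort -k1,1n -k2 -u)

printf '%s\n' "$numeric_only"
echo '---'
printf '%s\n' "$with_keys"
```

The first command prints a single line and the second prints both, which is the interaction the article describes.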

This is the sort of problem that you get with duck typing: it does what you expect and intend, except in those corner cases where it doesn't.


Ah, yes. The sort command definitely has a few gotchas like this. It's too bad that we all seem to learn them the hard way. :)

Another one that used to bite me: locale-dependent sorting. These days, I rarely use the sort command without LC_ALL=C.
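A small sketch of what LC_ALL=C changes, assuming nothing beyond a POSIX sort. In the C locale, lines are compared byte by byte, so all uppercase ASCII letters come before all lowercase ones. What you get without it depends on your locale settings, so I have not shown that side.

```shell
# Mixed-case input; the variable name is only for illustration.
sorted_c=$(printf 'b\nA\na\nB\n' | LC_ALL=C sort)

# Byte-order comparison: prints A, B, a, b (one per line).
printf '%s\n' "$sorted_c"
```

Setting the variable on the command itself, as here, keeps the change local to that one sort invocation.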


> Excellent, but now I realise there are repeated lines, and I need to de-duplicate. So I use sort -u to do that

I would just pipe it to uniq, which is the solution the article ultimately proposes, because that seems to make more sense to me. I have never used sort's '-u' option, nor would I have expected sort to have one (sort is for sorting, not removing duplicate lines). Maybe that's because I'm more used to the "UNIX philosophy" than the GNU one?


I agree. Often when I want uniquing, I want counts as well, so I have to go through uniq -c anyway.
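For anyone who hasn't seen the pattern, a quick sketch with made-up input. uniq only collapses adjacent duplicates, which is why the sort has to come first.

```shell
# Sample words with repeats; the variable names are only for illustration.
words='which
a
which
exotically
a
which'

# sort groups identical lines together, then uniq -c prefixes each
# distinct line with the number of times it occurred.
counts=$(printf '%s\n' "$words" | sort | uniq -c)

printf '%s\n' "$counts"
```

This reports a count of 2 for "a", 1 for "exotically" and 3 for "which"; the exact padding in front of the counts varies between implementations.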



