Use long flags when scripting (thechangelog.com)
423 points by adamstac on Feb 4, 2013 | 124 comments



While it is generally a good idea to write readable code, it is also true that long flags are not universally supported, and the short flags don't always have the same meanings (or even exist!) across different systems.

This is an interesting conundrum. As has been pointed out, POSIX specifies standard options (which are all short):

http://news.ycombinator.com/item?id=5165058

But these are not universally supported:

http://unixhelp.ed.ac.uk/CGI/man-cgi?ps

What to do? My proposal: do what we do in the C/C++ world when presented with portability problems: abstract it away. POSIX sh supports functions, which can be named appropriately for readability, while calling the proper arguments and commands for different platforms. To avoid unnecessary duplication of platform/argument detection, setup could be done in a function called before anything else to create variables with the appropriate command names and arguments.
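A minimal sketch of what I mean, assuming the only difference we care about is in-place editing with sed (GNU sed accepts a bare -i, BSD sed wants -i ''). The names setup_platform, SED_FLAVOR and sed_inplace are made up for illustration, and probing with --version is a common heuristic, not something POSIX guarantees:

```shell
# Hypothetical setup step: call once, before anything else.
setup_platform() {
    # GNU sed accepts --version; BSD sed rejects it with a usage error.
    if sed --version >/dev/null 2>&1; then
        SED_FLAVOR=gnu
    else
        SED_FLAVOR=bsd
    fi
}

# Readable wrapper: edit FILE in place without keeping a backup.
# usage: sed_inplace SCRIPT FILE
sed_inplace() {
    case $SED_FLAVOR in
        gnu) sed -i -e "$1" "$2" ;;
        *)   sed -i '' -e "$1" "$2" ;;
    esac
}

setup_platform
```

The rest of the script then calls sed_inplace 's/old/new/' some.conf and never mentions the flag again.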

This may all seem too involved, but realize that we are talking about creating robust software that is to be maintained, so applying proper software engineering principles seems appropriate.


A much simpler solution is to comment your code:

  # Invoke curl in silent mode (-s), pipe the output to grep
  # and use an extended regex (-E) to only show the resulting
  # digits (-o: only print matching text, not the whole line):
  curl -s checkip.dyndns.org | grep -Eo '[0-9\.]+'


The comment is needless noise that the reader must check for correctness and then determine whether it or the code is the intended behaviour when they differ. The same is unfortunately true of many comments.

BTW, you've probably got a bug in your regexp because you needlessly and wrongly escape the dot without knowing the regexp syntax you're using: inside a POSIX bracket expression a backslash is literal, so [0-9\.] matches a backslash too. Assuming you don't intend

    $ grep -Eo '[\.]' <<<\\
    \
    $
that is. :-)

(curl's -s should be accompanied by -S IMHO, and it's brain-damaged in not having that behaviour under the one option.)


I don't like commenting the how and what; that's what the code is for. Comments are for "why".


How and what are good to include when describing the function, but not for specific statements of code.


I agree with ralph that these comments are annoying because I have to waste time reading them (what if they contain something important?) but they offer me nothing (I already know these flags).

If you still prefer putting these hints in comments, I would rephrase your comments in a way that they are easily identifiable as useless, so I can stop reading them right away. For example:

  # -s flag: puts curl in silent mode.
  # -E flag: puts grep in extended regex mode.
  # -o flag: prints only matching text.
  curl -s thingy.thing | grep -Eo '[0-9.]+'

An additional win: comments are slightly more future proof. When the comments inevitably become outdated, you'll now see:

  # -E flag: puts grep in extended regex mode.
  curl -s thingy.thing | egrep -o '[0-9.]+'

which is better because the comment is now merely irrelevant instead of actively wrong.


Or even better:

  # Get my IP address by checking dyndns.org
  curl -s checkip.dyndns.org | grep -Eo '[0-9\.]+'
When reading the script, you want to know what it does, not what each character in it does.

Using long flags actually hinders that by adding clutter; you have to parse a much longer line to know what it does. Adding a functional comment and then using the clean short flags is IMHO a much better way.


Or, even better:

  # Get my IP address by checking ifconfig.me
  curl -s ifconfig.me
The less work you have to do, the better :)

NOTE: If you visit ifconfig.me with your browser you get an HTML page full of text, but with the curl user agent it just returns the IP address.


That also introduces a dependency on ifconfig.me.

If your script contains four or five such 'less work you have to do', I bet it breaks within a year.


Obviously that's a consideration, but this script has removed a dependency on checkip.dyndns.org. You could make the case that that's more likely to stay up, which is a valid decision.

Alternatively, you could create a "whereami" that returns the ip, and use that in your scripts. If ifconfig.me goes down for good, you have to change the url in one place. I'm sure you'll cope.
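A rough sketch of that wrapper, assuming a shell function is good enough for your scripts. whereami and WHEREAMI_URL are illustrative names, and ifconfig.me is simply the service mentioned above:

```shell
# Hypothetical helper: the one place that knows which service we ask.
WHEREAMI_URL=${WHEREAMI_URL:-ifconfig.me}

whereami() {
    # -s hides the progress meter, -S still reports errors.
    curl -sS "$WHEREAMI_URL"
}
```

Scripts then use my_ip=$(whereami), and switching services means changing one variable.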


All of the above scripts had external dependencies.


I think you fail on two software engineering principles:

* http://en.wikipedia.org/wiki/KISS_principle

* you should adapt the way you think about programming to the language you are using. This is shell, not Ada.


Unfortunately, taken as a whole (eg, not just blithely ignoring POSIX, or systems that don't support it, or systems that don't support long options), portable shell is a big mess. Keeping it simple would be nice, but may not be possible, depending on your goals ("make everything as simple as possible, but no simpler").

So you want to write some script that will only ever run on systems with GNU utilities installed, and you never ever plan on porting it? Fine. Just do the world (and yourself) a favor and be aware of and admit that fact upfront. And if you're going to break POSIX compatibility anyway, it would be nice if you use the long options for better readability.


Sure, but what's wrong with his approach? POSIX sh supports all the things you need to do 'if GNU grep run this, else if BSD grep run that' - functions, case statements, conditionals. He's not talking about building up major libraries in shell, just wrapping the invocations in functions based on the environment you're in.


It really depends on what you're making of course, but for a large set of situations, isn't supporting recent bash on recent Linux/BSD good enough? This way, you've got all modern Linux installations, OSX, and Windows through msys(git) covered.

I'm thinking of shared dotfiles, build scripts, deploy scripts. The article you link to mentions SunOS and OSF/1. How bad are the incompatibilities on modern POSIXes only? (note: I really do not know and wouldn't know how to find out)

To compare, how many people would write a web page that supports IE5? Some would, but would you generally advise methods to make IE5-compatible pages in 2013?

Maybe we need a http://caniuse.com for Unixes.


I'll admit, I haven't had to touch OSF/1 in a long time, and I really would prefer to avoid it, but it's not always possible. Even modern systems aren't entirely consistent. As a real-world example, I like to set up persistent SSH sessions on my laptop at home and personal smartphone. autossh is a nice solution to this, but it isn't available on my smartphone. To create one command across all environments that will create a persistent SSH connection, I do some OS and hostname detection to determine what I've got to work with, then create an appropriate function (either using autossh, or a cruder version using while). Then my dotfiles work across all platforms and I can keep them synchronized with git. Something like:

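  # Test the exit status of command -v, not its (possibly empty) output.
  if command -v autossh >/dev/null 2>&1; then
      function ssh_persist {
          autossh -C -o "CompressionLevel 9" "$@"
      }
  else
      function ssh_persist {
          # Crude fallback: reconnect a minute after ssh exits.
          while true; do
              ssh -C -o "CompressionLevel 9" "$@"
              sleep 60
          done
      }
  fi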
Edit: well, as long as I'm doing this much editing, I might as well make it better. To be sure, this isn't exactly what I actually run :) But it's a fair approximation and it gets my point across.


Make sure you have GNU versions installed and in your PATH. Write to those.


  # define all the 'magic letters here' (perhaps some if statements for various platforms):
  CURL_FLAG_SILENT=s

  # Then call them here:
  curl -$CURL_FLAG_SILENT ifconfig.me


This is very bad advice. I won't thank you for making me check that GNU grep's --ignore-case is precisely the same as the very well-known, POSIX'd, -i. And ditto for all the other verbiage that clutters and obfuscates the script's intent.

Short options should be used where they're the more typically known and standardised. Long options are for the unusual.

    grep -iv
    sed -n
    tr -dc
    awk -f
    ls -tr
    comm -23
    tail -n


I'm a 4th year student in a well-respected CS program, I have used linux off and on for over three years now, and I would consider myself perfectly capable of maintaining a shell script using any of the utilities you mention. However, without checking the man pages I could not tell you the behavior of the -n flag for sed, the -dc flags for tr, the -f flag for awk, or the -tr flags for ls, and I only have a guess as to what comm -23 does.

This is not the bread and butter of programming, where everyone worth a damn should be able to decipher "int i = 10;" even if it's not their brand of syntax. There are people who are perfectly capable of understanding and maintaining a piece of software that uses "tr --delete --complement" but would be dumbfounded when confronted with "tr -dc" until they looked it up.


If you aren't sure what the "-<x>" flag does for a particular command, but you are a competent Unix user, you should be fine with the man page or 10 seconds of Google. If you aren't used to Unix, then I think it's dangerous to expect the verbose flags to explain things simply to you. For example, what would --mmap or --null-data mean to you in the context of the grep command? Are they immediately obvious?


> This is not the bread and butter of programming...

This is, however, the bread and butter of shell programming.

> ... "tr --delete --complement"...

And, likewise, I'm not sure of the meaning of --delete and --complement, but I do understand "tr -dc". I'd have to look up --delete and --complement to work out what they're the equivalent of.
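For reference, a small sketch of the two spellings side by side. The function names and the sample input are made up, and the long options are a GNU extension, so the second function assumes GNU tr:

```shell
# Delete (-d) every character that is NOT in (-c, complement) the set 0-9.
digits_short() { tr -dc '0-9'; }

# The same operation with GNU long options (not part of POSIX tr).
digits_long() { tr --delete --complement '0-9'; }

printf 'abc123def\n' | digits_short
```

Note that the trailing newline is outside the set too, so it is deleted along with the letters.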


These days I think it's clear that the dominant use case for shell scripts is not a full time responsibility but rather edge work for deploying/installing code written in other languages. Most people looking at shell code aren't going to be shell masters, they're going to be engineers of more modern things that have to look at the shell script for some reason.


> ...they're going to be engineers of more modern things...

Have you read TAOUP? The entire Unix philosophy hinges around the shell, since the shell is the tool that lets you glue different pieces together. The shell is not outdated.

If you live entirely within one language ecosystem, then you may be tempted to consider the shell an afterthought, and make all services available via API calls instead. But this is inefficient, slows down development, reduces flexibility and limits the capabilities of your programs. This fallacy is old and certainly is not "modern". See TAOUP for details; it explains this far better than I ever could.


But a lot of the defaults in the bourne shell and various posix utilities are wrong. One example: space as a field separator is clearly the wrong default, since it's nearly impossible to not find file-names with spaces in them these days.
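A small sketch of the failure mode in sh or bash, assuming a file called "my file.txt" (made up for the example). Unquoted, the name is split on the space into two words; quoted, it stays one:

```shell
# Report how many arguments the shell actually passed along.
count_args() { echo "$#"; }

file='my file.txt'

count_args $file     # unquoted: split on the space, prints 2
count_args "$file"   # quoted: a single argument, prints 1
```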


"This is, however, the bread and butter of shell programming."

This is really old-fashioned thinking. The people who have been in this industry for 10+ years all grew up without computers - and thus had to memorize all of the flags, obscure shell commands, and weird regular expressions. What was taught in school back then was to do things perfectly the first time, because CPU cycles were expensive, and bugs were time-consuming.

The generation that graduated from college about 2 years ago is the first one that had universal access to a search engine since they knew how to use a computer. This generation (and all future ones for the rest of time) sees memorizing things like this as more of a waste of mental space. What is taught in universities now is problem solving, research, and larger concepts. There is very little straight memorization, and almost zero programming by hand/whiteboard programming. No current college grad will be able to write a shell script without reference material, but every college grad should be able to write any simple script in less than 5 minutes with Google.


You're going to have a hard time making it through Hemingway if you have to reach for a dictionary every time you come across a word like "middleweight" or "khaki", even if your dictionary is online and answers your query in less than a second. You need to memorize the meanings of a vocabulary of words in order to read fluently or write well. That's not "old-fashioned thinking". It's just common sense. And it's just as true of shell scripts as it is of electrical engineering, mechanical engineering, poetry, or math.


While both the above words are known to me, I do like that when reading a book in Kindle on my iPad, I can tap a word, and have its definition show.


They probably think they invented sex too. ;-)

Memorisation isn't the issue. Knowing the tools available and their options is. That comes with study. Unix command line is a boon for problem solving and experimentation, just go and read Jon Bentley's _Programming Pearls_. One can't Google for what one doesn't realise can be done.


Is that a signed integer or unsigned? How many bits? What's the lifetime of i? One needs to know more than is inferred by a few characters. Seeing the long options just makes you think you understand what it's doing without knowing what it does. The latter comes from study; there's no short-cut. Fortunately, Unix is worthy of study unlike for example the Windows API.


I can't help but agree with this sentiment. Generally, if the operator doesn't understand what a command line option does and isn't willing to read the man page, they're open to a whole slew of unpleasant surprises (although I think this applies to many things, not just userland utilities). Long options simply provide a false sense of security and may not always do what someone thinks they're supposed to do. Worse, as others have pointed out, they may not always be available, for instance in the BSD flavors of specific tools.

I like what you said about "study." There truly is no shortcut for reading documentation available with the system, and assuming by inference what a command is supposed to do based on its options when neither the command nor options are understood just seems to be horribly, horribly bad practice in my mind.


Unless I am doing code review, I don't read code with the intent to find latent bugs or insecure idioms. I assume the programmer who wrote the code was competent, and wrote a correctly functioning program. (If I cannot assume this, then why am I using the code at all?)

If I am doing code review, then I have the man pages open anyways. If I am trying to write bug-free code myself, then I have the man pages open anyways. In these cases it doesn't much matter what form of flag I use. But the most common case of reading code is a brief scan trying to grok what the code does, which shouldn't require frequent reference to man-pages in much the same way that reading a novel shouldn't require frequent reference to a dictionary.

I understand your concerns about false sense of security and unexpected behavior in corner cases. I've browsed the IOCCC, I've browsed the CVE database, I know how easy it is to hide nasty behavior in unexpected corner case interactions, and that the only defense against them is vigilant attention to documented behavior.

To my mind the most common interactions with code, in decreasing order, are as follows:

1. Executing it. (Flag agnostic, portability issues aside.)

2. Reading it. (Descriptive flags >>> Cryptic flags.)

3. Maintaining it. (Standard flags >> Obscure flags.)

4. Writing it. (Short flags > Verbose flags.)

So I would say the most important thing is that flags are descriptive of their behavior. The second most important thing is that they are common and standardized, but this isn't as important as descriptiveness -- a less common flag that describes its operation better wins. The least important thing is brevity, it only matters once.


> then why am I using the code at all

You're using it because it comes with the job that's paying you and you want to see what it does and form an opinion on its quality. Much crap code is procured and produced by companies.

“…clarity is often achieved through brevity” — Kernighan and Pike, _The Practice of Programming_. It's an excellent book, I recommend every programmer should have read it. http://amazon.com/exec/obidos/ASIN/020161586X/mqq-20 http://cm.bell-labs.com/cm/cs/tpop/


The difference is that with long flags, even someone without 10 years of POSIX experience knows what's going on. The long example on the OP's article makes a lot of sense to me, even though I'm primarily a Windows user.

At the same time, for instance, I've no idea what the -tr flag to ls does, even though I use ls often enough.


You are not a Unix user. You don't know Unix's commands and their common options. If the article is aimed at you and people like you that read your scripts then it's fine but it's akin to commenting "dir /w" in a DOS batch file to explain its purpose. Unix users should not follow the article's advice, they should be embracing Unix's style and ethos; it's part of what's made it such a success.

BTW, a month of using Unix should have a Unix user knowing what ls's -t and -r flags do; along with -a and -l, they're some of its most commonly used.
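For anyone following along, a quick sketch of those two flags together. The directory and file names are made up, and touch -t is used only to force known timestamps:

```shell
dir=$(mktemp -d)

# Give three files known modification times (format: CCYYMMDDhhmm).
touch -t 202001010000 "$dir/oldest"
touch -t 202101010000 "$dir/middle"
touch -t 202201010000 "$dir/newest"

# -t sorts by modification time, newest first; -r reverses that order,
# so the most recently changed file ends up at the bottom of the listing.
ls -tr "$dir"
```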


You have a point about the -t and -r flags, but,

> Unix users should not following the article's advice, they should be embracing Unix's style and ethos; it's part of what's made it such a success.

I always wonder why this is taken so religiously? You're not the first person to write a comment like this, so please don't take it too personally. But is it not possible that the Unix style and ethos is mostly great, but there's a few things here and there that could've been better? You seem to imply that a culture that prefers short options is one of the reasons that Unix became a success. I'm not so sure.

In fact, if `dir /w` had a long version of the option, I would've indeed used it. I think it's a shame that most native Windows commands, unlike their Unix counterparts, don't have long versions of switches. But I doubt that that's contributed much to Windows' success.


Unix is in no way perfect, for that look to Plan 9. ;-) But just as one should try and understand the idioms and style of a programming language rather than write Pascal in it regardless, so one should try and embrace Unix then suffer and learn from its warts rather than write VMS DCL in it.

And yes, brevity is pretty fundamental to its early culture. Unix was created by mathematicians and scientists that were using it every work day to get stuff done and they recognised and extolled in their writings the power of notation. Imagine maths without superscripts, Σ, etc. So it is with Unix. More wordy notations existed in other OSes at the time, and more noisy commands, e.g. VMS would tell you that "dir/size/owner/prot foo." completed normally, even though that's hopefully the norm. It gets very annoying. :-)


Unix was developed on a system with a 110 baud teletype. On such a system, brevity was the path to sanity (seriously, imagine programming on a manual typewriter that only supported 10 characters per second).

VMS was written later, with a faster interface. Also, you didn't need to specify the entire long option, you only needed to type enough to disambiguate between "/all" and "/almostall" (for instance).


The constraints of the time shouldn't be confused with their recognition of the power of notation.

Also, based on the poor advice of this thread's subject, surely one should always write /almostall in one's DCL script for clarity and to lessen the chance of ambiguity as the command evolves? grep's -i is only ever going to mean -i.

VMS died, DCL was often ridiculed, its over-verbose style was a part of that. Its lack of power played a larger part.


I'm sorry, but this is terribly pedantic. The world isn't divided between deep experts of a technology and everyone else. Case in point: me, a moderately tech-savvy person. My main computer is a Mac and I'm a casual user of bash. I struggle to get things done every time I have to do something more complicated than "show me the files of this folder, even the hidden ones, even in the subfolders, then dump them in a .txt". The obscurity of flags doesn't help.


I completely agree. I use various flavors of Unix every day at work, and I can only identify half of the flags mentioned at the root comment.

This reminds me of the whole "replace" debacle [1] a couple weeks back, where the power nerds jumped all over someone who wrote a bit of code to simplify common tasks. It's hard to understand how some people think flexing their arcane knowledge in everyone's face makes them look good. They trash efforts to make software more accessible and maintainable so they can pretend to be king of a tiny hill.

[1] http://news.ycombinator.com/item?id=5106767


I'm all for being nice, but I think you are missing the point. It's not about flexing knowledge for ego's sake, that's entirely beside the point.

No, the source of the Unix nerd's retort is the realization that a small set of simple, standardized and very sharp tools performs better over time than an ever-increasing set of intuitive tools, each suited to a narrow use case. This is not to disparage anyone for writing something like that replace project. You certainly don't deserve scorn for that, and if that tool finds a healthy place in your utility belt then all the better.

However if you do a lot of work with text files on unix then I think inevitably you will reach a tipping point where it is in fact easier to learn and remember a limited set of commands and flags than it is to remember the ideal mountain of simpler commands that do all the things you regularly do in a more beginner-friendly fashion. If you want to argue against "arcane" unix tools being better you have to confront this argument head-on without assuming the proponent is an insecure, anti-social neckbeard out to prove his own superiority.


> a small set of simple, standardized and very sharp tools

This may have been true(r) twenty or thirty years ago, but in a contemporary Unix installation, it is not true at all.

The set of tools is often NOT small (there are thousands of them); they are NOT simple (two examples: 1. the ls man page and the huge number of options it takes; 2. shell quoting rules); they are NOT standardized (I regularly run into incompatibilities between BSD-derived and GNU tools, for example, the -E option for grep); and they are NOT sharp.

In summary, the whole "small tools that do one thing only, but that do that one thing well" thing is a stupid, outdated mantra that may have made more sense many years ago, but not anymore. For some reason, people keep on blindly believing it even though it bears no relation to the reality of modern Unix programming.


I would respectfully disagree here. I can't remember, and won't bother to look up, how to recursively use cp / mv / scp / rsync / whatever else... but I'd bet it's -r, or -R. I'm not positive how to tell a shell program "I REALLY MEAN IT!", but I'd guess it's -f (unless it's kill, and I'm not going to sympathize with gripers over that one odd case :) ). There really is an (admittedly idiomatic, but still present) intuitive nature to these things. There are loads of counterexamples, for sure. But still - say I want to use extended Regexp in grep, or sed. I bet it's "-e" or "-E", right? Plus you can always search the man-page, without bothering to read it all.

Reading man-pages does do a lot for you. And in my experience, I don't spend a lot of time while on the same engagement/project/whatever writing shell code across BSD/solaris/GNU-linux/etc.

And honestly, if you can write a sophisticated sed substitution for BSD without reading the man page, is there really THAT big of a barrier lying between you and writing the same thing in GNU/Linux, or non-POSIX-compliant sed, or whatever else? You read the man page for 10 seconds and find the flag that means what you want to use. If it's different, you explore.

This is totally not a rant at you, and I sincerely hope you don't take it that way - I just haven't found myself utterly foiled by the differences between say, RHEL v SUSE v BSD recently. There are differences, but hell - there are differences between versions of languages, between terminal emulators (scripts that work in bash but not in zsh and vice versa), etc.


Sure unix has grown more complex over time, but what's the alternative? Start from scratch with simple commands? By the time you approach a system of equal power you'll have hundreds of thousands of commands instead of just thousands. For all its warts, it's hard to build a better shell than what's available in unix.


Sure; I'll elaborate. First, I am arguing against the unapologetic championing of arcane knowledge, not the tools. I agree that awk/sed/grep/etc are powerful, and knowing how and when to use them is highly useful.

I take issue with the attitude that the way these tools are used, with their billion cryptic parameters, is some flawless pinnacle of achievement.

Some people have put a lot of effort into memorizing Unix switches already--but the difficulty doesn't mean that work was meaningful. Unfortunately, to justify their sunk cost, a subset of those individuals tout their knowledge as something by which the rest of us should be impressed. In fact, this comment thread is littered with disdain for peers who dare to tread on their domain with "human-readable names" and "documentation".

In the root article, the author simply suggests that it is more maintainable to use full names for arguments because the code is then self-documenting. It's like using completeVariableNames instead of sht_y1s. Actual comments would suffice, but most code (unfortunately) isn't commented or the comments are done badly.

If a programmer is solo, whatever. Go nuts. Use shortcuts; don't comment. Hard-code paths in your scripts and write 600-line functions. A coder only has herself to blame for the problems she caused.

However, when programming with a group there will always be coworkers that can start being productive sooner (without interrupting you) if things are named and described in plain English and do exactly what it looks like they do.

Script switches are not a remotely interesting problem. Which style you use is a meaningless debate--until you factor in time. Time to debug. Time to look up documentation. Time to memorize. Time is a truly scarce resource, and there exists a better way than rote memorization: making code work in an obvious way. Actively choosing not to employ this idea steals time from others and those of us that lose minutes so a shortcutting programmer can save seconds of typing don't appreciate it.

That said, it's up to one's discretion about what "obvious" means. But nobody should try to argue that making things more readable is inherently bad.


Then you don't use Unix, you merely enter a few canned commands to perform a limited set of actions. You are not making it work for you.


Well, but maybe that's the point of the original post - it is talking about maintenance of shell scripts; so that implies the following:

1) It is not a command that someone "uses"; it's a script written by a programmer to be run for a business task, or for an end user who quite likely may not "use Unix" (by your definition) ever.

2) The script's goal most likely is not to make Unix work for you - it was originally written to make Unix do task X for you; but the goal/question in the OP is how to make that program more maintainable. Maintainability is an important goal, and to achieve that it's definitely acceptable to mutilate the way you usually do one-off tasks manually.

3) The shell script is written for non-shell-script users - it's quite likely some glue for a system where 99.9% is in other languages, and the maintenance guys will specialize in those languages, and may not "use Unix" in their daily tasks at all. It's very common and reasonable to write all your code on a Win or Mac computer and then have it deployed to some linux server; and most companies currently do have separate ops teams (which would "use Unix" every day) and developer teams, which might "use Unix" once a month or less, even if the end binaries run on Unix.


The OP says to use long-options at all times when not at the command prompt. All this other stuff is a scenario you're placing around it to try and justify it.


Is Randall Munroe a Unix user after 15 years of using it?

http://xkcd.com/1168/


That's a comic. He sells a shirt on his site's store that is filled with Unix tips, including "tar -xf # extract anything". I'm not sure this is a legitimate criticism.


Unix is an operating system, not a religion. I would argue that the pragmatism and adaptability contributed more to its success than its style.


There's something to be said for the automatic recognition that something like "tar zxvf foo.tar.gz" has, versus the verbose "tar --gzip --extract --verbose --file foo.tar.gz." It feels terribly ponderous to do the latter. That said, when I see a command I don't recognize immediately ('sed -n' comes to mind for some reason) I would benefit from seeing the long option. The other side of the coin is that once I go look up 'sed -n' I'll probably remember it for next time.



I always remember this as "eXtract Ze Vucking File".

(You have to say it in a bad german accent.)


Aside,

    sed -n /re/p
is grep. sed's default is to print every line at the end of the script, -n says not to. Larry Wall rightly had perl(1) inherit -n along with perl's -p.
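A quick sketch of the equivalence, with made-up sample input:

```shell
# Made-up three-line input.
sample() { printf 'apple\nbanana\ncherry\n'; }

sample | grep 'an'        # prints the matching line: banana
sample | sed -n '/an/p'   # -n: no automatic printing; p: print lines matching /an/
```

Drop the -n and sed prints every line once, plus the matching line a second time.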


We had a very similar discussion over the weekend at http://news.ycombinator.com/item?id=5157215 but with the domain expertise being with non-programmers.

My opinion remains the same as it was there. If you've already gone through the effort of learning the short options, there is no reason not to continue amortizing that effort by making use of them. On the other hand if you have not really learned the short options, it is only worthwhile doing so if you plan to use them enough in the future that it will pay off.

Of course if you are writing code that you expect to be maintained by someone else, aim it at the fluency that you expect to be able to demand of them. Aiming it lower than that will just make things more painful for you. Aiming it higher will make things more painful for them (and the vast majority of the cost of software development is in maintenance, so they matter more than you do in the long run).


And at least a few of these (grep -iv, sed -n, ls -tr) aren't even unusual. If you ask what -iv means in grep, I assume you have used grep < 10 times ever.


Agreed! It saves a trip to the man page.

Something else I figured out recently is that if you put the pipe at the end of a line, you don't need a backslash:

    echo thing |
    grep thing


I have seen this working, but didn't understand why... thanks!


I'd probably indent the grep in that case, though, to make things clearer.

My own PowerShell scripts often take the form of

    Get-Foo |
      Where-Object { $_.blah } |
      ForEach-Object { ... }
etc., especially with longer pipelines.


Also works with the && and || operators.
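A small sketch with all three in one command; has_two is a made-up name. A line that ends in |, && or || is incomplete as far as the shell is concerned, so it keeps reading on the next line:

```shell
has_two() {
    printf 'one\ntwo\n' |
        grep -q two &&
        echo found ||
        echo missing
}

has_two
```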


No. Use standard POSIX flags to invoke standard POSIX functionality, so your scripts don't break when run on a different system.


It's always a cost/benefits analysis. I think for a lot of us, the cost/benefit goes something like this:

"Will I ever have to read and modify this script again? Effectively-100% likely. Will this script ever run anywhere that I don't have the GNU toolchain? Effectively-0% likely."

It's a no-brainer at that point.

If you can't say that second part honestly, reconsider, but a lot of us can.


And now porting your script to a non-GNU system such as BSD becomes a pain in the behind ... whereas before maybe only one or two commands would have had to change, now it will be most if not all.


When I mentioned "GNU", I think it sort of implied that I am aware of the existence of non-GNU toolchains.

I really am not worried about that in my world. The dependencies on Linux go a great deal deeper than the GNU toolchain. YMMV.


> Will this script ever run anywhere that I don't have the GNU toolchain? Effectively-0% likely.

Not really. A lot of stuff broke when /bin/sh moved to a POSIX-compatible shell rather than Bash.


No, a lot of stuff broke because people depended on implicit behaviour rather than being explicit.

    #!/bin/bash


  wilya@home $ /bin/bash
  zsh: no such file or directory: /bin/bash
  wilya@home $ which bash
  /usr/local/bin/bash
(Yes, I'm annoyed when I see #!/bin/bash, especially on scripts which are otherwise basic enough to be portable everywhere)


It is still better to write #!/bin/bash for scripts that use bashisms than #!/bin/sh. I recently had to run sed -i (well, another GNUism) 1s%/sh%/bash% on a bunch of a customer's scripts to make them work on Debian. At least when a script wants /bin/bash it is going to fail cleanly (and not in the middle after modifying random things) and with a mostly meaningful error.
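For anyone who wants the same substitution without relying on sed -i, here is a rough sketch that goes through a temporary file instead. The function name use_bash_shebang and the temp-file naming are my own assumptions, not anything from the customer's scripts:

```shell
# Hypothetical helper: rewrite the first "/sh" on line 1 of a script to "/bash"
# without using the non-POSIX -i option.
use_bash_shebang() {
    # $1: path of the script to fix
    tmp="$1.tmp.$$"
    # Write the edited copy, then copy it back over the original so the
    # original file keeps its permissions.
    sed '1s%/sh%/bash%' "$1" > "$tmp" &&
        cat "$tmp" > "$1"
    status=$?
    rm -f "$tmp"
    return "$status"
}
```

Usage would be something like use_bash_shebang ./some-script.sh, once per file.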


It's probably safer to use:

    #!/usr/bin/env bash
Assuming env is installed (I believe I had to install a package on OpenBSD to use it) this will find the first instance of bash in the PATH.


How do you know env is in /usr/bin rather than in /bin or /usr/local/bin? I'm just curious...


You don't, but I've run into more problems with bash being in unexpected places than env.


Right on, thanks.


Good question. I've used systems, e.g. AIX, where env(1) was in /bin.


>Will this script ever run anywhere that I don't have the GNU toolchain? Effectively-0% likely.

That's not the case for most people though. I can't even remember the last time I saw even a tiny project assume the world is all "whatever gnu/linux distro I happen to use". Tons of people use linux distros that aren't ubuntu, tons of people use OSX, tons of people use a BSD.


Okay. Care to point us all to the official standard POSIX flags documentation? Even then, you are probably going to find that there are just as many systems that your script needs to run on that are POSIX incompatible as there are systems that don't recognize the long flags.

My ultimate rule: write the code that is the most readable, because in 10, 15, 30 years, someone can easily update a script to working arguments from arguments with long words, whereas single letter arguments may have multiple different meanings on different systems. It's the same with naming conventions and code constructs: the how and the what should be obvious from the code.


For documentation, this will do: http://pubs.opengroup.org/onlinepubs/009695399/utilities/con...

What current systems have arguments that conflict with those?


> For documentation, this will do: http://pubs.opengroup.org/onlinepubs/009695399/utilities/con...

Thank you for that. I was aware of the POSIX docs for C/system level programming, but I wasn't sure if command arguments were standardized.

> What current systems have arguments that conflict with those?

Ah, but "current" is not always what we get. It's been a while, but I can vaguely remember conflicting single character arguments for basic commands (eg, ls, ps) on OSF/1, SunOS and Linux.


I think, like most things, this would depend on who your audience is. For internal use only where you're running under a consistent environment at all times and aren't worried about portability, it probably makes sense to go with the more explicit long options.

For something you're planning to deploy to customers, it probably makes sense to play it pretty strict with POSIX.

Likewise, I usually script straight sh for portability if it's going to a customer unless there's a compelling reason to use bash. I'll mix in bash in stuff I write for myself where it makes my life simpler.


As a part-time scripter (is there any other kind?) I'm ashamed to say that I often use shortcut flags because I looked up the functionality I needed and copy pasted without taking the time to even understand what all of the flags mean.

This advice is good not only for others but setting a standard for yourself where you take a moment to learn what exactly is going on.


Like everything else it depends. "rm -rf" is a lot easier to read than "--recursive --force" simply because every likely maintainer will (1) already know the short form and (2) will have no idea the long form options even exist. Likewise arguments to tar are probably best left in traditional form, etc...

Don't rock the boat just to adhere to silly style rules. The goal is readability. You're probably already the best judge of what is most readable, so trust your intuition. If you had to look it up, then spell it out. If not, relax and do it the sane way.


I think the correct answer is do both.


Many utilities have long-form names in GNU flavors but not in BSD flavors, so it would seem that using short form is best. However, there are some tools that are not the same (where the bsd flag differs) ...
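One way to cope with that, sketched under the assumption of a POSIX sh: probe once at start-up whether the local grep accepts the long spelling, then hide the choice behind a readably named function. GREP_ICASE and grep_ignoring_case are names invented for this example:

```shell
# Run once at start-up: does this grep accept the long option?
if printf 'x\n' | grep --ignore-case x >/dev/null 2>&1; then
    GREP_ICASE=--ignore-case
else
    GREP_ICASE=-i
fi

# Readable wrapper; callers never see which spelling was chosen.
grep_ignoring_case() {
    grep "$GREP_ICASE" "$@"
}
```

The rest of the script then calls grep_ignoring_case pattern file and stays readable on either flavor. This only covers flags that mean the same thing under both spellings; flags whose behaviour differs need a per-platform branch.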


Disagree. There isn't and shouldn't be any hard fast rule for this. Same as most things in programming. Do what makes sense.

Also, [0-9.] not [0-9\.]


IMHO, if your intent is to capture a literal dot then you should escape it even if, based on the context, you don't have to.


No, that's wrong, a reader that knows his onions, e.g. me, will be puzzled as to what the writer's intent was and spend time investigating if there is an error before deciding the writer needs to spend more time studying his onions.


Agreed. Regexes are already confusing. Adding unnecessary escaping just makes them worse. And if you have [\.] was the intent to escape the dot or match a backslash? It's confusing.


I think the goal should be clarity, not conservation of characters. And I think the escaped dot is clearer (the author's intent is obviously to include a dot in the range).

I do think it's funny that you said "There isn't and shouldn't be any hard fast rule for this" and then one comment later said my approach is "wrong" :) (EDIT: oops, that wasn't you -- Sorry!)


> you said "There isn't and shouldn't be any hard fast rule for this"

No I did not. You're confused as that was someone else. :-)

Do not needlessly escape. Only escape what that flavour of syntax needs you to do. What you may think is a needless escape may actually give different behaviour.

    $ sed 's/[x\.]/y/g' <<<'x.\z'
    yyyz


It's only clear for people who don't really understand regexes. For people with knowledge of them, it's more ambiguous as to whether a backslash is being matched or used to escape. It reminds me of commenting the end of your for loops, } //end for ... it's clear for people who come from Python and don't know Java but that's not who we should be helping.


Are you writing this for yourself or to share? If the latter, optimize readability for people who don't understand regexes.

As for commenting the end of loops - that too just improves readability, especially in long functions. If your editor doesn't show invisible characters, it can be easy to lose track if some indent is part of the 'i' loop or the 'j' loop, for example (yes, that can indicate a bigger problem, but that's not the point. I'm talking about real-world code, not idealistic academic nonsense)


Why? So my grandma can read my code? That's not the way to enlightenment if you program in a professional setting.


It seems that statement wasn't a hard and fast rule. ba dum tish I think everyone should do what they want, including telling others what to do, hypocritically.

I prefer not to escape dots in char classes, because escapes make it less readable for me. I know this involves a more complex rule, for where one need not escape, but somehow my mind easily treats character classes as a special case region.


It's not necessarily your choice. [\.] in sed is a two-character character class.
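A quick way to check this yourself, as a sketch: count how many lines containing only a backslash (no dot) each bracket expression matches. The helper name count_matches is made up here, and the behaviour described assumes a grep that follows POSIX bracket-expression rules:

```shell
# Hypothetical helper: print how many input lines match a pattern.
count_matches() {
    # $1: pattern, $2: a single line of input
    # grep -c exits non-zero when the count is 0, hence the "|| true".
    printf '%s\n' "$2" | grep -c -- "$1" || true
}

count_matches '[\.]' 'a\b'   # the class contains a backslash, so this line matches
count_matches '[.]'  'a\b'   # only a dot in the class, so it does not
```

The first call reports 1 and the second 0, because inside a POSIX bracket expression the backslash is an ordinary character rather than an escape.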


I agree with this. Regexps in general encourage a bit of "if unsure, escape".


Then you could be introducing errors into your regexps. \x doesn't mean a plain x instead of the metacharacter x in all cases. Don't ship code you don't at least think you understand.


"After 40 Terabytes, your fingers start to hurt."

--David S. Miller


This is a practice I use; in my opinion it is akin to descriptive variable names.


I write a lot of shell scripts as part of my daily sysadmin work. In my opinion, the author gave us the choice to write less code, and readable code, by using short flags for most command functionality.

Consider something like this: $ rsync -qaogtHr example.com:/opt/data/ /home/backup/

1. Elaborating all the short flags into long ones will increase readability by explaining this statement but at the cost of more lines of code.

2. All UNIX commands have options to combine short flags. For example, ls -l -t -r ./ can be written as ls -ltr ./ which also helps in reducing code. Writing less code helps in managing it more easily.

Long options may help beginners, but once they get comfortable, Short flags may seem more readable!!
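For comparison, here is roughly what the rsync line above looks like spelled out, as a sketch. The function name backup_data and the RSYNC override variable are assumptions added so the command can be dry-run with echo; the long option names are the ones rsync documents for -q, -a, -o, -g, -t, -H and -r:

```shell
# Hypothetical wrapper around the rsync call from the comment above.
# Setting RSYNC=echo prints the arguments instead of transferring anything.
backup_data() {
    "${RSYNC:-rsync}" \
        --quiet \
        --archive \
        --owner \
        --group \
        --times \
        --hard-links \
        --recursive \
        example.com:/opt/data/ /home/backup/
}
```

It does cost more lines, as point 1 says. It also makes it easier to notice that --archive already implies --recursive, --owner, --group and --times, so the short form was carrying some redundant letters.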


For well known flags like grep's -E, this makes it less obvious what the script is doing. Not everyone speaks English as their first language and most of the long options are GNU specific so won't be portable.


A secondary benefit not mentioned in the article is immutability.

Long flags on a command line utility should never change. Short flags may be modified between major versions of a utility, which would break scripts.

Of course this is just a convention, and I'm sure it's not followed 100%. I'll take whatever protection I can get though.


Neither long flags nor short flags should change meaning. I'm not sure where you got the idea that short flags do.


You know, I'm really not sure where I picked that up. It's always been my understanding that it was acceptable to change the short flags on a utility as part of a new major version. Not that you should change 'em capriciously, but that it wasn't the end of the world if you did.

Of course, I can't find any references to back me up on that, so you may very well be correct.


"GNU grep 2.12 changes behavior of recursion options, breaks existing scripts": http://news.ycombinator.com/item?id=4295681


Ironically, tab-completion works on the command-line for long options, but not in a script (I'm sure vim can do it though).


C-x C-e opens an editor for the command-line, so after you've tab completed it it's easy to save.


Which is entirely dependent on how the shell is configured (not all shells have tab-completion for option flags, let alone path or command completion). I'm sure Vim or Emacs could do tab-completion for script flags, but you'd be in the same boat in the beginning: Someone has to script it first.


I recently found myself going back and changing a script I had done to use the long form flags because I had forgotten what the short ones meant.


This is very good advice. You may not be maintaining your script later on, maybe someone who is new to unix will be asked to go change it. Having the intention much more obvious will save them some hassle. Hell, even someone who uses unix a lot may not know every flag of every command you use. With no other context, it's pretty hard to know if -r is recursive or reverse sort, or something else entirely.

And god forbid you think you know which one it is and end up being wrong... you might not realize until it hits production (scripts not always being the best tested things in the world).

Yes, they might not be supported on BSD... they're also not supported on Windows, what's your point? It's a lot more likely you'll have to read and/or change the script 6 months down the line when you've forgotten everything about what's in it than that you'll all of a sudden need to run your script on BSD when you've never had to in the past (obviously if portability is a requirement from the beginning, then that changes how you write the script from the start).


Add auto-complete to the command line and everyone will start using long form without the need to beg. Imperfectly remembering the flag spelling is a rather large annoyance, short flags avoid this problem at the cost of crypticism, autocomplete actually solves it.


First and foremost, the article talks about flags in files, where your IDE of choice can probably autocomplete them (see http://emacswiki.org/emacs/DynamicAbbreviations for an example). Second, even on the commandline, bash_completion (http://www.gnu.org/software/bash/manual/html_node/Programmab...) has been available for a while now.


This is silly. I can type grep's "-iv" before your auto-complete has enough info entered to complete the first option alone; "--i<Tab>" would beep, needing a "g" to continue.


Or, there should be an automated tool that takes scripts with short flags and displays them with automatically replaced equivalent long flags for readability.

Not everything has to be static text, one format does not fit all situations.
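A very rough sketch of what such a tool could look like for a single command, assuming a POSIX sh. It only knows four grep flags (-E, -i, -o, -v), leaves everything else untouched, and does not handle flags that take an argument. The function name expand_grep_flags is invented for illustration, and the output is meant for display only since quoting is lost:

```shell
# Hypothetical display helper: expand bundled short grep flags to long ones.
expand_grep_flags() {
    out=
    for arg in "$@"; do
        case "$arg" in
            --*) out="$out $arg" ;;              # already a long option
            -?*)
                rest=${arg#-}
                while [ -n "$rest" ]; do
                    c=${rest%"${rest#?}"}        # first character of $rest
                    rest=${rest#?}               # drop that character
                    case "$c" in
                        E) out="$out --extended-regexp" ;;
                        i) out="$out --ignore-case" ;;
                        o) out="$out --only-matching" ;;
                        v) out="$out --invert-match" ;;
                        *) out="$out -$c" ;;     # unknown flag: keep as-is
                    esac
                done
                ;;
            *) out="$out $arg" ;;                # pattern or file name
        esac
    done
    printf '%s\n' "${out# }"
}
```

For instance, expand_grep_flags -Eo '[0-9.]+' prints --extended-regexp --only-matching [0-9.]+. A real tool would need a flag table per command and per platform, which is where most of the work would be.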


At least for scripts that are on github, it seems like this could be potentially tackled by creating a robot that submits pull requests for such scripts, changing the options passed from short to long. For people fine with the change, they could just merge the PR, and for those that don't like it, people still have a long-option version of the script they can check out in the PR.

I've not written a github robot, though, this is just based on my vague understanding from the robots already out there (whitespace, .gitignore, etc), so please correct me if this isn't actually feasible. :)


I think the short flags are more useful when using the command directly (outside of a script). It is faster to type man grep in a window and to type grep -Eo than to type grep --extended-regexp --only-matching. At first, I didn't even realize that --extended-regexp was the same as the usual -E flag. I had to check with man.

Theoretically, I would like to agree that long flags are better than short ones in scripts. In practice, I prefer grep -E. And I can not imagine a tar cvfhz with long flags.
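For what it's worth, this is roughly what tar cvfhz turns into with long flags, as a sketch assuming GNU tar (other tar implementations may not accept all of these spellings). The function name make_archive is made up for the example:

```shell
# Hypothetical wrapper: long-flag equivalent of "tar cvfhz ARCHIVE FILES...".
make_archive() {
    # $1: archive path; remaining arguments: files or directories to add
    archive=$1
    shift
    tar --create --verbose --dereference --gzip --file="$archive" "$@"
}
```

A call would look like make_archive backup.tar.gz some-directory. It is longer, but at least --dereference says what the h was doing.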


I too have been using this practice for years now, unfortunately most coworkers don't seem to appreciate the genius of it.


Time to update some of my scripts.


My age probably shows in this statement, but it seems odd to me to suggest that someone else change their behaviour so the poster does not have to rtfm. We're coders. When we don't understand something, we read the manual, then the source.


I accept --silent, but almost everybody knows what -E to grep means. Normally I would just use egrep rather than grep -E however.


heh.. I have never thought of that. I basically script exactly how I construct commands in the cmd line, but the author is correct looking at my past bash scripts it would be better to use long options, good stuff.


Amen!


if you can't read the short flags, you shouldn't be reading my code.


there's always a "man" page (and Google) to look it up, stupid.



