Filenames and Pathnames in Shell: How to Do It Correctly (2010)

Adaptive · on Feb 9, 2015

One of the reasons I like zsh is its copious expansion flags. If you aren't wedded to POSIX or bash syntax, consider zsh for this reason alone.

In the case of the first example he shows, the cat -n problem, you can do the following in zsh to expand the results of the wildcard globbing automatically:

    cat *(:A)

This will give cat full paths. Not significantly different from the PWD referenced with ./* in his example, but more universal in applicability, particularly when you start to use things like the file type filters:

    cat *(.:A)

giving you only files, not directories, and also expanding to absolute paths, while

    cat **/*(.:A)

does the same for plain files in the working directory and all subdirectories as well.

Remember to test your patterns with a print statement first:

    print -l **/*(.:A)

before passing them to a command.

sjolsen · on Feb 10, 2015

>Remember to test your patterns with a print statement first

ZSH has an option to expand globs onto the command line the first time you hit return, and issue the command the second. I don't remember what it is.

Adaptive · on Feb 10, 2015

By default zsh will expand globs if you hit tab on the command line.

However if you have more than a handful of matches it's going to be a suboptimal method of eyeballing the glob results.

I prefer to use `print -l` as it will break the glob results on separate lines.

An additional benefit of ZSH is that it will normally return each glob result as an independent item (this avoids the problem bash has with just returning strings that in turn are broken on IFS characters). Using the `-l` flag with print will give you a clear sense of the actual results, one item per line.
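To make the IFS point concrete, here is a minimal sketch in POSIX sh (the scratch directory and file name are made up for illustration). In sh and bash the glob results themselves arrive as separate words; the splitting tends to bite once a name has been stored in a string and is expanded unquoted:

```shell
#!/bin/sh
# Work in a scratch directory so the example touches nothing else.
dir=$(mktemp -d) || exit 1
cd "$dir" || exit 1
: > 'two words.txt'        # one file whose name contains a space

# Unquoted expansion: the string is split on IFS characters into two words.
name='two words.txt'
set -- $name
unquoted_count=$#

# Glob expansion: each match is a single word, spaces and all.
set -- ./*
glob_count=$#

echo "unquoted: $unquoted_count, glob: $glob_count"   # unquoted: 2, glob: 1

cd / && rm -rf "$dir"
```

Quoting the expansion ("$name") keeps it as one word, which is the fix the article recommends.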

Apofis · on Feb 10, 2015

What about fish?

white-flame · on Feb 9, 2015

The original strength of Unix also ends up being such a commonly frustrating feature: Everything is marshalled through strings.

With human-manageable strings come ambiguities, especially in concatenative situations like command-line expansion and code susceptible to SQL injection.

There's really no good universal solution. Judiciously adding explicit boilerplate as the article describes, or using less open-ended syntax which ends up adding common syntactic overhead as well, are both more painful to the user in common cases.

sjolsen · on Feb 10, 2015

The solution is actually pretty straightforward: don't do everything with text. The actual content you want to manipulate, sure, do that with text. But there's no good reason to structure with syntax except that that solution is compatible with even the most arcane computer interfaces.

How you replace syntax-based structuring is the hard part, but it's not impossible.

sukilot · on Feb 10, 2015

What can you structure with besides syntax? Even data structures in memory have a (binary) syntax.

sjolsen · on Feb 10, 2015

>What can you structure with besides syntax?

At the interface level, nested navigable fields.

>Even data structures in memory have a (binary) syntax

No, they don't. Syntax is the expression of structure through the arrangement of the contents of a single sequence. Data structures as they are typically realized express structure through the relationships between several sequences.

rikkus · on Feb 9, 2015

Or use something like 'pash', a powershell work-alike for UNIX-likes.

Scaevolus · on Feb 9, 2015

Shellcheck will find a broad variety of unsafe shell operations, including most (all?) of the issues on this page: http://www.shellcheck.net/

deathanatos · on Feb 10, 2015

Today I learned globbing happens after word-splitting.
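A small sketch of that ordering in POSIX sh, assuming a throwaway directory holding one hypothetical file named "a b.txt". Because pathname expansion runs after field splitting, a match containing a space is not split again, while a pattern containing a space is split before any matching happens:

```shell
#!/bin/sh
dir=$(mktemp -d) || exit 1
cd "$dir" || exit 1
: > 'a b.txt'

# The pattern is one word; globbing then yields one word, "./a b.txt".
pattern='./a*'
set -- $pattern
match_count=$#
first_match=$1

# This pattern is split into "./a" and "b*" first; neither matches a file,
# so (without nullglob) both words are passed through literally.
pattern='./a b*'
set -- $pattern
split_count=$#

echo "$match_count: $first_match"   # 1: ./a b.txt
echo "$split_count"                 # 2

cd / && rm -rf "$dir"
```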

Can someone explain the following:

   for file in ./* ; do        # Prefix with "./*", NEVER begin with bare "*"
     if [ -e "$file" ] ; then  # Make sure it isn't an empty match

1. Why prefix with a "./" ? Is that just to help avoid the `cat $filename` scenario? (i.e., that $filename will be "./-n" instead of "-n", and that

  cat -- *

is perfectly valid?)

2. What's the -e check for? It says "an empty match" — -e means that the file exists, but * would only return files that exist, so -e must (with some caveats) be true. (The caveat being that there's a race condition between the globbing and the test, but with the added test, there's _still_ a race condition between the glob, the test, and the command execution. Are we just attempting to minimize the amount of race-condition by testing?)

rjgray · on Feb 10, 2015

1. That's my understanding from the article.

2. I think this is for the situation where the glob doesn't match, and the nullglob shell option is not set. Without that option, a non-matching glob is processed as a regular word. e.g. In an empty directory:

  $ for file in ./*; do echo $file; done
  ./*

Note the glob pattern is printed by the echo statement. The -e test catches this condition.
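A minimal sketch of the pattern under discussion, wrapped in a hypothetical helper (`count_entries` is not from the article) so both cases can be run side by side. It assumes the default behavior where nullglob is not set:

```shell
#!/bin/sh
count_entries() {
  # Print how many existing entries ./* matches inside directory $1.
  (
    cd "$1" || exit 1
    n=0
    for file in ./*; do
      # In an empty directory "$file" is the literal string ./* and
      # the -e test fails, so the loop body is skipped.
      [ -e "$file" ] || continue
      n=$((n + 1))
    done
    echo "$n"
  )
}

dir=$(mktemp -d) || exit 1
count_entries "$dir"     # prints 0 for the empty directory
: > "$dir/-n"            # a file whose name looks like an option
count_entries "$dir"     # prints 1; the loop sees it as ./-n
rm -rf "$dir"
```

The ./ prefix and the -e test solve separate problems: the prefix keeps a name like -n from being read as an option, and the test filters out the unmatched pattern.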

bch · on Feb 10, 2015

Note also that some[1] commands will have a "--" (dash dash) flag indicating "end of flags", so (eg): "cat -- -n" really would cat a file called "-n"[2].

[1] on my BSD system, many internal Tcl commands honor this convention. Damned if I can find a section 1 shell command that uses the convention, but I'm sure I've seen them.

[2] my version of cat doesn't have a -- flag, so my example is contrived; not sure if GNU cat differs.

EDIT: typo, perl(1) supports "--". See perlrun(1) for details.
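For what it's worth, POSIX Utility Syntax Guideline 10 specifies "--" as the end-of-options marker, and utilities that parse options with getopt generally honor it even when the man page does not list it. A quick sketch to check a given cat (the scratch directory and file contents are made up):

```shell
#!/bin/sh
dir=$(mktemp -d) || exit 1
cd "$dir" || exit 1
printf 'hello\n' > ./-n        # create a file literally named -n

with_dashdash=$(cat -- -n)     # "--" ends option parsing, so -n is an operand
with_prefix=$(cat ./-n)        # "./-n" cannot be mistaken for an option

echo "$with_dashdash"          # hello
echo "$with_prefix"            # hello

cd / && rm -rf "$dir"
```

The ./ prefix has the advantage of working even with commands that do not implement "--".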

swatow · on Feb 10, 2015

First step for doing things correctly:

  import shutil

artmageddon · on Feb 9, 2015

I'd love to know if there's a Windows equivalent to this.

useerup · on Feb 9, 2015

PowerShell is strongly typed, and parameter parsing is done by the shell, not by the command itself. PowerShell wildcard ("glob") expansion, on the other hand, is performed by the command.

This alleviates the problems described in the article.

    cat * > ../collection # works in PS

    cat $file # works even if $file contains unusual characters. If $file is an *array* it "cats" multiple paths. If wildcards are not expected cat -lit $file should be used (literal path).

est · on Feb 10, 2015

Windows's parameter switch is clever. It uses a slash, like

dir /s

Since / cannot appear in a filename, it avoids the problem completely.

vacri · on Feb 10, 2015

I couldn't find a link, but a few years ago there was a problem with a virus checker and a game that (errantly) triggered it (I read this in the game's support pages). It turns out that the given virus checker would quarantine executables to a holding file called "c:\program". This game's launcher was quarantined by the virus checker to that location.

So, it turns out that when Windows wanted to launch things, it would find the first exe it could, then apply the rest of the command as args. "c:\program" comes before "c:\program files\", so every time a user went to launch a program, Windows would find the "c:\program" exe first, and apply the rest of the string as args (" files/and/rest/of/string"). So the launcher would fire up, and it ignored the args. For some reason I can't recall, Windows kept looking for the right program and eventually it would launch as well.

So the end-user, on trying to run any application, would get that application plus the game's launcher, all because of the crazy way Windows searches its path... well, when combined with a crazy virus checker behaviour.

Unfortunately I can't recall the checker or the game, sorry.

gear54rus · on Feb 10, 2015

That's a scenario more common than it should be actually:

http://www.commonexploits.com/unquoted-service-paths/

There's even a hint of privilege escalation there (but not always: writing to C:\ still requires root in most cases).
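As a rough illustration of why the unquoted path is ambiguous, here is a simplified model in POSIX sh of the candidate order Windows is documented to try when a command line with spaces is passed without quotes. The `candidates` function is hypothetical and ignores details such as names that already carry an extension:

```shell
#!/bin/sh
candidates() {
  # Print each prefix of $1 that ends at a space, then the whole string,
  # with ".exe" appended: the order in which executables would be tried.
  rest=$1
  prefix=
  while :; do
    case $rest in
      *' '*)
        word=${rest%% *}          # text up to the first space
        rest=${rest#* }           # text after the first space
        prefix="$prefix$word"
        printf '%s.exe\n' "$prefix"
        prefix="$prefix "
        ;;
      *)
        printf '%s.exe\n' "$prefix$rest"
        break
        ;;
    esac
  done
}

candidates 'c:\program files\sub dir\program name'
```

For that input the first candidate is c:\program.exe, which is why a stray executable at that location gets picked up. Quoting the full path in the command line removes the ambiguity.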

vacri · on Feb 10, 2015

A much better explanation, thank you.