Filenames and pathnames in shell: How to do it correctly (2020) (dwheeler.com)
219 points by ingve on June 19, 2023 | 129 comments



This is an excellent resource. I'd like to re-iterate the author's recommendation to use shellcheck:

https://www.shellcheck.net/

I also like using shfmt on top of that.

(I run them via pre-commit[1] hooks[2,3].)

FWIW, I find writing POSIX-compliant shell rarely necessary. In my 25 years of writing shell scripts, they've nearly always been for automation on a specific OS where portability is not a concern. Arrays are really nice to have. One place I like to use them is for making long commands easier to read. e.g.

    cmd=(
        some-command
        --some-long-option
        --another-long-option
        --an-option-with-a-value="$value"
    )
    "${cmd[@]}"
This avoids backslash line continuations, so you can put a comment between lines or at the end of any line.

I also almost always prefer the self-documenting long-option to single-letter switches when writing shell scripts. Those single-letter options are to save you typing on the command-line. There's no reason to use them in a script.
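
Compare, for instance (a hypothetical curl invocation; the long spellings are from its man page):

    # terse: fine at an interactive prompt, opaque in a script
    curl -fsSL -o out.html "$url"

    # the same command, self-documenting
    curl --fail --silent --show-error --location --output out.html "$url"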

When I do write a script for more than one OS (typically macOS and Linux), I have bash on both systems, so I use "#!/usr/bin/env bash" and take care to use only the features that the older version of bash that macOS ships with supports.

Where I have to be more careful is in utilities like find, sed, etc which can differ quite a bit between BSD and Linux.

1. https://pre-commit.com/

2. https://github.com/shellcheck-py/shellcheck-py

3. https://github.com/maxwinterstein/shfmt-py


> I also almost always prefer the self-documenting long-option to single-letter switches when writing shell scripts. Those single-letter options are to save you typing on the command-line. There's no reason to use them in a script.

Same applies to Stack Overflow answers and blog posts. If you’re teaching someone how to do something, always use the long flags so they can better understand it as they read. If you don’t know what the long option is, take the two seconds to check the manual.


> In my 25 years of writing shell scripts, they've nearly always been for automation on a specific OS where portability is not a concern.

I've written hundreds of scripts for Busybox for Windows (https://github.com/rmyorston/busybox-w32), Linux bash and macOS zsh. POSIX compatibility means a lot.

WRT the specific concerns of arguments: if you are doing something complicated, use Python. ChatGPT writes it better anyway.

In my experience, more complex scripting is reinventing Gradle, Ansible, Terraform, YAML & similar.


I write a lot of Python too. I've written whole systems in it. Been using it since 1.5.2 days. And before that Perl. I know my way pretty well around sed and awk too and pretty much the entire Unix environment.

These are all just tools. They each have their strengths and weaknesses.

There are times when shell is the best tool for the job and when it is, I prefer to use bash.

If I have to write POSIX shell because it's a constrained environment, sure, I can do it. It's just been something that's rarely been a requirement.


> WRT the specific concerns of arguments: if you are doing something complicated, use Python.

Introducing Python syntax for this reason alone is way over-engineering your script.

Yeah, if you're doing something that requires better data structures[1] then don't use the shell, but switching to Python just to call existing programs makes things more complicated, not less.

[1] Arrays, Trees, Dictionaries, etc


>if you are doing something complicated, use Python. ChatGPT writes it better anyway.

Amusing that this is even a factor


I write lots of scripts that need to work on both Linux and MacOS. Although both are POSIX, filenames in MacOS are case-insensitive and often contain spaces. This stuff plays havoc with bash scripts written by naive Linux programmers. Following TFA's recommendations ensures that your scripts will work on both.*

* well, except for the fact that MacOS defaults to zsh now.


To be clear: I agree with every recommendation in this article. I just find that writing POSIX-compliant shell is rarely needed. I've clarified my comment.


> well, except for the fact that MacOS defaults to zsh now.

But Zsh is also POSIX sh compatible, is it not?


The Z, Korn, Debian Almquist, Watanabe, and Bourne Again shells are all capable of running POSIX-conformant scripts. So if one has a POSIX-conformant shell script, one can use the Z shell to run it, yes.

The major factors when it comes to portability are, for all of them, that (a) the POSIX-conformant mode has to be specially invoked, and (b) even if invoked in conformance mode the non-POSIX parts are not always turned off. It's not that the shells won't handle POSIX-conformant scripts. They even mostly will in their (default) non-conformant modes. It's rather that it's difficult to discipline oneself to keep within the bounds of POSIX conformance when writing, even if one tests with conformance mode switched on.

The Debian Almquist and Watanabe shells are the best at preventing non-POSIX stuff creeping in by accident, because the former doesn't have much of it in the first place and the latter actually does things like turn off non-POSIX option arguments and builtins so that they generate errors in conformance mode and one can detect accidental "Yashisms" in testing. With the others, it's easier to accidentally use some of their various extensions and not spot it without explicitly testing with something in addition to the shell in its POSIX-conformant mode.
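
For reference, here is a sketch of the usual ways to ask for conformance mode (details vary by shell and version):

  # bash: POSIX mode via a flag, an option, or being invoked as "sh"
  bash --posix script.sh      # or: set -o posix
  # zsh: sh emulation via a flag or the emulate builtin
  zsh --emulate sh script.sh  # or: emulate sh
  # dash stays close to plain POSIX sh by default
  dash script.sh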


Zsh is not POSIX compatible. You need to enable many flags to make it POSIX compatible.

Zsh:

  $ A="foo bar baz"; for X in $A; do echo $X; done 
  foo bar baz
Dash:

  $ A="foo bar baz"; for X in $A; do echo $X; done 
  foo
  bar
  baz


This is true for the Bourne Again, Watanabe, and Korn shells too. The default mode is not the POSIX-conformant mode, but is the shell's native mode. This does not, however, make these shells "not POSIX compatible" any more than it makes the Z shell "not POSIX compatible". In all of them, one has to do non-default stuff, from setting particular environment variables through invoking the shell with a particular name to adding extra command-line options, to pick the POSIX-conformant mode.


> You need to enable many flags to make it POSIX compatible.

or use 'emulate sh' to enable many flags at once.

(and I personally would prefer if other shells would behave like Zsh and not word split unquoted variables by default)
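
For example, in an interactive zsh:

  $ A="foo bar"
  $ for X in $A; do echo $X; done
  foo bar
  $ emulate sh    # enables SH_WORD_SPLIT, among other options
  $ for X in $A; do echo $X; done
  foo
  bar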


No.

AFAIK it can be, though.


Sorry to be so naive.. who is TFA?


The fine[1] article.

1. Originally from RTFM -> RTFA -> TFA, so feel free to use your own substitution for F, but in this case it's a fine article.


I've always thought of it as "the full article", having not made the connection :p

I think it's more broadly applicable


The foregoing article


The author suggests using shellcheck in addition to his recommendations.


You're right! I totally missed that. I've updated my comment.


> Arrays are really nice to have. One place I like to use them is for making long commands easier to read.

…at the cost of making it harder to grok for other people what that code is doing, and having them wonder if there’s some other important reason it has to be written that way.


I'm sorry, but anything that requires a programmer to memorize this many rules in order to avoid shooting themselves in the foot is deeply and fundamentally flawed and needs to be completely redesigned from the ground up. There is a reason that modern programming languages include actual data structures and don't just treat everything as a string of bytes. That might have been a good idea in 1973. In 2023, not so much.

So... use a shell for interactive work if you must, but never ever write a script for one. Use Python or Ruby or just about anything besides a shell.


>"but never ever write a script for one. Use Python or Ruby or just about anything besides a shell."

And create massive overhead because those Python scripts just keep on breaking? No thanks. Shell has its issues, but saying not to use it any more, even in simple scripts, is a bit too much of a hot take for me. (But please note this is a very biased take on my part.)

Unless, of course, you only want to use Perl scripts... those things will keep on running for decades without ever breaking (or being readable).


If you keep to basic functionality, there are very few breaking changes in both Python and Ruby. Apart from py2/3 and ruby string encoding change, I have not run into anything I remember now. I've experienced more issues with tiny differences between zsh/bash/ash/dash. (And Mac vs modern bash)


Except the Python 3 str change was pretty big (and centered, again, on the things that aren't just strings in POSIX).

Getting the correct `bytes` that went into your program uses completely different interfaces compared to normal Python use. For example, you now need to use `os.environb` instead of `os.environ` if you want to use arbitrary POSIX-compliant `bytes` environment variables, and `argvb = [os.fsencode(a) for a in sys.argv]` instead of just `sys.argv` for POSIX-compliant `bytes` command-line arguments.

As people suggest in https://peps.python.org/pep-0383/ , handling really arbitrary inputs is tough work.


Python 3 took about a decade to play out and for a few years you could opt to still explicitly use v2. Parent was saying those scripts "keep on breaking". Those two things are not the same.


A lot of languages/systems require you to memorize rules. And a basic understanding of how the shell works helps in understanding those rules. They aren't just magic incantations.

As an older programmer, I've noticed that the unix shell is one of these few tools that I learned when I was a teenager that is still relevant and useful today (and unavoidable, even if you don't like it). So understanding it was (and still is) a good investment.


The shell rules can be called many things but "basic" is not one of them.


The flaw in your reasoning is that a minimal POSIX shell, in the form of dash, compiles to less than 100k bytes on 32-bit x86.

None of the other technologies that you did or might mention exhibit this property.

Korn features were explicitly excluded to meet this goal in the standard.


There's also the set & setting that is good to consider. For example, I bet lots of the scripts in my current employer's DevOps infrastructure have filename handling bugs that could've been avoided by meticulously following these rules. But these scripts are not meant to be used by customers - the people using these scripts are other developers working on the product wanting to set up the development environment. The script might fail if you do some stupid / unexpected stuff, even as simple as having a space in your Unix user name (i.e. home directory path) or VCS clone directory.

I would feel weak at the knees if someone exposed these scripts on a web server and let them process input from untrusted external parties. Yet, I don't care if the scripts aren't 100% perfect, as long as they work when the user doesn't do something silly. Plus, many of them are portable, meaning you can run them out of the box on Linux, MacOS or WSL (Windows) without installing any extra software.


I have literally never worked on a computer where that would make a noticeable difference.

I am however struggling with those footguns regularly.


I actually used Microsoft Xenix on 80286 "minis" early in my career.

The 64k limit for the text segment of a compiled C program is likely a large factor in the POSIX shell standard; it seems designed to fit architectures similar to this.

The Korn shell was included in later versions of Xenix, but I understand that the source code was an unmaintainable mess, hence the removal of features in the final standard for the shell language.


Most home routers have < 16MB of space.


Micropython will compile to much less than 100k bytes on x86, too.


You should try to get that into Busybox!


But is micropython installed on my router (POSIX shell is)?


Unironically, Powershell. I agree with you, I don't get how people can justify having a string of bytes and then parsing that to communicate between programs. Powershell functions off of data types. You can do ls (an alias for Get-ChildItem). You can also do (ls).CreationTime, just like in most languages, since it returns an object. Or (ls)[0].Name, since, just like you would expect, it returns an array of objects. Or do ls | Select-Object Name, CreationTime, and grab those variables out of the object. And when you pass things, it passes an object. Much safer, much easier. Whether Powershell or something else, this idea is the future of the terminal.


Just port PowerShell to Linux, and Bash and friends will become obsolete.

EDIT: seems like that has already happened: https://learn.microsoft.com/en-us/powershell/scripting/insta...


It's not on my desktop (not installable via my distro), not on my laptop (same), not on my router (same, and it's a completely different one!), not on my phone, not on any of the servers I interact with (and again, not installable!)...

Given it's harder to get than (as an example) Python (which is likely already on the system I'm interacting with), and powershell just straight breaks things (https://github.com/PowerShell/PowerShell/pull/1901), why would I ever use powershell?


The porting of PowerShell happened; Bash obsolescence is still uncertain.


I would agree. It would be nice to see a new shell (that isn't PowerShell) that worked well with spaces in file names.


Zsh works fine with spaces and all other manner of weird whitespace, non-printable binary data, etc. The only unfortunate edge case where you still need to quote things defensively is when a parameter is completely empty; then the entire word gets removed from the command if the expansion is not quoted.
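
To illustrate that empty-parameter case in an interactive zsh:

  $ empty=''
  $ printf '<%s>' a $empty b; echo
  <a><b>
  $ printf '<%s>' a "$empty" b; echo
  <a><><b>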

Other shells are fun, but Z Shell already solved all the worst problems of parameter expansion years ago. If anything, you should be migrating your Bash scripts to Zsh even if you never switch your interactive shell.

That said, most of the problems here don't have much to do with parameter expansion and have more to do with the fact that text as a universal interface kind of sucks. Nushell and Powershell have better models here, but the latter lacks a convenient way to parse fixed-width tables so it's unfortunately frustrating to use with a lot of traditional Unix tools (there's no "ConvertFrom-Table" cmdlet, only "ConvertFrom-CSV").


> Zsh works fine with spaces and all other manner of weird whitespace, non-printable binary data, etc.

> most of the problems here don't have much to do with parameter expansion

Parameter expansion is 99% of the problem here, as you said yourself.

Most of the time, parameter expansion ends up being what other scripting languages would call eval, and everyone knows to avoid eval unless you really need to use it.

Excluding some really annoying bugs, almost all shells work fine with any sort of binary data. It's the user's shell script, composed of unsafe word expansions, that doesn't.

It just takes someone messing with parameter expansion in zsh (SH_WORD_SPLIT) and then zsh becomes just another shell in a world of hurt again.


"You can use any characters in filenames, except NUL and slash."

"Including LF?"

"Yes."

"So what does find do when it encounters a file with an LF in its name?"

"It outputs the name including the LF."

"Without quoting it?"

"Right."

"But how can something reading the output from find know what's an LF in a filename or an LF separating lines?"

"You can use -print0 so it separates files with a NUL character, which filenames can't use."

"Wouldn't it be simpler to prohibit control characters from filenames?"

"Never!"


You would think that this would be common sense. In 2010, David Wheeler (author of the featured article) proposed to the Austin Group that control characters – in particular newline (and TAB and ESC) – be forbidden in filenames: https://www.austingroupbugs.net/view.php?id=251 Unfortunately, it has yet to be implemented. :(

I also note that in 2016 the authors of GNU coreutils took a lot of flak when they changed the default behaviour of ls to quote filenames with special characters¹². Personally, I’d be having stern words with anyone that produced path names with anything funkier than a space character.

¹ https://www.gnu.org/software/coreutils/quotes.html

² https://unix.stackexchange.com/a/258691


This would be a significantly better resource if it explained the underlying "why's" inline. In its current form, it comes off as cargo culting / magic-spell recital.

For example: why "./*" rather than "*"? Without the reason, it's not helpful for understanding the core principle and the reality behind it.

Personally I don't tend to fuck about with "*" and instead use `find . -print0 | xargs -0 -n1`. Hasn't bitten me yet, across a few thousand shell scripts (!).

Why should I prefer this magic incantation compared to my own battle-tested proven one?

And this is why, IMHO, the explanations should be inline.


Sadly you can't please everyone :-).

The explanation is further down on the page. If I wrote in that recommended order, many other developers would complain "I don't want all the details, just tell me what to do".

I wrote it this way because some developers just want the tl;dr version, that is, they "just want the answer".


What do you think the chances are of removing the bag of bytes filename design bug in another decade?


Low. It wouldn't be hard to create an LSM for Linux to reject creation of certain filenames, though. Then at least some security-minded people could enable it.

The POSIX folks are finally adding formal support for null-terminated filenames in filename lists (which is what a lot of people are doing). So that's something.


I found it to be quite thorough and lucid in its explanation of the rationales behind each suggestion, including the one you mentioned.

Perhaps you are referring to the first section which is introduced as being a brief tl;dr:

  > Here’s a quick summary about how to do it correctly, for the impatient who “just want the answer”.
The full document below that tldr includes for example this explanation of the reason for using ./* over *. Namely, flag injection:

  > This is important because almost all commands will interpret a string beginning with dash as an option, not as a filename, until they see something that does not begin with dash. Globs are expanded by the shell into a list of filenames, and dash sorts earlier than alphanumerics, so it is easy for attackers to make this happen.
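
A quick demo of the injection, with hypothetical filenames in an otherwise empty directory:

  $ touch ./-l a b
  $ ls *      # may expand to: ls -l a b    ("-l" parsed as an option)
  $ ls ./*    # expands to: ls ./-l ./a ./b (all treated as filenames)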


Another approach is adding filenames after a "--" terminating option. Most, though not all, commands support it, like:

  ls -l -- -file-name-with-dash

  cat -- -file-name-with-dash


This is discussed in the featured article: https://dwheeler.com/essays/filenames-in-shell.html#dashdash

As you point out, not all commands support this convention and the author feels that it would be hard to enforce this consistently among maintainers. I agree that it would be easier (for me) to remember to prefix globs with `./`.


I'd argue it's not quite as terrible as the author suggests. Getopt(3), for example, uses it, so many programs pick it up from there. As do many popular getopt() variations for other languages... Golang's "flag" package, for example. So it's not universal, but it's also more common than you might imagine.


Can you elaborate?

Do you prefix the find line there with a backtick?

It looks robust, and I am interested in using this technique


Here are a few examples of how I safely traverse filesystem contents in bash:

  find . -print0 | xargs -0 -n1 ./sub_script.sh

  find . -mindepth 1 -maxdepth 1 -name '*some*pattern' -print0 \
    | xargs -0 -n1 -I{} /usr/bin/env bash -c \
      'printf "%s" "$*"' _ {}


Related:

Filenames and Pathnames in Shell: How to Do It Correctly (2010) - https://news.ycombinator.com/item?id=9021786 - Feb 2015 (20 comments)

Filenames and Pathnames in Shell: How to Do It Correctly - https://news.ycombinator.com/item?id=8337296 - Sept 2014 (2 comments)


This is the reason it's better to use a "Real Programming Language", leveraging libraries, to do this job instead of a shell script. A ludicrous number of things to keep in mind for simple filenames and pathnames.


Or just use a sane shell (e.g. the oil shell https://www.oilshell.org/blog/2020/10/osh-features.html#elim..., fish) where handling pathnames with spaces, etc. is not a problem and doesn't require arcane quoting knowledge or linters.

The only reason this is an issue is that people insist on using bash in the name of backwards compatibility. It's about time people started considering moving away from bash - its design is broken, and the effort and time it takes to write safe bash scripts might not be worth it.


I don't disagree, but only some of the rules are really about the shell itself. A significant part of the problem is the stringly-typed interface of exec. Avoiding passing filenames starting with "-" has little to do with shells.


The only real reason to use Bash in the first place is that people will probably have it installed already (assuming you don't care about the most popular desktop OS in the world).

If you're going to add a dependency you may as well just use a "real" language at that point.

I've been using Deno for "shell scripts" for a while and it's great. My only complaint is that `deno run` prints annoying "a new version of Deno is available" messages, which it very clearly shouldn't. Hopefully they'll fix that.


> If you're going to add a dependency you may as well just use a "real" language at that point.

Not necessarily. It's still easier to manipulate shell programs in fish[0] than it is in Python since fish was made explicitly for that. Calling external programs, using pipes, process substitution, etc. are easier in a "proper" shell language.

[0] Or any bash alternative of your choice: nushell, oils, xonsh, elvish, etc.


I don't know if they're planning on changing that, but `-q|--quiet` will suppress this. It just sets the log level to error, and the new version message is info level.


Brokenness is really a tradeoff.

Imho using oil/fish/etc trades unfavorably because of network effects, but it is a perfectly acceptable choice in many circumstances.


IMO someone made the wrong trade-offs there. Shell scripts mainly deal with text output of different programs, so escaping and quoting should be as simple as possible.


Messing with IFS can also solve a lot of problems


This is interesting; he also provided commentary on Fixing Unix/Linux/POSIX Filenames[1]

[1] https://dwheeler.com/essays/fixing-unix-linux-filenames.html


Your link was an interesting read, he does make good points along with clear examples.

I'm one of those who believes that Linux paths should have more limitations. I also believe they should be case insensitive. Both opinions are highly controversial, unfortunately.

I guess it comes down to whether you believe paths are there to serve humans or machines. If you believe it should serve humans, then you understand why allowing a - prefix or case sensitivity causes unnecessary problems.

On the other hand if you believe that paths are there to serve the machine, then the fewer "arbitrary" limitations the better. The devs know best, after all.


Every time I see someone argue for case insensitivity I remember this excellent Hacker News comment on the issue: https://news.ycombinator.com/item?id=29755865

It's definitely more complex than you'd naively think it is.


People are making things more complex than they are. No one said you need to convert between scripts or muck about with fullwidth variants or stuff like that; "semantically identical" is not the same as "case insensitive". That post is going off on a "if you want case-insensitivity then you must also treat color and colour as identical!" tangent, which is just silly and not what anyone has ever argued for.

The scripts with case translations that are more complex than a simple 1-to-1 mapping are the exception, not the rule (German, Greek, Turkish, Lithuanian). These can be dealt with.

It's certainly not the case that it "will only work for English and a few European languages"; it will work for much of the world.

The fact of the matter is, two out of the three most used systems today are case-insensitive, and Linux/POSIX being the exception is rather painful.


> People are making things more complex than they are.

You are the one who wants to introduce extra complexity: A few narrow corner cases where the filesystem treats certain code-points as equivalent. Imagine how confusing that would be to someone who doesn't regularly use those code-points.


> It's certainly not the case that it "will only work for English and a few European languages"; it will work for much of the world.

I respectfully disagree.

Consistently broken in the same way for everyone is better than unreliably working for some people, reliably working for other people, unreliably broken for a third group and reliably broken for a fourth group.

Filenames should be created as the user typed it - don't change the input before storing it.

Filenames should be displayed as the user typed it - don't render output different to what was input.

Tools should match filenames as the user expects it - in this case (hehe) it should perform a case-insensitive match. Ambiguity resolution has to be performed in the case where there is more than one match.

    cat > MyFile.txt    # Create 'MyFile.txt'
    ls                  # Display 'MyFile.txt'
    echo myfile*        # Display 'MyFile.txt'
    vim myfile.txt      # Opens 'MyFile.txt'
The first problem with this is in automated shell scripts: there is no interactivity so the script can't prompt the user "Which of the following files did you mean to open: [MyFile.txt, myfile.txt]?". This means that a script which relies on 'foo.txt' will fail if someone creates a 'Foo.txt' in the same directory.

Another issue with this is that userspace calls to `open()`, `stat()`, etc aren't able to fail and return a list of case-insensitive matches in the case of ambiguity. This makes handling the ambiguity the application's (very complex) problem - before any `open(fname)` call, the application would first have to `scandir()` to get all the filenames, then perform a case-insensitive match against the list to get all the CI matches, then prompt the user to select the correct one.

> The fact of the matter is, two out of the three most used systems today are case-insensitive

I think it's more that they are case-aware than case-insensitive; after all the filesystem certainly stores the case, and they both retrieve the case correctly.

It's the user-facing tooling that "fixes" cases to prevent two files with the same name in different cases from being created.

You can certainly create file 'FOO' and file 'foo' on Windows in the same directory, and then the tools tend to randomly open only one of them, no matter which one the user clicked on.[1]

Which is why I say that the actual filesystem is case-sensitive. The user programs normalise the case for the user.

[1] EDIT: I accidentally did this once, and had to write another small tool to remove files with the filename as typed because the normal tools (del, explorer.exe) just randomly choose one file to remove.


> Filenames should be created as the user typed it - don't change the input before storing it.

I wouldn’t go that far. Certainly, I would use some normalization algorithm on whatever the user typed, so that there’s no difference between typing a precomposed character (https://en.wikipedia.org/wiki/Precomposed_character) and typing a character and a combining character (https://en.wikipedia.org/wiki/Combining_character)

If you don’t do that, the user’s keyboard layout may affect whether they can type a given file name.

Also (nitpicking), if a user types a backspace or control-H, I wouldn’t include a 0x08 in the file name.


> It's definitely more complex than you'd naively think it is.

It has been working just fine on macOS for more than 20 years now.


…for English, and a handful of other European languages where the problem is trivial. Not for the 70% or so of humanity who isn't European or European-descendant.


It uses a specific normalisation. It's not perfect (even for supposedly easy European languages), but that's still many fewer footguns than the status quo. Again, macOS works just fine with these languages.


Oh, wait, THAT'S what keeps breaking umlauts every time Mac users try to share files with any other operating system? Bloody hell, that garbage just keeps causing trouble constantly.


[1] https://github.com/tesseract-ocr/tesseract/issues/3447

Less than 20 years ago. Not a small project.


That's interesting but he does dive head first into a stupidly common fallacy - "if we can't do it perfectly we shouldn't do it at all".

I still think filesystems should be case sensitive, but not because there are some weird languages out there that it wouldn't work for.


> That's interesting but he does dive head first into a stupidly common fallacy - "if we can't do it perfectly we shouldn't do it at all".

I rather think you need to solve at least a somewhat significant subset of a problem in order to justify the extra complexity of the solution (and any confusion caused by it not solving the whole problem). Believe me, I'm a great believer in "perfect is the enemy of good", but not to the extent that I think "any shitty solution at any cost is better than nothing". I'm not convinced that case insensitivity doesn't fall into the latter category.


Over 2bn people speak English, Spanish or French. That's somewhat significant. You're going to tell those 2bn people they can't have nice things because there are other people that wouldn't get them?

As I said before, I still don't think it is a good idea, but that's because of other reasons (basically it introduces more confusing behaviour than it removes); NOT because it can only work for a subset of people.


> Over 2bn people speak English, Spanish or French. That's somewhat significant. You're going to tell those 2bn people they can't have nice things because there are other people that wouldn't get them?

If the cost of the nice thing (which I, as one of the 2bn don't even consider nice) has to be paid by the other 6bn people as well, sure.

> As I said before, I still don't think it is a good idea, but that's because of other reasons (basically it introduces more confusing behaviour than it removes); NOT because it can only work for a subset of people.

This is my main reason for being hesitant to the whole thing as well. The post linked above just happened to broaden my view on the problem a bit.


100% agree that the argument should be around file system behaviour rather than some imagined social justice in file system names.

There are some weird takes on here. Most users aren’t even going to see the file system in the future.

If there were a time to make everything case insensitive it was 35 years ago.

Soon we might even rarely type.


"weird languages"? sigh.


>If you believe it should serve humans, then you understand why allowing a - prefix or case sensitivity causes unnecessary problems.

A system that serves humans should take whatever garbage I shove down its throat and be happy to receive it. Similarly, I think many humans would like to differentiate between, "resumefinal.docx" and "resumeFINAL.docx"


>Similarly, I think many humans would like to differentiate between, "resumefinal.docx" and "resumeFINAL.docx"

Would they? Both are the "final resume" Word document...

Humans don't care about case in distinguishing names. Considering case differences as important distinctions between names (of files in this case) is something instilled from using computers, not a human trait. That's a concern for coders, proof-readers, and other OCD types.

Not to mention that the most popular desktop OSes (Windows, MacOS) don't care about file case either. They preserve it for visual display if you set it, but two filenames differing only in case are the same file - and can't coexist.


Humans don't speak binary garbage, so it's a disservice to them to happily swallow/spit it out

Case sensitivity is a much lesser mistake


> Similarly, I think many humans would like to differentiate between, "resumefinal.docx" and "resumeFINAL.docx"

I strongly disagree. Those sound exactly the same. If you had a physical filing cabinet and asked someone to get a document from the final folder (which is supposed to be the human-friendly metaphor we're using here), they would happily open a folder labeled FINAL, and be quite confused if you had distinct final and FINAL folders.


If they shout the name, you know to look for the upper case version :-)


> The devs know best, after all.

Which may be why they gave us these options to put in our ~/.inputrc

``` set completion-ignore-case on set completion-prefix-display-length 2 set completion-map-case on ```

Technically filenames and paths are still case-sensitive, but with these settings they practically aren't.


ARGH! How does one do code on this site?

I meant: set completion-ignore-case on

set completion-prefix-display-length 2

set completion-map-case on



Indent by 4 spaces.

    like this


You actually indent by 2 spaces, not 4:

  Like this
4 spaces is a Markdown thing.


My bad, I knew that at some point. The rules are actually linked to from the well-hidden "help" at the bottom right of the comment box:

https://news.ycombinator.com/formatdoc


Why is the fact that filenames can contain newlines and executable shell commands always framed as a problem? You can have files that write to themselves and change their names. This is a massive boon for the expressive power of a language. You can have a lot of fun on the command line with a couple of files and eval *.

"We have persistent "meta"objects, they're called files."


A nice collection of issues, although I'd disagree a bit with some of the solutions, particularly around prefixing. You first need to check whether you have an absolute path, and it doesn't work with input that isn't a path. So to me it seems best to focus on -- as the primary measure, with prefixing as an additional measure when you need to be extra careful. Most software uses some getopt variant, so it should be quite rare that -- doesn't work.

One nice thing about bash is the ${var@Q} syntax that is great for printing stuff and deals with the potential for stuff that affects the terminal in a much nicer way than piping through tr. IMO, it should always be used for pathnames so that you can always just copy and paste the output, which I do quite a bit. I wish it was in POSIX.
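
For example (bash 4.4 or later, with a made-up control-character-laden value):

  $ var=$'foo\tbar\nbaz'
  $ echo "${var@Q}"
  $'foo\tbar\nbaz'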


> One nice thing about bash is the ${var@Q} syntax that is great for printing stuff and deals with the potential for stuff that affects the terminal in a much nicer way than piping through tr.

An alternative which works in both Bash and Zsh is

  printf %q "$var"
Using `/usr/bin/printf %q "$var"` even has a good chance of working in other shells.

And as a fun fact: the Zsh equivalent of Bash's ${var@Q} is ${(q)var}, and one quoting style was not enough - you can also use qq, qqq, qqqq, q+ or q-, each with a different result.

https://zsh.sourceforge.io/Doc/Release/Expansion.html#Parame...


Unrelated but if you want this page to look modern and responsive, you can use WaterCSS's Bookmarklet: https://watercss.kognise.dev/#bookmarklet

Example (NOT NSFW, I don't know why imgur shows this warning): https://imgur.com/a/QxJGARN


Thanks for the tip. I appreciate the minimalism of the original article (which is already responsive) but Water.css makes it visually prettier. It even has dark and light themes!


Am I the only one that thinks setting -e is a huge footgun given that it only works in certain contexts? I'd much rather see explicit error handling.


The biggest issue with "set -e" is that it's inconsistent. It doesn't always cause the shell to exit where you'd expect it to, and in some cases, it causes the shell to exit where you would NOT expect it to. This is the most surprising:

    set -e
    i=0
    (( i++ ))  # shell will exit here
Still, I tend to use it because even though it doesn't handle every case, I'd still usually rather my scripts exit when a command fails.


Another variant I've overlooked too often: a construct that returns an error without exiting the script will, if you later move it into a function, suddenly exit the script.

  set -e
  [[ 0 == 1 ]] && true
  echo Status: $?
will print "Status: 1" but

  set -e
  func() {
    [[ 0 == 1 ]] && true
  }
  func
  echo Status: $?
will print nothing but exit after func returns.


There are quite a few situations where it does NOT exit when you'd expect it to. This is one of those. What you're seeing there is a variant of the "Short-Circuiting Considerations" discussed here:

http://redsymbol.net/articles/unofficial-bash-strict-mode/

Basically, when you short circuit, it will not cause the script to exit. However, the short circuit still affects the return value of the function, or the script itself if it's the last line of the script.

So your short circuit is causing the function to return non-zero, which then causes the script to exit.


This doesn't cause an exit for me on GNU bash (3.2.57)


The change was made with bash-4.1:

  This is a terse description of the new features added
  to bash-4.1 since the release of bash-4.0. [...]

  j. The [[ and (( commands are now subject to the setting
     of `set -e' and the ERR trap."
https://git.savannah.gnu.org/cgit/bash.git/tree/NEWS#n1078

Test program:

  $ cat foo.sh
  #!/bin/bash
  set -eux
  
  echo "$BASH_VERSION"
  
  i=0
  
  (( i++ ))
  echo "?=$? i=$i"
  
  (( i++ ))
  echo "?=$? i=$i"
Running with 3.2.57:

  $ /bin/bash foo.sh
  + echo '3.2.57(1)-release'
  3.2.57(1)-release
  + i=0
  + ((  i++  ))
  + echo '?=1 i=1'
  ?=1 i=1
  + ((  i++  ))
  + echo '?=0 i=2'
  ?=0 i=2
  
Running with 5.1.16:

  $ /opt/homebrew/bin/bash foo.sh
  + echo '5.1.16(1)-release'
  5.1.16(1)-release
  + i=0
  + ((  i++  ))

So the arithmetic operation returns non-zero in both cases where it evaluates to 0, but only under bash >= 4.1 does it cause bash to terminate.

The moral of the story is to be careful with arithmetic operations that might evaluate to zero, or use this:

  (( operation )) || true


I copy/pasted the example from the Google bash scripting guide because I was on mobile earlier, but I just confirmed that it exits immediately after the arithmetic operation with bash 5.1.12.


Given that it makes a script terminate on any unhandled error, I don't think your goal conflicts with setting -e. Rather, it makes the script author explicitly handle errors, or the script dies.


Handle errors in bash? More power to you, but if my script cannot just fail in place, then it will be written in Python. Even with shellcheck, I do not trust myself to write more than a handful of lines in bash.


> Handle errors in bash?

The || operator does that pretty well.
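
e.g., with a hypothetical some-command:

  some-command || { echo "some-command failed" >&2; exit 1; }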


The "bag of bytes" POSIX(?) standard(?) can be overridden at the filesystem level:

ZFS

   zpool create -O utf8only=on -O normalization=formD -O casesensitivity=insensitive mypool <pool geometry> <devices>
My primary use case for ZFS is a home lab file server to my Mac clients; you might want to keep Unicode normalization and case sensitivity at the default settings.

My suggestion here is just one [futile] step toward reducing the madness; Postel's Law dictates that we should expect pathological inputs to shell scripts anyway.

("emit clean outputs, expect crazy inputs")

https://en.m.wikipedia.org/wiki/Robustness_principle


See also the HN comments on

"Defensive Bash Programming (2015)"

Archived: https://web.archive.org/web/20170728030103/http://www.kfirla...

HN: https://news.ycombinator.com/item?id=10736584

"The Hell that is Filename Encoding (2016), HN 2018"

https://news.ycombinator.com/item?id=16991263


This is a nice comprehensive resource. I admire the author's perseverance in expounding these edge cases and pushing to amend them somehow. For many years!

Yet filenames are not the only place that can unexpectedly include control chars. For instance, here is the creation of a user with a partially colored username. It still works on GNU/Linux:

  $ UNAME=$(printf '%b' "iam\033[1;31mred\033[0m\n")
  $ sudo useradd --create-home "$UNAME"
Lots of tools break in various ways for such a user.


I would add the "-print0" option in find, which causes it to output null-separated file names, and the companion "-0" flag in things like xargs, perl, tar, etc. to parse them.
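
For example (GNU userland assumed):

  # delete matching files; -- guards against dash-prefixed names
  find . -name '*.log' -print0 | xargs -0 rm --

  # archive an arbitrary file list, NUL-separated end to end
  find . -type f -print0 | tar --null --files-from=- -cf backup.tar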


Ooh, I can put slashes in filenames!

    touch 2023∕06∕19
Or, maybe not.

    ls 2023/06/19
    /usr/bin/ls: cannot access '2023/06/19': No such file or directory

Or, maybe I can.

    ls 2023∕06∕19
    2023∕06∕19


> touch 2023∕06∕19

Won't work if the directory 2023/06 doesn't exist. The filename is 19, the relative path is 2023/06/19.


2023∕06∕19 is a filename. 2023/06/19 is a path.

The ∕ here is U+2215 DIVISION SLASH in UTF-8, not the ASCII / path separator.


2023∕06∕19 may be a filename, but it’s not a date, it’s an arithmetic expression. ;)


Oh geez I missed the utf8.


Lol

Unescaped non-ASCII was a mistake


Or you quote them?

ls "2023/06/19" etc?


The colon is always an interesting character, since it works on Linux, but not Windows


> Set IFS to just newline and tab, if you can, to reduce the risk of mishandling filenames with spaces.

should be: reduce the risk when mishandling filenames.

If your handling of filenames depends on IFS, you are already in big trouble!
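
(For reference, the setting the article has in mind looks like this:)

  # tab last, because command substitution strips trailing newlines
  IFS="$(printf '\n\t')"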


Does this work as desired?

    "$(dirname "$file")"
Is $file not actually quoted here? Since the quoted bits are

    "$(dirname "
    ")"


This will correctly nest the quotes, yes. It’s somewhat counter-intuitive, but it works.


>It’s somewhat counter-intuitive, but it works.

Sums up the POSIX shell experience


One might disagree about “it works”.


$() opens a new quoting context, like ``, and the "$file" is entirely within it, so it should work.


Thanks. I had thought the " quote was parsed / used much earlier in the tokenisation for sh.


So many sharp corners to cut yourself with.



