“Exit traps” can make your Bash scripts more robust and reliable (2013) (redsymbol.net)
591 points by ekiauhce on June 20, 2023 | 148 comments



I used an exit trap to kill an SSH agent that I was running, and I noticed that dash did not kill it if the script was interrupted, only if it ran to successful completion.

I asked on the mailing list if this was expected behavior, and it turns out that POSIX only requires EXIT to run on a clean shutdown; to catch interruptions, you have to add more signals:

  trap 'eval $(ssh-agent -k)' EXIT INT ABRT KILL TERM


I found my original submission to the email list.

https://www.spinics.net/lists/dash/msg02208.html

"Signal terminations are not caught by EXIT. It only catches normal exits. Unfortunately, the EXIT condition is not well-defined by POSIX, so it's left to interpretation."

...

POSIX leaves it unspecified what happens here:

https://austingroupbugs.net/view.php?id=621

The EXIT condition shall occur when the shell terminates normally (exits), and may occur when the shell terminates abnormally as a result of delivery of a signal (other than SIGKILL) whose trap action is the default.


Why do people write such standards? May occur. I don't need a standard to do something random.


In this type of scenario, it's sometimes because there are two or more implementors that are trying to develop the standard in a way that makes their existing implementation compliant. Likely there were existing scripts that relied on the behavior being specific to that implementation, so it was explicitly made optional, which is viewed as better than not adding the behavior to the standard at all. "May" is just a polite standards way of saying "Warning: check yourself before you wreck yourself."


I guess to get a well-defined behavior the interrupt handler needs to clear the EXIT trap.
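A minimal sketch of that idea (my own, with a made-up temp file for illustration): the INT handler clears the EXIT trap, does the cleanup itself, and exits with the conventional 128+2 status, so cleanup runs exactly once on either path.

    tmpfile=$(mktemp)
    cleanup() { rm -f "$tmpfile"; }
    on_int() {
        trap - EXIT    # clear the EXIT trap so cleanup cannot fire twice
        cleanup
        exit 130       # conventional exit status for SIGINT (128 + 2)
    }
    trap cleanup EXIT
    trap on_int INT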


The signals EXIT HUP INT TERM cover everything I've run into (I'm actually using EXIT SIGHUP SIGINT SIGTERM but presumably it's equivalent).

In basic terms, for my purposes these respectively account for a clean exit, the terminal emulator being closed, Ctrl-C, and the kill command (edit: the default SIGTERM kill -15, not SIGKILL kill -9).
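For reference, the whole setup is just this (my own condensed sketch; the cleanup function is made up):

    workdir=$(mktemp -d)
    cleanup() { rm -rf "$workdir"; }
    # clean exit, terminal closed, Ctrl-C, and default kill, respectively
    trap cleanup EXIT HUP INT TERM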


What's the signal for SIGKILL?


SIGKILL can't be handled. It's the signal you send when you don't want to give the process a chance to handle it.


That's the difference between kill -15 (SIGTERM) and kill -9 (SIGKILL), where SIGTERM shuts down a process gracefully.


Relevant: Monzy performs at Stanford Univ. "Kill Dash Nine" https://www.youtube.com/watch?v=Fow7iUaKrq4


This is mandatory viewing


You can't trap KILL, can you? Doesn't that just kill your process without any possibility of intervention or other actions? Also, you probably want to handle HUP there too, I would think (depending on what the script does).


SIGABRT would have to come from the process itself; I don't know when, if ever, the shell would do that.

And SIGKILL can't be handled, so that is indeed pointless.


In sufficiently old Unix (I remember doing this on 32V) there was a fun little hack you could do with SIGKILL. If you sent SIGKILL to a process and that process was being debugged it would not kill the process. It would just pause it and notify the debugger that the process had received a signal, and tell the debugger which signal.

The debugger could then allow the signal to go through to the process as is, or to replace it with another signal, or have it be ignored.

I made a program that looked like sh in ps, but was actually a simple debugger that just ran a specific other process of mine. When that process received a signal, the debugger poked the signal number into a fixed location in the process' memory, then changed the signal to something innocuous like SIGALRM, and let that be delivered.

The process' SIGALRM handler would get the original signal number that the debugger had poked in, and print some obnoxious message like "Stupid sysadmin...your wimpy SIGxxx cannot hurt me!" where xxx was whatever signal someone had tried to send it.

I then told my fellow admins I had a stuck process that I could not kill, and asked them to kill it.

It took quite a while before someone got suspicious enough to suspect that the sh that was the parent of the "stuck" process wasn't actually a normal shell and tried killing it.


It's like the software equivalent of an Annoyatron! I love it!


That is quite a story!


POSIX says that "setting a trap for SIGKILL or SIGSTOP produces undefined results", but for signals it describes SIGKILL as "Kill (cannot be caught or ignored)".

I'm guessing this is some relic from 80s Unix systems where SIGKILL behaved differently, or perhaps just an inconsistency/oversight.


I read that as undefined in terms of how the shell itself handles it, because the OS doesn't care.


I think it means that it is valid for OSes to return an error instead of doing nothing, if you attempt it.


You can't, though it's probably common for a parent to spin off a child and handle the SIGKILLed child via waitpid().


That's what they put on the ticket, so that's what I'm using, but you're probably right.


SIGABRT is also not a normal termination signal. Seems out of place here.


I think you want:

  trap 'ssh-agent -k' EXIT INT TERM
I don't see any reason for the eval, as "ssh-agent -k" doesn't return anything useful you want the shell to evaluate.


That's not what the eval is for.

The "ssh-agent -k" command will emit shell commands that the shell must then execute which will kill the agent daemon and unset the socket environment variable.


If all you care about is killing it, you don't need to eval the output. The output just unsets two environment variables which only matters in the current shell context.

  $ ssh-agent
  SSH_AUTH_SOCK=/var/folders/8p/_pwq997168s7vdwwdg_qr1j40000gn/T//ssh-DE0IoJfU5rrM/agent.15015; export SSH_AUTH_SOCK;
  SSH_AGENT_PID=15016; export SSH_AGENT_PID;
  echo Agent pid 15016;

  $ SSH_AGENT_PID=15016; export SSH_AGENT_PID;
  $ ssh-agent -k
  unset SSH_AUTH_SOCK;
  unset SSH_AGENT_PID;
  echo Agent pid 15016 killed;
That said, it doesn't hurt to eval it, so I overstated my case in my original comment.


> The "ssh-agent -k" command will emit shell commands

Does it really? I've executed it here and it just runs kill, doesn't emit any bash. Running just ssh-agent (without any args) does that though, which is what's probably causing the confusion.


I am on OpenBSD 7.2, and I see:

  $ eval $(ssh-agent)
  Agent pid 56785

  $ ssh-agent -k
  unset SSH_AUTH_SOCK;
  unset SSH_AGENT_PID;
  echo Agent pid 56785 killed;
The correct processing of that output requires an eval.

Did you have any other questions?


Why do you need to eval it? Won't

$(ssh-agent)

substitute that with the stdout and run it?


Because the intended use of "ssh-agent -k" is with eval.

While redirecting to /dev/null will certainly work, the agent is holding sensitive credentials (by design), and confirmation of shutdown has a tangible security benefit.


And now your script doesn't exit when you ^C it.

It's a very common mistake with trap.


Harsh lesson, those five signal names are identical if one squints real good. Would have never known.


`man 7 signal` on linux or just `man signal` on macOS will give you more information about the different signals, and shows what the different meanings of those are.


"kill -l" gives you a terse (but complete) list.


An annoying thing about bash is that EXIT will also run on SIGINT (^C), which most other shells won't (in my reading it's also not POSIX compliant, although the document is a bit vague). Some might argue this is a feature, but IMHO it's a bug – sometimes you really don't want cleanup to happen so people can inspect the contents of temporary files for debugging. Because trap doesn't pass the signal information to the handler it's hard to not do cleanup on SIGINT, so it's certainly less flexible, and it's an annoying incompatibility between bash and any other shell.
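A quick way to see the difference (my own demo, assuming both shells are installed; press Ctrl-C during the sleep):

    bash -c 'trap "echo cleanup ran" EXIT; sleep 10'   # ^C: prints "cleanup ran"
    dash -c 'trap "echo cleanup ran" EXIT; sleep 10'   # ^C: prints nothing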

Also, zsh has a much nicer mechanism for the common case:

  {
      echo lol
  } always {
      # Ensure *all* temporary files are cleaned up.
      nohup rm -rf / &
  }


> Because trap doesn't pass the signal information to the handler

You can examine $? on entry to the trap function. On signals, it will be 128 + the signal number, e.g. on TERM (15) it will be 143, and on INT (2) it will be 130.

  #!/bin/bash

  skip_exit=
  on_exit() {
    code=$?
    if test $code == 130; then
      skip_exit=1
    fi
    if test -n "$skip_exit"; then
      return
    fi
    echo "Exiting with: $code"
    return $code
  }

  trap on_exit INT EXIT

  sleep 2
  false
With ctrl-c:

  $ ./foo.sh
  ^C
After 2 seconds:

  $ ./foo.sh
  Exiting with: 1
You can also set up separate handlers for each signal and use a sentinel:

  $ cat foo.sh
  #!/bin/bash
  
  skip_exit=
  
  on_int() {
    echo int
    skip_exit=1
  }
  
  on_exit() {
    test -n "$skip_exit" && return
    echo exit
  }
  
  
  trap on_int INT
  trap on_exit EXIT
  
  sleep 2
With ctrl-c:

  $ ./foo.sh
  ^Cint
After 2 seconds:

  $ ./foo.sh
  exit


For robust code you should also use set -e.

This changes things again. Often a process started by the shell will get the signal, too (depending of course on how it is sent) and exit with a non-zero return value (depending on the process). I believe (I'm not at the computer right now) the handler for EXIT is called in that case.

If I remember correctly, bash can also trap ERR, but dash cannot?

It's not perfectly easy to handle all possible cases, and certainly impossible in a fully portable way.


> For robust code you should also use set -e.

Highly debatable.


What are the arguments to leave it out?

Of course, in special cases where you do explicit error handling you can disable it. But it helps for the big mass of commands where nobody checks whether they worked.


http://mywiki.wooledge.org/BashFAQ/105#So-called_strict_mode

> The behavior of set -e is quite unpredictable. If you choose to use it, you will have to be hyper-aware of all the false positives that can cause it to trigger, and work around them by "marking" every line that's allowed to fail with something like ||true.

Start there, then go back to the beginning for the extensive exposition against using set -e.

FWIW, I (a random person on the internet) use set -e for most scripts I write, and despite the caveats, find set -e is generally more beneficial than troublesome. I don't think Mr. Wooledge is wrong, it's just the groove I've settled in. I do sometimes consciously avoid using set -e, when I explicitly handle errors for every single element of the script.
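One concrete example of the kind of false positive the FAQ means (my own illustration, hypothetical file name):

    #!/bin/bash
    set -e
    # grep exits non-zero when nothing matches, which would abort the whole
    # script here; "|| true" marks this line as allowed to fail.
    count=$(grep -c pattern file.txt || true)
    echo "matches: $count"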


The link is interesting, but it is more a rant than a convincing case against using set -e. What would be the alternative? Handling errors yourself after every single command will for sure introduce more surprises and bugs. Of course bash is not the language to control a nuclear power plant. Even C has tons of undefined behavior. But the shell scripts we use in this company run without major problems, and writing them all in Rust would be prohibitively expensive.

I do not doubt that many bash scripters shoot themselves in the foot. You need at least one reviewer who understands well how the language works.

That said I prefer dash for scripting except when I really need arrays, which is rare. I have no scientific evidence, but KISS is good and bash just seems to have too many bells and whistles. And as the article mentions, bash seems to change in surprising ways between versions. I have also been bitten by that. Unfortunately dash has no set -o pipefail.


In my enterprise environment of legacy systems with 97% reliability, our scripts often run without -e and the effect is we deliver broken stuff constantly. But people get results. They get an email that says "an unknown error occurred". This can be superior to running some tiny shell script with -e and breaking that communication. It's kind of subjective to the experience.


Of course, using set -e without an exit handler is counterproductive.

We use an exit handler that reports

   "$0 has exited prematurely"
in cases the script did not reach expected exit points.


I mean, it's possible, but it's not exactly pretty, and it won't necessarily work in all POSIX compliant shells either (although I believe it will in most, but I didn't test – things like the trap execution order and exact status codes are not exactly defined, IIRC).


Some temporary file remover. Lol indeed


// Thinks about that time I typed rm -rf /<space>something by mistake.

It took a few seconds before I thought... "Why does it take that long for only a handful of files?"

I never did that again.

Had my DOS filesystem mounted under Linux too (yes, that long ago), and I spent a few days guessing the first letter of each deleted file with Norton Disk Doctor or undeleter or something. That was fun (FAT16 filesystems overwrote the first letter of each filename to delete it).

At least it wasn't a mistake I made at work on some production thing. Though there is a reason I make all the desktops on Windows production servers bright red. One time I was tired and shut down "my laptop" forgetting I was still logged into a remote server 200km away..... :/ Of course the iLO wasn't hooked up, but I was extremely happy to find that HP servers listen to wake-on-LAN even when they're off. Another one for the never again books :P


"Keep non-temporary files intact" was not part of the design document.


The nice thing about temp files is that the OS will eventually remove them, even if you don't


Only if cron is set up to do it. I have hosts without /tmp and /var/tmp cleanup, and they have data there which persists across reboots.

With UNIX/POSIX systems it pays to say "it depends" often.


That's fair, but a storage space specially treated as at least theoretically "ephemeral" should be cleared out on some regular basis before things start depending on it NOT being cleared out.

That said, it makes sense that at least some distros would leave a job in place to do so but initially disabled, so that the user can decide based on the use case.


macOS does it on reboot; not sure if it inherited this from BSD proper.


FreeBSD comes with a "periodic" script to do it daily, but it's not enabled by default.

There's this idea that macOS is "based on BSD", but that's not really the case in any meaningful way; it took some components, but the system overall isn't really "based on it" as such.


All files are temporary if your timespan is long enough.


My oldest file (sadly) is only from 2003.

How about you?


Difficult to say. The oldest file I could find says it's from 1989, but it got onto my machine a lot later. There's also a lot of "Ship of Theseus" going on, since it's a source code file that has seen lots of revisions over the decades.


Thanks for pointing this quirk/difference out! I'll keep it in mind.

Regarding the expected behaviour: I believe this is largely a matter of preference/philosophy. I have previously used trap handlers in bash scripts explicitly so that the cleanup is run in every case and /tmp is not polluted - even on failures. Reasoning: for automated tasks or people not familiar with Linux/my script, the disk should not be filled up with old garbage data. And for debugging, I usually add a '-d/--debug' option to have more verbose output and disable the cleanup on Ctrl+C (but still clean up on normal exit).

But as I said, I don't believe there is the one true way. So except for pointing out that I had to ensure external cleanup in case the script was run by some automated task in a permanent environment (read: native), I wouldn't seriously complain if I ever used one of your scripts :)


> I have previously used trap handlers in bash scripts explicitly so that the cleanup is run in every case and /tmp is not polluted - even of failures

That's perfectly fine – even desirable – behaviour for a whole bunch of use cases. The thing is you can opt-in to that trivially with "trap EXIT INT TERM ABRT", whereas the reverse is harder and much less obvious, so it seems like a better design to me (although a "trap ALWAYS" shortcut would be even better so you don't need to list signals).

At this point it's difficult to change due to compatibility as people's bash scripts would break in subtle ways, but it's one of my many annoyances with bash (my view is that people really ought to ditch bash in favour of a better shell – zsh being the most obvious incremental improvement, though far from the only option – but this always proves to be a controversial opinion).


If you're debugging, hit Ctrl+Z and it will suspend the script in place.


> Because trap doesn't pass the signal information to the handler it's not hard not to do cleanup on SIGINT

Did you mean "it's hard" instead of "it's not hard"?


Oops, yes, thanks; seems a "not" got duplicated in editing – still within edit window.


I like combining this with a bash implementation of an event API (https://github.com/bashup/events). This makes it easy/idiomatic, for example, to conditionally add cleanup as you go.

Glossing over some complexity, but roughly:

    add_cleanup(){
        event on cleanup "$@"
    }
    
    trap "event emit 'cleanup'" HUP EXIT
    
    start_postgres(){
        add_cleanup stop_postgres
        # actually start pg
    }
    
    start_apache(){
        add_cleanup stop_apache
        # actually start apache
    }
I wrote a little about some other places where I've used it in https://www.t-ravis.com/post/shell/neighborly_shell_with_bas... and https://t-ravis.com/post/nix/avoid_trap_clobbering_in_nix-sh... (though I make the best use of it in my private bootstrap and backup scripts...)


Thank you for sharing - if I understand the code, the queue is serialized into bash variable(s) (arrays)?

I must admit I find the code somewhat painfully terse and hard to read.

Still, interesting idea. I wonder if using a temporary SQLite/Berkeley DB/etc for queue might generalize the idea to a "Unix" event system - allowing other programs and scripts to use it for coordinating? (Like logger(1) does for logging)?


Yep. Definitely something you can do manually, but the API makes it easier to reason about and coordinate across otherwise disconnected/unrelated code.


Was thinking more as a "global" queue (like how psql/libpq will go look for a socket to local postgres in "the right place") - and a binary/program "event" could "magically" store ("on") and process ("emit") events in a db file /tmp/event.<namespace><random>.sqlite3 - creating/initializing or re-using db file as needed...

So keep the api, but support cross process queues, more or less.


I have given up the unequal struggle to learn Bash.

For me it is "read only".

It is too arcane, even for me.

I use Perl now. I tried to reform last year as I was building a system from lots of executable pieces, the perfect job for Bash.

After much pain and suffering I re-wrote it in Perl. What a (relative) breeze.

Just. Don't. Do. Bash.

Works for me!


+1. Bash is riddled with absurd footguns. You want to hack together three git commands and pipe to fzf? Fine. Anything more complex than that? Python, Perl, or any other proper programming language is there for you.


> Bash is riddled with absurd footguns

I agree, shell programming is ugly and very unsafe.

> Anything more complex than that? Python, Perl, or any other proper programming language is there for you.

I disagree.

First, there are certain things, specifically when you want to process very large datasets, that are easiest - by a very large margin - to build using shell scripts: a combination of ripgrep, sed, awk, grep, cut, tr, paste, jq, head, tail, etc. is way easier and faster to put together in bash than anything else.
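For instance, an ad-hoc report like this (my own throwaway example, with a hypothetical log file) is one line of shell and a chore in most general-purpose languages:

    # count occurrences of the third field on ERROR lines, most frequent first
    rg -N 'ERROR' app.log | cut -d' ' -f3 | sort | uniq -c | sort -rn | head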

Second, to get maximum flexibility you'd like to be able to switch from one (python, perl) to the other (shell script) transparently, and both ways.

Do certain things that can be expressed cleanly in python, and then transparently switch to bash calling a horde of specialized shell commands when the task at hand is easier to express that way.

Shell can transparently go down to Python or Perl very, very easily.

Unfortunately, the converse is absolutely not true: while Perl can - to a certain extent - be used to construct complex pipelines of data processing external commands, it is nowhere near Shell in ease of use.

And it is a giant PITA in Python which gives you fuck-all above exec/fork level subprocess manipulation: writing large external pipelines in Python is about as easy as it is in C.


  > > Bash is riddled with absurd footguns
  > I agree, shell programming is ugly and very unsafe.
Sharp knives are "unsafe". Kids shouldn't use them. Professionals prefer them.

I find most 'footguns' to be normal expected behavior, and the user shot themselves in the foot because they're careless or ignorant.


There's a difference between using a sharp knife, and a hammer that has been sharpened on one side so that it's usable as a knife, and pointy on another side so that it can be used as a weapon in case of a home intrusion, while keeping one side as an actual hammer and the handle is covered in honey for more friction in case your hands get sweaty.

Both can be unsafe, but one is a professional tool while the other is an abomination for our current standards.


I'd argue that every language is somewhat imperfect, but there is something to be said in favor of pitfalls/footguns/imperfections that have been well defined and understood for more than a decade.


A footgun doesn't stop being a footgun with time. New generations of programmers and shell scripters are constantly rolling through and shooting their toes off. Just because there was a "well, that sucks, but it's the least worst compromise" 4 decades ago doesn't make it a good reason today, doesn't make it intuitive today, and doesn't make it not still a horrible footgun today. A familiar footgun is still a footgun.


Don't use traps unless you have to. They are subtly complex and require a great deal more code to deal with edge cases. There is almost always a simpler way to accomplish what you want.

If Bash has taught me anything, it's that many advanced features should seldom be used. Always resist the temptation to be fancy.


In my mind, as soon as you approach anything remotely resembling "fancy," you should move on to a different language or framework that isn't a bash or shell script at all.

I start questioning whether I should be writing bash as soon as I hit about ten lines. I won't even consider writing functions or loops.

If I'm manipulating the system, I'm probably using configuration management, and for most other tasks, I'm using a full-blown programming language with a nice set of standard libraries.

E.g., Python and Ansible.


How have exit traps come back to bite you?


The first is that the trap can come at any time. You can't assume at what point in the script it was running, so you have to test for different cases to find out what you now can/should do. Forget an edge case and now you've got an extra bug. Not using traps, it's clearer what happens at specific points in the execution of the rest of the code, so simpler to reason about how to deal with those cases as/where they happen.

The second is different events can trigger an exit trap, and those may have different implications on what's going on.

The third is that the standards leave out parts of what happens during/after a trap, when traps get called, and what data you have available, so different implementations can behave differently.

Fourth is that sometimes people will use an exit trap to, say, report on a failure, but they may have lost context of what block they were in when it exited, and now the error reporting doesn't tell you everything you wanted to know.

I can't remember more specifics atm because I stopped using them like a decade ago. I'll still use them to clean up temp files, but I also have to add the cleanup logic to the start of the script in case it didn't run.
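That start-of-script fallback looks roughly like this (my sketch; the /tmp/myscript.* naming convention is made up, and it assumes only one instance runs at a time):

    # remove leftovers from earlier runs whose trap never fired
    rm -rf /tmp/myscript.*
    workdir=$(mktemp -d /tmp/myscript.XXXXXX)
    trap 'rm -rf "$workdir"' EXIT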


There's a bit more nuance in my opinion.

If your cleanup logic is no more complicated than "perform some cleanup whenever the script exits for any reason, without concern for what state the things to clean up are in", I think trapping everything and calling the cleanup function is fine.

If you have to do anything more complicated, it's probably a better idea to stick all that logic into a non-bash program. You can do it in Bash if you know what you're doing, but it's going to be ugly, hacky, error-prone, and tedious.

    Not using traps, it's clearer what happens at specific points in the execution of the rest of the code, so simpler to reason about how to deal with those cases as/where they happen.
You can stick all kinds of logic into the cleanup function, but again, it's ugly.

    The second is different events can trigger an exit trap, and those may have different implications on what's going on.
    The third is there's parts of standards left out about what happens during/after a trap or when they get called, what data you have available, and different implementations can behave differently.
Which is why I think it's preferable not to do this in bash if you have any concern for why the program is exiting.

    Fourth is that sometimes people will use an exit trap to, say, report on a failure, but they may have lost context of what block they were in when it exited, and now the error reporting doesn't tell you everything you wanted to know.
If you need any tracing, the only way this makes sense in bash is when running with -evx (errexit, verbose, trace) so you know exactly where you exited. This isn't always a bad idea, though -vx probably is most of the time.

If you think you can do any complex logic in the trap function then you have to consider whether that logic might fail at any point, and depending on the signal there's a good chance you're on a clock as well.


As with most things in programming, it sounds like if you use the wrong tool for the wrong job, then you're prone to writing bugs. Using traps is a great idea when used properly and dismissing it outright isn't doing anyone any favors.


I agree with you but also the person you're responding to; I think the article sells traps too hard as a handy multi-purpose tool like a swiss army knife, when they're really more like a letter-opener; there are situations where it makes sense to use them, but you'd usually be better off with something else, and not knowing the difference can result in trying to use it as a precision blade, which ends up mangling things.


    # usage : utility NEW_DIRECTORY -- user : I think not
    #
    cleanup() { rm -rf "${mydir}" ;}
    trap 'cleanup' INT HUP TERM
    
    # stuff
    mydir="${HOME}/$1" # but $1 is empty, so mydir is now "${HOME}/"
    if test -z "$1"; then
        # handle the missing argument; but while stuff is happening,
        # the user presses Ctrl-C -- and the trap runs rm -rf "${HOME}/"
        exit 1
    fi
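One way to defuse that ordering bug (my suggestion, not the original commenter's): validate the argument first and keep the trap's target empty until it points somewhere safe:

    mydir=
    cleanup() { test -n "${mydir}" && rm -rf "${mydir}" ;}
    trap 'cleanup' INT HUP TERM
    
    test -n "$1" || { echo "usage: utility NEW_DIRECTORY" >&2; exit 2 ;}
    mydir="${HOME}/$1"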


If Bash has taught me anything it's that Bash should seldom be used. Always resist the temptation to be lazy.


Should go without saying, but don't rely on this for anything critical. It's not guaranteed this will run, even on successful completion of the script. Simple example: power is cut between the last line of the script and before the trap runs. Just a heads up


Not even a power cut: a SIGKILL (such as an OOM kill) is enough to cause the trap not to be run.


If you were building a critical system, what would you do if power is cut after the last line of a script runs?


Idempotency is usually the best approach: have each step of a process examine the state of the disk, act only if it finds an appropriate input, and then output something that isn't an appropriate input for the same step. Assuming you can ignore midstream corruption (which can safely be done by adding an atomically safe linking layer on top), then if power suddenly cuts out, reboot and run the script; the first clause that finds an appropriate input executes, and the process continues from there.

The concepts are simple, the implementation is a pain and (honestly) if you need something truly resilient you're probably better off leaning on a system that can provide that guarantee for you (like using a database for state storage).
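A toy version of one such step (entirely my own, with hypothetical file names): check for the step's input, produce the output atomically, and consume the input so re-runs are harmless:

    step_compress() {
        # input already consumed: the step is done
        test -f data.raw || return 0
        # write the output via an atomic rename, then consume the input
        gzip -c data.raw > data.gz.tmp &&
            mv data.gz.tmp data.gz &&
            rm data.raw
    }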


>The secret sauce is a pseudo-signal provided by bash, called EXIT, that you can trap; commands or functions trapped on it will execute when the script exits for any reason.

"Secret Sauce", why is this secret at all.

Nothing against the author who's helping the ecosystem here, but is there an authoritative guide on Bash that anyone can recommend?

Hopefully something that's portable between Mac & Linux.

The web is full of contradictory guides and shellcheck seems to be the last line of defense.

- https://github.com/koalaman/shellcheck


It's secret enough to be well documented in the man page. The real question is, why do people look to random web pages prior to having digested everything in the manual? People used to say "rtfm" all the time, this would be regarded as shockingly rude in today's tech culture but it was a valuable public service to have it repeated, like being reminded to eat your vegetables.


Because ain't nobody got time for that. ;)

More seriously, I think that we have been trained to rely on just in time searches (or ChatGPT sessions) when we encounter the next thing we need to learn. RTFM is just so time consuming and I personally don't recall everything I have read, leading me to rely on search/AI to re-learn the next thing just in time anyways.

In some ways this is a vast improvement, which is why it's the default behavior now. Why cram your brain with information you might never use?

But it DEFINITELY has a weakness in that you don't know what you don't know. I never knew about this 'trap' trick, for example... and I didn't know I didn't know it, despite it being something I see as quite useful.

Side note: I think RTFM has historically meant "try to find the answer first before asking for it", leading to me designating LMGTFY (Let Me Google That For You) as the modern equivalent in this just in time searches reality we live in. I wonder how long it will be before we start saying LMAAIFY (Let Me Ask AI For You)...


The thing is that reading the manual isn't easy. If it was, everyone would do it, because the benefit is that, eventually, you know every topic or section that appears in the bash man page, and then eventually you know most of what that page says about most of those features. These efforts compound over time. If you reach that point, you know that if you don't know of a feature in that software, it doesn't exist. You eventually get a shape of the feature set, often as it was intended to be used by the author of the tool.

You can spend twice as much time over twice as many years reading random blog posts and googling, and you'll have no cohesive, comprehensive picture of the full tool and all its features. In this case, if you look at `man bash` and find the trap builtin function you'll learn about the DEBUG and ERR traps, for example, which I didn't see mentioned in the discussion here. These things might be useful to just file away; in case you ever need it someday, then you'll know it's there and exactly where to find it, not some half-remembered blog post you can't find again.

Over ten or twenty years, the difference between these two habits is night and day. The people who read the documentation first, and only then ask for help, and the people who ask for help first and get it and so never read the docs, end up in a totally different place with respect to overall confidence and comfort with the tools. Reading the man page means you don't get your answer right away, it's slower, less enjoyable, and less fun than googling. It's competing with content that was literally filtered by an engagement selection process. Of course it's less immediate gratification. The only reason people will do it is if they internalize the habit long enough to appreciate the benefits.

"RTFM" was a bit of social shaming, to tell people "don't be lazy, the answer you seek is literally in the documentation, please just read it". Shaming strangers on the internet turns out to not work well at scale, so generally we don't do this now, and the message just doesn't get passed on at all.

I've been shocked at the attitude even in some companies that reading docs is some kind of unnecessary or obsolete practice. As you say, it's become the default option to seek answers online. A culture of reading documentation still exists, but now it has to be maintained inside organizations that care about it, because it's no longer understood as the basic professional attitude.


I agree with the sentiment, but there is a caveat: not all documentation is created equal.

There's incomplete documentation. There's API documentation without examples (ie specification but no example/tutorial). There's outdated documentation!

I can also say that I started studying SQL by reading the docs for MySQL, and after an hour I was still stuck inside INSERT or SELECT. Reading about all the use cases in detail was not a useful way to learn a first approach to queries!

So I'd say that this "truism" isn't always true. Sometimes the docs suck or don't provide the info you need at that time.


Excellent comment, inimino.

> Over ten or twenty years, the difference between these two habits is night and day. The people who read the documentation first, and only then ask for help, and the people who ask for help first and get it and so never read the docs, end up in a totally different place with respect to overall confidence and comfort with the tools.

Ooof! This one's hit home.


Possibly? At this point there are a lot of manuals out there and it's unreasonable to try and read them all. Developers work with a plethora of tools and I think modern tools are getting better at not being surprising but some of these old tools have design choices that differ from modern habits.

I think this might be a bit easier to appreciate if you've ever worked at a young company and made choices because you have to (oh, we'll use MySQL, that sounds better than Postgres) and then give yourself an extra three months work in two years when you finally come up against a shortcoming in the tech. We have to make an awful lot of decisions and generally don't have the budget in time or money to fully grok the options we're deciding between.


> The real question is, why do people look to random web pages prior to having digested everything in the manual?

Easy! It is a reference manual. It documents every feature, even those that you should not use, does not discuss pitfalls, and does not discuss best practice. Even worse, there is sometimes disagreement about whether using a particular feature is good practice. Often there is also the factor of adjusting your code style to something that is not too advanced for other people to review.


The funny thing about this comment is that man pages do all of the things you mention.

There seems to be a common sentiment that cargo culting random opinions from the internet is a "best practice" and that no code reviewer should ever have to learn anything new during the process. Both of these opinions are in my experience a fast track to a culture of mediocrity.


"rtfm" is definitely rude, and always has been. The f does not mean "fun".

Besides, it is slightly ignorant, and if I may say so, it can be a bit neurodivergent.

You see, manual pages are more often than not the wrong type of document to point people to.

`man` pages are information-oriented. You can (or should, if they are well written, which is not always the case, but that's another matter) find all the information about a piece of software there - they are references. As you say, it's something that needs to be digested before you can use it.

There's a certain kind of person, very commonly seen around computers, which can't help but digest a manual before they use a tool. Often they enjoy doing that. And that's fine. But that is not how everyone else does things.

People often have a particular problem they want to solve. They want to know how to zip a whole folder, or generate a ssh key with a particular algorithm. Whatever. Something concrete. They want to solve that, they don't want to "digest a document in order to solve that". That is not something those people enjoy, or are good at.

What those people need is a goal-oriented doc. Something that has a list of possible problems, and then gives solutions. Something that they can search for "whole directory" and find what they are looking for quickly. Something like a FAQ. This does not exist for all command lines (although the `tldr` app fits that well often enough for me).

Blogposts are often (yet another) type of documentation, they are learning-oriented. Like a tutorial.

The thing about blogposts is that they are indexed by Google, Bing and others. So in effect the combination of Google+Blogposts works like a big FAQ document.

Please understand that I am not trying to be offensive here.


Wow. What condescending hogwash!

You don't have to be born with a love of knowing how things work. But if you want to be a programmer (and most people don't) you should accept that the people who understand how things work are eventually going to run rings around people who don't. So you can either cultivate curiosity in how things work, or you can cultivate the habits and fake it till you make it. You put in the work if you want to develop a skill. There's no magic in it.

If you're not interested in being a programmer, then you don't need to waste your time reading man pages, obviously.


I was trying to answer the question "why people check online instead of reading the manuals". Again, I was not trying to insult anyone. I don't agree that the answer is "people are lazy and should not be programmers" is correct.

Programming is a vast ocean and very few people are going to know every single detail of every single nook and cranny. If your day-to-day involves bash scripts, sure, learn and digest all of them. If you are a Python programmer and you just want to compress a file, you don't need to know all the flags that `tar` supports.


As an aside, I love fat manuals (and change/release notes) that have a bit of interesting arcana tucked away in one or two calm sentences here and there.

Not that I regularly read them all myself, but it's nice to be rewarded with a new dark art (like alias-based ~metaprogramming) every once in a while.


I don't know if it checks the right box as authoritative, but my goto guide has been tldp:

https://tldp.org/LDP/abs/html/


> is there an authoritative guide on Bash that anyone can recommend?

This guide is a great introduction and I refer back to it from time to time even after using Bash for ~15 years:

https://mywiki.wooledge.org/BashGuide/


Clearly it's a rhetorical device that may be a bit more heavy-handed than one you'd use.


I just learned about these through “pair” programming with ChatGPT. It is the quintessential ML-enhanced programming trick: Using some old, robust language feature I’m skilled enough to grok but never had the time to learn about through endless documentation spelunking.

My opinion is that LLM pair programming is most or maybe only beneficial to already skilled programmers. ChatGPT can open the door for you, but it can’t show you where the door is. I needed the experience to ask it for a Bash script that handles exit codes gracefully, which is not a question all junior programmers would be able to ask.


Related, but I use exit traps (or actually ERR traps) to make debugging bash scripts at runtime a little easier. This will print the line number of the command that failed along with the command itself. This is useful if for whatever reason your logging system doesn't capture stderr.

    failure() {
        local lineno="$1"
        local msg="$2"
        echo "Failed at ${lineno}: ${msg}"
    }
    
    trap 'failure "$LINENO" "$BASH_COMMAND"' ERR


This 100%.

I'll complement it with the patterns I'm using for exit traps:

- for temporary files I have a global array that lists files to remove (and for my use case umount them beforehand); see the sketch after this list

- in the EC2 example, I add a line with just "bash", so I have an env with the container still running to debug what happened and I just need to close that shell to clear the allocated resources
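The first pattern looks roughly like this (my own condensed sketch of what the list item describes):

    TEMP_FILES=()
    cleanup() {
        local f
        for f in "${TEMP_FILES[@]}"; do
            rm -f -- "$f"
        done
    }
    trap cleanup EXIT
    
    t=$(mktemp)
    TEMP_FILES+=("$t")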


Just as an alternative suggestion, consider using a "down file" instead if you can and letting the code gracefully end. Plus you get to write "touch down" and do an end zone dance.


I think you need to be a little more specific about what the "down file" is for, especially since you can't exactly google it.


Apologies. It's for ending execution prematurely. Here's an example:

    while true; do
        echo hello
        test -f down && exit
    done

Now to stop early you execute "touch down".


I couldn't find a way to have more than one callback per signal, so I created a system with an array of callbacks:

https://github.com/kidd/scripting-field-guide/blob/master/bo...

A nice bonus is that it also keeps the return value of the last non-callback function, so your script behaves better when called from other scripts.


@redsymbol your site has a TLS certificate error. On Chrome I get NET::ERR_CERT_COMMON_NAME_INVALID because your certificate is from mobilewebup.com

Otherwise a good article. I use the following code to enable passing the signal name to the trap handler, so that I can kill the Bash process with the correct signal name, which is best practice for Unix signal handling (EXIT would have to be handled specially in `sig_rekill`):

    # Set trap for several signals and pass signal name to trap function.
    # https://stackoverflow.com/a/2183063/207384
    trap_with_arg() {
        func="$1" ; shift
        for sig ; do
            trap "$func $sig" "$sig"
        done
    }
    sig_rekill() {
        # Reset the trap to its default, then kill the whole process group
        # with the same signal so the exit status reflects the signal.
        trap - "$1"; kill -"$1" -- -$$
    }
    # Catch signal and kill whole process group.
    trap_with_arg sig_rekill HUP INT QUIT PIPE TERM


Most people use Bash trap incorrectly, and it should be documented.

    # All normal and error exits

    trap 'e=$?; trap - EXIT; your cleanup here; exit $e' EXIT

    # Error only trap

    trap 'e=$?; trap - ERR; your error only cleanup here; exit $e' ERR
Save the previous exit condition to preserve it, otherwise it will be destroyed.

Untrapping is necessary to prevent multiple calls, especially if the trap handler itself can call exit or fail.

You don't need an exhaustive list of signals; the exhaustive lists in oft-touted, cargo-culted examples are almost never correct.


I like to use these in combination with set -e and report the error that happened to whatever is capturing stdout for logging.

You can report the error code with $? at the start of your trap, IIRC.


I wish there was a nicer shell scripting language that simply transpiled to Bash and would generate all this boilerplate code for me. There is https://batsh.org/ which has a nice syntax but it doesn't even support pipes or redirection, making it pretty worthless for shell scripting. I haven't found any other such scripting languages.


What's the difference between that and Go?


Go doesn't seem related at all?


FYI, related article: "Minimal safe Bash script template" - https://news.ycombinator.com/item?id=25428621


I like this but the lazy part of me just treats anything i write into $(mktemp -d) as something that will be eventually GC'd by the operating system. I have no idea when it actually happens, or if it does at all, but that's how i roll.


More so now with containers


> and may have security implications too

While it's certainly true that leaving around files with sensitive data is a security problem, you probably don't want to put sensitive data in /tmp to begin with.


Why not?


It's possible to gather some information from a directory to which an attacker has write access, though I'd have to look up details.

In general, this can usually be mitigated to some extent by creating a directory to which only the owner has access.

There are a number of ... interesting ... other circumstances which you might want to consider:

- /tmp is mounted as a ramdisk / memory-only filesystem. This is guaranteed not to persist over reboots, though there may be residual artefacts in memory even after a power-off. That last isn't a significant concern for many people, though it may turn up for others.

- /tmp is a network share. This is uncommon, but NFS + sudo across shared systems means that a user on a remote system may be able to assume your credentials and access or modify your data. rootsquash means that root isn't available, but sudo means that any UID can be defined.

- Various filesystem permissions or limitations may or may not apply to /tmp. I tend to prefer mounting /tmp as its own filesystem, with nodev and nosuid set. There might also be noexec, which can foul up a lot of temporary installation scripts.

An alternative is for users to define their own preferred temporary directory. I usually include ~/tmp under $HOME.
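The nice part is that the standard tooling respects that choice (a small sketch of mine; mktemp(1) honors TMPDIR):

    mkdir -p -m 700 "$HOME/tmp"
    export TMPDIR="$HOME/tmp"
    tmpfile=$(mktemp)   # lands under ~/tmp instead of the shared /tmp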


The program could get paused mid-execution. Moreover, I’m pretty sure a malicious process can put file watchers in /tmp and read all written contents.


If your script calls

  umask 077
...before creating temp files then they won't be world-readable. Still lots of pitfalls. (What user are you running as, and who else is running as that user? What's the mount point file system, and does it have POSIX permissions? Why are you persisting secrets to disk in the first place? Etc.)


What if instead of using a bunch of features bolted onto a shitty scripting language, we just used a real language like Python?

I've read enough hacked-together bash BS to just despise the language.


This is great and I have used it but the better advice is stop using bash scripts for anything other than your own personal scripts and use real languages for anything important.


Yes. I use them for cleanup in every non-trivial script I write.


Yep, I use these all the time, they’re very useful indeed.


Off topic but I really enjoy the lofi website design!


https://github.com/DaveJarvis/keenwrite/blob/main/scripts/bu...

My template script provides a way to make user-friendly shell scripts. In a script that uses the template, you define the dependencies and their sources as comma-separated values:

    DEPENDENCIES=(
      "gradle,https://gradle.org"
      "warp-packer,https://github.com/Reisz/warp/releases"
      "tar,https://www.gnu.org/software/tar"
      "wine,https://www.winehq.org"
      "unzip,http://infozip.sourceforge.net"
    )
You define the command-line arguments:

    ARGUMENTS+=(
      "a,arch,Target operating system architecture (amd64)"
      "o,os,Target operating system (linux, windows, mac)"
      "u,update,Java update version number (${ARG_JAVA_UPDATE})"
      "v,version,Full Java version (${ARG_JAVA_VERSION})"
    )
You define the "execute()" method that is called after the arguments are parsed:

    execute() {
      # Make the computer do the work.

      return 1
    }
If the script takes arguments, handle each one individually:

    argument() {
      local consume=2

      case "$1" in
        -a|--arch)
        ARG_JAVA_ARCH="$2"
        ;;
        -o|--os)
        ARG_JAVA_OS="$2"
        ;;
      esac

      return ${consume}
    }
Then call the template's main to start the script rolling:

    main "$@"
For 99% of the scripts I write, this provides:

* Built-in software dependencies verification.

* Instructions to the user when requirements are missing.

* Simple command-line argument parsing.

* Help and logging using ANSI colour.

Here's a complete script that builds the Windows, Linux, and Mac installers for my Markdown editor:

https://github.com/DaveJarvis/KeenWrite/blob/main/installer....

There's a write-up about creating the script that has a lot more details about how the template works:

https://dave.autonoma.ca/blog/2019/05/22/typesetting-markdow...

Note that it is technically possible to improve the scripts such that handling individual arguments can be done in the template itself. This would require a slightly different argument definition semantics:

    ARGUMENTS+=(
      "ARG_JAVA_ARCH,a,arch,Target operating system architecture (amd64)"
      "ARG_JAVA_OS,o,os,Target operating system (linux, windows, mac)"
      "usage=utile_usage,h,help,Show this help message then exit"    
    )
By detecting an `=` symbol for the first item in the lists, it's possible to know whether a command-line argument is assigning a variable value, or whether it means to perform additional functionality. (PR welcome!)


I wish Bash had 'defer' like Go.


https://cedwards.xyz/defer-for-shell/

Enjoy. (blog post is mine)
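The rough shape of the idea, as I understand it (my own compressed sketch, not necessarily the post's exact code): collect commands in an array and run them LIFO from an EXIT trap.

    DEFERRED=()
    defer() {
        # prepend, so deferred commands run in reverse (LIFO) order like Go's defer
        DEFERRED=("$*" "${DEFERRED[@]}")
    }
    run_deferred() {
        local cmd
        for cmd in "${DEFERRED[@]}"; do
            eval "$cmd"
        done
    }
    trap run_deferred EXIT
    
    tmp=$(mktemp)
    defer "rm -f '$tmp'"
    echo "working in $tmp"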


I'd like a way to do this for bash functions which I use quite extensively.


Emacs C-c C-t will insert that for ya in shell-script-mode


It is sad that the trap interface is not reversed:

  trap EXIT WHATEVER -- cmd args
But as is common with bash, the interface crystallized before good practices became apparent.


Stylistic bikeshedding.

3 keywords for normal, abnormal, and all exits would be semantically clear.

The best practice for a problematic language is to use something else.


Just use Python or any other proper high-level language that has proper control structures.


Sometimes you can't "just use Python", e.g. in an OpenWrt script or similar.


According to the OpenWrt website[1], packages exist for Erlang, Lua, Node.js, Perl, PHP 8, Python, Ruby and Tcl.

[1]: https://openwrt.org/packages/index/start


OpenWrt has both CPython and MicroPython packaged.


This is very useful to know - thanks for sharing!


Very cool! Didn't know about these.


Poor man's defer




Bash scripts have their use cases; many things are shorter and simpler than in Python. But coders should bother to learn how bash works and use shellcheck. Just guessing from how things work in another language typically leads to buggy code. Keeping a daemon always running is not a task for bash; systemd is typically much better at that (although something like exponential backoff in case of failure seems to be tricky).


A good read before dismissing http://n-gate.com/software/2017/


I disagree with his disagreement. I'm not able to overthrow my government to make it illegal for my only ISP to intercept my traffic. HTTPS simply makes it impossible for my ISP to add stuff to the page in transit.


What's the gripe with Let's Encrypt? Certificate transparency?


Some of his arguments are wrong, some are not even wrong, some are absurd. Plus he seems to be an asshole. Being an asshole and wrong at the same time is not a great combination.


I think you would be very interested to read their takes on HN articles and FOSDEM :p


Can you expand on your first sentence with some reasons or justifications for stating this?


For most use cases where a guy like him would write a shell script, there's already some well-written software. E.g. in his case, he'd use something like Dagu. Don't write shell scripts. If you want to do some programming, pick a proper programming language.


I thought exit traps were just SPACs



