This won’t work if you retry another function and are relying on set -e to detect errors:
set -e

retry() {
    until "$@"
    do
        sleep 1
    done
}

f() {
    false
    echo oh no
}

retry f
# oh no
set -e is ignored when a function executes in the context of a predicate.
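One workaround (my own sketch, not from the thread) is to write the function defensively with `|| return`, so the failure propagates even where errexit is suppressed:

```shell
retry() {
    until "$@"
    do
        sleep 1
    done
}

# With an explicit `|| return`, f fails on its first statement whether
# or not set -e is in effect, so `retry f` would keep retrying instead
# of printing "oh no".
f() {
    false || return
    echo oh no   # never reached
}

f || echo "f exited $?"   # prints: f exited 1
```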
Shell scripts are great for when you just need to run stuff. If something bad happens you notice it and run another command to ameliorate the error. This is how most interactive shell sessions pan out.
As soon as you need to handle errors programmatically I heartily suggest that you switch to a language with fewer quirks than sh.
You should never use set -e. Because shell scripts use exit status both to indicate failure and as a kind of boolean, this option has to guess your intent using a bunch of rules that are convoluted and can differ between shells, or even between versions. It's only a matter of time before they do something you don't expect. Putting || exit or || return everywhere really is the better option if you can't switch to a better language like Python.
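For illustration, a sketch of that explicit style (the function name and paths are made up for the example):

```shell
# Every command that must succeed gets || exit at top level,
# or || return inside a function.
enter_workdir() {
    cd "$1"        || return
    ls > /dev/null || return   # stand-in for real work
}

enter_workdir /tmp || exit
echo "now in /tmp"
```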
No no no. Shell scripts without -e is one of the biggest footguns since computers were invented. Second being shell scripts with -e enabled ;) As others mentioned -u and pipefail are also highly recommended.
Even mundane shit like ‘cd’ can fail where you least expect it, requiring || exit on practically every line of the script. Forget one place and boom. That defeats the premise of “it’s just a short, ‘simple’ script, so let’s not write it in a sane language”.
If you want to write serious scripts and spend weeks scrutinizing every line, NASA style, it is perhaps better to explicitly handle every potential error. But good luck doing that in a large team of junior web developers just glueing stuff together in the build pipeline.
The pitfalls with -e linked above are mostly related to functions; if your script has reached the size of requiring functions, you have kind of lost already.
> Even mundane shit like ‘cd’ can fail where you least expect it, requiring || exit on practically every line of the script. Forget one place and boom.
This is true, but a much better solution is for everyone to use shellcheck. You can require that all shell scripts pass it without warnings and that every shellcheck disable comment has a good explanation attached; the first part can be automated with a pre-commit hook. In return, you get a script that is explicit and predictable rather than relying on a set of rules which you don't understand, and which may come back to bite you later when you change shells or even bash versions.
> spend weeks scrutinizing every line, NASA style
This is a colossal exaggeration. I've switched from set -e to putting || return or || exit everywhere and it barely takes more time after I got in the habit. On the rare occasions I get distracted and forget, shellcheck reminds me. I have my editor configured to run it on the fly and underline lines with warnings.
While shellcheck is a fantastic tool, it doesn't test for lack of error handling generally. In this area it only has a predefined set of known pitfalls, like "cd". Try running this through shellcheck and it will happily let it through.
    curl icanhazip.com > myip.txt
    cat myip.txt
Taking you back to the every-line-must-be-scrutinized problem. Maybe you can do it. Maybe I can do it. But does the rest of your large organization, with its diverse range of experience, have the discipline to do it? In my experience, from multiple organizations of different sizes: no. Even with shellcheck and mandatory reviews it just doesn't work. It's tiring to remind someone for the 100th time that curl can fail and arguments must be quoted, you worry about being labeled the know-it-all besserwisser who shoots down even simple scripts, and that's assuming the reviewer spots the errors in the first place. What's worse is that the code doesn't immediately tell you whether errors have been considered or not, making it very difficult to read existing scripts.
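For comparison, an explicitly handled version of the curl snippet might look like this (a sketch; `-f` makes curl treat HTTP errors as failures):

```shell
save_ip() {
    # -f: treat HTTP errors (like a 404) as a non-zero exit status
    # -sS: hide the progress meter but keep error messages
    curl -fsS icanhazip.com > myip.txt || return
    cat myip.txt                       || return
}

# usage: save_ip || exit
```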
Thanks for sharing where you're coming from. My opinion comes from maintaining shell scripts either alone or in small teams of people who learn quickly. Perhaps the optimal solution is different when the team is large. I still think that manual error handling is best if you're disciplined enough to do it, but I'll make sure to qualify this view next time I'm talking about it.
This is terrible advice. -e doesn’t prevent you from adding more careful error handling but it does catch a significant majority of the problems which will, with absolute certainty, happen because nobody ever adds “|| exit” everywhere it’s necessary.
-e and shellcheck should be mandatory for any script which can’t be written in Python. Once you’re over a screen or two of code, it’s almost always shorter to write in Python anyway and it’ll be easier to understand.
If you already use shellcheck and know to add || exit, there's no need to rely on a half-solution that may fail in unintuitive and hard to predict ways. Counting on convoluted "automagical" rules to guess your intent correctly every time is how you write unreliable programs. set -e just gives you a false sense of security.
> Once you’re over a screen or two of code, it’s almost always shorter to write in Python anyway
I would argue that it’s better to start with -e and deal with the handful of edge cases where something like grep is expected to return a non-zero value rather than have to remember to check every command. We now have half a century of evidence suggesting that won’t work reliably for all but the most conscientious shell scripters.
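For example, the usual way to handle the grep edge case under -e is to disarm the expected non-zero status explicitly (a sketch):

```shell
#!/usr/bin/env bash
set -e
# grep exits 1 on "no match", which is not an error when counting, so
# the || true neutralizes that status.
matches=$(grep -c '^ERROR' /dev/null || true)
echo "error lines: $matches"   # prints: error lines: 0
```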
I think the alternative is probably more evil, though: your script chugging along in the face of a failure, blindly running commands that relied on what happened before.
set -e creates two problems. The first is that what is considered "failure" by the shell is not consistent or intuitive. The second is that what each of your tools consider failure is also not necessarily intuitive. You're getting a false sense of security at best.
E.g. grep considers no matches worthy of a non-zero (a.k.a. error) exit code, while df considers an invalid block size an error worthy of an exit code of zero. Ostensibly printf(1) exits with non-zero on an error, but a format specifier with insufficient arguments is not an error.
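The grep behavior is easy to verify (df and printf semantics vary by implementation, so only grep is shown):

```shell
printf 'hello\n' | grep 'goodbye'
echo "grep with no matches exited $?"   # prints: grep with no matches exited 1
```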
set -euo pipefail should be the default so that the script exits on at least some unexpected states/command results. If you want to do more than that, with various fallbacks etc., then use another language.
pipefail is worse because tools like grep return non-zero for things that aren't inherently errors. Error handling in sh-like shells is archaic enough that if you're reaching for it you should strongly consider reaching for a different language.
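A sketch of that gotcha: grep finds nothing, wc runs fine and prints 0, yet with pipefail the pipeline as a whole reports failure.

```shell
#!/usr/bin/env bash
set -o pipefail
printf 'alpha\n' | grep 'beta' | wc -l   # prints 0 (wc itself succeeded)
echo "pipeline exited $?"                # prints: pipeline exited 1
```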
I'm saying that a script which would have the potential to fail unintuitively from `set -euo pipefail` probably shouldn't be written in shell, and most definitely shouldn't continue to execute once an unexpected state has occurred.
You're right that there are a lot of cases in which non-zero exit codes are the expected behaviour. But if you're accepting these commands with non-zero exit codes then you're already doing error handling by verifying the output of the commands (at least hopefully), which was the original criterion from the OP for writing the script in another language.
I posted this because it worked perfectly for what I needed it for: a watchdog script on a cron job that issues a PING to a process that occasionally times out. I'm sure it has a million flaws. Everything in Bash does, depending on who you ask. I'm certain there are a thousand reasons it won't work for somebody's application. It worked for mine.
Changed a bit in case of wrong or weird input. (The quotes on the right side of assignments aren't necessary but I find it to be a good practice to use quotes everywhere.)
#!/usr/bin/env bash

function retry {
    local retries="$1"
    shift

    case "$retries" in
        ""|*[!0-9]*)
            echo "retry: First argument must be a number, got '$retries'" 1>&2
            return 1
            ;;
    esac

    local count=0
    until "$@"; do
        local exit="$?"
        local wait="$((2 ** count))"
        count="$((count + 1))"
        if [ "$count" -lt "$retries" ]; then
            echo "Retry $count/$retries exited $exit, retrying in $wait seconds..." 1>&2
            sleep "$wait"
        else
            echo "Retry $count/$retries exited $exit, no more retries left." 1>&2
            return "$exit"
        fi
    done
    return 0
}

retry "$@"
Pure exponential backoff can result in very long delays. It is often useful to cap the sleep time, and for most contended jobs to add some jitter.
This function is called with an argument that limits the number of retries, so unlimited exponential backoff isn't an issue. Jitter is an interesting idea.
I would add some random jitter to the timing of the backoff. If the fail is due to two such processes getting themselves into some sort of deadlock and they both fail at the same time, they might retry at the same time if at the same part of the retry cycle and fail again for the same reason.
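Something like this could slot into the script above in place of the plain `2 ** count` wait (the function name, the cap, and the use of bash's `$RANDOM` are my own assumptions):

```shell
# Full jitter: pick a random delay in [0, min(cap, 2^count)] seconds.
jittered_wait() {
    local count="$1" cap="${2:-60}"
    local ceiling="$((2 ** count))"
    if [ "$ceiling" -gt "$cap" ]; then
        ceiling="$cap"
    fi
    # $RANDOM is a bashism; POSIX sh would need e.g. /dev/urandom instead
    echo "$((RANDOM % (ceiling + 1)))"
}
```

The retry loop would then call `sleep "$(jittered_wait "$count")"` instead of `sleep "$wait"`.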
bash’s ‘if’ command executes its arguments and enters the clause if the exit code is zero. Usually we use “[“, which is the test built-in, to produce that exit code (like “if [ $n -gt 5 ]”). But here the script uses “$@“, the rest-of-args “splat” in bash, as the condition of an ‘until’ loop, which works the other way around: it runs the loop body as long as that command exits non-zero. This is exactly what we want, since the loop body is the retry logic. Just a fun and weird bashism in the wild.
(Sorry if there’s minor errors here, on a phone and going off of memory)
Because *any bash `(( ))` arithmetic expression that evaluates to zero returns an exit code of 1*.
So this works:
#!/usr/bin/env bash
set -o errexit
i=0
((++i))
echo $i # 1 will be printed to stdout
But this doesn't:
#!/usr/bin/env bash
set -o errexit
i=0
((i++)) # terminates here with $? of 1
echo $i
The relevant doc is for `let`:

    let arg [arg ...]
        Each arg is an arithmetic expression to be evaluated (see
        ARITHMETIC EVALUATION above). If the last arg evaluates
        to 0, let returns 1; 0 is returned otherwise.
Yeah, it's best to just use POSIX shell arithmetic, not be clever, and skip the other two ways bash has to do it! That is:
i=$((i + 1))
The (( )) construct, without the $, is different, and it's also not POSIX. It has the gotchas around the exit code, which a plain assignment doesn't have.
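A quick check of both behaviors under errexit (assuming bash is available):

```shell
#!/usr/bin/env bash
set -o errexit
i=0
i=$((i + 1))   # a plain assignment exits 0, so errexit never fires
echo "$i"      # prints 1
j=$((i - 1))   # even an expression evaluating to 0 is safe in an assignment
echo "$j"      # prints 0
```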
`(( ))` is an arithmetic evaluation block. Its content has to be an arithmetic expression. Arithmetic expressions don't require `$` before simple variable names and some more complex expressions like array indexing.
`(( i + 1 ))` will evaluate the result of adding one to `$i` and then throw away the result. It doesn't do anything useful (other than having a different exit code depending on whether the expression evaluated to 0 or not).
`$(( ... ))` evaluates the expression and then returns its value. ie `i=$(( i + 1 ))` will increment `$i`, just like `(( i++ ))` did.