If you’re writing shell scripts you should have https://www.shellcheck.net/ in your editor and pre-commit hooks to catch common footguns.
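A minimal sketch of what that can look like as a git pre-commit hook (assumes shellcheck and a GNU-style xargs are on PATH, and that your scripts end in .sh):

#!/usr/bin/env bash
# .git/hooks/pre-commit (sketch): shellcheck every staged shell script
set -euo pipefail
git diff --cached --name-only --diff-filter=ACM -z -- '*.sh' |
  xargs -0 -r shellcheck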
Even then, my threshold for “this should be Python” has shrunk over the years: I used to say “greater than one screen of code” but now it’s more like “>1 branch point or any non-trivial scalar variable”.
I keep posting this, but my favorite rule of thumb came from a Google dev infra engineer who said that "every Python script over 100 lines should be rewritten in bash, because at least then you won't fool yourself into thinking it's production quality"
"If you are writing a script that is more than 100 lines long, you should probably be writing it in Python instead. Bear in mind that scripts grow. Rewrite your script in another language early to avoid a time-consuming rewrite at a later date."
Stuff like this does a huge amount of unintentional damage. Every crappy company in the world thinks they're Google and tries to copy them. Google-caliber people might be able to be reasonable about this, but ordinary employees come across something like this and interpret it to mean "bash is evil" and ostracize everyone who tries to write a bash script for anything, no matter how sensible.
At work lately, there's been a spate of contorted "just why?" Python scripts that could've been accomplished elegantly in a handful of lines of shell. While no one would select shell languages as the ideal for a lot of complex logic, data parsing, etc., there's no competition for a shell when you need to do what shells are meant to do: chaining invocations, gluing and piping output, and so on.
> Google-caliber people might be able to be reasonable about this...
I'm not sure about that. Google certainly has some very talented people among their developers, but most of them are so bad at programming that they had to invent their own heavily watered down programming language.
Shell is brilliant until you need to keep a filename in a variable. Or, worse, read filenames from the output of a command. Since basically everything except '/' and '\0' is a valid filename character, the potential for mishaps or exploits is huge.
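(In bash, the usual way to survive arbitrary filenames coming out of a command is NUL-delimited output; a sketch:)

# Iterate over filenames safely, whatever characters they contain
find . -type f -print0 |
while IFS= read -r -d '' file; do
  printf 'found: %s\n' "$file"
done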
That's where we came in with the need to use "$@" vs the unsafe-by-default bare $@.
I mostly agree, but it's sometimes a necessary evil. There's a reason I praised shell languages in general rather than bash specifically; fish shell, for instance, is much more comfortable.
In most languages, you don't need to know this level of minutiae about string handling; as an example, I've been doing open source dev in Bash for years now and I never knew the distinction the OP posted.
Related, the longer your shell script gets, the greater potential for accumulated foot-guns. When you finally do the rewrite, will you be bug-compatible? And once discovered, will you be confident on which side the bug lies?
That way, it will never be rewritten again; because nobody will be able to read it.
Edit: Getting some downvotes, so to clarify: I am indeed saying that converting something from Python to Perl has a high likelihood of making it far less maintainable. I get that people can write good Perl. But as someone who has had to maintain Perl in the past, the fact is that it's far more common for the end result to be horrible Perl. I have some issues with Python, but it is FAR more maintainable than Perl.
Let me drag this to something concrete: if you are interviewing, it is almost impossible to pass an amateur "Perl screen".
Every single person who doesn't use Perl every day, professionally, knows a different subset of Perl. This is one of the few times where being more knowledgeable than the interviewer is also a problem. You will write something, and the interviewer will question you on it because he has never seen it.
I finally solved this problem by bringing one of my personal programs written in Perl and also rewritten in Python so we could talk about it. Now, the interviewer is in MY subset of Perl AND can't argue because I have working Perl code in front of him. This was back in the 1990's when everybody in VLSI design expected you to know Perl.
Because of this "different subset" issue, if you want to maintain a Perl script, you have to basically know the entire language. This is what makes maintaining Perl scripts so difficult.
This "different subset" problem is the whole reason I left the Perl ecosystem back in 1996(!) at the height of Perl's popularity and never looked back.
> This is one of the few times where being more knowledgeable than the interviewer is also a problem
I have actually found that is pretty much always a problem. If you pull out something the interviewer is unfamiliar with they will often assume you are full of it. I have had such people refuse perfectly good explanations for things they haven't heard of because they assume incompetence before that possibility.
Indeed. I've "failed" interview screens because the reviewer's Python installation was broken, and rather than reading the traceback and realizing this, they just assumed the candidate's code was broken.
Last I knew, that position was still open almost a year after the fact.
Dunno. I finally left VLSI design because it was effectively a career dead end. Sense a trend? :)
While I still regard myself as a vastly better VLSI designer than programmer, my ability to wrangle software the whole way from assembly language on a chip to just shy of the top of a full web stack pays far better than my ability to wrangle transistors. And, in my opinion, attacks far more interesting problems.
That's a cute heuristic, but I think the better practice is to distrust scripts without tests as they can quickly diverge from the rest of the codebase.
It's still better to script in Python or Ruby than Bash. Nobody understands Bash. It's even more mysterious than Perl.
I’d rather write bash for orchestration than nearly anything else: bash is designed to make coordinating processes easy, something very few programming languages have managed to do.
The thing that gets me about all the new shells and shell scripting languages popping up these days is they loosely seem to fall into 2 categories:
1. more emphasis on traditional programming paradigms (be it JS, LISP, Python, whatever), which yields a platform that is arguably a better-designed language but a poorer REPL environment. Bash works because it's terse, and terseness is actually preferable for "write many, read once" style environments like an interactive command prompt.
2. or they spend so much effort supporting POSIX/Bash -- including their warts -- that they end up poorer scripting languages.
I think what we really need isn't to rewrite all our shell scripts in Python but rather better shells. Ones which work with existing muscle memory but aren't afraid to break compatibility for the sake of eliminating a few footguns. Shells that can straddle both of the aforementioned objectives without sacrificing either. But there don't seem to be many people trying this (I can only think of a couple off hand).
I've been writing a lot of Powershell lately, and my only real gripe with it is that it seems suspiciously like not being Posix compliant in any way was a design goal.
I agree with the idea of breaking backwards compatibility, but Powershell honestly has enough core design issues that it itself is starting to feel like it needs a major backwards-incompatible update.
It's also subject to many of the same footguns as Bash so I'd put that into the 2nd camp (re my previous post).
Not that I'm taking anything away from zsh. It is a nice shell. But I think we can do even better considering how dependent we still are on shells for day-to-day stuff.
> Zsh had arrays from the start, and its author opted for a saner language design at the expense of backward compatibility. In zsh (under the default expansion rules) $var does not perform word splitting; if you want to store a list of words in a variable, you are meant to use an array; and if you really want word splitting, you can write $=var.
I realize that it has similar footguns, however reading through their info pages, I was surprised by how many they just decided to fix, unless you explicitly turn on compatibility mode.
Go is actually superb at this IMO. Channels and goroutines provide the actual primitives you need to handle streaming data, and the integration of the context library makes starting up and shutting down everything a breeze.
Best of all though, it's absolutely compatible wherever you need to run it.
The issue I've had with Go is that it's batch compiled: my preferred workflow is to REPL something together in an interactive shell and then generalize it.
Here's one thing that I find difficult to remember how to do in shell: run a command, have stdout go to a file (or /dev/null or whatever), but pipe stderr to more or grep or some other program to process.
I mean, I would expect the syntax to be something like:
make >/dev/null 2| more
but no ... it's some incomprehensible mess of redirection arcana to get that to work.
Note that the order of redirections is significant. For example, the command

ls > dirlist 2>&1

directs both standard output and standard error to the file dirlist, while the command

ls 2>&1 > dirlist

directs only the standard output to file dirlist, because the standard error was duplicated from the standard output before the standard output was redirected to dirlist.
Does it look a little arcane? Yes, especially until one memorizes it.
Can it be memorized? Yes, because it is just a single 'incantation': "2>&1". Just put that redirection operator before the redirection of the standard output to the file, and the result is stdout goes to the file, stderr goes to the pipe.
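Applied to the make example above, a sketch: stderr goes to the pipe, stdout to the file, because the 2>&1 is processed first.

make 2>&1 >/dev/null | grep -i error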
Just in case someone's confused by this, it becomes clear if you know how pipes work. `2>&1` means "redirect stderr to whatever stdout currently points to" (redirection by copy, not by reference). A more recent version of the bash manual[0] has less confusing wording in the same paragraph, IMO:
Note that the order of redirections is significant. For example, the command
ls > dirlist 2>&1
directs both standard output (file descriptor 1) and standard error (file descriptor 2) to the file dirlist, while the command
ls 2>&1 > dirlist
directs only the standard output to file dirlist, because the standard error was made a copy of the standard output before the standard output was redirected to dirlist.
It's actually a pretty simple model. Every process has an array of open file objects. File descriptors are indexes into that array. "Redirection" copies the underlying entries, and are processed left-to-right.
"Redirecting streams" winds up being confusing - it's all just `dup2`.
Okay, but can you pipe stdout to one string of commands, and stderr to a different string of commands? That's something I feel should be possible but how Unix shells handle redirection is just ... alien to my way of thinking.
Yes, I know that underneath it's all calls to `pipe()` and `dup2()`, which I can do (and have done) in a language other than shell. It's the shell redirection syntax (for anything more complex than simple redirection or a pipe) that just doesn't make sense to me.
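A sketch of one way to do it in bash, combining a pipe with process substitution (command is a stand-in for the real program):

command 2>&1 > >(cat | cat) | cat | cat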
Note, the extra 'cats' are just to show a "string of commands".
How to read this:
When Bash sets up the file descriptors for 'command' it initially makes descriptor 1 refer to the pipe, and descriptor 2 refer to the terminal.
So, the dup operator (2>&1) copies the fd in descriptor 1 (which is the pipe) into descriptor 2 (so after the dup operator is processed both stderr and stdout for command reference the pipe).
Next descriptor 1 is replaced (the > operator) by a reference to a fifo created by the process substitution operator (the >(...) operator).
So now command's descriptor 1 refers to the fifo created by >() (which itself contains a "string of commands").
Then, because stderr for command was made to be a copy of what was previously stdout (before modifying stdout) it continues to refer to the pipe Bash setup (the | operator), and stderr now flows out over the pipes to another "string of commands".
I want to disagree with this so badly because I have tried to do stuff that's just a bit too complex for bash and ended up scrapping the idea... but I can't. Hacky python is even worse.
Another rule of thumb would be: "How many years from now do you still want this to work?"
If I run a shell script I wrote ten years ago, it works.
If I run a Python script I wrote ten years ago, it's quite likely to fail with a SyntaxError.
This I say as someone who loves Python and uses it as my primary language both privately and at work. But I have to admit, Python scripts do not really age well.
I haven't really found that to be a problem, with some code I still use daily which was started before Python 3.0 was released. The py3 transition was real but it was also something which took seconds to run through futurize/modernize.
Over the years I've spent way more time dealing with inconsistencies among the various platform tools (e.g. GNU vs. BSD implementations of sed, grep, find, etc., even before you get to the space-aliens-with-broken-translators realm of AIX) — which using Python avoided needing to care about — or library / file naming changes. Similarly, a few years back a lot more time was lost when various HTTPS improvements flushed out code using old versions of OpenSSL or, worse, gnutls.
(Can't remember the details, but it may have been something about some extra hoop you'd have to jump through to trust self signed certs that broke our testing infrastructure)
Totally agree - I just don’t see that as a distinguishing factor since we had a bunch of shell scripts break, too, as various CLI tools stopped behaving consistently.
I guess this is a YMMV thing. My experience is that Python rots more than shell scripts, but of course I understand that this may not be universally true, and depends a lot on what your scripts actually do (and, to a large extent, on how they're written, of course).
Before that, well, it's been a while. But with and yield were added as keywords (breaking any code using them as variables). The xreadlines module disappeared at some point (I'm sure a few other modules have died along the way). At some point you couldn't raise anything you wanted as an exception any more. Oh, and the source encoding declaration. That's what I remember off the top of my head.
Minor things that are handled in things that are maintained, sure. But many things are also built once and then expected to run for a long time - especially in the role shell scripts usually fill.
Note, that I don't argue that any of these changes were bad. They were all good changes that made the language better. But, they resulted in old code breaking. One extra headache to deal with for the poor schmuck that was responsible for upgrading this server or that to the latest OS release.
I had one like that, years ago, in a 2.3 to 2.7 environment update. Had to just update a few lines. But we knew to watch for it in test, because we knew of the update. We test our shell scripts too, when the env changes. Anecdotally, the Python is slightly more touchy than Bash.
I don't know, I often miss the pipe operator in python, which makes certain things much easier.
I once created a small class to simulate this a bit, it allows you to use + as pipe operator. Wouldn't use it in production though, was just fun to write it. https://pastebin.com/eQacwLj7
Why not use | as the pipe operator? Just use __or__ instead of __add__
I have done something similar too once in a shell-like Python CLI I worked on. Can also use __ror__ to be able to pass primitives into your commands like `[1, 2, 3] | SumCmd() | PrintCmd()`
I do agree that typing something like that, especially when working in a shell, is much nicer than having to go back and forth all the time to type `print(sum([1,2,3]))`
Doesn't python have channels? The pipe is essentially a channel combined with a close message (a null object, I guess). Otherwise you can do something similar with iterators and function composition. I guess the syntax isn't quite as easy to read if you're not used to function composition? (Which seems surprisingly difficult for some beginners.)
This is exactly my rule of thumb for this as well, and it's funny that that's exactly the feature that tells me to start using Python. Yes, there are arrays in Bash, but every time I start to look up how to write them (again), I think, "Ennnhhhhh...maybe not" and rewrite the darn thing in Python.
Uh.. you just reminded me of a python syntax hack that might come in handy... I'll give it a look again today at work, maybe I'll write some PoC and link it back here :)
What about some sort of complexity checking more precise than lines of code? What about trying to write one thing you would normally write in shell with something else per unit of time.
I would simply say if you're planning on making changes to the script you probably shouldn't write it in bash. Or to be a little more rigorous, don't use bash if you couldn't rewrite the entire script, correctly, from scratch, in an hour.
Although I think all of these things are just complicated ways of saying "please seriously reconsider writing it in bash."
But I write bash scripts all the time, I just try to keep them as short and simple as possible.
I don't think this is like the 80-character line length that's about screen size. This 100-line limit is framed as a quick and dirty heuristic for script complexity.
A common problem, which happened to a coworker: he made a quick bash script for something simple, then kept adding "just one more" thing, the sunk-cost fallacy keeping him from taking the time to rewrite it. Eventually the monstrosity he'd created was too difficult to debug and had to be rewritten in a different language.
I’ve seen the same thing happen with any language. Generally tends to happen when a dev hasn’t thought through the scope of what they are doing beforehand. I’ve written some ugly python in my earlier days due to this as well.
My point here is that it has less to do with the language and more to do with the mindset when solving a problem.
The main issue I see with more inexperienced devs with bash is that they tend to think it’s okay to be lazy with the code because it’s just “bash”. If you would write safety checks and comments in your python you should be doing the same in bash really.
^-- SC2148: Tips depend on target shell and yours is unknown. Add a shebang.
Being new to shellcheck and not familiar with its options or what it does, I hastily and erroneously typed:
shellcheck -shell=bash script
Note I learned UNIX via NetBSD. I prefer and use their version of ash for both interactive and scripting use.1 I never got used to "--" GNU-style long options. I sometimes type a single "-" out of habit. Anyway, here is the output I got from shellcheck:
Unknown shell: hell=bash
I agree with shellcheck.
Although there may be some irony in the fact it cannot sort out its own argument parsing.
1. I do not use other scripting languages such as Python, Perl, Ruby, etc. That means, e.g., for quick and dirty one-offs and prototyping, I can omit the shebang. Debian's "dash" scripting shell is derived from NetBSD's ash, the one I choose for interactive use.
SC2148: Tips depend on target shell and yours is unknown. Add a shebang.
If you google what a shebang is, the top link for me is a Wikipedia article on the subject [0]. A shebang is basically just a line (always the first line) in a file which tells the operating system what program to invoke to execute the script. There are different shells beyond just bash, so shellcheck wants to know which flavor of shell the script is written for and uses the shebang to figure it out.
I always have the top of my shell scripts with a shebang, even if the script isn't intended to be directly executed.
Pick the user's bash from PATH environment:
#!/usr/bin/env bash
Or specify a specific bash:
#!/bin/bash
Or use whatever plain-shell is installed:
#!/bin/sh
Or maybe it's a Python script:
#!/usr/bin/env python3
Or it's a text file:
#!/usr/bin/env vi
If you're not using shebangs then you're probably writing your scripts wrongly.
second this. We recently added it to a project at my company (https://github.com/homedepot/spingo) as part of a github action and it is awesome. A quick search for the specific code in the shellcheck wiki reveals the problem and a solution. I've had no real issues with it yet.
To be honest, for most of my scripting needs I usually start with Python directly and just go mass os.system() or commands.getoutput() calls. Later on I refactor into subprocess.Popen as needed.
I've seen way too many devs try to recreate GNU coreutils their way because of a silly aversion to bash. As a sysadmin (sorry, that's not popular these days; cough, Ops guy) you can pry the bash out of my cold dead hands, and most of the "weird edge cases" are easily avoided just like those of any language.
I know everybody likes to think devops and cattle/pets and "you should never ssh into machines" are how things should be, and maybe there that's how they are, but in the real, non-SV-software-startup world, sysadmins around the world who get that 3am call are fixing some dev's shit with bash and sysv/systemd scripts.
I feel at this point it's just a bandwagon people jump onto because they want to feel superior. Just mention bash on HN and expect any number of "... don't use bash" comments.
Bash best practice is to always double-quote variables! Do that, and the post becomes rambling about what happens when you don't follow standard bash practices.
Bash also incorporates the POSIX shell, but has extensions. This is fine if you are using it, but if you are writing a script which may need to run on another system, it's better to keep to POSIX.
Ick. I have written my fair share of bash, and stuff like this is very common. Most things in bash are just inherently non-compositional and/or riddled with weird corner cases that you just have to know about in order to not shoot yourself in the foot. This document [0] made the rounds on HN a while back, and it has, together with the associated tool, been something that I have regularly consulted whenever I have had to do anything non-trivial with bash (anything that has to deal with arguments to commands is already non-trivial to get right).
oil$ var myarray = @('has spaces' foo)
oil$ var s = $'has\ttabs'
# function to print an array element on each line
oil$ lines() { for x in @ARGV; do echo $x; done }
# pass 3 args -- 2 from myarray and 1 from s
oil$ lines @myarray $s
has spaces
foo
has tabs
I don't really think Oil is relevant to this topic. The only good reason I can think of for someone to script in bash is for portability purposes. If someone wanted portability without bash's shitty syntax, something like Python would be a much better candidate than Oil. One could also argue that there's no meaningful difference between e.g. fish scripts and Oil scripts because neither will work on the standard shell. It's also possible to run bash scripts from fish by simply calling bash. Right now, Oilshell is a reasonable choice for an interactive shell with bash compatibility, whereas Oil syntax is just as, if not more, useless for public distribution as fish.
As I see it, the goal for a project like Oilshell (a shell with both a new syntax and support for standard bash syntax) would be to replace bash as the default shell in distros. Until then, Oil scripts lack the primary feature of bash scripts just like other alternative shells.
the goal for a project like Oilshell (a shell with both a new syntax and support for standard bash syntax) would be to replace bash as the default shell in distros.
Thing is, if the script is basically the glue between incantations of multiple other commands (which is basically the intended use case of shell scripting), then replacing that with Python[1] is just adding lots of boilerplate code for no real improvements in functionality. I still agree with a strict limit on the acceptable complexity, though.
Most if not all my shell scripts are just piping executions of external commands. I find all the code needed to properly run a process and process its output is much easier with the UNIX toolbox and a couple of pipe commands, than having to handle all those input/output buffers, command execution modes, etc in any other shell script language.
OTOH Plumbum [2] has been mentioned here, and it seems fantastic for that use case. But I think the issue is obvious, in that it took a conversation in HN to raise awareness of this tool: it is not officially promoted, or recommended even, as the solution for replacing shell scripting, so it is kind of obscure (unless you are actively into the language or somehow by chance end up getting to know about it, that is)
There is also the fact that choosing Python to replace Bash scripts would force installing Python in all of the project's Docker images, while a short POSIX script works as-is.
[1]: Saying Python because that's the most common suggestion for replacing Bash.
I can count on ten fingers the number of times in twenty years I've worked with another bash coder who did the su and ssh cases correctly without triggering escaping bugs. It's not any insult on them, but it's almost always done incorrectly and happens to work due to the absence of whitespace and backslashes, leading to eventual bugs (that Shellcheck won't always catch). Given:
# ARGV=( "one two", "three four" )
It's probably safe to recommend "$@" for use with su only when you use -c correctly, as you're locally specifying the args without any further IFS interference. But $* isn't usable:
# CORRECT
su root -c 'rm "$@"' -- "$@"
rm "one two" "three four"
# incorrect
su root -c "rm \"$@\""
rm one two three four # wrong arguments
# incorrect
su root -c "rm" "$@"
rm # -c doesn't use arguments
# incorrect
su root -c 'rm "$@"' "$@"
rm "three four" # loses the first argument (?!)
# incorrect:
su root "rm" "$*"
rm one two three four # wrong arguments
# incorrect:
su root "rm $*"
"rm one two three four" # command not found
It's probably safe to recommend "$@" for use with ssh only when using printf %q to ensure that you escape your arguments for their transit through ssh to the remote host, as otherwise the arguments get corrupted by the extra layer of shell processing. $* isn't usable here either:
# CORRECT
ssh remote -- 'rm '"$(printf '%q ' "$@")"
rm "one two" "three four"
# incorrect
ssh remote 'rm '$(printf '%q ' "$*")
rm "one two three four" # wrong arguments
# incorrect
ssh remote rm "$@"
rm one two three four # wrong arguments
# incorrect
ssh remote "rm \"$@\""
rm "one two three four" # wrong arguments
# incorrect
ssh remote 'rm "$@"' "$@"
rm one two three four # wrong arguments
# incorrect:
ssh remote "rm" "$*"
rm one two three four # wrong arguments
EDIT: Shellcheck misses 3 of the 4 broken su cases, but catches all of the broken ssh cases. (And produced a warning I disagreed with in one of the complete examples, but in the spirit of things, added double quotes to silence it.)
The first instance ('rm "$@"') is part of the argument to -c. It is the "command" that su will have the spawned shell execute. The single quotes around the entire command pass the whole command, unchanged, onward to the shell that will be spawned, so that what is executed by that spawned shell is rm "$@" .
The second instance is the argument list being given to su by the shell running the su, and it is just normal "$@" semantics there. One has to realize here that the second "$@" is being expanded by the current shell (the one running su) while the first "$@" is not expanded by the current shell, but is instead expanded by the spawned shell.
The "$@" is needed twice, because two expansions ultimately take place, the first expansion occurs in the current shell, the second one is delayed and occurs in the spawned shell.
The command is going to be evaluated by bash, with ARGV set to what you pass after the hyphens as arguments to the bash environment that su spawns. Passing the arguments to su using "$@" to encode your calling function’s ARGV ensures that they’re uncorrupted into the su-spawned bash environment’s ARGV, and then executing the literal command rm "$@" ensures that they’re uncorrupted into the spawned rm command’s ARGV.
You could fake this with printf %q if you tried hard enough but that’s a high-risk game to diagnose issues with, for example when you’re trying to figure out why newlines are reaching the command as the letter n.
I was wondering if you could elaborate a little bit on what you mean by:
> as you're locally specifying the args without any further IFS interference
I know that IFS = Input Field Separator. But how does it interfere when doing su root -c “command”? I might be missing something obvious here.
Also the ssh + printf is gold! But to be honest, I would personally never use anything from $@ inside an ssh remote “command”. Too risky even with proper quoting.
IFS controls the "$* joining character, if I remember correctly, and so if you use IFS=\0 then you can “safely” pass multiparty args in and then somehow take them apart on the other side. But this is the bad kind of crazy to me, and if I was reviewing code with this in it, I’d flag it for replacement. It’s the sort of thing that leads to feeling smart at working around knowing $@ but in the end it’s best just to truly understand $@ or use helper scripts to avoid quoting drama altogether. YMMV.
I was going to say the same thing (so I will). The rule is just always use "$@" and then you have only one case to reason about and it is what you want 99.99% of the time anyways.
I highly recommend this site. It’s very nitpicky, and that’s the only way to write somewhat robust shell scripts.
https://mywiki.wooledge.org/BashGuide
+1 This is where I’ve learned proper bash. People always compliment my bash skills, but the only thing I do is sticking to the wooledge wiki’s rules + using the unofficial bash strict mode (http://redsymbol.net/articles/unofficial-bash-strict-mode/).
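For reference, the boilerplate that article recommends is roughly:

#!/usr/bin/env bash
set -euo pipefail
IFS=$'\n\t'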
There are lots of those footguns in shellscript. One should always try to avoid any shell and rather use python, tcl, perl or powershell. Any criticism one might have about insecure and broken by design languages apply doubly to shell.
A short list of possible problems (of course depending on the shell in question):
spaces in filenames
newlines in filenames
nonprintables in filenames
empty variables and their expansion ([ x$foo = "xsomething" ])
errors in pipes
environment madness
/bin/bash ?= /bin/sh
Arrays or the lack of it
Space separates lists as arrays
#!bash vs. #!/bin/bash vs. #!/usr/bin/env bash vs. #!/usr/sfw/bin/bash vs. ...
Unwritable and unreadable control structures (if [], case, &&,...)
Information leaks via ps
and many others...
Never use shell except to search for and invoke a sensible language. And anything is more sensible, including C, Perl, brainfuck and Basic.
There are quite a few pitfalls in shell scripting. You can considerably reduce them by limiting yourself to only being compatible with modern versions of bash and setting things like pipefail, nounset, etc.
I do agree that in general a good programming language will be a better option.
> anything is more sensible, including C, Perl, brainfuck and Basic
I do disagree with that however. A 5 line bash script may be 500 lines of C, will take a hundred times longer to write, and may contain memory safety issues (which the bash script at least wouldn't).
I know brainfuck is hyperbolic so I won't argue against that. Something with no filesystem or process forking abilities obviously can't be used for any real task.
I think perl and basic have just as bad syntax as bash though, if not worse. Basic's penchant for "GOTO" is awful, perl's syntax as a whole is just as peculiar as bash's in many places.
I guess my overall point is that bash is usually not a good option compared to modern languages, but it's a darn sight better than you give it credit for. I think it still has its place for 5 or 10 liners that are easy to express and read in bash and don't need any abstractions beyond what coreutils provide.
I agree that 5 to 10 lines might be a sensible upper limit where a shell can safely be used.
Basic does have Goto, but modern dialects do have all the usual control structures. Perl has weird syntax, but far less dangerous footguns: e.g. there are proper arrays, as opposed to many shells. One can distinguish between an empty and an undefined string. One can declare variables and there is the notion of data types. There are even things like taint mode. In shell, you can't even properly iterate over a directory without nasty surprises.
Same in C. Yes, there are memory safety problems, but those are outnumbered by far by shell scripts exploitable via some expansion or variable injection. It's just that thankfully nobody uses shell scripts as network services, so you don't see as many reports about that.
And yes, brainfuck was there as hyperbole. But I truly believe that there are very few things worse than shell for programming.
Many of these crazy corner cases are the reason why I switched to fish. Eg, in fish all variables are arrays, so arguments are passed in the $argv array, and number of elements in the array is "count $argv" (in bash the number of arguments is passed in $#, but the hieroglyphic required to count the number of elements in an array is ${#var[@]} )
$PATH is just an array where each path is an element, so removing or adding a path equals to adding/removing an element from the array which is trivial.
Iterating over file names with spaces works exactly as expected and no IFS tweaking is needed (fun thing: in bash, doing for i in *.foobar; do echo $i; done in a directory with no files with that extension will echo the literal string "*.foobar"). No "[" craziness, nicer syntax, etc.
Unfortunately fish still lacks other important features (eg. no set -e equivalent).
There is space for sane Unix shells, but everybody has settled on bash and changing the status quo is difficult.
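(For what it's worth, the unmatched-glob behaviour mentioned above is one of the footguns bash can be told to fix; a sketch:)

shopt -s nullglob
for f in *.foobar; do
  printf '%s\n' "$f"   # the body simply never runs if nothing matches
done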
> anything is more sensible, including C, Perl, brainfuck and Basic.
IKR? I perform all administrative operations via a solidity contract, hooked up to a permissioned ethereum chain which my computer scans for relevant logs.
I like powershell, but it has its own arsenal of footguns for you to use. Things like non-terminating errors and the sometimes confounding behaviour of automatic variables come to mind.
The article doesn't mention it, but very similar syntax is also used for arrays.
For example:
arr=(a b c)
arr+=(d)
ls "${arr[@]}" # ls "a" "b" "c" "d"
ls "${arr[*]}" # ls "a b c d"
This has quite nice symmetry with the fact that the 1st argument is "$1", and you replace the number with these symbols, and for arrays you access elements with "${arr[1]}", and again replace the number with the same symbols for the same behaviour.
If you do a lot of bash scripting, arrays are invaluable.
Indeed, and actual Bash scripting (as opposed to plain POSIX sh) is much more pleasant. Used properly, arrays make it easy to build commands with arguments and finally run them, e.g.
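A sketch of that pattern (hypothetical flags and paths):

args=( --archive --verbose )
[[ -n "${DRY_RUN:-}" ]] && args+=( --dry-run )
rsync "${args[@]}" "$src" "$dest"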
It's not just bash, this is true for all POSIX shells (including dash, bash, ksh, and so on).
If you're doing a lot of complex calculations, shells are the wrong tool for the job. But if it's a relatively small program whose primary task is invoking other programs on a Unix-like system, shells are still a decent choice. The biggest problems with shells are handled by using shellcheck, so if you're writing shell scripts, use shellcheck.
Shell scripting is absolutely still relevant. The rule of thumb should not be about length but about complexity, specifically if you absolutely need something like a real array or a hash then move onto a different language. Use shellcheck and avoid bash for scripting. I use bash or pdksh interactively but stick to POSIX shell for scripting. I am finding myself writing POSIX shell all the time and having great success with it.
Yeah don't get me wrong the bashisms are useful, but I'm hopping between OpenBSD, FreeBSD and Linux (and perhaps sharing scripts with my macOS-using colleagues) and although bash is available on all those platforms and more, POSIX shell will work out of the box without any further configuration.
So you're switching among the Debian Almquist, Bourne Again, FreeBSD Almquist, PD Korn, and Z shells. Surely you could find some way of working the Watanabe, MirBSD Korn, BusyBox Almquist, and Mashey shells into the mix, too? (-:
I write my scripts in zsh, because its extensions are most useful in scripts, and it's not too hard just to make it available everywhere you need to work (e.g. you can compile it to be installed in ~/zsh and just copy the installation to your home directory on any machine you need to use).
Things like array-linked variables ($path is an array version of $PATH and modifications to one propagate to the other), associative arrays (dictionaries in Python) and a handful of other really nice tools (e.g. saner whitespace handling) make going back to bash for scripts unpleasant.
Because when you find yourself needing bash-specific features, alarm bells should be going off. POSIX sh is good at reminding you to KISS and delegate to other tools.
That's why I hate bash and Makefiles. The syntax is just so cryptic that if you don't write/read bash scripts or Makefiles for a while, it's just impossible to get back into it.
I think the basic idea of make is brilliant and still very useful. Compare time stamps of files and if there is a file A that is newer than file B and that is needed to make file B, then make B again, with the provided command.
The problem is trying to put any more functionality into it than that. Everything more complex than this basic functionality should be instead implemented in a script that gets called by make to rebuild the given file.
I agree to an extent about Makefiles, though I still think they are useful as long as you keep to a simple subset of their functionality.
Shell scripting is a different matter. One of the great things about shell scripting is that you can never forget it because you are using it constantly to interact with your system. The fact that it is something you use constantly and can just take and stick in a file and re-use is one reason why shell scripts are so popular.
I've run into that. There was a problem with the OSF/1 /bin/sh that caused "$@" to expand to a single empty argument if there are no arguments, rather than to an empty list as it should.
I just now removed a workaround for that problem from one of my scripts, 17 years after I added it.
After being challenged on this a few years ago, I checked with the shells available on Debian, and at least one still needed this. (dash maybe?) Just do it.
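The classic workaround for that old misbehaviour is the ${1+"$@"} idiom (sketch, with a hypothetical command):

# Expands to "$@" only when at least one positional parameter exists,
# so broken shells don't turn an empty list into one empty argument.
some_command ${1+"$@"}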
This is priceless. Countless times I have learned -- then later forgotten -- to use "$@" ... especially in cygwin's Windows 'spaces in filenames' territory
At this point in time (i.e. the 21st century), any *nix script or program that doesn't handle spaces in file names is woefully buggy. Maybe 20 years ago this was excusable, but not now. Spaces can reasonably be expected to be in filenames on all systems.
Support for other “special” characters in filenames (e.g. newlines), however, could still be debatable.
It just seems that the shell semantics (and to an extend those of other Unix tools) are specifically designed to strip off a level of quoting/escaping and perform word splitting the moment you lose your focus for a bit if you're even aware of all the arcane rules in the first place.
I totally agree. Dealing with shell semantics to support spaces is definitely a special type of arcane wizardry. At least using newer constructs like arrays can help.
I might say that allowing spaces (newlines, quotes, asterisks, and various other "control" characters) in filenames is the real bug. Sure, it's cool that you can put anything you want in a filename, but do you really need to? Particularly on a command-line-oriented OS? If the entire OS experience were Windows Explorer-style interactions or C-style binary-string manipulation, then fine.
This causes nothing but problems, all to avoid a simple character filter.
News flash: it's not 1975 any more. Real human beings use computers, and name files, all by themselves. Some real human beings even have apostrophes and spaces in their names, which they often like to use in the names of their own files. Welcome to the future.
A space is a completely normal character that users quite reasonably expect to be able to have in filenames. Treating it as a regular character is not a bug. There are lots of Linux programs that can deal with spaces in filenames already.
I really wish there were a "lexical space" (spacebar) and "semantic space" (shift-space or some other control). Use something less obtrusive for the semantic space than - or _, like the interpunct ·. It breaks visually/semantically but is lexically part of the same string.
this·is·variable·one
Vs
one two three
Now I just use an ergodox ez (qmk-driven) keyboard with _ in an easy spot and snake_case everything but people says it's "ugly" or some bs (camelCase, especially with initialisms, drives me nuts).
Space (%20) and non-breaking space (%A0). People do somehow type the second one into web forms regularly enough (resulting in strange error messages) that those two hex codes are embedded in my brain.
I like to put U+00AD, aka &shy;, aka the invisible soft hyphen, into my long file names and variable names, so they break correctly when displayed in formatted text.
At this point that ship has already sailed. And it’s really not that unreasonable for users to want to be able to use what is available to them on the keyboard.
> any *nix script or program that doesn't handle spaces in file names is woefully buggy.
On the contrary! It is filesystems that support plain spaces in filenames that are broken. Filenames are variable names. Allowing separators in them is bonkers. I make a point of carefully crafting my scripts to wreak havoc whenever a user has spaces in their filenames.
File names are an interface used by normal people to name files. Teaching everyone to name their files differently seems unlikely to succeed.
There is no particular reason your shell can't distinguish between a space inside a filename and the space between tokens in output. The fact that you have to do anything at all yourself is a bug.
> File names are an interface used by normal people to name files. Teaching everyone to name their files differently seems unlikely to succeed.
Sure. But there's no reason why typing the spacebar on a GUI to input your filename should produce a file with a plain space on the filesystem. It could be a unicode non-breaking space, for example.
That would just make working with files programmatically harder. The issue isn't that file names contain spaces but that we use programming languages that use spaces to separate fields/entries instead of proper data structures.
Funny how you just say these things as if they were true, without them being true, or any evidence.
If you want to be pedantic, Unix has ALWAYS allowed ANY character in file names, except for "/" (unless you have a GatorBox).
You once said: "Raw spaces in filenames are an abomination that make shell scripting unnecessarily difficult."
No, bash is an abomination that makes shell scripting unnecessarily difficult.
Prohibiting spaces in file names seems to be an obsession with you. (How about tabs? ;) You've said you don't see why people should be able to name their files anything they want, and that your reason for prohibiting billions of people from doing what they want is to make your bash scripts simpler. Then simply don't use bash, instead of trying to change the world. Read what I replied to you when you said that 7 months ago, and tell me if the world has changed so much since then that you're right this time around:
Slash is the ONLY character you're not allowed to have under Unix. There are no good reasons to disallow spaces. Disallowing characters in file names that you're allowed to have solves absolutely no problems, it only causes them.
There used to be a bug in the Gatorbox Mac Localtalk-to-Ethernet NFS bridge that could somehow trick Unix into putting slashes into file names via NFS, and Unix would totally shit itself when that happened. That was because Macs at the time (1991 or so) allowed you to use slashes (and spaces of course, but not colons), and of course those silly Mac people, being touchy feely humans instead of hard core nerds, would dare to name files with dates like "My Spreadsheet 01/02/1991".
I just tried to create a file name on the Mac in Finder with a slash in it, and it actually let me! But Emacs dired says it actually ended up with a ":" in it. So then I tried to create a file name with a colon in it, and Finder said: "Try using a name with fewer characters or with no punctuation marks." Must be backwards compatibility for all those old Mac files with slashes in their name. Go figure!
If you think nobody would ever want to use a space or a slash in a file name, then you should get out more often and talk to real people in the real world. There are more of them than you seem to believe!
You give yourself far too much credit. Quoting your own words back to you isn't stalking. You're the one who claims to make a point of wrecking havoc with users on purpose. If you didn't want what you said coming back to you, you should have deleted that posting, or not written it in the first place.
You were wrong and weren't able to support your views seven months ago, and you're still wrong and still can't explain yourself now. In the intervening months have you made any progress in changing even one person's mind not to use spaces in their file names? It would have been a much better use of your time to learn Python or JavaScript in those seven months, since it's bash that's actually causing you problems, not the people who use spaces in file names who you're so compelled to punish.
Back to your argument -- go ahead and state your views: what is your evidence that "Filenames are the variable names of shell scripting"?
Have you submitted pull requests to the Linux kernel and bash and any other affected libraries and applications to eliminate spaces from file names, and have they been accepted yet?
Funny how he would criticize users who want to use spaces in their file names, instead of shells that make it practically impossible for programmers to support spaces in file names.
Obviously, but then the answer is, why? What the hell is our problem?
Even if you think shell scripting is a good idea (which I don’t), just fix all the obviously dumb toxic stuff like this and put out a shell that is minimally different with only semantic fixes. Emit warnings now for toxic semantics but still support them, and in a couple of years, turn off that support for good.
Part of the problem is people reading the title of Richard P. Gabriel's "Worse is Better" without actually reading the paper, and adopting Worse as a design goal.
I think the problem is the people who don't understand the difference between "obviously dumb toxic stuff" and "features".
This thread is all about the difference between $* and $@. The difference is explained in the man pages for most shells, but since most programmers don't read instructions, they often need blog posts to explain to them how a documented feature of a language works.
*   Expands to the positional parameters, starting from one. When the expansion occurs within a double-quoted string it expands to a single field with the value of each parameter separated by the first character of the IFS variable, or by a ⟨space⟩ if IFS is unset.

@   Expands to the positional parameters, starting from one. When the expansion occurs within double-quotes, each positional parameter expands as a separate argument. If there are no positional parameters, the expansion of @ generates zero arguments, even when @ is double-quoted. What this basically means, for example, is if $1 is "abc" and $2 is "def ghi", then "$@" expands to the two arguments:

"abc" "def ghi"
It turns out we didn't need a blog post to explain it, because it's in the manual that nobody reads. But we should definitely complain about how this crafty, unusual piece of obviously dumb toxic stuff works, because how were you supposed to know to RTFM?
To answer your question "why still in 2020?", it's because these are independent features people needed. Sometimes people wanted the $* semantics, and sometimes the $@ semantics. So both exist. It's up to you to learn how the system works and use it properly.
It's not like Python doesn't also have weird edge cases that you won't know until you learn the whole language. I've seen people spend hours futzing about with lambdas and list comprehensions to try to fix a bug, which I addressed by just rewriting the expressions as regular-old loops and data structures. Bash isn't uniquely bad, it has warts like everything else. Take out the warts you don't want and someone else will complain that they're missing.
Yes, it takes a lot of experience and mindfulness to develop the taste for what is genuinely healthy and what ultimately is bad, even if it tastes good.
If you need a portable replacement, there is Perl. Available since the last century. If you don't care about obscure Unixes, there is also Python and a whole bunch of other scripting languages on more modern systems.
In summary, ICCCM is a technological disaster: a toxic waste dump of broken protocols, backward compatibility nightmares, complex nonsolutions to obsolete nonproblems, a twisted mass of scabs and scar tissue intended to cover up the moral and intellectual depravity of the industry’s standard naked emperor.
Using these toolkits is like trying to make a bookshelf out of mashed potatoes. - Jamie Zawinski
X-Windows: …Even your dog won’t like it.
X-Windows: …The first fully modular software disaster.
When I first came into the world of SWE, I thought "why would anyone use Bash nowadays when we have Python?"
A colleague answered me: "If we left, there's around 500 people in this building who could support my Bash script, and around 50 Python people."
It kind of stuck with me. At this point, I feel like Bash is one of the common tongues between all tech roles (that deal with Linux, that is).
Python, while nice, is a lot more niche, since you really have to be into development to know Python, while every sysadmin worth their salt can debug a Bash script. And, of course, every SWE, TSE, DOE, .... worth their salt know Bash scripts as well.
If you want to get a job (in the Linux land), bash scripting is typically an "of course".
woot, this is definitely wrong in my experience. Most people can read/write python to some degree, and even if you don't know python you can quickly ramp up to understand a config file or a simple program.
On the other hand most people don't write bash scripts and usually have to deal with them when encountering legacy systems or languages.
That conversation was your cue to get a new job, because your colleague earnestly believed that 90% of your coworkers were incompetent, and told you so point blank.
Applescript takes the cake there. Instead of standard indexing/dereferencing rules in the object hierarchy you instead have this nested mess of "tell X .... end tell"
Man, one of the worst debugging experiences of my life was when we built an extensible build tool based on Bash. Most of it worked, but there was always an inconsistency between $* and $@, and there would be PRs that swapped those values around, back and forth.
They were both totally valid; we just hadn't agreed on a calling convention for the tool so people were trying to fix their individual problems based on their own habits.
If you're using $* , you're not supporting arguments that contain spaces, and you'll break if arguments contain wildcards. If you're using "$*" , you're treating multiple arguments as a single argument.
> there was always an inconsistency between $* and $@
There is never a legitimate reason to use $* . Even in the rare cases where you want those semantics (hint: you don't), you should use something like "$(join ' ' "$@")" instead.
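A sketch of what such a join helper could look like (single-character separators only; note the name shadows coreutils join(1)):

join() {
  local IFS="$1"   # "$*" joins with the first character of IFS
  shift
  printf '%s' "$*"
}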
> Use communicate() rather than .stdin.write, .stdout.read or .stderr.read to avoid deadlocks due to any of the other OS pipe buffers filling up and blocking the child process.
The trouble with `communicate()` is that it only handles very simple cases: basically, your output has to fit in memory. The same problem exists in asyncio.[2]
Yes, you can usually work around this. That doesn't mean it's a good replacement; whereas complex pipelines are so trivial in bash that any user-defined function can be used in a pipeline, subprocess generally forces you to do the dumb thing and create a mess of temporary files.
And that's not even considering you're writing 10 times as much code than you would to accomplish the same task.
Yes, stdin.write, stdout.read and stderr.read can cause deadlocks if not used carefully. However, those operations are rarely needed and are complicated in Bash as well (usually requiring sub-shells, etc). 99% of Bash scripts can be written without using those operations (you simply pipe input/output from one program to the next rather than trying to manually forward it in your program).
This is actually one of the main advantages of using Python. You can solve all those deadlock issues through careful use of threads, which are much better than the Bash sub-shell equivalent.
If anyone has an example script that requires the use of those functions and is cumbersome in Python, please share it. We can always improve the Python standard library if necessary.
Why is this still such a problem? People have been talking about replacing bash with python for years, and yet trying to actually do just that, is a pain. Plumbum doesn't quite cut it.
I think in part it's because the POSIX-style shell is so tightly wound with the OS, it's extremely proficient at spawning and forking processes, working with files, and communicating, and that experience is seamless. Python feels like a different world, and doing any subproc, pipes, or file i/o always feels like crossing some boundary and back.
I feel like this could be a situation parallel to JS on the web: You can change the language that's available everywhere only slowly and in limited ways, so people build compilers targeting it as the output language from "nicer" languages.
I want to. I really do. I've been using python for a decade and I still have to always look up how to call subprocesses correctly and communicate with multiple processes.
Plumbum is cool, but now I have an extra [""] for every command, and I need python and a lib (which usually means an env as well) to bring an app up.
On the other hand, I try to migrate as much business logic into the app as possible, and increasingly kv as well, so bash is pretty much pointing at config files and commands.
Here's an easy way to see how this works. Run this script:
echo dollar-star:
for i in $*; do echo $i; done
echo dollar-at:
for i in $@; do echo $i; done
echo quoted-dollar-star:
for i in "$"; do echo $i; done
echo quoted-dollar-at:
for i in "$@"; do echo $i; done
./x.sh a "b c" d
dollar-star:
a
b
c
d
dollar-at:
a
b
c
d
quoted-dollar-star:
a b c d
quoted-dollar-at:
a
b c
d