The timeless beauty of shell scripts

moe · on Aug 1, 2010

I'd agree with timeless. A pile of shell-scripts is the backbone of just about any deployment. And once you have them, they're not going away soon.

Beauty? Not so much.

Bash scripts are still the go-to solution because it's so quick to throw one together. Just paste what you have just tested on the CLI, add a few conditionals, done.

Unfortunately the "done"-part more often than not drags out much further than we'd like. Suddenly there's a need to inspect the output of a process rather than just the return code. Suddenly the script should run from cron, but only one instance at a time please. Welcome to the not-so-beautiful world of lockfiles and the never-quite-complete shell environment in cron.

This is the way our quick and beautiful 5-liners tend to turn into fragile 50-liners in short order. Little helpers like ftsh[1] can mitigate the mess somewhat. However, I for one try to use Python or Ruby for just about everything nowadays. It takes a bit longer to get off the ground that way - but it saves my slightly older self so much time and headache...

[1] http://www.cse.nd.edu/~ccl/software/ftsh/

Schmidt · on Aug 1, 2010

I couldn't agree more, I've seen horrible >300 lines POSIX shell scripts used for deployments and thought "This is a stinking pile of shit".

avar · on Aug 1, 2010

The App::Cronjob Perl module is great for adding effortless locking to existing programs:

    @daily cronjob -j some-uid -c '/usr/bin/some-command.sh'

spudlyo · on Aug 1, 2010

Most shell scripts are heaping mounds of undeclared external dependencies.

geophile · on Aug 1, 2010

I really like the idea of connecting commands using pipes to do "one-off" commands. But piping strings is dumb and limits what you can reasonably do. It's too hard extracting what you want from the strings.

At my last company, I needed a tool to interact with the nodes of a cluster, including the databases on each node.

Putting this all together, I wrote a tool named Object Shell (http://geophile.com/osh), which is available under the GPL. It takes the idea of piping, but exposes Python language constructs on the command line. Python objects, not strings, are piped between commands. For example, I can write a command line to: execute a database query on each node; bring back the results with each row as a python tuple; combine the stream of tuples from each node into one stream; and then operate on the stream. Or I can write a command line to: get a list of process objects once a second; extract and transform properties of each process; dump the stream of data into a database. The Python objects have Python types, so I can operate directly on files, processes, numbers, times, etc. instead of strings representing those types.

snprbob86 · on Aug 1, 2010

"piping strings is dumb"

I used to believe this. I was simultaneously learning Bash and PowerShell. I love PowerShell....(when I am stuck with Windows), but I'd never trade in my Bash shell. Piping objects is cool, but in reality, what is it really? It's just a programming language with a terse syntax for manipulating iterators.

Let's compare...

Powershell:

  Get-Process | where { $_.Handles -gt 200 }

C# 3:

  Process.GetProcesses().Where(p => p.Handles > 200)

Hmmm... what does piping objects even mean, really?

OK, so now why piping strings is interesting: forced serialization.

Every component of the transform can be debugged simply by deleting everything after that component. The new output goes to the screen or you can pipe it to a file. You get persistence for free. You can checkpoint your work and continue later. Imagine you are doing some very heavy duty processing that you only really need to do once. If you keep piping to a file along the way, you have backups and checkpoints.

What if you want to send the results to another server? You're going to have to serialize your objects. If your objects are arbitrarily complex graphs, how do you serialize them? It can get hairy, fast.

In summary: if you need to do arbitrary logic on arbitrary objects, use a programming language. If you need to transform some serialized data into some other serialized data: pipe strings.

geophile · on Aug 2, 2010

Piping objects means that I'm dealing with objects, not string representations of objects. I'd much rather operate on a Process object than have to parse output from the ps command. For example, to find the processes whose parent is pid 987 (addressing snprbob86), I would write this on the command line in object shell:

    osh ps ^ select 'p: p.parent.pid == 987' ^ f 'p: (p.pid, p.commandline)' $

This invokes the Object Shell executable -- osh, a Python program, passing eight arguments (ps, ^, select, ...). ^ denotes piping. The select command selects processes, p, for which that parent pid is 987. discarding others. f applies a function to selected processes, p, returning tuples containing p's pid and commandline. $ prints the selected tuples to stdout.

Note that the piping is internal to the osh process, avoiding serialization to/from string between commands. That's much faster. (The initial version of Object Shell used OS piping, and spent a lot of time serializing and deserializing.)

Passing objects between servers does, of course, require serialization. So if I want to get a listing of pids and commandlines on every node of my cluster named foo:

    osh @foo [ ps ^ f 'p: (p.pid, p,commandline)' ] $

@foo looks up the definition of the cluster named foo (in ~/.oshrc or in /etc/oshrc) and runs the bracketed command on each node. The resulting (pid, commandline) tuples are serialized, sent back, deserialized and printed.

By the way, Object Shell has a Python API also. So the same remote command would be:

    #!/usr/bin/python

    from osh.api import *

    osh(remote('foo',
               [ps, f('p: (p.pid, p.commandline)')],
               out()))

Objects rule, in programs and on the command line.

jerf · on Aug 1, 2010

There's probably a happy middle. Basically what you're saying is that you want a weakly-typed format, which unpacking that statement yet further means that you want a format that lets you do pretty much anything to it. Types boil down to a set of restrictions on what the data can be and what you are permitted to do with it, and a hunk o' text (TM) is just about as untyped as you can get.

But it might be nice to pass around JSON or something instead. It isn't quite as untyped, but on the grand scale of things it's not very typed.

snprbob86 · on Aug 1, 2010

I'm not saying that Unix's approach is the the final word on the subject. In fact, I quite agree with you: structured formats are a good idea.

I, for one, would like to see --format options added to many of the standard Unix tools. I'd probably define --format= as a shortcut for specifying both --informat= and --outformat=. It would be great to have some sort of XPath or dotted object notation on the command line rather than hacking around with awk, sed, cut, etc.

Rather than:

  ps -ef | awk -v p=$PPID 'p==$3{print $2}'

Do:

  ps --format=xml | xquery "/process[ppid=$PPID]/pid"

Or something like that. A bit more verbose, but far more understandable.

If I were to design the future, every tool would have metadata which describes which input and output formats it supports and prefers, so that most --format arguments would be optional:

  ps | xquery "/process[ppid=$PPID]/pid"

I'd prefer that 100 times over object piping.

gaius · on Aug 1, 2010

This is why PowerShell rocks - superficially they're text pipes like any shell, but under the hood it's passing COM objects around. Nice.

CrLf · on Aug 1, 2010

For certain values of nice.

While I find the PowerShell to be more powerful when I'm scripting stuff, I still prefer the unix shell mostly because it passes around and operates over text.

Operating over text means you can compose a pipe progressively using what you see, without having to think about how it is structured internally. Operating over text makes the normal usage of the unix shell faster and more efficient than the PowerShell. That and the terser syntax.

With the PowerShell I find myself always having to check the type of objects are being passed around and which attributes I should care about, while on unix I just "grep" and "awk" and I'm done.

The PowerShell is more of a scripting language than a shell, while the unix shell is more of a shell than a scripting language. Although both of them are both those things.

DrJokepu · on Aug 1, 2010

Actually, those are CLR (.NET) objects, not COM objects. Sorry for nitpicking.

gaius · on Aug 1, 2010

Heh, yep, old habits :-)

doki_pen · on Aug 1, 2010

Ruby is often just as terse as bash, and often simpler to understand. That said I still end up using bash, awk, sed, tr etc.. and don't revert to ruby until something gets complicated. Probably because bash is more portable.

silentbicycle · on Aug 2, 2010

Shell scripts are portable. Bash is specific to Linux, at best, and ... a bloated, hot mess.

Programs that hardcode bash but only require basic bourne-shell features are a pet peeve of mine. "My favorite Linux distro has it out of the box" isn't a real standard.</bsd-porter>

avar · on Aug 2, 2010

You can easily install bash on your BSD. But I get your point about writing POSIX compatible code.

I'll declare a bash dependency in the shebang if I've only tested the script on bash. Requiring bash is better than having the script fail on some non-POSIX thing that I wasn't aware of.

Linux distributions are getting a lot better in this department these days though, with dash being the default Debian/Ubuntu shell.

hernan7 · on Aug 2, 2010

The REPL-like development style is what does it for me... I always end up having to re-write it in Haskell.

tkahnoski · on Aug 1, 2010

http://en.wikipedia.org/wiki/Domain-specific_language

Unix shell scripts is listed as the first example.

wglb · on Aug 1, 2010

While I am an inadvertent bash progammer--an astonishing number of lines of code end up in bash--I view the python challenge as more of an tutorial tool, and don't advocate replacing my shell scripts with python.

I see this as a good way to learn python in the context of small problems to develop basic proficiency.

houseabsolute · on Aug 2, 2010

$ tr a-z l-za-m

$ tr -cd a-z

Absent the comments, who the hell knows what either of these do? Certainly not the new guy on your team who has spent most of his life in the more popular environments.

madair · on Aug 1, 2010

We evolve, or we die. Truism that. Sometimes it might take awhile, but its not less true.