Fixing the Python subprocess interface (amoffat.github.com)
291 points by daenz on Jan 30, 2012 | 61 comments



FWIW, re-replaying the stack trace to figure out what was imported and re-implementing it is a horrible idea. There are much better ways to do this type of import voodoo, specifically the import hooks that Python ships with. Here's an example of their use inside a small side project of mine: https://github.com/tswicegood/maxixe/blob/master/maxixe/__in...
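For reference, here's a minimal sketch of such an import hook using the modern importlib API (the machinery available in 2012 was the older PEP 302 style, so this is an updated illustration; the `demo_cmd_` prefix and module name are made up):

```python
import sys
import importlib.abc
import importlib.machinery

class DemoFinder(importlib.abc.MetaPathFinder, importlib.abc.Loader):
    """Conjure a module for any import starting with a magic prefix."""
    prefix = "demo_cmd_"

    def find_spec(self, name, path, target=None):
        if name.startswith(self.prefix):
            return importlib.machinery.ModuleSpec(name, self)
        return None  # let the normal import machinery handle it

    def create_module(self, spec):
        return None  # use the default module object

    def exec_module(self, module):
        # Populate the module instead of loading a real file.
        module.greeting = "made by the import hook"

sys.meta_path.insert(0, DemoFinder())

import demo_cmd_anything
print(demo_cmd_anything.greeting)
```

A pbs-style library could do its command lookup inside exec_module rather than replaying stack frames.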

All that said, this is horribly un-Pythonic. A much better route is something like envoy[1], which simply wraps `subprocess` in a sane API.

[1]: https://github.com/kennethreitz/envoy


Just to clarify. I applaud this type of development for people trying to learn various parts of Python like playing with the stack, but the idea of this being used in the wild scares me a bit.


It's like operator overloading for novel syntax in C++. It worked all right for iostream, but if you've ever seen boost::spirit you can see that even very smart people can make very weird things happen by trying to kludge features into new syntax.


Envoy looks a lot more verbose than this, though... if you're really trying to replace shell scripting, having envoy.run(foo) on every line is going to get annoying.

Also, it's not clear on what basis you make this assertion:

> FWIW, re-replaying the stack trace to figure out what was imported and re-implementing it is a horrible idea.

Does it not work?


"Does it… work?" is not the standard by which good Python code is measured. Django went through this process years ago. It's convenient to be able to call `from myapp.models import *` and have an `Articles` model magically added to your module even though you never defined it. They realized years ago that writing clever code for the sake of being clever was a bad idea. How are you going to get someone else to maintain it? Not only do they have to know the logic of what you're doing, they also have to know how you hacked things together.

Line 17 of PEP-20 sums up my thoughts on this code as more than an intellectual exercise.

> If the implementation is hard to explain, it's a bad idea.

Regarding using Python as a replacement for shell scripting: if you have to rewire the language to do what you want, why are you using a different tool for the job?


> If the implementation is hard to explain, it's a bad idea.

I don't think this principle is universal in software, and I don't think you need to embrace it as a prerequisite for writing or distributing Python code. My personal preference would be for a statement more like this, "If the implementation is more complicated than it needs to be to do what you want, it's probably a bad idea."

> Regarding using Python as a replacement for shell scripting: if you have to rewire the language to do what you want, why are you using a different tool for the job?

Well, that's easy to answer. It's cleaner and more readable than bash, has some nice features that shell scripts lack, can call into Python libraries, allows a single unified codebase if you're already writing Python code... I could go on.


It's odd that the import mechanism is abused here to make objects "out of thin air". The fact that "from pbs import ffmpeg" works only if ffmpeg is actually on the path is somewhat surprising.

I think the more comfortable (and Pythonic?) way to do this would be to explicitly create these command objects:

   >>> import pbs
   >>> ffmpeg = pbs.Command('ffmpeg') # or '/usr/bin/ffmpeg', perhaps
   >>> result = ffmpeg(...)
[Edit: I really like the concept of using Python's positional and keyword arguments to construct a shell command, though. Great insight there.]


Except that changes what it is trying to accomplish. The goal (as I read it) is to make shell scripting more palatable in python. Needing to continually differentiate between "this is python" and "this is system" gets old very fast.

In your example, what pbs is doing is no different than a bash shell script reporting a "command not found". It isn't what I would want if I were writing a complex piece of software interacting with ffmpeg, but for a simple script to process a bunch of files in a directory, I like it.

I've tended to shy away from using Python for shell scripting because subprocess is so ugly (even its out-of-favor, crippled relative os.system is nasty), and pbs looks like it does a really great job at addressing that.


Agreed. As a way to write one-off scripts that one would otherwise use bash for, this seems like a nice trick.


Hi! Author here. There are a few ways to use it, including your suggestion (did you make your suggestion up, or were you pulling from the docs?):

    # magical, designed only for single shell scripts
    from pbs import *
    ffmpeg()

    # less magical
    from pbs import ffmpeg
    ffmpeg()

    # or
    import pbs
    pbs.ffmpeg()

    # no magic
    import pbs
    ffmpeg = pbs.Command(pbs.which("ffmpeg")) # command takes full path
    ffmpeg()
I tried to cover the main use cases adequately. My goal was to ease a pain point that I and others have experienced, and that, unfortunately, other popular packages don't address well.

If it's helped anyone like it's helped me, I'm happy and glad to share!


Wow -- no, I entirely missed this (and just happened to end up with the same name). Thanks for clarifying.


This looks really cool. It was not clear to me on the first reading of the document that it could be invoked these ways. The pbs.Command() example is way down in the "weird filenames" part. I just did a quick change to the README and sent you a pull request.


Neat but way too magical for my taste. The code to figure out what to do in the case of 'from .. import *' is particularly ugly.

Perhaps the commands should be accessible from an object you import. That's slightly more typing but more explicit, and it would not require ugly magic. E.g.

  from pbs import sh
  print sh.ifconfig('eth')
If it's not clear, the 'sh' object could override __getattr__ or __getattribute__ and wrap commands as necessary.
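A bare-bones sketch of that idea (a hypothetical `Shell` class built on subprocess, not pbs's actual code; assumes a POSIX system with echo on the PATH):

```python
import subprocess

class Shell(object):
    """Turn any attribute access into a function that runs the
    command of that name and returns its decoded output."""
    def __getattr__(self, name):
        def run(*args):
            return subprocess.check_output([name] + list(args)).decode()
        return run

sh = Shell()
print(sh.echo("hello"))
```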


Anything that encourages `from ... import *` usage is evil, irrespective of its implementation.


Why?


Makes references ambiguous to anyone but the interpreter.

Let's say you have:

  from a import foo
  from b import *

What does 'foo' refer to? 'b' could contain 'foo', which would overwrite your previous reference to 'a.foo'. It's hard to figure out what 'foo' now refers to just from inspecting the code. It's better to be explicit:

  from a import foo
  from b import bar, baz


Because it leads to namespace pollution and hard-to-track-down bugs.

It's especially bad in this case, where adding an executable to the system can suddenly shadow any built-in name (e.g. imagine someone adds a "print" or "sys" executable).


Refactoring is also made much harder with import * if you have cascading modules. I have a pylint hook forbidding these before committing to our hg repo.


Off the top of my head:

a) Nothing prevents the writer of the module you are importing from overriding symbols that you expect.

b) In many cases, increased load times.

c) Irrelevant symbols present in scope when debugging.

d) Makes the order of import statements important, because potentially different modules might be providing the same variable name.


Because this might mess up your script: the import can overwrite something that's already defined, or you might accidentally overwrite something that came in 'hidden' behind the '*'.

Somewhat naive example:

    >>> time = 'blah'
    >>> time
    'blah'
    >>> from datetime import *
    >>> time
    <type 'datetime.time'>
    >>>


Pollution of the global namespace.


Irrelevant for shell-style scripts because they are entry points, not included from other modules.

--And it's not a "global" name space, it's just the namespace of your script.--


It's a global namespace in your script. That's what it's called, I believe, and it's available via globals().
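A quick check that wildcard-imported names really do land in the module's globals():

```python
# Snapshot the global namespace, wildcard-import, and diff it.
before = set(globals())

from math import *  # deliberate, for demonstration only

added = set(globals()) - before
assert "pi" in added and "sqrt" in added
```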


Aye, they are added to "current scope's global variables", thank you for correcting.


To avoid any magic you could use:

  ifconfig = pbs.Command("/path/to/ifconfig")


So many people saying this is "too much magic". Whatever, I'm into it. The idea of commands being functions that can just pass their output to other functions is intuitive, and passing arguments as, well, arguments is as well.

It might not be pythonic, it might be a crime against Guido and everything he represents, but it's pretty awesome.


I have the feeling you misunderstood the problem with magic. Having a function call a shell command is fine. The problem is knowing, from reading the code, where that function comes from. In Python there is this idea of transparency and explicitness, which feels enlightening once you're used to it, like in the docs cartoon. Anything obscuring that is said to be unpythonic.


I use Python for shell scripting a lot. Ignoring all of the issues people have brought up here, I really like how function composition is piping:

  # sort this directory by biggest file
  print sort(du("*", "-sb"), "-rn")

  # print the number of folders and files in /etc
  print wc(ls("/etc", "-1"), "-l")
The reason that I like that method over, say, envoy's [1] method is that envoy.run('uptime | pbcopy') has what I consider code in strings. When I'm writing a script, the programs I'm calling, and how they interact, is part of the "code" to me. I would prefer that they're at the language level, and not represented as strings.

[1] I only just learned about envoy in this thread. Thanks! Sadly, one of the places where I run my Python scripts is a location where I can't install my own packages, and I don't want to deal with using my own install of Python, so I tend to just implement something like this:

  import sys
  from subprocess import PIPE, Popen

  def checked_exec(seq):
      p = Popen(seq, stdout=PIPE, stderr=PIPE)
      stdout, stderr = p.communicate()
      if p.returncode != 0:
          print 'err: ' + stderr
          print "'" + ' '.join(seq) + "' failed."
          sys.exit(p.returncode)
      return stdout


As long as the package you need doesn't include a C extension (which most don't), you can just ship it with your code (license permitting, of course): just add the path to the library to sys.path. It's not a very clean solution, but it can be a real life saver when you have to work on "broken" systems.


Sorry, I wasn't clear. This is a system I log into every day, and I don't want to maintain my own install of Python and related packages on it. It's too much overhead. I'd rather just use the default Python, even though it's old.


For simple libs I just dump the py file/folder in the same folder as the script and import.


This module is a huge hack. It's a neat hack, though :)

I've written what I feel is a much better solution to this problem: Envoy.

https://github.com/kennethreitz/envoy

It's pythonic and makes far fewer assumptions about both your code and what you're running.


+1 for Envoy. It's a really nice replacement for subprocess (which is awful).

Clint and Requests are worth a look too. Thanks, Kenn!


I like Kenneth Reitz's envoy too. He describes it as "Python Subprocesses for Humans™", similar to how he describes Requests, which he also wrote.

https://github.com/kennethreitz/envoy


Can we please stop titling postings like this? How about "An Alternative to the Python Subprocess Interface"? Fixing something implies that it's broken or inadequate, and from my limited experience, subprocess is already a major improvement over os.system. I'm not saying this idea doesn't have value; I'm saying that the way it's framed lacks the humility it ought to have.


This is a cute little hack, and quite possibly a very useful one. (And if it isn't useful, then it is certainly interesting and instructive.)

However:

Please, please, please don't use or recommend things like "from pbs import *". Namespace pollution is bad enough when importing a documented collection of functions. Importing functions that are named based on whatever happens to be in my path at the moment ... that's seriously scary.

But "import pbs" looks like fun. :-) And "pbs.ls" isn't that much to type.

As The Zen of Python says:

> Namespaces are one honking great idea -- let's do more of those!

P.S. Hmmm ... but does "import pbs" work? Haven't tried it.


Wow, this is so frighteningly magical and will break in many entertaining ways.

For a sane alternative, I'd recommend Kenneth Reitz's awesome Envoy (https://github.com/kennethreitz/envoy).


Reminds me of something I saw not too long ago...

https://github.com/JulienPalard/Pipe

It would be really cool (though admittedly less Pythonic) to borrow the infix notation provided by the Pipe library to allow more shell-like function chaining.

Instead of this...

  print wc(ls("/etc", "-1"), "-l")
You would have this...

  print ls("/etc", "-1") | wc("-l")
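One hedged sketch of how that could look, using a hypothetical `Cmd` class that defers execution until the pipeline is assembled (Python 3; not how pbs or the Pipe library actually work):

```python
import subprocess

class Cmd(object):
    """A lazily-run command; | feeds the left side's output
    into the right side before either result is printed."""
    def __init__(self, *argv):
        self.argv = list(argv)
        self.stdin = None

    def __or__(self, other):
        other.stdin = self.run()  # run left side, pipe into right
        return other

    def run(self):
        proc = subprocess.run(self.argv, input=self.stdin,
                              stdout=subprocess.PIPE, check=True)
        return proc.stdout

    def __str__(self):
        return self.run().decode()

print(Cmd("printf", "a\nb\nc\n") | Cmd("wc", "-l"))
```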


There are a few ideas floating around here on how to implement it (https://github.com/amoffat/pbs/issues/6), but nothing looks really feasible. If you have any insights, I welcome them :)


I wrote some code for that a while ago: https://gist.github.com/1300342


The value/cost of a "hack" is offset by what it provides. If something can be implemented at the same cost in a less brittle and more future-proof way, then by all means label it horrible.

If you think "horrible" hacks are a slight against something that provides an amazing level of functionality to an end user, then you've lost sight of what we're coding for.


Magic!!!!!

I code in Ruby for a living, and this is a bit too much magic even for me. I suppose you can use it in magic-less mode with `from pbs import Command` and set up your command set manually.


If you don't like the import *, don't use it. Python supports it, so why shouldn't this library? (not that it's a great idea)

If you'd like to handle missing system executables, catch the exception. You should be writing in that style anyway.

Honestly, this cleans up a ton of system scripting code, making it way more readable/maintainable.

Maybe the code could be cleaned up, but this is the direction that Python should be heading. Abstract away the complications when possible, keep low-level stuff around for when it's absolutely needed.

Beautiful is better than ugly.


It states that the lines

  curl("http://duckduckgo.com/", "-o page.html", "--silent")
  curl("http://duckduckgo.com/", "-o", "page.html", "--silent")
are equivalent. This worries me; I would much rather have one argument always correspond to exactly one shell argument. Here it looks like arguments are split on spaces in some (maybe all?) cases, which means always having to be extremely cautious about escaping, something a good abstraction shouldn't force you to deal with.


Exactly. If you look at the source, you'll see that the arguments are all joined into a single string with spaces, then split back into separate words using shlex.split().

So cat("filename with spaces in it") will fail, but cat("'filename with spaces in it'") ought to succeed.
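The quoting difference is easy to reproduce with shlex alone:

```python
import shlex

# A filename with spaces fractures into separate arguments...
assert shlex.split("cat filename with spaces in it") == \
    ["cat", "filename", "with", "spaces", "in", "it"]

# ...while an inner layer of shell quoting keeps it whole.
assert shlex.split("cat 'filename with spaces in it'") == \
    ["cat", "filename with spaces in it"]
```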

It's a neat experiment, but using this module in production would not be a great idea.


Very impressive, but the use of globals and the dynamic lookup mechanism are a little scary. Looking at the source there seems to be some magic involved like hacking the interpreter.

I'd feel more comfortable if it only exported one variable.


That's pretty brilliant, well done.


This is basically a dynamic DSL: a confusing muddle of syntax and semantics. It will result in pain, usually when least expected.


https://github.com/Harshavardhana/pbs - Refactored the code to be more like a Python library; still trying to fix the command-line import problem.


This looks great, but I really don't like the fact that I can't fire up a Python shell and try it out interactively. Having to run pbs.py itself to get a different kind of shell is uncomfortable.


This is fixed on master as of version 0.4, just fyi. It has some limitations (no star import) but otherwise works as expected.


Why not use pipes for piping? e.g. print du("*", "-sb") | sort("-rn")


It would break the syntax too much and would make even less sense to a Python user reading the code. But there are several packages that do this (piping with the pipe operator); maybe you could use one of those.


Order of evaluation: du() will be instantiated, sort() will be instantiated, and only then is du().__or__() evaluated, which means you won't have du's stdout to attach to sort's stdin when sort is instantiated.

Figuring out a way around this (mocking up an fd to pass to sort at instantiation time that blocks until stdout on du is available) is left as an exercise to the reader. I've already said too much. ;)
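For comparison, the standard-library way to wire the pipe up front: spawn the producer first so its stdout fd already exists when the consumer starts (a plain subprocess sketch, not pbs code):

```python
import subprocess

def pipeline():
    # Producer runs first; its stdout pipe is handed to the
    # consumer as stdin at spawn time.
    producer = subprocess.Popen(["printf", "a\nb\nc\n"],
                                stdout=subprocess.PIPE)
    consumer = subprocess.Popen(["wc", "-l"],
                                stdin=producer.stdout,
                                stdout=subprocess.PIPE)
    producer.stdout.close()  # let producer see SIGPIPE if consumer exits
    out, _ = consumer.communicate()
    producer.wait()
    return out.decode()

print(pipeline())
```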


If you interpret the return value as a file (an output file handle), then the function-application method makes sense. This really falters when trying to use tee, but then again, using pipes doesn't help either.


My thought too. Also > and < for file redirection.


The import hack is the only thing I don't really like (I'd prefer the `from pbs import sh` syntax proposed elsewhere) -- and without reading the code... is piping really piping, or will the function composition example actually consume the first command result fully first?

Anyway, this is a great idea. :)


[deleted]


This is impossible. The parent bash is suspended while the Python process runs.

`` (not '') in Perl runs it in a child process.

The equivalent is at the bottom of http://hyperpolyglot.org/scripting - or read the subprocess module documentation.


Indeed. Perl does little more than "sh -c '<whatever you typed>'".


Too much magic! Too much magic!!


With this, maybe having Python as a shell will be possible :)



