Hacker News
Python's Mutable Default Problem (objectmentor.com)
68 points by tswicegood on Feb 20, 2011 | hide | past | favorite | 62 comments



I'm not sure this constitutes any problem other than a lack of understanding of the Python runtime. What the author describes as:

"the mutable default parameter quirk is an ugly corner worth avoiding"

could also be described as:

"a natural outcropping of python's late binding, "names are references" variable model, and closure mechanisms, which provide a consistency to the language that is often crufted up in others"

I do somewhat agree with the author that this particular functionality should be a "use only when needed" feature. I don't think it should be avoided at all costs, though, because there are times when the mutable default saves a lot of code. In fact, in a few cases the code to work around mutable defaults gets into some serious voodoo, because the writer is frequently trying to work around the bigger mutable/immutable object and names-are-references "issues" in Python.

This also reminds me of something I was reading on the front page today about the old 'use the whole language' vs 'simplicity is king' holy war.


"a natural outcropping of python's late binding, "names are references" variable model, and closure mechanisms, which provide a consistency to the language that is often crufted up in others"

hm... that's debatable. The implementation could have just as easily chosen to evaluate the default arguments each time the function is invoked and that decision wouldn't have broken any of the existing mental models of variable binding/closures.
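For reference, a minimal sketch of the behaviour being debated: the default expression is evaluated once, when the def statement executes, not on each call.

```python
def f(stuff=[]):
    # the [] above was created once, when this def was executed
    stuff.append(1)
    return stuff

print(f())  # [1]
print(f())  # [1, 1]  (same list object, shared across calls)
```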


Precisely because Python has late binding you would expect the parameters to be evaluated on each call to the function.

One thing Python lacks is the ability to use preceding arguments in defaults, e.g. you cannot do this:

    def f(a=3, b=a+1):
        return (a + b) / 2
    
    NameError: name 'a' is not defined
Oops.


There is no order for named arguments. You could call the function like this after all:

  f(b=3)


Of course; in that case the default value expression for `b' would not be evaluated.

Common Lisp does this right.


That hides default logic in the signature. Why would you favor that over

  def f(a=3, b=None):
    b = b or a+1 # or use a more explicit version
    return (a + b) / 2


That code is logically convoluted, which is one reason not to love it. Which do you think is the more straightforward statement, just in general?

- "Let B be one greater than A unless otherwise specified."

- "We have no default value for B. If B has a value, then let B be equal to that value. If B does not have a value, then let B be one greater than A."


Because that can be said more succinctly. It is even easier to read as there is less to read, which I realize is mostly subjective.


I'm pretty sure this is the same argument used to make Perl a bad guy.


I disagree, but we've gotten a bit off topic.

I think Python's behaviour is confusing and basically never what anyone actually wants. Regardless of whether or not you can use other params in defaults, the defaults should be evaluated on each call.


"a natural outcropping of python's late binding, "names are references" variable model, and closure mechanisms, which provide a consistency to the language that is often crufted up in others"

My mileage varies.

I'd prefer default parameters to honor referential transparency, whatever hoops the runtime has to jump through to make this happen.


That would make that the only place in which Python has referential transparency, though. It may be a quirky side-effect of consistency, but it is consistent.


Strings are also immutable, and thus have referential transparency. And so do numbers in Python.


Referential transparency is a property of functions, not data (unless you're in a lambda mood and treat them as functions of zero arguments, but in that case you're not in Python so it's not relevant here). Even a function to "concatenate two strings" could be passed an object that overloads the addition operator to cause arbitrary modifications:

    >>> class Evil(object):
            def __init__(self):
                self.evil = 1
            def __add__(self, other):
                result = ("%s" % self.evil) + other
                self.evil += 1
                return result


    >>> def referentially_transparent_concat(a, b):
            return a + b

    >>> e = Evil()
    >>> print referentially_transparent_concat(e, "hi")
    1hi
    >>> print referentially_transparent_concat(e, "hi")
    2hi

You can program in a referentially-transparent style with Python, but you'll have to do it by adding your own restrictions to the code you write. Python will not help you with that.


In your code the devil is in the data-type.


As noted in the comments, DON'T use

  stuff = stuff or []

because if you pass an empty list, you'll get a new one rather than mutating the one you passed.

  stuff = stuff if stuff is not None else []

is wordy, but at least it's correct.
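A quick sketch of the failure mode (names are illustrative):

```python
def append_one(stuff=None):
    stuff = stuff or []  # bug: an empty list passed in is falsy, so it's replaced
    stuff.append(1)
    return stuff

mine = []
result = append_one(mine)
print(result)  # [1]
print(mine)    # []  <- the caller's list was silently never touched
```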


I prefer the usual alternative

  if stuff is None:
      stuff = []


It's sad that such a common and concise idiom as "x or y" is so perniciously, subtly broken in Python, and that there is no satisfyingly concise equivalent.

If I were being cavalier and had an extra wish to burn, I'd request

  x else y
to mean x unless x is None.


Make yourself a function.


Not so helpful in a strict language (since you want y to be evaluated only when x is None)


Yes, that's true. You would need to wrap it in a lambda, but then that looks horrible and you might as well use an if.
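The wrapped version the parent describes might look like this sketch (the helper name `lazy_default` is made up):

```python
def lazy_default(value, factory):
    # evaluate the factory only when no value was supplied
    return factory() if value is None else value

def f(stuff=None):
    stuff = lazy_default(stuff, lambda: [])
    stuff.append(1)
    return stuff

print(f())     # [1]  (fresh list each call)
print(f([5]))  # [5, 1]
```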


So indeed the author of the article is really being too clever here.


And as usual, being clever in Python equates to getting yourself in trouble.

Dammit, people - in Python, "That's clever" is an insult. Learn the philosophy of a language and you'll be much happier using it.


That's exactly why I prefer Python over Ruby: not the language, but the philosophy of the community. The Ruby community seems to adore 'clever', while the Python community explicitly shuns it. I view the latter as being more mature and borne of experience.


Python itself is pretty "clever" compared to many other languages (GC, lack of obvious 1:1 correspondence between code written and code executed, dynamic typing, significant indentation, decorators, generators, etc.). If Python programmers were really opposed to cleverness, they'd be writing straightforward assembly or a very thin veneer over it.


A language that forces me to say "self" every other word doesn't strike me as particularly clever.

Especially since that parameter is not necessary and it breaks standard programmer expectations (if I declare a method with 3 parameters, I should call it with 3, not 2).


Ah, but you are calling it with 3 parameters:

  foo.bar(baz, sputz)
  ^1      ^2   ^3
When you write a class method, it has more information available to it than a similar subroutine; thus, another parameter.

GvR brings up the deeper reasons for explicit self at http://neopythonic.blogspot.com/2008/10/why-explicit-self-ha... .


Yes, this is exactly what I mean by "breaking programmer's expectations". Except maybe Modula, no language works like that at all. The parameters are in the parentheses, period. If you need to pass this, you do so in the declaration and in the invocation. If this is passed implicitly, you don't declare it in the parameters and you don't pass it at the call site.

Python is doing this totally weird stuff that sits in the middle and that makes no logical sense at all.

The fact that Python forces you to declare the "self" parameter is simply due to the fact that it's old, old, old. Nothing wrong with that, but post-rationalizing it by saying it's okay to declare a method with 3 parameters but call it with just 2 is just silly.


Simply because it does not fit your expectations does not mean that it makes "no logical sense at all". As faulty beings, we often have expectations that really are quite far from logical.


I like 'self'. I don't mind it at all.

The error messages related to `self` in method definitions caused me some confusion when starting out with Python. For instance, if you define a method with no arguments, calling it results in "TypeError: your_method() takes no arguments (1 given)".


This is why I love F#.

It lets you define the name of the this parameter in exactly the same way you pass it.

  type Bar(p) =
    member x.Foo(bar) = printf "%s %d" bar p 

  let x = new Bar(10)
  x.Foo("Foo bar")
Output:

  Foo bar 10

And yes, the types of the variables bar and p are inferred from the %s and %d, respectively.


You can rename self to whatever you want in Python by changing it in the function definition.

    class Flip(object):
        foo=4
        def flop(x,num):
            x.bar=x.foo+num


Right, but his point was that the self variable is declared in front of the method name, similarly to how it looks when you call said method.


Python is "clever" so that the Python programmer doesn't have to be.


Actually Python has pretty close correspondence between source code and compiled byte code.


You can also do:

    stuff = stuff is not None and stuff or []

though note that this still replaces a passed-in empty list with a new one, for the same reason as plain `stuff or []`. That said, the `if stuff is not None:` check is the most Pythonic way of doing it.


I didn't realize you could write perl in python.


FWIW, Perl 6 implicitly treats default values as closures, and calls them when no argument is passed that could bind to the optional argument.

That way you get a fresh array each time, and you can even use defaults that depend on previous arguments:

    sub integrate(&integrand, $from, $to, $step = ($to - $from) / 100) { ... }


Pylint (or maybe it was pep8) has told me not to make dicts default arguments when running it against my code, but didn't explain why. Thanks for the post.


You just need to fully understand when Python does evaluation.

    import types
    def function(item, stuff = lambda: []):
        if type(stuff) == types.FunctionType:
            stuff = stuff()
        stuff.append(item)
        print stuff

    function(1)
    # prints '[1]'

    function(2)
    # prints '[2]'

Scala has nicer syntax for this, because a parameter can be declared to be passed by name:

    trait Map[A, B] {
        …
        def getOrElse (key: A, default: ⇒ B): B
        …
    }

the `default` parameter is passed by name (the `⇒ B`), so when you do

    getOrElse(someKey, defaultValue)
the `defaultValue` expression is not evaluated up front; it is only evaluated if the key turns out to be missing.


Yes, it's a wart. You learn the patterns he mentions pretty quickly.


I knew the pattern well from other languages, but would have assumed that it wasn't necessary in Python. So, this is good to note.


"stuff = stuff or []"

This idiom is baked into the perl community. It's kind of funny to see a Pythonista deciding it's a good idea. (You can tell it hurts him too).


It’s not a good idea in Python. There are legitimate reasons to pass falsy objects as function arguments.


Funnily enough, Perl (since version 5.10) actually has a solution to this. The expression

  $foo // "default"
evaluates to "default" if $foo is undefined, and to $foo otherwise (even if it is defined but false).


I've been programming Python on and off for 12 years, and full time for the past 3, and this is the first time I've seen anyone recommend that idiom. So on the whole the Python community does not think it's a good idea.


JavaScript, too:

    stuff = stuff||[]
since JavaScript doesn't even support default argument values.


I keep seeing this "problem" come up, but I don't understand how it's realistic. If you have a function that modifies a parameter as a side effect, why would you have a default value for the parameter?

And since the site's comments seem to be taken over by link spam, is this mention on Hacker News just a clever way to juice the Google rank of said spam?


I saw it bite some people where I work. It's now used for discussion with potential hires.


This 'problem' can actually come in handy when used with a regex callback function.

See if you can determine what this does:

    def cbk(match, nb = [0] ):
        if len(match.group())==len(nb):
            nb[-1] += 1
        elif  len(match.group())>len(nb):
            nb.append(1)
        else:
            nb[:] = nb[0:len(match.group())]
            nb[-1] += 1
        return match.group()+' '+('.'.join(map(str,nb)))
    
    str = re.compile('^(#+)',re.MULTILINE).sub(cbk,str)


The code converts:

  ##
  #
To:

  ## 0.1
  # 1

str is a builtin; don't use it as a variable name, especially if you also use it in its original role.


Correct, & thanks for the tip! But don't worry, I changed the variable name from my copy & paste and just didn't give it a second thought.


The problem is not so much about mutability; it's that default parameter values escape the scope of their function.

Which is very, very messed up (but not the first thing Python messed up).


It's on SO's Python FAQ too. This is the only thing in Python that's really bitten me. I remember it took me about a week to figure this out when I was tearing down my algorithm bit by bit to find out whether my proof was wrong or the code.

http://stackoverflow.com/questions/1132941/least-astonishmen...


You can use this trick for memoization. Example: http://paste.pocoo.org/show/341849/
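The memoization trick looks roughly like this (a sketch, not necessarily the code at the paste link):

```python
def fib(n, _cache={}):
    # the default dict is created once and shared across all calls,
    # so it survives as a cache between invocations
    if n not in _cache:
        _cache[n] = n if n < 2 else fib(n - 1) + fib(n - 2)
    return _cache[n]

print(fib(30))  # 832040
```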


The article is right, though, in stating that later readers of your code may not be expecting this behavior, and it could lead to problems.


Memoization is an implementation detail and shouldn't be part of the argument list.


"stuff = stuff or []"

This would fail to have the expected behaviour here:

  fill_list = []

  stuff = function(info, fill_list)

  print fill_list

Use the `if stuff is None:` paradigm.


Not sure that's a "problem", given that the only other ways to implement function-static variables would be a variable visible to the whole module, or tricks with decorators...

    @statics(blah=[])
    def foo(normal_args, **kwargs):
    # or
    def foo(normal_args, blah):
It messes up the idea of looking at the definition to find the function signature.
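For what it's worth, a decorator like the hypothetical `statics` above could be sketched as follows (function attributes stand in for the static variables; the name is the parent's, not a real library):

```python
def statics(**attrs):
    # attach each keyword argument as an attribute on the decorated function
    def wrap(func):
        for name, value in attrs.items():
            setattr(func, name, value)
        return func
    return wrap

@statics(blah=[])
def foo(item):
    foo.blah.append(item)  # per-function state, declared right above the def
    return foo.blah

print(foo(1))  # [1]
print(foo(2))  # [1, 2]
```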


it's a gotcha but it makes perfect sense once you understand why it works that way - i.e. the difference between evaluating a function definition and calling it.


The Pythonist doth protest too much, methinks.


I think it's clearer and more Pythonic in this case to do

  def function(item, stuff):
      ... blah blah

  def function_default(item):
      function(item, [])

(giving the wrapper its own name, since Python doesn't have overloading)



