This is a pretty good list of gotchas, but it's important when writing something targeted at beginners to be as precise and clear as possible. Nearly every section here either uses terminology poorly, is slightly incorrect, or has difficult examples.
Python supports optional function arguments and allows default values to be
specified for any optional argument.
No, specifying a default is what causes an argument to be optional.
it can lead to some confusion when specifying an expression as the default value
for an optional function argument.
Anything you specify as a default value is an expression. The problem is when the default is mutable.
the bar argument is initialized to its default (i.e., an empty list)
only the first time that foo() is called
No, rather it's when the function is defined.
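A quick demo of the def-time evaluation, using the article's own foo/bar names:

>>> def foo(bar=[]):        # the [] is evaluated once, when the def runs
...     bar.append("baz")
...     return bar
...
>>> foo()
['baz']
>>> foo()                   # the same list object is reused
['baz', 'baz']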
class variables are internally handled as dictionaries
As dictionary keys, and that's still only roughly correct.
In "Common Mistake #5", he uses both a lambda and array index based looping, neither of which are particularly Pythonic. A better example of where this is a problem in otherwise Pythonic code would be good.
In "Common Mistake #6" he uses a lambda in a list comprehension -- for an article of mistakes mostly made by Python beginners, this is going to make it tough to follow the example.
In "Common Mistake #7", he describes "recursive imports" where he means "circular imports".
In "Common Mistake #8" he refers repeatedly to "stdlib" where he means the Python Standard Library. Someone is going to read that and try to "import stdlib".
Hey, thanks for the great feedback! We agreed with (almost :-) ) all of your comments and have made corresponding mods/corrections to the post. Thanks again!
Good changes. One more issue (I believe recently introduced): LEGB ends with "Built-in", not with "Module". It's also good to note in the blog post that it's been updated.
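For reference, a minimal sketch of that lookup order (Local, Enclosing, Global, Built-in):

x = "global"

def outer():
    x = "enclosing"
    def inner():
        print x      # no local x, so the Enclosing x is found
        print len    # not in L, E, or G, so the Built-in is found
    inner()

outer()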
Did they add in "(Note: This article is intended for a more advanced audience than Common Mistakes of Python Programmers, which is geared more toward those who are newer to the language.)" later?
Because it states it right near the top of the article.
1, 2, 4, 5, and 8 all seem to me like mistakes only / primarily made by beginners. I can't see an article aimed at primarily intermediate Python users spending time on them.
3 is a really easy mistake to make for anyone (which is why the syntax was changed).
6, 7, 9 and 10 are more obscure, and where I really appreciate this article -- and they definitely can be issues for more experienced Python devs.
Slightly off topic, but does anyone know of a resource that has 'most common mistakes' for different languages all in one place? It's certainly possible to google for blog posts and stack overflow questions to assemble such a list, but it would be handy to have them all in one place.
My use case is when interviewing candidates I often ask them to rate themselves on a scale of 1-5 in the languages they know, and then ask them increasingly 'tricky' questions in each language to get a feel for how their "personal" scale aligns to their real knowledge. This works fine if we have an overlap of several languages, but in the case where I know nothing or very little of one of the languages they know I lose that data point.
I find it valuable to know what a "I am a 1 at X" vs "I am a 3 at X" vs "I am a 5 at X" means to them, since I've found little correlation between how harshly someone rates themselves and their true ability. Sometimes self-rated 5s are really 5s by my book, sometimes self-rated 3s are really 5s by my book, and sometimes self-rated 5s are really 2s by my book. So I want to know how "my scale" translates to "their scale". If it were more formalized I'd go as far as to get a "confidence quotient" for a person, as self-critical and self-confident people can be fantastic engineers or horrible engineers.
Does anyone else do this process when interviewing?
While such a resource would make your job easier, it would make the interviewee's job easier still. They'd just have to memorize all the points in the reference.
I have thought about this, but in the context of understanding the root causes of these problems:
1) is it a language design problem,
2) a misunderstanding or misconception on the part of the programmer,
3) due to or related to bad coding practices / code smells (e.g. method body too long),
4) high complexity code (could be related to (3), could reflect the domain),
5) reduced programmer cognitive capacity (distraction, stress, sleep deficit, lack of motivation, etc.).
These would be interesting research areas for instrumenting IDEs / other ecosystem tools to collect some of this data. (I'm sure there is already some work in some of these areas and would appreciate names or links to high-quality reviews.)
This list is an excellent summary. If tasked with a #11 I'd probably add the slightly more obscure, but still super painful (when you do run into it) implicit string concatenation:
>>> l = ["a",
... "b",
... "c"
... "d"]
>>> l
['a', 'b', 'cd']
You could easily use a + operator then.
I find the behavior surprising. I would expect a syntax error. You get a syntax error if you write two integers next to each other (separated by a space) or two of any other kind of literal, but somehow "a" "b" gets converted to "ab".
If I'd discovered it myself I would be tempted to file a bug report. It goes against the Python mantra that explicit is better than implicit.
Yeah, when I discovered this little "feature" I had a read through that. The folks that use this for blocks of multi-line text are very defensive about the practice. I do understand not wanting to break compatibility though, especially since finding instances of this is hard (which is another reason it shouldn't exist in the first place!). Oh well :)
With triple quotes you get a string with newlines and indentation in it. If you don't indent the following lines, the code looks ugly, and either way you can't do anything about the newlines.
There are other languages which support multiline strings (heredocs) with indents stripped by means of syntax, like YAML (with |), Racket (which doesn't do dedenting, but being the language it is, it's very easy to add), and many shells (with <<-). Python doesn't have this feature, and parse-time string literal concatenation serves this purpose.
Of course, you can do something like:
foo = """bar
indented at first
and after newline
"""
textwrap.dedent(foo)
(or use list literals with str.join, or use a regex, or many, many other thing), but you can do this in all languages. Languages with syntactic sugar for this make writing slightly-longer-but-not-too-long strings much easier and cheaper (only done once during parsing, no need for imports, etc.), and Python makes up for not having explicit way of doing this with implicit parse-time string literals concatenation.
You don't need the continuation, but that's where you're most likely to make the mistake. It'll catch you when you leave off the comma from the last element of the list, then go back later and add another element to the end.
#6 is really confusing. Whenever I encounter something like this, my first reaction is that such obscure corners of a language should be avoided wherever possible, and more verbose/clear code used instead.
Programming languages are meant to be read as well as written, and someone relatively new to Python (and many who have used the language for a long time) is certain to get confused about the difference between:
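Presumably the pair in question is the article's late-binding lambda versus the default-argument fix, something like:

multipliers = [lambda x: i * x for i in range(5)]        # all see i == 4
multipliers = [lambda x, i=i: i * x for i in range(5)]   # each captures its own i

The second form works precisely because of mistake #1: the i=i default is evaluated once, at definition time.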
Agreed 100%, these types of constructions should be avoided in the first place in favor of more "readable" ones, but this happens in a fair amount of code that I've seen (and keep seeing).
Some of it seems to come from people cargo-culting their knowledge of anonymous and first class functions, so they end up believing that the only way to pass a function around is to construct it anonymously.
> "Python is an interpreted, object-oriented, high-level programming language with dynamic semantics."
I have an issue with that statement. No languages are inherently "compiled" or "interpreted", that's a property of the implementation.
If we are talking about CPython here, Python code is compiled to bytecode which is then interpreted. Not unlike Java - with the difference that the main implementation has a JIT and afaik, Python's does not.
But that's CPython. What about PyPy? It has a JIT.
> No languages are inherently "compiled" or "interpreted", that's a property of the implementation.
A language and its implementation are usually designed at the same time. Compiled or interpreted will affect design choices that go into the language. While additional implementations may follow, it can be hard/impossible to design a compiler (to machine code, not bytecode) for a language that was designed to be interpreted without dropping features (e.g. eval).
It may be more correct to say 'Python was designed to be interpreted' than 'Python is interpreted'
Not really - Javascript was designed to be interpreted, and yet V8/SpiderMonkey/Nitro all JIT-compile it down to machine code, sometimes very effectively.
Then Java and .NET are considered "interpreted"? And Android under Dalvik is "interpreted", but under ART is "compiled" (using Java as the language, which I'd always thought of as compiled, yet apparently is interpreted under your definition)? What if you embed Clang & LLVM in your application to run C++?
I think this just illustrates the fuzziness of these definitions. A compiler is just a piece of code; you can embed it into another piece of code and run it whenever necessary. Maybe in the world of shrink-wrapped desktop software there was a sharp distinction between AOT compiled languages and interpreted ones, but we haven't lived in that world for a couple decades now.
I feel that what most people actually mean when they say "compiled" vs. "interpreted" is whether or not the language specification has additional static checking beyond what is required by parsing. Essentially it is whether the language defers all errors to run-time or attempts to detect classes of them at compile-time. A language like JavaScript accepts as a program any string that parses, while a language like Java rejects many strings based on additional checks such as type rules. You can add static type-checking to JavaScript or Python, but it isn't part of the language spec. You can run C++ or Java with run-time type-checking, but it doesn't conform to the spec. In this way you could say that the language is fundamentally compiled or interpreted.
Of course, this doesn't address issues of incremental evaluation which often requires additional semantics for compiled languages.
Let's just say that the lines got a lot blurrier over the last couple of decades. The difference is generally if there's an explicit compilation step that is not hidden from developers. If you "run the source file", then it's interpreted. If you "run something generated from the source file(s)", then it's compiled. Stronger type checking and compiled mode is likely since the cost of compilation is higher and more resource intensive, so it makes sense to push it into an offline step.
> You can run C++ or Java with run-time type-checking, but it doesn't conform to the spec.
I'm not sure what you mean by "conforming with the spec". You can opt out of type checking by using only `object` or `void*`. But golang has a mode where you run a go file directly and a mode where you generate a binary. There are no traditional interpreted languages anymore (or at least only very few). And what we call "compiled" languages today are not actually compiled ones. The way Java runs is closer to how JavaScript runs than to how C runs. It's only about the interface they offer to developers.
That's technically true, but I still think it's reasonable to casually refer to Python as "interpreted," since it conveys useful ideas, although those ideas are more accurately conveyed with different phrasing.
That said, for more precise discussions, what you pointed out is valid and important. One of the early questions I ask in a programming interview is to explain some high-level differences between two languages they're familiar with, which is often Java and Python. One of the common responses I get is that Java compiles to bytecode which is executed by a VM, while Python is interpreted. Of course, I point out that CPython is also compiled to bytecode and executed by a VM.
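If you want to see that for yourself, the standard dis module will show the bytecode:

import dis

def add(a, b):
    return a + b

# prints the instructions (LOAD_FAST, BINARY_ADD, ...) that
# CPython compiled the function body into
dis.dis(add)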
I think it's reasonable to refer to the reference implementation of Python (CPython) as just "Python". The compilation to bytecode is an intermediate step because Python specifies a virtual machine. The language is definitely interpreted though. It's completely accurate to call Python an interpreted language. There's no version of Python that I know of (including PyPy) that ever produces compiled machine-executable binaries prior to runtime.
The compiled vs. interpreted thing is kind of a historical relic that people still cling on to. It's not dissimilar to where one draws the line for "scripting" or "nth generation" languages; it's all somewhat nebulously defined.
From my experience, the only time I've heard it nowadays is from people who were taught how to code by people who haven't coded since at least the 80s and never went on to develop the skill either professionally or as a hobby. So, yes, very much a historical relic!
That's right, in most Python implementations there's a "compiled" part (generally to bytecode) and then an "interpreted" one to run that bytecode. PyPy is a good example of a Python interpreter written in RPython (and compiled with the RPython toolchain), adding a JIT to it.
High-level is also a relative term (with several semantic dimensions, which makes languages hard to compare directly), and the term "object-oriented" is getting less expressive by the day.
Anyway, all of those terms do communicate something, even if tomorrow they may wrongly describe the language.
I've always thought that #1 is a sign of an incorrect operation altogether. If you want to always modify the passed parameter, it doesn't make sense to have a default. If you want to return a modified version of the input, you should make a copy immediately and then you don't get this problem. Doing both an in-place modification and returning a modified object at the same time is just wrong.
Again, the problem is not what it appears. You're keeping a reference to an existing item rather than making a copy. The results would be just as bad if you passed in an initial list rather than taking the default.
I think the surprising thing to most people is that you don't automatically get a copy when you do the assignment. That's how it works in older languages like C and C++, and how it appears to behave when you use immutable objects.
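That is, assignment binds a second name to the same object rather than copying it:

>>> a = [1, 2, 3]
>>> b = a          # b is another name for the same list, not a copy
>>> b.append(4)
>>> a
[1, 2, 3, 4]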
"Thus, the bar argument is initialized to its default (i.e., an empty list) only the first time that foo() is called, but then subsequent calls to foo() (i.e., without a bar argument specified) will continue to use the same list to which bar was originally initialized."
This actually happens when the function is defined, not when it's called the first time.
Also crashes Safari on my OS 10.5 Mac, and is unusably laggy in Firefox on the same computer. All sorts of thrashy javascript nonsense seems to be going on.
I use JavaScript Blocker for Safari, which is sort of like a less paranoid (and more convenient) version of NoScript. Looks like this site attempts to load 45 JavaScript files over 12 iframes, 17 of which JS Blocker blocked.
I'll do that. In general Safari doesn't crash so I was letting the OP know for their own benefit as they are in a better position to isolate the cause and work around it.
I have been bitten by #6 in a similar situation in the past. My solution was the analogue of the rather convoluted
def create_multipliers():
    def multiplier(i):
        return lambda x: i * x
    return [multiplier(i) for i in range(5)]

for multiplier in create_multipliers():
    print multiplier(2)
In "Common Mistake #2", I'd say that the mistake is fairly obvious to anyone who understands even a little bit about OOP and inheritance. Since class C doesn't define its own variable x, it has to be that it inherits the x in class A, so there's no reason to be surprised that C.x changes when A.x does.
While I agree that the "problem" case can be seen as obvious when considered in isolation, really it's the behaviour of the two cases taken together that can seem inconsistent. Nothing about understanding OOP or inheritance will prepare a person for that.
This modified version of #2 might help clear things up. I've only added print statements. In general, when issues like this come up, printing the id()s of identifiers can help:
Edit: Added some extra blank lines because lines were getting joined together.
# class_variables.py

class A(object):
    x = 1

class B(A):
    pass

class C(A):
    pass

print "Initially, A.x, B.x, C.x and their ids:"
print A.x, B.x, C.x
print id(A.x), id(B.x), id(C.x)

B.x = 2
print "After B.x = 2, A.x, B.x, C.x and their ids:"
print A.x, B.x, C.x
print id(A.x), id(B.x), id(C.x)

A.x = 3
print "After A.x = 3, A.x, B.x, C.x and their ids:"
print A.x, B.x, C.x
print id(A.x), id(B.x), id(C.x)
>really it's the behaviour of the two cases taken together that can seem inconsistent.
Why do you think so? I think that both cases seem consistent, or rather, correct (and therefore this example should not be treated as a common Python mistake), because x is not assigned a value anywhere in class C, and C inherits from A, so it should be clear to anyone knowing OOP and inheritance, that C's x is the same as A's x. (And the same holds true for inherited methods.) Even the OP says that in the post:
>In other words, C doesn’t have its own x property, independent of A.
What's happening here is that a variable is inheriting its value from the superclass, except for when it doesn't. And when it doesn't, why is that? Well presumably it's because something's been overridden - OO tells us that's how we change the properties that are inherited from the superclass. No wait, that's not it; nothing's been overridden here. All that's happened is we've assigned a value to B.x, and doing so seems to have changed the inheritance of our class.
So this variable is neither completely shared across classes and their subclasses (per Smalltalk class variables), nor completely independent across classes and their subclasses (per Smalltalk class instance variable), but instead its [in]dependence alters based upon whether (and where) you assign values to it.
While I can understand that in terms of the dictionary mechanism used to implement it, from my point of view it's just weird behaviour.
This is actually the same thing as regular Python scoping rules; there's not even any fancy OOP logic behind it. Here's the same thing, but using global scope and functions instead of classes and inheritance.
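Something along these lines (a sketch of the kind of example meant here):

x = 1              # plays the role of A.x

def read_x():
    return x       # no local x, so the lookup falls through to the
                   # global scope, just as C.x falls through to A.x

def shadow_x():
    x = 2          # assignment creates a local x that shadows the
    return x       # global, just as B.x = 2 gives B its own attribute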
I think there is an argument to be made that classes are special and "reaching upwards" into the superclass scope should not occur - a unique copy should be made - but I also think that Python's way of doing it makes enough sense that it is not confusing. The Python devs are at least consistent about having their own way of doing things.
That's an interesting point, that the behaviour of an inherited class variable is consistent with a case you show where inheritance plays no part at all.
So from that point of view, it comes down to whether we expect that an inherited class variable really is just some variable in an outer scope that we can shadow with a local variable of the same name (per your example), or whether we expect that inheritance provides some stronger notion of ownership of the inherited variable.
I dislike the former case, largely because I dislike the idea that the location at which a variable is stored can appear to change merely by assigning to it. But then, I dislike Python's implicit declaration of local variables for exactly the same reason. So you're right, there IS some consistency there. ;-)
If instead of x being an integer, it were a function x(), then it makes more sense. Really, for Python, there is no difference between the two in this example. When you assign a new value to B.x, you're overriding the value that B inherited from A. When you override the value in A, any subclass that doesn't have its own overridden value will use the new A.x, but any subclass that is overridden will be unchanged.
I suppose I just prefer the idea that the meaning of assigning a value to a variable should be "assign this value to the variable", rather than "alter the inheritance behaviour of my class such that mutable state is stored in it where it wasn't stored before, and then assign this value to the variable."
x isn't a variable, it's a tag. Because Python is dynamically typed, x can be an int one minute and a function the next, so every attribute on a class is stored as a pointer to an object, including attributes that are integers (which is an object) and functions (which are also objects). Because you aren't declaring the type of x (Python doesn't allow that), Python has to treat x = 1 the same way it'd treat def x(self).
Where x is an identifier (presumably aka tag?) that refers to a variable, it is not beyond the bounds of possibility for "x = 1" to be interpreted as "store the value of 1 into the variable that is referred to by identifier x". Plenty of languages, including dynamically typed ones, manage to do this, as indeed Python does in many cases.
Not knowing the type of x is unrelated to question of where x's value is stored, or whether x's value will be stored somewhere else after we've assigned a new value to it.
I'm not really sure I understand your point. The value of x is still going to be in memory, but you may not have any references to it and it will be garbage collected. The disconnect is in thinking x is a variable and not an identifier. x points to the object in the parent until it is overridden. This allows me to dynamically alter the functionality of a class and all its subclasses that don't override that functionality during runtime.
I'm not disputing how or why it works, just saying that I think it is poor design, as it causes a statement that looks like variable assignment to actually produce overriding. I think this violates the principle of least astonishment, and I suspect there is no good reason (beyond implementation simplicity) why Python class variables do behave this way.
(edit: replaced "an expression" with "a statement")
No, the reason it exists is so you can override functionality of subclasses at run time; otherwise things like monkey patching would be impossible. I guess it COULD do a pass to see if the attr is an immutable or some form of primitive type when __new__ is called and force those to be instantiated as instance attributes, at the cost of internal consistency, but it seems pretty straightforward to me the way it works now. You don't need to be an expert, you just need a basic understanding of how Python evaluates code and the difference between tags and variables.
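For example, a sketch of that run-time patching:

class A(object):
    def greet(self):
        return "hello"

class B(A):
    pass

A.greet = lambda self: "patched"   # rebinding the attribute on A...
print B().greet()                  # ...changes B too: prints "patched"

B.greet = lambda self: "mine"      # now B overrides it
print A().greet()                  # A is unaffected: still "patched"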
Here’s a thought experiment for you—think of “common mistakes in language X” as “design flaws in language X” or “ways in which language X is surprising” and what could have been done to mitigate that.
Regarding circular imports and #7:
The main problem arises when using the `from mymodule import mysymbol` notation.
The example solved this by properly using `import mymodule`, although this might cause some more problems if your design is wrong, as seen in the example. Calling f() from the module ("library") code itself is a very bad idea. Instead one should do this:
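Presumably something like the following, keeping the article's f() but moving the call out of the import path:

# a.py
import b

def f():
    return b.x

# run library code only when executed as a script, not at import time
if __name__ == '__main__':
    print f()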
For the first gotcha, using None as a default argument solves the problem, but checking `if not bar` instead of `if bar is None` can produce different results if bar evaluates to None in a boolean context.
>>> def foo(bar=None):
...     if not bar:
...         bar = []
...     bar.append("baz")
...     return bar
...
>>> bar = []
>>> foo(bar)
['baz']
>>> bar
[]
And I would have thought that incorrect usage of bytestrings for text and then asking on Stack Overflow about the UnicodeDecodeErrors would be quite common as well ...
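e.g. the classic Python 2 case, where a UTF-8 bytestring hits an implicit ASCII decode:

# Python 2:
s = '\xc3\xa9'     # the UTF-8 bytes of u'é' in a plain bytestring
u = unicode(s)     # implicit ASCII decode raises UnicodeDecodeError:
                   # 'ascii' codec can't decode byte 0xc3 in position 0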
For #7, now you have a performance problem of importing every time you run that function. Rather, you can place the import at the bottom of b.py and be okay.
Python caches imported modules; you can check it out in your local shell with: import sys ; sys.modules. That's why whenever you make changes to a module which has already been loaded you won't see the changes until you load the module again, either by quitting the shell or by using reload(module) on Python 2.x.
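For instance:

>>> import sys
>>> import json
>>> 'json' in sys.modules    # the loaded module object is cached here
True
>>> import json              # subsequent imports are just a dict lookup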
Any reason why you're using a slice here?
>>> numbers[:] = [n for n in numbers if not odd(n)]
I'm thinking that doing
>>> numbers = [n for n in numbers if not odd(n)]
wouldn't be a problem since the assignment is executed after the computation of the list comprehension.
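The difference isn't evaluation order but aliasing: the slice assignment mutates the existing list object in place, while plain assignment rebinds the name to a brand-new list. That matters when something else holds a reference to the original:

>>> numbers = [1, 2, 3, 4]
>>> alias = numbers
>>> numbers = [n for n in numbers if n % 2 == 0]
>>> alias                # rebinding left the old list untouched
[1, 2, 3, 4]
>>> numbers = [1, 2, 3, 4]
>>> alias = numbers
>>> numbers[:] = [n for n in numbers if n % 2 == 0]
>>> alias                # slice assignment changed it in place
[2, 4]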
"when the default value for a function argument is an expression, the expression is evaluated only once"
I would explain the behavior he shows as due to the default value being mutable. I don't see an expression there, just an empty list used as a default.
The default value is an expression which is evaluated when the def statement is executed (for a top-level function, when the module is loaded), even if the expression results in an empty list. If I had:
import datetime

def f(now=datetime.datetime.now()):
    ...
now would be the time when the module was loaded, not when f is called the first time, or when f is called after that, despite datetime objects being immutable.
If you define several functions at different times, the default argument will be evaluated each time you define a new function. You'll have the same behaviour if you keep reassigning lambdas to the same function name, or if you keep editing the globals.
You are misunderstanding how dynamic Python is. And, yes, the part about "module load time" was a simplification.
Right, it's (re)defined every time you call 'foo2', but that's a different scenario than the one written in the article (had you had only 'foo' and not 'foo2', you'd have gotten the same time in all your calls to the function without supplying arguments).
In Python, variables are references (pointers). `[]` gets evaluated at definition time (import time, for a top-level function), and returns a pointer to an object (an empty list). This object is then further mutated in the body on subsequent calls, because if you don't pass the parameter, it still points to the same object.
The scoping rules in #4, combined with being able to reference variables before definite assignment, are what lead to the 'variable hoisting' in JavaScript.
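The Python side of that gotcha, for reference:

x = 10

def foo():
    print x    # UnboundLocalError: the assignment below makes x local
    x = 5      # to the entire function body, not just from here on

foo()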
Is #6 really called 'late binding'? That seems like the wrong term.
I've always heard it used to refer to method dispatch, which that wikipedia article also seems to. However, it seems like the Python spec does use it to refer to when variable values are resolved.
Even the first argument doesn't make sense to me. The optional argument is within the scope of the function; why is the temporary optional argument getting carried over?
Some of the mistakes mentioned in OP (#1, #3, and #4) can be automatically caught by tools like PyLint (and to lesser extent, Pyflakes), as well as good unittests.
I think the way he described it is pretty accurate. Default keyword arguments are only evaluated once at function definition, so supplying a mutable default keyword argument can cause issues.
Your example is pretty contrived and doesn't illustrate what he was pointing out, as you're creating a new function foo every time foo2 is called, and only calling it once.
In "Common Mistake #5", he uses both a lambda and array index based looping, neither of which are particularly Pythonic. A better example of where this is a problem in otherwise Pythonic code would be good.
In "Common Mistake #6" he uses a lambda in a list comprehension -- for an article of mistakes mostly made by Python beginners, this is going to make it tough to follow the example.
In "Common Mistake #7", he describes "recursive imports" where he means "circular imports".
In "Common Mistake #8" he refers repeatedly to "stdlib" where he means the Python Standard Library. Someone is going to read that and try to "import stdlib".