
> I find myself often having to wrap expressions in list(...) to force the lists.

Out of curiosity, why do you need to force the lists?

> Generators make things much more complicated. They are basically a way to make (interacting, by means of side-effects) coroutines

Huh? Generators are a way to not make expensive computations until you have to--as well as to not use memory that you don't need. Basically, if all you're doing with a collection of items is iterating over it (which covers a lot of use cases--but perhaps not yours), you should use a generator, not a list--your code will run faster and use less memory.

> In most use cases (scripting) lists are much easier to use (no interleaving of side effects) and there is plenty of memory available to force them.

Generators don't have to have side effects. And there are plenty of use cases for which you do not have "plenty of memory available" (again, perhaps not yours).

> IMHO generators by default is a bad choice.

I think lists by default was a bad choice, because it forces everyone to incur the memory and performance overhead of constructing a list whether they need to or not. The default should be the leaner of the two alternatives; people who need or prefer the extra overhead can then get it by using list() (or a list comprehension instead of a generator expression, which is just a matter of typing brackets instead of parentheses).
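To make the brackets-versus-parentheses point concrete, here is a minimal sketch. The variable names and the value of `n` are made up for illustration, and `sys.getsizeof` only measures the container object itself, so treat the sizes as a rough indication rather than a benchmark.

```python
import sys

n = 100_000  # arbitrary size chosen for illustration

# List comprehension: all n results are built and held in memory at once.
squares_list = [i * i for i in range(n)]

# Generator expression: same syntax with parentheses; values are produced on demand.
squares_gen = (i * i for i in range(n))

# The generator object stays small regardless of n; the list's storage grows with n.
list_size = sys.getsizeof(squares_list)
gen_size = sys.getsizeof(squares_gen)

# If all you do is iterate once, both give the same answer.
total_from_list = sum(squares_list)
total_from_gen = sum(squares_gen)
```

Note that after `sum(squares_gen)` the generator is exhausted, while `squares_list` can still be iterated again.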




I think you're referring to:

  for i in range(large_number):
      ...
(similarly enumerate, zip etc)

i.e., where you don't bind the generator to a variable, but instead consume it immediately, and reading from it as an iterator has no other side effects.

And that's the only use of generators that, in my experience, is both common and practical. Other uses have always quickly become a mess for the reasons named above.

I don't disagree that generators can be occasionally useful (or often, for your special applications). But mostly it's a pain that they are the only thing many APIs return and that they are not visually distinct (in usage) from lists and iterators. For example:

  c = sqlite3.connect(path)
  rows = c.execute('select blablabla')
  do_some_calculation(rows)
  print_table(rows)
Gotcha! "rows" was probably already empty when print_table was supposed to print it. But how can you know? Hunt down all the code, see what the functions do and what they want to receive (lists, iterators, generators? probably they don't even know). And what if the functions change later? Even subtler bugs occur if the input is consumed only partly.

So by far the most common sane thing to do (when there aren't billions of rows) is:

  rows = list(c.execute('select blablabla'))
Which is arguably annoying and requires a wrapper for non-trivial things.
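The gotcha can be reproduced with a small sketch against an in-memory database. The table `t`, its contents, and the bodies of `do_some_calculation` and `count_rows` (a stand-in for `print_table`) are hypothetical; only the `sqlite3` calls are real.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table t (x integer)")
conn.executemany("insert into t values (?)", [(1,), (2,), (3,)])

def do_some_calculation(rows):
    # Hypothetical stand-in: sums the first column, consuming the iterable.
    return sum(r[0] for r in rows)

def count_rows(rows):
    # Hypothetical stand-in for print_table: counts the rows it is handed.
    return sum(1 for _ in rows)

# Passing the cursor directly: the first consumer exhausts it.
cur = conn.execute("select x from t")
total_cursor = do_some_calculation(cur)
left_for_second = count_rows(cur)  # nothing left to iterate over

# Forcing a list first: both consumers see every row.
rows = list(conn.execute("select x from t"))
total_list = do_some_calculation(rows)
seen_by_second = count_rows(rows)

conn.close()
```

Here `left_for_second` ends up as 0 while `seen_by_second` is 3, which is the silent failure being described.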


None of what you say addresses my main point, which is that a list is always extra overhead, so making a list the default means everyone incurs the extra overhead whether they need it or not. You may not care about the extra overhead, but you're not the only one using the language; making it the default would force everyone who uses the language to pay the price.

Or, to put it another way, since there are two possibilities--realize the list or don't--one of the two is going to have to have a more verbose spelling. The obvious general rule in such cases is that the possibility with less overhead is the one that gets the shortest spelling, i.e., the default.

Also, if you've already realized a list, it's too late to go back and un-realize it, so there can't be any function like make_generator(list) that saves the overhead of a list when you don't need it. So there's no way to make the list alternative have the shorter spelling and still make the generator alternative possible at all.

As far as your sqlite3 example is concerned, why can't do_some_calculation(rows) be a generator itself? Then print_table would just take do_some_calculation(rows) as its argument. Does the whole list really have to be realized in order to print the table? Why can't you just print one row at a time as it's generated?

Basically, the only time you need to realize a list is if you need to do repeated operations that require multiple rows all at the same time. But such cases are, at least in my experience, rare. Most of the time you just need one row at a time, and for that common case, generators are better than lists--they run faster and use less memory.

If you need to do repeated operations on each row, you just chain the generators that do each one (similar to a shell pipeline in Unix), as in the example above. This also makes your code simpler, since each generator is just focused on the one operation it computes; you don't have to keep track of which row you're working on or how many there are or what stage of the operation you're at, the chaining of the generators automatically does all that bookkeeping for you.
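A rough sketch of that kind of chaining, assuming rows are simple tuples; the sample data and the function bodies are invented for illustration and are not the code from the sqlite3 example.

```python
def do_some_calculation(rows):
    # Hypothetical per-row step: compute a total from quantity and price.
    for name, qty, price in rows:
        yield name, qty * price

def format_rows(rows):
    # Hypothetical per-row step: turn each computed row into a printable line.
    for name, total in rows:
        yield f"{name}: {total}"

# Stand-in for a database cursor: any one-pass iterator of rows.
source = iter([("apple", 2, 3), ("pear", 1, 5)])

# Chain the stages like a shell pipeline; nothing runs until the result is consumed.
pipeline = format_rows(do_some_calculation(source))

# Each line is produced one row at a time as the consumer asks for it.
output = list(pipeline)
```

In real use the final `list(...)` would be a `for line in pipeline: print(line)` loop, so no stage ever holds more than one row.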



