Using Python generators for real work [pdf] (dabeaz.com)
154 points by conesus on Jan 23, 2011 | 20 comments



Using Python generators for real work... or why I switched to Lua.

They're great at first, but then say you need to split pieces of the generator into helper functions for better organization, or yield recursively. It turns out you can't, because yielding is A) syntactic and B) only possible from the main function body.
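
A minimal sketch of what I mean (the names are just for illustration):

    def emit(value):
        yield value                  # 'yield' here turns emit() into its own generator

    def countdown(n):
        yield "starting"
        while n > 0:
            emit(n)                  # a generator object is created and silently discarded
            n -= 1

    print(list(countdown(3)))        # ['starting'] -- the helper's values never appear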

It took us until Python 3 to replace "print x" with a more sensible, functional "print(x)". Yet for yield we are stuck with even worse arbitrary syntactic rules for "yield x", instead of being able to write "yield(x)" and have it yield to the wrapping generator/coroutine from wherever it is in the call stack. The result is unintuitive, less-refactorable code.

http://lua-users.org/wiki/LuaCoroutinesVersusPythonGenerator...


Actually, Python generators will be changing quite a bit soon: http://www.python.org/dev/peps/pep-0380/
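
A rough sketch of the delegation it proposes, "yield from" (the helper names here are made up):

    def emit(value):
        yield value
        yield value + 1

    def countdown(n):
        while n > 0:
            yield from emit(n)       # delegates to the sub-generator
            n -= 1

    print(list(countdown(3)))        # [3, 4, 2, 3, 1, 2]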


Thanks for the link, I had not read 0380.

The proposed solution seems to increase syntactic complexity rather than reduce it, adding another keyword, "from", into the mix, which is hardly optimal imo.

To me this indicates a systematic logical flaw: thinking of hitting a "yield" in the call stack as analogous to "return", when in fact it is the inverse, analogous to waiting for a function to return, and thus more similar to a print() function call. The whole point of coroutine yielding is that it inverts the point of view of the call stack and allows the called routine to be the calling routine as well.

I still use Python for some things and have too many fond memories to ever hate it, but even if generators get cleaned up I'll probably stay with Lua for pipeline-type projects, as I've grown too used to runtimes within an order of magnitude of C, better coroutines, and its more Scheme-like nature.


My interpretation of the "yield" keyword is not that of a coroutine yield, but that of "cough up the following value", which is more in line with "return". Under that interpretation, the fact that generators can also be used to implement coroutines is a coincidence with unfortunate terminology namespace conflicts.

I looked around a bit for the original semantic intention behind choosing "yield" as a keyword, but came up empty. Anyone?


It is my understanding that "yield" refers to yielding execution control. If semantic meaning is the concern rather than lexical meaning, let us ignore the names "coroutine" and "generator" and focus on the general logic of code continuation, that is, the difference between pausing the current execution context and destroying it.

Function call: halt further processing of current call stack, execute procedures, resume call stack after function call when control returns.

Yield: halt further processing of current call stack, execute procedures, resume call stack after yield when control returns.

Return: destroy current call stack.

The variation present on the code continuation axis is far more significant than the variation on the coughing-up-values axis.
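
A toy illustration of the pause-vs-destroy distinction (in Python, since that is the topic; the names are arbitrary):

    def step():
        print("running up to the yield")
        yield "paused"               # frame is suspended here, locals preserved
        print("resumed after the yield")
                                     # falling off the end destroys the frame

    g = step()
    print(next(g))                   # runs up to the yield, hands back "paused"
    try:
        next(g)                      # resumes exactly where it left off, then finishes
    except StopIteration:
        print("frame destroyed")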


I think, in this case, 'yield' is being used less in the sense of a street sign, and more in the sense of crop yield.


Agreed, that's a significant pain point. That's also why I think using yield as "syntactic sugar" for asynchronous programming is a good-looking but bad idea. It makes refactoring async code that uses yield even harder than it already is.


You could use eventlet or gevent for coroutines in Python.
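
For example, a minimal gevent sketch (spawn/sleep/joinall are the actual API; the worker itself is just illustration):

    import gevent

    def worker(name):
        for i in range(3):
            print(name, i)
            gevent.sleep(0)          # yield control to the other greenlet

    gevent.joinall([
        gevent.spawn(worker, "a"),
        gevent.spawn(worker, "b"),
    ])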


Fantastic!

It would be interesting to combine pipelines into something that branches off into various categories. For example, you could split the lines of an access-log file into IP addresses and request sizes and fork off separate processing threads: one for obtaining the unique set of IP addresses and another for summing up the sizes (a rough sketch follows the list below).

It seems like all that is needed for describing a pipeline is:

  (a) a queue for input
  (b) a processing program that's connected to the queue
  (c) (possibly multiple) queues for output
  (d) a topology connecting the processing programs
  (e) a job scheduler.
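
A rough sketch of the branching idea using coroutines as sinks (illustrative only; the log-line field positions are assumed):

    def coroutine(func):
        def start(*args, **kwargs):
            cr = func(*args, **kwargs)
            next(cr)                      # prime to the first yield
            return cr
        return start

    @coroutine
    def unique_ips(seen):
        while True:
            ip = (yield)
            seen.add(ip)

    @coroutine
    def total_bytes(totals):
        while True:
            size = (yield)
            if size.isdigit():
                totals[0] += int(size)

    def broadcast(lines):
        seen, totals = set(), [0]
        ip_sink, size_sink = unique_ips(seen), total_bytes(totals)
        for line in lines:
            fields = line.split()
            ip_sink.send(fields[0])       # assume the IP is the first field
            size_sink.send(fields[-1])    # assume the byte count is the last field
        return seen, totals[0]
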
On the face of it, it looks similar to Apple's Automator and Matt Welsh's PhD thesis on SEDA:

  * http://www.eecs.harvard.edu/~mdw/proj/seda/
  * Paper: http://www.eecs.harvard.edu/~mdw/papers/seda-sosp01.pdf
EDIT: Formatting


Here is some older material on pipeline coding in Lua that you might be interested in; it describes this problem in terms of Filters, Sources, and Sinks: http://lua-users.org/wiki/FiltersSourcesAndSinks

LuaSocket implementation of the above: http://w3.impa.br/~diego/software/luasocket/ltn12.html


There is an updated version of this talk:

    http://www.dabeaz.com/generators-uk/index.html
Also, more interesting talks from the same author:

    http://www.dabeaz.com/talks.html



It seems that pipeline-style programming is now popular in many modern programming languages, for example C# (with Linq), F# (with the |> pipe operator and the Seq module), and Ruby and JS+jQuery (with method chaining).

I really wish Python had some syntactic feature that encourages this style, doing "gen_cat(gen_open(gen_find(...)))" is rather cumbersome, compared to "gen_find(...) | gen_open | gen_cat".

Of course you can override __or__ like some libraries do, but it is not the One Way To Do It...
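
For what it's worth, a quick sketch of the __or__/__ror__ trick (purely illustrative, not from any particular library):

    class Stage:
        def __init__(self, func):
            self.func = func
        def __ror__(self, source):          # source | stage
            return self.func(source)

    strip    = Stage(lambda lines: (l.rstrip() for l in lines))
    nonblank = Stage(lambda lines: (l for l in lines if l))

    log = ["GET /index.html  \n", "\n", "GET /about.html\n"]
    for line in log | strip | nonblank:
        print(line)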


Mathematica has a nice syntax for function application, mapping over lists, etc. It's also possible to define such operators in Haskell, which has the added advantage of type-safety.

  f @ {1,2,3} == f[{1,2,3}]
  f /@ list == map(f, list)
  f @@ {1,2,3} == f[1,2,3]

  (#1 * 2) & /@ {1,2,3} == {2,4,6}
etc.


That's like F#'s |> operator:

    x |> f == f(x)
    [1 .. 3] |> Seq.map (fun x -> x * x) |> Seq.sum |> printfn "%A" // prints 14


Someone took an interesting stab at it: http://stackoverflow.com/questions/2281693/is-it-a-good-idea...

Sadly this is one of the reasons I still write shell scripts: pipes are so explicit and compact. Python seems to be gathering more of a practical functional crowd these days.


I like Haskell's function composition syntax: (.)

(f . g) x = f (g x)

cat . open . find ...
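
In Python you'd have to roll that yourself; a tiny compose() sketch (illustrative only):

    import functools

    def compose(*funcs):
        # compose(f, g, h)(x) == f(g(h(x)))
        return functools.reduce(lambda f, g: lambda x: f(g(x)), funcs)

    double  = lambda x: x * 2
    add_one = lambda x: x + 1
    print(compose(double, add_one)(3))      # double(add_one(3)) == 8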


I wish I could upvote this ten times. Fantastic presentation.


This is awesome. I think I finally understand generators, and I just thought of a couple use cases in the code I'm working on now. This is great. Fantastic submission.


The design pattern the author presents for expressing iteration, along with the log parsing example, is original and beautiful. I hope the author reads this comment thread and sees how appreciative we all are of his contribution.



