Hacker News new | past | comments | ask | show | jobs | submit login
Abusing Python exceptions to turn a recursive function iterative (gist.github.com)
160 points by mercapto on Feb 27, 2018 | hide | past | favorite | 116 comments



Abusing python can really save your life sometimes. I think my favorite recent hack has been accessing and MODIFYING callers variable, due to inability to get it passed through the Django module to the callback.

  def invite_accepted(sender, email, **kwargs):
      import sys
      caller = sys._getframe(2)
      request = caller.f_locals['request']
      request.session['invite_accepted'] = True


I love how everything is available at runtime. Python's "we are all consenting adults here" philosophy has been super useful in a pinch.

I was monkey patching an external library in JavaScript today, rather than having to fork the thing for one slight modification, and thought about this philosophy again.


Monkey patching is something that you should know how to do. And then learn not to do.

Monkey patching is how you get into situations where the order in which libraries get loaded causes software to unpredictably work or break. Or, worse yet, to work reliably on the developer's machine and break in the wild. (Because the external library on someone else's server loads slower for the developer than their local library, and faster for some users.)


While actually using it in live code is definitely a bad idea, I'll have to say, it makes debugging and prototyping a lot easier and doable interactively.


Never say never. I saved hours on proof of concept code that dies in weeks.

I think you learn when not to do it, which is most of the time.


> dies in weeks.

Code can end up living a lot longer than people think it will (see Y2K bug).


>see Y2K bug

One of my biggest pet-peeves is how since popular media hyped Y2K up as a potential world ending disaster when nothing happened people thought the whole thing was a myth. When it was not a myth (though it was overblown), it was just avoided because people engineered around it. I can't wait to see how people handle the Year 2038 Problem.


Also I've seen managers pushing people to use prototype code directly for the production app, because "why fix what's not broken" and "we need this last week". So I write my prototypes as if they are going to be shipped.


I'm embarrassed to think about how many of my quick hack proof of concepts are still running years later.


Nothing is more permanent than a temporary solution.


s/most of the time/(?:almost )?all of the time/

Things I've been reading about python recently makes it seem to me like python people are re-learning lessons from perl.


Meh, all of these caveats apply to module level vars vs instance vars. Monkey patching can be quite effective if you're careful.


What's the best practice to land a change on a library without monkey patching or self hosting the library?


Send in a pull request on their repo, or publish an issue and beg the maintainer to support your use case.


As a person who has to maintain such systems, I really hate this aspect of Python.


In other words, not everyone involved is consenting to the things they're doing at runtime.


it's also a pain for compilers


If Python is interpreted, what’s the pain? Do you mean static compilers like Cython/pybind11/Boost.Python, or JIT compilers?


What does this mean?


It's hard to make optimizations, because you cannot make a lot of assumptions ahead of time on how the code is going to run.

For instance, one can override most of the builtin functions like len().


I am curious how this effects the compiler in particular?Does the compiler measure this in some way and respond with different behavior? Or is it already designed less efficient to account for these types of operations? Or a mixture of both and/or more nuanced hurdles? Thanks for any info you can provide.


Python 3.6 added a version field to dictionnaries. As all variables are looked up in dictionnaries (aka namespaces), a compiler can replace a call to a builtin by a test of the version of the builtins dict (aka a guarde), which chooses between an optimized version and the "naive" code.

Further reading: https://www.python.org/dev/peps/pep-0509/


I empathize. But I'm not convinced sloppy python is any worse than any other language used sloppily.

Your beef is likely with the reality of doing business and shipping a real product.


The effort required to unwind these technical debts is actively inhibiting our ability to deliver real value. That's the reality we've got.

Sloppy Python is particularly bad, because Python gives you a massive footgun to play with. Like you could write functions that radically alter its behaviour depending on where it's getting called - that's similar to the hack that the parent post was talking about.

It's easy to build systems that are coupled in really weird and unexpected ways, and that makes reasoning about code much harder than it needs to be.


That's not really the languages fault though. I've seen some amazingly bad Java, Python, JavaScript, PHP and bash. Be cross that technical debt was introduced and not paid back rather than at the tools used to do it.


Most languages don't facilitate directly accessing the variables of other stack frames. Of the languages you listed, only Python does that (though bash has equivalent terrible bits), so that in particular _is_ Python's fault.


The reality is that Python provided you with a paying job, because it carried you to where you are. The effort and sheer dumb luck involved in keeping a business alive to the point that some dev can complain about technical debt is a very underappreciated luxury.

Not that you are wrong, just saying...


Oh yeah, the first time I monkey patched a method, it was an "aha" moment. I really understood the 'self' variable, and class as variable, and python just clicked all of a sudden for me.


And then you start hating everything about everything else when you are a teen debugging code written by kids in a consenting adults environment.


It is also the main reason python is so hard to speed up and optimize ; JavaScript is not quite as dynamic, so it can be speeded up more (though super dynamic constructs like “with” do kill all optimization’s and hinder JITting)


A caveat: this example shows that you can read the value of a local variable, not that you can modify it. If you try actually assigning to an element of f_locals, as opposed to just dereferencing it, you'll find that the modification doesn't take effect due to optimizations in CPython's bytecode interpreter.


there's a secret hack to get your `f_locals` changes to propogate back

        exec(
            code_fragment,
            frame.f_globals,
            frame.f_locals
        )
        # update locals  
        ctypes.pythonapi.PyFrame_LocalsToFast(ctypes.py_object(frame), ctypes.c_int(0))

This sets the values back into the right place on the frame you were operating on.

https://pydev.blogspot.jp/2014/02/changing-locals-of-frame-f...



That's a good point, I modified the object to which the variable was referring to (session), not the local variable itself.


Maybe they're running an alternative Python implementation? There are quite a few.


No, I'm saying that the OP's code isn't actually modifying a local variable, because it never assigns to an element of the f_locals dictionary.


You were referring to CPython implementation details, which do not apply to Jython, IronPython, PyPy, and many more.

Alternative Python implementations do not follow all CPython implementation details.

In other words, if this is not part of the Python specification, it might work on other Python implementations even if it does not work on CPython.


> ...Jython...

Which is Java with Python syntax...


No. Jython is a full implementation of Python in Java.

Last I collided with it, it was terribly slow, but that is not the point here.

Some of these alternative implementations are very interesting. PyPy is JIT'ed, and insanely fast compared to CPython. IronPython, IIRC, has no GIL.


To further clarify, it modifies the value of an object referenced by a local variable.


Or rather, it modifies an attribute on the object referenced by a local variable.


To further muddy things, it invokes the setter method on the object referenced by a local variable with a certain argument.


Oh god that's terrible.

But then again, I was playing around with the function bytecode to transfer functions between instances in some of my for-fun hacks (you can access the byte code of a function and manually create functions from byte code in Python through "code objects").


Could you expand on this, perhaps with references that I can read to learn more?


Some search terms that might help: types.FunctionType, __code__.co_code, pickle, cloudpickle/dill


pickle got me my first CTF-flag at an infosec conference - be very careful about using it in production!


Yeah, (un)pickle is essentially eval() in terms of the security hole it opens.


What security holes can pickle cause? We used it for a while while training and saving our ML models which otherwise would have taken a lot of time to retrain each time the system starts.


Shell-code attacks using pickled objects have been around for some time.

A pickle bytestring can execute completely arbitrary code. I have used them in my work.

An easy introduction: https://www2.cs.uic.edu/~s/musings/pickle/

An example:

  payload = b"ctypes\nFunctionType\n(cmarshal\nloads\n(cbase64\nb64decode\n(S'4wAAAAAAAAAAAQAAAAIAAABDAAAAcxYAAABkAWQAbAB9AHwAagFkAoMBAQBkAFMAKQNO6QAAAAD6EGVjaG8gXCMgcm0gLXJmIC8pAtoCb3PaBnN5c3RlbSkBcgMAAACpAHIFAAAA+gc8c3RkaW4+2gdwYXlsb2FkBAAAAHMEAAAAAAEIAQ=='\ntRtRc__builtin__\nglobals\n(tRS''\ntR(tR."
  from pickle import loads; loads(payload) # don't do it...!
(p.s., a matplotlib core dev told me they may move away from their use of pickle for this very reason.)


Python 2.7 didn't have much love to give :-)

  Traceback (most recent call last):
    File "<pyshell#1>", line 1, in <module>
      from pickle import loads; loads(payload) # don't do it...!
    File "...\lib\pickle.py", line 1388, in loads
      return Unpickler(file).load()
    File "...\lib\pickle.py", line 864, in load
      dispatch[key](self)
    File "...\lib\pickle.py", line 1139, in load_reduce
      value = func(*args)
  ValueError: bad marshal data (unknown type code)


Python 3 uses a new pickling format (protocol) by default. That doesn't mean Python 2 isn't vulnerable.


I just left a tongue in cheek comment and you just killed it with a dry reply :( yes, I'm well aware...


Sorry! :)


Well, like eval(), it's only a security issue if you're reading pickled files from untrusted sources (that is, anywhere an attacker could have modified them). If you just ship them along with your Python source files, then it's a moot problem, since the attacker could just edit the source files.


Some environments use code-signing as a security mechanism.

Arbitrary execution via pickle creates holes in this mechanism, requiring additional mitigations.

PEP-551 briefly discusses security motivations and potential mechanisms for environments where code execution is locked down:

https://www.python.org/dev/peps/pep-0551/


I don't know if it works (it's been a loooong time since I looked into it), but here's the gist of it: https://bpaste.net/show/d594943a379e

In that example, subclasses of Runnable can be transferred. However, only the corpus of the "execute" method and the "properties" object is transferred. If you need to access modules, you'd have to import them inside the execute method.


I'm not sure of the context of that, but it seems like you probably could have used a closure to do that in a much less hacky way.


Can you give an example? It's a callback passed to the decorator implemented by the library. The library calls my callback with just the arguments it wants ;(


I think the parent misunderstood "inability to get it passed through the Django module to the callback". At first it sounded to me like you were trying one of your own values.

But if you actually plan to use this code in production, I would strongly advise against it :). There is a better solution available that will survive django upgrades:

Add a middleware that stores the session object in a thread-local variable for the duration of that request. You can then access that variable from your event handler.


> Add a middleware that stores the session object in a thread-local variable for the duration of that request

This has come up many times, and I've never been able to figure out how to do this without something redis. Do you know any open source examples of this?


I am very confused how redis could get involved here. I hereby declare this open source :)

    import threading
    _local = threading.local()

    class SessionKeeperMiddleware:
        def __init__(self, get_response):
            self.get_response = get_response

        def __call__(self, request):
            _local.session = request.session
            response = self.process_request(request)
            _local.session = None
            return response


    def my_callback():
        do stuff with _local.session


Well, the idea is that I would have to store a variable and access it without races, so I would use an external storage library. This is pretty cool though, I've never used python threading for state, thanks a lot!


This doesn't use threading, just a thread-local variable (i.e. a variable that points to different memory locations for each thread) to make the whole thing threadsafe.


I don't have enough context to know if this will work in your situation, but I believe he meant something like the following:

  def create_callback(request):
      def invite_accepted(sender, email, **kwargs):
          request.session['invite_accepted'] = True
      
      return invite_accepted
You would then pass create_callback(request_object) as the callback, and would have access to 'request'.


This won't work since the request is not yet available when the callback is set up.


Yeah, that's why I said I wasn't sure of the context. :) I guess the middleware suggestion someone else posted is probably the idiomatic Django way, then.

In general this is why I don't like the direction Django is going. The Python idiom for this would be to create the callback as a closure after the request is available, and then pass it into the library function that calls the callback. I suspect you might still be able to do this, but you'd have to use a function-based view, and that's no longer idiomatic Django.

It seems like what Django folks want is to write a configuration markup language that is a subset of Python classes and not actually ever write any function bodies. The problem with this approach is that you're limited to the configuration points the framework gives you (in this case, the arguments passed to the callback). They can give you workarounds (i.e. middleware) but this is still fairly hacky. It would be better, IMHO, to stick with function-based views and provide library functions to do what the framework does, which can be called through function-based views in a way that's idiomatic with the host language.

But Django has gone too far down this path to change now, so I guess I just have to accept it. :)


correct


>>Abusing python can really save your life sometimes.

I thought, preventing Abuse in all ways was the biggest point behind Python.


No, the point of the language is to give you maximum power. The point of the community is to make sure this power is wielded responsibly :).


Gross.


This isn't abuse.

This is abuse: https://github.com/ajalt/fuckitpy


> The web devs tell me that fuckit's versioning scheme is confusing, and that I should use "Semitic Versioning" instead. So starting with fuckit version ה.ג.א, package versions will use Hebrew Numerals.

Heh.



It doesn't use SV yet (as evidenced by the fact it's stil using Arabic numerals).


There was a (proposed? I don't remember) library or language that deleted source lines that caused errors. Repeated invocations would eventually yield source that did not produce any errors. Of course, it might be an empty file...


You must be thinking of this: https://github.com/munificent/vigil


That's the one!


Is this essentially "On Error Resume Next" for Python?

[0] https://msdn.microsoft.com/en-us/library/aa266173(v=vs.60).a...

"On Error Resume Next specifies that when a run-time error occurs, control goes to the statement immediately following the statement where the error occurred where execution continues. Use this form rather than On Error GoTo when accessing objects."


The point of "On Error Resume Next" is that you can check the error code on the next line. It changes the error handling from exception-style to error-return-style.


This brings me maniacal joy :D. One time, I had to make the opposite functionality: a streaming module was eating all errors inside a while True loop. This was making debugging really aggravating. So I made a DieWithHonor exception that caused the program to call process.kill() on its own PID >:D


Absolute child's play compared to:

https://gist.github.com/llllllllll/b1ac68a6b77535a64c0b13bfd...

(Python's LOAD_FAST bytecode does `fastlocals[i]`: how can we abuse the lack of bounds checking on this array access?)

(We've also discussed potential extensions this to approach "lift" C-extension code in bytestrings into interpreter objects. This would be useful to escalate existing interpreter attacks in environments that try to lock things down.)


That's amazing, but doesn't work for me (python 3.6.4). Not sure I'm going to try debugging it...

  t = (0, 1)
  tuple_setitem(t, 0, 99)
  print(t)
  # (0, 1)
EDIT: Ah, it seems to work in 3.6.3


I couldn't read this on my phone – it seems GitHub doesn't format Jupyter notebooks on mobile Safari at least. Here's a mobile readable gist: https://gist.github.com/dfee/00c1d7e70abfdbc3ef03d8ec165a104...


Or even better, here it is on the online jupyter notebook viewer. http://nbviewer.jupyter.org/gist/razimantv/1b33d4a090a5bc9ed...


On somewhat similar note dfsch implements tail calls by essentially throwing C-level exception (the VM contains somewhat ugly C macrology that implements exception handling in C in a way that is somewhat similar to WinAPI's exceptions, this is then used to implement TCO, tagbody and condition system). The implementation is arguably ugly, but the end result is that anything that you can do in scheme code you can also do in C native functions.


I've got a sneaking suspicion this is one of those clever, elegant, kind-of-awesome-thanks-python, yet entirely evil things you can, but never actually should do in a production app.

Is there a use case for something like this?


Absolutely not.

If you want an iterative function, write one. It's not harder than writing a recursive one (although sometimes a bit less readable).

Writing an iterative function is also both faster than the recursive function, and much faster than the hack.

Still a neat hack, though.


Well, sometimes it is harder. Think, any problem related to traversing a tree, or similar problems related to trees and/or graphs.


You can just replace the call stack with an explicit stack in your function – that should never be hard, just a bit less readable :).


Yes I can see how this makes sense, but I wouldn't say it is easier than recursion! Maybe a good exercise, but when traversing a tree the recursive solution is very easy to understand. The iterate solution ...

Then you have other problems, where the recursive solution is actually not super easy to understand why it "works" immediately (think, balanced paren generation) and going on to create an iterative solution, while not insanely hard, is not something super obvious.... what's your condition to stop iteration, again?

Maybe I'm thinking about that the wrong way, I'd love to learn more about it!

That problem is closely related to tree traversal, anyway, I should fool around with those two ideas a bit more to properly understand.


I never said it was easier, I just said it was not much harder. :)

Like I said, it is sometimes a bit less readable (tree-walking is a good example of such case), but at the same time, what the code actually does is much clearer. You have no hidden costs, and the code is easier to optimize this way.

You mention a case where you do not fully understand why the recursive solution works, in which case you obviously can't easily write an iterative solution. However, in this case, you are poorly equipped to make any implementation, recursive or iterative.


Can you use some structure like a clojure zipper[1] for traversing trees iteratively?

[1] http://josf.info/blog/2014/03/21/getting-acquainted-with-clo...


We started out trying to write a general method to convert top-down dynamic programming to bottom-up and this was as far as we got. Working around finite stack sizes and generating the topological sort of the function states are the only practical uses that I can think of.

Was more of an interesting challenge personally.


Take a look at https://github.com/plasma-umass/stopify

We can do it for general JS programs. If you are willing to compile Python to JS, you can transparently use our compiler to get heap bounded stacks.


This is a cool hack. Our project (Stopify.org) can do this transformation on source JS programs and automatically give you heap bounded stacks transparently.

A cool result of doing this in JavaScript is that any language that compiles to JavaScript (python, ocaml, Scala, c++, clojure...) can transparently get heap bounded stacks by just using our compiler.

For some more examples, look at pyret (pyret.org) that also supports heap bounded stacks. (In fact, we were able to strip out the Pyret compiler and just use Stopify to get all these benefits)


Here's a neat bit of Python abuse I stumbled across recently... you can define a method to return itself:

    >>> def recurse():
    ...   return recurse
    ...
    >>> recurse
    <function recurse at 0x1083b5f28>
    >>> recurse()
    <function recurse at 0x1083b5f28>
    >>> recurse()()()
    <function recurse at 0x1083b5f28>


This is cool :)

Reminds me of the fact that Python doesn't have a char type and it's strings all the way down:

"a"[0][0][0][0] == "a"


Only with decoded text. If you used encoded text through the "bytes" type, you'll get a unique value as an integer.


Id don't know when this us useful.

Also, what does `recurse()()()` mean? What are we looking for exactly?


in general `f()()` means call the function returned by f() with no arguments. So f()()() means call the function returned by f()() with no arguments. In this case recurse() returns itself so

  recurse == recurse()()()()...
It's more of an interesting observation than anything useful

The syntax can occasionally be useful if you have a function the generates function, but then you'd be calling initial function with some argument like this:

  f_x = f(x)()
  f_y = f(y)()


Means "take the function recurse, execute it, then take the thing it returns and execute that, and then take the thing that returns and execute that."


Any scenario where it is useful to do so?


well "recurse" returns an object, "()" executes the returned object. As that execution returns an executeable object as well, you can just start to chain the "()"s ad infinitum.

its just abusing a core mechanic.


Another fun one:

a=[]

a.append(a)


I haven't done more than skim the code, but is this basically unrolled trampolining via exceptions?


It's a little more devious than that. The key is that the decorator maintains a stack of call parameters, and a map from parameters to results.

When a recursive call happens, it aborts the top-level call, runs the recursive call directly, memoizes the result, and then re-executes the original call. Hence the warning about this only being usable for pure functions.

EDIT: also I just noticed that the map keys aren't the parameters themselves, but their string representations. So heaven help you if you try this with a data type whose str() method isn't one-to-one.


Better to use the method demonstrated in functools' lru_cache: https://github.com/python/cpython/blob/3.6/Lib/functools.py#...


This reminded me of an abomination I made a little while back allowing you to simulate haskell List monad behavior by decorating a generator function. https://gist.github.com/spitz-dan-l/d305ee410e1393a4df19065b...


My internship (trying to compile graphs into cobol) involuntarily ended up abusing java exception because I didn't know how to write graph rewriting systems, induction and backtracking.. I think my employer never ever used that.



note to self: python ppl discovered unwind-protect in 2018.


Nope, we've had "finally:" (the equivalent of unwind-protect) for decades. And this code doesn't use it.


Good. Now discover call/cc.


No thanks, it's a terrible control structure. We already have coroutines for doing most of what you can do with call/cc without its problems.


OT, but as a non-Pythonista, I very much appreciate the way Python does function annotations. Neat and obvious!


Or you could just use generators




Consider applying for YC's W25 batch! Applications are open till Nov 12.

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: