I'm... confused. Being able to intercept and modify syscalls is a neat trick, but why is it applicable here?
In python you generally have two kinds of randomness: cryptographically-secure randomness, and pseudorandomness. The general recommendation is: if you need a CSRNG, use ``os.urandom`` -- or, more recently, the stdlib ``secrets`` module. But if it doesn't need to be cryptographically secure, you should use the stdlib ``random`` module.
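A minimal sketch of that split, stdlib only (nothing here is specific to the article):

```python
import os
import random
import secrets

# Cryptographically secure: backed by the OS entropy source, not seedable.
token = secrets.token_hex(16)   # e.g. session tokens, API keys
raw = os.urandom(32)            # raw bytes straight from the OS

# Not security-sensitive: the stdlib Mersenne Twister, fully seedable.
random.seed(42)
roll = random.randint(1, 6)     # reproducible given the same seed
```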
The thing is, the ``random`` module gives you the ability to seed and re-seed the underlying PRNG state machine. You can even create your own instances of the PRNG state machine, if you want to isolate yourself from other libraries, and then you can seed or reseed that state machine at will without affecting anything else. So for pseudorandom "randomness", the stdlib already exposes a purpose-built function that does exactly what the OP needs. Also, within individual tests, it's perfectly possible to monkeypatch the root PRNG in the random module with your own temporary copy, modify the seed, etc, so you can even make this work on a per-test basis, using completely bog-standard python, no special sauce required. Well-written libraries even expose this as a primitive for dependency injection, so that you can have direct control over the PRNG.
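A sketch of both patterns, with ``roll_dice`` standing in for whatever code is under test (hypothetical name, and assuming pytest's built-in ``monkeypatch`` fixture):

```python
import random

import pytest

# An isolated generator: seeding or reseeding it never touches the
# module-level state that other libraries may be drawing from.
rng = random.Random(123)

def roll_dice():
    # Hypothetical code under test that draws from the module-level PRNG.
    return random.randint(1, 6)

def test_roll_dice_is_deterministic(monkeypatch):
    # Swap the module-level function for one bound to a freshly seeded
    # instance, scoped to this test only; pytest undoes the patch afterwards.
    monkeypatch.setattr(random, "randint", random.Random(42).randint)
    assert roll_dice() == random.Random(42).randint(1, 6)
```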
Meanwhile, for applications that require CSRNG... you really shouldn't be writing code that is testing for a deterministic result. At least in my experience, assuming you aren't testing the implementation of cryptographic primitives, there are always better strategies -- things like round-trip tests, for example.
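For example, a round-trip property test never cares what bytes the CSRNG produced, only that the operation inverts cleanly. The ``xor_cipher`` below is just a toy stand-in to make the sketch runnable, not a real primitive:

```python
import secrets

def xor_cipher(data: bytes, key: bytes) -> bytes:
    # Toy stand-in for whatever real primitive is under test; XOR with a
    # repeating key is NOT encryption, it just makes the example runnable.
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

def test_round_trip():
    # No fixed seed, no pinned ciphertext: the property holds for any key,
    # so fresh OS randomness on every run is fine.
    key = secrets.token_bytes(32)
    msg = b"hello world"
    assert xor_cipher(xor_cipher(msg, key), key) == msg
```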
So... are the 3rd-party deps just "misbehaving" and calling ``os.urandom`` for no reason? Does the author not know about ``random.seed``? Does the author want to avoid monkeypatches in tests (which are completely standard practice in python)? Is there something else going on entirely? Intercepting syscalls to get deterministic randomness in python really feels like bringing an atom bomb to a game of finger guns.
The article makes it fairly clear that this is mainly a kind of nerd-sniping: there are better solutions for practical purposes, but the author wanted to explore a different approach and learn a bit about syscall interception along the way.
If you're developing a game, there's a fairly big issue in that many things may be requesting values from, and thus incrementing, the PRNG, and many of them can be indirectly controlled by the player (where they are, where they're looking, etc.). https://www.youtube.com/watch?v=1hs451PfFzQ is a fun video about reverse-engineering Zelda to predict the randomness in a minigame.
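A tiny illustration of that coupling, assuming everything in the game shares one seeded generator:

```python
import random

def frames(extra_draws: int) -> list[int]:
    # Same seed every run, but the number of intermediate draws depends on
    # something the player controls (movement, camera, ambient effects...).
    rng = random.Random(2024)
    for _ in range(extra_draws):
        rng.random()  # other systems consuming the shared stream
    return [rng.randint(1, 100) for _ in range(3)]

# The minigame's "random" outcome shifts as soon as anything else draws first.
print(frames(0))   # one sequence
print(frames(5))   # a different sequence, despite the identical seed
```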
As far as the approach goes, I agree: I don't understand why 'no code changes' is so important, especially in the context of Python, which has a generally permissive attitude toward monkeypatching. Maybe one of the randomness sources was hashing all the source files? :P
Python has perhaps the least tolerant culture toward monkeypatching of any language capable of it. Outside a couple of well-known cases (gevent, I think?) it's widely frowned upon.
Monkey-patching is usually the wrong solution because Python is extremely extensible. These days I can't think of many reasons to reach for it.
Back in the day, we sometimes had to monkey-patch interface layers like database drivers and other code that was open to modification but closed to extension, usually to disable some legacy or proprietary feature that broke everything else. Think "you have to use a database from 1993" and it has an `assert check_winxp_version()` or something equally dumb in a top-level `__init__.py`.
These days, there are mature or python-native solutions to all of those that I recall.
-
However! This article is more like using the debugger and ptrace as a Game Genie or save editor than about the utility of `prng = random.Random(123)`. The actual point of the article wasn’t much about python ;)
To be fair to the OP, the implicitness of Python in general, and of its random seeding in particular, is confusing, especially when 3rd-party modules are involved.
In C++, if you use std::mt19937, everything from seeding to the explicit generator object is crystal clear while remaining terse.