
Multiprocessing is a terrible solution to the GIL. You gain parallelism, but you are then required to serialize/deserialize and duplicate every shared object.

In Scala/Java, I might build a single immutable object (taking up e.g. 1kb) and transmit it to 10 actors. They use it as needed and let the GC deal with it when finished. In Python, I need to serialize it, transmit it 10 times and use 10kb of memory to store the copies.
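For concreteness, here's a minimal Python sketch (names and sizes are made up) of the copy-per-task behaviour described above: multiprocessing pickles the object for every task, so each worker ends up holding its own deserialized copy rather than a reference to the original.

    # Sketch: passing one ~1 KB object to 10 multiprocessing tasks pickles it
    # per task, so each worker process holds its own copy.
    from multiprocessing import Pool

    shared_config = {"payload": "x" * 1024}  # roughly 1 KB of data

    def use_config(config):
        # 'config' is a deserialized copy living in the worker process,
        # not a reference to the parent's object.
        return len(config["payload"])

    if __name__ == "__main__":
        with Pool(processes=10) as pool:
            # The dict is serialized and shipped with every task; with 10
            # tasks you end up with 10 independent copies in memory.
            results = pool.map(use_config, [shared_config] * 10)
        print(results)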

The GIL is a flaw in the language. We should accept that. There are workarounds and hacks, but the GIL is still a flaw.

(Incidentally, my background is in Python. My usage of Scala/Java is far more recent.)




Technically, the GIL is a "flaw" in the implementation. The language does not specify a GIL; it's just an implementation detail of CPython.

I don't necessarily agree that it's a flaw, but that's another discussion entirely.


And in CPython it's less that the GIL is the flaw and more the refcounted GC. Back in '96 there was a patch to remove the GIL; run-of-the-mill single-threaded Python code ran 2-6x slower (largely depending on the threading implementation used) because of all the locking overhead around the constant refcount updates. When you have a language that is already perceived as slow, making the vast majority of typical scripts of the time that much slower so that multithreaded Python could be faster was going to be a very difficult thing to sell.


In this day and age, the inability to compute two things at once is a pretty major flaw, if you ask me.


Using Python for high performance computation is also a flaw, if you ask me.


> Multiprocessing is a terrible solution to the GIL. You gain parallelism, but you are then required to serialize/deserialize and duplicate every shared object.

"You" are not required to do the serialization. Multiprocessing does it automatically behind the scenes. The only time it becomes relevant is when something can't be serialised.

It is correct, though, that serialization consumes CPU and time while it is happening - something that doesn't happen when all actors are local to the process. However, the moment you do serialization you can also do it across machines, or across nodes within a machine, which gives far greater scope for parallelism, assuming the ratio of processing work to the size of the serialized data is large.
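A small sketch of that "behind the scenes" behaviour (assumed example, not from the comment): the pickling is implicit, and you only notice it when an object can't be pickled.

    # Sketch: multiprocessing pickles arguments and results for you; you only
    # notice when an object isn't picklable (e.g. an open file handle or a lambda).
    import multiprocessing as mp

    def work(item):
        return item * item

    if __name__ == "__main__":
        with mp.Pool(4) as pool:
            print(pool.map(work, range(8)))  # pickling happens implicitly

        # Something unpicklable surfaces the serialization you never wrote:
        # pool.map(work, [open("somefile")])  # -> TypeError: cannot pickle ...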


> However, the moment you do serialization you can also do it across machines, or across nodes within a machine, which gives far greater scope for parallelism, assuming the ratio of processing work to the size of the serialized data is large.

You can also do that with Akka, for example.

It's true that you can't avoid serialization when you need to work across multiple boxes. That doesn't mean serialization and IPC should be forced upon you the minute you want to parallelize. There are a LOT of jobs that can be handled by 2-8 cores, provided your language/libraries give support for it.
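To make that contrast concrete, here is an illustrative sketch (names are made up, not from the thread) of the shared-memory model being asked for: Python threads can already share one object with zero copies and no IPC, but for pure-Python CPU-bound work CPython's GIL keeps them from actually running in parallel.

    # Sketch: threads share one object by reference (no serialization, no
    # copies), but under CPython's GIL these CPU-bound tasks largely take
    # turns instead of using 8 cores at once.
    from concurrent.futures import ThreadPoolExecutor

    shared = {"payload": "x" * 1024}   # one object, shared by reference

    def cpu_bound(_):
        total = 0
        for i in range(10**6):
            total += i
        return total + len(shared["payload"])  # reads the shared object directly

    with ThreadPoolExecutor(max_workers=8) as pool:
        results = list(pool.map(cpu_bound, range(8)))
    print(results)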



