Hacker News

Unfortunately, no, this doesn't solve the problem of RAM usage by worker Python processes: they are still separate OS processes, although they look like "normal"[1] Erlang/Elixir processes from that side.

By "replacing" I mean pretty literally getting the same effect as with Celery. [EDIT: BTW, in the recent post about Celery it was praised because it lets you compose background/async tasks. Of course, Erlang has something like that, too: https://chrisb.host.cs.st-andrews.ac.uk/skel-test-master/tut...] There really is no other reliable way around the GIL in Python other than multiprocessing.

The main idea here is that we should keep IO-bound code on the Elixir side, as it's simply more efficient and easier to write there, but delegate CPU-bound code to a pool of external processes. This is only one of many possible patterns of integration, though: I can imagine a Django project where most of the logic is on the Python side and Elixir only handles WebSockets/long-polling connections. Thanks to ErlPort, passing data and calling functions between Elixir and Python is effortless: Python can call an Elixir function just as easily as Elixir can call a Python function in a different process.

Moreover, ErlPort supports Ruby, so you can have workers written in it, too. Workers in other languages are no problem either, as long as you write the glue code yourself (or generate it with Elixir macros).

[1] It bears repeating: Erlang processes are not OS-level processes. The Erlang virtual machine, BEAM, runs in a single OS-level process. Erlang processes are closer to green threads, or to tasklets as known from Stackless Python. They are extremely lightweight, implicitly scheduled user-space tasks which share no memory. Erlang schedules its processes on a pool of OS-level threads for optimal utilization of CPU cores, but this is an implementation detail. What's important is that Erlang processes provide isolation in terms of memory and error handling, just like OS-level processes. Conceptually the two kinds of processes are very similar, but their implementations are nothing alike.




If you're having RAM problems, you don't want to start each worker independently as a "port". Instead, write a Python server that listens on a socket, start a single instance of it, and have it fork (since you fully control the system, the simplest approach is to fork once per connection, rather than worrying about "preforking" or other complicated schemes meant for when you don't control the concurrency), then write the glue code to talk across that socket. [1]

That way, your Python processes load all their modules and do all their initialization once, before any fork; copy-on-write semantics then take care of the rest, and you end up with more or less one copy of all that in RAM.
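A minimal sketch of that fork-per-connection server, assuming a Unix socket and a placeholder worker function (the names `SOCK_PATH` and `handle_request` are illustrative, not from any library):

```python
import os
import socket

SOCK_PATH = "/tmp/py_worker.sock"  # hypothetical socket path

def handle_request(data: bytes) -> bytes:
    # Placeholder CPU-bound work; replace with real processing.
    return data.upper()

def serve() -> None:
    # All heavy imports and initialization happen here, before any
    # fork(), so child processes share those pages copy-on-write.
    if os.path.exists(SOCK_PATH):
        os.unlink(SOCK_PATH)
    server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    server.bind(SOCK_PATH)
    server.listen(16)
    while True:
        conn, _ = server.accept()
        if os.fork() == 0:
            # Child: handle exactly one connection, then exit.
            server.close()
            with conn:
                data = conn.recv(65536)
                conn.sendall(handle_request(data))
            os._exit(0)
        conn.close()  # Parent: drop its copy and keep accepting.

if __name__ == "__main__":
    serve()
```

A real version would also reap exited children (e.g. via `signal.SIGCHLD`) and loop on `recv` until a full message arrives, but this is the shape of it.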

[1]: I don't know if there's a "perfect" off-the-shelf solution for you, but there's enough existing code in the world with all the pieces that it's pretty easy to do that nowadays. For instance, one very easy protocol that leaps to mind is:

    4 bytes to say how much JSON there is
    a JSON object containing the metadata for your connection
    4 or 8 bytes to say how long the incoming webpage is
    the contents of the webpage you're telling Python to process
It's not perfectly efficient, but it has great bang-for-the-buck, and will carry you a long way before you need anything fancier.
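The framing above can be sketched in a few lines with `struct` and `json` (this picks the 8-byte option for the body length; big-endian lengths are an assumption, not part of the comment's spec):

```python
import json
import struct

def pack_message(metadata: dict, body: bytes) -> bytes:
    # 4-byte big-endian JSON length, the JSON metadata itself,
    # then an 8-byte body length followed by the body.
    meta = json.dumps(metadata).encode("utf-8")
    return (struct.pack(">I", len(meta)) + meta
            + struct.pack(">Q", len(body)) + body)

def unpack_message(buf: bytes) -> tuple[dict, bytes]:
    # Mirror of pack_message: peel off each length prefix in turn.
    (meta_len,) = struct.unpack_from(">I", buf, 0)
    meta = json.loads(buf[4:4 + meta_len])
    (body_len,) = struct.unpack_from(">Q", buf, 4 + meta_len)
    start = 4 + meta_len + 8
    return meta, buf[start:start + body_len]
```

On the Elixir side the same framing falls out of binary pattern matching, which is part of why this kind of ad-hoc protocol is so cheap to build.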

You might also be able to bash ErlPort into compliance and get the client and server sides going, but this isn't very difficult code to write, and it doesn't take much "screwing around with opaque library code that doesn't really want to be used separately" before you could have just written it yourself.





