Is anyone using zeromq to coordinate multiple Python interpreters in the same process?



I love Python's global interpreter lock because it makes the underlying C code simple. But it means that each Python interpreter's main loop is restricted to running one thread at a time. This is bad, because the number of cores per processor chip keeps doubling.

One of the supposed advantages to zeromq is that it makes multi-threaded programming "easy" or easier.

Is it possible to launch multiple Python interpreters in the same process and have them communicate only using in-process zeromq with no other shared state? Has anyone tried it? Does it work well? Please comment and/or provide links.
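For concreteness, here is a minimal sketch (assuming pyzmq is installed) of the communication style I mean: two threads of one interpreter talking over zmq's in-process "inproc" transport. With multiple interpreters the shape would be the same, except the threads would no longer share a GIL.

```python
# Sketch, not a multi-interpreter demo: today both threads share one GIL.
import threading
import zmq

ctx = zmq.Context.instance()   # inproc endpoints must share one Context

def worker():
    sock = ctx.socket(zmq.PAIR)
    sock.connect("inproc://channel")
    msg = sock.recv()
    sock.send(b"pong: " + msg)
    sock.close()

main_sock = ctx.socket(zmq.PAIR)
main_sock.bind("inproc://channel")   # bind before the connecting thread starts

t = threading.Thread(target=worker)
t.start()
main_sock.send(b"ping")
reply = main_sock.recv()
print(reply)                         # b'pong: ping'
t.join()
main_sock.close()
ctx.term()
```

The key constraint is that inproc endpoints must share a single `zmq.Context`, and the bind side should be up before the connect side.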

2012-04-04 18:38
by Aaron Watters
Why do you want threads if you don't want shared state? This doesn't make any sense. Use multiple processes instead, e.g. multiprocessing or IPython's parallel computing features (which use zmq, btw) - Sven Marnach 2012-04-04 18:46
@Sven Marnach - I think "Why do you want threads if you don't want shared state? This doesn't make any sense" is equivalent to saying it makes no sense to have more than one Erlang 'process' inside each Erlang VM. A thread is typically cheaper than an operating-system process, so there is some benefit. 'Sharing' state by communication is an Erlang and Go approach, and can be very efficient when the communication stays inside a single OS process. Programs might still be understandable, and external connections might be handled by pools of relatively simple 'threads' with good throughput - gbulmer 2012-04-04 18:56
@SvenMarnach: I'm guessing that having multiple threads in the same process and using in-memory messaging would be faster than using multiple processes and inter-process messaging -- am I wrong? - Aaron Watters 2012-04-04 19:00
@AaronWatters: This might depend on the platform. The overhead of creating a process is higher than the overhead of creating a thread. On some platforms, most notably Windows, the overhead of context switches between threads is cheaper than between processes. If you are using inter-process communication techniques, the cost of the communication should not depend on whether you are communicating between processes or threads - Sven Marnach 2012-04-04 19:14
And finally, there are even common NUMA hardware architectures (notably AMD Opteron) where using multiple processes will be a lot faster than multiple threads, since each process can be placed in the memory that works fastest with the core the process is assigned to - Sven Marnach 2012-04-04 19:15
@Sven Marnach - "If you are using inter-process communication techniques, the cost of the communication should not depend on whether you are communicating between processes or threads." I think the point is not to use inter-process communication, but intra-process communication, which should be much cheaper if optimised. Further, like the Erlang mechanisms, it might offer the flexibility to choose local or remote processes, and abstract the developer away from most of the details of the mechanisms - gbulmer 2012-04-04 19:22
@Sven Marnach - "And finally, there are even common NUMA hardware architectures (notably AMD Opteron) where using multiple processes will be a lot faster than multiple threads" AFAIK the same mechanism for managing process affinity can be used for threads. There is no benefit to processes. Further, there may be less context-switch overhead for threads vs processes, because virtual memory translation buffers do not need to be changed/flushed for threads within the same process - gbulmer 2012-04-04 19:26
@gbulmer: My comments are very much in the specific context of this question, and you are answering on a much broader scope. Erlang processes and Goroutines are in no way comparable to having multiple Python interpreters in a single process (which I still think isn't particularly useful). The interpreters will have their own versions of all loaded modules, so they are not lightweight - Sven Marnach 2012-04-04 21:09
@gbulmer: Ad "AFAIK the same mechanism for managing process affinity can be used for threads. There is no benefit to processes." – this is wrong. All threads share the same memory (except for the stack), so it cannot be mapped in an optimal way for all cores. For some applications, you will get real speed-ups by using processes, even if this introduces additional communication. (And yes, I did this before and measured.) I never claimed there are no advantages to using threads. I only claim it is pointless in this very specific situation - Sven Marnach 2012-04-04 21:11
@Sven Marnach - I am assuming there will be re-engineering to enable multiple Python interpreters to work correctly in the same process. So I can not agree with "Erlang processes and Goroutines are in no way comparable to having multiples Python interpreters in a single process" - gbulmer 2012-04-06 21:54
@Sven Marnach - "'AFAIK .. process affinity ... threads. There is no benefit to processes.' – this is wrong". Agreed, that was unclear. I am happy to relax this to: 'There is more cost to communicating across process boundaries, when compared with communicating between threads in a single process, and affinity can keep threads on the same core'. "All threads share the same memory (except for the stack), so it cannot be mapped in an optimal way for all cores." Yes, I agree there is a balance between threads and processes, i.e. more than one process/core - gbulmer 2012-04-06 22:08
@gbulmer: Ad "I am assuming there will be re-engineering to enable multiple Python interpreters to work correctly in the same process." Multiple Python interpreters in a single process are already supported. And they are in no way comparable to Erlang processes or Goroutines - Sven Marnach 2012-04-06 23:44
@Sven Marnach - I'm British, so tend to understate. I wrote "I am assuming there will be re-engineering ... to work correctly in the same process". Maybe I should have written: IMHO there will need to be re-engineering so multiple Python interpreters in one process are light enough to yield significant benefit to concurrent intra-process communication. I'm not assuming a Python interpreter instance would be comparable to an Erlang process (> million processes/GByte RAM), but instead assuming instances are closer to the scale of threads on a core. But 'threads' on it might be light - gbulmer 2012-04-07 00:40
@gbulmer: Hey, this is not fair! You are redefining words in the middle of a discussion! I can see what concept you are talking about, but you cannot call it "multiple Python interpreters in a single process", because the name is already taken for something else that semantically is not lightweight, and no re-engineering could turn into that new concept. Again, I'm looking at this very much in the context of current CPython, since this is what the OP was asking about - Sven Marnach 2012-04-07 11:02
@Sven Marnach - "You are redefining words in the middle of a discussion!" Oh, sorry. I read "Is it possible to launch multiple Python interpreters in the same process ..." as giving me freedom to do anything to the Python interpreter to make that work well. No wonder we disagree:-) I didn't assume the OP had a specific implementation in mind. If you constrain it to a solution that exists and you know works, that's good. My model assumes significant effort, then simplifying towards an existing implementation. Maybe I'll ask "why are Python interpreter instances so heavy"?-). Apologies - gbulmer 2012-04-07 12:30
@gbulmer: I can certainly see this as a valid way of reading the question. Well, it turns out we don't disagree, we are just talking about different things. Python interpreter instances are heavy because they are meant to be as completely separated from each other as possible. They have their own instances of all loaded modules, including private instances of all code objects - Sven Marnach 2012-04-07 13:23



I don't know of any way to create multiple instances of the Python interpreter within a single process, but I do have experience with splitting multiple instances across multiple processes and communicating with zmq.

I've been using multiprocessing to implement an island-model architecture for global optimization, with zmq for managing communication between the islands. Each island is its own process with its own Python interpreter, created and managed by the master archipelago process.

Using multiprocessing allows you to launch as many independent Python interpreters as you wish, but they all reside in their own processes with a separate memory space. I believe the OS scheduler takes care of assigning processes to cores and sharing CPU time. The separate memory space is the hardest part, because it means you have to explicitly communicate. To communicate between processes, the objects/data you wish to send must be serializable, because zmq sends byte-strings.

The nice thing about zmq is that it's a piece of cake to scale across systems distributed over a network, and it's pretty lightweight. You can create just about any communication pattern you wish, using REP/REQ, PUB/SUB, or whatever.

But no, it's not as easy as just spinning up a few threads from the threading module.

Edit: Also, here's a Stack Overflow question similar to yours: Multiple independent embedded Python Interpreters on multiple operating system threads invoked from C/C++ program. It contains some more relevant links which indicate that it may be possible to run multiple Python interpreters within a single process, but it doesn't look simple.

2012-04-04 18:55
by Brendan Wood
Even when using multiprocessing, you can share memory between the processes, for example using numpy-sharedmem. It's not strictly necessary to communicate explicitly - Sven Marnach 2012-04-07 10:51
@Sven Marnach - in comments on the question, you say that multiple Python interpreters can be launched in one process. This answer says they do not know how. What have I missed? Could you please point me (us?) at reference for doing what you describe - gbulmer 2012-04-07 12:36
@gbulmer: See the official documentation. This feature is very rarely used, since it is almost always better to use multiple processes – see also Rakis' comment in the thread linked above - Sven Marnach 2012-04-07 13:12
@Sven Marnach - Brilliant! Thanks for the links - gbulmer 2012-04-07 16:45