Python Multiprocessing: How to add or change number of processes in a pool



I have created a pool from the Python multiprocessing module and would like to change the number of processes that the pool has running, or add to them. Is this possible? I have tried something like this (a simplified version of my code):

from multiprocessing import Pool

class foo:
    def __init__(self):
        self.pool = Pool()

    def bar(self, x):
        self.pool.processes = x
        return self.pool.map(somefunction, list_of_args)

It seems to work and achieves the result I wanted in the end (which was to split the work between multiple processes), but I am not sure that this is the best way to do it, or why it works.

2012-04-04 17:22
by sdiemert



I don't think this actually works:

import multiprocessing, time

def fn(x):
    print "running for", x
    time.sleep(5)

if __name__ == "__main__":
    pool = multiprocessing.Pool()
    pool.processes = 2

    # runs with number of cores available (8 on my machine)
    pool.map(fn, range(10))

    # still runs with number of cores available, not 10
    pool.processes = 10
    pool.map(fn, range(10))

multiprocessing.Pool stores the number of processes in a private variable (i.e. Pool._processes), which is set at the point when the Pool is instantiated. See the source code.

The reason this appears to be working is that the number of processes is automatically set to the number of cores on your machine unless you specify a different number.
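For instance, a minimal check along these lines shows that the size is fixed at construction time (a sketch; the exact numbers depend on your machine):

import multiprocessing

pool = multiprocessing.Pool()                    # no argument: one worker per core
print(pool._processes)                           # e.g. 8 on an 8-core machine
print(multiprocessing.cpu_count())               # the same value

small_pool = multiprocessing.Pool(processes=2)
print(small_pool._processes)                     # 2 -- fixed when the pool was created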

I'm not sure why you'd want to change the number of processes available -- maybe you can explain that in more detail. It's easy enough to create a new pool whenever you want (presumably after other pools have finished running).
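If a later batch of work really does need a different level of parallelism, a sketch of the "new pool per call" approach might look like this (the question's somefunction and list_of_args are passed in as arguments here, since they aren't defined in the snippet above):

from multiprocessing import Pool

class foo:
    def bar(self, x, somefunction, list_of_args):
        # build a pool sized for this call instead of resizing an existing one
        pool = Pool(processes=x)
        try:
            return pool.map(somefunction, list_of_args)
        finally:
            pool.close()   # no more tasks will be submitted
            pool.join()    # wait for the worker processes to exit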

2012-04-04 18:06
by Noah
I am doing some Natural Language Generation; the particular application requires that I do a lot of filtering of the words I choose, which is extremely slow if run on only one process. I was hoping to run a process for each paragraph I wanted to generate per page (I generate pages one at a time), so 10 pages with 4-6 paragraphs each would require 4-6 processes per page. I guess I was hoping that Pool would do 'garbage collection' for me on finished processes and I could create a new one for every paragraph I created. Though I think I may have missed the point of multiprocessing - sdiemert 2012-04-04 23:44
You can make a single pool and submit as many jobs as you want. If there are more jobs than processes, it will only run as many processes simultaneously as there are cores available on your machine. All the jobs will get finished, and you'll get an approximate speed-up of n-fold, where n is the number of cores on your machine. You're unlikely to get a speed-up greater than n, although I suppose it's possible, depending on what the rate-limiting part of your process is - Noah 2012-04-05 14:44
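Concretely, a single pool can work through all of the paragraphs at once; this sketch uses a made-up filter_paragraph function in place of the asker's filtering step:

import multiprocessing

def filter_paragraph(paragraph):
    # stand-in for the slow per-paragraph filtering work
    return paragraph

if __name__ == "__main__":
    paragraphs = ["paragraph %d" % i for i in range(60)]   # e.g. 10 pages x 6 paragraphs
    pool = multiprocessing.Pool()                          # one worker per core
    results = pool.map(filter_paragraph, paragraphs)       # 60 jobs, ~n running at a time
    pool.close()
    pool.join()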



You can, by using the private variable _processes and the private method _repopulate_pool. But I wouldn't recommend relying on private variables like this.

pool = multiprocessing.Pool(processes=1, initializer=start_process)
# Starting ForkPoolWorker-35

pool._processes = 3
pool._repopulate_pool()
# Starting ForkPoolWorker-36
# Starting ForkPoolWorker-37
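
For context, a self-contained version of that snippet might look like the following; the start_process initializer isn't shown above, so here it is assumed to simply print the name of the worker that just started (worker numbers will differ from the ones above):

import multiprocessing
import time

def start_process():
    # assumed initializer: report which worker process just started
    print("Starting", multiprocessing.current_process().name)

if __name__ == "__main__":
    pool = multiprocessing.Pool(processes=1, initializer=start_process)
    time.sleep(1)              # let the first worker start: "Starting ForkPoolWorker-1"

    pool._processes = 3        # private attribute, normally set only in __init__
    pool._repopulate_pool()    # private method; starts workers until _processes are alive
    time.sleep(1)              # "Starting ForkPoolWorker-2", "Starting ForkPoolWorker-3"

    pool.terminate()
    pool.join()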
2018-06-07 20:15
by Christian Will