
Workaround for multiprocessing failing on an M1 Mac

bilzard

Problem

Running multiprocessing.Pool fails with an error of the form AttributeError: Can't get attribute 'xxx'.

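from multiprocessing import Pool, cpu_count

import numpy as np
import pandas as pd
from tqdm import tqdm

# `df` and `format_cell` are assumed to be defined earlier in the session.

def parallelize_dataframe(df, func, num_cores=2):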
    """ Utility function that distributes the application 
    of function func on dataframe df by using Pool()
    """
    dfs = np.array_split(df, num_cores)
    with Pool(num_cores) as pl:
        df = pd.concat(tqdm(pl.imap(func, dfs), total=len(dfs)))
        pl.close()
        pl.join()
    return df


def transform(df):
    df["source"] = df["source"].apply(format_cell)
    return df


num_cores = cpu_count()
print(f"nb of cores used {num_cores}")
df_formatted = parallelize_dataframe(df, transform, num_cores=num_cores)

Process SpawnPoolWorker-43:
Traceback (most recent call last):
  File "/opt/anaconda3/envs/pytorch_39/lib/python3.9/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/opt/anaconda3/envs/pytorch_39/lib/python3.9/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/opt/anaconda3/envs/pytorch_39/lib/python3.9/multiprocessing/pool.py", line 114, in worker
    task = get()
  File "/opt/anaconda3/envs/pytorch_39/lib/python3.9/multiprocessing/queues.py", line 368, in get
    return _ForkingPickler.loads(res)
AttributeError: Can't get attribute 'transform' on <module '__main__' (built-in)>

Workaround

source: https://stackoverflow.com/questions/67999589/multiprocessing-with-pool-throws-error-on-m1-macbook

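The default start method on macOS used to be fork, but since Python 3.8 it is spawn, which is what get_start_method() returns here. With spawn, each worker starts a fresh interpreter and has to look up transform in __main__ by name; when the function was defined in an interactive session, that lookup fails with the AttributeError above. Explicitly requesting a fork context makes the error go away, because the forked workers inherit the parent's memory, including transform. Note that the Python documentation considers fork unsafe on macOS, so treat this as a workaround rather than a general fix.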

import multiprocessing
multiprocessing.get_start_method()
# 'spawn'

import numpy as np
import pandas as pd
from multiprocessing import get_context

def parallelize_dataframe(df, func, num_cores=2):
    """ Utility function that distributes the application 
    of function func on dataframe df by using Pool()
    """
    dfs = np.array_split(df, num_cores)
    with get_context("fork").Pool(num_cores) as pl:
        df = pd.concat(pl.map(func, dfs))
        pl.close()
        pl.join()
    return df
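
The same idea can be tried without pandas. The following is a minimal standard-library sketch, not the code from the linked answer: `square_chunk` and `parallel_apply` are hypothetical names introduced only for illustration, and the example assumes a platform where the fork start method is available (it is not on Windows).

```python
import multiprocessing as mp


def square_chunk(chunk):
    """Worker: square every number in one chunk (a stand-in for `transform`)."""
    return [x * x for x in chunk]


def parallel_apply(items, func, num_workers=2, method="fork"):
    """Split `items` into contiguous chunks and apply `func` to each in a pool.

    `method` selects the start method for this pool only; the process-wide
    default reported by get_start_method() is left untouched.
    """
    if method not in mp.get_all_start_methods():
        raise ValueError(f"start method {method!r} is not available here")

    size = max(1, -(-len(items) // num_workers))  # ceiling division
    chunks = [items[i:i + size] for i in range(0, len(items), size)]

    # get_context() returns an object with the same API as the
    # multiprocessing module, bound to the requested start method.
    with mp.get_context(method).Pool(num_workers) as pool:
        results = pool.map(func, chunks)  # results keep the chunk order

    # Flatten the per-chunk results back into one list.
    return [value for part in results for value in part]


if __name__ == "__main__":
    print(parallel_apply(list(range(10)), square_chunk, num_workers=3))
    # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```

Passing the start method as a parameter keeps the choice local to one pool, so other code in the same process that relies on the default spawn behaviour is not affected.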

This scrap was closed on 2022/08/10.