Working around multiprocessing failures on an M1 Mac
Problem
Running multiprocessing.Pool fails with the error message AttributeError: Can't get attribute "xxx".
import numpy as np
import pandas as pd
from multiprocessing import Pool, cpu_count
from tqdm import tqdm

def parallelize_dataframe(df, func, num_cores=2):
    """Utility function that distributes the application
    of function func on dataframe df by using Pool().
    """
    dfs = np.array_split(df, num_cores)
    with Pool(num_cores) as pl:
        df = pd.concat(tqdm(pl.imap(func, dfs), total=len(dfs)))
        pl.close()
        pl.join()
    return df

def transform(df):
    # format_cell is assumed to be a per-cell formatting function defined elsewhere
    df["source"] = df["source"].apply(format_cell)
    return df

num_cores = cpu_count()
print(f"nb of cores used {num_cores}")
df_formatted = parallelize_dataframe(df, transform, num_cores=num_cores)
Process SpawnPoolWorker-43:
Traceback (most recent call last):
  File "/opt/anaconda3/envs/pytorch_39/lib/python3.9/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/opt/anaconda3/envs/pytorch_39/lib/python3.9/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/opt/anaconda3/envs/pytorch_39/lib/python3.9/multiprocessing/pool.py", line 114, in worker
    task = get()
  File "/opt/anaconda3/envs/pytorch_39/lib/python3.9/multiprocessing/queues.py", line 368, in get
    return _ForkingPickler.loads(res)
AttributeError: Can't get attribute 'transform' on <module '__main__' (built-in)>
Workaround
source: https://stackoverflow.com/questions/67999589/multiprocessing-with-pool-throws-error-on-m1-macbook
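The default start method on macOS was fork up to Python 3.7, but since Python 3.8 it has been spawn, which is what get_start_method() returns here. With spawn, each worker starts a fresh interpreter and has to look up transform in __main__ by name, which fails, for example, when the function was defined in a notebook or interactive session. Requesting fork explicitly makes the error go away, although the Python documentation considers fork unsafe on macOS because it can crash the subprocess.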
>>> import multiprocessing
>>> multiprocessing.get_start_method()
'spawn'
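To check which start methods the current platform offers before picking one, a minimal sketch along these lines may help. The helper name describe_start_methods is hypothetical; get_start_method() and get_all_start_methods() are the standard library calls.

```python
import multiprocessing as mp


def describe_start_methods():
    """Return (default start method, list of start methods available here)."""
    # Note: calling get_start_method() fixes the default if none was set yet.
    return mp.get_start_method(), mp.get_all_start_methods()


default_method, available = describe_start_methods()
print(f"default: {default_method}, available: {available}")
```

On an M1 Mac with Python 3.8 or later the default should be spawn, with fork listed among the available methods.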
import numpy as np
import pandas as pd
from multiprocessing import get_context

def parallelize_dataframe(df, func, num_cores=2):
    """Utility function that distributes the application
    of function func on dataframe df by using Pool().
    """
    dfs = np.array_split(df, num_cores)
    # Request the "fork" start method explicitly instead of the default "spawn"
    with get_context("fork").Pool(num_cores) as pl:
        df = pd.concat(pl.map(func, dfs))
        pl.close()
        pl.join()
    return df
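The same pattern can be tried without pandas. The following is a standard-library-only sketch, assuming the fork start method is available (macOS or Linux, not Windows). The names square_chunk, split, and parallel_apply are hypothetical stand-ins for transform, np.array_split, and parallelize_dataframe.

```python
import multiprocessing as mp


def square_chunk(chunk):
    """Hypothetical worker: square every number in one chunk."""
    return [x * x for x in chunk]


def split(items, n):
    """Split items into n roughly equal, contiguous chunks."""
    size, extra = divmod(len(items), n)
    chunks, start = [], 0
    for i in range(n):
        # The first `extra` chunks get one additional item each.
        end = start + size + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def parallel_apply(items, func, num_cores=2):
    """Apply func to each chunk in a pool of forked worker processes."""
    # Assumption: "fork" is supported on this platform.
    ctx = mp.get_context("fork")
    with ctx.Pool(num_cores) as pool:
        results = pool.map(func, split(items, num_cores))
    # Flatten the per-chunk results back into a single list.
    return [y for chunk in results for y in chunk]


if __name__ == "__main__":
    print(parallel_apply(list(range(6)), square_chunk, num_cores=2))
    # prints [0, 1, 4, 9, 16, 25]
```

Using get_context("fork") rather than set_start_method("fork") keeps the choice local to this pool and leaves the process-wide default untouched.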
This scrap was closed on 2022/08/10.