I am trying to read many MiniSEED files with ObsPy, driven in parallel by multiprocessing.Pool. Some worker processes deadlock whenever the pool uses more than one core. The code is as follows:
import os
import time
import obspy
import multiprocessing


def ob_read(file, output_file):
    try:
        # Log that the file is being opened, then read it with ObsPy.
        with open(output_file, 'a') as f:
            f.write('open file' + '\n')
        st = obspy.read(file)
        with open(output_file, 'a') as f:
            f.write(str(st) + '\n')
    except Exception as err_msg:
        print("ob_read(): error message=%s" % str(err_msg))
    return None


def ob_read_parallel(data_path, cores, output_path):
    pool = multiprocessing.Pool(processes=cores)
    tasks = []
    # Collect one (input_file, output_file) task per file under data_path.
    for path, dir_list, file_list in os.walk(data_path):
        for file in file_list:
            full_file = os.path.join(path, file)
            output_file = os.path.join(output_path, file)
            if os.path.exists(output_file):
                os.remove(output_file)
            # ob_read(full_file, output_file)
            tasks.append((full_file, output_file))
    rs = pool.starmap_async(ob_read, tasks, chunksize=1)
    pool.close()
    # Poll until all tasks finish, printing progress on a single line.
    while True:
        remaining = rs._number_left
        print("finished:{0}/{1}".format(len(tasks) - remaining, len(tasks)),
              end='\r')
        if rs.ready():
            break
        time.sleep(0.5)
    print('\n')
    pool.join()


if __name__ == '__main__':
    data_path = '../data/mseed'
    output_path = '../result'
    cores = 5
    ob_read_parallel(data_path, cores, output_path)
For example, when reading 245 files, the progress output stalls at:
finished: 190/245
The Python environment was created with 'conda create -n test_mseed python=3.7'; ObsPy was installed via pip.
- ObsPy version: 1.2.2
- Python: 3.6 or 3.7
- Platform: CentOS Linux release 7.5.1804 (Core)
I found that changing the start method of the child processes works around the problem:
multiprocessing.set_start_method("forkserver")  # or "spawn"
Is there some inappropriate use of threads or locks in ObsPy's MiniSEED reading path that could cause this deadlock?
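For completeness, this is how I apply the workaround in the script above (a minimal sketch: set_start_method must be called exactly once, before the pool is created, so it goes at the top of the __main__ block):

if __name__ == '__main__':
    # Must run before any Pool is created; a second call raises
    # RuntimeError unless force=True is passed.
    multiprocessing.set_start_method("forkserver")  # or "spawn"
    data_path = '../data/mseed'
    output_path = '../result'
    cores = 5
    ob_read_parallel(data_path, cores, output_path)

An equivalent alternative that avoids changing the global start method is to build the pool from an explicit context inside ob_read_parallel:

    pool = multiprocessing.get_context("forkserver").Pool(processes=cores)

My understanding is that 'fork' copies the parent's memory wholesale, including the state of any locks held at fork time, whereas 'forkserver' and 'spawn' start each worker from a clean interpreter, which is why I suspect a fork-unsafe lock somewhere in the MiniSEED reading code.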