No worries. Multiprocessing plays fairly nicely with ObsPy; I use `multiprocessing` for processing data in EQcorrscan (see the `pre_processing` functions here), but the memory overhead can be a real annoyance. I use `multiprocessing` to process each `Trace` in a `Stream` concurrently - but that means that you end up with `n_processes` copies of the `Stream` in memory. The speed-up can be useful, but it is balanced against the time taken to copy objects.
Because you are processing many short chunks of data, I think the costs associated with copying in `multiprocessing` would mean that you wouldn't see that much speed-up. If you want to take the code from EQcorrscan and add your steps to the `process` function, you could quickly see whether it is worth expending any more effort on multiprocessing in that way.
I would probably go with MPI parallelism, where each event (and stream) is handled (potentially, depending on the number of workers) by a separate worker. Going with `multiprocessing` over the event loop would also be a possibility, but because the processes themselves do not need to communicate, the costs of `multiprocessing` don't seem worth it.
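A minimal sketch of how that per-event split might look - pure Python round-robin partitioning of a hypothetical event list; in a real mpi4py run, `rank` and `size` would come from `MPI.COMM_WORLD.Get_rank()` and `MPI.COMM_WORLD.Get_size()`:

```python
def events_for_rank(events, rank, size):
    # Round-robin assignment: rank r handles events r, r + size, r + 2*size, ...
    # No inter-process communication is needed - each worker simply
    # processes its own events independently.
    return events[rank::size]


# Hypothetical event IDs split across 3 workers:
events = ["ev01", "ev02", "ev03", "ev04", "ev05", "ev06", "ev07"]
partitions = [events_for_rank(events, r, 3) for r in range(3)]
```

Because the partitions are disjoint and cover all events, each worker can read its own data, process, and write results without ever talking to the others.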
Parallelism in Python isn't that easy… the GIL often limits what can be done. I'm thinking of starting an ObsPy-Accelerated repository trying to use things like CuPy and numba to speed up some functions.
Although, if you are pursuing Python-native parallelism, I would recommend starting off with `concurrent.futures`, which uses `multiprocessing` but provides a nicer, higher-level interface.