massdownloader error

Hi ObsPy users,

I am currently trying to download a large amount of continuous waveform
data, so the mass downloader is a good fit.

Unfortunately, I am encountering the following error:

[2017-10-10 16:48:14,857] - obspy.clients.fdsn.mass_downloader - INFO: Client 'IRIS' - Successfully downloaded 30 channels (of 50)
Traceback (most recent call last):
  File "Get_data_massdownload_v2.py", line 53, in <module>
    mdl.download(domain, restrictions, mseed_storage=data,stationxml_storage=xml,threads_per_client=1)
  File "/home/mgal/anaconda2/lib/python2.7/site-packages/obspy/clients/fdsn/mass_downloader/mass_downloader.py", line 200, in download
    threads_per_client=threads_per_client)
  File "/home/mgal/anaconda2/lib/python2.7/site-packages/obspy/clients/fdsn/mass_downloader/download_helpers.py", line 851, in download_mseed
    [(self.client, self.client_name, chunk) for chunk in chunks])
  File "/home/mgal/anaconda2/lib/python2.7/multiprocessing/pool.py", line 251, in map
    return self.map_async(func, iterable, chunksize).get()
  File "/home/mgal/anaconda2/lib/python2.7/multiprocessing/pool.py", line 567, in get
    raise self._value
KeyError: 95

It seems to be connected to the multiprocessing module. I tried setting
threads_per_client to 1, but it still runs into the same issue.
The error is reproducible on multiple machines and occurs for:
network = IU
station = COR
starttime = 1995-01-01
endtime = 1996-01-01
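Setting the thread count to 1 would not be expected to change anything here, and a standard-library sketch shows why: a thread pool from `multiprocessing.pool` with a single worker still runs each task in a worker thread, and `map()` re-raises any exception from that worker in the calling thread, from inside `pool.py`. That is why the traceback points at `multiprocessing` even though the error comes from the work itself. `parse_chunk` and `run` are hypothetical stand-ins, not ObsPy code; the assumption that the mass downloader's chunks go through such a pool rests only on the `pool.py` frames in the traceback.

```python
from multiprocessing.pool import ThreadPool


def parse_chunk(word_order):
    """Hypothetical stand-in for per-chunk work: look up a byte order."""
    endian = {0: "<", 1: ">"}
    return endian[word_order]  # raises KeyError for anything but 0 or 1


def run(chunks, threads=1):
    pool = ThreadPool(threads)
    try:
        # map() waits for all tasks; an exception raised in any worker is
        # re-raised here, in the calling thread, from inside pool.py.
        return pool.map(parse_chunk, chunks)
    finally:
        pool.close()
        pool.join()


print(run([1, 0]))  # ['>', '<']
try:
    run([1, 95], threads=1)
except KeyError as exc:
    print("KeyError:", exc)  # KeyError: 95
```

Even with a single thread, the exception type and value (`KeyError: 95`) are exactly those raised by the task, so the thread count only changes how the work is scheduled, not whether the bad input fails.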

Any help would be greatly appreciated.

Cheers,
Martin

Hi Martin,

I've not seen this particular bug before and it is hard to tell what is
going wrong without being able to reproduce it. Can you send me the code
and your exact system configuration?

You could also just run `obspy-runtests` in the shell and then confirm
that you want to report the test results at the end of the run; we'll
then see information about your installation on `tests.obspy.org`.
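If running the full test suite is inconvenient, a minimal sketch like the following collects the basic details (Python version, interpreter path, platform, ObsPy version) to paste into a reply. `system_report` is a hypothetical helper, not part of ObsPy; it only relies on `obspy.__version__` and the standard `sys` and `platform` modules.

```python
import platform
import sys


def system_report():
    """Collect basic environment details for a bug report (hypothetical helper)."""
    try:
        import obspy  # optional: report gracefully if it is not importable
        obspy_version = obspy.__version__
    except ImportError:
        obspy_version = "not installed"
    return {
        "python": platform.python_version(),
        "executable": sys.executable,  # shows which conda env is active
        "platform": platform.platform(),
        "obspy": obspy_version,
    }


for key, value in system_report().items():
    print("%-10s %s" % (key, value))
```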

Given that the traceback ends inside Python's own `multiprocessing`
module, the error might also go away if you update Python. As you are
using conda, you can just do:

$ conda create -n obspy_py36 python=3.6
$ source activate obspy_py36
$ conda install -c conda-forge obspy

and then see if it works with that new Python. `source activate
obspy_py36` makes the new Python your active environment; `source
deactivate` switches back again, so your old installation does not
disappear.

Cheers!

Lion

Hi Lion,

Thanks a lot for the advice. I have set up a Python 3.6 environment, but
the bug remains with the same error message (included at the bottom of
this email).

I am running Ubuntu 14.04 on an Intel Xeon processor.
Since I followed your advice and switched to Python 3.6, all packages are
pretty much up to date; ObsPy is 1.0.3.

The error message comes from the following simplified script:

import obspy
from obspy.clients.fdsn.mass_downloader import (
    MassDownloader, RectangularDomain, Restrictions)

# Near-global rectangular domain; network/station below do the selection.
domain = RectangularDomain(minlatitude=-80, maxlatitude=80,
                           minlongitude=-180, maxlongitude=180)

for y in range(1995, 1996):
    origin_time = obspy.UTCDateTime(y, 1, 1)
    # One directory tree per year for waveforms and StationXML files.
    data = '/media/mgal/Data/IU_2/COR/%i/Waveforms/' % origin_time.year
    xml = '/media/mgal/Data/IU_2/COR/%i/Stations/' % origin_time.year

    restrictions = Restrictions(
        starttime=obspy.UTCDateTime(y, 1, 1),
        endtime=obspy.UTCDateTime(y + 1, 1, 1),
        chunklength_in_sec=86400,  # download in one-day chunks
        network="IU",
        station="COR",
        reject_channels_with_gaps=True,
        sanitize=False,
        channel_priorities=["LH[Z]"],
        location_priorities=["", "00", "10", "20"])

    mdl = MassDownloader(providers=["IRIS"])
    mdl.download(domain, restrictions, mseed_storage=data,
                 stationxml_storage=xml, threads_per_client=1)

This results in the following error message:

Traceback (most recent call last):
  File "Get_data_massdownload_v2.py", line 53, in <module>
    mdl.download(domain, restrictions, mseed_storage=data,stationxml_storage=xml,threads_per_client=1)
  File "/home/mgal/anaconda2/envs/obspy_py36/lib/python3.6/site-packages/obspy/clients/fdsn/mass_downloader/mass_downloader.py", line 200, in download
    threads_per_client=threads_per_client)
  File "/home/mgal/anaconda2/envs/obspy_py36/lib/python3.6/site-packages/obspy/clients/fdsn/mass_downloader/download_helpers.py", line 851, in download_mseed
    [(self.client, self.client_name, chunk) for chunk in chunks])
  File "/home/mgal/anaconda2/envs/obspy_py36/lib/python3.6/multiprocessing/pool.py", line 260, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "/home/mgal/anaconda2/envs/obspy_py36/lib/python3.6/multiprocessing/pool.py", line 608, in get
    raise self._value
  File "/home/mgal/anaconda2/envs/obspy_py36/lib/python3.6/multiprocessing/pool.py", line 119, in worker
    result = (True, func(*args, **kwds))
  File "/home/mgal/anaconda2/envs/obspy_py36/lib/python3.6/multiprocessing/pool.py", line 44, in mapstar
    return list(map(*args))
  File "/home/mgal/anaconda2/envs/obspy_py36/lib/python3.6/site-packages/obspy/clients/fdsn/mass_downloader/download_helpers.py", line 836, in star_download_mseed
    *args, logger=self.logger)
  File "/home/mgal/anaconda2/envs/obspy_py36/lib/python3.6/site-packages/obspy/clients/fdsn/mass_downloader/utils.py", line 234, in download_and_split_mseed_bulk
    info = get_record_information(fh)
  File "/home/mgal/anaconda2/envs/obspy_py36/lib/python3.6/site-packages/obspy/io/mseed/util.py", line 323, in get_record_information
    endian=endian)
  File "/home/mgal/anaconda2/envs/obspy_py36/lib/python3.6/site-packages/obspy/io/mseed/util.py", line 489, in _get_record_information
    if ENDIAN[word_order] != endian:
KeyError: 95

Hope that helps to narrow it down.
Cheers,
Martin

Hi Martin,

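Indeed, it did help to find the issue. The problem was that one (or
more, I guess) of the MiniSEED records that were downloaded had an
invalid word order set in blockette 1000, which caused the exception.
The SEED format only defines 0 (little endian) and 1 (big endian) for
that byte, so the lookup `ENDIAN[word_order]` failed with `KeyError: 95`.
The mass downloader uses the MiniSEED headers to split up the files
after they have been downloaded in bulk, which is why this surfaced
during the download.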
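To make the failure mode concrete, here is a minimal standalone sketch; it is not ObsPy's implementation and not the code from the PR. It assumes a single raw record in memory, guesses the fixed-header byte order from the start-time year (a heuristic chosen for this sketch), follows the blockette chain starting at the offset stored in bytes 46-47 of the fixed header, and validates the word order byte of blockette 1000 instead of using it directly as a dictionary key. `header_byte_order`, `read_word_order`, and `make_record` are hypothetical names.

```python
import struct

# Blockette 1000 word order byte -> struct prefix (SEED: 0 = little, 1 = big endian).
WORD_ORDER = {0: "<", 1: ">"}


def header_byte_order(record):
    """Guess the fixed-header byte order from the start-time year (bytes 20-21).

    Heuristic for this sketch only: use whichever order gives a plausible year.
    """
    for prefix in (">", "<"):
        (year,) = struct.unpack(prefix + "H", record[20:22])
        if 1900 <= year <= 2100:
            return prefix
    raise ValueError("cannot determine header byte order")


def read_word_order(record):
    """Return the data byte order ('<' or '>') declared in blockette 1000.

    Raises ValueError, not KeyError, for a missing blockette 1000 or an
    invalid word order byte.
    """
    if len(record) < 48:
        raise ValueError("record shorter than the 48-byte fixed header")
    hdr = header_byte_order(record)
    # Bytes 46-47 of the fixed header: offset of the first blockette.
    (offset,) = struct.unpack(hdr + "H", record[46:48])
    seen = set()
    while offset and offset not in seen and offset + 8 <= len(record):
        seen.add(offset)
        btype, next_offset = struct.unpack(hdr + "HH", record[offset:offset + 4])
        if btype == 1000:
            word_order = record[offset + 5]  # byte 5 of blockette 1000
            if word_order not in WORD_ORDER:
                raise ValueError(
                    "invalid word order %d in blockette 1000" % word_order)
            return WORD_ORDER[word_order]
        offset = next_offset  # follow the chain; 0 means no more blockettes
    raise ValueError("no blockette 1000 found")


def make_record(word_order, prefix=">"):
    """Build a synthetic 64-byte header-only record for testing (not a full record)."""
    rec = bytearray(64)
    rec[0:7] = b"000001D"
    struct.pack_into(prefix + "H", rec, 20, 1995)  # start-time year
    struct.pack_into(prefix + "H", rec, 46, 48)    # first blockette at byte 48
    # Blockette 1000: type, next offset, encoding (11 = Steim2), word order,
    # record length exponent (9 -> 512 bytes), reserved.
    struct.pack_into(prefix + "HHBBBB", rec, 48, 1000, 0, 11, word_order, 9, 0)
    return bytes(rec)


print(read_word_order(make_record(1)))  # >
try:
    read_word_order(make_record(95))
except ValueError as exc:
    print(exc)  # invalid word order 95 in blockette 1000
```

A record whose word order byte is 95 then produces a descriptive `ValueError` that a caller splitting bulk downloads could catch and log, rather than a bare `KeyError` that aborts the whole run.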

This PR fixes it: https://github.com/obspy/obspy/pull/1926

It should make it into the next ObsPy version, which we'll release in
the next couple of days, so if you can wait, that would be the easiest
solution. Alternatively, you can install ObsPy from the PR's branch.
The latest master (and thus the next release) also adds explicit
`exclude_networks` and `exclude_stations` arguments to `Restrictions`
for more control.

Cheers!

Lion