FDSN mass downloader http issue

Hi all,

I'm downloading a rather large noise data set from IRIS using the download() method of obspy.clients.fdsn.mass_downloader.MassDownloader. I've broken the request down by station and year to ease the load on the DMC.

My request per network/station:

from obspy.clients.fdsn.mass_downloader import (
    MassDownloader, RectangularDomain, Restrictions)

domain = RectangularDomain(minlatitude=min_lat, maxlatitude=max_lat,
                           minlongitude=min_lon, maxlongitude=max_lon)

restrictions = Restrictions(
    starttime=start,
    endtime=end,
    chunklength_in_sec=86400,
    reject_channels_with_gaps=False,
    network=net,
    station=sta,
    location="",
    channel=chan,
    minimum_length=0.0,
    sanitize=False)

# Use IRIS, SCEDC, NCEDC
mdl = MassDownloader(providers=["IRIS", "SCEDC", "NCEDC"])
mdl.download(domain, restrictions, mseed_storage=MSEED,
             stationxml_storage=XML, threads_per_client=3,
             download_chunk_size_in_mb=50)
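Splitting the request by station and year, as described above, could be sketched roughly like this. It is a minimal sketch, not the actual script: yearly_windows and the stations list are hypothetical, and plain datetime objects are used so it runs without ObsPy (they could be wrapped in obspy.UTCDateTime before being passed to Restrictions).

```python
from datetime import datetime


def yearly_windows(first_year, last_year):
    """Return (start, end) pairs covering each calendar year, inclusive.

    Each window ends exactly where the next one begins, so no data
    falls between two consecutive requests.
    """
    return [
        (datetime(year, 1, 1), datetime(year + 1, 1, 1))
        for year in range(first_year, last_year + 1)
    ]


# Hypothetical station list; a real script would use the
# network/station codes actually being requested.
stations = [("CI", "VES")]

for net, sta in stations:
    for start, end in yearly_windows(2010, 2012):
        # One Restrictions(starttime=..., endtime=..., network=net,
        # station=sta, ...) object per window would be built here.
        print(net, sta, start.date(), end.date())
```

Keeping each request to one station and one year bounds how much a single failed request can cost.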

I'm getting an HTTP client error:

Traceback (most recent call last):
  File "/Users/thclements/anaconda3/lib/python3.5/http/client.py", line 541, in _get_chunk_left
    chunk_left = self._read_next_chunk_size()
  File "/Users/thclements/anaconda3/lib/python3.5/http/client.py", line 508, in _read_next_chunk_size
    return int(line, 16)
ValueError: invalid literal for int() with base 16: b''

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/thclements/anaconda3/lib/python3.5/http/client.py", line 558, in _readall_chunked
    chunk_left = self._get_chunk_left()
  File "/Users/thclements/anaconda3/lib/python3.5/http/client.py", line 543, in _get_chunk_left
    raise IncompleteRead(b'')
http.client.IncompleteRead: IncompleteRead(0 bytes read)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "download_inter_station_dist.py", line 249, in <module>
    min_lon, max_lon,MSEED,XML)
  File "download_inter_station_dist.py", line 144, in download
    download_chunk_size_in_mb=50)
  File "/Users/thclements/anaconda3/lib/python3.5/site-packages/obspy/clients/fdsn/mass_downloader/mass_downloader.py", line 204, in download
    threads_per_client=threads_per_client)
  File "/Users/thclements/anaconda3/lib/python3.5/site-packages/obspy/clients/fdsn/mass_downloader/download_helpers.py", line 836, in download_mseed
    [(self.client, self.client_name, chunk) for chunk in chunks])
  File "/Users/thclements/anaconda3/lib/python3.5/multiprocessing/pool.py", line 260, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "/Users/thclements/anaconda3/lib/python3.5/multiprocessing/pool.py", line 608, in get
    raise self._value
  File "/Users/thclements/anaconda3/lib/python3.5/multiprocessing/pool.py", line 119, in worker
    result = (True, func(*args, **kwds))
  File "/Users/thclements/anaconda3/lib/python3.5/multiprocessing/pool.py", line 44, in mapstar
    return list(map(*args))
  File "/Users/thclements/anaconda3/lib/python3.5/site-packages/obspy/clients/fdsn/mass_downloader/download_helpers.py", line 821, in star_download_mseed
    *args, logger=self.logger)
  File "/Users/thclements/anaconda3/lib/python3.5/site-packages/obspy/clients/fdsn/mass_downloader/utils.py", line 222, in download_and_split_mseed_bulk
    client.get_waveforms_bulk(bulk, filename=temp_filename)
  File "/Users/thclements/anaconda3/lib/python3.5/site-packages/obspy/clients/fdsn/client.py", line 910, in get_waveforms_bulk
    data=bulk.encode('ascii', 'strict'))
  File "/Users/thclements/anaconda3/lib/python3.5/site-packages/obspy/clients/fdsn/client.py", line 1328, in _download
    timeout=self.timeout, use_gzip=use_gzip)
  File "/Users/thclements/anaconda3/lib/python3.5/site-packages/obspy/clients/fdsn/client.py", line 1707, in download_url
    data = io.BytesIO(f.read())
  File "/Users/thclements/anaconda3/lib/python3.5/http/client.py", line 455, in read
    return self._readall_chunked()
  File "/Users/thclements/anaconda3/lib/python3.5/http/client.py", line 565, in _readall_chunked
    raise IncompleteRead(b''.join(value))
http.client.IncompleteRead: IncompleteRead(10571265 bytes read)

It usually fails at around 10,500,000 bytes read (~10.5 MB). I'm only using 3 threads per client, which shouldn't be a problem for my machine. Thanks!
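Some context on what the traceback is saying: with HTTP chunked transfer encoding, the server sends each piece of the body prefixed by its length in hexadecimal and finishes with a zero-length chunk. If the connection closes before that final chunk arrives, the next size line read is empty, int(b'', 16) raises ValueError, and http.client converts that into IncompleteRead carrying the bytes received so far. The decoder below is a simplified, hypothetical read_chunked (it ignores chunk extensions and trailers and is not the actual http.client code) that reproduces this behaviour on an in-memory stream.

```python
import io
from http.client import IncompleteRead


def read_chunked(stream):
    """Decode an HTTP/1.1 chunked body read from a binary stream."""
    parts = []
    while True:
        size_line = stream.readline()
        try:
            # Each chunk starts with its size in hex, optionally followed
            # by ";extensions", which are ignored here.
            size = int(size_line.split(b";")[0].strip(), 16)
        except ValueError:
            # An empty size line means the stream ended mid-body: the same
            # ValueError -> IncompleteRead chain as in the traceback.
            raise IncompleteRead(b"".join(parts))
        if size == 0:
            # The zero-length chunk marks the end of the body.
            return b"".join(parts)
        data = stream.read(size)
        if len(data) < size:
            # The stream ended inside a chunk.
            raise IncompleteRead(b"".join(parts) + data, size - len(data))
        parts.append(data)
        stream.readline()  # consume the CRLF that follows each chunk


complete = io.BytesIO(b"5\r\nhello\r\n6\r\n world\r\n0\r\n\r\n")
print(read_chunked(complete))  # prints b'hello world'
```

A body cut off after the first chunk, such as io.BytesIO(b"5\r\nhello\r\n"), raises IncompleteRead whose partial attribute holds b"hello", which is analogous to the "IncompleteRead(10571265 bytes read)" at the bottom of the traceback.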

Cheers,

Tim

Hi Tim,

I just tested on my machine with some random station and time interval and it works fine. Can you send me the exact times and station you used so I can try to reproduce it?

In any case, this is not an error in ObsPy but in Python's lower-level HTTP library. I think ObsPy should catch it, but this particular error is (I think) related to server-side problems. Can you start the mass downloader with

mdl = MassDownloader(providers=["IRIS", "SCEDC", "NCEDC"], debug=True)

This will print a ton of output, but it should let you figure out which exact call fails. Once you know this, can you send it to us? It is easier to interpret if you launch it with only one thread. It should look a bit like this:

Downloading with requesting gzip compression
Sending along the following payload:
----------------------------------------------------------------------
IU ANMO 00 BHZ 2015-05-07T00:00:00.000000 2015-05-13T00:00:00.000000
----------------------------------------------------------------------
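Until this is caught inside ObsPy, one possible workaround is to wrap the download in a retry loop. This is a sketch under assumptions: download_with_retries is a hypothetical helper, the attempt count and wait time are arbitrary, and it relies on the IncompleteRead raised in the worker pool propagating to the caller, as it does in the traceback above. It uses str.format rather than f-strings so it also runs on Python 3.5.

```python
import http.client
import time


def download_with_retries(download, max_attempts=3, wait_seconds=30.0):
    """Call download() and retry when a chunked response is cut short.

    download is any zero-argument callable; its return value is passed
    through. The last IncompleteRead is re-raised if every attempt fails.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return download()
        except http.client.IncompleteRead as err:
            if attempt == max_attempts:
                raise
            print("Attempt {} failed after {} bytes; retrying in {} s".format(
                attempt, len(err.partial), wait_seconds))
            time.sleep(wait_seconds)
```

It could be called as download_with_retries(lambda: mdl.download(domain, restrictions, mseed_storage=MSEED, stationxml_storage=XML, threads_per_client=1)). As far as I know, the mass downloader skips files that already exist in mseed_storage, so a rerun should mostly fetch what is still missing.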

Cheers!

Lion

Hi Lion,

I wasn't able to reproduce the error with 1 thread, but it failed again with threads_per_client=3. Here is the output for CI VES from 2010-01-01 to 2011-01-01. After the error is thrown, the script keeps downloading data: looking at nettop, it is still creating new threads after the error. I'm running this through an IPython (4.2.0) terminal with Python 3.5.2.

One other question: I'm getting a download speed of ~1 Mb/s per thread with the MassDownloader when requesting from IRIS. Is that normal? I would have expected it to be a bit faster on IRIS's side. Thanks for all your work on ObsPy!
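The "2190 time intervals/channels" in the log below can be checked by hand: with chunklength_in_sec=86400, the span from 2010-01-01 to 2011-01-01 splits into 365 daily chunks, and the station has 6 channels, so 365 * 6 = 2190. A small sketch of that count follows; count_intervals is a hypothetical helper, and rounding a trailing partial chunk up to a full interval is an assumption that does not matter here, since the year divides evenly into days.

```python
import math
from datetime import datetime


def count_intervals(start, end, chunklength_in_sec, n_channels):
    """Number of (time interval, channel) pairs for one station."""
    span_seconds = (end - start).total_seconds()
    # Assumption: a trailing partial chunk counts as one more interval.
    n_chunks = math.ceil(span_seconds / chunklength_in_sec)
    return n_chunks * n_channels


print(count_intervals(datetime(2010, 1, 1), datetime(2011, 1, 1), 86400, 6))  # prints 2190
```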

Cheers,

Tim

[2017-05-08 17:07:50,949] - obspy.clients.fdsn.mass_downloader - INFO: Successfully initialized 3 client(s): IRIS, SCEDC, NCEDC.
[2017-05-08 17:07:50,949] - obspy.clients.fdsn.mass_downloader - INFO: Total acquired or preexisting stations: 0
[2017-05-08 17:07:50,949] - obspy.clients.fdsn.mass_downloader - INFO: Client 'IRIS' - Requesting reliable availability.
Downloading http://service.iris.edu/fdsnws/station/1/query?minlatitude=32.0&maxlatitude=40.0&format=text&starttime=2010-01-01T00%3A00%3A00.000000&endtime=2011-01-01T00%3A00%3A00.000000&level=channel&minlongitude=-125.0&network=CI&station=VES&location=--&channel=BH%3F&maxlongitude=-114.0&matchtimeseries=true with requesting gzip compression
Downloaded http://service.iris.edu/fdsnws/station/1/query?minlatitude=32.0&maxlatitude=40.0&format=text&starttime=2010-01-01T00%3A00%3A00.000000&endtime=2011-01-01T00%3A00%3A00.000000&level=channel&minlongitude=-125.0&network=CI&station=VES&location=--&channel=BH%3F&maxlongitude=-114.0&matchtimeseries=true with HTTP code: 200
[2017-05-08 17:07:51,091] - obspy.clients.fdsn.mass_downloader - INFO: Client 'IRIS' - Successfully requested availability (0.14 seconds)
[2017-05-08 17:07:51,229] - obspy.clients.fdsn.mass_downloader - INFO: Client 'IRIS' - Found 1 stations (6 channels).
[2017-05-08 17:07:51,230] - obspy.clients.fdsn.mass_downloader - INFO: Client 'IRIS' - Will attempt to download data from 1 stations.
[2017-05-08 17:07:51,305] - obspy.clients.fdsn.mass_downloader - INFO: Client 'IRIS' - Status for 2190 time intervals/channels before downloading: NEEDS_DOWNLOADING
Downloading http://service.iris.edu/fdsnws/dataselect/1/query with requesting gzip compression
Sending along the following payload: