Hi everyone,
I noticed an ‘unexpected’ behavior (at least with respect to what I understood from the documentation) of the FDSN mass downloader in how it handles the selected starttime and endtime.
If I specify some UTC starttime and endtime values, I would expect the downloaded data to exactly match them. But if I download the data using the mass downloader instead of an individual Client, I find that the traces are cut using slightly different times.
Here is a simple reproducible example using station GE.MATE and a sample time window:
import obspy
from obspy.clients.fdsn.mass_downloader import (
    CircularDomain, Restrictions, MassDownloader)

cl_GFZ = obspy.clients.fdsn.Client("GFZ")
starttime = obspy.UTCDateTime("2024-04-23T00:05:00")
endtime = obspy.UTCDateTime("2024-04-23T00:10:00")

# use mass_downloader to get test data for GE.MATE
domain = CircularDomain(latitude=40.65, longitude=16.70,
                        minradius=0.0, maxradius=0.3)
restrictions = Restrictions(
    starttime=starttime,
    endtime=endtime,
    network="GE",
    station="MATE",
    location="",
    channel="HH*",
    reject_channels_with_gaps=True)
mdl = MassDownloader(providers=[cl_GFZ])
mdl.download(domain, restrictions, mseed_storage='GE_MATE_mseed',
             stationxml_storage='GE_MATE_metadata')
mseed_from_mdownl = obspy.read('GE_MATE_mseed/*')

# use Client to get test data for GE.MATE
mseed_from_client = cl_GFZ.get_waveforms("GE", "MATE", "*", "HH*",
                                         starttime, endtime)

# compare results from Client and mass_downloader
print(mseed_from_client)
print(mseed_from_mdownl)
I am using Python 3.10.13. For the data requested through Client, I get:
3 Trace(s) in Stream:
GE.MATE..HHZ | 2024-04-23T00:05:00.000000Z - 2024-04-23T00:10:00.000000Z | 100.0 Hz, 30001 samples
GE.MATE..HHE | 2024-04-23T00:05:00.000000Z - 2024-04-23T00:10:00.000000Z | 100.0 Hz, 30001 samples
GE.MATE..HHN | 2024-04-23T00:05:00.000000Z - 2024-04-23T00:10:00.000000Z | 100.0 Hz, 30001 samples
For the data requested through mass_downloader, I get:
3 Trace(s) in Stream:
GE.MATE..HHE | 2024-04-23T00:04:57.880000Z - 2024-04-23T00:10:00.860000Z | 100.0 Hz, 30299 samples
GE.MATE..HHN | 2024-04-23T00:04:56.340000Z - 2024-04-23T00:10:00.980000Z | 100.0 Hz, 30465 samples
GE.MATE..HHZ | 2024-04-23T00:04:57.180000Z - 2024-04-23T00:10:03.420000Z | 100.0 Hz, 30625 samples
As you can see, the data obtained through the Client has the requested start and end times, while the data obtained through mass_downloader has ‘strange’ start and end times, which even differ from one channel to another.
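To put numbers on the mismatch, here is a minimal sketch that uses only the times and sample counts printed above (no download involved). The helper name expected_npts and the report dictionary are just illustrative names, not part of ObsPy.

```python
from datetime import datetime

RATE_HZ = 100  # sampling rate reported in the printed streams

# Requested window (same values as starttime/endtime in the example).
requested = (datetime(2024, 4, 23, 0, 5, 0), datetime(2024, 4, 23, 0, 10, 0))

# Start/end times and sample counts copied from the mass_downloader output.
downloaded = {
    "HHE": ("2024-04-23T00:04:57.880000", "2024-04-23T00:10:00.860000", 30299),
    "HHN": ("2024-04-23T00:04:56.340000", "2024-04-23T00:10:00.980000", 30465),
    "HHZ": ("2024-04-23T00:04:57.180000", "2024-04-23T00:10:03.420000", 30625),
}


def expected_npts(start, end, rate_hz=RATE_HZ):
    """Samples in a gap-free trace from start to end, both ends inclusive."""
    return round((end - start).total_seconds() * rate_hz) + 1


report = {}
for channel, (t0, t1, npts) in downloaded.items():
    start, end = datetime.fromisoformat(t0), datetime.fromisoformat(t1)
    report[channel] = {
        # Seconds of data before the requested start / after the requested end.
        "early_s": (requested[0] - start).total_seconds(),
        "late_s": (end - requested[1]).total_seconds(),
        # Does the printed sample count match a gap-free trace of that length?
        "npts_consistent": expected_npts(start, end) == npts,
    }
    print(channel, report[channel])
```

Each downloaded trace is internally consistent (the sample count matches its own start and end times at 100 Hz), so the data is continuous and simply extends a few seconds beyond the requested window on both sides, by a different amount for each channel. One possible reading, not verified here, is that the files hold whole miniSEED records rather than data cut at the sample level.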
Am I missing some restrictions parameter to enforce the usage of exact times? I also tried setting chunklength_in_sec=60*5, but the result doesn’t change.
I also looked at the code to understand this better, but hit a dead end. The only thing that looks suspicious to me is that the estimated channel sampling rate used by mass_downloader.download_helpers.ClientDownloadHelper.download_mseed to define the data chunk size does not match the actual sampling rate of the data: the sampling-rate dictionary in the helper function assumes 250 Hz for H channels, while the selected data is sampled at 100 Hz.
Any idea what’s missing / what I did wrong?
Thank you,
Laura