FDSN mass_downloader time range behavior

Hi everyone,

I noticed an ‘unexpected’ behavior (at least relative to what I understood from the documentation) of the FDSN mass downloader with respect to the selected starttime and endtime.

If I specify some UTC starttime and endtime values, I would expect the downloaded data to exactly match them. But if I download the data using the mass downloader instead of an individual Client, I find that the traces are cut using slightly different times.

Here is a simple reproducible example using station GE.MATE and a sample time window:

import obspy
from obspy.clients.fdsn.mass_downloader import CircularDomain,\
    Restrictions, MassDownloader

cl_GFZ = obspy.clients.fdsn.Client("GFZ")
starttime=obspy.UTCDateTime("2024-04-23T00:05:00")
endtime=obspy.UTCDateTime("2024-04-23T00:10:00")

#use mass_downloader to get test data for GE.MATE
domain = CircularDomain(latitude=40.65, longitude=16.70,
                        minradius=0.0, maxradius=0.3)  
restrictions = Restrictions(
    starttime=starttime,
    endtime=endtime,
    network="GE",
    station="MATE",
    location="",
    channel="HH*",
    reject_channels_with_gaps=True)
mdl = MassDownloader(providers=[cl_GFZ])
mdl.download(domain, restrictions, mseed_storage='GE_MATE_mseed',
             stationxml_storage='GE_MATE_metadata')
mseed_from_mdownl = obspy.read('GE_MATE_mseed/*')

#use Client to get test data for GE.MATE
mseed_from_client = cl_GFZ.get_waveforms("GE", "MATE", "*", "HH*", 
                                         starttime, endtime)

#compare results from Client and mass_downloader
print(mseed_from_client)
print(mseed_from_mdownl)

I am using Python 3.10.13. For the data requested through Client, I get:

3 Trace(s) in Stream:
GE.MATE..HHZ | 2024-04-23T00:05:00.000000Z - 2024-04-23T00:10:00.000000Z | 100.0 Hz, 30001 samples
GE.MATE..HHE | 2024-04-23T00:05:00.000000Z - 2024-04-23T00:10:00.000000Z | 100.0 Hz, 30001 samples
GE.MATE..HHN | 2024-04-23T00:05:00.000000Z - 2024-04-23T00:10:00.000000Z | 100.0 Hz, 30001 samples

For the data requested through mass_downloader, I get:

3 Trace(s) in Stream:
GE.MATE..HHE | 2024-04-23T00:04:57.880000Z - 2024-04-23T00:10:00.860000Z | 100.0 Hz, 30299 samples
GE.MATE..HHN | 2024-04-23T00:04:56.340000Z - 2024-04-23T00:10:00.980000Z | 100.0 Hz, 30465 samples
GE.MATE..HHZ | 2024-04-23T00:04:57.180000Z - 2024-04-23T00:10:03.420000Z | 100.0 Hz, 30625 samples

As you can see, data obtained through the Client has the correct start and endtimes, while data obtained through mass_downloader has ‘strange’ start and endtimes, which are even different from one channel to another.

Am I missing some Restrictions parameter that enforces the exact times? I also tried setting chunklength_in_sec=60*5, but the result doesn’t change.

I also looked at the code to understand this better, but hit a dead end. The only thing that raises my suspicion is that the estimated channel sampling rate, which mass_downloader.download_helpers.ClientDownloadHelper.download_mseed uses to define the size of the data chunks, does not match the actual sampling rate of the data: the SR dictionary in the helper function uses 250 Hz for H channels, while the selected data has a sampling rate of 100 Hz.

Any idea what’s missing / what I did wrong?

Thank you,
Laura

The data usually comes from the server as MiniSEED, and MiniSEED comes in chunks of data (“records”) whose boundaries won’t match any given time exactly. So the FDSN server usually sends a little more than requested at the start and end. The normal FDSN client simply has a Stream.trim(start, end) command in there to cut the data down to the exact time.
In the MassDownloader, the MiniSEED data coming from the server is saved to disk as is, without any modification, keeping every piece of data down to the various record-level metadata flags (clock flags etc.). That is why the data is not cut to the exact sample times you requested: cutting would mean re-encoding the data as new MiniSEED, which would lose all record-level metadata.

Thank you, megies, for the reply.

Would it be possible to add this information to the mass downloader documentation? I believe it would be helpful, as most users would probably expect the same behavior from the Client downloader and the mass downloader unless stated otherwise.

Where exactly would you want to see that piece of info? Feel free to open a pull request; the docstrings etc. can even be edited online on GitHub.