How to retain information on file names when reading a stream object from multiple files

ftilmann · August 31, 2022, 12:53am

Maybe I am overlooking something obvious, but is there a way to know which filename belongs to which trace object in a stream?
My use case is the following. I read some single channel files using a glob pattern:

    from obspy import read

    all_streams = read(f'{datadir}/{station}/{station}_??[Z12]_????.??.??-??????.sac')
    all_streams.sort(keys=['starttime', 'channel'])

then do a bunch of processing on waveform triplets (ZNE) and write out some results. I would like to retain the information about which set of files contributed to which result, because for display I have to reload a small subset in a different script and visualise it together with information from the results file.
If I knew which trace in the stream came from which file, I could associate the filenames with the results and the problem would be easy.
My first thought was to use glob.glob to generate the file list and then use the list index to work out the association, but stream.sort() destroys the original order. I am now thinking of using glob.glob to generate the file list, reading the files one by one, building a dict keyed by a trace 'fingerprint' (probably from starttime and get_id()), and then looking up the filename from that fingerprint. But this seems awkward. Is there an easier way?
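A minimal sketch of the fingerprint approach described in the question, assuming the pair (SEED id, start time) is unique per file in this data set. fingerprint and build_path_index are hypothetical helper names; read_func would be obspy's read in practice and is passed in only so the helper does not depend on a particular reader.

```python
from typing import Callable, Dict, Iterable, Tuple


def fingerprint(tr) -> Tuple[str, str]:
    """Key a trace by its SEED id and start time (assumed unique per file)."""
    return (tr.get_id(), str(tr.stats.starttime))


def build_path_index(paths: Iterable, read_func: Callable) -> Dict[Tuple[str, str], str]:
    """Read each file separately and map every trace's fingerprint to its file."""
    index = {}
    for path in paths:
        for tr in read_func(str(path)):
            # Later traces with an identical fingerprint would overwrite earlier ones
            index[fingerprint(tr)] = str(path)
    return index
```

Usage would be something like index = build_path_index(sorted(glob.glob(pattern)), read), where pattern is the glob string from above, and later index[fingerprint(tr)] for any trace in the sorted stream. The lookup only works while a trace's id and start time are unchanged, so it has to happen before any trimming or merging, which is part of why this approach feels fragile.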

megies · August 31, 2022, 10:21am

With your use case above (reading via a glob pattern), it's currently not possible to tell which trace came from which file. Even when reading a single file, we don't currently save the filename in the trace's stats (which we could easily do; that might be useful and worth opening an issue/pull request for).

So, long story short: currently you'd have to change your code to do the globbing first, e.g. with pathlib.Path.glob(), and then do something like:

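    from pathlib import Path
    from obspy import Stream, read

    all_paths = sorted(Path("...").glob("..."))  # fill in your directory and glob pattern
    st = Stream()
    for path in all_paths:
        st_tmp = read(str(path))
        for tr in st_tmp:
            tr.stats['path'] = str(path)  # remember which file this trace came from
        st += st_tmp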
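To get from the stored paths to the results file mentioned in the question, a sketch along these lines could write the input paths next to each result and read them back in the visualisation script. format_result_line and parse_result_line are hypothetical helpers, and the tab/comma separators assume your paths contain no tabs or commas.

```python
def format_result_line(result, traces):
    """Return one tab-separated results line: the result, then its input file paths."""
    # tr.stats.path is the entry stored while reading each file
    paths = ",".join(sorted(str(tr.stats.path) for tr in traces))
    return f"{result}\t{paths}"


def parse_result_line(line):
    """Split a results line back into (result string, list of input file paths)."""
    result, paths = line.rstrip("\n").split("\t")
    return result, paths.split(",")
```

In the visualisation script, each path returned by parse_result_line can then be passed to read() to reload just the files behind that result.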

Note that if you do things like merging traces, your custom stored info may no longer be accurate: a merged trace can be built from several files, but it can only carry one stats['path'] value.
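If you do need to merge, one option is to record each trace's id, time span and path before merging, and afterwards attach every recorded path whose span overlaps the merged trace. This is only a sketch under my own assumptions: record_spans, restore_paths and the new stats key paths are hypothetical names, and it relies on Stream.merge() only ever combining traces that share the same id.

```python
def record_spans(stream):
    """Before merging: remember (id, start, end, path) for every trace."""
    return [
        (tr.id, tr.stats.starttime, tr.stats.endtime, tr.stats.path)
        for tr in stream
    ]


def restore_paths(stream, spans):
    """After merging: store every recorded path whose time span overlaps each trace."""
    for tr in stream:
        tr.stats.paths = sorted({
            path
            for tr_id, start, end, path in spans
            # same channel id and overlapping [start, end] interval
            if tr_id == tr.id and start <= tr.stats.endtime and end >= tr.stats.starttime
        })
```

Usage would be spans = record_spans(st), then st.merge(), then restore_paths(st, spans). Using an overlap test rather than exact time matching keeps it tolerant of small differences in the merged trace's start and end times.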