How to process PPSD for yearly data

Korhan · November 4, 2022, 10:13am

Hi All,

This is not a question about an error or a problem but more about best practices. I have day-long miniseed files that consist of 14 stations sitting in separate folders for each year.

For instance;
2021001.mseed
2021002.mseed
…
…
2021365.mseed

At the moment, I can process this data with a script like the one below reading each day-long file one by one and adding the previously calculated PPSD values.

import os
from obspy import read, read_inventory
from obspy.io.xseed import Parser
from obspy.signal import PPSD

files = os.listdir()

wf = [i for i in files if i.endswith(".mseed")]
wf = sorted(wf)

stas = ["STA1", "STA2", "STA3"]
chas = ["BHZ", "BHN", "BHE"]

for f in wf:
    st = read(f)

    for sta in stas:
        for cha in chas:
            try:
                tr = st.select(id=".."+sta+".."+cha)[0]
                inv = read_inventory(".".join([sta,cha,"RESP"]))
                ppsd = PPSD(tr.stats, metadata=inv)
                ppsd.add(st)
                if os.path.exists(sta+"."+cha+".npz"):
                    print("Found old psd...adding to the current one...")
                    ppsd.add_npz(sta+"."+cha+".npz")
                ppsd.save_npz(sta+"."+cha+".npz")
            except:
                print("There was a problem processing the channel", sta, cha, f)

What is the best way to calculate the PPSD for the whole year of data for each station? Probably there are better ways to make it more efficient.

Thank you,
Korhan

trichter · November 5, 2022, 8:54pm

It is a while since I calculated PPSDs, but your approach looks fine to me. I used to use a command line script for this task.

megies · November 7, 2022, 12:37pm

Looks like your code should work in general, but usually it is very advisable to make try/except cases as less “catch all” as possible and as specific as possible and that select() seems a bit off. Also, saved npz data is lacking metadata so it’s up to the user to not load together npz data that isn’t matching.

I would probably reorganize the loops and rather loop over files after setting up a PPSD for each sta+cha. That seems a bit safer to not mix together data from different channels. But again, I think your code would’ve worked probably.

for sta in stas:
    for cha in chas:
        inv = read_inventory(".".join([sta,cha,"RESP"]))
        ppsd = None

        for f in wf: 
            st = read(f)
            st.select(station=sta, channel=cha)
            if ppsd is None:
                ppsd = PPSD(st[0].stats, metadata=inv)
            ppsd.add(st)
        ppsd.save_npz(sta+"."+cha+".npz")

Korhan · November 7, 2022, 1:56pm

Thank you both @trichter and @megies for your answers. Knowing that I am in the right direction with my code is good. I also agree that the try and except method is not the preferred way to catch errors but I just wanted the process to keep running when there is an error and let me know which file has a problem.