Creating a CSV picks catalog

Dear Forum,
I wrote a small script based on the PhasePapy examples (it uses ObsPy).
The aim is to save the picks from the Z channels (a set of 10 Z-channel waveforms) into a CSV catalog and then compare it with other seismic catalogs made manually.
After importing the necessary libraries I start processing:

file_list = glob.glob('*BO*BBO*Z*mseed')

print (file_list)

# dates to trim

A = UTCDateTime("2020-11-10T13:09:00")
B = UTCDateTime("2020-11-10T13:09:00") + 148.

# Filter / picker parameters
picker = fbpicker.FBPicker(t_long=5, freqmin=1, mode='rms', t_ma=10,
    nsigma=5, t_up=0.8, nr_len=2, nr_coeff=2, pol_len=10, pol_coeff=10,
    uncert_coeff=3)


# Pick the waveforms

for wf_file in file_list:
    st = read(wf_file)
    # Here I pick all the waveforms

    for tr in st:
        tr.detrend('linear')
        tr_cut = tr.trim(A, B)
        if tr_cut.stats.npts == 0: continue

        scnl, picks, polarity, snr, uncert = picker.picks(tr_cut) # here is the picker
        t_create = datetime.utcnow() 
        summary = fbpicker.FBSummary(picker, tr_cut) # here is the plot of picker
        summary.plot_summary()

Here I try to save each pick's information in a DataFrame:

        for i in range(len(picks)):
            picks = pd.DataFrame({
                'scnl'     : [scnl] * len(picks),
                'picks'    : list([i.datetime for i in picks]),
                'polarity' : polarity,
                'snr'      : snr,
                'uncert'   : uncert
            })
            picks.to_csv("out.csv") 

Unfortunately I am not reaching the objective, because only the last pick's information is saved in the out.csv file. An example of the output is:

,picks,polarity,scnl,snr,uncert
0,2020-11-10 13:10:07.620,C,BBOB.SHZ.BO.,8.9,0.04

How can I improve my for loop

for i in range(len(picks)):

so that all the picks are saved in the DataFrame and then written to the CSV file?
Thanks a lot
Gonzalo

What your for loop is doing is writing a line to the CSV file, and then the next iteration overwrites that line, which is why you're only ending up with a single entry (likely the output from the last iteration). What you need to do is add a new row to the pandas DataFrame on each iteration, and then write the whole DataFrame out to the CSV once, outside the for loop.
I only glanced at this site, but this might give you some direction on how to build the dataframe with each successive loop.
https://pythonexamples.org/pandas-dataframe-add-append-row/
Now if you have a lot of data and there is the potential of it crashing during this looping process, so that you need each loop's result written to the CSV file immediately, then I believe there is a way to "append" your output to the CSV file, but I'm not 100% sure how to do it. I know there is a way though.
Your problem line is...
picks.to_csv("out.csv")
It will always overwrite what is in the file on each loop. Either find a way to append the output to the DataFrame or to the output file, or write the whole DataFrame once the for loop is complete.
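A minimal sketch of the accumulate-then-write idea, using made-up stand-in values in place of the real picker output (the station codes, times and numbers below are only illustrative):

```python
import pandas as pd

# Hypothetical per-trace results standing in for picker.picks(tr_cut):
# (scnl, picks, polarity, snr, uncert) with one entry per pick
trace_results = [
    ("BBOB.SHZ.BO.", ["2020-11-10 13:10:07.620"], ["C"], [8.9], [0.04]),
    ("BBOC.SHZ.BO.", ["2020-11-10 13:10:09.100"], ["D"], [5.2], [0.07]),
]

rows = []  # accumulate one dict per pick, across all traces
for scnl, picks, polarity, snr, uncert in trace_results:
    for i in range(len(picks)):
        rows.append({'scnl': scnl, 'picks': picks[i],
                     'polarity': polarity[i], 'snr': snr[i],
                     'uncert': uncert[i]})

df = pd.DataFrame(rows)            # build the DataFrame once
df.to_csv("out.csv", index=False)  # write once, after all loops are done
```

The key point is that `to_csv` is called exactly once, so nothing gets overwritten.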

So it appears the way to have it write on each loop is to set mode='a' ## append.
You do need to be mindful that if you want a header, it will be rewritten every time the loop executes, so it might be best to initialize the file outside the loop with the header you want, and then do picks.to_csv("out.csv", mode='a', header=False)
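A short sketch of that append pattern, again with hypothetical stand-in rows instead of real picker output: the header is written once up front, and each iteration appends without it.

```python
import pandas as pd

columns = ['scnl', 'picks', 'polarity', 'snr', 'uncert']

# Initialize the file once, outside the loop, with just the header
pd.DataFrame(columns=columns).to_csv("out.csv", index=False)

# Hypothetical per-pick rows standing in for the picker output
for row in [("BBOB.SHZ.BO.", "2020-11-10 13:10:07.620", "C", 8.9, 0.04),
            ("BBOC.SHZ.BO.", "2020-11-10 13:10:09.100", "D", 5.2, 0.07)]:
    df = pd.DataFrame([row], columns=columns)
    # append to the file without rewriting the header each iteration
    df.to_csv("out.csv", mode='a', header=False, index=False)
```

This way a crash mid-loop still leaves everything written so far on disk.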


Hello Gonzalo,
I personally use the suggestion @jschmidt provided, where the DataFrame is declared outside the loop and 'scnl', 'picks', 'polarity', 'snr' and 'uncert' are added on each trace iteration. I declare a DataFrame that can hold all my picks, filled with zeros.
To partially protect against losing all your data, I suggest you write your CSV table every N streams (e.g. 10, 20, 50, 100) using the modulo operator. You can name each partial save with a different suffix if you wish. Note that you would need a counter for your streams.

# before the wf_file loop
savePrefix = 'picks_'

for i, wf_file in enumerate(file_list):

    # inside the wf_file loop, after all traces have been read
    if i % 20 == 0:  # save every 20 stream objects
        saveSuffix = str(i)
        picks.to_csv(savePrefix + saveSuffix + '.csv')
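Putting the counter, the modulo test and the suffixed file names together, a runnable sketch might look like this (the file names and the dummy row contents are placeholders for the real stream reading and picking):

```python
import pandas as pd

save_prefix = 'picks_'   # prefix for the partial-save files
all_rows = []            # rows accumulated across all streams

# Stand-in for file_list; in the real script this comes from glob
file_list = ['wf_%02d.mseed' % n for n in range(45)]

for i, wf_file in enumerate(file_list):
    # ... read, trim and pick the stream here; dummy row as a placeholder ...
    all_rows.append({'scnl': 'BBOB.SHZ.BO.', 'picks': None,
                     'polarity': 'C', 'snr': 0.0, 'uncert': 0.0})

    if (i + 1) % 20 == 0:  # partial save every 20 streams
        pd.DataFrame(all_rows).to_csv(save_prefix + str(i + 1) + '.csv',
                                      index=False)

# final save to pick up the remainder after the loop
pd.DataFrame(all_rows).to_csv(save_prefix + 'final.csv', index=False)
```

With 45 input files this would leave picks_20.csv, picks_40.csv and picks_final.csv on disk, so a crash late in the run costs at most the last incomplete batch.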


Dear @jschmidt and @GracielaRojo,
Thanks for the suggestions, I am taking them into account.
For now I fixed the code by adding a variable all_ = [] before the main for loop and then appending the data to the DataFrame, as @jschmidt said.

        for i in range(len(picks)):
            print('The picks were on....')
            print(scnl, picks[i], polarity[i], snr[i], uncert[i])
            print('---------------------------------------------------')
            all_.append(
                {
                    'scnl': scnl,
                    'picks': picks[i],
                    'snr': snr[i],
                    'polarity': polarity[i],
                    'uncert': uncert[i]
                }
            )
        df = pd.DataFrame.from_dict(all_, orient='columns')

However, as @GracielaRojo pointed out, I am aware that I could still lose data if the script crashes mid-run, so I am using her suggestion to improve the code.

Stay safe and best regards.
