Input data format

Dear Dr. Eulenfeld,

I aim to determine the seismic attenuation properties at the caldera by applying Qopen to our dataset. Our event and P- and S-wave traveltime information is in the HYP file format from NonLinLoc. The waveform data are response-removed and available in two forms: all station recordings for each event stored in a single MiniSEED file, and each component of each station for each event stored as a SAC file. An inventory is ready, although it is probably not necessary since the response has already been removed.

The examples on GitHub show several variants of conf.json, "events", and "data", and I do not have a clear idea of which one to follow. I am writing to ask for your recommendations on how to adjust the conf.json file and the event, travel time, and waveform formats to fit the requirements of Qopen. Are there any examples that would be a good reference to start with? Thank you for developing Qopen; I look forward to your reply.

Regards,

Steve

Hi Steve!

Welcome to the forum!

The station inventory is needed in any case for the coordinates and the channel descriptions (level "channel"). The problem with HYP files is that the channel in the arrival data is often not fully specified with a SEED id. I had a similar problem in this repository, which belongs to a BSSA publication. There, however, the event metadata was in HYPODDPHA format, but the resolve_seedid parameter is also available in ObsPy's HYP read support. See also this configuration.
Hopefully something like this will get you started:

# data part conf.json
"events": "globexpression/*.hyp",
"inventory": "globexpression/*.xml",
"read_events_kwargs": {"resolve_seedid": true},
#"filter_inventory": {"channel": "HH?", "location": ""},  # necessary if multiple sampling rates are defined in your StationXML
"data": "plugin",
"plugin": "data : get_waveforms",

If you still have problems, perhaps you could post a sample hyp file and a printout of picks/arrivals from obspy.read_events() for one of your files.

Using the plugin interface for data is best when dealing with large datasets like yours. See the repositories referenced above for examples of data.py files. Either read the SAC files and set the remove_response option to null, or read the raw MiniSEED waveforms and set remove_response to either "sensitivity" or "full".
The get_waveforms function is passed an event, a station name, a time period, and so on, and Qopen expects it to return a three-component stream for the specified time period and station. Here is the code snippet from Qopen that calls this function:

        evid, station = pair
        seedid = one_channel[station][:-1] + '?'
        net, sta, loc, cha = seedid.split('.')
        t1 = origins[evid].time + request_window[0]
        t2 = origins[evid].time + request_window[1]
        kwargs2 = {'network': net, 'station': sta, 'location': loc,
                   'channel': cha, 'starttime': t1, 'endtime': t2,
                   'event': event_dict[evid]}
        stream = get_waveforms(**kwargs2)

Good luck with Qopen and your research! And keep us updated.

Best,
Tom

Edit: I just fixed the documentation of read_nlloc_hyp (this is the documentation of the master branch; docs.obspy.org still points to the unfixed documentation of the latest release). The parameters at the bottom can be used in the read_events_kwargs dictionary to work around the problem of not fully specified channels. In fact, the only thing the resolve_seedid option does is pass the inventory to the read_events function for this purpose. Qopen queries the picks/arrivals with the full SEED id of the station channels. Therefore, it is important that the waveform_id in the picks and the station channels have the same SEED ids.
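For illustration, such a work-around might look like this in conf.json. The "XX" network code and "HH" channel template are placeholders I invented; the two {} fields are filled with the station code and the component, respectively. Please verify the exact parameter names and template semantics in the read_nlloc_hyp documentation before using this:

```json
"read_events_kwargs": {
    "default_seedid": "XX.{}..HH{}",
    "seedid_map": {"Sta1": "XX.{}..HH{}"}
}
```

default_seedid applies to all stations, while seedid_map allows per-station overrides.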

This link contains an example HYP file from NonLinLoc:
https://drive.google.com/drive/folders/1voqj3N3wsSa2Kl0zuqOEUlRALVTk23aD?usp=sharing

I used ObsPy to read it; the code and its output are as follows:
from obspy import read_events
cat = read_events("./qopen.hyp")
event = cat[0]
print(event)
Event: 2007-07-15T02:50:34.786718Z | +65.018, -16.271

         resource_id: ResourceIdentifier(id="smi:local/c7df8948-454c-43c2-b17f-704d1ee8594c")
       creation_info: CreationInfo(author='FirstName LastName   obs:./obs/new_hpf_120320/20070715025039100000.nonlinloc   NLLoc:v7.00.00(27Oct2017)', creation_time=UTCDateTime(2024, 3, 12, 20, 45), version='')
 preferred_origin_id: ResourceIdentifier(id="smi:local/554c3038-9f33-4c45-9b2a-7ed3eae75e20")
                ---------
            comments: 1 Elements
               picks: 12 Elements
             origins: 1 Elements

print(event.origins[0])
print(event.origins[0].time)
print(event.origins[0].longitude)
print(event.origins[0].latitude)
print(event.origins[0].depth/1000)
Origin
resource_id: ResourceIdentifier(id="smi:local/554c3038-9f33-4c45-9b2a-7ed3eae75e20")
time: UTCDateTime(2007, 7, 15, 2, 50, 34, 786718)
longitude: -16.270705 [uncertainty=0.0017162581521154327]
latitude: 65.01764 [uncertainty=0.0015552960519945266]
depth: 15676.758 [confidence_level=68, uncertainty=359.37584782508685]
depth_type: 'from location'
quality: OriginQuality(associated_phase_count=12, used_phase_count=12, associated_station_count=-1, used_station_count=6, depth_phase_count=-1, standard_error=0.0528838, azimuthal_gap=82.5845, secondary_azimuthal_gap=90.3778, ground_truth_level='-', minimum_distance=0.027458716797034825, maximum_distance=0.3351861557419701, median_distance=0.16394992604540826)
origin_uncertainty: OriginUncertainty(min_horizontal_uncertainty=241.93200000000002, max_horizontal_uncertainty=306.633, azimuth_max_horizontal_uncertainty=57.4783, preferred_description='uncertainty ellipse', confidence_level=68.0)
creation_info: CreationInfo(author='FirstName LastName obs:./obs/new_hpf_120320/20070715025039100000.nonlinloc NLLoc:v7.00.00(27Oct2017)', creation_time=UTCDateTime(2024, 3, 12, 20, 45), version='')
---------
comments: 1 Elements
arrivals: 12 Elements
2007-07-15T02:50:34.786718Z
-16.270705
65.01764
15.676758

print(event.picks)
print(event.picks[0])
print(event.picks[0].phase_hint)
print(event.picks[0].waveform_id)
print(event.picks[0].waveform_id.network_code)
print(event.picks[0].waveform_id.station_code)
print(event.picks[0].waveform_id.channel_code)
print(event.picks[0].time)
[Pick(resource_id=ResourceIdentifier(id="smi:local/dc427c57-a4ec-49b3-b23c-95b1b065b16c"), time=UTCDateTime(2007, 7, 15, 2, 50, 37, 730000) [uncertainty=0.02], waveform_id=WaveformStreamID(network_code='', station_code='Sta1', channel_code='Z'), phase_hint='P'), Pick(resource_id=ResourceIdentifier(id="smi:local/140369ee-ca4e-4aa7-80b9-e26579f62642"), time=UTCDateTime(2007, 7, 15, 2, 50, 40, 60000) [uncertainty=0.1], waveform_id=WaveformStreamID(network_code='', station_code='Sta1', channel_code='N'), phase_hint='S'), Pick(resource_id=ResourceIdentifier(id="smi:local/41ae0954-a9d1-46c8-b123-8be07e372f58"), time=UTCDateTime(2007, 7, 15, 2, 50, 37, 870000) [uncertainty=0.02], waveform_id=WaveformStreamID(network_code='', station_code='Sta2', channel_code='Z'), phase_hint='P'), Pick(resource_id=ResourceIdentifier(id="smi:local/fd49d597-b86a-45d6-8eec-b8cf5d4d5166"), time=UTCDateTime(2007, 7, 15, 2, 50, 40, 160000) [uncertainty=0.1], waveform_id=WaveformStreamID(network_code='', station_code='Sta2', channel_code='N'), phase_hint='S'), Pick(resource_id=ResourceIdentifier(id="smi:local/980deb47-4eba-450b-8625-524a3d58b8eb"), time=UTCDateTime(2007, 7, 15, 2, 50, 37, 930000) [uncertainty=0.02], waveform_id=WaveformStreamID(network_code='', station_code='Sta3', channel_code='Z'), phase_hint='P'), Pick(resource_id=ResourceIdentifier(id="smi:local/f2207e82-a262-4254-9c41-5523139fc5ac"), time=UTCDateTime(2007, 7, 15, 2, 50, 40, 240000) [uncertainty=0.1], waveform_id=WaveformStreamID(network_code='', station_code='Sta3', channel_code='N'), phase_hint='S'), Pick(resource_id=ResourceIdentifier(id="smi:local/5c001e88-def4-467e-98bf-fe76a6253f55"), time=UTCDateTime(2007, 7, 15, 2, 50, 37, 990000) [uncertainty=0.02], waveform_id=WaveformStreamID(network_code='', station_code='Sta4', channel_code='Z'), phase_hint='P'), Pick(resource_id=ResourceIdentifier(id="smi:local/e7673a36-9577-4bba-9e62-f82036a73b3d"), time=UTCDateTime(2007, 7, 15, 2, 50, 40, 360000) [uncertainty=0.1], waveform_id=WaveformStreamID(network_code='', station_code='Sta4', channel_code='N'), phase_hint='S'), Pick(resource_id=ResourceIdentifier(id="smi:local/c5ee25db-4d4e-4070-8304-d72c1309dd37"), time=UTCDateTime(2007, 7, 15, 2, 50, 38, 360000) [uncertainty=0.02], waveform_id=WaveformStreamID(network_code='', station_code='Sta5', channel_code='Z'), phase_hint='P'), Pick(resource_id=ResourceIdentifier(id="smi:local/1e323ffa-5105-4662-9683-688d6a919af2"), time=UTCDateTime(2007, 7, 15, 2, 50, 40, 980000) [uncertainty=0.1], waveform_id=WaveformStreamID(network_code='', station_code='Sta5', channel_code='N'), phase_hint='S'), Pick(resource_id=ResourceIdentifier(id="smi:local/d5d7a980-ae20-4d98-a772-6040fdd66339"), time=UTCDateTime(2007, 7, 15, 2, 50, 38, 440000) [uncertainty=0.02], waveform_id=WaveformStreamID(network_code='', station_code='Sta6', channel_code='Z'), phase_hint='P'), Pick(resource_id=ResourceIdentifier(id="smi:local/9766ef4f-e3cf-4949-8b74-94b1a3314269"), time=UTCDateTime(2007, 7, 15, 2, 50, 41, 110000) [uncertainty=0.1], waveform_id=WaveformStreamID(network_code='', station_code='Sta6', channel_code='N'), phase_hint='S')]
Pick
resource_id: ResourceIdentifier(id="smi:local/dc427c57-a4ec-49b3-b23c-95b1b065b16c")
time: UTCDateTime(2007, 7, 15, 2, 50, 37, 730000) [uncertainty=0.02]
waveform_id: WaveformStreamID(network_code='', station_code='Sta1', channel_code='Z')
phase_hint: 'P'
P
WaveformStreamID(network_code='', station_code='Sta1', channel_code='Z')

Sta1
Z
2007-07-15T02:50:37.730000Z

Do you have any recommendations on how to use the HYP file in Qopen, given the above information? Thanks!

Regards,

Steve

Hey! The waveform, station, and event metadata have to fit together. The read_events_kwargs option can be used to manipulate the event metadata on the fly. Have you solved the problem in the meantime?

It is not clear from the posted event metadata alone how to configure everything correctly. For more guidance, I would need additional information on the waveform and station metadata (ideally some waveform data for the provided event plus the StationXML).