NonLinLoc hyp metadata

jwellik · September 25, 2024, 10:29pm

I’m hoping to modify ObsPy’s NonLinLoc _read_single_hypocenter() function so that more metadata is added. I can do the coding, but I’m unsure how developers would prefer it to be done. The specific lines from the NonLinLoc hyp file look like this:

STAT_GEOG ExpectLat 45.251190 Long -121.813478 Depth 13.597396

TRANSFORM AZIMUTHAL_EQUIDIST RefEllipsoid WGS-84 LatOrig 45.374000 LongOrig -121.695000 RotCW 0.000000

STATISTICS ExpectX 11.149988 Y 11.358957 Z 24.587528 CovXX 697.118 XY 16.1396 XZ -27.9779 YY 730.49 YZ -32.1841 ZZ 236.867 EllAz1 42.7698 Dip1 85.1076 Len1 28.6925 Az2 293.445 Dip2 1.62251 Len2 49.3879 Len3 5.111915e+01

The STATISTICS line is already read in like this:

line = lines["STATISTICS"]
covariance_xx = float(line.split()[7])
covariance_yy = float(line.split()[13])
covariance_zz = float(line.split()[17])
stats_info_string = str(
    "Note: Depth/Latitude/Longitude errors are calculated from covariance "
    "matrix as 1D marginal (Lon/Lat errors as great circle degrees) "
    "while OriginUncertainty min/max horizontal errors are calculated "
    "from 2D error ellipsoid and are therefore seemingly higher compared "
    "to 1D errors. Error estimates can be reconstructed from the "
    "following original NonLinLoc error statistics line:\nSTATISTICS " +
    lines["STATISTICS"])
.
.
.
o.comments.append(Comment(text=stats_info_string, force_resource_id=False))

This makes it a little tricky to parse out afterwards.

Would it be acceptable to include this information as a dictionary some how? For example:

# Extract TRANSFORM
line = lines["TRANSFORM"]
o.nlloc_metadata = dict()
o.nlloc_metadata['TRANSFORM'] = {
        'Transform': str(line.split()[0]),
        'Ellipsoid': str(line.split()[2]),
        'LatOrig': float(line.split()[4]),
        'LongOrig': float(line.split()[6]),
        'RotCW': float(line.split()[8])
    }

or just as another comment?

# Extract TRANSFORM
line = lines["TRANSFORM"]
trans_dict = dict({
        'Transform': str(line.split()[0]),
        'Ellipsoid': str(line.split()[2]),
        'LatOrig': float(line.split()[4]),
        'LongOrig': float(line.split()[6]),
        'RotCW': float(line.split()[8])
    })
o.comments.append(trans_dict)

Thanks for any feedback.

megies · September 26, 2024, 9:39am

After only a few minutes thinking about this (so just some quick brainstorming) here is my first thoughts..

could add something like read_events(..., details=False) that can be switched on and then we could just include the full original nonlinloc info complete with all lines unmodified as is as a Comment on the Event object. That would mean it’s both available in Python and also after saving to QuakeML. So no info would get lost along the way. Would make some parsing on user end necessary though, which seems acceptable
personally, I’m a bit wary of adding more special magic attributes like nlloc_metadata, this is usually hard to get documented right, hard for people to even learn/know about etc etc, just seems a bit quick n dirty
we already have a mechanism to add custom tags to all of the event-type objects (event, origin, …) with the extra attribute which is (kind of) well documented and is a mechanism to handle any non-standard xml tags in QuakeML (see docs), so users can already use this to juggle their own self defined xml tags in their data. However, I’m not sure if we want to start “Obspy defined” custom QuakeML tags. I feel like we probably want to stay away from starting a standard within the standard.. not sure

First thought is to keep it simple and maybe just do the first idea, just include the full text info if opted in and user can then go from there.

Other thought is that some of the nonlinloc uncertainty info certainly could be parsed more canonically and in detail, compare Build confidence ellipsoid from NLLOC location files · Issue #1929 · obspy/obspy · GitHub

jwellik · September 26, 2024, 7:27pm

I think the first idea is acceptable. The whole text string as a comment could get preserved in QuakeML.

Another idea came to mind. What about allowing read_nlloc_hyp() to parse all of the info from the .hyp file and return a dictionary as a second output argument? This way, users could call read_nlloc_hyp() to get all the info they wanted without affecting Catalog’s read_event() or adding weird stuff to the Origin object. The idea is below:

# Usage
catalog, nlloc_metadata = read_nlloc_hyp("./path/to/file.hyp")  # returns Catalog, list of dictionaries

# Code modifications
def read_nlloc_hyp(filename, coordinate_converter=None, picks=None, **kwargs):
    .
    .
    .
    cat = Catalog()
    nll_md = []  # *** Add list of NonLinLoc metadata for each event ***
    lines_start = [i for i, line in enumerate(lines)
                   if line.startswith("NLLOC ")]
    lines_end = [i for i, line in enumerate(lines)
                 if line.startswith("END_NLLOC")]
    if len(lines_start) != len(lines_end):
        msg = ("NLLOC HYP file '{}' seems corrupt, number of 'NLLOC' lines "
               "does not match number of 'END_NLLOC' lines").format(filename)
        raise Exception(msg)
    start_end_indices = []
    for start, end in zip(lines_start, lines_end):
        start_end_indices.append(start)
        start_end_indices.append(end)
    if any(np.diff(start_end_indices) < 1):
        msg = ("NLLOC HYP file '{}' seems corrupt, inconsistent "
               "positioning of 'NLLOC' and 'END_NLLOC' lines "
               "detected.").format(filename)
        raise Exception(msg)
    for start, end in zip(lines_start, lines_end):
        event, md = _read_single_hypocenter(
            lines[start:end + 1], coordinate_converter=coordinate_converter,
            original_picks=original_picks, **kwargs)  # *** Now returns two variables ***
        cat.append(event)
        nll_md.append(md)  # *** Append output to list of metadata ***
    cat.creation_info.creation_time = UTCDateTime()
    cat.creation_info.version = "ObsPy %s" % __version__
    return cat, nll_md  # Returns both the Catalog and list of metadata

And then, _read_single_hypocenter() has all the code to parse the NonLinLoc info as a dictionary (for brevity, I’ve only included some of the necessary code here):

def _read_single_hypocenter(lines, coordinate_converter, original_picks,
                            **kwargs):
    .
    . pre-existing code
    .

    ### Parse NonLinLoc metadata:
    # Documentation: http://alomax.free.fr/nlloc/soft3.03/formats.html#_location_hypphs_
    nlloc_metadata = dict()

    # Extract PUBLIC_ID
    line = lines["PUBLIC_ID"]
    nlloc_metadata["PUBLIC_ID"] = {
        'Public_ID': str(line.split()[0]),
    }

    # Extract GRID information
    line = lines["GRID"]
    nlloc_metadata["GRID"] = {
        'xNum': int(line.split()[1]),
        'yNum': int(line.split()[2]),
        'zNum': int(line.split()[3]),
        'xOrigin': float(line.split()[4]),
        'yOrigin': float(line.split()[5]),
        'zOrigin': float(line.split()[6]),
        'dx': float(line.split()[7]),
        'dy': float(line.split()[8]),
        'dz': float(line.split()[9]),
        'gridType': str(line.split()[10]),
    }

    .
    . more code blocks to parse the rest of the file
    .

    return event, nlloc_metadata

Let me know your thoughts. It seems like a good way to leverage all of the work already being done by read_nlloc_hyp (checking the validity of the file, etc.). Anyone getting to read_nlloc_hyp would know how to use it.

I’ve started using these local modifications plus Claudio Satriano’s nllgrid to parse the transform and ellipsoid information so that the ellipse can be plotted on a map with other Catalog data.

I’m happy to do the coding for any approach others find appropriate.

megies · September 27, 2024, 7:51am

@jwellik adding that dictionary to the low level routine seems fine to me. The only issue is that read_nlloc_hyp() is a public function, and our promise is to keep those stable and changing the return type would break code that uses it. While I don’t really like having the return type dynamic, that might be the only way out, i.e. something like read_nlloc_hyp(..., details=False, **kwargs) and then change the return type only if user opts in with details=True. I think that should work for you and I think we could live with that even if it’s a bit hacky.