Libmseed warnings: a hack to help identify problem data?

Something that has been nagging at me for a while is the common warning

InternalMSEEDWarning: readMSEEDBuffer(): Not a SEED record. Will skip bytes …

…when trying to read a malformed miniSEED file (this seems to happen when a SeedLink transmission is interrupted in a particular way, but it is also pretty common in data from older temporary networks). The problem is that in many cases I only see the warning sporadically while working through thousands of files, and nothing in the warning helps identify the offending bit of data.

This is likely more of a libmseed issue, but can anyone think of an easy way to add some or all of the network/station/location/channel (NSLC) header info to this warning? I imagine the hack would go somewhere in obspy/io/mseed/headers.py, but it is not obvious enough to me, or perhaps that info isn't easily accessible in the code below:

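import ctypes as C
import warnings

from obspy.io.mseed import InternalMSEEDError, InternalMSEEDWarning


class _LibmseedWrapper(object):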
    """
    Wrapper object around libmseed that tries to guarantee that all warnings
    and errors within libmseed are properly converted to their Python
    counterparts.

    Might be a bit overengineered but it does the trick and is completely
    transparent to the user.
    """
    def __init__(self, lib):
        self.lib = lib
        self.verbose = True

    def __getattr__(self, item):
        func = getattr(self.lib, item)

        def _wrapper(*args):
            # Collect exceptions. They cannot be raised in the callback as
            # they could never be caught then. They are collected and raised
            # later on.
            _errs = []
            _warns = []

            def log_error_or_warning(msg):
                msg = msg.decode()
                if msg.startswith("ERROR: "):
                    msg = msg[7:].strip()
                    _errs.append(msg)
                if msg.startswith("INFO: "):
                    msg = msg[6:].strip()
                    _warns.append(msg)

            diag_print = \
                C.CFUNCTYPE(None, C.c_char_p)(log_error_or_warning)

            def log_message(msg):
                if self.verbose:
                    print(msg.decode()[6:].strip())
            log_print = C.CFUNCTYPE(None, C.c_char_p)(log_message)

            # Hook up libmseed's logging facilities to its Python callbacks.
            self.lib.setupLogging(diag_print, log_print)

            try:
                return func(*args)
            finally:
                for _w in _warns:
                    warnings.warn(_w, InternalMSEEDWarning)
                if _errs:
                    msg = ("Encountered %i error(s) during a call to "
                           "%s():\n%s" % (
                               len(_errs), item, "\n".join(_errs)))
                    raise InternalMSEEDError(msg)
        return _wrapper
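The comment in the wrapper explains why messages are collected in the callback and only raised after the C call returns. The same pattern can be shown in isolation with plain ctypes, plus one hypothetical addition: a `context` string appended to every warning. Since the wrapped C function appears to receive a buffer (the warning names `readMSEEDBuffer()`), a file name would have to be supplied by the Python caller. `call_with_context`, the `[source: ...]` tag and the stand-in warning class are assumptions for illustration, not ObsPy code.

```python
import ctypes as C
import warnings
from typing import Any, Callable


class InternalMSEEDWarning(UserWarning):
    """Local stand-in for obspy.io.mseed.InternalMSEEDWarning (assumption)."""


def call_with_context(c_func: Callable[[Any], Any], context: str) -> Any:
    """Hypothetical helper: run ``c_func`` with a ctypes logging callback and
    re-emit libmseed-style "INFO: " messages as warnings tagged with
    ``context`` (e.g. the file name the caller is reading)."""
    collected = []

    def _collect(msg: bytes) -> None:
        # Only store the text here. An exception raised inside a ctypes
        # callback is printed and swallowed, never reaching the caller.
        text = msg.decode()
        if text.startswith("INFO: "):
            collected.append(text[6:].strip())

    # Keep a reference to the callback for the whole call so it is not
    # garbage-collected while C code may still invoke it.
    callback = C.CFUNCTYPE(None, C.c_char_p)(_collect)
    try:
        return c_func(callback)
    finally:
        # Back in Python: warnings raised here can be caught normally.
        for text in collected:
            warnings.warn("%s [source: %s]" % (text, context),
                          InternalMSEEDWarning)
```

For example, `call_with_context(lambda cb: cb(b"INFO: Not a SEED record"), "/tmp/mseed_invalid.mseed")` emits one `InternalMSEEDWarning` whose message ends in `[source: /tmp/mseed_invalid.mseed]`. The lambda only simulates libmseed invoking the callback.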

I think one of the main things to keep in mind is that at the point where this warning pops up there is an invalid block, so there is no NSLC to hand over. So one option could be to:

  • print info about the previous or next valid block, or
  • print info on the object that contained the invalid data

The latter might be useful when reading from a filename or path, but much less so when the input is a chunk of bytes or a file-like object.
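The first option can be approximated outside of libmseed by scanning the raw bytes yourself. Below is a minimal sketch under heavy assumptions: miniSEED 2 records of a single fixed length (512 bytes by default; real files may use other lengths, normally declared in blockette 1000), and a simplified plausibility test on the fixed section of the data header (quality indicator `D`, `R`, `Q` or `M` at byte 6, then station, location, channel and network codes at bytes 8 to 19). It is far cruder than libmseed's own header check, and `parse_nslc` and `find_bad_records` are hypothetical names, not ObsPy functions.

```python
from typing import List, Optional, Tuple

FIXED_HEADER_LEN = 48  # size of the miniSEED 2 fixed section of data header


def parse_nslc(header: bytes) -> Optional[str]:
    """Return 'NET.STA.LOC.CHA' if ``header`` looks like a miniSEED 2 fixed
    header, otherwise None. Simplified heuristic, not libmseed's check."""
    if len(header) < 20:
        return None
    # Bytes 0-5: sequence number (ASCII digits; spaces/NULs tolerated here).
    if not all(c in b"0123456789 \x00" for c in header[0:6]):
        return None
    # Byte 6: data header/quality indicator; byte 7: reserved.
    if header[6] not in b"DRQM" or header[7] not in b" \x00":
        return None

    def field(start: int, end: int) -> str:
        return header[start:end].decode("ascii", "replace").strip()

    sta, loc = field(8, 13), field(13, 15)
    cha, net = field(15, 18), field(18, 20)
    return f"{net}.{sta}.{loc}.{cha}"


def find_bad_records(
    data: bytes, reclen: int = 512
) -> List[Tuple[int, Optional[str], Optional[str]]]:
    """Check every ``reclen``-byte slot (assumed fixed record length) and,
    for each one without a plausible header, return
    (byte offset, previous valid NSLC, next valid NSLC)."""
    ids = [parse_nslc(data[off:off + FIXED_HEADER_LEN])
           for off in range(0, len(data), reclen)]
    report = []
    for i, nslc in enumerate(ids):
        if nslc is not None:
            continue
        # Nearest plausible record before and after the bad slot, if any.
        prev_id = next((ids[j] for j in range(i - 1, -1, -1) if ids[j]), None)
        next_id = next((ids[j] for j in range(i + 1, len(ids)) if ids[j]), None)
        report.append((i * reclen, prev_id, next_id))
    return report
```

Passing it the bytes of a file, e.g. `find_bad_records(open(path, "rb").read())`, lists the offset of each implausible slot together with the NSLC of the nearest plausible records on either side, which is usually enough to know which station to look at. If a file mixes record lengths, the fixed stride will misfire and a smarter walk (reading each record's length from blockette 1000) would be needed.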

That being said, and low-level considerations aside, you could probably just catch the warning at a high level: put a wrapper around any ObsPy "read" calls in your code as needed and inspect the result yourself. A simple example:

import warnings
from obspy import read


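def readit(*args, **kwargs):
    # Record warnings instead of printing them; "always" makes sure that
    # repeated warnings from the same location are not filtered out.
    with warnings.catch_warnings(record=True) as w:
        warnings.simplefilter("always")
        st = read(*args, **kwargs)
    for _w in w:
        if 'Not a SEED' in str(_w.message):
            print("doh! do something")
            print(st)
            print(args, kwargs)
            break
    return st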


readit("/tmp/mseed_invalid.mseed")
doh! do something
1 Trace(s) in Stream:
NL.HGN.00.BHZ | 2003-05-29T02:13:22.043400Z - 2003-05-29T02:18:20.693400Z | 40.0 Hz, 11947 samples
('/tmp/mseed_invalid.mseed',) {}
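When working through thousands of files, the same idea can be turned into a batch report keyed by file name. A minimal sketch, assuming the goal is only to find which files trigger the warning: `find_problem_files` is a hypothetical helper, not part of ObsPy, and matching on the substring "Not a SEED" assumes the message text stays as quoted above. The `reader` parameter defaults to `obspy.read` and exists only so the function can be exercised without real data.

```python
import warnings
from typing import Callable, Dict, Iterable, List, Optional


def find_problem_files(
    paths: Iterable[str],
    reader: Optional[Callable] = None,
    pattern: str = "Not a SEED",
) -> Dict[str, List[str]]:
    """Read each path and map it to the warning messages containing
    ``pattern`` that were emitted while reading it."""
    if reader is None:
        from obspy import read as reader  # imported lazily
    problems: Dict[str, List[str]] = {}
    for path in paths:
        # Record (and thereby silence) all warnings for this one file.
        with warnings.catch_warnings(record=True) as caught:
            warnings.simplefilter("always")
            reader(path)
        hits = [str(w.message) for w in caught if pattern in str(w.message)]
        if hits:
            problems[path] = hits
    return problems
```

Feeding it something like a sorted `glob.glob(...)` listing of your archive gives a dictionary of only the offending files, each with the raw warning text, instead of anonymous warnings scattered through the output.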