Trouble with older data

Charles_Ammon · July 8, 2014, 7:10pm

I am having trouble accessing some older waveforms from the IRIS archive. The script below throws exceptions that seem to point to a problem with the SEED encoding handling. Can anyone see what I am missing?

#!/usr/bin/python

megies · July 9, 2014, 9:16am

Hi Charles,

indeed, these data are encoded with an old encoding type that ObsPy does not support so far. I have opened a pull request to track development on this: "miniseed: support for (legacy) SRO encoding", obspy/obspy Pull Request #835 on GitHub.

As this might take a while and will be included in 0.10.0 at the earliest, a quick workaround for now is to fall back to IRIS's "old" timeseries web service, which also serves data in converted formats (ascii, sac, ...):

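from obspy import read
from obspy.iris import Client  # IRIS client of ObsPy 0.9.x
client = Client()
client.timeseries('AS', 'CTAO', '--', 'LHZ', tstart, tstart + 100,  # tstart as in your script
                  output="ascii", filename="data.ascii")
st = read("data.ascii")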

Of course you could use something like StringIO to avoid going via local
files.

best,
Tobias

Joachim_Saul · July 9, 2014, 10:34am

Tobias Megies [07/09/2014 11:16 AM]:

as a quick workaround for now you can simply fall back to
IRIS's "old" timeseries web service

A much easier hotfix is to just comment out two lines in obspy/mseed/core.py, in my case lines 213 and 215:

212 info = util.getRecordInformation(mseed_object, endian=bo)
213 # info['encoding'] = ENCODINGS[info['encoding']][0]
214 # Only keep information relevant for the whole file.
215 info = { # 'encoding': info['encoding'],
216 'filesize': info['filesize'],

Charles, in your case the line numbers may differ; have a look at your backtrace.

The result...

>>> from obspy import read
>>> st = read("test.mseed")
>>> print(st[0].stats)
          network: AS
          station: CTAO
         location:
          channel: LHZ
        starttime: 1982-01-12T01:40:48.600000Z
          endtime: 1982-01-12T03:55:11.600000Z
    sampling_rate: 1.0
            delta: 1.0
             npts: 8064
            calib: 1.0
          _format: MSEED
            mseed: AttribDict({u'dataquality': u'M', 'record_length': 4096, 'filesize': 16384, 'number_of_records': 4, 'byteorder': u'>'})
>>> print(st[0].data)
[ 126 67 -11 ..., 496 1243 1870]
>>>

... looks like it worked. This is not particularly surprising, as ObsPy uses libmseed as its backend, which includes support for DE_SRO.
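For reference, SRO is a 16-bit gain-ranged format. As far as I understand the SEED manual and libmseed's decoder, each word holds a 4-bit gain-range code in bits 12-15 and a 12-bit two's-complement mantissa in bits 0-11, and the sample value is mantissa * 2**(10 - gain range). The sketch below decodes a big-endian SRO payload under that assumption; it is an illustration, not libmseed's code, and `decode_sro` is a hypothetical name. Please check the bit layout against the SEED manual before relying on it.

```python
import struct
from typing import List


def decode_sro(payload: bytes, num_samples: int) -> List[int]:
    """Decode big-endian SRO gain-ranged samples (sketch).

    Assumed layout per 16-bit word: bits 12-15 = gain range code,
    bits 0-11 = two's-complement mantissa; value = mantissa * 2**(10 - gain).
    """
    if len(payload) < 2 * num_samples:
        raise ValueError("payload too short for the requested sample count")
    words = struct.unpack(">%dH" % num_samples, payload[:2 * num_samples])
    samples = []
    for word in words:
        mantissa = word & 0x0FFF
        gain_range = (word & 0xF000) >> 12
        # Sign-extend the 12-bit mantissa.
        if mantissa > 0x07FF:
            mantissa -= 0x1000
        exponent = 10 - gain_range
        # Gain range codes above 10 would give a negative exponent.
        if not 0 <= exponent <= 10:
            raise ValueError("invalid SRO gain range code: %d" % gain_range)
        samples.append(mantissa * (1 << exponent))
    return samples
```

The explicit range check rejects words whose gain-range code cannot be valid, rather than silently producing fractional values.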

HTH

Cheers,
Joachim

LionKrischer · July 9, 2014, 10:50am

Hi all,

here is a proper fix for those that really need it right now:
https://github.com/obspy/obspy/commit/d9736fc7f119b85ca7c87a074fc1e3c5f5078566

It would be really interesting to get some MiniSEED files in non-standard encodings so we can test if they work with ObsPy.

Right now we do not support MiniSEED files with the following encodings, but in most cases support would be trivial to add:

* 24 bit integers
* GEOSCOPE Multiplexed Format 24 bit integer
* GEOSCOPE Multiplexed Format 16 bit gain ranged, 3 bit exponent
* GEOSCOPE Multiplexed Format 16 bit gain ranged, 4 bit exponent
* US National Network compression
* CDSN 16 bit gain ranged
* Graefenberg 16 bit gain ranged
* IPG - Strasbourg 16 bit gain ranged
* STEIM (3) Compression
* HGLP Format
* DWWSSN Gain Ranged Format
* RSTN 16 bit gain ranged
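For orientation, each of these encodings has a numeric code in SEED, stored in the encoding field of Blockette 1000. The table below lists the codes as given in the SEED v2.4 manual, to my knowledge; double-check against the manual. `SEED_ENCODINGS` and `encoding_name` are hypothetical helpers, not ObsPy's internal ENCODINGS table.

```python
from typing import Dict

# SEED data encoding format codes (Blockette 1000), per the SEED v2.4 manual.
SEED_ENCODINGS: Dict[int, str] = {
    0: "ASCII text",
    1: "16 bit integers",
    2: "24 bit integers",
    3: "32 bit integers",
    4: "IEEE floating point",
    5: "IEEE double precision floating point",
    10: "STEIM (1) Compression",
    11: "STEIM (2) Compression",
    12: "GEOSCOPE Multiplexed Format 24 bit integer",
    13: "GEOSCOPE Multiplexed Format 16 bit gain ranged, 3 bit exponent",
    14: "GEOSCOPE Multiplexed Format 16 bit gain ranged, 4 bit exponent",
    15: "US National Network compression",
    16: "CDSN 16 bit gain ranged",
    17: "Graefenberg 16 bit gain ranged",
    18: "IPG - Strasbourg 16 bit gain ranged",
    19: "STEIM (3) Compression",
    30: "SRO Format",
    31: "HGLP Format",
    32: "DWWSSN Gain Ranged Format",
    33: "RSTN 16 bit gain ranged",
}


def encoding_name(code: int) -> str:
    """Describe an encoding code without failing on unknown values."""
    # dict.get() returns a fallback string instead of raising KeyError
    # for codes that are not in the table.
    return SEED_ENCODINGS.get(code, "unknown encoding (%d)" % code)
```

For example, `encoding_name(30)` gives "SRO Format", the encoding of the CTAO data in this thread.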

So if someone has a file, please let us know!

Cheers!

Lion

Ringler_Adam · July 14, 2014, 2:06pm

Hello Lion,

We have some CDSN format data, some HGLP, and some DWWSSN data. Let me know how much of it you need or if you would like anything specific. I can put it on an FTP server for you.

Best,
Adam

LionKrischer · July 14, 2014, 2:56pm

Hi Adam,

we already managed to find some CDSN and DWWSSN encoded data. The HGLP-encoded data, on the other hand, would be very interesting, ideally along with the same data in some other format so we can test that decoding works correctly. Do you also happen to have some documentation on the HGLP data encoding or some code that can read/write it?

You can follow the current state of MiniSEED encoding support in ObsPy here:
https://github.com/obspy/obspy/pull/835

We are still missing test data in the following exotic encodings:

* GEOSCOPE Multiplexed Format 24 bit integer
* GEOSCOPE Multiplexed Format 16 bit gain ranged, 3 bit exponent
* 24 bit integers
* US National Network compression
* Graefenberg 16 bit gain ranged
* IPG - Strasbourg 16 bit gain ranged encoding
* STEIM 3
* RSTN 16 bit gain format

If someone has access to data in any of these formats, please let us know.

Thanks a lot!

Lion

Ringler_Adam · July 14, 2014, 3:42pm

Hello Lion,

Bob Hutt was able to recall the data format and provided me with a copy of its description. I have put this, along with two example data sets, at the following location: ftp://aslftp.cr.usgs.gov/pub/users/aringler/HGLPdata.tar

I have also included the LISS programs we use to read this data (I don’t know whether they will compile for you). The different format types are described in LISS-utils-2.0.0/src/libs/dcc_seed.

* HG*.seed are the SEED files.
* BlockInfo* are the blockette summaries and what they contain.
* ALQ_1973_088 and CTA_1976_090 are two example days of data in BDF format (essentially ASCII).

Let me know if you have any questions or want any other data.

Best,
Adam

LionKrischer · July 14, 2014, 4:22pm

Hello Adam,

thanks a lot for everything! Unfortunately, the data you uploaded contain SRO-encoded data records, as the BlockInfo files also state. According to the source, the LISS program cannot read HGLP-encoded data either, but it can read “US National network encoding” and “STEIM 3”, neither of which libmseed and ObsPy can deal with, I believe. Do you by chance have test data in either of those encodings?

What is the license of the LISS package? The included steim123 library is GPL, but Steim3 is documented well enough that we could roll our own, as we require an LGPL-compatible license. For the rest of the package, the license appears to be unspecified.

All the best,

Lion

Chad_Trabant · July 14, 2014, 4:36pm

Hi Lion and others,

I have never actually seen any HGLP-encoded samples; all of the HGLP project data I have ever seen was in SRO encoding (and that’s why libmseed does not support HGLP encoding).

I have similarly never seen Steim3 encoded data “in the wild” (thus no libmseed support).

I believe all of the US Nat. Net. encoded data has been replaced at the DMC with Steim compressed data (the USNN encoded data had errors), so that encoding is basically unused as far as we are concerned.

If you find any data using encodings that libmseed cannot read, please let me know; I basically implemented everything I could find, as if I were a user accessing data at various data centers.

cheers,
Chad

LionKrischer · July 15, 2014, 7:15am

Hi Chad,

thanks for the clarification.

With the latest changes, ObsPy now makes use of all of libmseed's encoding support, so that should be fine.

To guard against regressions it would be nice if we had test files with the following two encodings:

* GEOSCOPE Multiplexed Format 24 bit integer
* GEOSCOPE Multiplexed Format 16 bit gain ranged, 3 bit exponent

Otherwise we just trust that they work like the other encodings.

Cheers and thanks everyone!

Lion

Ringler_Adam · July 15, 2014, 6:29pm

Hello Lion,

Bob Hutt double checked the HGLP data and all of it has been encoded in SRO (there is likely no HGLP miniseed data that is encoded as anything other than SRO).

The dumpseed program is public domain (it was written under the USGS):

Public Domain Software by Scott Halbert - Allied Signal Technical
Services Corporation under contract to the Albuquerque Seismological
Laboratory - United States Geological Survey - Department of Interior,
United States, North American Continent, Earth/Sol 3, Sagitarius Arm of
Milky Way Galaxy.

We don’t have any examples of Steim 3 or US Nat. Net. (IRIS would likely have it if we did).

Best,
Adam