Reading corrupted(?) mseed

filefolder · November 16, 2020, 3:13am

Hi there, long shot, but I was wondering how I might use ObsPy to salvage some old miniSEED data that appears to have a borked blockette or header somewhere.

Reading it the usual way in ObsPy works, but the start time and end time (which should be one hour apart) are identical and it reports 0 samples. I can scan it with rdseed etc. and it shows the data is there, but seemingly broken into many small slices, and for whatever reason I can't get hold of it or get anywhere else with the IRIS programs.

I thought setting the station/location codes to “normal” values with msmod (instead of the values with a leading \x00 byte shown below) might fix it, but sadly it does not.

In [42]: ri = obspy.io.mseed.util.get_record_information('stc3070320230000.BHZ')

In [43]: ri
Out[43]:
{'filesize': 184320,
 'station': '\x00stc3',
 'location': '\x00',
 'channel': 'BHZ',
 'network': '7S',
 'npts': 2016,
 'activity_flags': 0,
 'io_and_clock_flags': 0,
 'data_quality_flags': 0,
 'time_correction': 0,
 'record_length': 4096,
 'samp_rate': 25.0,
 'starttime': 2007-03-20T23:00:00.218000Z,
 'endtime': 2007-03-20T23:01:20.818000Z,
 'byteorder': '>',
 'number_of_records': 45,
 'excess_bytes': 0}
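get_record_information reports header values from a single record (the endtime above is the end of the first record: 2015 sample intervals / 25 Hz = 80.6 s after the start), and 184320 bytes / 4096 = 45 matches number_of_records. To see the raw fixed-header fields of every record, including "beginning of data" and "first blockette", here is a minimal stdlib sketch. It assumes the standard SEED 2.4 layout of the 48-byte fixed header and the big-endian byte order reported above; the function and field names are made up for illustration.

```python
import struct

# SEED 2.4 fixed section of the data header: 48 bytes, here big-endian.
# 6s sequence number, c quality indicator, 1s reserved, 5s station,
# 2s location, 3s channel, 2s network, BTIME (H year, H day of year,
# B hour, B minute, B second, B unused, H 0.0001 s ticks), H number of
# samples, h rate factor, h rate multiplier, 4 x B (activity, I/O and
# clock, data quality flags, number of blockettes), i time correction,
# H beginning of data, H first blockette.
FIXED_HEADER = struct.Struct(">6sc1s5s2s3s2sHHBBBBHHhhBBBBiHH")
FIELDS = (
    "sequence", "quality", "reserved", "station", "location", "channel",
    "network", "year", "julday", "hour", "minute", "second", "unused",
    "ticks", "npts", "rate_factor", "rate_multiplier", "activity_flags",
    "io_clock_flags", "quality_flags", "n_blockettes", "time_correction",
    "data_offset", "first_blockette",
)


def parse_fixed_header(record: bytes) -> dict:
    """Decode the raw 48-byte fixed header, without any sanity checks."""
    return dict(zip(FIELDS, FIXED_HEADER.unpack_from(record, 0)))


def iter_headers(data: bytes, reclen: int = 4096):
    """Yield (byte_offset, header_dict) for every complete fixed-size record."""
    for offset in range(0, len(data) - reclen + 1, reclen):
        yield offset, parse_fixed_header(data[offset:offset + reclen])
```

For the problem file the interesting fields are n_blockettes, data_offset and first_blockette; according to the recordanalyzer output later in this thread they are all 0 in the first record.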

Here’s the file (new users can’t upload attachments, so: https://filebin.net/albkvy9g4h6xgfz4, 184 KB). If anyone has any luck or tips on how to fix this data, it would be highly appreciated.

admin · November 16, 2020, 4:01am

Guest users without an account can’t upload attachments. Anyway, here is the mseed file in case the link above expires: stc3070320230000.BHZ (180 KB)

megies · November 16, 2020, 9:18am

You could read your file record by record. This will be very slow, but you will be able to recover all data that is not in one of the broken records, since each miniSEED record (4096 bytes in your case) is completely self-contained and depends on nothing but its own header.

Something like…

from obspy import Stream, read

st_all = Stream()
with open(..., 'rb') as fh:
    while True:
        data = fh.read(4096)  # one 4096-byte record at a time
        if not data:
            break
        st = read(data, 'MSEED')
        st_all += st

You might have to wrap the raw bytes in a BytesIO before passing them to read() (not sure), or use a lower-level miniSEED reading routine that accepts bytes directly.

megies · November 16, 2020, 9:20am

If the file is broken in other ways, such that records don’t start exactly at multiples of your record length (4096 bytes) throughout the whole file, you’d have to write more sophisticated code that searches for the positions of record headers.
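One way such a search could look, as a heuristic sketch only: it assumes each record begins with a six-digit ASCII sequence number, a quality indicator (D, R, Q or M) and a reserved byte, and it filters candidates by a plausible year and day-of-year in the BTIME field. Accepting a NUL reserved byte and the year bounds are assumptions, the function name is hypothetical, and random sample bytes can still produce false positives.

```python
import re
import struct

# Zero-width lookahead so overlapping candidates are all found: six ASCII
# digits (sequence number), a quality indicator D/R/Q/M, then the reserved
# byte, normally a space (also accepting NUL is an assumption).
HEADER_START = re.compile(rb"(?=[0-9]{6}[DRQM][ \x00])")


def find_record_offsets(data: bytes, byteorder: str = ">") -> list:
    """Heuristically locate miniSEED fixed headers anywhere in ``data``.

    A candidate is kept only if the BTIME year and day-of-year (bytes 20-23
    of the header) are plausible; random sample bytes can still slip through.
    """
    offsets = []
    for match in HEADER_START.finditer(data):
        start = match.start()
        if start + 48 > len(data):
            break  # not enough bytes left for a full fixed header
        year, julday = struct.unpack_from(byteorder + "HH", data, start + 20)
        if 1950 <= year <= 2100 and 1 <= julday <= 366:
            offsets.append(start)
    return offsets
```

Differences between consecutive offsets other than the nominal record length would then point at where the file is damaged.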

filefolder · November 17, 2020, 2:44am

Thanks for the reply! I tried it on both the broken file and an equivalent working file from the same recorder (same size etc.), but both give “embedded null character in path” errors, which I think is just me not understanding something about Python and binary strings.

sliiiightly editing,

import obspy

st_all = obspy.Stream()
with open('stc3070320230000.BHZ', 'rb') as fh:
    while True:
        data = fh.read(4096)
        if not data:
            break
        # write each record to a RAM-backed temp file so read() gets a real path
        with open('/dev/shm/crap', 'wb') as f:
            f.write(data)
        st = obspy.read('/dev/shm/crap', 'MSEED')
        st_all += st

Which works! (Something like this would make a handy “emergency record-by-record” ObsPy function?) It gives me a complete stream with 45 traces in it, but all of those traces still have 0 samples, so I’m at a loss. I suspect I’m just going to have to hunt for the original pre-miniSEED data and rebuild this from scratch.

megies · November 17, 2020, 9:54am

Like I mentioned, I didn’t try that code, I was just giving an idea. You probably have to wrap the data in a BytesIO object…

...
data = fh.read(4096)
bytes_io = io.BytesIO(data)
st = read(bytes_io, format='MSEED')
...
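Putting the BytesIO idea into the full loop, a sketch (not tested against this particular file) that also keeps going past records that fail to parse, and flags records that parse but decode to no samples, which is what happens with this file. The helper names are hypothetical; only read(..., format="MSEED") and Stream come from ObsPy, and ObsPy is imported inside the function so the pure-Python splitting helper can be used without it.

```python
import io


def split_records(data: bytes, reclen: int = 4096):
    """Yield (byte_offset, BytesIO) for each complete fixed-length record."""
    for offset in range(0, len(data) - reclen + 1, reclen):
        yield offset, io.BytesIO(data[offset:offset + reclen])


def read_record_by_record(path: str, reclen: int = 4096):
    """Read a miniSEED file one record at a time.

    Returns (stream, bad_offsets): records that raise while parsing, or that
    parse but decode to no samples at all, are skipped and their byte
    offsets collected. ObsPy is imported here so split_records() above
    works without it.
    """
    from obspy import Stream, read

    with open(path, "rb") as fh:
        data = fh.read()

    stream, bad_offsets = Stream(), []
    for offset, buf in split_records(data, reclen):
        try:
            st = read(buf, format="MSEED")
        except Exception:  # one corrupt record should not abort the file
            bad_offsets.append(offset)
            continue
        if all(tr.stats.npts == 0 for tr in st):
            bad_offsets.append(offset)  # header parsed, but no samples decoded
            continue
        stream += st
    return stream, bad_offsets
```

Calling st, bad = read_record_by_record('stc3070320230000.BHZ') followed by st.merge() would then join whatever records were recovered into continuous traces.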

filefolder · November 18, 2020, 7:54am

I hadn’t thought to use the obspy-mseed-recordanalyzer script on these, but FWIW here’s what it shows. The broken file seems to be missing blockette 1000.

Here’s a “good” file, i.e. an example of what I would expect:

$ obspy-mseed-recordanalyzer EVA4060203230000.BHZ
FILE: EVA4060203230000.BHZ
Record Number: 0
Record Offset: 0 byte
Header Endianness: Big Endian

FIXED SECTION OF DATA HEADER
Sequence number: 1
Data header/quality indicator: D
Station identifier code: EVA4
Location identifier:
Channel identifier: SHZ
Network code: 7R
Record start time: 2006-02-03T23:00:00.103200Z
Number of samples: 2016
Sample rate factor: 25
Sample rate multiplier: 1
Activity flags: 0
I/O and clock flags: 0
Data quality flags: 0
Number of blockettes that follow: 1
Time correction: 0
Beginning of data: 64
First blockette: 48

BLOCKETTES
1000: Encoding Format: 1
Word Order: 1
Data Record Length: 12

CALCULATED VALUES
Corrected Starttime: 2006-02-03T23:00:00.103200Z

And here’s the problem file:

$ obspy-mseed-recordanalyzer stc3070320230000.BHZ
FILE: stc3070320230000.BHZ
Record Number: 0
Record Offset: 0 byte
Header Endianness: Big Endian

FIXED SECTION OF DATA HEADER
Sequence number: 1
Data header/quality indicator: D
Station identifier code: stc3
Location identifier:
Channel identifier: BHZ
Network code: 7S
Record start time: 2007-03-20T23:00:00.218000Z
Number of samples: 2016
Sample rate factor: 25
Sample rate multiplier: 1
Activity flags: 0
I/O and clock flags: 0
Data quality flags: 0
Number of blockettes that follow: 0
Time correction: 0
Beginning of data: 0
First blockette: 0

BLOCKETTES

CALCULATED VALUES
Corrected Starttime: 2007-03-20T23:00:00.218000Z
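The key difference between the two dumps is the blockette chain: the good file lists one blockette starting at byte 48 with data at byte 64, while the broken one lists none and gives 0 for both offsets. A stdlib sketch of walking that chain and decoding blockette 1000, assuming the SEED 2.4 layout (2-byte type, 2-byte offset of the next blockette, then encoding, word order and record-length exponent); the encoding table is partial and the function name is hypothetical. The exponent also explains “Data Record Length: 12” in the good file: 2^12 = 4096 bytes.

```python
import struct

# Partial table of blockette 1000 encoding codes (SEED 2.4).
ENCODINGS = {0: "ASCII", 1: "INT16", 3: "INT32", 4: "FLOAT32",
             5: "FLOAT64", 10: "STEIM1", 11: "STEIM2"}


def find_blockette_1000(record: bytes, byteorder: str = ">"):
    """Walk one record's blockette chain and decode blockette 1000.

    Returns None when the header lists no blockettes (as in the broken
    file) or when no blockette 1000 appears in the chain.
    """
    n_blockettes = record[39]  # "Number of blockettes that follow"
    offset = struct.unpack_from(byteorder + "H", record, 46)[0]  # "First blockette"
    seen = 0
    while offset and seen < n_blockettes and offset + 4 <= len(record):
        # Every blockette starts with its type and the offset of the next one.
        btype, next_offset = struct.unpack_from(byteorder + "HH", record, offset)
        if btype == 1000 and offset + 8 <= len(record):
            encoding, word_order, exponent = record[offset + 4:offset + 7]
            return {
                "offset": offset,
                "encoding": ENCODINGS.get(encoding, encoding),
                "word_order": "big" if word_order == 1 else "little",
                "record_length": 2 ** exponent,  # e.g. 12 -> 4096 bytes
            }
        offset, seen = next_offset, seen + 1
    return None
```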

With some googling I found Lion Krischer’s potentially lifesaving add_blockette_1000.py script (“Adding Blockette 1000 to MiniSEED files without one.” on GitHub), but I can’t quite get it to work:

$ ./add_blockette_1000.py --reclen 4096 --encoding INT16 stc3070320230000.BHZ test.mseed
Traceback (most recent call last):
  File "./add_blockette_1000.py", line 109, in <module>
    raise ValueError("Requires at least 8 bytes between fixed header "
ValueError: Requires at least 8 bytes between fixed header and beginning of data

I obviously know very little about working with binary data, and probably even less about hacking the miniSEED 2.4 data structure (if it even is 2.4? 2007?), so this is potentially the end of the line for me, unless I’m doing something wrong or it’s possible to cram in or overwrite an extra 8 bytes where they need to be. I would also be perfectly happy just being able to access the waveform data with no headers at all.
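Those 8 bytes can exist without moving anything, provided the samples really start at byte 64 as they do in the good file: a 48-byte fixed header plus an 8-byte blockette 1000 ends at byte 56, leaving bytes 48 to 63 free. A hypothetical standalone sketch of such a patch (not Krischer’s script), assuming big-endian 4096-byte records, INT16 samples (encoding 1, word order 1, exponent 12, as in the good file) and, crucially, that the data really starts at byte 64. If that last assumption is wrong, the samples will be decoded as garbage.

```python
import struct

FIXED_HEADER_LEN = 48   # SEED 2.4 fixed section of the data header
B1000_LEN = 8           # blockette 1000 is always 8 bytes


def add_blockette_1000(record: bytes, data_offset: int = 64, encoding: int = 1,
                       word_order: int = 1, reclen_exp: int = 12) -> bytes:
    """Return a copy of one big-endian record with a blockette 1000 added.

    data_offset is an ASSUMPTION about where the samples really start.
    The blockette is written into bytes 48-55, which must not hold samples.
    """
    if len(record) != 2 ** reclen_exp:
        raise ValueError("record length does not match reclen_exp")
    if record[39] != 0:
        raise ValueError("record already lists blockettes; not touching it")
    if data_offset < FIXED_HEADER_LEN + B1000_LEN:
        raise ValueError("no room for blockette 1000 before the data")
    patched = bytearray(record)
    patched[39] = 1  # number of blockettes that follow
    # Beginning of data (bytes 44-45) and first blockette (bytes 46-47).
    struct.pack_into(">HH", patched, 44, data_offset, FIXED_HEADER_LEN)
    # Type 1000, next blockette 0 (end of chain), encoding, word order,
    # record length exponent, reserved byte.
    struct.pack_into(">HHBBBB", patched, FIXED_HEADER_LEN,
                     1000, 0, encoding, word_order, reclen_exp, 0)
    return bytes(patched)


def patch_file(src: str, dst: str, reclen_exp: int = 12, **kwargs) -> int:
    """Patch every complete record of src into dst; returns the record count.

    A trailing partial record, if any, is silently dropped.
    """
    reclen = 2 ** reclen_exp
    count = 0
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        while len(record := fin.read(reclen)) == reclen:
            fout.write(add_blockette_1000(record, reclen_exp=reclen_exp, **kwargs))
            count += 1
    return count
```

For example, patch_file('stc3070320230000.BHZ', 'test.mseed') would write a patched copy, leaving the original untouched.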

As always, huge thanks in advance!

Ringler_Adam · November 18, 2020, 11:48pm

I spent just a couple of minutes playing with your data and didn’t see a quick solution. You could try to crack open the records, but the header doesn’t tell you which byte the data starts at within the record. When I force it, each record decodes to 0 samples, even though the headers list a nonzero sample count.

Time 2007,079,23:55:06.4582 Samples 2016 Factor 25 Mult 1 (25sps)
Blockettes 0 Correction 0 Data Start 0 First Block 0 Host Swap LE

Record 000043 (42) Type D Network 7S Station  Channel BHZ
Time 2007,079,23:56:27.0982 Samples 2016 Factor 25 Mult 1 (25sps)
Blockettes 0 Correction 0 Data Start 0 First Block 0 Host Swap LE

Record 000044 (43) Type D Network 7S Station  Channel BHZ
Time 2007,079,23:57:47.7382 Samples 2016 Factor 25 Mult 1 (25sps)
Blockettes 0 Correction 0 Data Start 0 First Block 0 Host Swap LE

Record 000045 (44) Type D Network 7S Station  Channel BHZ
Time 2007,079,23:59:08.3782 Samples 1296 Factor 25 Mult 1 (25sps)
Blockettes 0 Correction 0 Data Start 0 First Block 0 Host Swap LE
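The headers themselves at least look internally consistent, which suggests the timing information is trustworthy. At 25 sps (factor 25, multiplier 1) a full record spans 2016 / 25 = 80.64 s, so each start time above should be exactly that far after the previous one. A small check using the last four start times from the dump (day 079 of 2007 is March 20), plus the sample total, assuming every record before the last is full with 2016 samples as the headers shown suggest:

```python
from datetime import datetime, timedelta

RATE = 25.0  # sample rate factor 25, multiplier 1

# Start times (2007, day 079 = March 20) and sample counts of the last
# four records in the dump above.
DUMP = [
    ("23:55:06.458200", 2016),
    ("23:56:27.098200", 2016),
    ("23:57:47.738200", 2016),
    ("23:59:08.378200", 1296),
]


def record_gaps(rows, rate=RATE, day="2007-03-20"):
    """Seconds between each record's predicted successor start
    (start + npts / rate) and the actual start of the next record."""
    starts = [datetime.strptime(f"{day} {t}", "%Y-%m-%d %H:%M:%S.%f") for t, _ in rows]
    gaps = []
    for i in range(len(rows) - 1):
        predicted_next = starts[i] + timedelta(seconds=rows[i][1] / rate)
        gaps.append((starts[i + 1] - predicted_next).total_seconds())
    return gaps


print(record_gaps(DUMP))  # [0.0, 0.0, 0.0]: consecutive records are contiguous

# Assuming all 44 records before the last one are full (2016 samples each):
total = 44 * 2016 + 1296
print(total, total / RATE)  # 90000 3600.0, i.e. exactly one hour of data
```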

You might be better off trying to find the original data. You could force the data to have an offset (which is likely what the error in Krischer’s code is about), but I would be a bit worried that you would be decoding the data incorrectly.

Good luck with the data,
Adam

filefolder · November 19, 2020, 4:31am

Thanks Adam and Tobias! Comforting to know it’s likely well and truly cooked if I’m going to have to reprocess the whole network again.

filefolder · November 26, 2020, 6:58am

FYI, I finally managed to fix this by manually forcing the “beginning_of_data” variable to 64 after it is read in Krischer’s add_blockette_1000.py script. Due to the corruption, this field was always 0 (i.e. always read in as 0).

Not sure if there’s a sane way to catch that case in general, but the value should probably never be zero, and 64 appears to be a good guess if you ever find yourself in this weird mess.
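One rough way to sanity-check a guessed data offset for uncompressed data: the samples must fit in the record, so data_offset + npts × bytes_per_sample ≤ record length. With INT16 (2 bytes) and full 2016-sample records, 4096 − 2016 × 2 = 64, so 64 is exactly the offset at which a full record is completely filled, which is consistent with the guess. A sketch with hypothetical function names; it does not apply to Steim-compressed data, and the fit test alone can only rule offsets out (0 would also pass it).

```python
# Bytes per sample for uncompressed encodings only; Steim-compressed data has
# no fixed sample width, so this check does not apply to it.
SAMPLE_WIDTH = {"INT16": 2, "INT32": 4, "FLOAT32": 4, "FLOAT64": 8}


def implied_offset(npts_full: int, reclen: int = 4096, encoding: str = "INT16") -> int:
    """Data offset at which a completely filled record's samples would start."""
    return reclen - npts_full * SAMPLE_WIDTH[encoding]


def records_that_overflow(data_offset: int, npts_per_record, reclen: int = 4096,
                          encoding: str = "INT16") -> list:
    """Indices of records whose samples would not fit after data_offset."""
    width = SAMPLE_WIDTH[encoding]
    return [i for i, npts in enumerate(npts_per_record)
            if data_offset + npts * width > reclen]


print(implied_offset(2016))                              # 64
print(records_that_overflow(64, [2016] * 44 + [1296]))   # []
```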