Large miniSEED files, read in chunks?

I am very puzzled about reading large miniSEED files in “chunks”. The code example below works fine, but changing the chunksize has unexpected consequences.

When reading mseed files larger than 2^31 bytes (2 GiB), a buffered read (reading the file in chunks) is required. I have a question about the choice of chunksize, and I guess this is really a question about how mseed files are created. Here is the suggested code for reading:

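    import io

    import obspy

    reclen = 512                 # miniSEED record length in bytes
    chunksize = 100000 * reclen  # 51,200,000 bytes (~50 MB), a whole number of records

    with io.open("./TMDB.TM.mseed", "rb") as fh:
        while True:
            c = fh.read(chunksize)
            if not c:
                break  # end of file reached
            # Wrap the raw bytes in an in-memory file so obspy.read() can parse them
            with io.BytesIO(c) as buf:
                st = obspy.read(buf)
            # Do something useful!
            print(st)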

It works fine. Typically the stream (the variable st) contains a single trace, st[0], but sometimes it contains several traces. So I wondered whether a trace might be split because it straddled the end of one chunk and the beginning of the next.
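One thing worth ruling out first is a miniSEED record (rather than a trace) being cut in half at a chunk boundary. Assuming every record in the file has the same length (reclen = 512 in the code above; the post does not say this holds for the whole file), any chunksize that is a whole multiple of reclen keeps every record intact. The hypothetical helper below checks that and lists which record indices land in each chunk, which is handy for locating the records on either side of a boundary:

```python
def chunk_record_ranges(file_size, chunksize, reclen=512):
    """Return (first_record, last_record) index pairs, one per chunk.

    Assumes every record in the file is exactly `reclen` bytes long.
    """
    if chunksize % reclen != 0:
        raise ValueError(
            f"chunksize {chunksize} is not a multiple of reclen {reclen}: "
            "some records would be split between two chunks")
    if file_size % reclen != 0:
        raise ValueError("file size is not a whole number of records")
    records_per_chunk = chunksize // reclen
    n_records = file_size // reclen
    # The last chunk may hold fewer records than the others.
    return [(first, min(first + records_per_chunk, n_records) - 1)
            for first in range(0, n_records, records_per_chunk)]


# A 10-record file read 4 records at a time.
print(chunk_record_ranges(10 * 512, 4 * 512))  # [(0, 3), (4, 7), (8, 9)]
```

With the real file, the first argument would be os.path.getsize("./TMDB.TM.mseed") and the second 100000 * reclen.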

I experimented with reducing the chunksize by powers of 10 and got a surprise: the number of traces within each stream increased as the chunksize became smaller. I expected the opposite!

With a chunksize of 100000 records (100000 * reclen bytes), I was seeing traces about 5 days long (at 100 samples/s). As the chunksize decreased, so did the length of the traces (of course), but each chunk contained more traces, with gaps between them!
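A quick back-of-the-envelope check of those numbers, assuming a ~5-day trace roughly fills one 100000-record chunk: 5 days at 100 samples/s is 43.2 million samples stored in 51.2 MB, i.e. about 1.2 bytes per sample. That is plausible for compressed data (for example Steim encoding, although the post does not say which encoding the file uses). Under the same assumption, each tenfold reduction in chunksize should shrink the time span per chunk tenfold as well:

```python
reclen = 512                           # bytes per record, from the code above
sampling_rate = 100.0                  # samples per second, from the post
chunk0 = 100000 * reclen               # 51,200,000 bytes
samples0 = 5 * 86400 * sampling_rate   # ~5 days of data in one chunk (assumption)

bytes_per_sample = chunk0 / samples0
print(f"{bytes_per_sample:.2f} bytes per sample")


def hours_per_chunk(n_records, bytes_per_sample, reclen=512, rate=100.0):
    """Approximate time span (hours) covered by a chunk of n_records records."""
    return n_records * reclen / bytes_per_sample / rate / 3600


for n_records in (100000, 10000, 1000, 100):
    hours = hours_per_chunk(n_records, bytes_per_sample)
    print(f"{n_records:>6} records -> about {hours:g} hours per chunk")
```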

Can someone help me understand this? How exactly is a trace defined in miniSEED files? How can traces have different lengths depending on how they are read?

I thought I had answered this in “unable to read large mseed file” (obspy/obspy Pull Request #1419 by krischer on GitHub), but apparently I didn’t…

When reading miniSEED, neighboring records separated by subsample gaps (time tears smaller than one sample interval) are still concatenated at a low level, before the data is even turned into a Trace. If you read the file in chunks and a chunk ends at such a point, the records cannot be joined at that low level, and at the high level (Trace) the gap might already be too big to be merged. This would also explain why you get more Traces with a smaller chunk size: more chunks mean more boundaries that can land on such a tear.
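The mechanism described above can be illustrated with a toy model in plain Python. This is not ObsPy's actual reading code: the record layout, the 0.5-sample low-level tolerance and the 0.01-sample high-level tolerance are hypothetical values chosen only for illustration. Records whose start times are off the nominal grid by less than half a sample are joined within a chunk, but across a chunk boundary only the stricter rule applies, so every boundary that falls on such a tear adds a trace:

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """A contiguous run of samples: one record, or several joined records."""
    start: float  # start time in seconds
    npts: int     # number of samples
    fs: float     # sampling rate in Hz

    @property
    def next_expected(self) -> float:
        # Time at which the sample following this segment should start.
        return self.start + self.npts / self.fs


def concatenate(segments, tol_samples):
    """Join consecutive segments whose time tear is at most tol_samples * dt."""
    joined = []
    for seg in segments:
        if joined:
            last = joined[-1]
            tear = seg.start - last.next_expected
            if abs(tear) <= tol_samples / last.fs:
                last.npts += seg.npts  # keep the start time, extend the segment
                continue
        joined.append(Segment(seg.start, seg.npts, seg.fs))
    return joined


def count_traces(records, records_per_chunk, low_tol=0.5, high_tol=0.01):
    """Hypothetical model: tolerant joining inside a chunk, strict joining across chunks."""
    per_chunk = []
    for i in range(0, len(records), records_per_chunk):
        chunk = records[i:i + records_per_chunk]
        per_chunk.extend(concatenate(chunk, low_tol))  # "low level": one chunk only
    return len(concatenate(per_chunk, high_tol))       # "high level": across chunks


# Twelve made-up 4-second records at 100 Hz, each starting a few milliseconds
# (less than half a sample) off the nominal time grid.
OFFSETS_MS = [0, 2, 1, 3, 0, 4, 2, 1, 3, 4, 0, 2]
records = [Segment(4.0 * k + ms / 1000.0, 400, 100.0)
           for k, ms in enumerate(OFFSETS_MS)]

for rpc in (12, 6, 3, 2, 1):
    print(f"{rpc:>2} records per chunk -> {count_traces(records, rpc)} traces")
```

With these made-up offsets the script reports 1, 2, 4, 6 and 12 traces for chunks of 12, 6, 3, 2 and 1 records: smaller chunks give more traces, as observed in the question.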

Thanks for answering again. Once I discovered where this forum lives, I thought it would be good to have this question addressed here rather than in a GitHub issue, because it’s not really a bug.


Another thought: if you are curious about the start times of individual miniSEED records and want to check what is going on, you could use the obspy-mseed-recordanalyzer tool. Of course, with such ginormous files you’ll have an insane number of individual records in there, but maybe you want to look at a handful of records around one of the positions whose gappiness changes depending on chunk size…
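If obspy-mseed-recordanalyzer is not convenient, a similar check can be sketched with the Python standard library. The helpers below are hypothetical (not part of ObsPy) and make simplifying assumptions: fixed 512-byte records, miniSEED 2 fixed headers in big-endian byte order, and no handling of the header's time-correction field or of any microsecond offset in blockette 1001. For a handful of consecutive records, it prints the tear between each record's start time and the previous record's computed end time:

```python
import struct
from datetime import datetime, timedelta, timezone

# Fixed-header fields at bytes 20-35 (SEED 2.4), assumed big-endian:
# BTIME = year, day of year, hour, minute, second, unused, 1/10000 s;
# then number of samples, sample rate factor, sample rate multiplier.
HEADER = struct.Struct(">HHBBBBHHhh")


def sample_rate(factor, multiplier):
    """Nominal sample rate (Hz) from the SEED factor/multiplier pair."""
    if factor == 0 or multiplier == 0:
        return 0.0
    if factor > 0 and multiplier > 0:
        return float(factor * multiplier)
    if factor > 0:                      # multiplier < 0
        return -factor / multiplier
    if multiplier > 0:                  # factor < 0
        return -multiplier / factor
    return 1.0 / (factor * multiplier)  # both negative


def record_start_times(path, first_record, count, reclen=512):
    """Yield (index, start, npts, rate) for `count` records from `first_record`."""
    with open(path, "rb") as fh:
        fh.seek(first_record * reclen)
        for index in range(first_record, first_record + count):
            rec = fh.read(reclen)
            if len(rec) < 48:
                break  # end of file
            (year, doy, hour, minute, sec, _unused, tenk,
             npts, factor, mult) = HEADER.unpack_from(rec, 20)
            start = (datetime(year, 1, 1, tzinfo=timezone.utc)
                     + timedelta(days=doy - 1, hours=hour, minutes=minute,
                                 seconds=sec, microseconds=tenk * 100))
            yield index, start, npts, sample_rate(factor, mult)


def print_tears(path, first_record, count, reclen=512):
    """Print each record's start time and its tear relative to the previous record."""
    prev_end = None
    for index, start, npts, rate in record_start_times(path, first_record, count, reclen):
        if prev_end is not None and rate:
            tear = (start - prev_end).total_seconds()
            print(f"record {index}: {start.isoformat()} "
                  f"tear {tear:+.6f} s ({tear * rate:+.3f} samples)")
        prev_end = start + timedelta(seconds=npts / rate) if rate else None
```

For example, print_tears("./TMDB.TM.mseed", 99997, 6) would show the tears around the first chunk boundary when chunksize = 100000 * reclen (records 99999 and 100000 sit on either side of it).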