I tried reading in chunks, but the order and length of the traces are not the same as in the original file. Is there any way to load the mseed file as a whole, at once, with the order and length of the traces preserved? My system has 64 GB of memory, so I don't think memory usage will be an issue.
The ordering of the records in the mseed file just depends on how the acquisition software received and wrote them; it doesn't carry any meaning. To be honest, I would just clean up the data and split it into multiple files, one per channel and per day. That makes more sense to me than working insanely hard to get around all kinds of issues with a single gigantic file.
The following is absolutely not the fastest or most efficient way to do it, but it takes about 2 minutes to implement:
import io

from obspy import read

record_size = 512  # or 1024, 2048 or whatever it is

with open("my_insanely_big_mseed_file", "rb") as fh:
    while True:
        # read one fixed-size MiniSEED record
        data = fh.read(record_size)
        if not data:
            break
        # parse only the header to get the trace id and start time
        bio = io.BytesIO(data)
        tr = read(bio, format='MSEED', headonly=True)[0]
        # generate SDS-style filename, one file per channel per day
        # (julian day zero-padded to three digits, as in the SDS naming scheme)
        t = tr.stats.starttime
        out_filename = f'{tr.id}.D.{t.year}.{t.julday:03d}'
        with open(out_filename, "ab") as out:  # append if the file already exists
            out.write(data)