I am puzzled by what happens when I read large miniSEED files in “chunks”. The code example below works fine, but changing the chunksize has unexpected consequences.
Reading miniSEED files larger than 2^31 bytes (2 GiB) requires a buffered read. My question is about the choice of chunksize, and I suspect it is really a question about how miniSEED files are created. Here is the suggested code for reading:
```python
import obspy
import io

reclen = 512
chunksize = 100000 * reclen  # Around 50 MB

with io.open("./TMDB.TM.mseed", "rb") as fh:
    while True:
        with io.BytesIO() as buf:
            c = fh.read(chunksize)
            if not c:
                break
            buf.write(c)
            buf.seek(0, 0)
            st = obspy.read(buf)
        # Do something useful!
        print(st)
```
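For reference, the chunking in the loop above can be expressed as a small generator that makes the arithmetic explicit: each chunk holds a whole number of records, so as long as every record really is `reclen` bytes, no record is split between two chunks. This is a minimal stdlib-only sketch; `iter_record_chunks` is a hypothetical name, and the fixed 512-byte record length is an assumption carried over from the code above.

```python
import io
from typing import BinaryIO, Iterator


def iter_record_chunks(fh: BinaryIO, reclen: int = 512,
                       records_per_chunk: int = 100_000) -> Iterator[bytes]:
    """Yield successive chunks that each hold a whole number of records.

    Hypothetical helper: assumes every record in the file is exactly
    `reclen` bytes and that fh.read returns full chunks (as it does for
    regular files), so chunk boundaries fall on record boundaries.
    """
    chunksize = records_per_chunk * reclen
    while True:
        chunk = fh.read(chunksize)
        if not chunk:
            break
        # The last chunk may be shorter, but it should still hold whole records.
        if len(chunk) % reclen != 0:
            raise ValueError(
                f"chunk of {len(chunk)} bytes is not a multiple of {reclen}; "
                "the file may contain records of another length")
        yield chunk


# Usage with an in-memory stand-in for a file of ten 512-byte records.
fake_file = io.BytesIO(bytes(10 * 512))
sizes = [len(c) for c in iter_record_chunks(fake_file, records_per_chunk=4)]
print(sizes)
```

Each chunk can then be wrapped in io.BytesIO and passed to obspy.read, as in the original loop. The length check only catches a file whose total size is not a multiple of reclen; if the file actually uses a larger record length, a boundary could still fall inside a record without triggering it.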
Typically the stream (the variable ‘st’) contains a single trace, st[0], but sometimes it contains several. So I wondered whether a trace was being truncated because it was split across the end of one chunk and the beginning of the next.
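To check the truncation idea, it helps to look at the records directly. In miniSEED 2, each record begins with a 48-byte fixed header carrying its own network, station, location and channel codes, start time, sample count and sample-rate factor/multiplier, so every record is self-contained, and a reader such as ObsPy assembles traces by joining records of the same channel that are contiguous in time. The following is a minimal stdlib-only sketch, assuming miniSEED 2 records of a fixed length with big-endian headers (little-endian files exist; implausible years are a common sign of the wrong byte order) and ignoring the header's time-correction field. `parse_fixed_header`, `list_records`, `report_breaks`, `fake_record`, the half-sample tolerance and the `XX.STA..HHZ` codes are all hypothetical, for illustration only.

```python
import struct
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List

# Fixed section of a miniSEED 2 data header (48 bytes), big-endian assumed.
FIXED_HEADER = struct.Struct(">6scx5s2s3s2sHHBBBxHHhhBBBBiHH")


@dataclass
class RecordInfo:
    seed_id: str          # NET.STA.LOC.CHA
    start: datetime       # record start time (time correction ignored)
    nsamples: int
    sampling_rate: float  # samples per second

    @property
    def end(self) -> datetime:
        """Time just after the last sample, where a contiguous next record starts."""
        if not self.sampling_rate:
            return self.start
        return self.start + timedelta(seconds=self.nsamples / self.sampling_rate)


def sample_rate(factor: int, multiplier: int) -> float:
    """Decode the SEED sample-rate factor/multiplier pair into samples per second."""
    if factor == 0:
        return 0.0
    rate = float(factor) if factor > 0 else -1.0 / factor
    if multiplier > 0:
        rate *= multiplier
    elif multiplier < 0:
        rate /= -multiplier
    return rate


def parse_fixed_header(record: bytes) -> RecordInfo:
    (_seq, _quality, sta, loc, cha, net, year, day, hour, minute, sec,
     fract, nsamp, factor, mult, *_rest) = FIXED_HEADER.unpack_from(record)
    start = datetime(year, 1, 1) + timedelta(
        days=day - 1, hours=hour, minutes=minute, seconds=sec,
        microseconds=fract * 100)  # fract is in units of 0.0001 s
    seed_id = ".".join(part.decode("ascii").strip(" \x00")
                       for part in (net, sta, loc, cha))
    return RecordInfo(seed_id, start, nsamp, sample_rate(factor, mult))


def list_records(data: bytes, reclen: int = 512) -> List[RecordInfo]:
    """Parse the fixed header of every `reclen`-byte record in `data`."""
    return [parse_fixed_header(data[off:off + reclen])
            for off in range(0, len(data) - reclen + 1, reclen)]


def report_breaks(records: List[RecordInfo]) -> List[str]:
    """List places where a record does not continue the previous one in file order."""
    breaks = []
    for prev, cur in zip(records, records[1:]):
        # Assumed tolerance: half a sample, to absorb timing rounding.
        tol = 0.5 / prev.sampling_rate if prev.sampling_rate else 0.0
        gap = (cur.start - prev.end).total_seconds()
        if cur.seed_id != prev.seed_id or abs(gap) > tol:
            breaks.append(f"{prev.seed_id} ends {prev.end}, "
                          f"next {cur.seed_id} starts {cur.start}")
    return breaks


def fake_record(start: datetime, nsamp: int, reclen: int = 512) -> bytes:
    """Build a header-only demo record at 100 samples/s (no data payload)."""
    header = FIXED_HEADER.pack(
        b"000001", b"D", b"STA  ", b"  ", b"HHZ", b"XX",
        start.year, start.timetuple().tm_yday, start.hour, start.minute,
        start.second, start.microsecond // 100, nsamp, 100, 1,
        0, 0, 0, 0, 0, 48, 0)
    return header.ljust(reclen, b"\x00")


t0 = datetime(2024, 1, 1)
data = (fake_record(t0, 400)
        + fake_record(t0 + timedelta(seconds=4), 400)
        + fake_record(t0 + timedelta(seconds=60), 400))  # deliberate gap
records = list_records(data)
for rec in records:
    print(rec.seed_id, rec.start, rec.end)
for line in report_breaks(records):
    print("break:", line)
```

Applied to a real chunk, the printed start and end times show where one record's end does not match the next record's start in file order, which is where a new trace would be expected to begin.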
I experimented with reducing the chunksize by powers of 10, and to my surprise the number of traces in each stream increased as the chunksize became smaller. I expected the opposite!
With chunksize = 100000 * reclen I was seeing traces about 5 days long (at 100 samples/s). As the chunksize decreased, the traces got shorter (of course), but each chunk contained more of them, and there are gaps between the traces.
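The numbers above also give a rough sense of scale. Assuming the "about 5 days" figure, a constant 100 samples/s, and that the records in a chunk are contiguous in time, 100000 records per chunk implies roughly 432 samples per 512-byte record, and the longest trace a single chunk can produce shrinks in proportion to the chunk. This is back-of-the-envelope arithmetic only: with compressed data (e.g. Steim) the number of samples per record varies with the signal. The helper names are hypothetical.

```python
SECONDS_PER_DAY = 86_400


def samples_per_record(records: int, span_days: float, rate_hz: float) -> float:
    """Average samples per record implied by `records` contiguous records spanning `span_days`."""
    return span_days * SECONDS_PER_DAY * rate_hz / records


def chunk_span_hours(records_per_chunk: int, samples_per_rec: float,
                     rate_hz: float) -> float:
    """Approximate upper bound on the length of a trace read from one chunk, in hours."""
    return records_per_chunk * samples_per_rec / rate_hz / 3600


spr = samples_per_record(records=100_000, span_days=5, rate_hz=100)
print(f"about {spr:.0f} samples per record")
for n in (100_000, 10_000, 1_000, 100):
    hours = chunk_span_hours(n, spr, 100)
    print(f"{n:>7} records per chunk -> at most about {hours:.2f} h per trace")
```

Because each chunk is read independently, no trace can be longer than the time covered by the samples in its own chunk, which is why shorter traces follow from smaller chunks; the growing number of traces per chunk is the part this arithmetic does not explain.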
Can someone help me understand this? How exactly is a trace defined in miniSEED files? How can the traces have different lengths depending on how the file is read?