Mseed write size

Hi all!

I am downloading waveform data with RoutingClient and want to save the individual traces in MiniSEED format. I simply use tr.write(filename, format='MSEED').

This way, one day of data for a single channel (200 Hz) comes to around 100 MB after downloading. That does not seem like a normal size; shouldn't it be around 15 MB per day?

What should I change, if there is anything to change?

Thanks!

Hi Blaz,

Can you show print(tr)? You can try tr._cleanup() to remove classic issues (e.g. overlaps)… That's just a guess.

Happy new year,

Fred

code:

st = rsClient.get_waveforms(network=net, station=sta, channel="HH*", starttime=startday, endtime=startday+86400)
st.merge(fill_value=0)
st.detrend('simple')
st.detrend('demean')

output:

SL.BOJS…HHE | 2020-12-30T23:59:58.139539Z - 2021-01-01T00:00:01.274539Z | 200.0 Hz, 17280628 samples

/Users/bvicic/anaconda3/envs/qtt/lib/python3.7/site-packages/obspy/io/mseed/core.py:790: UserWarning: The encoding specified in trace.stats.mseed.encoding does not match the dtype of the data.

A suitable encoding will be chosen.

warnings.warn(msg, UserWarning)

SL.BOJS…HHN | 2020-12-30T23:59:58.479538Z - 2021-01-01T00:00:02.074538Z | 200.0 Hz, 17280720 samples

SL.BOJS…HHZ | 2020-12-30T23:59:58.269539Z - 2021-01-01T00:00:01.024539Z | 200.0 Hz, 17280552 samples

._cleanup() does not make any difference. And by the way, it only works on a Stream, not on a Trace :)

Is it possible that st.merge(fill_value=0) mixes integers with something else and breaks the compression efficiency?

Well, I tried with 'interpolate' and 'last' and it's all the same.

Is the record very noisy? I think that would explain why the file is heavier than normal. Still, you mention that you write the individual traces separately, so I guess you checked whether HHE is heavier than the others. I would then try to slice the trace into several pieces and write them to separate files, to see whether there is any problematic time period …

Ah, no, no. All of the individual trace files are around 150 MB, and it is the same for other stations. So one day of data on 3 channels is 450 MB.

2 comments:

  • you can save the MiniSEED data directly to disk, which avoids reading/writing with ObsPy (which loses some record-level metadata), see https://docs.obspy.org/master/packages/autogen/obspy.clients.fdsn.client.Client.get_waveforms.html?highlight=filename
  • as soon as you do any (pre-)processing you are no longer looking at integer values but at floating point numbers. MiniSEED has good compression for the original raw integer data, but floating point data is basically written without compression (this is also why the warning message shows up: the data was read from an integer encoding and has to be saved in a different encoding, one for floats). This is what causes the larger file size. I would recommend just saving and keeping the raw data (see above) and doing those pre-processing steps ad hoc in your processing workflows.
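
To put rough numbers on the second comment, here is a back-of-the-envelope sketch in plain Python. It assumes the processed data ends up as 64-bit floats (8 bytes per sample, which is what the detrend steps are expected to produce) and it ignores MiniSEED record headers, so real files will be somewhat larger. The 17 MB raw file size is the one reported later in the thread, and the sample values in counts are made up for illustration.

```python
# One day of one channel at 200 Hz, as in the thread.
SAMPLING_RATE_HZ = 200
SECONDS_PER_DAY = 86400
n_samples = SAMPLING_RATE_HZ * SECONDS_PER_DAY


def payload_mb(bytes_per_sample):
    """Size of the sample payload in MB (10**6 bytes), headers ignored."""
    return n_samples * bytes_per_sample / 1e6


# Uncompressed floating point payloads (assumed sample widths).
float64_mb = payload_mb(8)
float32_mb = payload_mb(4)

# Bytes per sample implied by a raw (compressed integer) file of ~17 MB.
raw_bytes_per_sample = 17e6 / n_samples

# Removing the mean turns integer counts into floats (made-up values).
counts = [1203, 1198, 1210, 1195]
mean = sum(counts) / len(counts)
demeaned = [c - mean for c in counts]

print(f"samples per day: {n_samples}")
print(f"float64 payload: {float64_mb:.1f} MB")
print(f"float32 payload: {float32_mb:.1f} MB")
print(f"raw file: about {raw_bytes_per_sample:.2f} bytes per sample")
print(f"demeaned values: {demeaned}")
```

Under these assumptions the float64 payload comes to roughly 138 MB per channel per day, the same order as the 100 to 150 MB files reported above, while the raw file needs only about one byte per sample.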

Sadly this does not work, since RoutingClient does not support filename.

I changed the code to use the classic Client with the ORFEUS data provider (before, I used eida-routing). The problem is now solved! Indeed, the MiniSEED files are now around 17 MB each!
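
For anyone landing here later, the approach that worked might look roughly like the sketch below: request each channel from a regular FDSN Client and let it write the raw MiniSEED response straight to disk via filename. The function name download_raw_day, the output file naming, the "*" location code and the example date are assumptions for illustration and are not taken from the original code.

```python
def download_raw_day(net, sta, day, channels=("HHE", "HHN", "HHZ"),
                     provider="ORFEUS"):
    """Save one day of raw MiniSEED per channel; return the file names.

    Hypothetical helper: `day` is an ISO date string such as "2020-12-31".
    """
    # ObsPy is imported lazily so that importing this module stays cheap.
    from obspy import UTCDateTime
    from obspy.clients.fdsn import Client

    client = Client(provider)
    start = UTCDateTime(day)
    written = []
    for cha in channels:
        # Assumed naming scheme: one file per channel and day.
        filename = f"{net}.{sta}.{cha}.{day}.mseed"
        # With `filename` set, the server response is written to disk as-is,
        # so the data keeps whatever encoding the data centre delivers.
        client.get_waveforms(network=net, station=sta, location="*",
                             channel=cha, starttime=start,
                             endtime=start + 86400, filename=filename)
        written.append(filename)
    return written


# Example call (needs network access):
# download_raw_day("SL", "BOJS", "2020-12-31")
```

Detrending and other pre-processing can then be applied after reading the files back, without writing the processed floats to disk.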

Thank you for your help. Maybe this could be added to the documentation. I didn't know it works like that.

You could send a PR to allow saving directly to disk with filename on RoutingClient, if you think it would be useful for other people.