Decimation and writing mseed files - integer and float numbers

maryani · July 9, 2024, 12:59pm

Hi,
I have an issue decimating data that I convert from .wav files to mseed.
In short, after running the decimation function “stream.decimate” with “no_filter=False”, the data are converted from integers to floats, and I can no longer write my mseed file with STEIM2 encoding.

I read here: STEIM2 stream encoding · Issue #296 · obspy/obspy · GitHub
that changes to the original data (running a filter, for instance) “may change the data type”.

Well, in my case, applying the filter to avoid aliasing is important, so I need “no_filter=False”. If I set it to “True”, the data type remains integer, but that is not what I want.

I thought of changing the data type back to integer at the stream level after the decimation with this: stream[0].data = stream[0].data.astype(np.int32)

QUESTIONS:

1. Is the above line correct?
2. Is there another way to get integer data after decimation? I also tried “trace.decimate” and got the same issue.

Any help will be VERY MUCH appreciated!

Best, Marianna.

##########################################
A simplified version of my script:

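import numpy as np
from obspy import Stream, Trace
from scipy.io import wavfile  # assumed source of wavfile in the full script

# filename, stats and outdir are defined elsewhere in the full script
# Read .wav
samplerate, data = wavfile.read(filename)
# Data conversion from floats to integers
data = np.asarray(data * 2**23, dtype=np.int32)
# Stream
stream = Stream(Trace(data=data, header=stats))
# Decimation
stream.decimate(4, strict_length=False, no_filter=False)
# Data conversion again to integers
stream[0].data = stream[0].data.astype(np.int32)
# Write mseed
stream.write(outdir + filename[-34:-4] + ".mseed", format="MSEED")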
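
A note on the cast in the script above: astype(np.int32) drops the fractional part (truncation toward zero), whereas rounding to the nearest integer first keeps every sample within half a count of the filtered value. The following is only a minimal NumPy sketch with made-up sample values standing in for decimated data; it shows the difference but does not remove the quantization error itself.

```python
import numpy as np

# Made-up float64 samples standing in for filtered/decimated data.
filtered = np.array([0.9, -0.9, 1.5, -1.5, 2.49], dtype=np.float64)

# Plain cast: the fractional part is discarded (truncation toward zero).
truncated = filtered.astype(np.int32)

# Round to the nearest integer first (ties go to the even integer), then cast.
rounded = np.rint(filtered).astype(np.int32)

# Worst-case deviation from the float samples, in counts.
worst_truncated = np.max(np.abs(truncated - filtered))
worst_rounded = np.max(np.abs(rounded - filtered))

print(truncated.tolist())  # [0, 0, 1, -1, 2]
print(rounded.tolist())    # [1, -1, 2, -2, 2]
```

With truncation the error can approach one full count; with rounding it is at most half a count.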

Ringler_Adam · July 9, 2024, 1:20pm

It is always possible to save the data as miniSEED but use a different encoding. For example, you could use a 64-bit float encoding. This will produce a larger file but will avoid issues with forcing data to be integers.

maryani · July 9, 2024, 1:37pm

Hi Ringler_Adam,
Thank you for your reply.
I need the data as integers to be able to read these files with the SEISAN software for seismic analysis. Unfortunately, I think that converting them to a float encoding does not help me.
Cheers, Marianna.

megies · July 10, 2024, 8:15am

If SEISAN can’t handle miniSEED float encodings, then how about reading the original data in SEISAN and decimating there? Pretty much any processing done on the data will force you into float, and by forcing it back to integer you introduce arbitrary artifacts that you certainly don’t want to have. You could jump through some hoops, like multiplying by some large number before converting to make the loss of precision less impactful, but that means quite some data management burden, and anyway it will always be distorted data.
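
The "multiply by some large number" workaround could look roughly like the sketch below. The helper name to_int32_scaled, the scale factor of 1000 and the sample values are all hypothetical; the scale has to be recorded somewhere and undone by whoever reads the file, which is the data management burden mentioned above.

```python
import numpy as np


def to_int32_scaled(samples, scale):
    """Scale, round to the nearest integer and cast to int32.

    Hypothetical helper: raises instead of silently overflowing int32.
    """
    scaled = np.rint(np.asarray(samples, dtype=np.float64) * scale)
    limits = np.iinfo(np.int32)
    if scaled.min() < limits.min or scaled.max() > limits.max:
        raise ValueError("scaled data does not fit in int32")
    return scaled.astype(np.int32)


# Made-up float samples standing in for filtered data.
filtered = np.array([0.4, -0.4, 1.26, -2.74, 100.05])
scale = 1000  # arbitrary choice; must be stored alongside the data

counts = to_int32_scaled(filtered, scale)
restored = counts / scale
max_error = np.max(np.abs(restored - filtered))

print(counts.tolist())  # [400, -400, 1260, -2740, 100050]
```

The rounding error shrinks to at most half a count divided by the scale, but it never reaches zero, and too large a scale overflows int32.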

Hannah_Proffitt · July 16, 2024, 7:31am

Hello, I had a similar situation a while back and found this to work for my data.

from obspy import UTCDateTime

t = stream[0].stats.starttime
UTC = UTCDateTime(t)

stream.write(directory + 'filename' + UTC.strftime('.%Y_%m_%d') + '.mseed',
             format='MSEED', encoding='STEIM1', reclen=4096, filesize=262144,
             byteorder='<')

This sets up the files to be written continuously and saved exactly where you would like to find them later, with the naming convention of your preference.

maryani · July 16, 2024, 1:52pm

Thank you all for your replies.

Hannah_Proffitt, I will try your lines of code and see if they also work with my files.
Thank you for this hint, it looks promising!

Cheers, Marianna.

mth · November 14, 2024, 2:28pm

This solution will not help the OP.
Here are the allowed encodings from io/mseed/headers.py

# allowed encodings:
# id: (name, sampletype a/i/f/d, default NumPy type, write support)
ENCODINGS = {0: ("ASCII", "a", np.dtype("|S1").type, True),
             1: ("INT16", "i", np.dtype(np.int16), True),
             3: ("INT32", "i", np.dtype(np.int32), True),
             4: ("FLOAT32", "f", np.dtype(np.float32), True),
             5: ("FLOAT64", "d", np.dtype(np.float64), True),
             10: ("STEIM1", "i", np.dtype(np.int32), True),
             11: ("STEIM2", "i", np.dtype(np.int32), True),
             12: ("GEOSCOPE24", "f", np.dtype(np.float32), False),
             13: ("GEOSCOPE16_3", "f", np.dtype(np.float32), False),
             14: ("GEOSCOPE16_4", "f", np.dtype(np.float32), False),
             16: ("CDSN", "i", np.dtype(np.int32), False),
             30: ("SRO", "i", np.dtype(np.int32), False),
             32: ("DWWSSN", "i", np.dtype(np.int32), False)}

As you can see, you cannot use STEIM1 or STEIM2 encoding unless your data dtype is int32. As pointed out above, once you do any mathematical operation on your original/raw int32 data, you wind up with float64 and your options are:
1. truncate float to int (and accept weird artefacts)
2. devise some scaling/shifting algorithm to convert float to large int (e.g., move the decimal place several units to the right) (also not a great solution, and any subsequent filtering/rotation/instrument removal would need an additional conversion from float back to int)
3. keep dtype=float64 and accept larger mseed files on disk.
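
To make the dtype constraint concrete, here is a small sketch of choosing an encoding name from the array dtype before writing. The lookup table and the pick_encoding helper are hypothetical and not part of ObsPy; they only mirror the dtype column of the table above, with STEIM2 chosen for int32 as one possible preference.

```python
import numpy as np

# Hypothetical lookup derived from the dtype column of the ENCODINGS table.
DEFAULT_ENCODING_BY_DTYPE = {
    np.dtype(np.int16): "INT16",
    np.dtype(np.int32): "STEIM2",
    np.dtype(np.float32): "FLOAT32",
    np.dtype(np.float64): "FLOAT64",
}


def pick_encoding(data):
    """Return an encoding name matching the dtype of data, or raise TypeError."""
    dtype = np.asarray(data).dtype
    try:
        return DEFAULT_ENCODING_BY_DTYPE[dtype]
    except KeyError:
        raise TypeError(f"no miniSEED encoding chosen for dtype {dtype}") from None


raw = np.zeros(8, dtype=np.int32)      # raw counts
filtered = raw.astype(np.float64)      # stand-in for data after filtering

print(pick_encoding(raw))       # STEIM2
print(pick_encoding(filtered))  # FLOAT64
```

The returned name could then be passed as the encoding argument when writing, so that float64 data is never sent to a STEIM encoder by accident.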