Use of MaskedArray vs use of multiple Traces

cpaitor · June 2, 2021, 9:32pm

Hi

When building a Stream object in which some of the traces have data gaps, numpy.ma.MaskedArray can be used. Another approach would be to split the traces that contain data gaps into multiple traces; presumably this would be more memory efficient (and make scripts faster).

The question then is if this would have any implications for the functioning of the methods available to the Stream object?

If the answer is No, what other benefits would warrant the use of numpy.ma.MaskedArray for traces assembled in a Stream object (excluding the obvious ease of book-keeping of the traces)?

cpaitor · June 3, 2021, 7:29am

A bit of testing indicates that the handling of masked arrays is not fully implemented (at least not in obspy 1.1.1). E.g. the write() method cannot handle masked arrays, and the remove_response() method seems to ignore the mask and return the data unmasked.

trichter · June 3, 2021, 7:43am

Yes, masked arrays are not handled in a consistent way. Note that you can easily convert between the two states with Stream methods split() and merge().

cpaitor · June 3, 2021, 8:08am

Ahh, great, I had missed the split() method. However, is it documented somewhere what works and what does not work with masked arrays, or is it left to trial-and-error (or plunging into the code base)?

trichter · June 3, 2021, 8:54am

I think it is not documented and is left to trial-and-error. First we have to think about when it could make sense to perform some operations on a masked array. E.g. if we think about a filter, we could fill the data array with the fill value, perform the operation on the underlying array, and mask the array again after the operation. But this is quite complicated, and it is probably better to raise an error for masked arrays (as is done with write) in such equivocal situations.
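
To illustrate the fill / operate / re-mask idea, here is a rough sketch using plain numpy. The helper apply_with_mask and the moving average standing in for a filter are both hypothetical and not part of obspy:

```python
import numpy as np


def apply_with_mask(data, func, fill_value=0.0):
    """Hypothetical helper: fill the gaps, apply func, restore the mask."""
    mask = np.ma.getmaskarray(data)           # boolean mask, True in gaps
    filled = np.ma.filled(data, fill_value)   # plain ndarray with gaps filled
    result = func(filled)
    return np.ma.masked_array(result, mask=mask)


def moving_average(x, width=3):
    """Simple stand-in for a filter; not an obspy filter."""
    kernel = np.ones(width) / width
    return np.convolve(x, kernel, mode="same")


# Constant signal with a two-sample gap
data = np.ma.masked_array(np.ones(10), mask=[0, 0, 0, 0, 1, 1, 0, 0, 0, 0])
out = apply_with_mask(data, moving_average)

print(out)
```

The mask survives, but the samples next to the gap (indices 3 and 6 here) are pulled towards the fill value, so the result depends on an arbitrary choice. That is the kind of ambiguity that makes raising an error attractive.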

cpaitor · June 3, 2021, 9:13am

Sure, the topic is quite complex.

However, it seems to me that gap handling is an integral part of data handling (this could of course be a problem exclusive to us, due to occasionally poor conditions in our data collection setup, but I sort of doubt it). If gap handling is left to trial-and-error in the software you use, the effort of handling data gaps increases substantially.

Perhaps it would be a good idea to add to the obspy tutorial a section on gap handling (with focus on what obspy could offer or not).

This is the second time today that I have suggested adding material to the obspy tutorial. I’d be glad to contribute (given some hints on how to), but the question is whether, given my limited understanding of the suggested topics, correcting my contribution would take more work than having someone who understands them write it from scratch.

trichter · June 4, 2021, 9:26am

You are right, the documentation (API docs or tutorial section) and possibly the code should be improved regarding gap handling.