Value of npts in trace header when using masked array (data with gaps)

Hi

I’m currently putting together a convenience function that wraps data into a Trace object so that I can later use it with obspy. A numpy.ma.MaskedArray can be used for data with gaps, but the obspy docs do not make clear what value to assign to the Trace.stats.npts attribute when a masked array is used, so hopefully someone can help with this. In principle the question boils down to:

If a numpy.ma.MaskedArray is used as the value of Trace.data, should Trace.stats.npts be set to the size of the underlying numpy.ndarray or to the number of unmasked samples in the numpy.ma.MaskedArray?

I think it should be set to the size of the underlying numpy.ndarray.

Though, you don’t need to set npts at all, it will be determined automatically by obspy.
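To make the two candidate values concrete, here is a minimal NumPy-only sketch. The array contents and the position of the gap are made up for illustration; len() gives the size of the underlying array, while count() gives the number of unmasked samples.

```python
import numpy as np

# Ten samples with a three-sample gap (values and gap position are illustrative).
samples = np.arange(10, dtype=float)
gap = np.zeros(10, dtype=bool)
gap[3:6] = True
data = np.ma.MaskedArray(samples, mask=gap)

total_samples = len(data)     # size of the underlying array, gap included
valid_samples = data.count()  # number of unmasked samples only

print(total_samples, valid_samples)
```

Following the reply above, the first value (the full length) is the one npts should hold, so that start time, sampling rate and end time stay consistent across the gap.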

I was also under the impression that one did not have to set npts explicitly. However, the following code:

import numpy as np
from obspy.core import trace, UTCDateTime
n = 1000
data = np.array(range(n))
# build the header as a Stats object (npts defaults to 0 here)
stats = trace.Stats()
stats.network = "X"
stats.station = "Y"
stats.channel = "Z"
stats.starttime = UTCDateTime(0)
stats.sampling_rate = 10
tr = trace.Trace(data=data,header=stats)
print(tr)
# set npts by hand and print again
tr.stats.npts = n
print(tr)

when run outputs:

X.Y..Z | 1970-01-01T00:00:00.000000Z - 1970-01-01T00:00:00.000000Z | 10.0 Hz, 0 samples
X.Y..Z | 1970-01-01T00:00:00.000000Z - 1970-01-01T00:01:39.900000Z | 10.0 Hz, 1000 samples

This indicates that one has to. Furthermore, if I try to plot the created trace before setting npts properly, I get the following error:

Python 3.6.9 (default, Jan 26 2021, 15:33:00) 
[GCC 8.4.0] on linux
>>> import obspy
>>> import numpy as np
>>> from obspy.core import trace, UTCDateTime
>>> print(obspy.__version__)
1.1.1
>>> n = 1000
>>> data = [i for i in range(n)]
>>> stats = trace.Stats()
>>> stats.network = "X"
>>> stats.station = "Y"
>>> stats.channel = "Z"
>>> stats.starttime = UTCDateTime(0)
>>> stats.sampling_rate = 10
>>> tr = trace.Trace(data=np.array(range(n)),header=stats)
>>> print(tr)
X.Y..Z | 1970-01-01T00:00:00.000000Z - 1970-01-01T00:00:00.000000Z | 10.0 Hz, 0 samples
>>> tr.plot()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3/dist-packages/obspy/core/trace.py", line 893, in plot
    waveform = WaveformPlotting(stream=self, **kwargs)
  File "/usr/lib/python3/dist-packages/obspy/imaging/waveform.py", line 218, in __init__
    self.title = kwargs.get('title', self.stream[0].id)
  File "/usr/lib/python3/dist-packages/obspy/core/stream.py", line 661, in __getitem__
    return self.traces.__getitem__(index)
IndexError: list index out of range

I see. The problem with your code is that header should be a dictionary; then it will work. The Stats object you pass in already has its npts attribute set (to 0), please check with print(stats).

Hmm, I played around with this a bit more since, if I understand it correctly, the attributes of a Stats object can also be accessed with dict-like syntax, e.g. the value of stats.npts can be accessed as stats['npts'].

So my conclusions then are:

When creating a new Trace object the argument header takes a dict as input and stores the input key:value pairs in a Stats object, is this correct?

It seems, though, that it is also possible to pass a Stats object as header, but in that case the npts attribute will not be updated with the actual size of the input data vector, is this correct?

The reason for this is that a Stats object always has an npts attribute (which defaults to 0), so npts in the trace metadata does not get updated. Similarly, if the dict passed as header has a key npts, then npts in the metadata of the created Trace object will be set to the value of this key and not to the size of the input data vector, is this correct?

All observations are correct. I think the npts entry in header (be it dict or Stats object) should probably be ignored by ObsPy if data is present. Note that you can create pure metadata traces. In this case it is important to set npts.

Sure, agree on this.

I guess getting all the nuances into the documentation would be quite a task, but would it be sensible to add a section to the obspy tutorial on creating a Trace object from scratch, including various potential pitfalls? I consider the topic of this post to be such a pitfall, as the documentation of Trace states that the header argument will accept a Stats object as input.