Reading a SEG2 file format

Oliver14 · December 7, 2020, 5:47pm

Good afternoon, I’m Angel Matos from Peru. Here, processing seismological data using Python and Obspy is not very widespread. I am currently working on my undergraduate thesis, and part of the workflow is to read SEG2 files recorded with an OYO seismograph (OYO Corp from Japan). I have a few queries and I hope you can help me.

a) With Obspy, I can read files with the .dat extension (from a Geometrics seismograph), but I can’t read files with the .sg2 extension (from an OYO seismograph). The screenshot below, taken from the SeisImagerSW Manual, explains the extensions in more detail.

The thing is that I don’t really want to use SeisImager. For my undergraduate thesis, I would like to read seismological data using Obspy, so is there a way I can read files with the .sg2 extension?
Example of my code in Python (Python 3.8.5, Obspy 1.2.2, and working with JupyterNotebook on an environment created with Anaconda)
from obspy import read
st = read(pathname_or_url="filename_1.dat")  # this works
st = read(pathname_or_url="filename_2.sg2")  # this gives a long traceback ending with KeyError: 'SAMPLE_INTERVAL'

b) If that’s not possible, is there a way to convert .sg2 to .dat? Is there any Python code for converting .sg2 to .dat? I need to automate the process (I work with many files).

c) Could you point me to a bibliography or a standard guide on how to work with these files? While trying to convert the files, I found this on the internet:

I sincerely hope that you can help me because this is very important to me. Thanks in advance.

megies · December 8, 2020, 10:14am

If you attach an example file that fails to load somebody might find time to have a look.

Oliver14 · December 8, 2020, 4:27pm

Thanks for the reply. I attach the following files:

This one has the .sg2 extension (used by OYO seismographs)
data_OYO.sg2 (204.6 KB)
This one has the .dat extension (used by GEOMETRICS seismographs)
data_OYO_fixed.dat (201.0 KB)

Both are SEG2 files containing the same data but with different extensions. Obspy is able to read the .dat file only; the other one gives me an error. Thanks a lot.

barsch · December 8, 2020, 4:40pm

Did you try st = read("data_OYO.sg2", format="SEG2")?

Oliver14 · December 8, 2020, 5:20pm

Yes sir, I tried that code too, and it gives me the same error.

megies · December 9, 2020, 2:44pm

It looks like the free form header written by that instrument does not follow SEG2 conventions (although I don’t know much about SEG2, I only know that its definition is ugly and flexible to an extent that makes it hard to deal with).

The binary content of the free form header of the valid file looks like this:

b'\x0f\x00CDP_NUMBER 0\x00\x0e\x00CDP_TRACE 0\x00\n\x00DELAY 0\x00\x1c\x00DIGITAL_HIGH_CUT_FILTER 0\x00\x1b\x00DIGITAL_LOW_CUT_FILTER 0\x000\x00RECEIVER_GEOMETRY 10.000000 0.000000 0.000000\x000\x00RECEIVER_LOCATION 10.000000 0.000000 0.000000\x00\x1c\x00RECEIVER_STATION_NUMBER 0\x00\x19\x00SAMPLE_INTERVAL 0.0005\x00\x19\x00SHOT_SEQUENCE_NUMBER 0\x00-\x00SOURCE_LOCATION 0.000000 0.000000 0.000000\x00\x1a\x00SOURCE_STATION_NUMBER 0\x00\x00\x00\x00\x00\x00'

Looking at our source code, it is a list of items, each comprised of one unsigned integer (2 bytes long) followed by “field_name value” of varying size given by the leading number.
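To make that layout concrete, here is a minimal sketch of such a parser. It is not ObsPy's implementation: the function name `parse_length_prefixed` is hypothetical, little-endian byte order is assumed, and the rule that the length counts its own two bytes is inferred from the dump above (for example, `\x0f\x00` = 15 = 2 + the 13 bytes of `CDP_NUMBER 0\x00`).

```python
import struct


def parse_length_prefixed(block: bytes) -> dict:
    """Parse 'KEY value' entries that are each preceded by a 2-byte length.

    Assumptions (not taken from the ObsPy source): little-endian byte
    order, and a length that includes its own two bytes.
    """
    fields = {}
    offset = 0
    while offset + 2 <= len(block):
        # Unsigned 16-bit length of the whole entry, length field included.
        (length,) = struct.unpack_from("<H", block, offset)
        if length < 2:
            break  # a zero length marks the end of the list
        # Drop the length field and the trailing NULL terminator.
        entry = block[offset + 2:offset + length].rstrip(b"\x00")
        key, _, value = entry.decode("ascii", errors="replace").partition(" ")
        fields[key] = value
        offset += length
    return fields


# Three entries copied from the header of the valid file shown above.
sample = (b"\x0f\x00CDP_NUMBER 0\x00"
          b"\n\x00DELAY 0\x00"
          b"\x19\x00SAMPLE_INTERVAL 0.0005\x00"
          b"\x00\x00")
print(parse_length_prefixed(sample))
```

All values come back as strings, so the caller decides which ones to convert to numbers.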

The file that cannot be read has a different free form header:

b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\xf31\x10ACQUISITION_DATE 01/01/2001\x00  ACQUISITION_TIME 00:00:00\x00  CLIENT OYO_CORPORATION\x00  COMPANY OYO_CORPORATION\x00  *DELAY 0.000000\x00  GENERAL_CONSTANT 1\x00  INSTRUMENT PICKWIN95\x00  JOB_ID 1\x00  OBSERVER OYO_CORPORATION\x00  PROCESSING_DATE 01/01/2001\x00  PROCESSING_TIME 00:00:00\x00  RECEIVER_LOCATION 10.000000\x00  SAMPLE_INTERVAL 0.00050000\x00  SOURCE_LOCATION 0.000000\x00  TRACE_SORT COMMON_SOURCE\x00  TRACE_TYPE 1\x00  *UNITS METERS\x00  MANUAL_CHANNEL_NUMBER 0\x00  NOTE \x00                       '

It basically seems to have some garbage in front and also each individual header value is not preceded by its length in bytes, solely relying on a single NULL character to represent the end of one key/value string.
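A rough sketch of how such a NULL-delimited header could be split is shown below. It is based only on the single dump in this thread, so everything about it is an assumption: the function name `parse_nul_delimited` is hypothetical, keys are assumed to be upper-case words, the leading `*` on some keys is simply dropped because its meaning is not known here, and entries without a value (such as `NOTE`) are skipped.

```python
import re

# A key is an upper-case word; the value is whatever follows the space(s).
# Any junk before the key (NULL padding, stray bytes, spaces, '*') is skipped.
_ENTRY = re.compile(rb"([A-Z][A-Z0-9_]+) +(\S.*)")


def parse_nul_delimited(block: bytes) -> dict:
    """Split a free form header whose entries are separated only by NULL."""
    fields = {}
    for chunk in block.split(b"\x00"):
        match = _ENTRY.search(chunk)
        if match is None:
            continue  # padding, garbage, or a key with no value
        key = match.group(1).decode("ascii")
        value = match.group(2).decode("ascii", errors="replace").rstrip()
        fields[key] = value
    return fields


# A shortened copy of the header from the file that cannot be read.
sample = (b"\x00\x00\xf31\x10ACQUISITION_DATE 01/01/2001\x00"
          b"  *DELAY 0.000000\x00"
          b"  SAMPLE_INTERVAL 0.00050000\x00"
          b"  NOTE \x00     ")
header = parse_nul_delimited(sample)
print(header["SAMPLE_INTERVAL"])
```

Whether this structure is a tolerated SEG2 variant or a format breach is exactly the open question, so a parser like this should be treated as a diagnostic aid rather than a fix.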

It looks like it might be possible to read that file, but you would probably have to patch the obspy source code to get it done, replacing or extending the part that reads the free form header (function parse_free_form(...) in obspy/io/seg2/seg2.py). You could set a debugger breakpoint in there and do it interactively for a start.

Excerpt from obspy/io/seg2/seg2.py (github.com/obspy/obspy, at e7b947457):

                      'Data format code 3 requires that the number of samples '
                      'is divisible by 4, but sample count is %d' % (
                          number_of_samples_in_data_block, ))
          else:
              msg = 'Unrecognized data format code'
              raise SEG2InvalidFileError(msg)
          
          # The rest of the trace block is free form.
          header = {}
          header['seg2'] = AttribDict()
          self.parse_free_form(self.file_pointer.read(size_of_this_block - 32),
                               header['seg2'])
          header['delta'] = float(header['seg2']['SAMPLE_INTERVAL'])
          # Set to the file's start time.
          header['starttime'] = deepcopy(self.starttime)
          if 'DELAY' in header['seg2']:
              if float(header['seg2']['DELAY']) != 0:
                  msg = "Non-zero value found in Trace's 'DELAY' field. " + \
                        "This is not supported/tested yet and might lead " + \
                        "to a wrong starttime of the Trace. Please contact " + \
                        "the ObsPy developers with a sample file."

Excerpt from obspy/io/seg2/seg2.py (github.com/obspy/obspy, at e7b947457):

                  result[3::4] = ((data[4::5] + one_to_two[4::5]) *
                                  2**((exponents & 0xf000) >> 12))
                  data = result
          
              # Integrate SEG2 file header into each trace header
              tmp = self.stream.stats.seg2.copy()
              tmp.update(header['seg2'])
              header['seg2'] = tmp
              return Trace(data=data, header=header)
          
          def parse_free_form(self, free_form_str, attrib_dict):
              """
              Parse the free form section stored in free_form_str and save it in
              attrib_dict.
              """
              def cleanup_and_decode_string(value):
                  # Some software/hardware produces invalid characters.
                  def is_good_char(c):
                      return c in (b'0123456789'
                                   b'abcdefghijklmnopqrstuvwxyz'
                                   b'ABCDEFGHIJKLMNOPQRSTUVWXYZ'

megies · December 9, 2020, 2:53pm

So it looks like, to read the file, you only need the SAMPLE_INTERVAL from the free form header, so if you manually replace and hard-code this line..

        header['delta'] = float(header['seg2']['SAMPLE_INTERVAL'])

..with the actual value..

        header['delta'] = 0.0005

.. the file can be read. Obviously this is..

- dangerous, as you would get wrong data if you read a file that has a different sampling rate, and..
- you will lose all the other free form header info (although in the “broken” file all the header variables seem to be either 0 or 1, so they might just be bogus values anyway)
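One way to limit the first risk is to use the known value only as an explicit fallback, so that files with a readable SAMPLE_INTERVAL are never overridden. The helper below is a hypothetical sketch, not part of ObsPy; `resolve_delta` and its `fallback` parameter are names made up for illustration, and `seg2_header` stands for whatever dictionary the free form parser filled in.

```python
import warnings


def resolve_delta(seg2_header, fallback=None):
    """Return the sample interval in seconds from a parsed SEG2 header.

    `fallback` is only used when SAMPLE_INTERVAL is missing, and its use
    is announced with a warning so that it cannot go unnoticed.
    """
    try:
        return float(seg2_header["SAMPLE_INTERVAL"])
    except KeyError:
        if fallback is None:
            raise  # same behaviour as before: fail loudly
        warnings.warn(
            "SAMPLE_INTERVAL missing from header; using the fallback of "
            f"{fallback} s supplied by the caller.")
        return float(fallback)


# A header that has the field wins over any fallback.
print(resolve_delta({"SAMPLE_INTERVAL": "0.0005"}, fallback=0.001))
```

This does nothing about the second point: the rest of the free form header is still lost unless it is parsed properly.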

Oliver14 · December 10, 2020, 10:49pm

Dear Mr. Megies, thank you so much for your reply. I understood the main idea of what you said, but since I’m too inexperienced (I’m not a developer) with the source code of modules and libraries, and with reading binary data, I’m afraid I could damage my data or even my system.

Just to mention, I found a way to convert .sg2 files into .dat files by using a piece of software (I’m not mentioning its name, because it might be illegal or be considered promotion under this community’s policy; of course, I’m sure there is a lot of software that converts files). With that conversion, I can successfully use Obspy and proceed with my workflow.

As I mentioned before, for me the ideal would be for Obspy to read .sg2 files directly, or to build a little Python script that converts .sg2 files into .dat files, in order to automate the process instead of converting the files one by one with a separate program.

Next year, I expect my university to reopen normally so I can work with developers. I will be happy to share my findings.

Thank you again, Mr. Megies. Regards from Perú.

megies · December 11, 2020, 8:20am

Glad you found a solution, and we have no problem at all if you mention what software you converted your files with.

Like I said, it would totally be possible to modify our source code to read those files, but first we would need somebody more familiar with and/or working with SEG2 to give a statement on how much of a format breach we are looking at and how to judge the situation.

megies · December 11, 2020, 8:25am

See SEG2: reading OYO .dat files · Issue #2767 · obspy/obspy · GitHub

ThomasLecocq · December 11, 2020, 8:35am

I tested with my own segD reader and it doesn’t work from scratch either. Will see if that hard-coded hack could be safely ported to the seg2 reader.

megies · December 11, 2020, 8:49am

Well, that hard-coded hack is really only for when you are sure about the sampling rate a priori, obviously. It wouldn’t be hard to read; I would just like some input from people familiar with the SEG2 file format definition on how to judge this different header structure.

Nomin_Erdene_Erdene · September 2, 2021, 7:50am

Hi Oliver14!
Which program did you use to convert your sg2 data? I’m having the same problem too. Can you help me?

Oliver14 · September 2, 2021, 1:27pm

Hi Nomin_Erdene_Erdene,

There must be a variety of programs, I used this one:

The downside is that with this program, as far as I know, you have to convert the files one by one, with no automation. But maybe it is still useful for your task. Greetings.

Nomin_Erdene_Erdene · September 3, 2021, 3:49am

Thank you for your prompt reply😊
It really helped me. Yeah, frontend doesn’t seem to have an automatic function. I have 1000 shots, I guess I need to start working on it.
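For the many-files case raised in this thread, a loop along the following lines may help once the files are in a form ObsPy can read (for example, after conversion to .dat). This is only a sketch: `read_all` and its `reader` parameter are hypothetical names, and the only ObsPy call involved is `obspy.read(..., format="SEG2")`, as suggested earlier in the thread. Failures are collected rather than raised, so one bad file does not stop the whole batch.

```python
from pathlib import Path


def read_all(folder, pattern="*.dat", reader=None):
    """Read every file matching `pattern` in `folder`.

    Returns two dicts keyed by file name: the objects that were read
    successfully, and the error text for each file that failed.
    """
    if reader is None:
        # Imported lazily so the function can be tried without ObsPy
        # by passing in another reader.
        from obspy import read as reader
    loaded, failed = {}, {}
    for path in sorted(Path(folder).glob(pattern)):
        try:
            loaded[path.name] = reader(str(path), format="SEG2")
        except Exception as exc:  # keep going and report at the end
            failed[path.name] = repr(exc)
    return loaded, failed
```

A call such as `loaded, failed = read_all("shots")` (the folder name is a placeholder) would then give one Stream per readable file, plus a list of the files that still need attention.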