ObsPy and PyInstaller

I'm working on an installable ObsPy-based application using PyInstaller and have run into a number of issues. Most of them come from places where code or data are loaded from the source tree at runtime.

1. Module data

For example, the data under obspy.imaging.data is loaded at runtime. I was able to fix these cases by adding a bunch of datas entries. One particularly thorny spot is obspy.core.util.libnames._load_cdll(), which looks up its own location in the filesystem and then navigates across directories to find a shared library.
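A minimal sketch of why this kind of lookup can break in a bundle, assuming _load_cdll() builds its path from the module's own directory plus '..' components (the directory names below, including obspy/lib, are illustrative assumptions). On Linux and macOS every directory named in a path must exist, even one that a later '..' steps back out of, so a path that passes through a module directory PyInstaller never created on disk fails. That is probably why the hook's dummy entry targeting obspy/core/util helps. Normalizing the path lexically with os.path.normpath() first removes the requirement; find_library() is a hypothetical helper, not ObsPy API.

```python
import os
import tempfile


def find_library(module_dir, *relative_parts):
    """Hypothetical helper: resolve a file relative to a module's directory.

    os.path.normpath() collapses '..' components as plain string
    operations, so module_dir itself does not have to exist on disk.
    """
    candidate = os.path.normpath(os.path.join(module_dir, *relative_parts))
    if not os.path.isfile(candidate):
        raise FileNotFoundError(candidate)
    return candidate


# Simulated bundle: the library directory exists on disk, but the module's
# own directory does not (e.g. because its .py files live in PyInstaller's archive).
with tempfile.TemporaryDirectory() as bundle:
    os.makedirs(os.path.join(bundle, "obspy", "lib"))
    open(os.path.join(bundle, "obspy", "lib", "libexample.so"), "w").close()
    module_dir = os.path.join(bundle, "obspy", "core", "util")  # never created

    raw = os.path.join(module_dir, "..", "..", "lib", "libexample.so")
    print(os.path.exists(raw))  # False on Linux/macOS: core/util must be traversed
    print(find_library(module_dir, "..", "..", "lib", "libexample.so"))
```

The trade-off is that normpath() ignores symlinks in the collapsed components, which is usually harmless inside a frozen bundle.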

2. Plugins

Plugins are discovered via pkg_resources.iter_entry_points(), which reads the entry points from the .egg-info metadata. That metadata therefore has to be bundled, but there is a catch: including all of it causes other errors, so I extracted just the entry_points.txt file.
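The entry_points.txt file is a small INI-style file with one section per entry point group, so all of the plugin registration information lives in that one file. A sketch that parses such a file with the standard library's configparser (this is not how pkg_resources does it internally; the group and entry names are illustrative, modeled on ObsPy's waveform plugins rather than copied from its metadata):

```python
import configparser

# Illustrative entry_points.txt content; names are examples only.
ENTRY_POINTS_TXT = """\
[obspy.plugin.waveform]
MSEED = obspy.io.mseed.core
SAC = obspy.io.sac.core

[obspy.plugin.waveform.MSEED]
isFormat = obspy.io.mseed.core:_is_mseed
"""


def parse_entry_points(text):
    """Return {group: {name: "module[:attr]"}} from entry_points.txt content."""
    # Only '=' separates name from value; ':' belongs to "module:attr".
    parser = configparser.ConfigParser(delimiters=("=",), interpolation=None)
    parser.optionxform = str  # keep entry point names case-sensitive
    parser.read_string(text)
    return {group: dict(parser.items(group)) for group in parser.sections()}


for group, entries in parse_entry_points(ENTRY_POINTS_TXT).items():
    print(group, entries)
```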

3. os.path.dirname(os.path.abspath(inspect.getfile(inspect.currentframe())))

This is used in a number of places to establish the path to a given module. It doesn't work under PyInstaller, but PyInstaller does support using __file__. (I see both approaches used in various places in ObsPy; it's not clear to me whether there's a meaningful difference.) Fixing this required a (really ugly) run-time patch that overrides inspect.getfile() and tries to extract __file__ from the given execution frame.
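One way to see the difference: for a frame, inspect.getfile() returns the code object's co_filename, i.e. the name recorded when the code was compiled, whereas __file__ is set by whichever importer loaded the module. For an ordinary file-based import the two agree, which is likely why both idioms coexist; a frozen application is a place where they can diverge. A small check under that assumption (load_module_from() and the module source are hypothetical test scaffolding):

```python
import importlib.util
import os
import tempfile

# A module body that computes its directory with both idioms from the post.
MODULE_SOURCE = """
import inspect
import os

VIA_INSPECT = os.path.dirname(os.path.abspath(inspect.getfile(inspect.currentframe())))
VIA_FILE = os.path.dirname(os.path.abspath(__file__))
"""


def load_module_from(directory, name="demo_module"):
    """Write MODULE_SOURCE into directory and import it from that file."""
    path = os.path.join(directory, name + ".py")
    with open(path, "w") as f:
        f.write(MODULE_SOURCE)
    spec = importlib.util.spec_from_file_location(name, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


with tempfile.TemporaryDirectory() as tmp:
    mod = load_module_from(tmp)
    # For a regular import from a .py file, both idioms give the same directory.
    print(mod.VIA_INSPECT == mod.VIA_FILE)
```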

I'm including the hooks I wrote to get this working. I'm happy to put these into a pull request, but they are not very thoroughly tested, and especially in the case of #3 it seems much cleaner to change the ObsPy code to use __file__ than to monkey-patch the inspect library. Mainly I'm curious whether there's any interest in bringing PyInstaller support into the ObsPy codebase.

Cheers,
Adam

hook-obspy.py -- this is the build-time hook:

```python
import os.path

from PyInstaller.utils.hooks import (
    collect_dynamic_libs,
    collect_submodules,
    copy_metadata,
    get_package_paths,
)

(_, obspy_root) = get_package_paths('obspy')

binaries = collect_dynamic_libs('obspy')
datas = [
    # Dummy path: the obspy/core/util directory needs to exist for
    # obspy.core.util.libnames._load_cdll
    (os.path.join(obspy_root, "*.txt"), os.path.join('obspy', 'core', 'util')),
    # Data
    (os.path.join(obspy_root, "imaging", "data"), os.path.join('obspy', 'imaging', 'data')),
    (os.path.join(obspy_root, "taup", "data"), os.path.join('obspy', 'taup', 'data')),
    (os.path.join(obspy_root, "geodetics", "data"), os.path.join('obspy', 'geodetics', 'data')),
]

# Plugins are defined in the metadata (.egg-info) directory, but grabbing the
# whole directory causes other errors, so include only entry_points.txt
metadata = copy_metadata('obspy')
egg = metadata[0]
if '.egg' not in egg[0]:
    raise RuntimeError("Unexpected metadata: %s" % (metadata,))
# Specify the source as just the entry points file
datas += [(os.path.join(egg[0], 'entry_points.txt'), egg[1])]

# These are the actual plugin packages
hiddenimports = collect_submodules('obspy.io')
```

rthook-obspy.py -- this is the run-time hook to monkey-patch inspect.getfile()

```python
import inspect

_old_getfile = inspect.getfile


def _getfile(obj):
    """
    Override inspect.getfile to try to return __file__ from the given frame
    """
    if inspect.isframe(obj):
        try:
            return obj.f_globals['__file__']
        except KeyError:
            # This frame's module has no __file__; fall back to the original
            pass
    return _old_getfile(obj)


inspect.getfile = _getfile
```
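To see what the run-time hook above changes without installing it globally, the same logic can be called directly on a frame whose code was compiled under a filename that does not exist on disk but whose globals carry a __file__, which is roughly the situation the hook targets. The bundle path below is a made-up example.

```python
import inspect

_original_getfile = inspect.getfile


def getfile_preferring_dunder_file(obj):
    """Same logic as the run-time hook, as a plain function."""
    if inspect.isframe(obj):
        try:
            return obj.f_globals["__file__"]
        except KeyError:
            pass
    return _original_getfile(obj)


# Code compiled under a fake filename, executed with __file__ in its globals.
compiled = compile("import inspect\nFRAME = inspect.currentframe()", "<compiled>", "exec")
namespace = {"__file__": "/bundle/obspy/example.py"}  # hypothetical bundle path
exec(compiled, namespace)
frame = namespace["FRAME"]

print(inspect.getfile(frame))                 # <compiled>
print(getfile_preferring_dunder_file(frame))  # /bundle/obspy/example.py
```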

Hi Adam,

I remember we played around with packaging up ObsPy for applications a
couple of years ago, but we never had a truly sustainable and clean
solution, so I'm happy about your effort.

1. Module data

> For example, the data under obspy.imaging.data is loaded at runtime. I was able to fix these by adding a bunch of datas entries. One particularly thorny spot is obspy.core.util.libnames._load_cdll(), which looks up its own place in the filesystem then navigates across directories to find a library.

There is no way around shipping the data that ObsPy needs at runtime
when packaging it. Our `setup.py` script just recursively includes
everything and then explicitly discards some file types and folders.
Maybe that logic could be moved into a script and reused by PyInstaller?
Then it would be simpler to keep everything consistent.
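A sketch of what such a shared script could look like: walk the package, include everything, then drop excluded folders and file types, emitting (source, destination) tuples in the shape PyInstaller's datas list expects, with destinations mirroring the package layout. The exclusion lists are hypothetical placeholders, not the rules in ObsPy's setup.py, and collect_package_data() is an illustrative name.

```python
import fnmatch
import os
import tempfile

# Hypothetical exclusion rules for illustration; the real list would be
# whatever ObsPy's setup.py discards.
EXCLUDE_DIRS = {"__pycache__", "tests"}
EXCLUDE_PATTERNS = ("*.py", "*.pyc", "*.pyx", "*.c", "*.h")


def collect_package_data(package_root):
    """Return sorted PyInstaller-style (source_file, dest_dir) tuples for
    every non-excluded file under package_root."""
    datas = []
    parent = os.path.dirname(os.path.abspath(package_root))
    for dirpath, dirnames, filenames in os.walk(package_root):
        # Prune excluded directories in place so os.walk does not descend.
        dirnames[:] = [d for d in dirnames if d not in EXCLUDE_DIRS]
        for filename in filenames:
            if any(fnmatch.fnmatch(filename, pat) for pat in EXCLUDE_PATTERNS):
                continue
            # Destination mirrors the package layout, e.g. obspy/imaging/data.
            dest = os.path.relpath(dirpath, parent)
            datas.append((os.path.join(dirpath, filename), dest))
    return sorted(datas)


# Tiny demo tree: only imaging/data/a.txt should be collected.
with tempfile.TemporaryDirectory() as tmp:
    pkg = os.path.join(tmp, "mypkg")
    os.makedirs(os.path.join(pkg, "imaging", "data"))
    os.makedirs(os.path.join(pkg, "tests"))
    for rel in (("imaging", "data", "a.txt"), ("imaging", "core.py"), ("tests", "t.txt")):
        open(os.path.join(pkg, *rel), "w").close()
    for src, dest in collect_package_data(pkg):
        print(dest, "<-", os.path.relpath(src, tmp))
```

In hook-obspy.py the hand-maintained list could then become datas = collect_package_data(obspy_root), plus the entry_points.txt entry. PyInstaller's own collect_data_files() helper may cover similar ground.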

Regarding the `_load_cdll()` call: I agree it's a bit ugly, but how else
could it be done?

2. Plugins

> Plugins are discovered by looking at pkg_resources.iter_entry_points(), which gets the entry points from the .egg-info file. This needs to be added as metadata, but there is a catch -- loading all of the metadata causes other errors, so I extracted just the entry_points.txt file.

I don't know enough about PyInstaller to comment on that, but ObsPy
definitely needs the entry points.

3. os.path.dirname(os.path.abspath(inspect.getfile(inspect.currentframe())))

> This is used in a number of places to establish the path to a given module. This doesn't work in PyInstaller, but PyInstaller does support using __file__. (I see both approaches used in various places in ObsPy, it's not clear to me whether there's a meaningful difference here.) Fixing this required a (really ugly) run-time patch that overrides inspect.getfile() and tries to extract __file__ from the given execution frame.

`__file__` is not always set (for example, it is not set at a module's
import time, in py2exe, or when using the zipmodule). Still, I also
wonder whether we would be fine just always using `__file__`. Please feel
free to give it a shot in a pull request.
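A sketch of the "always use `__file__`" idea that still covers the cases where it might be missing: prefer `__file__` and fall back to the existing inspect-based idiom on the caller's frame only when it is absent. `module_dir()` is a hypothetical helper, not existing ObsPy API, and the fallback assumes an interpreter where `inspect.currentframe()` returns a frame (as CPython does).

```python
import inspect
import os


def module_dir(module_globals):
    """Hypothetical helper: directory of the module whose globals are passed.

    Prefers __file__ (which PyInstaller sets); falls back to the caller's
    frame, i.e. the inspect-based idiom, only when __file__ is missing.
    """
    path = module_globals.get("__file__")
    if path is None:
        frame = inspect.currentframe().f_back  # the module calling this helper
        try:
            path = inspect.getfile(frame)
        finally:
            del frame  # drop the frame reference to avoid a reference cycle
    return os.path.dirname(os.path.abspath(path))


# Typical call site inside a module:
DATA_DIR = os.path.join(module_dir(globals()), "data")
print(DATA_DIR)
```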

> I'm including the hooks I wrote to get this working. I'm happy to put these into a pull request, but they are not very thoroughly tested and especially in the case of #3 it seems much cleaner to change the ObsPy code to use __file__ rather than monkey-patch the inspect library. I guess I'm mainly curious whether there's any interest in trying to bring PyInstaller support into the ObsPy codebase.

Please do! There is definitely interest, and I have the feeling that a
number of applications would benefit from this. Best put it in the
`misc/installer` folder. We could also think about integrating this into
the CI in some fashion, but let's discuss that on GitHub.

Cheers!

Lion