arclink-0.4.7 data retrieval slow vs. arclink-0.4.6?

Hi,

we've been using obspy-arclink frequently to retrieve data through webdc and were recently stunned to find that the same script ran much faster on an older machine than on our regular workstations (all Debian).

I tracked the issue down to the arclink getWaveform call in our scripts. It turned out that the old/fast machine with python2.5 and arclink-0.4.1 needs around 4-6 s per waveform retrieval, while the new/slow machines with python2.6 and arclink-0.4.8 need around 18-25 s.

Further tests with versions arclink-0.4.[1,5,6,7,8] showed that versions up to and including 0.4.6 are fast (with python2.5), while 0.4.7 and 0.4.8 are significantly slower (with both python2.5 and 2.6); see the examples below.

Is that a known issue? Can you reproduce that?

Cheers,
Christian

P.S.: since this is my first post to the list: Thanks to the ObsPy developers, it's a great package.

Code snippet:
import time

import obspy.arclink
from obspy.core import UTCDateTime

client = obspy.arclink.Client()
print("Client version: %s" % obspy.arclink.__version__)
# Time a single waveform request, including PAZ and coordinates.
t0 = time.time()
st = client.getWaveform('GR', 'BFO', '', 'LH*',
                        UTCDateTime(2004, 12, 26, 0, 58, 53),
                        UTCDateTime(2004, 12, 26, 2, 58, 13),
                        getPAZ=True, getCoordinates=True)
t1 = time.time()
print(" -> waveform request took %.2f seconds" % (t1 - t0))

Repeated execution (python2.5) with different arclink versions:
Client version: 0.4.1
     -> waveform request took 4.30 seconds
Client version: 0.4.1
     -> waveform request took 3.96 seconds
Client version: 0.4.1
     -> waveform request took 4.00 seconds
---
Client version: 0.4.5
     -> waveform request took 5.41 seconds
Client version: 0.4.5
     -> waveform request took 3.86 seconds
Client version: 0.4.5
     -> waveform request took 3.82 seconds
---
Client version: 0.4.6
     -> waveform request took 3.58 seconds
Client version: 0.4.6
     -> waveform request took 4.79 seconds
Client version: 0.4.6
     -> waveform request took 5.95 seconds
---
Client version: 0.4.7
     -> waveform request took 20.13 seconds
Client version: 0.4.7
     -> waveform request took 19.28 seconds
Client version: 0.4.7
     -> waveform request took 19.31 seconds
---
Client version: 0.4.8
     -> waveform request took 22.56 seconds
Client version: 0.4.8
     -> waveform request took 19.58 seconds
Client version: 0.4.8
     -> waveform request took 19.05 seconds

Small variations are overhead/local-network related, but the difference between ca. 4 s and ca. 20 s persists.

Hi Christian,

I guess it's related to routing: newer versions of obspy.arclink support
routing, which is enabled by default. Before each request, the client
asks the central node webdc.eu which server is responsible.

You can disable routing if you know the server where the data is
located and request the data directly, e.g. for your example below:

client = obspy.arclink.Client(host="grsn01.szgrf.bgr.de", port=18001)

...

st = client.getWaveform('GR', ... , route=False)

Also, enabling the options "getPAZ=True" and "getCoordinates=True"
triggers additional routing/data requests. If you don't need them,
disable them in order to speed up the request.
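
To check how much each of these options costs, the variants can be timed side by side. The following is only a minimal sketch: the helper names `time_call` and `compare` are hypothetical and not part of obspy.arclink, and the two stand-in functions merely sleep so that the example runs without a network connection.

```python
import time


def time_call(func, *args, **kwargs):
    """Run func once and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    return result, time.perf_counter() - start


def compare(variants, repeats=3):
    """Time each named zero-argument callable `repeats` times.

    Returns a dict mapping the variant name to a list of elapsed seconds.
    """
    timings = {}
    for name, func in variants.items():
        timings[name] = [time_call(func)[1] for _ in range(repeats)]
    return timings


# Stand-ins for real requests (assumption: they only sleep). In practice
# each would be a lambda wrapping client.getWaveform(...), e.g. one with
# routing enabled and one with route=False as suggested in this thread.
def fake_routed_request():
    time.sleep(0.02)


def fake_direct_request():
    time.sleep(0.01)


results = compare({"routed": fake_routed_request,
                   "direct": fake_direct_request})
for name, runs in results.items():
    print("%-8s %s" % (name, ", ".join("%.2f s" % r for r in runs)))
```

Repeating each variant a few times helps to separate a systematic difference from ordinary network jitter.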

However, I'm very interested in the debug output of the older machines
(Debian). Could you please send me the output of the script below,
but with the following modification:

client = obspy.arclink.Client(debug=True)

Thanks in advance,
Robert

Thanks for the quick response,

I'm aware that requesting PAZ and coordinates slows down the request, but the 4 s would be acceptable for us.

I attach logs of one run of the sample script for each version of arclink with debug=True, as requested.
Even in the 0.4.6 version, there are 8 arclink requests (routing webdc, data download grsn01.szgrf.bgr.de, and for each channel (routing grsn01, metadata download)).
The repeated routing requests seem a bit redundant to me but I'm not too familiar with ArcLink, so maybe I'm missing something here?
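
The request count above can be reproduced with a little arithmetic. This is a rough model only: the function `count_requests` is hypothetical, it assumes that 'LH*' matches three channels (which is consistent with the total of 8), and the second call simply shows what the count would be if the per-channel routing requests were skipped.

```python
def count_requests(n_channels, route_per_channel=True):
    """Count ArcLink requests for one waveform call with PAZ/coordinates.

    Model taken from the debug logs described in this message: one
    routing request to webdc, one data download, then for each channel
    one metadata download, optionally preceded by its own routing request.
    """
    per_channel = 2 if route_per_channel else 1
    return 2 + n_channels * per_channel


# Assumption: 'LH*' expands to three channels.
print(count_requests(3))                           # 8
print(count_requests(3, route_per_channel=False))  # 5
```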

Anyway, from the logs I cannot see why the arclink-0.4.7/8 requests are slower.

Puzzled,
Christian

arclink0.4.1-8.log.tar.gz (4.09 KB)

Hi Christian,

this issue has (hopefully) been fixed with commit [2689]; commit [2690]
removes some redundant routing requests for getting PAZ and metadata.

Please check out the latest obspy.arclink development version and report
back if you still hit the issue.

Thanks for reporting + best regards,
Robert

Perfect, fixed!

[2689] fixes the slowdown issue and [2690] speeds up the requests a bit further by reducing routing requests.

Just for documentation: time per waveform request (incl. PAZ and coordinates)
versions <= arclink-0.4.6: ~4s
versions arclink-0.4.7/8: ~20s
arclink-0.4.8.dev2690: ~3s

Thanks and Cheers
Christian
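
For reference, the per-version averages can be recomputed from the timing output posted at the start of the thread. The sketch below parses output in that format; the function name `mean_seconds_by_version` is hypothetical, and the log text is copied from the original post.

```python
import re
from collections import defaultdict

# Timing output as posted in the original message.
LOG = """\
Client version: 0.4.1
    -> waveform request took 4.30 seconds
Client version: 0.4.1
    -> waveform request took 3.96 seconds
Client version: 0.4.1
    -> waveform request took 4.00 seconds
Client version: 0.4.5
    -> waveform request took 5.41 seconds
Client version: 0.4.5
    -> waveform request took 3.86 seconds
Client version: 0.4.5
    -> waveform request took 3.82 seconds
Client version: 0.4.6
    -> waveform request took 3.58 seconds
Client version: 0.4.6
    -> waveform request took 4.79 seconds
Client version: 0.4.6
    -> waveform request took 5.95 seconds
Client version: 0.4.7
    -> waveform request took 20.13 seconds
Client version: 0.4.7
    -> waveform request took 19.28 seconds
Client version: 0.4.7
    -> waveform request took 19.31 seconds
Client version: 0.4.8
    -> waveform request took 22.56 seconds
Client version: 0.4.8
    -> waveform request took 19.58 seconds
Client version: 0.4.8
    -> waveform request took 19.05 seconds
"""


def mean_seconds_by_version(log_text):
    """Return {version: mean request time in seconds} for the log text."""
    version = None
    runs = defaultdict(list)
    for line in log_text.splitlines():
        match = re.match(r"\s*Client version:\s*(\S+)", line)
        if match:
            version = match.group(1)
            continue
        match = re.search(r"took\s+([0-9.]+)\s+seconds", line)
        if match and version is not None:
            runs[version].append(float(match.group(1)))
    return {v: sum(times) / len(times) for v, times in runs.items()}


means = mean_seconds_by_version(LOG)
for version in sorted(means):
    print("%s: %.2f s" % (version, means[version]))
```

With only three runs per version the averages are rough, but they show the same jump between 0.4.6 and 0.4.7.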