obspy.fdsn Client.get_waveforms() producing AttributeError: 'NoneType' object has no attribute 'seek'

Hi,

In a script using ObsPy, a user of the IRIS DMC's services is calling get_waveforms(), which raises an exception that I need some help understanding.

Here is the code:
...
    try:
        fname = dirname + '/' + net + '_' + sta + '.m'
        client.get_waveforms(net, sta, "*", "BH*", tutc + start, tutc + end,
                             filename=fname)
        files_written += 1
    except Exception:
        # Report the failure and move on to the next net/sta pair.
        print("get_waveforms threw an error, skipping %s:%s Start time %s"
              % (net, sta, tutc))
        traceback.print_exc(limit=2)
...

This runs in a loop over many combinations of net and sta and eventually produces the error below at seemingly random times (so far never on the same call; it is not repeatable):

Hi Chad,

I am actually not sure how that can happen without another error being raised along the way. I can imagine it happening when the server sends HTTP code 200 but no actual data. But that should have resulted in an exception in a different part of the code.

Could you run the code snippet without the try/except clause so we can see the full traceback?

Can you also initialize the client with

client = obspy.fdsn.Client("IRIS", debug=True)

This prints a lot more information about what is actually going on. If debug is True the error you have could actually happen but at least more information should be available.

Which version of ObsPy are you running by the way? The latest stable or the repository version?

Cheers!

Lion

I am the author of the script Chad was referring to. I am running a version I updated about 10 days ago with easy_install with the update flag (I don't remember the exact incantation). Two details that may or may not be relevant: (1) the parent version of Python I'm using is from Antelope, to allow me to use their database software to automate the requests, and (2) the script is multithreaded for efficiency. I seriously doubt either of these is the issue, because there is no hint that anything connected to (1) is a problem and I've run the script with only one thread with similar results.

This problem has a long history. I told Chad it was "scary" because one could easily ignore it and think they were getting all the data when they were not. In the run this week I was retrieving all teleseismic data recorded by the Earthscope TA in 2013 that had a pick defined by the Array Network Facility. Here is what happened:
1. The first pass made 220934 requests (3 times that many seismograms), but there were 11592 of the errors Chad described (about a 5% failure rate).
2. Because the script was driven by a database, it was relatively easy to build a new request to retrieve the orphans. I did this using only a single thread and had only 10 residuals.
3. Pass 3 threw only a single error. I didn't try to get that one in 220934, but given how this has worked I have no reason to think it would have failed.

The error messages are based on what you folks suggested to me last September when I first wrote this.

Gary Pavlis
Indiana University

Looks like it might be a non-HTTPError getting caught in fdsn/client.py, line 1176, in download_url, and data_stream is being set to None, but there is no error code match to raise an FDSNException or a NotImplementedError… This should maybe be a GitHub issue?

…and as I type I see Lion patch it, you guys are too fast!

-Mark

Hi all!

Yea, I added a check for non-200 HTTP codes, but only because I had the same thought process as you. I don’t think this explains the problem. This download function can never return None for `data` if `debug` is False.

@Gary Pavlis: Can you please run it again, but this time initialize the Client with debug set to `True` and move the actual download outside of the try/except clause? Your error messages are otherwise perfectly fine for normal use of the client, but the additional information in the full traceback may help find the bug/problem. Thanks a lot!

I also think that the Python version and the fact that it is multithreaded should not have any influence on the client.

Cheers!

Lion

@krischer in the packaged version of 0.9.0, that return isn’t indented, that’s where I thought the None was coming from…

https://github.com/obspy/obspy/blob/0.9.0/obspy/fdsn/client.py#L1176-1179

Well, that explains it. I now remember that we already fixed something along these lines…

@Gary, @Chad: The errors you see are most likely timeout errors as the default timeout value is fairly low. If you update to the latest master or the head of the releases branch the error should surface with a clear message.
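
To make the failure mode concrete, here is a minimal sketch of how a swallowed exception plus an unguarded return can end in this AttributeError. It is an illustration only, not the ObsPy source: FakeTimeout, download, and read_waveforms are hypothetical names, and the real client's internals differ.

```python
import io


class FakeTimeout(Exception):
    """Stand-in for a socket timeout; hypothetical, for illustration only."""


def download(fail):
    """Mimic a download helper that returns (code, data)."""
    try:
        if fail:
            raise FakeTimeout("timed out")
        return 200, io.BytesIO(b"waveform bytes")
    except Exception:
        # The error is swallowed here. Because the return is not guarded
        # by a debug check, every failure yields data=None silently.
        return None, None


def read_waveforms(fail):
    code, data = download(fail)
    # Nothing checks for None, so the failure only surfaces here.
    data.seek(0)
    return data.read()
```

Calling read_waveforms(True) raises the AttributeError from the subject line, with no trace of the timeout that actually caused it.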

Cheers!

Lion

That completely makes sense, as I saw a loose correlation between the error rate and both the number of threads I was running and how long the job had been running. For example, back in September when I ran this with 16 threads I had periods where the error rate approached 50%. That is largely from memory, however; I wasn't keeping detailed statistics then and was just trying to figure out what was going on. The point is that if the timeout is too short, the error rate will clearly scale with load on the IRIS servers, which will follow typical network traffic patterns (occasional traffic jams, just like car traffic).

What is the default now, and is it possible to change it to an even larger value? With jobs that run for days one might want the timeout very long. In any case, it should be a user-definable value. I certainly didn't see this as a parameter for the Client constructor. It probably should be.

Before I release this script to the world, it is clear it will always need to write a failure database: a table of all failed requests. Web services clearly are not a 100% reliable transfer mechanism, or even a 90% reliable one, so bookkeeping is required to allow cross-checks. That is not surprising in any environment.
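
A minimal sketch of that kind of bookkeeping, assuming a CSV failure table rather than a real database. fetch_all, flaky_fetch, and the column names are hypothetical; flaky_fetch stands in for the real client.get_waveforms() call so the example runs offline.

```python
import csv
import io


def fetch_all(requests, fetch, failure_log):
    """Call fetch(net, sta) for each request; log failures as CSV rows."""
    writer = csv.writer(failure_log)
    writer.writerow(["net", "sta", "error"])
    failed = []
    for net, sta in requests:
        try:
            fetch(net, sta)
        except Exception as err:
            # Record the failure so a later pass can retry it.
            writer.writerow([net, sta, repr(err)])
            failed.append((net, sta))
    return failed


def flaky_fetch(net, sta):
    # Hypothetical stand-in for the real download; fails for one station.
    if sta == "B02":
        raise IOError("timed out")


log = io.StringIO()
failed = fetch_all([("TA", "A01"), ("TA", "B02")], flaky_fetch, log)
```

The returned list can be fed straight back into fetch_all for the next pass, which is the same retry-the-orphans pattern described earlier in the thread.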

Thanks for the very prompt action on this.
gary pavlis

Hi Lion & Gary,

I cloned the requests branch and tried out the same request pattern. This pretty quickly showed some timeout problems. Looking at the code I see the now documented 'timeout' argument and changed it like so:

client = Client("IRIS", user_agent=agentname, timeout=120)

BTW, the 10 second default for waveform data is quite low, as you have acknowledged; perhaps a larger value could be the default for waveform requests?

Fixing the timeout issue reduced the errors and allowed me to find some other errors where the service returned 400 "Bad Request" (which led to finding a bug for the DMC to fix). The service actually reports why a 400 was returned. I needed to see its error response, so I added a bit of code to obspy/fdsn/client.py to print what was returned by the server:

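    except urllib2.HTTPError as e:
        if debug is True:
            # Read the error body once; the service explains the failure
            # there. url_obj may not be assigned when the open call raises.
            data = e.read()
            print("HTTP error %i while downloading '%s'" % (e.code, url))
            print("Service error: ")
            print(data)
            return e.code, None
        raise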

I'm not sure if that's very Pythonic or unsafe or whatever, but I got to see what the service was reporting as wrong with the request; you might consider adding something like that to output the error message returned by the service.
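
For reference, a rough Python 3 sketch of the same idea, using urllib.error.HTTPError (the Python 3 counterpart of urllib2.HTTPError). describe_http_error is a hypothetical helper, not part of ObsPy, and the URL and response text are made up so the example runs without a network.

```python
import io
import urllib.error


def describe_http_error(err):
    """Return a one-line summary that includes the service's response body."""
    body = err.read().decode("utf-8", errors="replace").strip()
    return "HTTP %i for %s: %s" % (err.code, err.url, body)


# Hand-built error so the sketch runs offline; URL and message are made up.
example = urllib.error.HTTPError(
    "http://example.invalid/query", 400, "Bad Request", {},
    io.BytesIO(b"Error 400: invalid start time"))
summary = describe_http_error(example)
print(summary)
```

The body can only be read once, so the helper reads it a single time and reuses the result.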

thanks for the help.
Chad