Ingestion: TypeError on dashboard cleanup phase

Description

<pre>
(ingestion)dpla@ingestion-prod:/v1/ingestion/ingestion$ python scripts/dashboard_cleanup.py 2de00b5e16b17120007a6a8cf36a4c2f
[2014-11-21T20:24:03.886177Z] Caught <type 'exceptions.TypeError'> querying dashboard documents: argument of type 'NoneType' is not iterable
Total dashboard documents deleted: 0
</pre>

In this ingestion (for IA), for example, the script fails when it queries the dashboard documents for ingestionSequence 3. I think the failure ultimately happens within @Couch._query_all_prov_docs_by_ingest_seq()@, around line 198.

Activity

Mark Breedlove
April 29, 2015, 12:26 AM

This is on the verge of becoming obsolete and closeable. I'm leaving it open because we've been using the legacy ingestion system for a few ingests.

Mark Breedlove
July 10, 2015, 1:29 AM

The exception is thrown in https://github.com/djc/couchdb-python/blob/0.9/couchdb/http.py#L547

In this case the value of `headers` is `None` when `headers.get('content-type')` is called, even though the status of the request is 200.
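The wording of the logged error ("argument of type 'NoneType' is not iterable") is what Python produces when a membership test (`in`) is applied to `None`. The following is a minimal sketch of that failure mode and of a defensive guard. `is_json` is a hypothetical helper written for illustration. It is not couchdb-python's code, and I have not confirmed that this is the exact expression at the line linked above.

```python
def is_json(content_type):
    """Hypothetical helper: True if a Content-Type value denotes JSON."""
    # Guard first: a membership test against None raises TypeError.
    if content_type is None:
        return False
    return 'application/json' in content_type


# An unguarded membership test against a missing (None) header value
# fails with TypeError, the same exception type seen in the log.
missing = None
try:
    'application/json' in missing
    raised = False
except TypeError:
    raised = True
```

With the guard in place, a missing header value is treated as "not JSON" instead of aborting the query.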

The problem appears to be related to the fact that our AWS loadbalancer inserts a "Connection: keep-alive" header that is not expected in a CouchDB response. I have to assume that the couchdb-python module is not designed to be used with BigCouch behind a loadbalancer, let alone one that inserts this HTTP header. Note also that we have couchdb-python pinned to version 0.9, while versions 0.10 and 1.0 are available.

couchdb-python uses its own HTTP module instead of one from the standard library. I have not yet dug into its code to see how, or whether, it handles keepalive requests, or whether this keepalive header directly causes the `None` value of `headers`, but it is an obvious difference between the two responses.

If I switch the `[CouchDB]` `Url` parameter in akara.ini to point directly to one of the BigCouch nodes, the problem goes away.

Response headers sent directly from the BigCouch node:
<pre>
HTTP/1.1 200 OK
X-Couch-Request-ID: 2e1c06d1
Transfer-Encoding: chunked
Server: CouchDB/1.1.1 (Erlang OTP/R14B01)
Etag: 61d348729ed196db95deafb2d906dfdc
Date: Wed, 01 Jul 2015 00:56:29 GMT
Content-Type: text/plain;charset=utf-8
Cache-Control: must-revalidate
</pre>

Response headers sent from the AWS loadbalancer:

<pre>
HTTP/1.1 200 OK
Cache-Control: must-revalidate
Content-Type: text/plain;charset=utf-8
Date: Wed, 01 Jul 2015 00:53:57 GMT
Etag: 61d348729ed196db95deafb2d906d7fe
Server: CouchDB/1.1.1 (Erlang OTP/R14B01)
X-Couch-Request-ID: 864d13f8
transfer-encoding: chunked
Connection: keep-alive
</pre>

Differences:
1. Loadbalancer adds "Connection: keep-alive"
2. Loadbalancer spells "transfer-encoding" with all lowercase.
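
The two differences above can be checked mechanically. This is a minimal sketch that parses the two header dumps quoted in this comment and compares the header names. `parse_headers` is a hypothetical helper written for this ticket, not part of couchdb-python or the ingestion code.

```python
def parse_headers(raw):
    """Parse 'Name: value' lines into a dict keyed by the name as sent."""
    headers = {}
    for line in raw.strip().splitlines()[1:]:  # skip the status line
        name, _, value = line.partition(":")
        headers[name.strip()] = value.strip()
    return headers


DIRECT = """\
HTTP/1.1 200 OK
X-Couch-Request-ID: 2e1c06d1
Transfer-Encoding: chunked
Server: CouchDB/1.1.1 (Erlang OTP/R14B01)
Etag: 61d348729ed196db95deafb2d906dfdc
Date: Wed, 01 Jul 2015 00:56:29 GMT
Content-Type: text/plain;charset=utf-8
Cache-Control: must-revalidate
"""

BALANCED = """\
HTTP/1.1 200 OK
Cache-Control: must-revalidate
Content-Type: text/plain;charset=utf-8
Date: Wed, 01 Jul 2015 00:53:57 GMT
Etag: 61d348729ed196db95deafb2d906d7fe
Server: CouchDB/1.1.1 (Erlang OTP/R14B01)
X-Couch-Request-ID: 864d13f8
transfer-encoding: chunked
Connection: keep-alive
"""

direct = parse_headers(DIRECT)
balanced = parse_headers(BALANCED)

# Header names compared case-insensitively, since HTTP header names
# are case-insensitive.
direct_lower = set(name.lower() for name in direct)
balanced_lower = set(name.lower() for name in balanced)

# Headers that only the loadbalancer response carries.
added = sorted(balanced_lower - direct_lower)

# Headers present in both responses but spelled with different case.
recased = sorted(
    name for name in balanced
    if name not in direct and name.lower() in direct_lower
)
```

Here `added` holds the header that only the loadbalancer sends, and `recased` holds the header whose name differs only in capitalization.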

I think that we can safely close this ticket. We can keep pointing the ingestion app directly at one BigCouch node; the performance of the cluster members and of the ingestion application is no different than it was going through the loadbalancer. Given our current workload and our plans to discontinue use of these systems, we should not spend further time on this.

Done

Assignee

Mark Breedlove

Reporter

Mark Matienzo

Labels

None

Priority

Low