Missing Missouri records

Description

More than 1,900 records from the Federal Reserve are being harvested from Missouri, but for some reason they are not making it through the mapping/enrichment/indexing step. The setSpec for the Federal Reserve collection is "frbstl_fraser_v2". We are assuming the problem only affects this set, but since we're not sure what the problem is, it's probably good to verify that.

It happened in the last ingest as well (in April). At that time, Mark M. and I briefly discussed it and decided to wait until we did an ingest in the new system to see whether the problem persisted. However, since we have ingested Missouri again this month and the problem is still there, it's become a bit more urgent (i.e. the Hub is concerned that it is still an issue).

Activity

Mark Breedlove
June 22, 2015, 2:21 PM

Hi, Gretchen: Could you provide one or two examples of records that are missing?

Gretchen Gueguen
June 22, 2015, 4:32 PM

The only examples I know of are from an email David Henry sent to me. I'm attaching them here. One is a record that was included, the other is a record that was not (for comparison's sake). It should be obvious from the file names which is which.

Mark Matienzo
June 22, 2015, 7:00 PM

Looking at the "ingestion document (firewalled link)":http://repo-prod1:5984/dashboard/4d67b061fb610a9ab790233f936eacd3 it appears that 1,940 records are missing the @sourceResource@ after the mapping/enrichment process completes. Looking at the logs, it appears that the problems may be related to the creator mapping:

<pre>
Jun 16 07:40:22 akara[32022]: [ERROR] Uncaught exception from 'dpla_mapper' ('http://purl.org/la/dp/dpla_mapper')
Traceback (most recent call last):
  File "/v1/ingestion/lib/python2.7/site-packages/akara/multiprocess_http.py", line 304, in _wsgi_application
    result = service.handler(environ, start_response_)
  File "/v1/ingestion/lib/python2.7/site-packages/akara/services.py", line 417, in wrapper
    result = func(*args, **kwargs)
  File "/v1/ingestion/lib/python2.7/site-packages/dplaingestion/akamod/dpla_mapper.py", line 20, in dpla_mapper
    mapper.map()
  File "/v1/ingestion/lib/python2.7/site-packages/dplaingestion/mappers/mapper.py", line 99, in map
    self.map_source_resource()
  File "/v1/ingestion/lib/python2.7/site-packages/dplaingestion/mappers/mapper.py", line 124, in map_source_resource
    self.map_creator()
  File "/v1/ingestion/lib/python2.7/site-packages/dplaingestion/mappers/missouri_mapper.py", line 66, in map_creator
    creators = [n for n in creator_names(name)]
  File "/v1/ingestion/lib/python2.7/site-packages/dplaingestion/mappers/missouri_mapper.py", line 57, in creator_names
    if n['role']['roleTerm'] == 'creator' \
TypeError: string indices must be integers
</pre>

Mark Matienzo
June 22, 2015, 7:56 PM

In the FRBSTL records, there are values for @mods:name@ that look like the following (in the JSON output from the fetcher):

<pre>
[
  {
    "xmlns:default": "http://www.loc.gov/mods/v3",
    "role": "creator",
    "recordInfo": {
      "recordIdentifier": "522"
    },
    "namePart": "Federal Reserve Bank of San Francisco"
  },
  {
    "namePart": "Federal Reserve Bank of San Francisco",
    "role": {
      "roleTerm": "creator"
    }
  }
]
</pre>

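Since @role@ in the first element is a plain string rather than a @dict@, the lookup @n['role']['roleTerm']@ in @creator_names@ raises the @TypeError@ shown in the traceback, so the mapper fails for these records.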
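
If the provider data cannot be changed, one possible workaround is to make the role check tolerate both shapes. The following is only a minimal sketch, not the actual @missouri_mapper.py@ code: the helper @role_term@ and this version of @creator_names@ are hypothetical, and it assumes @role@ is either a string or a dict with a @roleTerm@ key, as in the sample above.

```python
def role_term(role):
    """Return the role term whether `role` is a plain string or a dict.

    Hypothetical helper: assumes the only two shapes are the ones seen
    in the FRBSTL sample ("creator" or {"roleTerm": "creator"}).
    """
    if isinstance(role, dict):
        return role.get("roleTerm")
    return role


def creator_names(names):
    """Yield namePart for every name entry whose role is 'creator'."""
    # Assumption: a single name may arrive as a dict instead of a list.
    if isinstance(names, dict):
        names = [names]
    for n in names:
        if not isinstance(n, dict):
            continue  # skip anything that is not a name object
        if role_term(n.get("role")) == "creator" and "namePart" in n:
            yield n["namePart"]


# Sample modeled on the mods:name value quoted in this ticket.
names = [
    {"role": "creator",
     "namePart": "Federal Reserve Bank of San Francisco"},
    {"role": {"roleTerm": "creator"},
     "namePart": "Federal Reserve Bank of San Francisco"},
]
creators = list(creator_names(names))
```

With this shape handling, both elements of the sample yield a creator name instead of raising. Whether the duplicate values should then be collapsed is a separate decision.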

Mark Matienzo
June 22, 2015, 8:37 PM

Per notes in Slack, Gretchen will follow up with David at MHM about this.

Assignee

Mark Breedlove

Reporter

Gretchen Gueguen

Labels

None

Priority

Medium