More than 1,900 records from the Federal Reserve are being harvested from Missouri but, for some reason, are not making it through the mapping/enrichment/indexing step. The setSpec for the Federal Reserve collection is "frbstl_fraser_v2". We are assuming the problem only affects this set, but since we're not sure what the problem is, it's probably good to verify that.
It happened in the last ingest as well (in April). At that time, Mark M. and I briefly discussed it and decided to wait until we did an ingest in the new system to see whether the problem persisted. However, since we have ingested Missouri again this month and the problem is still there, it's become a bit more urgent (i.e. the Hub is concerned that it is still an issue).
Hi, Gretchen: Could you provide one or two examples of records that are missing?
The only examples I know of are from an email David Henry sent to me; I'm attaching them here. One is a record that was included, the other is a record that was not (for comparison's sake). It should be obvious from the file names which is which.
Looking at the "ingestion document (firewalled link)":http://repo-prod1:5984/dashboard/4d67b061fb610a9ab790233f936eacd3 it appears that 1,940 records are missing the @sourceResource@ after the mapping/enrichment process completes. Looking at the logs, it appears that the problems may be related to the creator mapping:
Jun 16 07:40:22 akara: [ERROR] Uncaught exception from 'dpla_mapper' ('http://purl.org/la/dp/dpla_mapper')
Traceback (most recent call last):
File "/v1/ingestion/lib/python2.7/site-packages/akara/multiprocess_http.py", line 304, in _wsgi_application
result = service.handler(environ, start_response_)
File "/v1/ingestion/lib/python2.7/site-packages/akara/services.py", line 417, in wrapper
result = func(*args, **kwargs)
File "/v1/ingestion/lib/python2.7/site-packages/dplaingestion/akamod/dpla_mapper.py", line 20, in dpla_mapper
File "/v1/ingestion/lib/python2.7/site-packages/dplaingestion/mappers/mapper.py", line 99, in map
File "/v1/ingestion/lib/python2.7/site-packages/dplaingestion/mappers/mapper.py", line 124, in map_source_resource
File "/v1/ingestion/lib/python2.7/site-packages/dplaingestion/mappers/missouri_mapper.py", line 66, in map_creator
creators = [n for n in creator_names(name)]
File "/v1/ingestion/lib/python2.7/site-packages/dplaingestion/mappers/missouri_mapper.py", line 57, in creator_names
if n['role']['roleTerm'] == 'creator' \
TypeError: string indices must be integers
In the FRBSTL records, there are values for @mods:name@ that look like the following (in the JSON output from the fetcher):
"namePart": "Federal Reserve Bank of San Francisco"
"namePart": "Federal Reserve Bank of San Francisco",
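Since @role@ in the first element is a string rather than a @dict@, the lookup @n['role']['roleTerm']@ in @creator_names@ raises the @TypeError@ shown in the traceback, and the record never gets a @sourceResource@.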
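For illustration, here is a minimal sketch of the failure and one possible guard. It is not the actual @missouri_mapper.py@ code: the function names @creator_names_unsafe@ and @creator_names_safe@ are hypothetical, the sample data is made up (including the placeholder string used for @role@), and it assumes that @roleTerm@ is a plain string when @role@ is a @dict@.

```python
def creator_names_unsafe(names):
    """Mimics the lookup in the traceback: assumes every 'role' is a dict."""
    for n in names:
        if n["role"]["roleTerm"] == "creator":
            yield n["namePart"]


def creator_names_safe(names):
    """Skips name entries whose 'role' is missing or is not a dict."""
    # A single name may arrive as a dict rather than a list (assumption).
    if isinstance(names, dict):
        names = [names]
    for n in names:
        if not isinstance(n, dict):
            continue
        role = n.get("role")
        if (isinstance(role, dict)
                and role.get("roleTerm") == "creator"
                and "namePart" in n):
            yield n["namePart"]


# Made-up sample: the first entry has a string 'role', the second a dict.
sample = [
    {"namePart": "Federal Reserve Bank of San Francisco",
     "role": "placeholder string"},
    {"namePart": "Federal Reserve Bank of San Francisco",
     "role": {"roleTerm": "creator"}},
]

try:
    list(creator_names_unsafe(sample))
    unsafe_error = None
except TypeError as exc:
    # Indexing a string with 'roleTerm' raises TypeError, as in the log.
    unsafe_error = exc

safe_result = list(creator_names_safe(sample))
```

With a guard like this the mapper would skip the malformed entry instead of aborting the whole record. Whether skipping is the right behavior, or whether the source data should be corrected instead, is a separate question for the follow-up with MHM.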
Per notes in Slack, Gretchen will follow up with David at MHM about this.