...

Place was added in 2014 to maintain state for the properties of the place, like city, state, and coordinates.

Analysis of Ingestion 2 (audumbla) coarse_geocode Enrichment

The Geokit Ruby gem is used by audumbla's coarse_geocode, but only to determine whether a looked-up Place matches the existing Place. It checks whether the old Place is within the new Place's bounds, and whether the old Place's center is within a given distance of the new Place's center.
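A minimal Python sketch of this kind of match test, for comparison when reimplementing it. It assumes a Place is reduced to a center point plus a south-west/north-east bounding box, that both checks must pass (AND), and that the distance threshold is tunable. `Bounds`, `is_same_place`, and the 25 km default are hypothetical names and values, not Geokit's or audumbla's API. Distance is great-circle (haversine) distance on a spherical Earth.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Bounds:
    """Hypothetical lat/lng box given by its south-west and north-east corners."""
    sw_lat: float
    sw_lng: float
    ne_lat: float
    ne_lng: float

    def contains(self, lat: float, lng: float) -> bool:
        # Simplification: ignores boxes that cross the antimeridian.
        return (self.sw_lat <= lat <= self.ne_lat
                and self.sw_lng <= lng <= self.ne_lng)


def haversine_km(lat1, lng1, lat2, lng2):
    """Great-circle distance in kilometres, assuming a spherical Earth."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lng2 - lng1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def is_same_place(old_center, new_center, new_bounds, max_km=25.0):
    """Assumed rule: old center inside the new bounds AND within max_km of the new center."""
    lat, lng = old_center
    if not new_bounds.contains(lat, lng):
        return False
    return haversine_km(lat, lng, *new_center) <= max_km
```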

The twofishes Ruby gem is also used by audumbla's coarse_geocode. It manages request timeouts and retries, but that is most of what we get out of the gem. What it's used for could likely be done with a standard-library HTTP module.

As a further note for when it comes time to pick which of audumbla's behaviors to emulate: it uses a timeout parameter that appears to be a response timeout. I think it would make more sense to think in terms of connection timeouts and to allow requests as much time as they need to complete. Since we're running Twofishes on the internal network, it seems we should use a short connect timeout, retry only once, and not worry about a response-completion timeout.
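A minimal sketch of that standard-library approach using Python's `http.client`: the connection attempt gets a short timeout and is retried once, and after connecting the socket timeout is cleared so the response can take as long as it needs. The function names, the 1-second default, and the assumption that Twofishes answers a GET on `/` with a JSON body are illustrative assumptions, not the twofishes gem's behavior.

```python
import http.client
import json
import urllib.parse


def connect_with_retry(host, port, connect_timeout=1.0, retries=1):
    """Open an HTTP connection, retrying only the connect step `retries` times."""
    last_error = None
    for _ in range(retries + 1):
        conn = http.client.HTTPConnection(host, port, timeout=connect_timeout)
        try:
            conn.connect()  # bounded by connect_timeout; raises OSError on failure
            return conn
        except OSError as err:
            conn.close()
            last_error = err
    raise last_error


def twofishes_get(host, port, params, connect_timeout=1.0, retries=1):
    """GET /?<params> from a (hypothetical) Twofishes host and decode the JSON body.

    Only connecting is time-limited; reading the response has no timeout.
    """
    conn = connect_with_retry(host, port, connect_timeout, retries)
    try:
        conn.sock.settimeout(None)  # no response-completion timeout
        conn.request("GET", "/?" + urllib.parse.urlencode(params))
        resp = conn.getresponse()
        body = resp.read()
    finally:
        conn.close()
    if resp.status != 200:
        raise RuntimeError(f"Twofishes returned HTTP {resp.status}")
    return json.loads(body)
```

Only connection failures are retried here. A request that connects but then fails is reported immediately rather than retried.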

MAPv4's edm:Place is quite different from MAPv3.1's dpla:Place, so the audumbla enrichment is concerned with adding skos:exactMatch and skos:closeMatch URIs, which we don't have to worry about with Ingestion 1. When we do get to Ingestion 3, these should be easy enough to add, because Twofishes provides one or more URIs for any given feature, from Wikipedia or other sources. audumbla populates GeoNames URIs in the form http://sws.geonames.org/<id>/. It should also be easy enough to simply add a parent feature instead of filling in the city, county, state, etc.
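A sketch of deriving GeoNames URIs of that form from a Twofishes feature. It assumes the JSON response lists a feature's identifiers under `ids` as objects with `source` and `id` keys, with `"geonameid"` as the GeoNames source name; verify this against a real response before relying on it. `exact_match_uris` is a hypothetical helper name.

```python
GEONAMES_URI = "http://sws.geonames.org/{}/"


def exact_match_uris(feature):
    """Build GeoNames URIs from a Twofishes feature dict (assumed response shape)."""
    uris = []
    for fid in feature.get("ids", []):
        # Assumption: GeoNames identifiers are tagged with source "geonameid".
        if fid.get("source") == "geonameid":
            uris.append(GEONAMES_URI.format(fid["id"]))
    return uris
```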

In the audumbla enrichment, only the Place's identity (URI or node ID) and providedLabel are kept; all other properties are replaced. Ingestion1, by comparison, preserves any properties that were provided. For example, audumbla will overwrite the city with the "display name" (for example, "Somerville, MA, United States"), whereas Ingestion1 will keep the provided city.
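The difference between the two policies, sketched with places as flat Python dicts. The key names (`id`, `providedLabel`, `city`, `state`) and both function names are hypothetical, and having the Ingestion1-style merge fill in properties that were not provided is an assumed reading of "preserves any properties that were provided", not Ingestion1's actual code.

```python
from typing import Dict

Place = Dict[str, str]  # hypothetical flat representation of a Place


def merge_audumbla_style(existing: Place, looked_up: Place) -> Place:
    """Keep only identity and providedLabel; everything else comes from the lookup."""
    kept = {k: existing[k] for k in ("id", "providedLabel") if k in existing}
    return {**looked_up, **kept}


def merge_ingestion1_style(existing: Place, looked_up: Place) -> Place:
    """Provided values win; the lookup only fills in properties that are missing."""
    return {**looked_up, **existing}
```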

Though the DPLA Geographic and Temporal Guidelines document says providers can give us providedLabels like "United States, Pennsylvania, Erie, 42.1167, -80.07315", such a string returns no results when passed directly to Twofishes. Nothing in the Ruby twofishes gem appears to parse out the coordinates. You have to use the ll ("el el") parameter to get Twofishes to do a reverse lookup, whereas name searches use the query parameter. The recommendations in that document may be aspirational, pending future work on the geocoding enrichment.
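A hypothetical sketch of splitting coordinates out of such a label: if the label ends in a plausible `lat, lng` pair, send a reverse lookup with `ll`; otherwise send the whole label as `query`. The regex, the range check, the function names, and the choice to drop the name part when coordinates are present are assumptions; only the `ll` and `query` parameter names come from the text above.

```python
import re
import urllib.parse

# A trailing "lat, lng" pair, either the whole label or after the last names.
_TRAILING_LL = re.compile(
    r"(?:^|,)\s*(-?\d+(?:\.\d+)?)\s*,\s*(-?\d+(?:\.\d+)?)\s*$")


def twofishes_params(provided_label):
    """Choose Twofishes params: reverse lookup (ll) if valid coordinates end the label."""
    m = _TRAILING_LL.search(provided_label)
    if m:
        lat, lng = float(m.group(1)), float(m.group(2))
        if -90 <= lat <= 90 and -180 <= lng <= 180:
            return {"ll": f"{m.group(1)},{m.group(2)}"}
    return {"query": provided_label}


def twofishes_query_string(provided_label):
    """URL-encode the chosen params for a GET request."""
    return urllib.parse.urlencode(twofishes_params(provided_label))
```

For example, `twofishes_query_string("Somerville, MA")` yields `query=Somerville%2C+MA`.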

The Ingestion1 enrichment does something I don't think the audumbla enrichment does: it detects when multiple dpla:Place values are part of the same hierarchy and combines them. See test_geocode.test_collapse_hierarchy(). We shouldn't overlook this when redoing Ingestion1's geocode enrichment, or when writing the new one for Ingestion3.
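One way such collapsing could work, assuming each Place has already been geocoded into hierarchy fields: treat a Place as an ancestor of another if every field it sets matches the other Place and the other Place is more specific, then drop the ancestors. This is a hypothetical sketch (`HIERARCHY`, `is_ancestor`, and `collapse_hierarchy` are illustrative names), not Ingestion1's implementation; test_collapse_hierarchy() defines the actual expected behavior.

```python
HIERARCHY = ("country", "state", "county", "city")  # assumed field names


def is_ancestor(a, b):
    """True if every hierarchy field set on `a` has the same value on `b`,
    and `b` sets strictly more hierarchy fields than `a`."""
    a_fields = {k: a[k] for k in HIERARCHY if a.get(k)}
    b_fields = {k: b[k] for k in HIERARCHY if b.get(k)}
    return (len(a_fields) < len(b_fields)
            and all(b_fields.get(k) == v for k, v in a_fields.items()))


def collapse_hierarchy(places):
    """Drop every place that is an ancestor of another place in the list."""
    return [p for p in places
            if not any(is_ancestor(p, q) for q in places if q is not p)]
```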

Twofishes Suitability For Our Purpose

...