Enrichments (draft)
Standard Enrichments
Name | Type | Description | Typical fields used | ingestion3 | Ruby module (Krikri) |
---|---|---|---|---|---|
Create prefLabel from providedLabel | RDF | Copy the value from a provided label to the preferred label of a property when no preferred label exists. | All | | create_pref_label_from_provided.rb |
DCMIType Map | RDF | Match string values within the type properties to DCMIType terms and add matching term URI. Remove non-DCMIType values and move to format or genre. | Type, Genre, Format | | dcmi_enforcer.rb dcmi_type_map.rb move_non_dcmi_type.rb |
De-Duplicate | Normalization | Look for identically matching values within instances of a property and remove the duplicate property. | All | | dedup_nodes.rb |
Genre matching | Normalization | Match string values to a controlled list of values for genre. | Genre, Format, Type | | genre_filter.rb move_non_dcmi_type.rb |
Language Normalization | Normalization | Match a language name string to the corresponding ISO 639-3 code and add that code and the URI for the language to the record. Alternatively, match a three-character string to the corresponding language in the ISO 639-3 vocabulary and add the language name and the URI for the language to the record. | Language | | language_to_lexvo.rb |
Parse Date | Normalization | Parse information in date fields and normalize it to EDTF format. When a single date is present, split it into begin and end dates for the Temporal Class. When only begin and end dates are present, create a date range label. | Date, Timespan | | parse_date.rb timespan_split.rb timespan_label.rb |
Remove Empty Fields | Normalization | Look for and remove any existing property with no value. | All | | remove_empty_fields.rb |
Geocoding | Normalization | Search GeoNames vocabulary for matches to string values in spatial fields. Build out the rest of the spatial class properties based on the GeoNames data. | Spatial | Yes | |
Split at Delimiter | String | Split values at a particular delimiter (usually a semicolon) and put each resulting value in its own property instance. | All | Yes | split_at_delimiter.rb split_provided_label_at_delimiter.rb |
Strip Ending Punctuation | String | Remove ending punctuation from property values, including quotation marks, colons, semicolons, and dashes, but excluding brackets and parentheses. Periods are typically removed except when the final word is a two-character string (such as “Jr.”) or two characters separated by periods (“P.A.”). | Title, Creator, Contributor, Publisher, Subject, Format, Genre | Yes | strip_ending_punctuation.rb |
Strip HTML | String | Look for and remove any HTML markup from a property value. | All | Yes | strip_html.rb |
Strip Leading Colons | String | Remove initial character from a value if it is a colon. | Title | Yes | strip_leading_colons.rb |
Strip Leading Punctuation | String | Remove initial punctuation from property values, including quotation marks, colons, semicolons, and dashes, but excluding brackets and parentheses. | Creator, Contributor, Publisher, Subject, Format, Genre | Yes | strip_leading_punctuation.rb |
Strip Whitespace | String | Look for runs of multiple whitespace characters within a value and reduce each to a single space. | All | Yes | strip_whitespace.rb |
Web Resource URIs | Normalization | Remove Web Resources that are not http URIs. | isShownAt, Preview, Object | | web_resource_uri.rb |
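
Several of the enrichments in the table above are small transformations that can be chained over the values of a single property. The Ruby sketch below illustrates that idea for Split at Delimiter, Strip Leading Colons, Strip Whitespace, Remove Empty Fields, and De-Duplicate. It is not the Krikri code: every method name is hypothetical, values are treated as plain strings rather than RDF nodes, and trimming each piece after a split is an assumption.

```ruby
# Illustrative sketch only; NOT the Krikri implementation.
# All method names here are hypothetical.

# Split at Delimiter: one value becomes several, one per property instance.
# Trimming each piece is an assumption made for this sketch.
def split_at_delimiter(value, delimiter = ';')
  value.split(delimiter).map(&:strip)
end

# Strip Leading Colons: remove the initial character if it is a colon.
def strip_leading_colon(value)
  value.sub(/\A:/, '')
end

# Strip Whitespace: reduce runs of whitespace to a single space.
def strip_whitespace(value)
  value.gsub(/\s+/, ' ')
end

# Remove Empty Fields: drop values that are nil or blank.
def remove_empty(values)
  values.reject { |v| v.nil? || v.strip.empty? }
end

# Chain the steps, then De-Duplicate identical values.
def enrich(values)
  split    = values.flat_map { |v| split_at_delimiter(v) }
  cleaned  = split.map { |v| strip_whitespace(strip_leading_colon(v)) }
  remove_empty(cleaned).uniq
end

p enrich([':Maps;  Atlases', 'Maps', ' '])
```

The order of the steps matters in this sketch: splitting first means that duplicates produced by the split (here, the second "Maps") are still caught by the final de-duplication.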
Specialized Enrichments
Name | Type | Description | Typical fields used | ingestion3 | Ruby module | Hubs |
---|---|---|---|---|---|---|
Convert to Sentence Case | String | Convert a string in all caps to sentence case. | Title | Yes | convert_to_sentence_case.rb | |
Dates in Coverage fields | | Parse coverage properties in original records for values that are dates and move to timespan property. | Spatial | | | |
Genre from MARC leader | | Match specific codes in the MARC leader for a record to a controlled list of values for genre. | Genre | | | HathiTrust, University of Florida, GPO |
Limit Characters | String | Limit the value of a property to a specified number of characters. | Description | Yes | limit_characters.rb | University of Washington, Minnesota Digital Library |
Remove Placeholder Values | | Remove a given placeholder value such as "XYZ". | Subject | | remove_placeholder.rb | University of Washington |
Separate Coordinates | | Split sets of geospatial coordinates in original record properties into appropriate latitude and longitude properties. | Spatial | N/A – Twofishes does not require this enrichment | split_coordinates.rb | Tennessee Digital Library, Minnesota Digital Library |
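
As a rough illustration of the specialized enrichments described above, the Ruby sketch below covers Convert to Sentence Case, Limit Characters, Remove Placeholder Values, and Separate Coordinates. It is a sketch under stated assumptions, not the Krikri modules: the method names are hypothetical, values are plain strings, and coordinates are assumed to arrive as a comma-separated "latitude, longitude" pair in decimal degrees.

```ruby
# Illustrative sketch only; NOT the Krikri implementation.
# All method names here are hypothetical.

# Convert to Sentence Case: only touch values written entirely in capitals.
def convert_to_sentence_case(value)
  all_caps = value.match?(/[[:alpha:]]/) && value == value.upcase
  all_caps ? value.capitalize : value
end

# Limit Characters: truncate a value to at most `max` characters.
def limit_characters(value, max)
  value.length > max ? value[0, max] : value
end

# Remove Placeholder Values: drop values equal to a given placeholder.
def remove_placeholder(values, placeholder)
  values.reject { |v| v == placeholder }
end

# Separate Coordinates: split "lat, long" into two numeric properties.
# The input order and decimal-degree format are assumptions of this sketch.
def split_coordinates(value)
  parts = value.split(',').map(&:strip)
  number = /\A-?\d+(\.\d+)?\z/
  return nil unless parts.length == 2 && parts.all? { |p| p.match?(number) }

  lat, long = parts.map(&:to_f)
  return nil unless lat.between?(-90, 90) && long.between?(-180, 180)

  { lat: lat, long: long }
end

p convert_to_sentence_case('ANNUAL REPORT OF THE BOARD')
p limit_characters('A long description', 6)
p remove_placeholder(['XYZ', 'Railroads'], 'XYZ')
p split_coordinates('35.96, -83.92')
```

Returning `nil` for input that does not parse as a coordinate pair leaves the original value untouched for the caller to handle; that choice is specific to this sketch.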