Enrichments (draft)
Standard Enrichments
Name | Type | Description | Typical fields used | ingestion3 | Ruby module (Krikri) |
---|---|---|---|---|---|
Create prefLabel from providedLabel | RDF | Copy the value from a provided label to the preferred label of a property when no preferred label exists. | All | | create_pref_label_from_provided.rb |
DCMIType Map | RDF | Match string values within the type properties to DCMIType terms and add the matching term URI. Remove non-DCMIType values and move them to format or genre. | Type, Genre, Format | | dcmi_enforcer.rb dcmi_type_map.rb move_non_dcmi_type.rb |
De-Duplicate | Normalization | Look for identically matching values within instances of a property and remove the duplicate property. | All | | dedup_nodes.rb |
Genre matching | Normalization | Match string values to a controlled list of values for genre. | Genre, Format, Type | | genre_filter.rb move_non_dcmi_type.rb |
Language Normalization | Normalization | Match a string to the corresponding ISO 639-3 code and add that value and the URI for the language to the record. Alternatively, match a three-character string to the corresponding language in the ISO 639-3 vocabulary and add the name string and the URI for the language to the record. | Language | | language_to_lexvo.rb |
Parse Date | Normalization | Parse information in date fields and normalize to EDTF format. When a single date is present, split it into begin and end dates for the Temporal Class. When only begin and end dates are present, create a date range label. | Date, Timespan | | parse_date.rb timespan_split.rb timespan_label.rb |
Remove Empty Fields | Normalization | Look for and remove any existing property with no value. | All | | remove_empty_fields.rb |
Geocoding | Normalization | Search GeoNames vocabulary for matches to string values in spatial fields. Build out the rest of the spatial class properties based on the GeoNames data. | Spatial | Yes | |
Split at Delimiter | String | Split values at a particular delimiter (usually a semicolon) and put each resulting value in its own property instance. | All | Yes | split_at_delimiter.rb split_provided_label_at_delimiter.rb |
Strip Ending Punctuation | String | Remove ending punctuation from property values, including quotation marks, colons, semicolons, and dashes, but excluding brackets and parentheses. Periods are typically removed except when the final word is a two-character string (such as “Jr.”) or two characters separated by periods (“P.A.”). | Title, Creator, Contributor, Publisher, Subject, Format, Genre | Yes | strip_ending_punctuation.rb |
Strip HTML | String | Look for and remove any HTML code from a property value. | All | Yes | strip_html.rb |
Strip Leading Colons | String | Remove initial character from a value if it is a colon. | Title | Yes | strip_leading_colons.rb |
Strip Leading Punctuation | String | Remove initial punctuation from property values including quotation marks, colons, semicolons, and dashes, but excluding brackets and parentheses. | Creator, Contributor, Publisher, Subject, Format, Genre | Yes | strip_leading_punctuation.rb |
Strip Whitespace | String | Look for runs of multiple whitespace characters within a value and reduce each to a single space. | All | Yes | strip_whitespace.rb |
Web Resource URIs | Normalization | Remove Web Resources that are not http URIs. | isShownAt, Preview, Object | | web_resource_uri.rb |
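
Several of the string and normalization enrichments above are simple enough to illustrate in a few lines. The following is a minimal sketch only, assuming each property is held as a plain array of strings; the module and method names (`EnrichmentSketch`, `enrich`, and so on) are hypothetical and are not taken from Krikri or ingestion3, and trimming the ends of a value in `strip_whitespace` is an added assumption beyond the table's description.

```ruby
# Illustrative sketch only; these are NOT the Krikri or ingestion3 implementations.
# All names here are hypothetical.
module EnrichmentSketch
  module_function

  # Split at Delimiter: each piece becomes its own value.
  def split_at_delimiter(values, delimiter = ';')
    values.flat_map { |v| v.split(delimiter) }
  end

  # Strip Whitespace: collapse runs of whitespace to one space.
  # Trimming the ends as well is an assumption of this sketch.
  def strip_whitespace(value)
    value.gsub(/\s+/, ' ').strip
  end

  # Remove Empty Fields: drop values with no content.
  def remove_empty(values)
    values.reject { |v| v.strip.empty? }
  end

  # De-Duplicate: keep the first of any identical values.
  def dedup(values)
    values.uniq
  end

  # Chain the steps in one plausible order (the order is an assumption).
  def enrich(values)
    split = split_at_delimiter(values).map { |v| strip_whitespace(v) }
    dedup(remove_empty(split))
  end
end

p EnrichmentSketch.enrich(["Maps;  Atlases ; Maps", ""])
```

Running the enrichments in this order means duplicates that differ only in surrounding whitespace are normalized before they are compared, which is why de-duplication comes last in the sketch.
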
Specialized Enrichments
Name | Type | Description | Typical fields used | ingestion3 | Ruby module | Hubs |
---|---|---|---|---|---|---|
Convert to Sentence Case | String | Convert a string in all caps to sentence case. | Title | Yes | convert_to_sentence_case.rb | |
Dates in Coverage fields | | Parse coverage properties in original records for values that are dates and move to timespan property. | Spatial | | | |
Genre from MARC leader | | Match specific codes in the MARC leader for a record to a controlled list of values for genre | Genre | | | HathiTrust, University of Florida, GPO |
Limit Characters | String | Limit the value of a property to a specified number of characters. | Description | Yes | limit_characters.rb | University of Washington, Minnesota Digital Library |
Remove Placeholder Values | | Remove a given placeholder value such as "XYZ" | Subject | | remove_placeholder.rb | University of Washington |
Separate Coordinates | | Split sets of geospatial coordinates in original record properties into appropriate latitude and longitude properties. | Spatial | N/A – Twofishes does not require this enrichment | split_coordinates.rb | Tennessee Digital Library, Minnesota Digital Library |
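
As a rough illustration of how the specialized enrichments behave, the sketch below shows one possible reading of four of them. It assumes plain string values, and for coordinates it assumes a comma-separated pair with latitude first; neither assumption comes from the table. The module and method names are hypothetical and do not reflect the actual Krikri modules listed above.

```ruby
# Illustrative sketch only; NOT the Krikri implementations. Names are hypothetical.
module SpecializedSketch
  module_function

  # Convert to Sentence Case: only touch values written entirely in capitals.
  def to_sentence_case(value)
    return value unless value.match?(/[A-Z]/) && value == value.upcase
    value.downcase.sub(/[a-z]/) { |c| c.upcase }
  end

  # Limit Characters: keep at most `max` characters.
  def limit_characters(value, max)
    value[0, max]
  end

  # Remove Placeholder Values: drop values equal to the given placeholder.
  def remove_placeholder(values, placeholder)
    values.reject { |v| v.strip == placeholder }
  end

  # Separate Coordinates: assumes "latitude, longitude" order (an assumption).
  # Returns nil when the value is not a comma-separated pair.
  def split_coordinates(value)
    lat, long = value.split(',').map(&:strip)
    return nil unless lat && long
    { lat: lat, long: long }
  end
end

p SpecializedSketch.to_sentence_case("ANNUAL REPORT OF THE BOARD")
p SpecializedSketch.split_coordinates("35.96, -83.92")
```

A real sentence-case enrichment would also need to preserve proper nouns and acronyms, which this sketch does not attempt.
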