Enrichments (draft)

Standard Enrichments

(example enrichment profile)

| Name | Type | Description | Typical fields used | ingestion3 | Ruby module (Krikri) |
| --- | --- | --- | --- | --- | --- |
| Create prefLabel from providedLabel | RDF | Copy the value from a provided label to the preferred label of a property when no preferred label exists. | All | | create_pref_label_from_provided.rb |
| DCMIType Map | RDF | Match string values within the type properties to DCMIType terms and add the matching term URI. Remove non-DCMIType values and move them to format or genre. | Type, Genre, Format | | dcmi_enforcer.rb, dcmi_type_map.rb, move_non_dcmi_type.rb |
| De-Duplicate | Normalization | Look for identically matching values within instances of a property and remove the duplicate property. | All | | dedup_nodes.rb |
| Genre matching | Normalization | Match string values to a controlled list of values for genre. | Genre, Format, Type | | genre_filter.rb, move_non_dcmi_type.rb |
| Language Normalization | Normalization | Match a string to the corresponding ISO 639-3 code and add that value and the URI for the language to the record. Alternatively, match a three-character string to the corresponding language in the ISO 639-3 vocabulary and add the name string and the URI for the language to the record. | Language | | language_to_lexvo.rb |
| Parse Date | Normalization | Parse information in date fields and normalize to EDTF format. When a single date is present, split it into begin and end dates for the Temporal Class. When only begin and end dates are present, create a date range label. | Date, Timespan | | parse_date.rb, timespan_split.rb, timespan_label.rb |
| Remove Empty Fields | Normalization | Look for and remove any existing property with no value. | All | | remove_empty_fields.rb |
| Geocoding | Normalization | Search the GeoNames vocabulary for matches to string values in spatial fields. Build out the rest of the spatial class properties based on the GeoNames data. | Spatial | Yes | |
| Split at Delimiter | String | Split values at a particular delimiter (usually a semicolon) and put each resulting value in its own property instance. | All | Yes | split_at_delimiter.rb, split_provided_label_at_delimiter.rb |
| Strip Ending Punctuation | String | Remove ending punctuation from property values, including quotation marks, colons, semicolons, and dashes, but excluding brackets and parentheses. Periods are typically removed except when the final word is a two-character string (such as “Jr.”) or two characters separated by periods (“P.A.”). | Title, Creator, Contributor, Publisher, Subject, Format, Genre | Yes | strip_ending_punctuation.rb |
| Strip HTML | String | Look for and remove any HTML code from a property value. | All | Yes | strip_html.rb |
| Strip Leading Colons | String | Remove the initial character from a value if it is a colon. | Title | Yes | strip_leading_colons.rb |
| Strip Leading Punctuation | String | Remove initial punctuation from property values, including quotation marks, colons, semicolons, and dashes, but excluding brackets and parentheses. | Creator, Contributor, Publisher, Subject, Format, Genre | Yes | strip_leading_punctuation.rb |
| Strip Whitespace | String | Look for multiple whitespace characters within a value and reduce them to a single whitespace character. | All | Yes | strip_whitespace.rb |
| Web Resource URIs | Normalization | Remove Web Resources that are not http URIs. | isShownAt, Preview, Object | | web_resource_uri.rb |
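
To make the string-level behaviour described above more concrete, here is a minimal Ruby sketch of how Split at Delimiter, Strip Whitespace, Strip Ending Punctuation, Remove Empty Fields, and De-Duplicate might be chained over the values of a single property. It is not the Krikri or ingestion3 code: the module name `EnrichmentSketch`, the method signatures, the order of the steps, and the exact punctuation set are all assumptions made for illustration, and the real modules operate on RDF records rather than plain string arrays.

```ruby
# Hypothetical sketch only; names and signatures are NOT the Krikri API.
module EnrichmentSketch
  # A trailing period is kept when the final word is two letters ("Jr.")
  # or two letters separated by a period ("P.A."), as the table describes.
  KEEP_PERIOD = /(?:\A|\s)(?:[[:alpha:]]{2}|[[:alpha:]]\.[[:alpha:]])\.\z/

  module_function

  # Split at Delimiter: each segment becomes its own value.
  def split_at_delimiter(values, delimiter = ';')
    values.flat_map { |v| v.split(delimiter) }
  end

  # Strip Whitespace: collapse runs of whitespace to one space.
  # Trimming the ends as well is an added assumption.
  def strip_whitespace(values)
    values.map { |v| v.gsub(/\s+/, ' ').strip }
  end

  # Strip Ending Punctuation: drop trailing quotes, colons, semicolons and
  # dashes (brackets and parentheses are left alone), then drop a trailing
  # period unless the KEEP_PERIOD exception applies.
  def strip_ending_punctuation(values)
    values.map do |v|
      stripped = v.sub(/[\s"':;\-]+\z/, '')
      if stripped.end_with?('.') && !stripped.match?(KEEP_PERIOD)
        stripped.sub(/\.+\z/, '')
      else
        stripped
      end
    end
  end

  # Chain the steps, then apply Remove Empty Fields and De-Duplicate.
  def enrich(values)
    values = split_at_delimiter(values)
    values = strip_whitespace(values)
    values = strip_ending_punctuation(values)
    values.reject(&:empty?).uniq
  end
end

p EnrichmentSketch.enrich(['Maps;  Atlases. ; Maps', 'Smith, John, Jr.'])
# => ["Maps", "Atlases", "Smith, John, Jr."]
```

Running de-duplication last is a deliberate choice in this sketch: values such as "Maps" and " Maps" only become identical after splitting and whitespace cleanup, so an earlier de-duplication pass would miss them.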

 

Specialized Enrichments

| Name | Type | Description | Typical fields used | ingestion3 | Ruby module | Hubs |
| --- | --- | --- | --- | --- | --- | --- |
| Convert to Sentence Case | String | Convert a string in all caps to sentence case. | Title | Yes | convert_to_sentence_case.rb | |
| Dates in Coverage fields | | Parse coverage properties in original records for values that are dates and move them to the timespan property. | Spatial | | | |
| Genre from MARC leader | | Match specific codes in the MARC leader for a record to a controlled list of values for genre. | Genre | | | HathiTrust, University of Florida, GPO |
| Limit Characters | String | Limit the value of a property to a specified number of characters. | Description | Yes | limit_characters.rb | University of Washington, Minnesota Digital Library |
| Remove Placeholder Values | | Remove a given placeholder value such as "XYZ". | Subject | | remove_placeholder.rb | University of Washington |
| Separate Coordinates | | Split sets of geospatial coordinates in original record properties into appropriate latitude and longitude properties. | Spatial | N/A – Twofishes does not require this enrichment | split_coordinates.rb | Tennessee Digital Library, Minnesota Digital Library |
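
As a rough illustration of three of the specialized enrichments above (Convert to Sentence Case, Limit Characters, and Separate Coordinates), the following Ruby sketch shows one way the described behaviour could look on plain strings. Everything here is an assumption made for the example: the module name `SpecializedSketch`, the method signatures, and in particular the coordinate format (a comma-separated "latitude, longitude" pair in decimal degrees), which the source does not specify. It should not be read as the contents of the listed `.rb` files.

```ruby
# Hypothetical sketch only; names and signatures are NOT the Krikri API.
module SpecializedSketch
  module_function

  # Convert to Sentence Case: only values written entirely in capitals
  # are changed; mixed-case values and values without letters pass through.
  def convert_to_sentence_case(value)
    return value unless value.match?(/[[:alpha:]]/) && value == value.upcase
    value.downcase.sub(/[[:alpha:]]/) { |first| first.upcase }
  end

  # Limit Characters: keep at most `max` characters of the value.
  def limit_characters(value, max)
    value[0, max]
  end

  # Separate Coordinates: ASSUMES "lat, long" in decimal degrees.
  # Returns nil when the value does not look like a coordinate pair.
  def split_coordinates(value)
    parts = value.split(',').map(&:strip)
    return nil unless parts.size == 2

    lat, long = parts.map do |part|
      begin
        Float(part)
      rescue ArgumentError
        nil
      end
    end
    return nil if lat.nil? || long.nil?
    return nil unless lat.between?(-90, 90) && long.between?(-180, 180)

    { lat: lat, long: long }
  end
end

p SpecializedSketch.convert_to_sentence_case('ANNUAL REPORT OF THE BOARD')
# => "Annual report of the board"
p SpecializedSketch.limit_characters('A long description', 6)
# => "A long"
p SpecializedSketch.split_coordinates('35.96, -83.92')
# => {:lat=>35.96, :long=>-83.92} (newer Rubies print {lat: 35.96, long: -83.92})
```

Returning `nil` for values that do not parse as a coordinate pair (for example a place name containing a comma) leaves the caller free to keep the original spatial value untouched.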