Enrichments (draft)

Standard Enrichments

(example enrichment profile)

| Name | Type | Description | Typical fields used | ingestion3 | Ruby module (Krikri) |
| --- | --- | --- | --- | --- | --- |
| Create prefLabel from providedLabel | RDF | Copy the value of a provided label to the preferred label of a property when no preferred label exists. | All | | create_pref_label_from_provided.rb |
| DCMIType Map | RDF | Match string values in the type properties to DCMIType terms and add the matching term URI. Remove non-DCMIType values from type and move them to format or genre. | Type, Genre, Format | | dcmi_enforcer.rb, dcmi_type_map.rb, move_non_dcmi_type.rb |
| De-Duplicate | Normalization | Find identical values among the instances of a property and remove the duplicates. | All | | dedup_nodes.rb |
| Genre matching | Normalization | Match string values to a controlled list of genre values. | Genre, Format, Type | | genre_filter.rb, move_non_dcmi_type.rb |
| Language Normalization | Normalization | Match a string to the corresponding ISO 639-3 code and add that code and the URI for the language to the record. Alternatively, match a three-character string to the corresponding language in the ISO 639-3 vocabulary and add the language name and URI to the record. | Language | | language_to_lexvo.rb |
| Parse Date | Normalization | Parse information in date fields and normalize it to EDTF format. When a single date is present, split it into begin and end dates for the Temporal Class. When only begin and end dates are present, create a date range label. | Date, Timespan | | parse_date.rb, timespan_split.rb, timespan_label.rb |
| Remove Empty Fields | Normalization | Find and remove any property that has no value. | All | | remove_empty_fields.rb |
| Geocoding | Normalization | Search the GeoNames vocabulary for matches to string values in spatial fields. Build out the rest of the spatial class properties from the GeoNames data. | Spatial | Yes | |
| Split at Delimiter | String | Split values at a particular delimiter (usually a semicolon) and put each resulting value in its own property instance. | All | Yes | split_at_delimiter.rb, split_provided_label_at_delimiter.rb |
| Strip Ending Punctuation | String | Remove ending punctuation from property values, including quotation marks, colons, semicolons, and dashes, but excluding brackets and parentheses. Periods are typically removed, except when the final word is a two-character string (such as “Jr.”) or two characters separated by periods (“P.A.”). | Title, Creator, Contributor, Publisher, Subject, Format, Genre | Yes | strip_ending_punctuation.rb |
| Strip HTML | String | Find and remove any HTML markup in a property value. | All | Yes | strip_html.rb |
| Strip Leading Colons | String | Remove the initial character of a value if it is a colon. | Title | Yes | strip_leading_colons.rb |
| Strip Leading Punctuation | String | Remove initial punctuation from property values, including quotation marks, colons, semicolons, and dashes, but excluding brackets and parentheses. | Creator, Contributor, Publisher, Subject, Format, Genre | Yes | strip_leading_punctuation.rb |
| Strip Whitespace | String | Reduce runs of multiple whitespace characters within a value to a single whitespace. | All | Yes | strip_whitespace.rb |
| Web Resource URIs | Normalization | Remove Web Resources that are not http URIs. | isShownAt, Preview, Object | | web_resource_uri.rb |
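
The string and de-duplication enrichments listed above are easiest to follow when chained over a list of values. The following is a minimal Python sketch of that idea, not the Krikri or ingestion3 code. The function names, the exact punctuation set, and the order of the steps are all assumptions made for illustration; the source only says the set "includes" quotation marks, colons, semicolons, and dashes.

```python
import re

# Assumed character set: the table names quotation marks, colons,
# semicolons, and dashes. Brackets and parentheses are left out on purpose.
PUNCTUATION = "\"':;-"


def split_at_delimiter(values, delimiter=";"):
    """Give each delimited piece its own value; trim pieces and drop blanks."""
    pieces = []
    for value in values:
        pieces.extend(part.strip() for part in value.split(delimiter))
    return [piece for piece in pieces if piece]


def strip_whitespace(value):
    """Collapse whitespace runs to one space (trimming the ends is an assumption)."""
    return re.sub(r"\s+", " ", value).strip()


def strip_leading_punctuation(value):
    """Remove leading punctuation and spaces, leaving brackets untouched."""
    return value.lstrip(PUNCTUATION + " ")


def _keeps_final_period(value):
    """True when the last word looks like "Jr." or "P.A."."""
    last_word = value.split()[-1]
    return bool(re.fullmatch(r"[A-Za-z]{2}\.|[A-Za-z]\.[A-Za-z]\.", last_word))


def strip_ending_punctuation(value):
    """Peel trailing punctuation; keep the period on short abbreviations."""
    value = value.rstrip()
    while value:
        if value[-1] in PUNCTUATION:
            value = value[:-1].rstrip()
        elif value[-1] == "." and not _keeps_final_period(value):
            value = value[:-1].rstrip()
        else:
            break
    return value


def dedup(values):
    """Remove exact duplicates while keeping first-seen order."""
    return list(dict.fromkeys(values))


def clean_values(values):
    """Hypothetical pipeline: split, clean each piece, drop empties, de-duplicate."""
    cleaned = []
    for value in split_at_delimiter(values):
        value = strip_whitespace(value)
        value = strip_leading_punctuation(value)
        value = strip_ending_punctuation(value)
        if value:
            cleaned.append(value)
    return dedup(cleaned)
```

De-duplication runs last in this sketch because two values often become identical only after punctuation and whitespace have been removed.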

 
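
The Parse Date entry describes three steps: normalize a date string, split a single date into begin and end, and build a range label when only begin and end exist. Below is a rough Python sketch of those steps under stated assumptions. The `TimeSpan` class and its field names are hypothetical, only a handful of input patterns are recognized, slash-separated day dates are read as US month/day/year, and the range label uses the EDTF interval form `begin/end`. None of this is taken from parse_date.rb, timespan_split.rb, or timespan_label.rb.

```python
from dataclasses import dataclass, replace
from datetime import datetime
from typing import Optional

# (input pattern, output pattern) pairs; an illustrative subset only.
_FORMATS = [
    ("%Y-%m-%d", "%Y-%m-%d"),
    ("%m/%d/%Y", "%Y-%m-%d"),   # assumes US month/day order
    ("%B %d, %Y", "%Y-%m-%d"),  # e.g. "March 5, 1921" (English month names)
    ("%Y-%m", "%Y-%m"),
    ("%Y", "%Y"),
]


@dataclass(frozen=True)
class TimeSpan:
    """Hypothetical stand-in for a timespan with a label, begin, and end."""
    label: Optional[str] = None
    begin: Optional[str] = None
    end: Optional[str] = None


def parse_date(text):
    """Return an EDTF-style date string, or None if no pattern matches."""
    text = text.strip()
    for pattern, output in _FORMATS:
        try:
            return datetime.strptime(text, pattern).strftime(output)
        except ValueError:
            continue
    return None


def split_timespan(span):
    """Fill begin and end from the label when neither is set yet."""
    if span.begin or span.end or not span.label:
        return span
    parts = span.label.split("/")
    if len(parts) == 2:  # possible interval such as "1920/1925"
        begin, end = parse_date(parts[0]), parse_date(parts[1])
        if begin and end:
            return replace(span, begin=begin, end=end)
    single = parse_date(span.label)
    if single:  # a single date becomes both begin and end
        return replace(span, begin=single, end=single)
    return span  # unparseable labels are left untouched


def label_timespan(span):
    """Create a range label when only begin and end are present."""
    if span.label or not (span.begin and span.end):
        return span
    label = span.begin if span.begin == span.end else f"{span.begin}/{span.end}"
    return replace(span, label=label)
```

Returning a new `TimeSpan` rather than mutating the input keeps each step safe to rerun, which matters when enrichments are applied in sequence.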

Specialized Enrichments

| Name | Type | Description | Typical fields used | ingestion3 | Ruby module | Hubs |
| --- | --- | --- | --- | --- | --- | --- |
| Convert to Sentence Case | String | Convert a string in all caps to sentence case. | Title | Yes | convert_to_sentence_case.rb | |
| Dates in Coverage fields | | Parse coverage properties in original records for values that are dates and move them to the timespan property. | Spatial | | | |
| Genre from MARC leader | | Match specific codes in a record's MARC leader to a controlled list of genre values. | Genre | | | HathiTrust, University of Florida, GPO |
| Limit Characters | String | Limit the value of a property to a specified number of characters. | Description | Yes | limit_characters.rb | University of Washington, Minnesota Digital Library |
| Remove Placeholder Values | | Remove a given placeholder value such as "XYZ". | Subject | | remove_placeholder.rb | University of Washington |
| Separate Coordinates | | Split sets of geospatial coordinates in original record properties into the appropriate latitude and longitude properties. | Spatial | N/A – Twofishes does not require this enrichment | split_coordinates.rb | Tennessee Digital Library, Minnesota Digital Library |
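
Several of the specialized enrichments above are small enough to sketch in a few lines each. The Python below is an illustration only, with hypothetical function names. It assumes that truncation adds no ellipsis, that coordinates arrive as two decimal numbers in latitude, longitude order, and that a placeholder must match the whole value. The listed Ruby modules may behave differently.

```python
import re


def convert_to_sentence_case(value):
    """Lowercase an all-caps string, keeping only the first letter capitalized."""
    # str.isupper() is True only when every cased character is uppercase,
    # so mixed-case titles pass through unchanged.
    return value.capitalize() if value.isupper() else value


def limit_characters(value, limit):
    """Truncate a value to at most `limit` characters (no ellipsis added)."""
    return value[:limit]


def remove_placeholder(values, placeholder):
    """Drop values that consist solely of the given placeholder."""
    return [value for value in values if value.strip() != placeholder]


def split_coordinates(value):
    """Return (latitude, longitude) as floats, or None if the value is not a pair."""
    parts = [part for part in re.split(r"[,;\s]+", value.strip()) if part]
    if len(parts) != 2:
        return None
    try:
        latitude, longitude = float(parts[0]), float(parts[1])
    except ValueError:
        return None
    # Reject numbers that cannot be a latitude/longitude pair.
    if not (-90 <= latitude <= 90 and -180 <= longitude <= 180):
        return None
    return latitude, longitude
```

Note that this form of sentence casing also lowercases proper nouns and acronyms ("REPORT OF THE NASA BOARD" becomes "Report of the nasa board"), which may be one reason it is applied per hub rather than as a standard enrichment.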