Metadata Mapping DSL
DRAFT
The Krikri Metadata Mapping Domain Specific Language aims to provide a declarative, machine-actionable language for specifying crosswalks between metadata formats. Unlike other mapping tools in popular use (e.g. XSLT), the KriKri DSL is focused on semantic mapping rather than document transformation.
The DSL handles arbitrary input formats through a common Krikri::Parser interface. Parsers project the contents of an OriginalRecord as a tree of values, which can be processed through the DSL to select nodes within the tree and construct an RDF graph from the results.
Mappings are declared by calling Krikri::Mapper.define. The method takes a single required argument: a symbol naming the mapping (e.g. :esdn_mods). Optionally, it accepts a parser class (via parser:) to use for processing records, plus specialized arguments for parser initialization, if any.
Krikri::Mapper.define(:my_mapping, parser: Krikri::ModsParser) do
end
Nodes within the created graph are represented as ActiveTriples::Resource model objects. By default, the root graph node (RDF resource) created by such a definition is a DPLA::MAP::Aggregation (ore:Aggregation); alternative models can be specified with the class: option of Krikri::Mapper.define.
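To make the shape of a definition concrete, here is a minimal sketch of a name-keyed registry that captures what a call like the one above supplies: a required symbol name, an optional parser class, an optional root model class, and the DSL block. It is not Krikri's implementation; ToyMapper, ToyAggregation and ToyModsParser are hypothetical stand-ins, and storing mappings in a Hash is an assumption made for illustration.

```ruby
# Hypothetical sketch, not Krikri's implementation: `define` stores a named
# mapping together with its parser class, root model class, and DSL block.
ToyAggregation = Class.new # stand-in for DPLA::MAP::Aggregation
ToyModsParser  = Class.new # stand-in for Krikri::ModsParser

module ToyMapper
  Mapping = Struct.new(:name, :parser, :root_class, :block)

  @registry = {}

  class << self
    attr_reader :registry

    # The name is required; parser: and class: are optional.
    def define(name, parser: nil, **opts, &block)
      raise ArgumentError, 'mapping name must be a Symbol' unless name.is_a?(Symbol)

      # Fall back to the default root model when no class: is given.
      root_class = opts.fetch(:class, ToyAggregation)
      @registry[name] = Mapping.new(name, parser, root_class, block)
    end
  end
end

ToyMapper.define(:my_mapping, parser: ToyModsParser) do
end
```

The block is only stored here, not evaluated; deferring evaluation until records are processed is what lets one definition be applied to many records.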
Property & Child Declarations
Values of properties within the graph are set with Declarations.
Property Declarations
The simplest case is a Property Declaration, which directly sets a property on the node.
Krikri::Mapper.define(:my_mapping) do
  provider "Moomin Valley Historical Society"
end
Property names must correspond to ActiveTriples properties on the class. The mapping above is equivalent to running aggregation.provider = "Moomin Valley Historical Society", where aggregation is the Aggregation created for each record processed with the mapping.
ActiveTriples properties map to RDF predicates. In RDF terms, the above mapping results in a graph consisting of a single triple:
_:aggregation edm:provider "Moomin Valley Historical Society" .
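One way to picture how a bare method call inside the definition block becomes a triple is a small builder that evaluates the block with instance_eval and treats unknown method names as property declarations. This is a hypothetical sketch, not how Krikri or ActiveTriples is implemented: ToyDeclarations and its PREDICATES table are assumptions for illustration, and literals are kept as plain Ruby strings.

```ruby
# Hypothetical sketch: property declarations become triples on a root node.
# The predicate table below is an assumption for illustration only.
class ToyDeclarations
  PREDICATES = { provider: 'edm:provider', dataProvider: 'edm:dataProvider' }.freeze

  attr_reader :triples

  def initialize(subject)
    @subject = subject
    @triples = []
  end

  # Any call naming a known property is treated as a property declaration.
  def method_missing(name, *args, &block)
    return super unless PREDICATES.key?(name)

    args.each { |value| @triples << [@subject, PREDICATES[name], value] }
    self
  end

  def respond_to_missing?(name, include_private = false)
    PREDICATES.key?(name) || super
  end
end

builder = ToyDeclarations.new('_:aggregation')
builder.instance_eval do
  provider 'Moomin Valley Historical Society'
end

# Prints: _:aggregation edm:provider "Moomin Valley Historical Society" .
builder.triples.each { |s, p, o| puts "#{s} #{p} #{o.inspect} ." }
```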
rdf:type
Most ActiveTriples models automatically insert an rdf:type statement declaring their configured class (e.g. ore:Aggregation). These statements are omitted from the example graphs here for brevity.
Literal Types
Property Declarations handle typed values through RDF::Literal's type system, converting Ruby Date, true, false, Symbol, and other values to the applicable literal datatypes in the generated RDF.
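As a rough illustration of that conversion, the sketch below maps a few Ruby values to XSD datatype IRIs. It is an approximation written for this page, not RDF::Literal's actual lookup code; the xsd_datatype helper is hypothetical, and RDF::Literal remains the authoritative reference for which datatype each Ruby class receives.

```ruby
require 'date'

# Illustrative sketch only: approximates the Ruby-class-to-XSD-datatype
# mapping applied by RDF.rb's RDF::Literal; check RDF::Literal for the
# authoritative table.
XSD = 'http://www.w3.org/2001/XMLSchema#'.freeze

def xsd_datatype(value)
  case value
  when true, false then "#{XSD}boolean"
  when Integer     then "#{XSD}integer"
  when Float       then "#{XSD}double"
  when Date        then "#{XSD}date"
  when Symbol      then "#{XSD}token"
  else                  "#{XSD}string" # plain strings fall back to xsd:string
  end
end

[Date.new(1945, 1, 1), true, :moomin, 'Snufkin'].each do |v|
  puts "#{v} => #{xsd_datatype(v)}"
end
```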
Child Declarations
Child (Node) Declarations extend the language with tools for specifying nested graph structure. Rather than setting a value on the property, they create a new RDF resource.
Child Declarations are much like Property Declarations: they give a property name, corresponding to an RDF predicate. They differ in that they must also provide a class: option, giving a model class for new resources created by the declaration, and a block of declarations (Property and/or Child) that set properties on that resource. For example:
Krikri::Mapper.define(:my_mapping) do
  provider class: DPLA::MAP::Agent do
    providedLabel "Moomin Valley Historical Society"
  end
end
This creates a single DPLA::MAP::Agent (edm:Agent) as a provider, resulting in this graph:
_:aggregation edm:provider _:agent .
_:agent dpla:providedLabel "Moomin Valley Historical Society" .
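The nesting mechanics can be sketched with a toy node class: a declaration carrying a block mints a fresh blank node, links it from the current node, and then evaluates the block with the new node as self. This is a hypothetical sketch, not Krikri's implementation; Graph, ToyNode, ToyAgent and the predicate table are assumptions, and blank nodes are simply numbered.

```ruby
# Shared graph state: a triple list plus a counter for minting blank nodes.
Graph = Struct.new(:triples, :next_id) do
  def mint
    self.next_id += 1
    "_:node#{next_id}"
  end
end

# Hypothetical sketch, not Krikri's implementation.
class ToyNode
  # Assumed predicate table, for illustration only.
  PREDICATES = { provider: 'edm:provider', providedLabel: 'dpla:providedLabel' }.freeze

  attr_reader :id

  def initialize(id, graph)
    @id = id
    @graph = graph
  end

  def method_missing(name, value = nil, **opts, &block)
    return super unless PREDICATES.key?(name)

    if block
      # Child Declaration: require a model class, mint a node, link it,
      # then evaluate the nested declarations with the child as self.
      opts.fetch(:class)
      child = ToyNode.new(@graph.mint, @graph)
      @graph.triples << [@id, PREDICATES[name], child.id]
      child.instance_eval(&block)
    else
      # Property Declaration: record a quoted literal.
      @graph.triples << [@id, PREDICATES[name], value.inspect]
    end
    self
  end

  def respond_to_missing?(name, include_private = false)
    PREDICATES.key?(name) || super
  end
end

ToyAgent = Class.new # stand-in for DPLA::MAP::Agent

graph = Graph.new([], 0)
root = ToyNode.new('_:aggregation', graph)
root.instance_eval do
  provider class: ToyAgent do
    providedLabel 'Moomin Valley Historical Society'
  end
end

# Prints:
# _:aggregation edm:provider _:node1 .
# _:node1 dpla:providedLabel "Moomin Valley Historical Society" .
graph.triples.each { |s, p, o| puts "#{s} #{p} #{o} ." }
```

Because the child evaluates its block through the same method_missing, a Child Declaration inside that block would nest one level deeper in exactly the same way.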
Because the declaration block can accept its own Child Declarations, it is possible to create deeply nested structures.
Additionally, Child Declarations accept a pair of options for creating multiple new resources from a set of values: each: and as:. A new resource is created for every value in each:, and within the declaration's block the current value is accessible under the name provided to as:.
Krikri::Mapper.define(:my_mapping) do
  sourceResource class: DPLA::MAP::SourceResource do
    contributor class: DPLA::MAP::Agent,
                each: ["Moomin", "Snufkin", "Snorkmaiden", "Little My"],
                as: :contrib do
      providedLabel contrib
    end
  end
end
This example creates a new DPLA::MAP::Agent as a dct:contributor for each of "Moomin", "Snufkin", etc. The resulting graph is:
_:aggregation edm:aggregatedCHO _:sourceResource .
_:sourceResource dct:contributor _:moomin .
_:moomin dpla:providedLabel "Moomin" .
_:sourceResource dct:contributor _:snufkin .
_:snufkin dpla:providedLabel "Snufkin" .
_:sourceResource dct:contributor _:snorkmaiden .
_:snorkmaiden dpla:providedLabel "Snorkmaiden" .
_:sourceResource dct:contributor _:littleMy .
_:littleMy dpla:providedLabel "Little My" .
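The each:/as: behaviour can be sketched by looping over the values, minting one child per value, and exposing the current value on each child through a singleton method named by as:, so that a bare contrib inside the block resolves to it. This is a hypothetical sketch, not Krikri's mechanism; ToyEachNode, ToyAgent, the predicate table and the value-derived blank node labels are assumptions for illustration.

```ruby
# Hypothetical sketch, not Krikri's implementation.
class ToyEachNode
  # Assumed predicate table, for illustration only.
  PREDICATES = { contributor: 'dct:contributor', providedLabel: 'dpla:providedLabel' }.freeze

  attr_reader :id

  def initialize(id, triples)
    @id = id
    @triples = triples # shared list of [subject, predicate, object]
  end

  def method_missing(name, value = nil, **opts, &block)
    return super unless PREDICATES.key?(name)

    if block
      opts.fetch(:class) # a model class is required
      opts.fetch(:each).each do |item|
        # Illustrative blank node label derived from the value.
        child = ToyEachNode.new("_:#{item.delete(' ').downcase}", @triples)
        # Expose the current value under the as: name (e.g. `contrib`).
        child.define_singleton_method(opts.fetch(:as)) { item }
        @triples << [@id, PREDICATES[name], child.id]
        child.instance_eval(&block)
      end
    else
      @triples << [@id, PREDICATES[name], value.inspect]
    end
    self
  end

  def respond_to_missing?(name, include_private = false)
    PREDICATES.key?(name) || super
  end
end

ToyAgent = Class.new # stand-in for DPLA::MAP::Agent

triples = []
source = ToyEachNode.new('_:sourceResource', triples)
source.instance_eval do
  contributor class: ToyAgent,
              each: ['Moomin', 'Snufkin', 'Snorkmaiden', 'Little My'],
              as: :contrib do
    providedLabel contrib
  end
end

triples.each { |s, p, o| puts "#{s} #{p} #{o} ." }
```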
Rooted Directed Graphs
A Krikri::Mapping is a tree of declarations. The declarations themselves can be seen as rules for generating a rooted RDF graph from parsed records: the Aggregation constitutes the root node, and each additional Child Declaration creates a descendant node.
Dynamic Values
While static declarations like those above are sometimes useful, the real power of the DSL is in combining these declarations with methods for processing parsed values to dynamically build graphs based on the content of the input.
DSL Methods
The DSL Core provides a small set of methods for accessing record values and manipulating declarations. These methods can be called anywhere within the scope of a DSL definition.
#header
#local_name
#record
Gives delayed access to the record passed in at processing time, beginning with the root node specified by the Parser.
You can chain Value Methods from this call to select specific nodes from the parsed record.
Krikri::Mapper.define(:my_mapping) do
  sourceResource class: DPLA::MAP::SourceResource do
    title record.field('dc:title')
  end
end

Krikri::Mapper.define(:my_mapping) do
  sourceResource class: DPLA::MAP::SourceResource do
    contributor class: DPLA::MAP::Agent,
                each: record.field('item', 'contributor').reject_attribute(:role, 'creator'),
                as: :contrib do
      providedLabel contrib.field('name')
    end
  end
end
Delayed Calls
Delayed access to record values is provided through a RecordProxy, which is returned by #record and #header. When the mapping is defined, the proxy records a chain of Value Method calls, returning itself each time to receive the next call. At processing time, the stored call chain is replayed against the real parsed record to retrieve the values.
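The record-then-replay idea can be shown with a small call-recording proxy. It is a sketch of the general technique, not Krikri's RecordProxy: ToyRecordProxy and its __replay__ method are hypothetical names, and ordinary Hash and Array methods (fetch, first) stand in for Value Methods so the example runs without Krikri.

```ruby
# Hypothetical sketch of a delayed-call proxy (not Krikri's RecordProxy).
# Subclassing BasicObject keeps almost every method name free to record.
class ToyRecordProxy < BasicObject
  def initialize
    @calls = []
  end

  # Definition time: remember the call and return self so chaining continues.
  def method_missing(name, *args)
    @calls << [name, args]
    self
  end

  # Processing time: replay the recorded chain against a real parsed value.
  def __replay__(target)
    @calls.reduce(target) { |obj, (name, args)| obj.public_send(name, *args) }
  end
end

# Nothing is evaluated here; the three calls are only recorded.
contributors = ToyRecordProxy.new.fetch('item').fetch('contributor').first(2)

record = { 'item' => { 'contributor' => ['Moomin', 'Snufkin', 'Little My'] } }
p contributors.__replay__(record) # prints ["Moomin", "Snufkin"]
```

Since replaying does not alter the recorded chain, the same proxy can be applied to every record processed with the mapping.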
#record_uri
#uri
Provides a special declaration for setting the URI of a resource. This method behaves like a Property Declaration, but validates that the value passed in results in a well-formed URI. If a stub value is passed, it uses the model's configured base_uri to try to construct a fully qualified URI.
This method will fail if a URI is already set for the resource.
Krikri::Mapper.define(:my_mapping) do
  uri 'this'
  sourceResource class: DPLA::MAP::SourceResource do
    uri 'http://example.org/source_resource'
  end
end
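The two rules described for #uri (qualify stub values against a base, and refuse to overwrite an existing URI) can be sketched with Ruby's standard URI library. This is an illustration under assumptions, not Krikri's code: BASE_URI stands in for a model's configured base_uri, resolve_uri, ToyResource and assign_uri are hypothetical, and resolving stubs with URI.join may differ from how Krikri actually builds URIs.

```ruby
require 'uri'

# Assumed stand-in for a model's configured base_uri.
BASE_URI = 'http://example.org/aggregation/'.freeze

# Hypothetical sketch: qualify stubs against the base and reject anything
# that does not end up as a well-formed absolute URI.
def resolve_uri(value, base: BASE_URI)
  uri = URI.parse(value)
  uri = URI.join(base, value) if uri.relative? # stub value: qualify it
  raise ArgumentError, "not an absolute URI: #{value.inspect}" unless uri.absolute?
  uri.to_s
rescue URI::InvalidURIError
  raise ArgumentError, "malformed URI: #{value.inspect}"
end

# Hypothetical resource whose URI may only be set once.
class ToyResource
  attr_reader :uri

  def assign_uri(value)
    raise ArgumentError, "URI already set to #{@uri}" if @uri
    @uri = resolve_uri(value)
  end
end

agg = ToyResource.new
agg.assign_uri('this')
puts agg.uri # http://example.org/aggregation/this
```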
Value Methods
Current, version-specific documentation for Value Methods is in Parser::ValueArray on RubyDoc.