Metadata Mapping DSL

DRAFT

The Krikri Metadata Mapping Domain Specific Language (DSL) aims to provide a declarative, machine-actionable language for specifying crosswalks between metadata formats. Unlike other mapping tools in popular use (e.g. XSLT), the Krikri DSL is focused on semantic mapping rather than document transformation.

The DSL handles arbitrary input formats through a common Krikri::Parser interface. Parsers project OriginalRecord contents as a tree of values which can be processed through the DSL to select nodes within the tree and construct an RDF Graph from the results.

Mappings are declared by calling Krikri::Mapper.define. The definition has a single required argument: a symbol representing the mapping name (e.g. :esdn_mods). Optionally, it accepts a parser class to use for processing records, given with the parser: option (and specialized arguments for parser initialization, if any).

Mapping Definition
Krikri::Mapper.define(:my_mapping, parser: Krikri::ModsParser) do
end

Nodes within the created graph are represented as ActiveTriples::Resource model objects. By default, the graph node (RDF Resource) created by such a definition is a DPLA::MAP::Aggregation (ore:Aggregation); alternative models can be specified with the class: option to define.

Property & Child Declarations

Values of properties within the graph are set with Declarations.

Property Declarations

The simple case is a Property Declaration which directly sets a property on the node.

Simple Property Declaration
Krikri::Mapper.define(:my_mapping) do
  provider "Moomin Valley Historical Society"
end

Property names must correspond to ActiveTriples properties on the class. The mapping above is equivalent to running aggregation.provider = "Moomin Valley Historical Society", assuming an aggregation created for each record processed with the mapping.

ActiveTriples properties map to RDF predicates. In RDF terms, the above mapping results in a graph consisting of a single triple: 

  _:aggregation edm:provider "Moomin Valley Historical Society" .


rdf:type

Most ActiveTriples models automatically insert an rdf:type statement, giving a configured class (e.g. ore:Aggregation). These are omitted from the example graphs here for brevity.

Literal Types

Property Declarations handle typed values through RDF::Literal's type system, converting Ruby Date, true, false, Symbol, and others to their applicable literal data types in generated RDF.


Child Declarations

Child (Node) Declarations extend the language with tools for specifying nested graph structure. Rather than specifying a value to set on the property, they create a new RDF resource.

Child Declarations are much like property declarations. They give a property name, corresponding to an RDF predicate. Where they differ is that they must also provide a class: option, giving a model class for new resources created by the declaration, and a block providing declarations (property and/or child) to set properties for that resource. As an example:

Simple Child Declaration
Krikri::Mapper.define(:my_mapping) do
  provider class: DPLA::MAP::Agent do
    providedLabel "Moomin Valley Historical Society"
  end
end

This creates a single DPLA::MAP::Agent (edm:Agent) as a provider. It results in this graph:

  _:aggregation edm:provider _:agent .
  _:agent dpla:providedLabel "Moomin Valley Historical Society" .


Because the declaration block can accept its own Child Declarations, it is possible to create deeply nested structures.

Additionally, Child Declarations accept a pair of options for creating multiple new resources from a set of values: each: and as:. A new resource is created for every value in each:, and the values are accessible from within the declaration's scope using the name provided to as:.

Child Declaration with `each:`
Krikri::Mapper.define(:my_mapping) do
  sourceResource class: DPLA::MAP::SourceResource do
    contributor class: DPLA::MAP::Agent,
                each: ["Moomin", "Snufkin", "Snorkmaiden", "Little My"],
                as: :contrib do
      providedLabel contrib
    end
  end
end

This example creates a new DPLA::MAP::Agent as a dct:contributor for each of "Moomin", "Snufkin", and so on. The resulting graph is:

  _:aggregation edm:aggregatedCHO _:sourceResource .
  _:sourceResource dct:contributor _:moomin .
  _:moomin dpla:providedLabel "Moomin" .
  _:sourceResource dct:contributor _:snufkin .
  _:snufkin dpla:providedLabel "Snufkin" .
  _:sourceResource dct:contributor _:snorkmaiden .
  _:snorkmaiden dpla:providedLabel "Snorkmaiden" .
  _:sourceResource dct:contributor _:littleMy .
  _:littleMy dpla:providedLabel "Little My" .
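
The expansion performed by each: and as: can be pictured with a small plain-Ruby sketch. This is an illustration only, not Krikri's implementation: the each_child helper, the blank node naming, and the representation of triples as three-element arrays are all assumptions made for the example.

```ruby
# Hypothetical helper (not part of Krikri): creates one child node per value
# and yields the node and the value to a block, much as `as:` exposes each
# value inside the declaration block.
def each_child(subject, predicate, values)
  values.each_with_index.flat_map do |value, index|
    node = "_:b#{index}" # assumed blank node naming scheme
    # One triple links the parent to the new node; the block supplies
    # the triples describing the new node itself.
    [[subject, predicate, node]] + yield(node, value)
  end
end

triples = each_child("_:sourceResource", "dct:contributor",
                     ["Moomin", "Snufkin"]) do |node, contrib|
  [[node, "dpla:providedLabel", contrib]]
end

triples.each { |triple| puts triple.join(" ") }
```

Each value yields two triples here: one attaching the new node to its parent, and one setting the label on the new node.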


Rooted Directed Graphs

Abstractly, a Krikri::Mapping is a tree of declarations. The declarations themselves can be seen as rules for generating a rooted RDF graph from parsed records. The Aggregation constitutes the root node, and each additional Child Declaration creates a descendant node.
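
As a rough illustration of this view, the sketch below walks a tree of declarations and emits triples for a rooted graph. The PropertyDecl and ChildDecl structs, the triples_for function, and the blank node counter are hypothetical names invented for this example; Krikri's own classes are not shown here.

```ruby
# Hypothetical stand-ins for declarations (not Krikri classes).
PropertyDecl = Struct.new(:predicate, :value)
ChildDecl    = Struct.new(:predicate, :children)

# Walks the declaration tree depth-first, starting from a root subject.
# `counter` is a one-element array used to number new blank nodes.
def triples_for(subject, declarations, counter = [0])
  declarations.flat_map do |decl|
    case decl
    when PropertyDecl
      # A property declaration sets a value directly on the current node.
      [[subject, decl.predicate, decl.value]]
    when ChildDecl
      # A child declaration creates a descendant node and recurses into it.
      node = "_:b#{counter[0] += 1}"
      [[subject, decl.predicate, node]] +
        triples_for(node, decl.children, counter)
    end
  end
end

mapping = [
  ChildDecl.new("edm:provider", [
    PropertyDecl.new("dpla:providedLabel", "Moomin Valley Historical Society")
  ])
]

triples = triples_for("_:aggregation", mapping)
triples.each { |triple| puts triple.join(" ") }
```

Every node other than the root is reachable from the root by following the predicates introduced by Child Declarations, which is what makes the result a rooted graph.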

Dynamic Values

While static declarations like those above are sometimes useful, the real power of the DSL is in combining these declarations with methods for processing parsed values to dynamically build graphs based on the content of the input.

DSL Methods

The DSL Core provides a small set of methods for accessing record values and manipulating declarations. These methods can be called anywhere within the scope of a DSL definition.

#header

#local_name

#record

Gives delayed access to the record passed in at processing time, beginning with the root node specified by the Parser.

You can chain Value Methods from this call to select specific nodes from the parsed record.

#record examples
Krikri::Mapper.define(:my_mapping) do
  sourceResource class: DPLA::MAP::SourceResource do
    title record.field('dc:title')
  end
end


Krikri::Mapper.define(:my_mapping) do
  sourceResource class: DPLA::MAP::SourceResource do
    contributor class: DPLA::MAP::Agent,
                each:  record.field('item', 'contributor').reject_attribute(:role, 'creator'),
                as:    :contrib do
      providedLabel contrib.field('name')
    end
  end
end

Delayed Calls

Delayed access to record values is provided through a RecordProxy. One of these proxies is returned by #record and #header. When the mapping is defined, the proxy records a chain of Value Method calls, returning itself after each call so that it can receive the next. At processing time, the stored call chain is replayed against the real parsed record to retrieve the values.
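
The record-and-replay idea can be sketched in a few lines of plain Ruby. The DelayedProxy class and its replay method below are hypothetical and exist only to illustrate the technique; they are not Krikri's RecordProxy, and a Hash stands in for a parsed record.

```ruby
# Hypothetical illustration of a delayed-call proxy (not Krikri's RecordProxy).
class DelayedProxy
  def initialize
    @calls = []
  end

  # Any method the proxy does not define is recorded rather than executed.
  # Returning self lets the caller keep chaining.
  def method_missing(name, *args, &block)
    @calls << [name, args, block]
    self
  end

  def respond_to_missing?(_name, _include_private = false)
    true
  end

  # Replays the recorded chain against a real object, feeding the result
  # of each call into the next.
  def replay(target)
    @calls.reduce(target) do |value, (name, args, block)|
      value.public_send(name, *args, &block)
    end
  end
end

# Definition time: nothing is evaluated yet, the calls are only stored.
proxy = DelayedProxy.new.fetch(:titles).map(&:upcase).first

# Processing time: the chain runs against actual data.
parsed = { titles: ["finn family moomintroll", "comet in moominland"] }
result = proxy.replay(parsed)
puts result
```

Because the chain is stored rather than evaluated, one definition can be replayed against any number of records.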

#record_uri

#uri

Provides a special declaration for setting the URI of a resource. This method behaves like a property declaration, but validates that the value passed in results in a well-formed URI. If a stub value (a local name rather than a full URI) is passed, it will use the configured base_uri for the model to try to construct a fully qualified name.

This method will fail if a URI is already set for the resource.
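
The behavior described above might be pictured as follows. This is a sketch under stated assumptions, not the ActiveTriples or Krikri implementation: the SketchResource class, its constructor argument, and the scheme-detection rule are all hypothetical, and validation is delegated to Ruby's standard URI library.

```ruby
require "uri"

# Hypothetical model (not an ActiveTriples class) with a set-once URI.
class SketchResource
  attr_reader :uri

  def initialize(base_uri)
    @base_uri = base_uri # assumed to end with "/" or "#"
    @uri = nil
  end

  def uri=(value)
    # A second assignment is rejected, mirroring "fails if already set".
    raise "URI already set: #{@uri}" if @uri

    candidate = value.to_s
    # Assumption: a value without a scheme is a stub, so the base is prepended.
    candidate = @base_uri + candidate unless candidate =~ /\A[a-z][a-z0-9+.-]*:/i
    # URI.parse raises URI::InvalidURIError for strings it cannot parse.
    @uri = URI.parse(candidate)
  end
end

aggregation = SketchResource.new("http://example.org/items/")
aggregation.uri = "this"
puts aggregation.uri
```

Setting the URI only once keeps a resource's identity stable after other declarations have started attaching statements to it.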

#uri examples
Krikri::Mapper.define(:my_mapping) do
  uri 'this'
  sourceResource class: DPLA::MAP::SourceResource do
    uri 'http://example.org/source_resource'
  end
end


Value Methods

Current and version-specific documentation for Value Methods is in Parser::ValueArray on RubyDoc.