Queue a Harvest

From Rails Console

Rails Console
 > opts = { uri: 'http://example.org/oai', oai: { metadata_prefix: 'mods' } }
 > Krikri::Harvesters::OAIHarvester.enqueue(opts)
# created activity 1
# enqueued Krikri::HarvestJob
# => true

Harvester Options

Each harvester may accept different options at harvest time. `uri` is always the URI of the harvest endpoint or provider. Harvester-specific options are nested under a key and documented in the harvester's `#initialize` method. You can get a description of a harvester's options by calling `.expected_opts` on the harvester class.

Available Harvesters are:

  • Krikri::Harvesters::OAIHarvester

  • Krikri::Harvesters::CouchdbHarvester
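
To make the option layout concrete, here is a minimal plain-Ruby sketch. `ToyHarvester`, its `:toy` key, and the shape of the hash returned by its `expected_opts` are all hypothetical stand-ins chosen for illustration; they are not taken from Krikri, so check the real harvester's `#initialize` documentation for the actual keys.

```ruby
# Hypothetical stand-in, not part of Krikri. It only illustrates the
# layout described above: a top-level :uri plus one nested,
# harvester-specific key.
class ToyHarvester
  attr_reader :uri, :metadata_prefix

  # Describes the nested options this toy class understands.
  # The shape of this hash is an assumption made for the example.
  def self.expected_opts
    { key: :toy, opts: { metadata_prefix: { type: :string, required: true } } }
  end

  def initialize(opts = {})
    @uri = opts.fetch(:uri) # raises KeyError if :uri is missing
    nested = opts.fetch(self.class.expected_opts[:key], {})
    @metadata_prefix = nested.fetch(:metadata_prefix)
  end
end

opts = { uri: 'http://example.org/oai', toy: { metadata_prefix: 'mods' } }
harvester = ToyHarvester.new(opts)
puts harvester.uri
puts harvester.metadata_prefix
```

Using `fetch` rather than `[]` means a missing required option fails immediately with a `KeyError` instead of surfacing later as a `nil`.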


Running an Ad-Hoc Harvest

In non-production environments, it's often useful to run a portion of the full harvest. You can do this with a harvester as follows:

Run a partial harvest
 > harvester = Krikri::Harvesters::OAIHarvester.new(opts)
 > test_harvest_uri = RDF::URI('http://example.org/my_test_harvest')
 > harvester.records.take(1000).each { |rec| harvester.process_record(rec, test_harvest_uri) }

This does effectively what the harvester's `#run` method does: it processes the records the same way a queued harvest would, as though it had been run by the activity "http://example.org/my_test_harvest". Note that although you can query the records by that URI (for example, through the provenance query client), this does not create a `Krikri::Activity` in the database.
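
The pattern above can be tried without a live endpoint. The following is a rough sketch in plain Ruby: `CountingHarvester` is a hypothetical stand-in whose `#records` and `#process_record` only borrow the method names used in the console session, and the activity URI is a plain string here rather than an `RDF::URI`.

```ruby
# Hypothetical stand-in for a harvester, not Krikri code.
class CountingHarvester
  attr_reader :processed

  def initialize
    @processed = []
  end

  # An endless stream of records, standing in for a large provider.
  def records
    Enumerator.new do |yielder|
      n = 0
      loop do
        n += 1
        yielder << "record-#{n}"
      end
    end
  end

  # Remembers which record was processed under which activity URI.
  def process_record(record, activity_uri)
    @processed << [record, activity_uri]
  end
end

harvester = CountingHarvester.new
test_harvest_uri = 'http://example.org/my_test_harvest'

# take(3) stops pulling from the stream after three records.
harvester.records.take(3).each do |rec|
  harvester.process_record(rec, test_harvest_uri)
end

puts harvester.processed.length
```

Because `take` stops enumerating once it has enough records, the sample size bounds how much of the source is read, which is what makes a partial run cheap.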

 

Harvest Behaviors and Record Class

By default, running a harvester saves each record as a `Krikri::OriginalRecord`.  This behavior is customizable by passing a class implementing the `HarvestBehavior` interface to `:harvest_behavior`, and/or a different record class (responding to `#build`) to the `:record_class` option.  The OAI harvester, for instance, implements a specialized `SkipDeletedBehavior` which passes silently over OAI records marked with the status "deleted".
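
As an illustration of how a pluggable behavior can change what gets saved, here is a small self-contained sketch. None of these classes are Krikri's: `ToyRecord`, `SaveEverything`, `SkipDeleted`, and `toy_harvest` are hypothetical names, and the `process_record(record, activity_uri, store)` signature is an assumption made for the example, not the actual `HarvestBehavior` interface.

```ruby
# Hypothetical stand-ins, not Krikri classes. They only illustrate the
# idea of passing a behavior class to a harvest.
ToyRecord = Struct.new(:id, :status) do
  def deleted?
    status == 'deleted'
  end
end

# Default behavior: keep every record.
class SaveEverything
  def self.process_record(record, activity_uri, store)
    store << [record.id, activity_uri]
  end
end

# Specialized behavior: pass silently over records marked "deleted".
class SkipDeleted
  def self.process_record(record, activity_uri, store)
    return if record.deleted?

    SaveEverything.process_record(record, activity_uri, store)
  end
end

# A toy harvest loop that accepts the behavior class as an option.
def toy_harvest(records, activity_uri, harvest_behavior: SaveEverything)
  store = []
  records.each { |rec| harvest_behavior.process_record(rec, activity_uri, store) }
  store
end

records = [ToyRecord.new('a', nil), ToyRecord.new('b', 'deleted')]
uri = 'http://example.org/my_test_harvest'

all_saved   = toy_harvest(records, uri)
non_deleted = toy_harvest(records, uri, harvest_behavior: SkipDeleted)

puts all_saved.length
puts non_deleted.length
```

Keeping the skip logic in its own class leaves the harvest loop unchanged, so switching behaviors is a matter of passing a different option.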