Queue a Harvest
From the Rails Console
> opts = { uri: 'http://example.org/oai', oai: { metadata_prefix: 'mods' } }
> Krikri::Harvesters::OAIHarvester.enqueue(opts)
# created activity 1
# enqueued Krikri::HarvestJob
# => true
Harvester Options
Each harvester may accept different options at harvest time. `uri` is always the URI of the harvest endpoint or provider. Harvester-specific options are nested under a harvester-specific key (e.g. `oai:` in the example above) and are documented in the harvester's `#initialize` method. You can get a description of a harvester's options by calling `.expected_opts` on the harvester class.
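To make the nesting concrete, here is a minimal, self-contained Ruby sketch. `ExampleOAIHarvester` is a hypothetical stand-in, not Krikri's class, and the hash returned by its `expected_opts` is an assumed shape chosen for illustration; check the real `.expected_opts` output for the actual format. The sketch reads `uri` from the top level of the options hash and everything else from under its own key (`:oai`), matching the `opts` hash used to queue the harvest.

```ruby
# Hypothetical harvester used only to illustrate option nesting;
# this is NOT Krikri's OAIHarvester.
class ExampleOAIHarvester
  # Assumed shape for the options description: the key the options are
  # nested under, plus a spec for each option.
  def self.expected_opts
    {
      key: :oai,
      opts: {
        metadata_prefix: { type: :string, required: true },
        set:             { type: :string, required: false }
      }
    }
  end

  attr_reader :uri, :harvester_opts

  def initialize(opts = {})
    # `uri` always sits at the top level of the options hash.
    @uri = opts.fetch(:uri)

    # Harvester-specific options live under this harvester's own key.
    key = self.class.expected_opts[:key]
    @harvester_opts = opts.fetch(key, {})

    # Fail early if a required nested option is absent.
    required = self.class.expected_opts[:opts].select { |_name, spec| spec[:required] }.keys
    missing = required - @harvester_opts.keys
    raise ArgumentError, "missing #{key} options: #{missing.join(', ')}" unless missing.empty?
  end
end

opts = { uri: 'http://example.org/oai', oai: { metadata_prefix: 'mods' } }
harvester = ExampleOAIHarvester.new(opts)
harvester.harvester_opts[:metadata_prefix] # => "mods"
```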
The available harvesters are:
- `Krikri::Harvesters::OAIHarvester`
- `Krikri::Harvesters::CouchdbHarvester`
Running an Ad-Hoc Harvest
In non-production environments, it's often useful to run a portion of the full harvest. You can do this with a harvester as follows:
> harvester = Krikri::Harvesters::OAIHarvester.new(opts)
> test_harvest_uri = RDF::URI('http://example.org/my_test_harvest')
> harvester.records.take(1000).each { |rec| harvester.process_record(rec, test_harvest_uri) }
This does effectively the same thing as the harvester's `#run` method: it processes the records just as a queued harvest would, as though they had been harvested by the activity `http://example.org/my_test_harvest`. Note that although you can query the records by that URI (through the provenance query client, etc.), this does not create a `Krikri::Activity` in the database.
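The self-contained sketch below illustrates why `take(1000)` gives a bounded test harvest. `FakeHarvester` is hypothetical: it assumes that `#records` returns an `Enumerator` that fetches pages on demand and that `#process_record` stores each record together with the activity URI. A plain string stands in for the `RDF::URI` so the example runs without Krikri loaded.

```ruby
# Hypothetical in-memory harvester standing in for a real one, so the
# take/process_record pattern can run without Krikri or an OAI endpoint.
class FakeHarvester
  attr_reader :saved, :pages_fetched

  def initialize(total:, page_size: 100)
    @total = total
    @page_size = page_size
    @saved = []
    @pages_fetched = 0
  end

  # Yields records one at a time, "fetching" a new page only when the
  # previous one is used up. Returning an Enumerator when no block is
  # given lets callers such as `take(n)` stop the iteration early.
  def records
    return enum_for(:records) unless block_given?

    (0...@total).each_slice(@page_size) do |page|
      @pages_fetched += 1
      page.each { |i| yield "record-#{i}" }
    end
  end

  # Assumed behavior: save the record, attributed to the given activity URI.
  def process_record(record, activity_uri)
    @saved << [record, activity_uri]
  end
end

harvester = FakeHarvester.new(total: 5_000)
test_harvest_uri = 'http://example.org/my_test_harvest'
harvester.records.take(1000).each { |rec| harvester.process_record(rec, test_harvest_uri) }
harvester.saved.size # => 1000
```

Because `take` stops the enumeration as soon as it has 1000 records, only the first ten 100-record pages are ever "fetched" here; whether a real harvester's `#records` is lazy in the same way depends on its implementation.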
Harvest Behaviors and Record Class
By default, a harvester saves each record as a `Krikri::OriginalRecord`. You can customize this by passing a class implementing the `HarvestBehavior` interface as the `:harvest_behavior` option, and/or a different record class (responding to `#build`) as the `:record_class` option. The OAI harvester, for instance, implements a specialized `SkipDeletedBehavior`, which silently skips OAI records marked with the status "deleted".
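The sketch below shows the shape of that pluggability under stated assumptions. All class names are hypothetical stand-ins (`SourceRecord`, `MemoryRecord`, `ExampleSaveBehavior`, `ExampleSkipDeletedBehavior`, `ExampleHarvester`), and the `process_record(record, record_class)` signature is an assumption, not Krikri's actual `HarvestBehavior` interface. The skipping behavior subclasses the default one and returns early for records whose status is "deleted".

```ruby
# Hypothetical incoming record, e.g. one parsed from an OAI response.
SourceRecord = Struct.new(:id, :status, :content)

# Hypothetical record class: anything that responds to `build` can be used.
MemoryRecord = Struct.new(:id, :content) do
  def self.build(id, content)
    new(id, content)
  end
end

# Default behavior (assumed interface): build a record from every source record.
class ExampleSaveBehavior
  def self.process_record(record, record_class)
    record_class.build(record.id, record.content)
  end
end

# Skips records flagged as deleted; otherwise defers to the default behavior.
class ExampleSkipDeletedBehavior < ExampleSaveBehavior
  def self.process_record(record, record_class)
    return nil if record.status == 'deleted'
    super
  end
end

# Hypothetical harvester accepting both a behavior and a record class.
class ExampleHarvester
  def initialize(records, harvest_behavior: ExampleSaveBehavior, record_class: MemoryRecord)
    @records = records
    @harvest_behavior = harvest_behavior
    @record_class = record_class
  end

  # Returns the built records, dropping any the behavior skipped (nil).
  def run
    @records.map { |rec| @harvest_behavior.process_record(rec, @record_class) }.compact
  end
end

records = [
  SourceRecord.new('a', nil, '<a/>'),
  SourceRecord.new('b', 'deleted', nil),
  SourceRecord.new('c', nil, '<c/>')
]
ExampleHarvester.new(records, harvest_behavior: ExampleSkipDeletedBehavior).run.map(&:id) # => ["a", "c"]
```

Swapping `harvest_behavior` changes what happens to each record without touching the harvester's iteration logic, which is the benefit of keeping behaviors separate from harvesters.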