Queue a Harvest
From Rails Console
```ruby
> opts = { uri: 'http://example.org/oai', oai: { metadata_prefix: 'mods' } }
> Krikri::Harvesters::OAIHarvester.enqueue(opts)
# created activity 1
# enqueued Krikri::HarvestJob
# => true
```
Harvester Options
Each harvester may have different options available to it at harvest time. `uri` will always be the URI for the harvest endpoint or provider. Harvester-specific options are nested under a key and documented in the harvester's `#initialize` method. You can get a description of a harvester's options by calling `.expected_opts` on the harvester class.
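The nesting can be illustrated with a small, self-contained Ruby sketch. `ToyHarvester`, its `:toy` key, and the `{ key:, opts: }` shape returned by its `.expected_opts` are all hypothetical, chosen only to mirror the `uri:`/`oai:` layout in the console example above; a real harvester's `.expected_opts` is the authority on its own options.

```ruby
# Hypothetical harvester illustrating the option layout described above.
# Not Krikri code: the structure returned by .expected_opts is an assumption.
class ToyHarvester
  # Describes the harvester-specific options and the key they are nested under.
  def self.expected_opts
    {
      key: :toy,
      opts: {
        metadata_prefix: { type: :string, required: true },
        set: { type: :string, required: false }
      }
    }
  end

  attr_reader :uri, :harvester_opts

  def initialize(opts = {})
    @uri = opts.fetch(:uri) # the endpoint URI is always a top-level option
    spec = self.class.expected_opts
    # Harvester-specific options live in a nested hash under spec[:key].
    @harvester_opts = opts.fetch(spec[:key], {})
    required = spec[:opts].select { |_name, rule| rule[:required] }.keys
    missing = required - @harvester_opts.keys
    raise ArgumentError, "missing #{spec[:key]} options: #{missing.join(', ')}" unless missing.empty?
  end
end

h = ToyHarvester.new(uri: 'http://example.org/oai', toy: { metadata_prefix: 'mods' })
puts h.uri                              # => http://example.org/oai
puts h.harvester_opts[:metadata_prefix] # => mods
```

Passing `toy: {}` instead raises an `ArgumentError` naming the missing `metadata_prefix`, which is one plausible way a description like `.expected_opts` can double as a validation spec.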
Available harvesters are:

- `Krikri::Harvesters::OAIHarvester`
- `Krikri::Harvesters::CouchdbHarvester`
Running an Ad-Hoc Harvest
In non-production environments, it's often useful to run a portion of the full harvest. You can do this with a harvester as follows:
```ruby
> harvester = Krikri::Harvesters::OAIHarvester.new(opts)
> test_harvest_uri = RDF::URI('http://example.org/my_test_harvest')
> harvester.records.take(1000).each { |rec| harvester.process_record(rec, test_harvest_uri) }
```
This does effectively what the harvester's `#run` method does: it processes the records the same way a queued harvest would, as though they had been harvested by the activity `http://example.org/my_test_harvest`. Note that while you can query the records by that URI (through the provenance query client, etc.), this does not create a `Krikri::Activity` in the database.
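Why `take(1000)` yields only a portion of the harvest can be sketched in plain Ruby. This assumes `#records` returns an `Enumerator` that fetches pages lazily (an assumption about the harvester, not something stated here); under that assumption, `Enumerable#take` stops pulling once it has enough records, so later pages are never requested. `PagedToyHarvester`, its page size, and the plain-string activity URI are hypothetical stand-ins.

```ruby
# Hypothetical stand-in for a harvester whose #records walks a paged endpoint.
# It only counts how many "pages" it fetches; it is not Krikri's harvester.
class PagedToyHarvester
  PAGE_SIZE = 100

  attr_reader :pages_fetched, :processed

  def initialize(total_records)
    @total = total_records
    @pages_fetched = 0
    @processed = []
  end

  # Yields records one at a time, fetching a new page only when needed.
  def records
    Enumerator.new do |yielder|
      (0...@total).step(PAGE_SIZE) do |offset|
        @pages_fetched += 1
        page = (offset...[offset + PAGE_SIZE, @total].min).map { |i| "record-#{i}" }
        page.each { |rec| yielder << rec }
      end
    end
  end

  # Stand-in for process_record: remembers which activity URI "ran" it.
  def process_record(rec, activity_uri)
    @processed << [rec, activity_uri]
  end
end

harvester = PagedToyHarvester.new(10_000)
test_uri = 'http://example.org/my_test_harvest'
harvester.records.take(250).each { |rec| harvester.process_record(rec, test_uri) }
puts harvester.processed.size # => 250
puts harvester.pages_fetched  # => 3
```

Only three of the hundred pages are fetched here, because `take` breaks out of the enumerator as soon as the 250th record arrives.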
Harvest Behaviors and Record Class
By default, running a harvester saves each record as a `Krikri::OriginalRecord`. This behavior is customizable by passing a class implementing the `HarvestBehavior` interface to the `:harvest_behavior` option, and/or a different record class (responding to `#build`) to the `:record_class` option. The OAI harvester, for instance, implements a specialized `SkipDeletedBehavior`, which passes silently over OAI records marked with the status "deleted".
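The division of labor between a behavior and a record class can be sketched in plain Ruby. Every name below (`MemoryRecord`, `SaveBehavior`, the `.build(identifier, content)` signature, the `:status` field) is a hypothetical stand-in rather than Krikri's actual `HarvestBehavior` interface; the sketch only shows a default behavior that builds and saves each record, and a subclass that returns early for records marked "deleted".

```ruby
# Hypothetical sketch of the pluggable pattern described above; none of
# these names are Krikri's API.

# Minimal in-memory record class responding to .build (assumed signature).
class MemoryRecord
  STORE = []

  attr_reader :identifier, :content

  def self.build(identifier, content)
    new(identifier, content)
  end

  def initialize(identifier, content)
    @identifier = identifier
    @content = content
  end

  def save
    STORE << self
    true
  end
end

# Default behavior: build a record from the harvested data and save it.
class SaveBehavior
  def initialize(record_class)
    @record_class = record_class
  end

  def process_record(harvested)
    @record_class.build(harvested[:id], harvested[:content]).save
  end
end

# Specialized behavior: pass silently over records whose status is "deleted".
class SkipDeletedBehavior < SaveBehavior
  def process_record(harvested)
    return nil if harvested[:status] == 'deleted'
    super
  end
end

harvested = [
  { id: 'a', content: '<mods/>', status: nil },
  { id: 'b', content: '', status: 'deleted' },
  { id: 'c', content: '<mods/>', status: nil }
]
behavior = SkipDeletedBehavior.new(MemoryRecord)
harvested.each { |rec| behavior.process_record(rec) }
puts MemoryRecord::STORE.map(&:identifier).inspect # => ["a", "c"]
```

Keeping the skip logic in the behavior rather than the record class lets the same record class serve sources that have no notion of deleted records.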