Queue an Indexing Job

QA

Krikri::Indexer.enqueue({ index_class: 'Krikri::QASearchIndex', generator_uri: Krikri::Activity.find(activity_id).rdf_subject })

A successful index job commits to Solr upon completion.

To clear a provider's records from the QA index manually:

  qa = Krikri::QASearchIndex.new
  qa.delete_by_query 'provider_id:http\://dp.la/api/contributor/washington'
  qa.commit
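
The colon in the provider URI above is backslash-escaped because `:` separates field from value in the Solr query syntax. A minimal sketch of building that query string for any provider URI follows. The names `escape_solr_value` and `provider_query` are hypothetical helpers, not part of Krikri. The sketch escapes every Lucene special character (including `/`), which is more than the hand-written example escapes; this relies on the query parser reading a backslash-escaped character literally.

```ruby
# Hypothetical helpers; not part of Krikri.
# Characters with special meaning in the Lucene/Solr query syntax.
LUCENE_SPECIAL = /([\\+\-!():^\[\]"{}~*?|&\/])/

# Prefix each special character with a backslash.
def escape_solr_value(value)
  value.gsub(LUCENE_SPECIAL) { |ch| "\\#{ch}" }
end

# Build a provider_id query like the one passed to delete_by_query above.
def provider_query(provider_uri)
  "provider_id:#{escape_solr_value(provider_uri)}"
end

puts provider_query('http://dp.la/api/contributor/washington')
```

The result could then be passed along as `qa.delete_by_query provider_query(uri)`, followed by `qa.commit`.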

Staging

The staging host is given by the Ansible configuration variable `es_cluster_loadbal` at https://github.com/dpla/aws/blob/master/ansible/group_vars/staging#L25.

# Verify that this host is up to date. If it is wrong, the job fails after the query (within 5 minutes).
stg = 'internal-search-lbal-stg-1352112635.us-east-1.elb.amazonaws.com'
Krikri::Indexer.enqueue({ index_class: 'Krikri::ProdSearchIndex', generator_uri: Krikri::Activity.find(activity_id).rdf_subject, host: stg, index_name: 'dpla_alias' })
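
Because a stale hostname only surfaces as a failed job, it may help to read the value from the vars file rather than copy it by hand. The sketch below assumes the `group_vars/staging` file is plain YAML with `es_cluster_loadbal` as a top-level key; that layout is an assumption, and `staging_search_host` is a hypothetical helper.

```ruby
require 'yaml'

# Hypothetical helper: pull es_cluster_loadbal out of the text of an Ansible
# group_vars file. Assumes plain YAML with the variable at the top level.
def staging_search_host(group_vars_yaml)
  vars = YAML.safe_load(group_vars_yaml)
  raise ArgumentError, 'expected a YAML mapping' unless vars.is_a?(Hash)

  host = vars['es_cluster_loadbal'].to_s.strip
  raise KeyError, 'es_cluster_loadbal is not set' if host.empty?

  host
end

# Inline sample standing in for the real file contents.
sample = "es_cluster_loadbal: internal-search-lbal-stg-1352112635.us-east-1.elb.amazonaws.com\n"
puts staging_search_host(sample)
```

With a local checkout of the repository, the argument would be the result of `File.read` on the vars file.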

fQA (Frontend QA)

If you need to index data to the temporary frontend QA portal (http://ec2-54-172-127-200.compute-1.amazonaws.com/), use the staging host for the search load balancer (from above), but change the index name from 'dpla_alias' to 'fqa_172_30_0_143':

# Verify that this host is up to date. If it is wrong, the job fails after the query (within 5 minutes).
stg_host = 'internal-search-lbal-stg-1352112635.us-east-1.elb.amazonaws.com'
Krikri::Indexer.enqueue({ index_class: 'Krikri::ProdSearchIndex', generator_uri: Krikri::Activity.find(activity_id).rdf_subject, host: stg_host, index_name: 'fqa_172_30_0_143' })

Production

Krikri::Indexer.enqueue(index_class: 'Krikri::ProdSearchIndex', generator_uri: Krikri::Activity.find(activity_id).rdf_subject)
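
The enqueue calls on this page differ only in the index class and, for staging and fQA, the `host` and `index_name` options. As a rough sketch, those differences could be collected in one place. `indexer_options` and `STAGING_INDEXES` are hypothetical names, not part of Krikri; the option values are the ones used in the calls on this page.

```ruby
# Hypothetical helper; not part of Krikri.
# Index names used on the staging search cluster.
STAGING_INDEXES = { staging: 'dpla_alias', fqa: 'fqa_172_30_0_143' }.freeze

# Build the options hash for Krikri::Indexer.enqueue for a given target.
def indexer_options(target, generator_uri, staging_host: nil)
  case target
  when :qa
    { index_class: 'Krikri::QASearchIndex', generator_uri: generator_uri }
  when :production
    { index_class: 'Krikri::ProdSearchIndex', generator_uri: generator_uri }
  when :staging, :fqa
    if staging_host.nil? || staging_host.empty?
      raise ArgumentError, 'staging_host is required for staging and fQA'
    end
    { index_class: 'Krikri::ProdSearchIndex', generator_uri: generator_uri,
      host: staging_host, index_name: STAGING_INDEXES.fetch(target) }
  else
    raise ArgumentError, "unknown target: #{target.inspect}"
  end
end

# In a Krikri console the result would be passed straight to enqueue, e.g.
#   uri = Krikri::Activity.find(activity_id).rdf_subject
#   Krikri::Indexer.enqueue(indexer_options(:qa, uri))
p indexer_options(:fqa, 'http://example.org/activity/1', staging_host: 'stg.example.org')
```

Requiring `staging_host` explicitly keeps a staging or fQA job from silently falling back to the default (production) host.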

When indexing an existing provider from Heidrun for the first time, we need to clear the old indexed items. These will appear as duplicates with the same API ID, due to a change in how we handle the index's internal `_id`. To clear them, delete all of the provider's items that have an ingestion sequence other than `999999`:

idx_prod      = Krikri::ProdSearchIndex.new
provider_name = "scdl" # for example
 
query = { query: { filtered: {
  query:  { match_all: {} },
  filter: { bool: {
    must_not: { term: { ingestionSequence: '999999' } },
    must:     { term: { :'provider.@id' => "http://dp.la/api/contributor/#{provider_name}" } }
  } }
} } }
 
response = idx_prod.elasticsearch.search(index: 'dpla_alias', body: query)
response['hits']['total'] # check that hit total matches expected; probably a good idea to check actual matches, too.
 
# delete the items; look for "successful"=>5
# if you get failures, checking the logs in `/var/log/elasticsearch` on the production boxes is a good starting place for diagnostics
idx_prod.elasticsearch.delete_by_query(index: 'dpla_alias', body: query[:query])
# => {"ok"=>true, "_indices"=>{"dpla-20150410-144958"=>{"_shards"=>{"total"=>5, "successful"=>5, "failed"=>0}}}}
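
Checking the response by eye works for a single index. As a sketch, the same check can be written as a small function over the response shape shown above. `failed_shards` and `delete_succeeded?` are hypothetical helpers, and the sketch assumes the response always has the `"ok"` / `"_indices"` / `"_shards"` layout from that example.

```ruby
# Hypothetical helpers for reading a delete_by_query response of the shape
# {"ok"=>true, "_indices"=>{name=>{"_shards"=>{"total"=>.., "successful"=>.., "failed"=>..}}}}

# Map of index name => failed shard count, listing only indices with failures.
def failed_shards(response)
  response.fetch('_indices', {}).each_with_object({}) do |(index, info), failures|
    failed = info.fetch('_shards', {}).fetch('failed', 0)
    failures[index] = failed if failed > 0
  end
end

# True when the response is flagged ok and no shard reported a failure.
def delete_succeeded?(response)
  response['ok'] == true && failed_shards(response).empty?
end

sample = { 'ok' => true,
           '_indices' => { 'dpla-20150410-144958' =>
             { '_shards' => { 'total' => 5, 'successful' => 5, 'failed' => 0 } } } }
puts delete_succeeded?(sample)
```

If `delete_succeeded?` returns false, `failed_shards` names the indices to look for in the Elasticsearch logs.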