Queue an Indexing Job
QA
Krikri::Indexer.enqueue({ index_class: 'Krikri::QASearchIndex', generator_uri: Krikri::Activity.find(activity_id).rdf_subject })
A successful indexing job commits to Solr upon completion.
To clear a provider from the QA index manually, you can do:
qa = Krikri::QASearchIndex.new
qa.delete_by_query 'provider_id:http\://dp.la/api/contributor/washington'
qa.commit
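The colon after `http` in the query above is escaped so that Solr does not treat `http` as a field name. As a minimal sketch, the query string could be built for any provider like this. `provider_delete_query` and `CONTRIBUTOR_BASE` are hypothetical names, not part of Krikri, and the helper assumes that escaping colons alone is sufficient, as in the example above.

```ruby
# Hypothetical helper; not part of Krikri.
# Assumes provider URIs follow the pattern shown in the example above.
CONTRIBUTOR_BASE = 'http://dp.la/api/contributor/'

def provider_delete_query(provider_name)
  uri = CONTRIBUTOR_BASE + provider_name
  # Escape each colon in the URI; the block form of gsub inserts the
  # replacement literally, without backslash interpretation.
  escaped = uri.gsub(':') { '\:' }
  "provider_id:#{escaped}"
end

puts provider_delete_query('washington')
```

The resulting string could then be passed to `qa.delete_by_query`, followed by `qa.commit` as above.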
Staging
The staging host is given by the Ansible configuration variable `es_cluster_loadbal` at https://github.com/dpla/aws/blob/master/ansible/group_vars/staging#L25.
stg = 'internal-search-lbal-stg-1352112635.us-east-1.elb.amazonaws.com' # verify that this is up to date; the job will fail after the query (within 5 minutes) if it is incorrect.
Krikri::Indexer.enqueue({ index_class: 'Krikri::ProdSearchIndex', generator_uri: Krikri::Activity.find(activity_id).rdf_subject, host: stg, index_name: 'dpla_alias' })
fQA (Frontend QA)
If you need to index data to the temporary frontend QA portal (http://ec2-54-172-127-200.compute-1.amazonaws.com/), use the staging host for the search load balancer (from above), but change the index name from 'dpla_alias' to 'fqa_172_30_0_143':
stg_host = 'internal-search-lbal-stg-1352112635.us-east-1.elb.amazonaws.com' # verify that this is up to date; the job will fail after the query (within 5 minutes) if it is incorrect.
Krikri::Indexer.enqueue({ index_class: 'Krikri::ProdSearchIndex', generator_uri: Krikri::Activity.find(activity_id).rdf_subject, host: stg_host, index_name: 'fqa_172_30_0_143' })
Production
Krikri::Indexer.enqueue(index_class: 'Krikri::ProdSearchIndex', generator_uri: Krikri::Activity.find(activity_id).rdf_subject)
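The enqueue calls for QA, staging, fQA and production differ only in index class, host and index name. As a rough sketch, a small helper could keep those differences in one place. `indexer_options` and `STAGING_HOST` are hypothetical names, not part of Krikri, and the host value is copied from the staging example, so it needs the same verification before use.

```ruby
# Hypothetical helper; not part of Krikri.
# The host below is copied from the staging example and may be out of date.
STAGING_HOST = 'internal-search-lbal-stg-1352112635.us-east-1.elb.amazonaws.com'

# Builds the options hash for one target environment.
def indexer_options(env, generator_uri, staging_host: STAGING_HOST)
  prod = { index_class: 'Krikri::ProdSearchIndex', generator_uri: generator_uri }
  case env
  when :qa
    { index_class: 'Krikri::QASearchIndex', generator_uri: generator_uri }
  when :staging
    prod.merge(host: staging_host, index_name: 'dpla_alias')
  when :fqa
    prod.merge(host: staging_host, index_name: 'fqa_172_30_0_143')
  when :production
    prod
  else
    raise ArgumentError, "unknown environment: #{env.inspect}"
  end
end

p indexer_options(:fqa, 'http://example.org/activity/1')
```

In a console, the result would be passed along as `Krikri::Indexer.enqueue(indexer_options(:staging, Krikri::Activity.find(activity_id).rdf_subject))`.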
When indexing an existing provider from Heidrun for the first time, we need to clear the old indexed items. These will appear as duplicates with the same API ID, due to a change in how we handle the index's internal `_id`. To do this, delete all the items with an ingestion sequence other than `999999` for the provider:
idx_prod = Krikri::ProdSearchIndex.new
provider_name = "scdl" # for example
query = {
  :query => {
    :filtered => {
      :query => { :match_all => {} },
      :filter => {
        :bool => {
          :must_not => { :term => { :ingestionSequence => "999999" } },
          :must => { :term => { :"provider.@id" => "http://dp.la/api/contributor/#{provider_name}" } }
        }
      }
    }
  }
}
response = idx_prod.elasticsearch.search(index: 'dpla_alias', body: query)
response['hits']['total'] # check that the hit total matches what you expect; it is probably a good idea to check the actual matches, too.
# delete the items; look for "successful"=>5
# if you get failures, checking the logs in `/var/log/elasticsearch` on the production boxes is a good starting place for diagnostics
idx_prod.elasticsearch.delete_by_query(index: 'dpla_alias', body: query[:query])
# => {"ok"=>true, "_indices"=>{"dpla-20150410-144958"=>{"_shards"=>{"total"=>5, "successful"=>5, "failed"=>0}}}}
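Rather than reading the shard counts by eye, the delete response could be checked programmatically. The following is only a sketch: `failed_shards` is a hypothetical helper, not part of Krikri or the Elasticsearch client, and it assumes the response has the same shape as the sample output shown with the delete call.

```ruby
# Hypothetical helper; not part of Krikri or the Elasticsearch client.
# Returns a hash of index name => failed shard count, for indices with failures.
def failed_shards(response)
  response.fetch('_indices', {}).each_with_object({}) do |(index, info), failures|
    failed = info.fetch('_shards', {}).fetch('failed', 0)
    failures[index] = failed if failed > 0
  end
end

# Sample response copied from the example output in this section.
sample = { 'ok' => true,
           '_indices' => { 'dpla-20150410-144958' =>
             { '_shards' => { 'total' => 5, 'successful' => 5, 'failed' => 0 } } } }

p failed_shards(sample)
```

An empty result means no shard reported a failure. Any entries name the indices worth looking up in the Elasticsearch logs.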