...

This document describes how we currently index our records and suggests some alternatives that would require changes to our technology stack.

Current method

  • An Activity is run, which in turn runs an Indexer to insert a representation of the record into a QASearchIndex. At present, the QASearchIndex is always a Solr search index.
  • People use the QA web application to check the data by issuing queries, generating reports, and so on. Solr sits behind this application, powering the searches and producing the facets used for reporting and for navigating the data.
  • When all's well, another Activity is run, which starts an Indexer to write the record into a ProdSearchIndex. At present, this is always an Elasticsearch search index. At this point the record's representation is in the final search index behind our main website and our API: the record is live and searchable according to the functionality and configuration of our Elasticsearch installation.
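The flow above can be modeled roughly as follows. This is a minimal Python sketch, not our implementation: only the names Activity, Indexer, QASearchIndex and ProdSearchIndex come from this document. The in-memory indices, the `to_doc` mapping, and the `qa_check` callback (standing in for people reviewing data in the QA web application) are hypothetical.

```python
# Hypothetical model of the current two-stage indexing flow.
# In-memory dictionaries stand in for the Solr (QA) and Elasticsearch (prod) indices.
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class SearchIndex:
    """In-memory stand-in for a search index; `engine` names the backend it represents."""
    engine: str
    docs: Dict[str, dict] = field(default_factory=dict)

    def put(self, record_id: str, doc: dict) -> None:
        self.docs[record_id] = doc


class QASearchIndex(SearchIndex):
    def __init__(self) -> None:
        super().__init__(engine="solr")  # at present, always Solr


class ProdSearchIndex(SearchIndex):
    def __init__(self) -> None:
        super().__init__(engine="elasticsearch")  # at present, always Elasticsearch


@dataclass
class Indexer:
    """Turns each record into its index representation and writes it to `target`."""
    target: SearchIndex
    to_doc: Callable[[dict], dict]

    def index(self, records: List[dict]) -> int:
        for rec in records:
            self.target.put(rec["id"], self.to_doc(rec))
        return len(records)


@dataclass
class Activity:
    """A unit of work that runs one Indexer over a batch of records."""
    indexer: Indexer

    def run(self, records: List[dict]) -> int:
        return self.indexer.index(records)


def to_doc(rec: dict) -> dict:
    # Hypothetical mapping: keep only the fields the search index needs.
    return {"title": rec["title"], "provider": rec["provider"]}


def run_pipeline(
    records: List[dict], qa_check: Callable[[QASearchIndex], bool]
) -> Tuple[QASearchIndex, Optional[ProdSearchIndex]]:
    # Step 1: an Activity indexes the records into the QA index.
    qa_index = QASearchIndex()
    Activity(Indexer(qa_index, to_doc)).run(records)
    # Step 2: people check the data in the QA web app (modeled here as a callback).
    if not qa_check(qa_index):
        return qa_index, None
    # Step 3: when all's well, a second Activity indexes the same records into production.
    prod_index = ProdSearchIndex()
    Activity(Indexer(prod_index, to_doc)).run(records)
    return qa_index, prod_index
```

Note that the same records pass through two different engines, so any behavior the QA step relies on (queries, facets) has to be reproduced separately in each.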

...

Although Solr, like Elasticsearch, uses the Lucene library for its low-level search index internals, the two differ enough to cause inconsistencies and inefficiencies that affect our QA work. For example, it has been difficult to get facets to behave the same way in Solr and in Elasticsearch, which has caused surprises when data reach production.
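To illustrate why facets are hard to keep consistent, here is a hedged sketch that expresses one facet definition both as Solr field-faceting parameters and as an Elasticsearch terms aggregation. `FacetSpec` and the helper functions are hypothetical; the parameter names are standard Solr and Elasticsearch ones. The limit and minimum count are set explicitly on both sides because the defaults differ: Solr's `facet.limit` defaults to 100 and `facet.mincount` to 0, while an Elasticsearch terms aggregation returns 10 buckets with `min_doc_count` 1 by default. The sketch also assumes the faceted field is mapped as a keyword field in Elasticsearch, since terms aggregations do not run on analyzed text fields by default.

```python
# Hypothetical helpers: one facet definition rendered for two search engines.
from dataclasses import dataclass
from typing import Dict, List, Tuple
from urllib.parse import urlencode


@dataclass(frozen=True)
class FacetSpec:
    field: str
    limit: int = 10
    min_count: int = 1


def solr_params(query: str, facets: List[FacetSpec]) -> List[Tuple[str, str]]:
    """Query parameters for a Solr /select request with field faceting."""
    params = [("q", query), ("rows", "0"), ("facet", "true")]
    for f in facets:
        params.append(("facet.field", f.field))
        # Per-field overrides use Solr's f.<field>.<param> syntax.
        params.append((f"f.{f.field}.facet.limit", str(f.limit)))
        params.append((f"f.{f.field}.facet.mincount", str(f.min_count)))
    return params


def es_body(query: str, facets: List[FacetSpec]) -> Dict:
    """Request body for an Elasticsearch _search call using terms aggregations."""
    return {
        "size": 0,  # facet counts only, no hits
        "query": {"query_string": {"query": query}},
        "aggs": {
            f.field: {
                "terms": {"field": f.field, "size": f.limit, "min_doc_count": f.min_count}
            }
            for f in facets
        },
    }


if __name__ == "__main__":
    facets = [FacetSpec("provider", limit=20, min_count=1)]
    print("/select?" + urlencode(solr_params("*:*", facets)))
    print(es_body("*:*", facets))
```

Matching parameters removes only one source of divergence; analysis and field mappings still have to be kept in step by hand across the Solr schema and the Elasticsearch mapping.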

Alternative method

Some preliminaries are established:

...

This new method lets us view a provider's data either on its own or mixed in with other providers' data, as the case at hand requires. It also lets us set up test instances of our frontend website, pointed at these solo or combined QA indices, to see how the data will look on the frontend.
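One way solo and combined QA indices could be realized, assuming the QA indices are Elasticsearch indices as the single-search-engine outcome suggests, is to give each provider its own index and group several of them under an index alias; a search against an alias covers every index behind it. In this sketch the `qa-<provider>` naming convention, the alias name, and the helper functions are hypothetical. The returned dict is a request body for Elasticsearch's `POST /_aliases` endpoint; sending it is left out.

```python
# Hypothetical naming scheme for per-provider ("solo") and combined QA indices.
from typing import Dict, List

QA_PREFIX = "qa-"  # assumed convention, not taken from this page


def solo_index(provider: str) -> str:
    """Index holding one provider's enriched records in isolation."""
    return f"{QA_PREFIX}{provider}"


def combined_alias_actions(alias: str, providers: List[str]) -> Dict:
    """Body for POST /_aliases adding each provider's index to `alias`."""
    return {
        "actions": [
            {"add": {"index": solo_index(p), "alias": alias}}
            for p in sorted(set(providers))  # de-duplicate, stable order
        ]
    }


def frontend_target(providers: List[str], combined_alias: str) -> str:
    """Index or alias a test frontend instance would query."""
    unique = sorted(set(providers))
    if len(unique) == 1:
        return solo_index(unique[0])  # isolated view of a single provider
    return combined_alias  # mixed view across several providers


if __name__ == "__main__":
    print(combined_alias_actions("combined-qa", ["p2", "p1"]))
    print(frontend_target(["p1"], "combined-qa"))
```

This sketch only adds indices to an alias; removing a provider from a combined view would need a matching `remove` action in the same request.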

Outcomes

Enriched records appear either in their own index, if isolation is necessary, or mixed in with other providers' records.

...

We have to learn and keep current with only one search engine product instead of two.