Strip high-bit chars out of LDA model

Description

Try stripping high-bit characters, similar to Solr's MappingCharFilterFactory: https://lucene.apache.org/solr/guide/6_6/charfilterfactories.html#CharFilterFactories-solr.MappingCharFilterFactory . This would need to be applied to the LDA model first, and then to the ingestion3 process that generates vectors for new, unseen documents.
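
A rough sketch of what the stripping step could look like, in Python for illustration only (the real change might live in a different language or in the Solr config). The function name `fold_to_ascii` and the `EXTRA_MAPPINGS` table are hypothetical, and the mapping entries are examples rather than an agreed list. It combines a small explicit mapping table, in the spirit of MappingCharFilterFactory, with Unicode NFKD decomposition, then drops anything still outside 7-bit ASCII.

```python
import unicodedata

# Hypothetical explicit mappings for characters that NFKD does not decompose
# into an ASCII base letter. The entries are illustrative, not a final list.
EXTRA_MAPPINGS = {
    "ß": "ss",
    "æ": "ae",
    "Æ": "AE",
    "ø": "o",
    "Ø": "O",
    "\u201c": '"',  # left double quotation mark
    "\u201d": '"',  # right double quotation mark
    "\u2018": "'",  # left single quotation mark
    "\u2019": "'",  # right single quotation mark
}


def fold_to_ascii(text: str) -> str:
    """Return text with high-bit characters mapped to ASCII or removed."""
    # 1. Apply the explicit mapping table first.
    mapped = "".join(EXTRA_MAPPINGS.get(ch, ch) for ch in text)
    # 2. Decompose accented letters into base letter + combining mark.
    decomposed = unicodedata.normalize("NFKD", mapped)
    # 3. Keep only 7-bit ASCII; this drops the combining marks and any
    #    character that has no ASCII equivalent.
    return "".join(ch for ch in decomposed if ord(ch) < 128)


if __name__ == "__main__":
    for sample in ["café", "Müller", "straße"]:
        print(sample, "->", fold_to_ascii(sample))
```

Whatever form this takes, the same function would have to run on the text used to build the LDA model and on new documents at vector-generation time, so that both sides see the same vocabulary. Note that step 3 silently removes characters with no ASCII equivalent (for example, non-Latin scripts), which may or may not be acceptable.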

Status

Assignee

Audrey Altman

Reporter

Audrey Altman

Labels

None

Epic Link

Priority

Medium