Possible future avenues to follow:
Google Analytics tracking
Try stripping high-bit characters, as Solr's MappingCharFilterFactory does: https://lucene.apache.org/solr/guide/6_6/charfilterfactories.html#CharFilterFactories-solr.MappingCharFilterFactory . The same stripping would need to be applied first when training the LDA model, and then in the ingestion3 process that generates vectors for new, unseen documents.
Recall that Postman is a good way to test changes to the DPLA API.
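
A rough sketch of the high-bit character stripping idea noted above, in Python for illustration only. It is not the Solr filter and not existing project code: the function name fold_to_ascii and the EXTRA_MAPPINGS table are hypothetical, and the real pipeline would need its own port. The point is that one and the same folding function has to be used for LDA training text and for new documents at ingestion, otherwise the vocabularies will not line up.

```python
import unicodedata

# Hypothetical explicit mappings for characters that Unicode decomposition
# does not reduce to ASCII, similar in spirit to a Solr mapping file.
EXTRA_MAPPINGS = {
    "ß": "ss",
    "Æ": "AE",
    "æ": "ae",
    "Ø": "O",
    "ø": "o",
    "Ł": "L",
    "ł": "l",
}


def fold_to_ascii(text: str) -> str:
    """Fold accented and other high-bit characters down to plain ASCII."""
    # 1. Apply the explicit one-to-many mappings first.
    mapped = "".join(EXTRA_MAPPINGS.get(ch, ch) for ch in text)
    # 2. Decompose characters into a base character plus combining marks.
    decomposed = unicodedata.normalize("NFKD", mapped)
    # 3. Drop the combining marks (accents, umlauts, and so on).
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # 4. Discard anything that is still outside ASCII.
    return stripped.encode("ascii", "ignore").decode("ascii")


if __name__ == "__main__":
    for sample in ["Café Zürich", "Straße", "naïve Æther"]:
        print(sample, "->", fold_to_ascii(sample))
```

Step 4 silently drops characters with no ASCII equivalent (for example CJK text), which may or may not be acceptable for the collection; that trade-off would need checking before retraining the model.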