...
DPLA tries to ensure that the IDs we mint for your records will not change over time. However, this requires that the value we base our identifier on does not change over time. One issue we discovered during the development of ingestion3 was that, in some cases, the values we based our identifiers on in the original ingestion1 system were not the most stable and did in fact change over time.
History of DPLA identifiers in ingestion1
...
The last step is to take the final provider identifier and create an MD5 hash, which will be the persistent DPLA identifier for this record.
Examples:
Provider | Use prefix? | Identifier field | pre-MD5 hash value | DPLA identifier |
---|---|---|---|---|
Illinois | Yes | dc:identifier | il--https://madison-historical.siue.edu/archive/files/original/79c4cf9b0da358e32fa7bab46563e79e.pdf | 02a5aa4975b941d340d14cb9ad4f7a37 |
PA Digital | No | OAI header identifier | oai:libcollab.temple.edu:dplapa:SLPa_biologicalfactst00bate | 000178f5b0d971292ca1f6539a9f3a9b |
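The hashing step described above can be sketched roughly as follows. This is a minimal illustration, not the actual ingestion1 code: the object and function names (`Ingestion1IdSketch`, `md5Hex`, `preHashValue`, `mintId`) are hypothetical, the `--` separator is inferred from the Illinois row in the table, and hashing the UTF-8 bytes of the string is an assumption.

```scala
import java.nio.charset.StandardCharsets
import java.security.MessageDigest

// Hypothetical sketch of ingestion1-style ID minting; not the real implementation.
object Ingestion1IdSketch {

  // Hex-encoded MD5 digest of a string (UTF-8 encoding is an assumption).
  def md5Hex(value: String): String =
    MessageDigest
      .getInstance("MD5")
      .digest(value.getBytes(StandardCharsets.UTF_8))
      .map(b => "%02x".format(b & 0xff))
      .mkString

  // Pre-hash value: "<prefix>--<identifier>" when a provider prefix is used,
  // otherwise the bare provider identifier.
  def preHashValue(prefix: Option[String], providerId: String): String =
    prefix.fold(providerId)(p => s"$p--$providerId")

  // The DPLA identifier is the MD5 hash of the pre-hash value.
  def mintId(prefix: Option[String], providerId: String): String =
    md5Hex(preHashValue(prefix, providerId))
}
```

Under these assumptions, `Ingestion1IdSketch.preHashValue(Some("il"), id)` yields a string of the same shape as the Illinois pre-MD5 value above, while passing `None` leaves the identifier untouched, as in the PA Digital row.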
Problems with this approach
- The default source of local "persistent" identifiers for providers using either DC or QDC is the dc:identifier field, which was not designed for that purpose
- The order of identifiers is now significant and adding a new URI to the dc:identifier field will change the DPLA identifier
- Moving from http:// to https:// will change the DPLA identifier
- Changing a domain will change the DPLA identifier
These and other subtle changes can cause the DPLA identifiers to change without notice. Additionally, DPLA did not save the pre-hashed value so it is very difficult to reverse engineer what the DPLA identifier is derived from.
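To make the fragility concrete, here is a small sketch showing that any change to the hashed string, such as a scheme or domain change, produces a different MD5 value. The URLs are invented for illustration, and the object and member names (`IdStabilitySketch`, `changedVariants`) are hypothetical.

```scala
import java.nio.charset.StandardCharsets
import java.security.MessageDigest

// Hypothetical demonstration; the URLs below are made up.
object IdStabilitySketch {

  // Hex-encoded MD5 digest of a string (UTF-8 encoding is an assumption).
  def md5Hex(value: String): String =
    MessageDigest
      .getInstance("MD5")
      .digest(value.getBytes(StandardCharsets.UTF_8))
      .map(b => "%02x".format(b & 0xff))
      .mkString

  // The identifier value as first harvested.
  val original: String = "http://example.org/items/42"

  // The same record after two kinds of provider-side change.
  val variants: Seq[(String, String)] = Seq(
    "http to https" -> "https://example.org/items/42",
    "domain change" -> "http://archive.example.org/items/42"
  )

  // For each variant, whether the resulting hash differs from the original one.
  def changedVariants: Seq[(String, Boolean)] =
    variants.map { case (label, value) =>
      label -> (md5Hex(value) != md5Hex(original))
    }

  def main(args: Array[String]): Unit =
    changedVariants.foreach { case (label, changed) =>
      println(s"$label: identifier changed = $changed")
    }
}
```

Because only the hash is stored, neither variant can be traced back to the original record from the identifier alone, which is why the missing pre-hash value makes reverse engineering so difficult.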
Ingestion3 DPLA ID minting
We have sought to remedy many of these issues in our approach to DPLA ID minting in ingestion3. Because of this legacy, however, it is virtually impossible to guarantee persistent identifiers for all records in our corpus. What we can do is make some changes that will make it less likely for DPLA identifiers to change going forward.
...

    // ID minting functions for Tennessee
    override def useProviderName(): Boolean = true
    override def getProviderName(): String = "tn"
    override def originalId(implicit data: Document[NodeSeq]): ZeroToOne[String] =
      extractString(data \ "header" \ "identifier")

Making the switch
Implementing these changes can create some short-term headaches when the previous identifiers were based on values in dc:identifier or some other non-persistent value. In these cases, all of the DPLA identifiers will change and links to DPLA item pages may be broken. We will try to fix any broken internal links (Primary Source Sets, Exhibitions, etc.), but external links are outside our control. This is unfortunate, but the status quo of continuing to use a non-persistent identifier is just as fraught, and we cannot guarantee that the identifiers won't eventually change. Performing the switchover in this way gives us the control to identify problems ahead of time and make the appropriate corrections quickly.