
All data in the DPLA repository are available for download as gzipped JSON files. These include the standard DPLA fields, as well as the complete record received from the partner.

File format

Files are formatted as JSON, and have the following structure:

[
    {
        ...
        "_source": { ... record ... }
        ...
    },
    ... more ...
]

This is a straight dump of an Elasticsearch index and has some fields outside of "_source" that you can ignore.
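As a minimal sketch of reading a file in this structure, the Python below decompresses an export and yields only the contents of "_source" from each element. The function name `iter_records` and the file path passed to it are hypothetical, and the sketch assumes the file is UTF-8 encoded and small enough to parse in memory.

```python
import gzip
import json


def iter_records(path):
    """Yield the "_source" record from each element of an export file.

    `path` is a hypothetical local path to a downloaded gzipped JSON file.
    """
    # gzip.open in text mode decompresses and decodes the file as it is read.
    # UTF-8 is assumed here; it is the default encoding for JSON.
    with gzip.open(path, "rt", encoding="utf-8") as f:
        # json.load parses the whole array into memory. That is fine for a
        # sketch, but a very large export may need a streaming parser.
        for element in json.load(f):
            # Fields outside "_source" are ignored.
            yield element["_source"]
```

Calling `list(iter_records("export.json.gz"))` on a downloaded file (the file name here is only an example) would return the records without the surrounding Elasticsearch fields.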

As noted below, we would like to switch back to a lighter-weight structure with fewer unnecessary fields some time in 2016, once a software upgrade makes this possible.

Former file formats

If you wrote software to process our files before December 16th, 2015, it was designed to work with one of the following structures, and will need to be updated.

The first format resulted from our old method of exporting the data from CouchDB views, where each element of "rows" had a "doc" property, as follows.

{
    "total_rows": <number>,
    "rows": [
                {
                    "doc": {
                               ... record ...
                    }
                },
                ... more rows ...
            ]
}

We changed the structure of our export files' JSON on July 1st, 2014 to be as follows:

[
    {  ... record ... },
    ... more records ...
]

We intend to change back to this format some time in 2016, pending a related software upgrade.
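Because the structure has changed more than once and may change again, software that processes these files could accept all three layouts. The following Python is a rough sketch of that idea, not an official DPLA utility. The function name `extract_records` is hypothetical, and the sketch assumes that a record never has a top-level "_source" key of its own.

```python
def extract_records(data):
    """Return a list of records from already-parsed export JSON.

    Handles the three layouts described on this page. Assumes records
    themselves never contain a top-level "_source" key.
    """
    # CouchDB-style layout: an object whose "rows" each hold a "doc".
    if isinstance(data, dict) and "rows" in data:
        return [row["doc"] for row in data["rows"]]

    if isinstance(data, list):
        records = []
        for item in data:
            if isinstance(item, dict) and "_source" in item:
                # Elasticsearch dump layout: the record sits under "_source".
                records.append(item["_source"])
            else:
                # July 2014 layout: the array element is the record itself.
                records.append(item)
        return records

    raise ValueError("unrecognized export structure")
```

Checking each array element individually keeps the function tolerant of either array layout, so it would not need to change if the files return to the lighter-weight format.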

If you have any comments or questions about the new format, please let us know using our contact form.

 
