All DPLA data in the DPLA repository is available for download as zipped JSON files on Amazon Simple Storage Service (S3) in the bucket named s3://dpla-provider-export.

For more details about how to access and download these files from S3, see the S3 documentation.

Current format

Files are formatted as JSONL: every line is a JSON object with the following structure.

Code Block
languagejs
{ ... "_source": { ... record ... } ... }
{ ... another record ... }
... more records ...

This is a straight dump of an Elasticsearch index and has some fields outside of "_source" that you can ignore.
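
As a minimal sketch of reading the current format, the following Python generator yields the "_source" object from each line and skips everything outside it. It assumes the downloaded file is gzip-compressed, UTF-8 encoded text, which the description above does not confirm. The function name iter_records is hypothetical.

```python
import gzip
import json


def iter_records(path):
    """Yield the "_source" object from each line of a gzipped JSONL file."""
    # Assumption: the export is gzip-compressed, UTF-8 encoded text.
    with gzip.open(path, "rt", encoding="utf-8") as lines:
        for line in lines:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines between records
            # Fields outside "_source" are Elasticsearch metadata; ignore them.
            yield json.loads(line)["_source"]
```

Because the file is read one line at a time, memory use stays roughly constant however large the export is.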

Per the note below, we would like to switch back to a lighter-weight structure with fewer unnecessary fields some time in 2016, after we can perform a software upgrade that will make this possible.

Former file formats

Before July 2018 the file format was as follows. Note that this is a JSON array.

Code Block
languagejs
[
    {
		...
		"_source": { ... record ... }
		...
    },
	... more ...
]
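
A file in this array format might be handled as in the sketch below. Unlike the line-based format, the whole array has to be parsed before any record is available, so the full file is held in memory. The function name records_from_es_array is hypothetical, and the input is assumed to be the already-decompressed file contents.

```python
import json


def records_from_es_array(text):
    """Return the "_source" objects from a JSON array of Elasticsearch hits."""
    hits = json.loads(text)  # parses the entire array at once
    # Each element wraps the actual record in its "_source" property.
    return [hit["_source"] for hit in hits]
```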


If you wrote software to process our files before December 16th, 2015, it was designed to work with one of the following structures, and will need to be updated.

The first format resulted from our old method of exporting the data from CouchDB views, where each element of "rows" had a "doc" property, as follows.

Code Block
{
    "total_rows": <number>,
    "rows": [
                {
                    "doc": {
                               ... record ...
                    }
                },
                ... more rows ...
            ]
}
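
For files in this CouchDB-style structure, the records can be pulled out of "rows" roughly as follows. This is an illustrative sketch only: the function name records_from_couchdb_export is hypothetical, and it assumes every row carries a "doc" property as shown above.

```python
import json


def records_from_couchdb_export(text):
    """Return the "doc" object from each element of "rows"."""
    export = json.loads(text)
    # "total_rows" is informational; the records live under rows[].doc.
    return [row["doc"] for row in export["rows"]]
```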

We changed the structure of our export files' JSON on July 1st, 2014 to be as follows:

Code Block
languagejs
[
    {  ... record ... },
    ... more records ...
]

We intend to change back to this format some time in 2016, pending a related software upgrade.

The second format that we used on some of the older files was the JSON array format described above.
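
Software that has to cope with older files could tell the former structures apart before processing them. The sketch below inspects an already-parsed JSON value. The function name detect_legacy_format and the labels it returns are hypothetical, and it assumes a plain record never has a top-level "_source" key.

```python
def detect_legacy_format(parsed):
    """Classify a parsed export as one of the former file structures."""
    # CouchDB view export: an object with a "rows" list.
    if isinstance(parsed, dict) and "rows" in parsed:
        return "couchdb"
    if isinstance(parsed, list):
        first = parsed[0] if parsed else None
        # Elasticsearch dump: array elements wrap the record in "_source".
        if isinstance(first, dict) and "_source" in first:
            return "elasticsearch-array"
        # Otherwise treat it as a plain array of records.
        return "plain-array"
    raise ValueError("unrecognized export structure")
```

Only the first array element is inspected, so a file that mixes structures would be misclassified.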

Please let us know if you have any comments or questions about the new format, using our contact form.
