All DPLA data in the DPLA repository is available for download as zipped JSON files on Amazon Simple Storage Service (S3) in the bucket named s3://dpla-provider-export.

For more details about how to access and download these files from S3, see the S3 documentation.
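Below is a minimal sketch of fetching one file over HTTPS using only the Python standard library. It assumes the bucket allows anonymous public reads, which this page does not state, and that you already know the object key, since this page does not list the keys. `export_url` and `download_export` are hypothetical helper names. The URL follows S3's standard virtual-hosted style (`https://<bucket>.s3.amazonaws.com/<key>`).

```python
import shutil
import urllib.parse
import urllib.request

BUCKET = "dpla-provider-export"  # bucket name given on this page


def export_url(key: str, bucket: str = BUCKET) -> str:
    """Build a virtual-hosted-style HTTPS URL for an object in the bucket.

    `key` is the object's path inside the bucket. It is percent-encoded,
    but "/" separators are kept as-is.
    """
    return f"https://{bucket}.s3.amazonaws.com/{urllib.parse.quote(key)}"


def download_export(key: str, dest_path: str) -> None:
    """Stream one export file to disk without holding it all in memory.

    Assumption: the bucket permits anonymous reads. If it does not,
    use an authenticated S3 client instead.
    """
    with urllib.request.urlopen(export_url(key)) as resp, open(dest_path, "wb") as out:
        shutil.copyfileobj(resp, out)
```

For example, `download_export("some/key.json.gz", "export.json.gz")` would work, where the key is a placeholder rather than a real object name. If you have the AWS CLI installed, `aws s3 ls s3://dpla-provider-export/ --no-sign-request` is one way to browse the available keys.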


Files are formatted as JSON and have the following structure:

[
    {
        ...
        "_source": { ... record ... }
        ...
    },
    ... more ...
]

Each file is a direct dump of an Elasticsearch index. The DPLA record itself is in "_source"; the other fields in each element are index metadata that you can ignore.
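Below is a minimal sketch of reading the current format in Python. It assumes that "zipped" means gzip-compressed and detects this by the gzip magic bytes rather than trusting the file extension. A `.zip` archive would need the `zipfile` module instead. `open_export` and `iter_records` are hypothetical names. Note that `json.load` reads the whole file into memory, so a very large export may call for a streaming JSON parser.

```python
import gzip
import json
from typing import Any, Iterator

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def open_export(path: str):
    """Open an export file as UTF-8 text, decompressing it if it is gzip."""
    with open(path, "rb") as f:
        is_gzip = f.read(2) == GZIP_MAGIC
    if is_gzip:
        return gzip.open(path, "rt", encoding="utf-8")
    return open(path, "r", encoding="utf-8")


def iter_records(hits: list) -> Iterator[dict]:
    """Yield the DPLA record from each Elasticsearch hit.

    Fields outside "_source" (index metadata) are skipped.
    """
    for hit in hits:
        yield hit["_source"]


def load_records(path: str) -> list:
    """Parse a whole export file and return its records as a list."""
    with open_export(path) as f:
        return list(iter_records(json.load(f)))
```

For example, `load_records("export.json.gz")` would return the list of records, where the filename is a placeholder for wherever you saved the download.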

As noted below, we would like to switch back to a lighter-weight structure with fewer unnecessary fields sometime in 2016, once a software upgrade makes this possible.

Former file formats

If you wrote software to process our files before December 16th, 2015, it was designed to work with one of the following structures and will need to be updated.

The first format came from our old method of exporting the data from CouchDB views. Each element of "rows" had a "doc" property containing the record, as follows:

{
    "total_rows": <number>,
    "rows": [
        {
            "doc": {
                ... record ...
            }
        },
        ... more rows ...
    ]
}
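Below is a minimal sketch of extracting records from this CouchDB-view layout after it has been parsed with `json.load`. `iter_couchdb_records` is a hypothetical name. The code relies only on the "rows" and "doc" keys shown in the structure above.

```python
from typing import Any, Iterator


def iter_couchdb_records(export: dict) -> Iterator[Any]:
    """Yield records from the original CouchDB-view export format.

    The top level is an object. Each element of "rows" wraps one record
    in its "doc" property. "total_rows" is only a count and is ignored.
    """
    for row in export["rows"]:
        yield row["doc"]
```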

On July 1st, 2014, we changed the JSON structure of our export files to the following:

[
    {  ... record ... },
    ... more records ...
]

We intend to change back to this format sometime in 2016, pending a related software upgrade.
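Because the layout has changed before and may change again, a loader that detects which layout it was given can save later rewrites. Below is a minimal sketch that handles all three layouts described on this page: the CouchDB "rows"/"doc" format, the July 2014 plain array, and the current Elasticsearch dump. `iter_any_format` is a hypothetical name. The detection rule is a heuristic assumption: an array element counts as an Elasticsearch hit only if it has a "_source" key. This works only as long as bare records never contain a "_source" field themselves.

```python
from typing import Any, Iterator


def iter_any_format(data: Any) -> Iterator[Any]:
    """Yield DPLA records from any of the three export layouts.

    - object with "rows": CouchDB-view format (record under "doc")
    - array of elements with "_source": current Elasticsearch dump
    - plain array: July 2014 format (each element is a record)
    """
    if isinstance(data, dict) and "rows" in data:
        for row in data["rows"]:
            yield row["doc"]
    elif isinstance(data, list):
        for item in data:
            # Heuristic: unwrap Elasticsearch hits, pass bare records through.
            yield item["_source"] if "_source" in item else item
    else:
        raise ValueError("unrecognized export layout")
```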

If you have any comments or questions about the new format, please let us know using our contact form.

