Database export files

All data in the DPLA repository is available for download as zipped JSON and Parquet files on Amazon Simple Storage Service (S3), in the bucket s3://dpla-provider-export.

For more details about how to access and download these files from S3, see the S3 documentation.

Current JSON format

Files are formatted as JSONL (JSON Lines): every line is a single JSON object with the following structure. The example is spread across several lines here only for readability.

{
	...
	"_source": { ... record ... }
	...
}
{ ... another record ... }
... more records ... 

Each file is a straight dump of an Elasticsearch index. The record itself is the value of "_source"; the other fields on each line are Elasticsearch metadata that you can ignore.
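
As a minimal sketch of reading this format, the Python below parses one JSON object per line and keeps only the "_source" value. The function name iter_records and the two sample lines are hypothetical, and the sample records are far smaller than real ones.

```python
import json
from typing import Iterable, Iterator


def iter_records(lines: Iterable[str]) -> Iterator[dict]:
    """Yield the "_source" record from each line of a JSONL export."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines, such as a trailing newline
        hit = json.loads(line)
        # Everything outside "_source" is Elasticsearch metadata and is dropped.
        yield hit["_source"]


# Hypothetical two-line sample, used only to illustrate the structure.
sample = [
    '{"_id": "a1", "_source": {"id": "a1"}}',
    '{"_id": "b2", "_source": {"id": "b2"}}',
]
records = list(iter_records(sample))
```

Because the function accepts any iterable of strings, an open text-mode file handle can be passed in directly, so a large export is read one line at a time rather than loaded into memory at once.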

Former JSON file formats

Before August 2018, the file format was as follows. Note that the whole file is a single JSON array, not one JSON object per line.

[
    {
        ...
        "_source": { ... record ... }
        ...
    },
    ... more ...
]
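
A file in this older array format has to be parsed as a whole rather than line by line. The following Python is a rough sketch under that assumption; records_from_array and the sample text are hypothetical names and data, not part of any DPLA tooling.

```python
import json
from typing import List


def records_from_array(text: str) -> List[dict]:
    """Return the "_source" record of every element in a JSON array export."""
    hits = json.loads(text)  # the entire file is one JSON array
    return [hit["_source"] for hit in hits]


# Hypothetical sample, much smaller than a real export file.
array_text = '[{"_id": "a1", "_source": {"id": "a1"}}, {"_id": "b2", "_source": {"id": "b2"}}]'
array_records = records_from_array(array_text)
```

Note that json.loads holds the full array in memory, which may be a concern for very large files.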


If you wrote software to process our files before December 16th, 2015, it was designed to work with one of the following structures.

The first format resulted from our old method of exporting the data from CouchDB views, where each element of "rows" had a "doc" property, as follows.

{
    "total_rows": <number>,
    "rows": [
        {
            "doc": {
                ... record ...
            }
        },
        ... more rows ...
    ]
}
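
For software that still needs to read this CouchDB-style structure, a minimal sketch in Python might look like the following. The function name records_from_couchdb_export and the sample data are hypothetical; only the "total_rows", "rows", and "doc" keys come from the structure shown above.

```python
import json
from typing import List


def records_from_couchdb_export(text: str) -> List[dict]:
    """Return the "doc" record of every row in a CouchDB view export."""
    export = json.loads(text)
    # Each element of "rows" wraps the record in a "doc" property.
    return [row["doc"] for row in export["rows"]]


# Hypothetical sample with two tiny records.
couch_text = '{"total_rows": 2, "rows": [{"doc": {"id": "a1"}}, {"doc": {"id": "b2"}}]}'
couch_records = records_from_couchdb_export(couch_text)
```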

The second format, which we used for some of the older files, was the JSON array format described above.

If you have any comments or questions about the new format, please let us know through our contact form.
