...
Files are formatted as JSON, and have the following structure:
Code Block |
---|
[ { ... "_source": { ... record ... } ... }, ... more ... ] |
This is a straight dump of an Elasticsearch index and has some fields outside of "_source" that you can ignore.
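As a rough illustration of ignoring everything outside "_source", the following Python sketch pulls the records out of a parsed file. The function name `extract_records`, the sample metadata fields ("_index", "_id"), and the record fields ("id", "title") are placeholders chosen for this example, not a description of the actual file contents.

```python
import json


def extract_records(dump):
    """Return the record stored under "_source" in each element of the dump."""
    return [hit["_source"] for hit in dump]


# A tiny inline sample standing in for a downloaded file.
# The fields inside and outside "_source" are made up for illustration.
sample = json.loads("""
[
  {"_index": "example", "_id": "a1", "_source": {"id": "a1", "title": "First"}},
  {"_index": "example", "_id": "b2", "_source": {"id": "b2", "title": "Second"}}
]
""")

records = extract_records(sample)
print(records[0]["title"])
```

For the larger files, loading the whole array with `json.load` may use a lot of memory, so a streaming JSON parser may be a better fit.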
As noted below, we would like to switch back to a lighter-weight structure with fewer unnecessary fields some time in 2016, once we have performed a software upgrade that makes this possible.
Former file formats
If you wrote software to process our files before December 15th, 2015, it was designed to work with one of the following structures, and will need to be updated.
The first format resulted from our old method of exporting the data from CouchDB views, where each element of "rows" had a "doc" property, as follows.
Code Block |
---|
{
  "total_rows": <number>,
  "rows": [
    {
      "doc": {
        ... record ...
      }
    },
    ... more rows ...
  ]
} |
...
Prior to May 28th, 2014, we also included various other CouchDB-related properties alongside "doc" in every row element.
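Software written for the CouchDB-style structure would have read each record from the "doc" property of a row and skipped any sibling properties. A minimal Python sketch of that, assuming the file has already been parsed, is below. The function name and the sample "id" and "key" row properties are illustrative only.

```python
import json


def records_from_couchdb_export(export):
    """Return the "doc" of each row, skipping rows that have none."""
    return [row["doc"] for row in export.get("rows", []) if "doc" in row]


# Inline sample in the old structure. The first row carries extra
# properties next to "doc", as files from before May 28th, 2014 did;
# the property names used here are made up for illustration.
legacy = json.loads("""
{
  "total_rows": 2,
  "rows": [
    {"id": "a1", "key": "a1", "doc": {"id": "a1", "title": "First"}},
    {"doc": {"id": "b2", "title": "Second"}}
  ]
}
""")

docs = records_from_couchdb_export(legacy)
print(len(docs))
```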
New file format
We changed the structure of our export files' JSON on July 1st, 2014. The previous format was a legacy of the way we used to export the direct output of CouchDB views, where each element of "rows" had a "doc" property. The new format was designed to be simpler and to produce smaller files, especially for the larger files. It was as follows:
Code Block |
---|
[ { ... record ... }, ... more records ... ] |
We intend to change back to this format some time in 2016, pending a related software upgrade.
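Because the structure has changed before and may change again, client code might prefer to accept any of the structures described on this page. The Python sketch below is one possible approach, not an official tool. The name `normalize` is hypothetical, and the code assumes that a bare record never has a top-level "_source" property of its own.

```python
def normalize(data):
    """Return a list of records from any structure described on this page."""
    if isinstance(data, dict):
        # CouchDB-style object: records live under rows[i]["doc"].
        return [row["doc"] for row in data.get("rows", []) if "doc" in row]

    records = []
    for item in data:
        if isinstance(item, dict) and "_source" in item:
            # Element of an Elasticsearch dump: keep only the record itself.
            records.append(item["_source"])
        else:
            # Already a bare record in a flat array.
            records.append(item)
    return records


# Each call below should yield the same single made-up record.
print(normalize([{"_source": {"id": "a1"}}]))
print(normalize([{"id": "a1"}]))
print(normalize({"total_rows": 1, "rows": [{"doc": {"id": "a1"}}]}))
```

Checking for "_source" on each element, rather than once per file, keeps the function simple at the cost of a small amount of extra work per record.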
Please let us know if you have any comments or questions about the new format, using our contact form.
...