Comment: Added fields to la.dp.avro.MAP.4_0.MAPRecord.v1

...

Code Block
{
  "namespace": "la.dp.avro",
  "type": "record",
  "name": "OriginalRecord.v1",
  "doc": <DOCUMENTATION>,
  "fields": [
    {"name": "id", "type": "string"},
    {"name": "ingestDate", "type": "int", "doc": "UNIX timestamp"},
    {"name": "provider", "type": "string"},
    {"name": "document", "type": "string"},
    {"name": "mimetype",
     "type": {"name": "MimeType",
              "type": "enum",
              "symbols": ["application_json", "application_xml", "text_turtle"]}}
  ]
}

...

Notes On Fields

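We will have to translate "_" characters in the mimetype field back into "/" characters in our code, because "/" is not allowed within an enum symbol in an Avro schema (symbols must match [A-Za-z_][A-Za-z0-9_]*). For example, the symbol "application_json" stands for the MIME type "application/json".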
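
The translation described above could look roughly like the following Python sketch. The function and constant names are hypothetical and not part of any existing ingestion code. It also assumes that only the first "_" in a symbol stands in for "/", which holds for the three symbols currently in the enum.

```python
# Hypothetical helpers for illustration; names are not taken from the codebase.
AVRO_MIMETYPE_SYMBOLS = ("application_json", "application_xml", "text_turtle")


def symbol_to_mimetype(symbol: str) -> str:
    """Turn an Avro enum symbol such as 'application_json' into 'application/json'."""
    if symbol not in AVRO_MIMETYPE_SYMBOLS:
        raise ValueError(f"unknown MimeType symbol: {symbol!r}")
    # Assumption: only the first "_" replaces the "/" of the MIME type.
    return symbol.replace("_", "/", 1)


def mimetype_to_symbol(mimetype: str) -> str:
    """Turn a MIME type such as 'text/turtle' into its Avro enum symbol."""
    symbol = mimetype.replace("/", "_", 1)
    if symbol not in AVRO_MIMETYPE_SYMBOLS:
        raise ValueError(f"MIME type not in the MimeType enum: {mimetype!r}")
    return symbol
```

Rejecting values outside the enum keeps the mapping reversible and surfaces unexpected MIME types early, before a record is written.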

In Ingestion1, ingestDate recorded when the particular record was enriched. It would probably be more useful, however, for it to hold the timestamp of the beginning of the harvest, which matters more to the provider and the end user. Storing ingestDate in the Avro file as an integer makes date-range calculations simpler, for example in queries that filter records by date range. The timestamp can be output as a date string later, when the record is indexed.
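
As a rough illustration of why an integer timestamp is convenient, the Python sketch below filters records by a date range using plain integer comparison and formats the timestamp as a string only at output time. The function names, the sample records, and the ISO 8601 output format are assumptions made for this example. The format used at indexing time has not been decided here.

```python
from datetime import datetime, timezone


def harvested_between(records, start, end):
    """Keep records whose ingestDate lies in the half-open range [start, end)."""
    return [r for r in records if start <= r["ingestDate"] < end]


def ingest_date_string(ingest_date):
    """Format a UNIX timestamp (seconds) as a UTC date string."""
    moment = datetime.fromtimestamp(ingest_date, tz=timezone.utc)
    return moment.strftime("%Y-%m-%dT%H:%M:%SZ")


# Made-up sample records; only the fields needed for the example are present.
records = [
    {"id": "a", "ingestDate": 1420070400},  # 2015-01-01T00:00:00Z
    {"id": "b", "ingestDate": 1422748800},  # 2015-02-01T00:00:00Z
]

january = harvested_between(records, 1420070400, 1422748800)
```

Here `january` contains only record "a", and `ingest_date_string(1420070400)` returns "2015-01-01T00:00:00Z".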

la.dp.avro.MAP.4_0.MAPRecord.v1

...

Code Block
{
  "namespace": "la.dp.avro.MAP.4_0",
  "type": "record",
  "name": "MAPRecord.v1",
  "doc": <DOCUMENTATION>,
  "fields": [
    {"name": "id", "type": "string", "doc": "DPLA record ID"},
    {"name": "ingestType",
     "type": {"name": "IngestType",
              "type": "enum",
              "symbols": ["item", "collection"]}},
    {"name": "ingestDate", "type": "int", "doc": "UNIX timestamp"},
    {"name": "dataProvider", "type": "string"},
    {"name": "hasView", "type": "string"},
    {"name": "intermediateProvider", "type": "string"},
    {"name": "isShownAt", "type": "string"},
    {"name": "object", "type": "string"},
    {"name": "originalRecord", "type": "string"},
    {"name": "preview", "type": "string"},
    {"name": "provider", "type": "string"},
    {"name": "rightsStatement", "type": "string"},
    {"name": "sourceResource_alternative", "type": "string"},
    {"name": "sourceResource_collection", "type": "string"},
    {"name": "sourceResource_contributor", "type": "string"},
    {"name": "sourceResource_creator", "type": "string"},
    {"name": "sourceResource_date", "type": "string"},
    {"name": "sourceResource_description", "type": "string"},
    {"name": "sourceResource_extent", "type": "string"},
    {"name": "sourceResource_format", "type": "string"},
    {"name": "sourceResource_genre", "type": "string"},
    {"name": "sourceResource_identifier", "type": "string"},
    {"name": "sourceResource_isReplacedBy", "type": "string"},
    {"name": "sourceResource_language", "type": "string"},
    {"name": "sourceResource_publisher", "type": "string"},
    {"name": "sourceResource_relation", "type": "string"},
    {"name": "sourceResource_replaces", "type": "string"},
    {"name": "sourceResource_rights", "type": "string"},
    {"name": "sourceResource_rightsHolder", "type": "string"},
    {"name": "sourceResource_spatial", "type": "string"},
    {"name": "sourceResource_subject", "type": "string"},
    {"name": "sourceResource_temporal", "type": "string"},
    {"name": "sourceResource_title", "type": "string"},
    {"name": "sourceResource_type", "type": "string"},
    {"name": "title", "type": "string", "doc": "Only for collection records"}
  ]
}

Notes About Fields

See the note on ingestDate under OriginalRecord above.

la.dp.avro.MAP.4_0.IndexRecord.v1

...