Ingestion 3 Dependencies

Notes, for reference by the DPLA tech team.

The following dependency versions work together and enable reading from and writing to S3:

  • spark 2.3.1
  • hdfs 2.8.4
  • spark-avro 4.0.0
  • hadoop-aws 2.7.6
  • aws-java-sdk 1.7.4

Flintrock

Specify the Spark and HDFS versions in Flintrock's config file.

Use either of these methods to add dependencies to an EC2 cluster:

  • Add as jar files to /home/ec2-user/spark/jars/
  • List after the --packages flag when running spark-submit
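
As a rough sketch of the second method, the snippet below assembles a spark-submit command line from the Maven coordinates this page lists for HarvestEntry. The --packages flag takes a comma-separated list of coordinates. The main class name, the application jar path, and the helper function itself are hypothetical placeholders, not values taken from the ingestion3 codebase.

```python
import shlex

# Maven coordinates listed on this page for HarvestEntry.
HARVEST_PACKAGES = [
    "com.databricks:spark-avro_2.11:4.0.0",
    "org.apache.hadoop:hadoop-aws:2.7.6",
    "com.amazonaws:aws-java-sdk:1.7.4",
    "org.rogach:scallop_2.11:3.0.3",
    "com.typesafe:config:1.3.1",
]


def spark_submit_command(main_class, app_jar, packages, app_args=()):
    """Return the spark-submit argument list for one entry point."""
    return [
        "spark-submit",
        "--packages", ",".join(packages),  # comma-separated, no spaces
        "--class", main_class,
        app_jar,
        *app_args,
    ]


# Hypothetical class name and jar path, for illustration only.
cmd = spark_submit_command(
    main_class="example.HarvestEntry",
    app_jar="/home/ec2-user/ingestion3.jar",
    packages=HARVEST_PACKAGES,
)

# Print a shell-safe version of the command for copy and paste.
print(" ".join(shlex.quote(part) for part in cmd))
```

Keeping the coordinates in a list rather than a hand-typed string makes it harder to introduce a stray space, which would split the --packages value into separate arguments.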

Dependencies for HarvestEntry

  • com.databricks:spark-avro_2.11:4.0.0

  • org.apache.hadoop:hadoop-aws:2.7.6

  • com.amazonaws:aws-java-sdk:1.7.4

  • org.rogach:scallop_2.11:3.0.3

  • com.typesafe:config:1.3.1

Dependencies for MappingEntry

  • com.databricks:spark-avro_2.11:4.0.0

  • org.apache.hadoop:hadoop-aws:2.7.6

  • com.amazonaws:aws-java-sdk:1.7.4

  • org.rogach:scallop_2.11:3.0.3

  • com.typesafe:config:1.3.1

Dependencies for EnrichEntry

  • com.databricks:spark-avro_2.11:4.0.0

  • org.apache.hadoop:hadoop-aws:2.7.6

  • com.amazonaws:aws-java-sdk:1.7.4

  • org.rogach:scallop_2.11:3.0.3

  • com.typesafe:config:1.3.1

  • org.eclipse.rdf4j:rdf4j-model:2.2

  • org.jsoup:jsoup:1.10.2

Dependencies for JsonlEntry

  • com.databricks:spark-avro_2.11:4.0.0

  • org.apache.hadoop:hadoop-aws:2.7.6

  • com.amazonaws:aws-java-sdk:1.7.4

  • org.rogach:scallop_2.11:3.0.3

Dependencies for IngestRemap

  • com.databricks:spark-avro_2.11:4.0.0

  • org.apache.hadoop:hadoop-aws:2.7.6

  • com.amazonaws:aws-java-sdk:1.7.4

  • org.rogach:scallop_2.11:3.0.3

  • com.typesafe:config:1.3.1

  • org.eclipse.rdf4j:rdf4j-model:2.2

  • org.jsoup:jsoup:1.10.2
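
The per-entry lists above overlap heavily. As an illustrative sketch only (the dictionary layout and variable names are assumptions, not part of ingestion3), the snippet below records each list as a set and derives the combined set of coordinates, which is what would need to be present if the jars were staged once for all entry points.

```python
# Coordinates shared by every entry point listed on this page.
COMMON = {
    "com.databricks:spark-avro_2.11:4.0.0",
    "org.apache.hadoop:hadoop-aws:2.7.6",
    "com.amazonaws:aws-java-sdk:1.7.4",
    "org.rogach:scallop_2.11:3.0.3",
}
CONFIG = "com.typesafe:config:1.3.1"
ENRICH_EXTRAS = {
    "org.eclipse.rdf4j:rdf4j-model:2.2",
    "org.jsoup:jsoup:1.10.2",
}

# Per-entry dependency sets, transcribed from the lists above.
DEPENDENCIES = {
    "HarvestEntry": COMMON | {CONFIG},
    "MappingEntry": COMMON | {CONFIG},
    "EnrichEntry": COMMON | {CONFIG} | ENRICH_EXTRAS,
    "JsonlEntry": set(COMMON),
    "IngestRemap": COMMON | {CONFIG} | ENRICH_EXTRAS,
}

# Union of every coordinate needed by any entry point.
all_packages = sorted(set().union(*DEPENDENCIES.values()))

for coordinate in all_packages:
    # Show which entry points need each coordinate.
    users = sorted(name for name, deps in DEPENDENCIES.items() if coordinate in deps)
    print(f"{coordinate}: {', '.join(users)}")
```

Note that JsonlEntry is the only entry point listed without com.typesafe:config, and that EnrichEntry and IngestRemap share the same list.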
