Record Processing

Description:

An execution environment for running harvests, maps, and enrichments across a provider's contributed metadata.


Selection criteria:

  • TODO

Nice-to-haves:

  • TODO

Notes:

It might not make sense to consider the Record Processing, Mapping DSL, and Queuing System projects separately if they are highly coupled.


Technology Option | Language | Strengths | Weaknesses | Opportunities | Threats


Mapping DSL

Description:

A generalized, easy-to-use language for converting documents of arbitrary schemas into DPLA MAP. At first, this will likely be implemented in a general-purpose programming language as part of the Record Processing project. The expectation is that we will eventually deliver a language that metadata experts with little to no programming experience can use successfully on their own, or with minimal supervision.

This project is expected to be primarily custom code, possibly with several implementations if it needs to work in mutually incompatible environments. Therefore, framework exploration probably isn't needed.


Selection criteria:

  • Simple to use 
  • Accessible by non-programmers
  • Needs to handle core use cases 
    • JSON
    • XML
    • RDF
    • multi-schema/multi-namespace documents
    • DPLA MAP

Nice-to-haves:

  • Able to run in a variety of execution contexts (browser, command line, grid computing frameworks)
  • Easily usable by partners in their own environments
  • Deep document validity checks (not just well-formedness)
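
As a rough illustration of the "implemented in a general programming language first" approach, a mapping could start out as plain data that a small function applies to each record. The sketch below is only an assumption about what that might look like: the names `get_path`, `apply_mapping`, and `EXAMPLE_MAPPING` are hypothetical, the target field names are placeholders rather than the actual DPLA MAP model, and it handles only JSON-like nested dictionaries (not XML or RDF).

```python
from typing import Any, Dict, Optional

# Hypothetical mapping format: target field name -> dotted path into the source record.
# The target names here are illustrative placeholders, not the real DPLA MAP schema.
EXAMPLE_MAPPING: Dict[str, str] = {
    "title": "metadata.dc.title",
    "creator": "metadata.dc.creator",
}


def get_path(record: Any, path: str) -> Optional[Any]:
    """Follow a dotted path through nested dicts; return None if any step is missing."""
    current = record
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def apply_mapping(record: Dict[str, Any], mapping: Dict[str, str]) -> Dict[str, Any]:
    """Build a new record containing only the mapped fields that exist in the source."""
    mapped: Dict[str, Any] = {}
    for target_field, source_path in mapping.items():
        value = get_path(record, source_path)
        if value is not None:
            mapped[target_field] = value
    return mapped


# Example source record (invented for illustration).
source_record = {
    "id": "abc",
    "metadata": {"dc": {"title": "Map of Boston", "creator": "Unknown"}},
}
mapped_record = apply_mapping(source_record, EXAMPLE_MAPPING)
```

Keeping the mapping as data rather than code is one way to leave room for a later non-programmer-facing syntax that compiles down to the same structure.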


Technology Option | Strengths | Weaknesses | Opportunities | Threats
Javascript | | | |
Python | | | |
Scala | | | |
Java | | | |


Dashboard

Description: TODO

The Dashboard is a web application that 


Selection criteria:

  • TODO

Nice-to-haves:

  • TODO


Technology Option | Language | Strengths | Weaknesses | Opportunities | Threats


QA App

Description:

TODO


Selection criteria:

  • TODO

Nice-to-haves:

  • TODO


Technology Option | Language | Strengths | Weaknesses | Opportunities | Threats


Queuing System

Description:

The queuing system controls the runtime execution of activities. Currently, Ingestion 2 uses Resque, which is a Ruby-based environment that uses Redis as a datastore and for transaction logic.


Selection criteria:

  • Must allow for a batch of operations to be queued
  • Must expose statistics on the current state of a batch for reporting purposes
  • Must allow for management of failures
  • Must allow for distribution of tasks among multiple workers

Nice-to-haves:

  • Choice of implementation languages for workers
  • Retrying capabilities
  • Broader utility outside of ingestion use cases
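
To make the criteria above more concrete, here is a minimal in-process sketch using only the Python standard library. It is not a proposal for any of the options in the table below; `run_batch` and its parameters are hypothetical names, and a real system would use a durable, distributed queue rather than threads in one process. It shows a batch being queued, work distributed among several workers, failures collected for later handling, simple retries, and per-batch statistics.

```python
import queue
import threading
from collections import Counter
from typing import Any, Callable, Dict, Iterable, List, Tuple


def run_batch(
    tasks: Iterable[Any],
    handler: Callable[[Any], None],
    num_workers: int = 4,
    max_retries: int = 2,
) -> Tuple[Dict[str, int], List[Tuple[Any, str]]]:
    """Run handler(task) for each task on a pool of worker threads.

    Returns (stats, failures): stats counts queued/succeeded/retried/failed
    tasks, and failures lists (task, error) pairs that exhausted their retries.
    """
    task_list = list(tasks)
    work: "queue.Queue[Tuple[Any, int]]" = queue.Queue()
    for task in task_list:
        work.put((task, 0))  # (task, attempts so far)

    stats: Counter = Counter(queued=len(task_list))
    failures: List[Tuple[Any, str]] = []
    lock = threading.Lock()  # protects stats and failures

    def worker() -> None:
        while True:
            try:
                task, attempts = work.get_nowait()
            except queue.Empty:
                return  # nothing left for this worker to do
            try:
                handler(task)
            except Exception as exc:
                if attempts < max_retries:
                    # Requeue; this worker loops again, so the retry is never lost.
                    work.put((task, attempts + 1))
                    with lock:
                        stats["retried"] += 1
                else:
                    with lock:
                        stats["failed"] += 1
                        failures.append((task, repr(exc)))
            else:
                with lock:
                    stats["succeeded"] += 1

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return dict(stats), failures
```

A caller would pass a list of work items (for example, record identifiers) and a function that processes one item; the returned statistics and failure list are the kind of information a reporting view could display for the batch.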


Technology Option | Worker Language | Strengths | Weaknesses | Opportunities | Threats
Airflow | Many | | | |
RQ | Python | | | |
Custom | Many | | | |
Kafka | Many | | | |
Resque | Ruby | | | |


Developer Experience / Interests

Dev | Expert At | Good At | Familiar With | Wants to Learn
Audrey | HTML+CSS, Javascript for DOM manipulations, Ruby (in Ruby on Rails context) | Object oriented Javascript, PHP (a little rusty), Ruby, SQL | Python, Java | Python, Scala, Java
Mark | Unix, Python (was pretty confident, now a little rusty), Javascript, PHP (formerly, doesn't like), HTML+CSS (a little rusty), Perl (rusty, been a while, is so over that) | Ruby | C, Java | Go, more Python, Scala, Java, Natural Language Processing
Michael | Java, XML, Solr, Hadoop | Scala, Ruby (mostly not Rails) | Python, Javascript, Perl, C, Objective-C, XSLT, Spark, NLP, Machine Learning, Elasticsearch, Redshift | Python, more Scala, Spark
Scott | | | |