...

  • Discuss R&D projects.

  • Discuss allocation and make issue tracker tasks for prototype work.

  • Discuss milestones, deadlines.

  • Generate a list of priorities for what we want our solution to achieve.

Discussion items

Time | Item | Who | Notes
Priorities (what problems are we solving?) | All
    • Speed: speed is a feature. We should be able to predict how long a given ingest will take.
    • Recovery from failure: a failed run should pick up where it left off. Speed affects this; if a step is fast enough, rerunning it from scratch is acceptable. Otherwise, make sure there is recovery. Harvesters should allow recovery where possible. Indexers may also be slower than mappings and enrichments, so they may deserve recovery features too.
    • Automation, as originally specified: a program that shepherds the process all the way through, including scheduling.
    • Eventually, provide a usable mapping DSL
      • Needs real market research
      • This is not a turnkey solution yet. Some things, like DSLs, will be evaluated later, when we can be more confident about how big the user base is.
      • Writing mappings ourselves in the third system without a DSL will allow us to understand the problem space better.
    • Ability to debug things, especially mappings
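The "pick up where it left off" priority above could be sketched as a checkpoint file that records how far a harvest has progressed. This is only an illustration under stated assumptions, not the design of any of the prototypes: the names `harvest`, `load_checkpoint`, and `save_checkpoint`, and the JSON checkpoint format, are all hypothetical.

```python
import json
import os
import tempfile


def load_checkpoint(path):
    """Return the index of the next record to process (0 if no checkpoint exists)."""
    if not os.path.exists(path):
        return 0
    with open(path) as f:
        return json.load(f)["next_index"]


def save_checkpoint(path, next_index):
    """Write the checkpoint via a temp file and rename, so a crash cannot leave it half-written."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump({"next_index": next_index}, f)
    os.replace(tmp, path)


def harvest(records, handle, checkpoint_path):
    """Process records in order, skipping those completed in an earlier run.

    `handle` is a hypothetical per-record callback; if it raises, the
    checkpoint still points at the failed record, so the next run retries it.
    """
    start = load_checkpoint(checkpoint_path)
    for i in range(start, len(records)):
        handle(records[i])
        save_checkpoint(checkpoint_path, i + 1)
```

Checkpointing after every record keeps the sketch simple; a real harvester would more likely checkpoint per batch or per resumption token, trading some rework on failure for less I/O.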
Code examples | Michael et al.

Got walkthroughs of the Python, Python + Spark, Java, and Scala prototypes.

Scheduling system | All

Scheduling / operation chaining / "Plans" in the Prov-O sense

  • Need metrics for what qualifies as job failure. (Partly thought out)
  • Need to get together and assess our experiences running ingests.
  • If we automate things, we need to know how to define success.
  • Tools exist that can help with this.
  • Need to set aside time for this after basic manual ingest runs are figured out, but the design should already allow for a scheduling facility.
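To make the points above about operation chaining and defining success concrete, here is a minimal sketch of a chained plan in which each step carries its own success criterion. Everything in it is an assumption for illustration: the `Step` and `StepResult` types, the failure-rate threshold, and the rule that a step with no input counts as failed are hypothetical, and the actual failure metrics are still to be worked out.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class StepResult:
    records_in: int      # records the step received
    records_failed: int  # records the step could not process


@dataclass
class Step:
    name: str
    run: Callable[[], StepResult]
    max_failure_rate: float  # hypothetical threshold, e.g. 0.05 for 5%


def step_succeeded(result: StepResult, max_failure_rate: float) -> bool:
    """Apply the (assumed) success rule: failure rate at or below the threshold."""
    if result.records_in == 0:
        return False  # assumption: a step that saw no records is treated as failed
    return result.records_failed / result.records_in <= max_failure_rate


def run_plan(steps: List[Step]) -> Tuple[List[str], Optional[str]]:
    """Run steps in order and stop at the first one that fails its criterion.

    Returns the names of completed steps and the name of the failed step
    (None if every step succeeded).
    """
    completed = []
    for step in steps:
        result = step.run()
        if not step_succeeded(result, step.max_failure_rate):
            return completed, step.name
        completed.append(step.name)
    return completed, None
```

For example, a plan of harvest, mapping, and indexing steps would halt at mapping if 10 of 100 records failed against a 5% threshold, leaving the indexer untouched until someone looks at the failure.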
Roadmapping all of this | All

...