2014Apr30

From Filtered Push Wiki


Etherpad for meeting notes: http://firuta.huh.harvard.edu:9000/FP-2014Apr30

Agenda

Non-Tech

  • SPNHC (April 25/May 5)
    • Abstract Preparation, one in, another to do.
  • InvertEBase
  • iDigBio - Report on Integration visit from Greg Riccardi, Apr 29.

Tech

  • Report from Thursday call
  • Driver
    • Report Maureen: Status of driver - current annotation processor integration.
    • Chuck: State of getting set up to work on annotation processor.
  • Nodes
    • Report Maureen: Status of ingests (taxon/occurrence) on FP2 and FP3
    • Report David: Morphbank integration status.
  • FP-DataEntry
    • Report Chuck: Duplicate detection integration into Yale data entry application
  • SCAN
    • Report David/Tianhong: Akka integration in FP2
    • Test of Akka workflow with SCAN data.
    • Query for harvested data and analysis results.
  • NEVP
    • Report David: Progress on updating deployment.
    • Akka Workflow for NEVP
  • Analysis
    • Tianhong: Progress on cleaning data with data.
    • Report Bob: Progress on Duplicate Finding data mining.
  • SemanticMediaWiki as FP Client, review of SMW use cases.
  • For Thursday:

Reports

  • Paul
    • Got SPNHC demo camp abstract revised and submitted.
    • Other SPNHC abstract still in progress
    • Meeting discussing annotations for iDigBio with Bob, David, and Greg Riccardi all day Tuesday.
    • Ed is good with us developing UI for agent authority file in Symbiota.
  • Chuck
    • Trying to script FP developer installation, but this might not be the right direction. (See email)
    • Better solution to the Aduna issue. (Blocked Aduna in settings.xml and reached out to third parties to get their stuff cleaned up.)
    • FP-DataEntry: arbitrary selectors: you can specify a selector engine of your choice with an anonymous function in config.xml
    • FP-DataEntry: pre-hook js: So, for example, you can emulate mouse-clicks to expand divs which are not rendered by default.
    • FP-DataEntry: screen shots for wiki.
    • FP-DataEntry: make bootstrap more robust.
  • Maureen
    • created a Solr index for SCAN occurrence data, installed it on fp1
    • created a Solr index for SCAN collector data, added some birth/death dates for ~20 collectors, installed on fp1

Notes

FilteredPush Team Meeting 2014 Apr 30
Present: Bertram, Tianhong, Bob, David, Chuck, Paul, Maureen.

Non-Tech

  • SPNHC (April 25/May 5)
    • Abstract Preparation, one in, another to do.
  • InvertEBase

Paul: From Petra, request to redo budget at slightly higher funding level. InvertEBase grant will be funded at 75% of original request.

  • iDigBio - Report on Integration visit from Greg Riccardi, Apr 29.

Paul: Greg visited yesterday, talked about FP-iDigBio integration. Main use case was supporting data flow from the NotesFromNature crowdsourcing application. Was good that we had Chuck at the hackathon last fall, as we had already done a proof of concept of what he'd like. Discussed broad requirements for setting up a node to support this use case.

Bob: Also had a generic discussion about a slide set he is developing about requirements for data aggregators. One of those requirements was tracking GUIDs that they have been given, minting GUIDs for data objects that they have been given that don't have GUIDs, and asserting relationships between GUIDs.

Paul: One implication is that we need to make sure that the occurrenceId is present in the selector of all of the annotations, when known.

Bob: A report item: We have a system that is able to support broader requirements than the systems we are supporting with ApplePie.
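The aggregator requirements Bob lists (track supplied GUIDs, mint GUIDs where none were supplied, assert relationships between GUIDs) can be sketched roughly as below. This is only an illustration, not the FilteredPush or iDigBio implementation: the function names, the `relatedTo` predicate, the `catalogNumber` field, and the use of `urn:uuid:` identifiers are all assumptions made for the example.

```python
import uuid

def ensure_guid(record, key="occurrenceId"):
    """Return (guid, minted): the GUID already supplied, or a newly minted one.

    The key name follows the notes; the urn:uuid form is an assumption.
    """
    existing = record.get(key)
    if existing:
        return existing, False
    minted = "urn:uuid:" + str(uuid.uuid4())
    record[key] = minted
    return minted, True

def assert_relationship(relations, subject, predicate, obj):
    """Record a (subject, predicate, object) assertion between two GUIDs."""
    relations.append((subject, predicate, obj))

relations = []
# Hypothetical records: one arrives with a GUID, one without.
supplied = {"occurrenceId": "urn:uuid:11111111-2222-3333-4444-555555555555"}
unlabeled = {"catalogNumber": "12345"}

guid_a, minted_a = ensure_guid(supplied)   # supplied GUID is kept as given
guid_b, minted_b = ensure_guid(unlabeled)  # a new GUID is minted
assert_relationship(relations, guid_b, "relatedTo", guid_a)
```

Keeping the supplied GUID untouched and recording whether a GUID was minted locally is what would let an aggregator report back which identifiers it was given and which it created.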

  • NCE

Jim: First NCE was evaluated/approved at the campus level; additional requests must go to the program officer. We heard back this week that our second NCE request has been recommended for approval. It was held up by a final report (by someone) being outstanding. That report has since been submitted, and we should hear the final decision in a couple of weeks.

Tech

  • Report from Thursday call

Maureen: Discussed Solr schema for Tianhong's data mining uses. Hammered out list of fields that need to be redacted for sensitive data. Determined data flow into and out of index, and placed responsibility for correctly handling the redaction on the component that is extracting data from the index.

Bob: Very urgent to have discussion about redaction with the clients. Do they have different needs than we've assumed? What is the nature of the trust model (ours being trusted for nothing sensitive or trusted for everything sensitive)? Not sure that we know what the security model in their heads is.

Chuck: The clients don't know which things they are doing in cases where the redaction is happening or isn't happening - they don't have a model of the components.
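A minimal sketch of the arrangement described for the Thursday call, where the component extracting data from the index is the one responsible for redaction. The field names in `SENSITIVE_FIELDS` are placeholders (the agreed list is not recorded in these notes), and `redact`, `extract`, and the `trusted` flag are hypothetical names standing in for whatever trust model is eventually settled with the clients.

```python
# Placeholder list; the actual redaction list agreed on the call is not given here.
SENSITIVE_FIELDS = frozenset({"locality", "decimalLatitude", "decimalLongitude"})

def redact(record, sensitive=SENSITIVE_FIELDS, placeholder="[redacted]"):
    """Return a copy of record with the values of sensitive fields replaced."""
    return {k: (placeholder if k in sensitive else v) for k, v in record.items()}

def extract(index_rows, trusted=False):
    """Yield rows read from the index, redacting unless the consumer is trusted."""
    for row in index_rows:
        yield row if trusted else redact(row)

# Hypothetical row as it might be stored, unredacted, in the index.
rows = [{"occurrenceId": "abc", "locality": "secret site", "collector": "A. Gray"}]
public_view = list(extract(rows))
trusted_view = list(extract(rows, trusted=True))
```

Because redaction happens on the way out, the index itself holds the full data, which is why the trust questions Bob raises matter: any extracting component that skips this step exposes everything.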

  • Driver
    • Report Maureen: Status of driver - current annotation processor integration.

Maureen: Have updated the UI so that the widgets seem to do what they should do. Have a stub driver. Response annotation list (response history) looks a little funny, not sure what it is doing.

    • Chuck: State of getting set up to work on annotation processor.

Chuck: Mostly trying to get the build working, and looking at settings. Trying to understand the wiki documentation well enough to put together a script to carry out the deployment. Have an email discussing issues around this. Starting to see some issues: messy field names in the UI, which can be interacted with more effectively through the DOM.

  • Nodes
    • Report Maureen: Status of ingests (taxon/occurrence) on FP2 and FP3

Maureen: David imported data into mongo for SCAN.

David: SCAN occurrence record data is on mongo on FP2. Tianhong should be able to access this data.

Paul: Where is Akka on FP2?

Tianhong: Deploy on FP1 or FP2? Currently looking at deploying on FP1, can't access MongoDB on FP1 (yes, can ssh into FP1). Can't access MongoDB from an outside machine.

Paul: Looks like Mongo on FP1 is bound to 127.0.0.1:27017, not *:27017.

Maureen: Protected by local mongo configuration.

Paul: By design.

Bob: Best approach is to tunnel 27017 over ssh.

Paul: Look at ssh -R.

Bob: I can send some samples.

Paul: Can Tianhong and/or Akka on FP2 access the mongo data on FP2?

Tianhong: Yes, can access the data.

Paul: If we run an Akka workflow twice, does it overwrite the existing collection (where the collection is at the level of a record)?

Tianhong: New copy with the same ID?

Discussion: Put on Friday exactly what the next test step is and where it runs.
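As a rough illustration of the tunnelling approach Bob suggests, the sketch below builds an ssh command that forwards a local port to MongoDB's loopback-only port on the server. It assumes local forwarding (`-L`) started from the outside machine; `-R`, which Paul mentions, forwards in the opposite direction and would apply if the tunnel were opened from FP1 outward. The host name `fp1.example.org` and user `someuser` are placeholders, not the real values.

```python
def tunnel_command(host, user, local_port=27017, remote_port=27017):
    """Build an ssh argv forwarding local_port to 127.0.0.1:remote_port on host.

    -N: run no remote command, only forward ports.
    -L: local port forwarding (listen locally, connect on the remote side).
    """
    forward = f"{local_port}:127.0.0.1:{remote_port}"
    return ["ssh", "-N", "-L", forward, f"{user}@{host}"]

# Placeholder host and user; substitute the real FP1 login.
cmd = tunnel_command("fp1.example.org", "someuser")
print(" ".join(cmd))
```

With the tunnel running (for example via `subprocess.run(cmd)`), a MongoDB client on the outside machine would connect to `localhost:27017`, and MongoDB on the server would see the connection as coming from 127.0.0.1, so the loopback-only binding can stay as it is.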

    • Report David: Morphbank integration status.

David: Have new determinations and annotation view integrated with current Morphbank. Showed to Greg yesterday. He will put us in contact with their deployment manager, who handles the checkout into production from svn (and who we'll need to talk with about deploying the client helper).

  • FP-DataEntry
    • Report Chuck: Duplicate detection integration into Yale data entry application
  • SCAN
    • Report David/Tianhong: Akka integration in FP2
    • Test of Akka workflow with SCAN data.
    • Query for harvested data and analysis results.

David: Not yet integrated in Symbiota - want to test against live results.

  • NEVP
    • Report David: Progress on updating deployment.
    • Akka Workflow for NEVP
  • Analysis
    • Tianhong: Progress on cleaning data with data.
    • Report Bob: Progress on Duplicate Finding data mining.
  • SemanticMediaWiki as FP Client, review of SMW use cases.
  • For Thursday:
    • Chuck's email about documentation and scripting, what do we need to have in place for getting things up and running
    • Figure out what the test needs to be for deciding whether or not to run Akka on fp2.