2013Aug14

From Filtered Push Wiki


Etherpad for meeting notes: http://firuta.huh.harvard.edu:9000/FP-2013Aug14

Reminder: Change of meeting time effective Sept 4: (12-1 Eastern 9-10 Pacific).

Agenda

  • New Project Programmer
  • iDigBio server shutdown Saturday Aug 17th.
  • Request from Bertram to change meeting time (start at Noon Eastern/9 Pacific, effective Aug 21 rather than Sept 4).
  • Documentation updates
  • Kepler
    • Taxon name cleaning - Homonyms
  • NEVP TCN Support
    • UVN site for AnnotationProcessor deployment, visit (week of Aug 19th)
    • Status update (production deployment target date 2013 Aug 15).
    • Duplicate Finding Find_Duplicates. State of old code.
  • SCAN TCN Support.
    • Status update on deployments.
  • MCZbase Driver
    • Update on MCZbase test instance for driver development access:
  1. Can: Access database with SQL developer over ssh X tunnel.
  2. Can: SSH in to see/edit coldfusion code.
  3. Can: Access the specimen search page and run searches, see search results.
  4. Can: Have root on machine, can install tomcat if needed.
  5. Can't: Access to database from SQL developer on desktop (firewall issues).
  6. Can't: Need to create user account (through web form, gets to an error message).
  7. Can't: Don't know how yet: Start/stop coldfusion (need instructions on how).

Non-Tech

  • Kurator
  • Collaborations
    • Specify/Symbiota
  • Burndown.

For Future meetings

  • dwcFP and DarwinCore RDF guide - feedback.
  • Kepler
  • Prospective meetings, development targets.
  • Task Group for Applicability Statement on OA


Reports

  • Paul
    • Editing work on Kurator proposal. Some proofing/ontology work on annotation paper with Bob.
    • Ran through sequence of old to new presentations on project with Maureen, David, Bob, and Chuck on Monday, also Jeff Holmes from the EOL Education group.

Notes

FilteredPush Team meeting 2013 Aug 14

Present: Maureen, David, Chuck, Paul, James, Jim, Tianhong, Patrick, Bob.

Agenda:

  • New Project Programmer

Introducing Chuck.

  • iDigBio server shutdown Saturday Aug 17th.

Note from iDigBio, date scheduled for this Saturday.

  • Request from Bertram to change meeting time (start at Noon Eastern/9 Pacific, effective Aug 21 rather than Sept 4).

James: Conflict once a month, time works otherwise.

Jim H: Okay with time change, but will miss meetings on August 28th and September 4th. No other issues.

Paul: Will make arrangements to reschedule webex with iPlant and room.

  • Documentation updates

David: Created a developer quick start guide covering Eclipse, debugging, etc. (linked off FP Wiki Home [DeveloperHowTo]). Ran through this with Chuck. Updates to deploying annotation processor in Tomcat.

Bob: How do we keep track of what we do to create a given deployment, and what maintenance we need for this environment?

David: The issue with the current documentation is that it covers both general deployment and development deployment, entangling things that developers are interested in (e.g. debugging) with things that involve production deployments. Deployments are being done by checking out to the server, then running command-line invocations to build, configure, and deploy to the container.

Bob: Will construct email to be sent out asking for advice, then send round to team for comments.

David: Also working on refactoring and documenting the new configuration (three parallel domain configuration xml documents).

Bob: All builds with Maven?

David: Correct, except for Kepler, which is built as a jar with Ant; that jar is then included by Maven in a deployment artifact. Builds of the Kepler jar sometimes have issues: in development builds, the Ant build doesn't end up putting all the needed jars on the classpath.

  • Kepler
    • Taxon name cleaning - Homonyms

Tianhong:

  1. Can handle most homonym cases. For some special cases with no regular pattern, the plan is to report “we don’t know how to deal with this”.
  2. Noticed that some homonyms have already been resolved: when the global name resolver is used for the misspelling check, it also returns the current name, which we could use. But when asked for an ambiguous name, it can return both names with no way to tell them apart; trying to deal with that.
  3. Exchanging email with GBIF programmer Markus. He recommended using the GBIF backbone as the very first stop, since it is a comprehensive cross-referenced checklist. Is that a good idea? I’m trying to get familiar with the backbone checklist.
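
The ambiguous-name case in item 2 could be detected roughly as follows. This is only a sketch: the record fields (`canonical`, `authorship`, `family`) and the sample names are placeholders invented for illustration, not the actual output format of the global name resolver or the GBIF backbone.

```python
from collections import defaultdict


def flag_ambiguous(matches):
    """Return names that resolve to more than one distinct (authorship, family) pair."""
    variants = defaultdict(set)
    for match in matches:
        variants[match["canonical"]].add((match["authorship"], match["family"]))
    return {
        name: sorted(pairs)
        for name, pairs in variants.items()
        if len(pairs) > 1
    }


# Placeholder data (hypothetical names, not real taxa).
matches = [
    {"canonical": "Aus bus", "authorship": "Smith", "family": "Xidae"},
    {"canonical": "Aus bus", "authorship": "Jones", "family": "Yaceae"},
    {"canonical": "Cus dus", "authorship": "Lee", "family": "Zidae"},
]

for name, pairs in flag_ambiguous(matches).items():
    print(name, "needs review:", pairs)
```

Names flagged this way would fall into the “we don’t know how to deal with this” bucket unless other record data (for example higher classification) can pick one candidate.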

Paul: Good.

James: Do we understand the GBIF backbone taxonomy well enough to use it?

James: We need to know what they've done already. Have they already interacted with GNI/GNUB?

Paul: Good question for Markus: how is the backbone taxonomy created?

Tianhong: backbone page: http://www.gbif.org/informatics/name-services/using-names-data/taxonomic-backbone/

Paul: Let's ask Markus if there is a technical description of what is happening in the diagram on this page.

  • NEVP TCN Support
    • UVN site for AnnotationProcessor deployment, visit (week of Aug 19th)

Maureen: Got some details on their Specify setup, they've answered some questions, and provided us with a copy of their

Paul: Will you need any xml files?

Maureen: Will ask them.

Paul: Date set?

Maureen: Not yet.

Bob: Patrick and I are in agreement on which AudubonCore document he is referring to. What is the FilteredPush or NEVP entanglement?

Patrick: In relation to FilteredPush, the image xml document isn't being consumed; it is metadata that consumers (herbaria) might be interested in at some point. The document is being sent to iPlant, and they will be ingesting some subset of the metadata elements included in it. They will be taking some metadata elements about

Paul: Ingesting into Symbiota?

Patrick: Could be used to construct URIs in Symbiota that link to the specimen image in iPlant's infrastructure, based on the image filename. Can retrieve iPlant ID for the image (to make the links), based on the filename.
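
A minimal sketch of the filename-to-URI step Patrick describes, assuming a lookup table from image filename to iPlant ID is available. The base URL, the filename, and the ID below are hypothetical placeholders; the real iPlant URL scheme and ID format were not stated in the meeting.

```python
from urllib.parse import quote

# Hypothetical base URL; the real iPlant access point was not given.
IPLANT_BASE = "https://images.example.org/iplant/"


def image_uri(filename, id_lookup):
    """Build a link to the specimen image from its filename, or None if unknown."""
    iplant_id = id_lookup.get(filename)
    if iplant_id is None:
        return None  # no iPlant ID has been assigned to this filename yet
    return IPLANT_BASE + quote(iplant_id)


# Hypothetical filename and ID, for illustration only.
lookup = {"specimen_0001.jpg": "abc-123"}
print(image_uri("specimen_0001.jpg", lookup))
print(image_uri("missing.jpg", lookup))
```

The link depends entirely on the lookup table, so a filename on its own only has meaning where that table is available.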

Paul: Consider adding the filename element to the specimen record file.

Bob: That constructs URIs for the access point based on the local meaning of the filename. When the file is shipped somewhere else, the interpretation of that filename changes.

Bob: There's a use case for serving the AudubonCore document itself as a description (from iPlant and Symbiota), only needing to fix the identifier. Thus all need to go into this with eyes wide open, since it forecloses some kinds of utility for which AudubonCore is designed. AudubonCore is not designed as an exchange format, but it is probably harmless to use it as one. Clients should be able to evaluate fitness for purpose without fetching the image (primary design principle for AudubonCore).

Paul: Let's discuss including filename in occurrence document and rewriting the image document after the iPlant GUIDs are available.

(Maureen: side note -- METS http://www.loc.gov/standards/mets/ is "Metadata Encoding and Transmission Standard", an xml schema in the library world used for moving metadata from producer to consumer)

    • Status update (production deployment target date 2013 Aug 15).

David: FP3 - node, access point, messaging, knowledge, annotation processor, fedora, fuseki, mongodb, mulgara.

Maureen: Harvesting not deployed yet. However, I've harvested all the taxon data into n3 files on my workstation, so we can do something about getting that data loaded into a triplestore. Have split into several pieces, due to several hundred thousand records.
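
Splitting a large harvest into several files, as Maureen describes, can be sketched like this. The batch size, the `taxa` filename prefix, and the function names are assumptions for illustration, not the harvester's actual code.

```python
from itertools import islice


def batches(records, size):
    """Yield successive lists of at most `size` records."""
    iterator = iter(records)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def plan_files(records, size, prefix="taxa"):
    """Pair each batch with a numbered .n3 filename (the names are hypothetical)."""
    return [
        (f"{prefix}_{index:03d}.n3", batch)
        for index, batch in enumerate(batches(records, size), start=1)
    ]


# Ten stand-in records split into files of at most four records each.
for filename, batch in plan_files(range(10), 4):
    print(filename, len(batch))
```

Each resulting file could then be loaded into the triplestore separately, which keeps any single load small.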

Paul: What is the provenance of the NEVP data in the CNH portal?

Patrick: We are aggregating data sets from several institutions. Data has come to the portal through different routes, largely as flat files fitting flat DarwinCore, imported through the Symbiota ingest tool; no attempts have been made yet to correct issues. Data is ingested once; in one case there is a direct connection to Acadia to refresh snapshots.

Paul: Data a rich source for running quality control tools on.

Paul: Possibly harvest data from collections into Mongo, then from mongo into symbiota?

Maureen: Could do, but would need a driver for Symbiota.

Paul: Other case for Symbiota driver is SCAN where institutions are using Symbiota as a primary database.

Bob: What happens when some institutions don't want to share any unredacted locality data?

Maureen: Data provider could provide a flag for sensitivity.

Patrick: This affects duplicate finding; probably few people will refuse to share unredacted locality data for duplicate finding in the network.

Bob: It may be adequate simply to include a reliability specification in the duplicate finding results.

James: Agree with Bob, this is a good requirement. The locality data is only one way (and not the primary way) to detect duplicates. The information likely still has value.
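
One way to read Maureen's sensitivity flag together with Bob's reliability specification is sketched below. The field names, the `locality_redacted` flag, and the scoring rule are assumptions made for illustration; this is not the existing duplicate-finding code.

```python
# Hypothetical comparison fields; the real duplicate finder may use others.
FIELDS = ("collector", "collector_number", "date_collected", "locality")


def compare(a, b):
    """Compare two occurrence records, skipping locality when either is redacted."""
    compared, matched = [], []
    for field in FIELDS:
        redacted = a.get("locality_redacted") or b.get("locality_redacted")
        if field == "locality" and redacted:
            continue  # sensitive locality is not used for matching
        if a.get(field) is None or b.get(field) is None:
            continue
        compared.append(field)
        if a[field] == b[field]:
            matched.append(field)
    score = len(matched) / len(compared) if compared else 0.0
    # Reliability records whether locality contributed to the score.
    reliability = "full" if "locality" in compared else "reduced"
    return {"score": score, "fields_compared": compared, "reliability": reliability}


a = {"collector": "A. Collector", "collector_number": "42",
     "date_collected": "1900-06-01", "locality": "Somewhere"}
b = dict(a, locality=None, locality_redacted=True)
print(compare(a, b))
```

A consumer of the results can then weigh a match found without locality differently from one found with it, which fits James's point that locality is only one of several signals.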

Discussion: Need a term to describe occurrence records for which information is provided but should not be shared with others.

Bob: Zhimin's code is adequately factored so that the algorithms should be reusable.

Paul: Done?

Bob: Still need to write up documentation, might need a couple hours of David's time to dig further.

  • SCAN TCN Support.
    • Status update on deployments.

David: FP2, same state as FP3. Resource/performance issues on FP2, which has less RAM; asking Alex to match FP3.

Maureen: Harvests not done yet. Ready to start on it.

  • MCZbase Driver
    • Update on MCZbase test instance for driver development access:
  1. Can: Access database with SQL developer over ssh X tunnel.
  2. Can: SSH in to see/edit coldfusion code.
  3. Can: Access the specimen search page and run searches, see search results.
  4. Can: Have root on machine, can install tomcat if needed.
  5. Can: Access to database from SQL developer on desktop (firewall issues).
  6. Can't: Need to create user account (through web form, gets to an error message). Appears to be the same issue as with searches over collection objects, which throw a stack trace. Working with Brendan on this; it appears that not all permissions were copied over correctly.
  7. Can: Don't know how yet: Start/stop coldfusion (need instructions on how).

Paul: Also a request out for Working group sessions. Bob has one in for AudubonCore WG, Paul will put in a parallel one for Annotation WG mentioning OA.

Non-Tech

  • Kurator

Submitted.

  • Collaborations
    • Specify/Symbiota
  • Burndown.