2012Aug01

From FilteredPush


Etherpad for meeting notes: http://firuta.huh.harvard.edu:9000/FP-2012Aug01

Agenda

  • Status of AnnotationProcessor/Mapper
  • Status of Client Authentication
  • iDigBio VMs

Non-Tech

  • Collaborations
    • Kepler
    • Specify/Symbiota
    • SCAN TCN
    • NEVP TCN

Reports

  • Maureen:
    • worked on integrating Specify's user-configurable formatting settings into the Specify Driver
    • looked at Marcximil, an open source "bibliographic similarity analysis framework" written in Python; would be useful for duplicate detection across various record types (not necessarily occurrences). Tried it out on botanist records in MARC authority format to get started.
  • Paul
    • Some minor work on Kuration and some testing with users.
  • David:
    • Created XSLT for Fuseki query results. Displays annotations in human readable form (replaces annotation tab code)
    • Implemented a PHP client for Fuseki that expresses interests as queries and applies the XSLT corresponding to each interest query to its results (for use in FP-Lite).
    • Worked on a simple FP-Lite web interface to SPARQLpush, PubSubHubbub, and annotation generation, for use as a lightweight annotation system client.
    • Constructed a general SPARQL query for expressing canned interests by key-value pair (such as collectionCode=A). A SPARQL FILTER statement can be parameterized to find matches on these key-value pairs.
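
Illustration of the parameterized FILTER idea in David's last item: a minimal Python sketch, not the query actually used in FP-Lite. The function names, the variable names (?subject, ?property, ?value), and the choice to match a key against the end of the property IRI with STRENDS are all assumptions made for the example.

```python
def escape_literal(value: str) -> str:
    """Escape backslashes and double quotes for a double-quoted SPARQL literal."""
    return value.replace("\\", "\\\\").replace('"', '\\"')


def build_interest_query(key: str, value: str) -> str:
    """Build a SPARQL query for one canned interest such as collectionCode=A.

    Hypothetical shape: match any triple whose property IRI ends with `key`
    and whose object, taken as a string, equals `value`.
    """
    return (
        "SELECT ?subject ?property ?value\n"
        "WHERE {\n"
        "  ?subject ?property ?value .\n"
        f'  FILTER (STRENDS(STR(?property), "{escape_literal(key)}")\n'
        f'          && STR(?value) = "{escape_literal(value)}")\n'
        "}"
    )


if __name__ == "__main__":
    print(build_interest_query("collectionCode", "A"))
```

Matching on the end of the property IRI is loose (a key could also match a longer term name that ends the same way); a real interest query would more likely bind the full property IRI.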

Notes

Filtered Push Team Meeting 2012 Aug 1

Present: Lei, David, Paul, Maureen, Heather.

Agenda:

  • Status of AnnotationProcessor/Mapper

Maureen: Finished the authentication part in the mapper. Working on moving user configuration in Specify into the driver.

Maureen: Looking at software for examining duplicates by fuzzy matching in XML data (MARC records by default). Experimenting with HUH botanist data converted into MARC.

Paul: Maureen and Lei to look at possible integration as a Kepler workflow actor (with an eye to Kepler as an embedded analytical capability).

James: Levenshtein distance?

Maureen: Included in Marcximil, which also includes other text-mining distance metrics.
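
For reference, the Levenshtein distance counts the minimum number of single-character insertions, deletions, and substitutions needed to turn one string into another. The sketch below is a plain dynamic-programming version in Python, written for illustration only; it is not Marcximil's implementation, and the function name is an assumption.

```python
def levenshtein(a: str, b: str) -> int:
    """Return the edit distance between a and b (insert, delete, substitute)."""
    # Keep the shorter string in the inner loop so the row stays small.
    if len(a) < len(b):
        a, b = b, a
    # previous[j] is the distance between the processed prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to the empty prefix of b
        for j, char_b in enumerate(b, start=1):
            substitution_cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,                      # delete char_a
                current[j - 1] + 1,                   # insert char_b
                previous[j - 1] + substitution_cost,  # substitute or match
            ))
        previous = current
    return previous[-1]


if __name__ == "__main__":
    print(levenshtein("Gray", "Grey"))
```

For duplicate detection the raw distance is usually compared against a threshold or divided by the longer string's length, so that short and long names are treated comparably.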

Lei: Working on the web UI and how to make the UI look more like a web mail client. Looks like simply grabbing an open source web mail client and modifying it would be much more work than simply making the current UI have mail-like behaviors. Also investigating, with David, how to retrieve messages/interests.

David: Messaging system. Working with Lei on message API issues. The core issue is finding only unread messages rather than all messages. There is also a disconnect between the pull from the client and the push in the messaging system. Looking at ways to integrate them.

Lei: Retrieving new/unread messages: does unread equal not yet annotated with a response? It doesn't seem exactly the same. The annotation processor maintains state with a timestamp, but the messaging API currently doesn't support getting messages since that timestamp.

Maureen: Unread should be just a concern of the client.

Lei: Concur, the network doesn't keep track of users on the client side and what they have seen.

Maureen: Client could persist just the identifiers of messages, and retrieve details when desired.
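
A minimal sketch of that idea in Python, assuming the client can obtain a list of message identifiers from the network and keeps its own record of which ones the user has seen. The class name, method names, and JSON file storage are hypothetical and not part of the messaging API.

```python
import json
from pathlib import Path


class SeenMessageStore:
    """Client-side record of message identifiers the user has already read."""

    def __init__(self, path):
        self.path = Path(path)
        self.seen = set()
        # Reload identifiers persisted by an earlier session, if any.
        if self.path.exists():
            self.seen = set(json.loads(self.path.read_text()))

    def unread(self, message_ids):
        """Return the identifiers not yet marked as read, in the given order."""
        return [m for m in message_ids if m not in self.seen]

    def mark_read(self, message_id):
        """Record one identifier as read and persist the whole set."""
        self.seen.add(message_id)
        self.path.write_text(json.dumps(sorted(self.seen)))


if __name__ == "__main__":
    store = SeenMessageStore("seen_messages.json")
    print(store.unread(["msg-1", "msg-2"]))
```

Only identifiers are stored; message details would still be fetched from the network when the user opens a message.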

Bob notes: Re unread: note that Gmail re-marks a conversation as unread if anything in it is "new".

Bob: The Gmail client allows the user full control over what is marked as read or not, and also supports user-provided labels. Granularity is that of a "thread" or "conversation". Most mail clients are similar to this and perhaps so are RSS clients. BTW, there seems to be a lot of RSS->SMTP gateway software around, so maybe RSS clients are a good start. There are even free cloud-based gateways around. Linux has a local one, but it needs access to an SMTP service, which may be a firewall issue for many. (Maybe not a problem for iDigBio VMs though!)

Maureen: Kuali Rice workflow relevant.

TODO: Maureen to demonstrate next week. Maureen to discuss software components involved with Lei to see what would fit in annotation processor.

  • Status of Client Authentication

David: Messaging system. SPARQL push with authentication (XML DSig) is working for FP-Lite; working on medium. Also have a set of PHP libraries for using XML digital signatures, and a command line utility for managing the keystore on the server side (adding client certificates, revoking authorization, etc.).

  • iDigBio VMs

Nothing back yet.

Non-Tech

  • Collaborations
    • Kepler

Doing some user testing on Kuration for taxon names. Getting some good feedback and test data cases (particularly tautonyms, but also other trinomials).

TODO: Paul and James to talk about text on phone - 9:30 am Friday.

Paul: Question for Bertram will be what state he wants the Kuration code in for the proposal submission.

    • Specify/Symbiota

Paul: Nothing heard.

    • SCAN TCN

Meeting in a couple of weeks. Will need to do demonstration.

    • NEVP TCN

May be a need for terms beyond Darwin Core as we are using it now (infraspecific rank, storage location); look to taxonomic and curatorial extensions to Darwin Core.

Bob: I'm studying Dave Thau's thesis for the ETC meeting in Sept. He remarks that CleanTax eschews instances, e.g. specimens, so some care may be necessary if you want tractable reasoning for taxonomy reconciliations. I'll pay attention to whether this is really an issue as I get into later chapters. I might be putting too much into it, but it won't surprise me if care is needed. Tractable versions of OWL, for example, do not allow classes to be instances.