Task List for the Prototype

From Filtered Push Wiki

Improve the demo

Here is a list of problems or inadequacies that we have identified with the demo, along with their currently proposed remedies.

  • Filtered Push queries launched from the Specify client are too slow: each request takes a few seconds to return results. Response time needs to be about one second or less.

  • We have only one type of node on the network: Specify databases with the same schema. We need to have at least one different data source.
    • Paul has a data set from Brooklyn. It is a set of tab-delimited files representing their database. We should make this data set available to the prototype.
      • The prototype currently requires a JDBC connection. Someone will have to make the data into a JDBC-accessible data source and open the necessary ports to access it. The data source must be able to respond to a very specific SQL query: all fields must be present in a table named 'object'. Currently those fields are id, barcode, collectorNumber, collector, taxon, locality, latitude, and longitude. We need a non-trivial implementation of the Mapper and DataProvider components to generalize the current schema-specific and JDBC-specific implementation.
      • Each node in the demonstration network needs to have the configuration info for the new data source. Someone should put the new hadoop-site.xml into our Subversion repository, and the API jar also needs to be adjusted so that its hadoop-site.xml includes the new data source in its scope.
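
One way the tab-delimited data could be staged is sketched below. It loads rows into a table named 'object' with the fields listed above and runs a lookup against it. This is only an illustration under assumptions: SQLite stands in for whatever JDBC-accessible database is eventually chosen, the column types are guesses, the file is assumed to have a header row using the same field names, and the function names (create_object_table, load_objects, find_by_barcode) are hypothetical and not part of the prototype.

```python
import csv
import sqlite3

# Field names come from the task list; everything else here is an assumption.
FIELDS = ["id", "barcode", "collectorNumber", "collector",
          "taxon", "locality", "latitude", "longitude"]


def create_object_table(conn):
    """Create the single 'object' table (column types are assumed)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS object ("
        "id INTEGER, barcode TEXT, collectorNumber TEXT, collector TEXT, "
        "taxon TEXT, locality TEXT, latitude REAL, longitude REAL)"
    )


def load_objects(conn, tsv_file):
    """Insert rows from an open tab-delimited file with a header row.

    Empty cells are stored as NULL. Returns the number of rows inserted.
    """
    reader = csv.DictReader(tsv_file, delimiter="\t")
    placeholders = ", ".join("?" for _ in FIELDS)
    sql = f"INSERT INTO object ({', '.join(FIELDS)}) VALUES ({placeholders})"
    rows = [tuple(row.get(name) or None for name in FIELDS) for row in reader]
    conn.executemany(sql, rows)
    conn.commit()
    return len(rows)


def find_by_barcode(conn, barcode):
    """Return matching records as dicts keyed by field name."""
    cur = conn.execute(
        f"SELECT {', '.join(FIELDS)} FROM object WHERE barcode = ?",
        (barcode,),
    )
    return [dict(zip(FIELDS, row)) for row in cur.fetchall()]
```

To try it, open a connection with sqlite3.connect(":memory:"), call create_object_table, and pass an open file handle to load_objects. A real deployment would still need a server-based database that the prototype can reach over JDBC.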

  • The Specify client GUI is a little rough around the edges and is "feature-poor." It needs to be cleaned up, and some features added. Maureen needs to fix the following:
    • The results display is too wide for 800 x 600 screens.
    • The results display contains empty space where a map icon used to be in a previous life.
    • There is currently no way for the user to configure which fields should be searched.
    • There is currently no way for the user to configure the mapping.

  • We can only demonstrate one use case at the moment. We can only find duplicates, which demonstrates "filtered." We should add a demonstration for another use case, preferably one demonstrating "push."
    • The Specify client needs a GUI to handle incoming annotations.
    • The Specify client might also need a GUI for managing "subscriptions."
    • Another client needs to implement injection of FP annotation messages through the message injection API. (We also need to decouple the message injection API from the Java message class and allow injection of an XML message document, so that non-Java clients can use the API.)
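
To make the XML message idea concrete, here is a rough sketch of a non-Java client building an annotation document and reading it back. The message schema is not defined in this task list, so every element and attribute name below (annotation, target, barcode, assertion, field, annotator) is a hypothetical placeholder, as are the two function names. The sketch only shows that a plain XML document can carry the message without a shared Java class.

```python
import xml.etree.ElementTree as ET


def build_annotation_message(barcode, field, proposed_value, annotator):
    """Serialize an annotation as an XML string.

    All element and attribute names are hypothetical placeholders.
    """
    root = ET.Element("annotation")
    target = ET.SubElement(root, "target")
    ET.SubElement(target, "barcode").text = barcode
    assertion = ET.SubElement(root, "assertion", {"field": field})
    assertion.text = proposed_value
    ET.SubElement(root, "annotator").text = annotator
    return ET.tostring(root, encoding="unicode")


def parse_annotation_message(xml_text):
    """Parse a document produced by build_annotation_message into a dict."""
    root = ET.fromstring(xml_text)
    assertion = root.find("assertion")
    return {
        "barcode": root.findtext("target/barcode"),
        "field": assertion.get("field"),
        "proposed_value": assertion.text,
        "annotator": root.findtext("annotator"),
    }
```

How such a document would be handed to the message injection API (for example over a socket or HTTP) is left open here, since that depends on how the API is reworked.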

  • Our "find duplicates" query is very rudimentary. It detects duplicates with exact matches on fields. We should add a fuzzy matching algorithm in order to return a larger and more inclusive result set.
    • We need to give Zhimin specifications for the fuzzy matching algorithm.
    • Zhimin needs to add the fuzzy matching to the API.
    • We need to decide if any of the fuzziness needs client GUI display.
    • Maureen needs to implement the client GUI specifications for fuzzy results, if any.
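
As a starting point for the specification discussion, the sketch below shows one possible form of fuzzy duplicate detection: score each field pair with a string similarity ratio, average the scores, and keep candidates above a threshold. None of this is the agreed algorithm. The choice of Python's difflib.SequenceMatcher, the compared fields, the equal weighting, and the 0.85 threshold are all assumptions made for illustration.

```python
from difflib import SequenceMatcher

# Assumed subset of fields to compare; the real list is still to be specified.
MATCH_FIELDS = ["collector", "collectorNumber", "taxon", "locality"]


def normalize(value):
    """Lowercase and collapse whitespace; missing values become ''."""
    return " ".join(str(value or "").lower().split())


def field_similarity(a, b):
    """Similarity in [0, 1]; two empty values count as a full match."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def record_similarity(rec_a, rec_b, fields=MATCH_FIELDS):
    """Unweighted mean of the per-field similarities."""
    scores = [field_similarity(rec_a.get(f), rec_b.get(f)) for f in fields]
    return sum(scores) / len(scores)


def find_fuzzy_duplicates(query, candidates, threshold=0.85):
    """Return (score, record) pairs at or above threshold, best first."""
    scored = [(record_similarity(query, c), c) for c in candidates]
    hits = [(score, rec) for score, rec in scored if score >= threshold]
    return sorted(hits, key=lambda pair: pair[0], reverse=True)
```

Returning the score alongside each record keeps the GUI question open: the client could display it, use it only for ordering, or ignore it.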

  • It is difficult to install the demo on a new platform. We need a more streamlined build and installation process.
    • We need to sort out the configuration issues (multiple hadoop-site.xml files).
    • We need to tweak the Ant build file for the API code to conform with the new configuration plan.
    • Maureen should figure out how to do cross-project builds so that she can get the results of the above step into a Specify distribution.
    • We should create a distributable hadoop.
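
While the configuration plan is being sorted out, a small check like the one below could help spot where the multiple hadoop-site.xml files disagree. It relies on the standard layout of Hadoop configuration files (a configuration element containing property elements with name and value children). The function names and the idea of labeling each file are assumptions for this sketch, not part of the existing build.

```python
import xml.etree.ElementTree as ET


def read_properties(xml_text):
    """Return {property name: value} from the text of one hadoop-site.xml."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        if name is not None:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def find_conflicts(named_configs):
    """Report properties whose values differ between files.

    named_configs maps a label (for example a file path) to that file's
    XML text. The result maps each conflicting property name to a
    {label: value} dict. Properties set in only one file are not reported.
    """
    seen = {}
    for label, xml_text in named_configs.items():
        for name, value in read_properties(xml_text).items():
            seen.setdefault(name, {})[label] = value
    return {name: by_label for name, by_label in seen.items()
            if len(set(by_label.values())) > 1}
```

An empty result means the files agree on every property they share, which is a reasonable precondition for collapsing them into one file in the repository.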

  • It is difficult to run the demo. It requires a large number of open ports, so running it from within a network demands an unusual amount of intervention and trust from that network's sysadmins.