2009Aug27

From Filtered Push Wiki


2009Sep17


User: Bob Morris | User: Zhimin Wang | User: Maureen Kelly | User: Paul J. Morris



Agenda

  • Admin
    • annual report
    • UMB extension budget
  • Technical
    • SMW; finish UMB extension budget
    • IPT
    • AGU
    • MTree report from Zhimin

Bob reports that he has a better understanding of how ontologies are constructed, and that Semantic MediaWiki emphasizes categories as classes. What is still unclear is how to introduce new categories. How can we represent "is duplicate of"? He will start on that task next week.

Donna is going to install SMW on her own machine. Hopefully if all goes well she will have an example of the new skin up and working. She needs to work with Anne Marie on firewall issues.

Bob, James, and Paul have been in contact with EDIT and know that we have to track its upcoming work.

Maureen has nothing to report. She's been working on Specify migration things. She has a lot of mocked-up user interface behaviors to implement. She should have a few done by next week.

Target deadline of October 15th for David's project of FP-Specify-on-a-stick. It would be good to be able to do a live demo at TDWG, especially with hosts on different computers.

  • One problem is that Hadoop needs lots of ports plus ssh for each node.
    • Paul suggests a solution involving creating our own network on a wireless access point.
    • Who is in charge of setting up the network infrastructure? Paul says it will be easy.
    • Bob will research hardware requirements, and we will have Anne Marie pick it up.
  • Another problem is that hadoop is sloooooow.
    • Someone will have to solve the classloading overhead problem. We'll have to ask Zhimin if he has a good way to address this. Classloading might not be the problem so much as being able to attach to a currently running JVM.
      • Zhimin reports that we're using Cloudbase now and that performance is still slow but not as slow.
      • Can we find another map-reduce implementation that is better for repeated requests? Zhimin will take a look.
      • If not, can we do some kind of wrapping (something like Tomcat) that keeps a JVM open with classes loaded and intercepts the Hadoop requests? There is some point at which the JVM is invoked, and we need to find it and change it.
  • We should also consider the case in which we have only one of us available, and we want to be able to demonstrate multiple hosts.
    • Multiple sticks on one machine? Each would need a slightly different configuration, because each would need its own ports, unless only one is a master node. In that case only the master would need its own separate configuration.
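
The per-stick port question above can be sketched as a small planning script. This is only an illustration: the port names and base numbers are assumptions chosen for the example, not settings taken from our configuration, and `worker_ports` and `plan_cluster` are hypothetical helpers.

```python
# Hypothetical sketch: give each worker "stick" on one machine its own ports.
# Port names and base values are illustrative assumptions, not project settings.
BASE_WORKER_PORTS = {
    "datanode_data": 50010,
    "datanode_ipc": 50020,
    "datanode_http": 50075,
    "tasktracker_http": 50060,
}


def worker_ports(index, stride=100):
    """Return the port map for worker number `index` (0-based)."""
    if index < 0:
        raise ValueError("index must be non-negative")
    return {name: port + index * stride
            for name, port in BASE_WORKER_PORTS.items()}


def plan_cluster(n_workers, stride=100):
    """Build port maps for all workers and check that none collide."""
    plans = [worker_ports(i, stride) for i in range(n_workers)]
    used = [port for plan in plans for port in plan.values()]
    if len(used) != len(set(used)):
        raise ValueError("port collision; choose a larger stride")
    return plans
```

Each resulting map would still have to be written into that stick's own configuration files by hand or by a separate script; the sketch only shows that a fixed offset per stick is enough to keep the ports distinct.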

IPT: Paul has been working on how to integrate FP with IPT, but has nothing to demonstrate yet.

AGU: Paul has created the framework for the abstract; it just needs some text. Due date is September 3. Meeting is in December.

Zhimin's M-Tree report: He's got the C++ code to run and has customized the implementation for our string distance algorithm. The software has some problems: the debugger doesn't run well on it.
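
For anyone following the M-Tree work: an M-Tree can only prune correctly if the distance function is a true metric, in particular if it satisfies the triangle inequality. The sketch below uses plain Levenshtein edit distance as a stand-in, which is an assumption; the minutes do not say which string distance we use. It also adds a brute-force check, `satisfies_triangle`, a hypothetical helper that could be pointed at any candidate distance.

```python
def levenshtein(a, b):
    """Edit distance between a and b (insert, delete, substitute each cost 1)."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute, or match at cost 0
            ))
        previous = current
    return previous[-1]


def satisfies_triangle(distance, items):
    """Check d(x, z) <= d(x, y) + d(y, z) for every triple drawn from items."""
    return all(
        distance(x, z) <= distance(x, y) + distance(y, z)
        for x in items for y in items for z in items
    )
```

Running `satisfies_triangle` over a sample of real names is cubic in the sample size, so it is only practical as a spot check on a few dozen strings.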

We need to have a technical discussion about performance factors: which parts are Hadoop, which parts are indexing, and which parts are the fuzzy match metric? We need to make sure we separate fast messaging from Hadoop.
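
As input to that discussion, one way to separate the factors is to time each stage on its own. The following is a minimal sketch, assuming the stages can be wrapped individually; `StageTimer` and any stage names passed to it are hypothetical, not existing FP code.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class StageTimer:
    """Accumulate wall-clock time per named stage."""

    def __init__(self):
        self.totals = defaultdict(float)
        self.counts = defaultdict(int)

    @contextmanager
    def stage(self, name):
        """Time the body of a `with` block and add it to the stage's total."""
        start = time.perf_counter()
        try:
            yield
        finally:
            self.totals[name] += time.perf_counter() - start
            self.counts[name] += 1

    def report(self):
        """Return (stage, total_seconds, calls) rows, slowest stage first."""
        return sorted(
            ((name, self.totals[name], self.counts[name]) for name in self.totals),
            key=lambda row: row[1],
            reverse=True,
        )
```

Wrapping, say, job startup, index lookup, and the fuzzy match in separate `with timer.stage("...")` blocks would give a first rough breakdown to bring to the meeting.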

Agenda for 2009Sep03

Next week is a technical problems meeting. Everyone is invited, but attendance is not required.

  • Admin
    • annual report
    • UMB extension budget
  • Donna
    • Bob would like to see a template "NextWeek" that calculates the date for the next week's meeting.
  • Bob will formulate a question regarding discovering ontological properties of data sets using simple data mining. What kind of properties can be deduced, given many examples of some general concept of a "record"? (Paul has some PHP code that queries the Flickr API that may be useful in the investigation.) This is about reducing the barriers to institutions becoming FP nodes: automatic ontology creation for institution-specific versions of the general concept of the FP specimen record.
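
The date arithmetic behind the "NextWeek" template Bob asked for can be sketched as follows. This is Python rather than wiki template syntax, so it only illustrates the calculation. It assumes meeting pages keep the `2009Aug27` naming pattern and that meetings are exactly seven days apart; the function names are hypothetical.

```python
from datetime import date, timedelta

# English month abbreviations, listed explicitly so the result
# does not depend on the machine's locale.
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def parse_page_name(name):
    """Parse a page name such as '2009Aug27' into a date."""
    year = int(name[:4])
    month = MONTHS.index(name[4:7]) + 1
    day = int(name[7:])
    return date(year, month, day)


def format_page_name(d):
    """Format a date as a page name such as '2009Sep03'."""
    return f"{d.year}{MONTHS[d.month - 1]}{d.day:02d}"


def next_week(name):
    """Return the page name for the meeting one week after `name`."""
    return format_page_name(parse_page_name(name) + timedelta(days=7))
```

For this page, `next_week("2009Aug27")` gives `2009Sep03`, which matches the agenda heading above.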