FP-Lite deployment for SCAN

Overview diagram of annotation generation by a client of a light weight annotation system using the client helper tools.


Prerequisites

The following instructions were tested on a clean install of Debian 6.0.5 "squeeze" with the following packages installed via apt-get install (this list also serves as the list of prerequisites):

  • General Deployment:
    • sun-java6-jdk
    • tomcat6
    • apache2
    • libapache2-mod-php5
    • php5-cli
    • php5-common
    • php5-cgi
    • mysql-server
    • mysql-client
  • General PHP Extensions:
    • php5-mysql (for filteredpush branch of symbiota and SparqlPuSH)
    • php5-curl (for FP-PHP-Library)
    • php5-xsl
  • Development/deployment utils:
    • ant
    • subversion
    • git
    • maven2
  • SparqlPuSH and pubsubhubbub (mostly for pubsubhubbub, especially python):
    • libcurl4-openssl-dev
    • libsqlite3-dev
    • libssl-dev
    • python
    • python-django
    • sqlite3
    • ssl-cert
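On a machine that is not a clean install, it can help to see which of the packages above are still missing before running apt-get install. The following is a minimal sketch and not part of the FilteredPush tooling: the function name missing_packages is made up for illustration, and it assumes that dpkg -s prints "Status: install ok installed" for packages that are present.

```shell
# Print each package named in the arguments that dpkg does not report as
# fully installed. missing_packages is a hypothetical helper name.
missing_packages() {
    for pkg in "$@"; do
        dpkg -s "$pkg" 2>/dev/null | grep -q '^Status: install ok installed' \
            || echo "$pkg"
    done
}

# Example (commented out so that sourcing this file has no side effects):
#   sudo apt-get install $(missing_packages tomcat6 apache2 php5-curl ant maven2)
```

If the function prints nothing, every package passed to it is already installed.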

Fuseki

Additional documentation can be found at: http://jena.apache.org/documentation/serving_data/index.html

1) Download and extract the latest version of the Jena Fuseki SPARQL endpoint from http://www.apache.org/dist/jena/binaries/ (look for the latest version of jena-fuseki-0.x.x-distribution.tar.gz). Place the Fuseki directory wherever you would like it installed (e.g. /usr/share/fuseki/jena-fuseki-0.2.5):

tar -xzvf jena-fuseki-0.2.5-distribution.tar.gz
sudo mkdir /usr/share/fuseki
sudo mv jena-fuseki-0.2.5 /usr/share/fuseki

2) Download the example assembler config (using TDB as the triple store) from the sourceforge (http://sourceforge.net/projects/filteredpush/files/Release_1/misc/tdb-assembler.ttl) and place it in the Fuseki directory you just extracted to:

wget http://sourceforge.net/projects/filteredpush/files/Release_1/misc/tdb-assembler.ttl
sudo cp tdb-assembler.ttl /usr/share/fuseki/jena-fuseki-0.2.5/

3) Download the example fuseki startup script from sourceforge (http://sourceforge.net/projects/filteredpush/files/Release_1/misc/fuseki) and, if necessary, edit the file so that FUSEKI_HOME is set to the directory you installed fuseki to. Also make it executable via chmod and use update-rc.d to make it a startup script.

wget http://sourceforge.net/projects/filteredpush/files/Release_1/misc/fuseki
sudo cp fuseki /etc/init.d/
sudo chmod 755 /etc/init.d/fuseki
sudo update-rc.d fuseki defaults
  • NOTE: On Debian I received the following warning: insserv: warning: script 'fuseki' missing LSB tags and overrides. If you want to add the LSB tags that the warning refers to, you can find more info here: http://wiki.debian.org/LSBInitScripts/
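For reference, an LSB header is a block of specially formatted comments near the top of the init script. The block below is a sketch of what such a header could look like for /etc/init.d/fuseki. The tag names are the standard LSB ones, but the values (dependencies, runlevels, description) are assumptions and should be adjusted to your system.

```shell
### BEGIN INIT INFO
# Provides:          fuseki
# Required-Start:    $remote_fs $network
# Required-Stop:     $remote_fs $network
# Default-Start:     2 3 4 5
# Default-Stop:      0 1 6
# Short-Description: Jena Fuseki SPARQL server
### END INIT INFO
```

After adding the header, running sudo update-rc.d fuseki defaults again should no longer print the warning.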

4) Start fuseki and visit http://localhost:3030/ in your browser:

/etc/init.d/fuseki start
  • Additional info on using Fuseki: If you need to execute SPARQL queries on Fuseki manually you can do so from the web interface. First select the Control Panel link on the main page (under Server Management) and select the "/AnnotationStore" dataset (the dataset is supplied as an argument when the startup script in /etc/init.d starts fuseki and the files are stored in the directory configured in the tdb-assembler.ttl file in FUSEKI_HOME). On the page that follows, you can launch sparql queries, updates or upload rdf/xml. Some useful queries are listed below.
  • in the query textarea:
    • 1) SELECT * {?s ?p ?o} (select all triples currently in the default un-named graph)
    • 2) SELECT * WHERE { GRAPH ?g {?s ?p ?o} } LIMIT 10 (select up to 10 triples from the named graphs; triples in the default graph are not matched)
  • in the update textarea:
    • 1) CLEAR ALL (clear all triples from the triplestore)
    • 2) CLEAR GRAPH <name of graph here> (clear all triples in any given named graph)
    • 3) LOAD <http://yourhost/data/somefile.rdf> (load all the triples from the rdf file hosted at the address specified)
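The same queries and updates can be sent from the command line instead of the web form. The sketch below wraps curl in two helper functions. The function names and the FUSEKI_DATASET_URL variable are made up for illustration, and the /update service name is an assumption (the /query service name appears later in this document). The actual service names depend on your tdb-assembler.ttl.

```shell
# Base URL of the dataset; override by exporting FUSEKI_DATASET_URL.
FUSEKI_DATASET_URL="${FUSEKI_DATASET_URL:-http://localhost:3030/AnnotationStore}"

# Run a SPARQL query (first argument) and print the result as SPARQL XML.
fuseki_query() {
    curl -s -G "$FUSEKI_DATASET_URL/query" \
         -H 'Accept: application/sparql-results+xml' \
         --data-urlencode "query=$1"
}

# Send a SPARQL Update request (first argument), e.g. CLEAR ALL.
fuseki_update() {
    curl -s "$FUSEKI_DATASET_URL/update" --data-urlencode "update=$1"
}

# Examples (commented out so that sourcing this file sends nothing):
#   fuseki_query 'SELECT * {?s ?p ?o}'
#   fuseki_update 'CLEAR ALL'
```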

5) Additional useful queries will be presented later on in the documentation.

  • IMPORTANT NOTE about formatting of query results: For the queries, selecting XML from the Output dropdown on the form will probably give the most human readable results. Additionally, you may specify an xslt file for styling the xml result for browser viewing (a value of /xml-to-html.xsl is supplied by default, this file is located in your FUSEKI_HOME/pages directory). This functionality is relevant to the annotation viewer available as part of the php libraries for use with FP clients (see [PHP_Client_Libraries_and_FPConfig]).

6) It may also be useful to enable logging to a file in the log4j.properties for fuseki. If you wish to do this now, uncomment the following lines in the log4j.properties file (located in your FUSEKI_HOME):

log4j.rootLogger=INFO, FusekiFileLog
...
log4j.appender.FusekiFileLog=org.apache.log4j.DailyRollingFileAppender
log4j.appender.FusekiFileLog.DatePattern='.'yyyy-MM-dd
log4j.appender.FusekiFileLog.File=fuseki-log
log4j.appender.FusekiFileLog.layout=org.apache.log4j.PatternLayout
log4j.appender.FusekiFileLog.layout.ConversionPattern=%d{HH:mm:ss} %-5p %-20c{1} :: %m%n

Also, I prefer to put the log file in a sub directory "logs" in FUSEKI_HOME: mkdir logs and change the value of log4j.appender.FusekiFileLog.File to logs/fuseki-log
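The log directory change can be scripted. The following is a sketch only: use_logs_subdir is a hypothetical helper name, and it assumes the FusekiFileLog lines in log4j.properties have already been uncommented and that Fuseki is started from FUSEKI_HOME, so that the relative path logs/fuseki-log resolves there.

```shell
# Create FUSEKI_HOME/logs and point the FusekiFileLog appender at it.
# The argument is the Fuseki install directory (FUSEKI_HOME).
use_logs_subdir() {
    home="$1"
    mkdir -p "$home/logs"
    # Rewrite only the File= line; all other lines pass through unchanged.
    sed 's|^log4j\.appender\.FusekiFileLog\.File=.*|log4j.appender.FusekiFileLog.File=logs/fuseki-log|' \
        "$home/log4j.properties" > "$home/log4j.properties.new" &&
        mv "$home/log4j.properties.new" "$home/log4j.properties"
}

# Example (commented out):
#   use_logs_subdir /usr/share/fuseki/jena-fuseki-0.2.5
```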

Pubsubhubbub

Additional information hosted at: https://code.google.com/p/pubsubhubbub/

1) Fetch and install Google App Engine SDK. The pubsubhubbub sample hub runs on the Google App Engine framework. Download Google App Engine SDK for Python: http://code.google.com/appengine/downloads.html and unpack it in ~/google.

mkdir google
cd google
wget http://googleappengine.googlecode.com/files/google_appengine_1.7.1.zip
unzip google_appengine_1.7.1.zip

2) Checkout the application from the pubsubhubbub google code page:

svn checkout http://pubsubhubbub.googlecode.com/svn/trunk/ pubsubhubbub

3) From the root of your Google App Engine directory (i.e. ~/google/google_appengine) run the command listed below to start the hub. The first time you start Google App Engine you may also receive the following prompt, to which I usually reply with Y: Allow dev_appserver to check for updates on startup? (Y/n): Y

Also note that because we are starting the hub with the argument --address=localhost, the hub will not be accessible from outside of localhost, so be sure to install SparqlPuSH (in the next section) on the same host as the hub.

./dev_appserver.py ~/google/pubsubhubbub/hub/ --port=8000 --debug --address=localhost

4) Type http://localhost:8000 in your browser's address bar and you should see a welcome page.

  • NOTE: The following warning upon first starting the hub can be ignored: WARNING 2012-09-07 17:18:08,900 datastore_file_stub.py:518] Could not read datastore data from /tmp/dev_appserver.datastore

The hub/googleappengine will store information about registered feeds and the endpoints to notify in a file at the following location by default: /tmp/dev_appserver.datastore. If you delete this file, the hub will recreate it upon server restart. You can use this to reset the hub and clear all data.
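A reset along those lines can be wrapped in a small helper. This is a sketch, with reset_hub_datastore as a made-up name. It only removes the datastore file, so stop dev_appserver.py before calling it and start the hub again afterwards.

```shell
# Remove the hub's datastore file so the hub starts empty on next launch.
# Optional argument: path to the datastore (defaults to the location above).
reset_hub_datastore() {
    datastore="${1:-/tmp/dev_appserver.datastore}"
    if [ -f "$datastore" ]; then
        rm -- "$datastore" && echo "removed $datastore"
    else
        echo "nothing to remove at $datastore"
    fi
}

# Example (commented out):
#   reset_hub_datastore
```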

SparqlPuSH

NOTE: Older (but perhaps more detailed) documentation of SparqlPuSH for development purposes can be found at SparqlPuSH Install for reference purposes.

Checkout projects from svn:

svn co https://filteredpush.svn.sourceforge.net/svnroot/filteredpush/FP-SPARQLPuSH/spqlpsh-server
svn co https://filteredpush.svn.sourceforge.net/svnroot/filteredpush/FP-SPARQLPuSH/spqlpsh-client


spqlpsh-server

1) Put the spqlpsh-server PHP includes on your php include path (e.g. /usr/share/php/). These are simplepie.inc for parsing rss feeds and xmlseclibs.php for xml digital signature authentication (annotation rdf/xml is digitally signed):

cp spqlpsh-server/lib/includes/* /usr/share/php/

2) Edit build.properties to configure the deployment. Replace /var/www in the server.home property value with your document root, replace sparql.endpoint with the fuseki url or other endpoint url, and replace the value for pubsubhubbub.hub with the hub host. The urls for sparql.endpoint and pubsubhubbub.hub must end in a trailing slash (see the example build.properties provided); server.home must not. The properties prefixed with db are specific to the ARC2 triplestore. You only need to configure these if you are using ARC2 instead of fuseki. Otherwise you can leave the defaults.
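The trailing slash rules are easy to get wrong, so a quick check before running ant can save a failed deploy. The sketch below is not part of the spqlpsh-server project. The helper names prop and check_build_properties are hypothetical, and the check assumes simple key=value lines in build.properties.

```shell
# Print the value of key $1 from the properties file $2 (last match wins).
prop() {
    sed -n "s/^$1[[:space:]]*=[[:space:]]*//p" "$2" | tail -n 1
}

# Report violations of the trailing slash rules; returns non-zero if any.
check_build_properties() {
    file="$1"
    rc=0
    for key in sparql.endpoint pubsubhubbub.hub; do
        case "$(prop "$key" "$file")" in
            */) ;;
            *) echo "$key must end with a trailing slash"; rc=1 ;;
        esac
    done
    case "$(prop server.home "$file")" in
        */) echo "server.home must not end with a trailing slash"; rc=1 ;;
    esac
    return $rc
}

# Example (commented out):
#   check_build_properties spqlpsh-server/build.properties && ant deploy
```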

3) Each client that should be authorized to load annotation rdf/xml into the triplestore via the hub needs to supply a generated certificate to the administrator of SparqlPuSH. These certificates should be a .pem file that contains the public key and clients should sign outgoing rdf/xml with the corresponding private key (the PHP libraries for fp contain a class that a client can use for doing this: fp/common/XmlSign.php). Generate a certificate via the following command:

openssl req -x509 -nodes -newkey rsa:2048 -out newcert.pem -outform PEM -days 1825

4) The above will generate two files: privkey.pem (which should reside with the client and be placed somewhere outside the document root of the server) and newcert.pem (contains the public key; a copy of this should be stored on the same server as SparqlPuSH, somewhere the application can access it).
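If several clients send in certificates, it is easy to mix up which private key belongs to which certificate. One way to check a pair, sketched below, is to compare the RSA modulus of both files using the standard openssl x509 and openssl rsa subcommands. The helper name pem_pair_matches is made up, and the check assumes an unencrypted RSA private key such as the one produced with -nodes above.

```shell
# Succeed if the certificate ($1) and the private key ($2) share the same
# RSA modulus, i.e. they belong together. Returns 2 if a file is unreadable.
pem_pair_matches() {
    cert_mod=$(openssl x509 -noout -modulus -in "$1" 2>/dev/null) || return 2
    key_mod=$(openssl rsa -noout -modulus -in "$2" 2>/dev/null) || return 2
    [ "$cert_mod" = "$key_mod" ]
}

# Example (commented out):
#   pem_pair_matches newcert.pem privkey.pem && echo "pair matches"
```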

5) Configure SparqlPuSH with a list of clients who are authorized to load data into the triple store by editing the certs.txt (in the spqlpsh-server project) file and adding an alias (such as symbiota) paired with the path to the pem file that contains the client's public key.

6) Run the ant build script to deploy and create directories that are prerequisite to deploying the spqlpsh-client project:

cd spqlpsh-server
vi build.properties (make changes or use defaults)
vi certs.txt (add alias,certificate pem file pairs for authorized clients)
ant deploy


spqlpsh-client

1) In the build.properties file in this project directory, replace /var/www in the client.home property with your apache document root and run the ant build script to deploy:

cd spqlpsh-client
vi build.properties
ant deploy

2) Restart apache after installing client and server:

/etc/init.d/apache2 restart


Annotation Generation Webservice

More comprehensive coverage of the Annotation Generator can be found at: Annotation_Generator

The Annotation Generator is the library (webservice when deployed to tomcat) that generates annotation rdf/xml from json or xml serializations of a data model object representation. The result and model classes can be configured using an xml descriptor that maps to an owl ontology and Java classes (which can be generated from the descriptor xml). The FP-AnnotationGenerator project contains a default configuration for tomcat deployment via the following steps:

checkout the project from svn:

svn co https://filteredpush.svn.sourceforge.net/svnroot/filteredpush/FP-AnnotationGenerator/trunk/ FP-AnnotationGenerator

1) First deploy the default descriptor configuration files to the directory of your choosing (i.e. /etc/filteredpush/descriptors) and edit the descriptors.dir property in the src/main/resources/generator.properties file to point to that directory. The configuration defaults are found in the configuration directory at the root level of the project. Copy all the files from that directory into the descriptors directory as configured.

sudo mkdir -p /etc/filteredpush/descriptors
sudo cp FP-AnnotationGenerator/configuration/*.xml /etc/filteredpush/descriptors


2) The next step is to build the web service via maven:

cd FP-AnnotationGenerator
mvn install

3) At this point you should have a FP-AnnotationGenerator-0.0.1-SNAPSHOT.war file in your target directory which can either be deployed directly to the tomcat webapps directory or if you are planning on an FP-Medium deployment this file will be packaged in the deployment ear file automatically and deployed on GlassFish. Continue to the next steps only if you are deploying to tomcat.

  • NOTE: I received the following error on Debian but not Ubuntu:
[java] [ERROR] COMPILATION ERROR : 
[java] [INFO] -------------------------------------------------------------
[java] [ERROR] Unable to locate the Javac Compiler in:
[java]   /usr/lib/jvm/java-6-openjdk/jre/../lib/tools.jar
[java] Please ensure you are using JDK 1.4 or above and
[java] not a JRE (the com.sun.tools.javac.Main class is required).
[java] In most cases you can change the location of your Java
[java] installation by setting the JAVA_HOME environment variable.

If you encounter this error, check your JAVA_HOME environment variable (also make sure it points to a jdk and not a jre). On debian I had to set it to point to the sun jdk:

echo $JAVA_HOME
export JAVA_HOME=/usr/lib/jvm/java-6-sun/

4) If you deployed to Tomcat, restart the server via /etc/init.d/tomcat6 restart
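Before the restart, the war file has to be in Tomcat's webapps directory. The sketch below shows one way to copy it. The helper name deploy_generator_war is hypothetical, and /var/lib/tomcat6/webapps is assumed to be the webapps directory of the Debian tomcat6 package. The war is renamed during the copy because Tomcat derives the context path from the file name, and the test url in this section uses /FP-AnnotationGenerator.

```shell
# Copy the built war ($1) into the webapps directory ($2, optional),
# renaming it so the context path becomes /FP-AnnotationGenerator.
deploy_generator_war() {
    war="$1"
    webapps="${2:-/var/lib/tomcat6/webapps}"
    cp "$war" "$webapps/FP-AnnotationGenerator.war"
}

# Example (commented out; run with sufficient privileges):
#   deploy_generator_war target/FP-AnnotationGenerator-0.0.1-SNAPSHOT.war
```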

5) You should now be able to test the Annotation Generator by visiting http://localhost:8080/FP-AnnotationGenerator/rest/generate. This page will display a list of the currently configured generator instances (corresponding to each of the descriptor files that we deployed earlier). A particular instance of the generator can be accessed by name at its url (of the form http://localhost:8080/FP-AnnotationGenerator/rest/generate/{name}).

FilteredPush modified Symbiota

For testing/demonstration purposes (depends on having installed and configured Fuseki and having the includes from SparqlPuSH on the php include path):

1) Login to mysql and create the symbiotafp database:

create database symbiotafp;

2) Obtain a dump of the fp symbiota sample data and import it (this step can be skipped if you are using your own data)

wget http://sourceforge.net/projects/filteredpush/files/Release_1/misc/symbiota-fp-2012-08-27.sql
mysql -uroot -p -D symbiotafp < symbiota-fp-2012-08-27.sql
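Symbiota needs a mysql user with read and write privileges on this database. If you would rather not use root for that, a dedicated user can be created. The sketch below only prints the SQL so that it can be reviewed before being piped into mysql. The helper name symbiota_grant_sql and the example user name are made up, and the GRANT ... IDENTIFIED BY form is assumed to be accepted by the MySQL version that ships with Debian squeeze.

```shell
# Print SQL that grants read/write access on database $1 to user $2 with
# password $3. The password must not contain single quotes.
symbiota_grant_sql() {
    cat <<EOF
GRANT SELECT, INSERT, UPDATE, DELETE ON $1.* TO '$2'@'localhost' IDENTIFIED BY '$3';
FLUSH PRIVILEGES;
EOF
}

# Example (commented out):
#   symbiota_grant_sql symbiotafp symbiota 'change-me' | mysql -uroot -p
```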

3) Obtain the filteredpush symbiota branch from our sourceforge via git:

mkdir ~/git
cd ~/git
git clone git://filteredpush.git.sourceforge.net/gitroot/filteredpush/symbiota
cd symbiota
git checkout filteredpush

4) In the symbiota checkout directory edit the build.properties file and replace db.name with your database name (such as symbiotafp if you followed the steps above), and db.user and db.pass with the mysql username and password of a user with read and write privileges to the symbiota database. Also, set deploy.dir to your apache2 server document root and fuseki.dir to the directory you extracted fuseki to. Make sure that you deploy as a user who has privileges to write to the document root directory (I added my linux user to the group www-data). After having done this, run the ant command as this user:

(as su) adduser lowery www-data

(logout and login again as user)
cd ~/git/symbiota
vi build.properties
ant

5) If you initialized the database with the sample data dump from the instructions above you can now login as user "mkelly" with password "password".

  • NOTE: The filteredpush branch contains edits to the forms which include: a "Submit determination to Filtered Push network" checkbox on the add determination form (name=fpsubmit) and to the form processing to create annotation rdf/xml from a new determination and inject into the network.
  • Summary of major changes:
    • 1) collections/editor/occurrenceeditor.php (line 129 for processing the checkbox value)
    • 2) collections/editor/includes/determinationtab.php (line 66 the checkbox on the form for fp and lines directly above it add some data to the array passed in to OccurrenceEditorDetermination)
    • 3) classes/OccurrenceEditorDeterminations.php (starting on line 58 through 105 the annotation rdf/xml is created and injected into the currently configured network)
    • 4) added classes/fp (these php files are exactly the same as those in the php libraries that clients can use to interact with Filtered Push clients, more about them later on)


PHP Client Libraries and FPConfig

Supporting documentation will be posted at Using_the_FP_PHP_Library

The FilteredPush libraries for php clients can be checked out from the sourceforge svn:

svn co https://filteredpush.svn.sourceforge.net/svnroot/filteredpush/FP-Tools/FP-PHP-Library/ FP-PHP-Library

Once checked out we can deploy the fp directory in this project to /usr/share/php. Clients (such as symbiota) will use these libraries when interacting with both FP-Medium and FP-Lite:

cd FP-PHP-Library
cp -r fp /usr/share/php

Clients must be configured to use the network components deployed in the steps above. To do this for the symbiota filteredpush branch edit classes/fp/FPConfig.php. A summary of the configuration options can be found below:

  • RDFHANDLER_ENDPOINT - the url for the annotation webservice, used for creating new identification annotation rdf/xml
  • FPNODE_ENDPOINT - this is the AccessPoint SOAP webservice as part of FP-Medium (for an FP-Lite deployment the default can be used for now)
  • SPARQLPUSH_SERVER - sparqlpush server uri
  • SPARQLPUSH_CLIENT - sparqlpush client uri
  • DS - the dataset that fuseki was started with in the startup script
  • SPARQL_ENDPOINT - the uri to the fuseki endpoint
  • RESULT_XSLT - the xsl for styling query results and the annotations shown on the Annotations tab in Symbiota (on the Occurrence Record form).
  • X509_CERTIFICATE - the pem file (newcert.pem from the example above) that contains the public key for this client
  • PRIVATE_KEY - the pem file used by this client (privkey.pem) for signing the rdf/xml
  • NETWORK_FACADE - Current network implementation to use (see classes/fp/facades), choices are FPLiteFacade and FPMediumFacade

Once you have everything deployed and configured you should be able to submit a new determination in Symbiota and it should now show up in the Annotations tab (on the Occurrence Record form in Symbiota)

For older instructions on setting up the node for FPMedium see http://etaxonomy.org/mw/Build_and_run_FP_in_Eclipse#FP-Node (updated instructions soon) Prerequisites for FP-Medium are that the glassfish server be installed and configured as described in the linked documentation.

Testing

Navigate to the Symbiota install from the FilteredPush git branch to test adding a new determination as a FilteredPush annotation. Log in as "mkelly" with password "password" (if you are using the fp sample data) and search for a specimen by clicking the "Collections" link on the left. Select the pencil icon next to a specimen to edit the occurrence record.

If you have the fp sample data in mysql you can also use the following link to quickly access an occurrence record I have been using for testing:

http://localhost/collections/editor/occurrenceeditor.php?occid=6418

Once on the Occurrence Editor page, view the Determination History tab and click the green plus symbol to add a new determination. On the add new determination form make sure that Submit determination to Filtered Push network is checked and enter information. Once you click submit you should receive the " Action Status: Determination added successfully and submitted to Filtered Push! " message at the top of the screen if the annotation was submitted successfully.

On the same occurrence editor page, click the Annotations tab. This will query the Fuseki sparql endpoint and display all annotations currently submitted to the Filtered Push network. This includes annotations submitted by other clients as well (the query will return all specimens with the same collection code and catalog number).

Another similar test can be performed in Fuseki (if the annotations tab is not working properly and you need to debug or if you are not using the fp Symbiota branch). Instead of clicking the Annotations tab after submitting an annotation open another browser window and navigate to the Fuseki control panel (i.e. http://localhost:3030/). Under "Server Management" click Control Panel and select the dataset (it should be AnnotationStore if the default Fuseki setup was used). Select "XML" from the Output dropdown under the query textarea and enter SELECT * {?s ?p ?o} as the query. Clicking "Get Results" at this point will display the results of all annotations currently in the triplestore in an html table.

Additionally, if you want query result styling similar to the Annotation tab you can deploy an xsl stylesheet to Fuseki. You can download an example xsl file (filteredpush.xsl) from our sourceforge and deploy it to Fuseki's pages directory:

cd /home/lowery/jena-fuseki-0.2.4/pages/
wget http://sourceforge.net/projects/filteredpush/files/Release_1/misc/filteredpush.xsl 

On the Fuseki query page, replace "/xml-to-html.xsl" with "/filteredpush.xsl" in the XSLT style sheet field, make sure that "XML" is selected as the value of the Output field and enter the following query:

PREFIX aod: <http://etaxonomy.org/ontologies/ao/aod.owl#> 
PREFIX pav: <http://purl.org/pav/> 
PREFIX dwcFP:<http://etaxonomy.org/ontologies/ao/dwcFP.owl#> 
PREFIX ao: <http://purl.org/ao/> 
PREFIX bom: <http://www.ifi.uzh.ch/ddis/evoont/2001/11/bom#> 
PREFIX marl: <http://purl.org/marl/ns/> 
PREFIX foaf: <http://xmlns.com/foaf/0.1/> 
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> 
	
SELECT ?uri ?date ?annotation ?describesObject ?createdOn ?createdBy ?identifiedBy ?dateIdentified ?scientificName ?scientificNameAuthorship ?opinionText ?polarity ?evidence ?motivation 
WHERE { 
  ?annotation ao:annotatesResource ?uri .
  ?annotation pav:createdOn ?date .
  ?subject dwcFP:collectionCode ?collectionCode .
  ?subject dwcFP:catalogNumber ?catalogNumber .
  { ?annotation ao:annotatesResource ?subject } UNION
  { ?issue ao:annotatesResource ?subject .
    ?annotation ao:annotatesResource ?issue 
  } .
  ?annotation ao:hasTopic ?topic .
  ?annotation pav:createdBy ?annotator . 
  OPTIONAL  { ?annotation bom:hasResolution ?resolution } . 	
  OPTIONAL  {
    ?topic marl:describesObject ?describesObject .
    ?topic marl:opinionText ?opinionText .
    ?topic marl:hasPolarity ?thePolarity .
    ?thePolarity rdf:type ?polarity } . 
  OPTIONAL {
    ?topic dwcFP:identifiedBy ?identifiedBy .
    ?topic dwcFP:dateIdentified ?dateIdentified .
    ?topic dwcFP:scientificName ?scientificName } . 
  OPTIONAL { ?topic dwcFP:scientificNameAuthorship ?scientificNameAuthorship } .
  ?annotator foaf:name ?createdBy .
  ?annotation pav:createdOn ?createdOn . 
  OPTIONAL {?annotation aod:hasEvidence ?theEvidence . 
    ?theEvidence aod:asText ?evidence } . 
  OPTIONAL {
    ?annotation aod:hasMotivation ?theMotivation . 
    ?theMotivation aod:asText ?motivation
  }
}

When you click "Get Results" this time you should be presented with a nicely formatted view of all the annotations in the system. You can create your own stylesheets and deploy them in this way if you wish to define custom queries and associated styling (view source in a browser window to see the underlying xml).

Securing Fuseki

The deployment and configuration steps presented in the previous sections do not place access restrictions on the Fuseki triplestore. Both the update and query endpoints are exposed to external clients. If you want to restrict access to fuseki from outside of localhost you can configure Fuseki (which is running in a bootstrapped Jetty servlet container) to only accept connections from localhost on a specified port. Provided that SparqlPuSH is installed on the same host, only SparqlPuSH will have access to the triplestore. External clients must be authenticated by SparqlPuSH before the updates are made to the triplestore.

If you would like to open access to the Fuseki query endpoint (i.e. http://localhost:3030/AnnotationStore/query) but not to the upload or update endpoints, you can configure mod_proxy in apache to allow access to the endpoints selectively. The steps for this are detailed below:

1) Obtain the jetty config for Fuseki from our sourceforge and put this in the FUSEKI_HOME directory (i.e. /home/shared/jena-fuseki-0.2.4/jetty.xml). Make edits if necessary but the defaults should restrict access to localhost on port 3030:

cd /home/shared/jena-fuseki-0.2.4/
wget http://sourceforge.net/projects/filteredpush/files/Release_1/misc/jetty.xml

NOTE: more info about the jetty configuration can be found at: http://wiki.eclipse.org/Jetty/Reference/jetty.xml

2) Stop the Fuseki server if it is running and edit the startup script (/etc/init.d/fuseki). Add the --jetty-config=jetty.xml option to the invocation of fuseki-server (it should look like ./fuseki-server --jetty-config=jetty.xml --desc=tdb-assembler.ttl --update /AnnotationStore 2>&1 &).

Restart fuseki and confirm that external access is not possible (http://hostname:3030).

3) If you only wish for applications running on localhost to invoke Fuseki (via query, upload, update) then you can stop here. Otherwise, if you would like to expose only certain endpoints while restricting others (for example you want to allow queries from the outside but no changes/additions to data), you can follow the rest of the configuration steps.

NOTE: The following steps very closely resemble those presented in this documentation: [[1]]

4) First enable mod_proxy in apache via the following:

sudo a2enmod proxy_http

5) This should have created a proxy.conf file in /etc/apache2/mods-enabled. This file contains some default configuration, edit it and replace everything within <IfModule>...</IfModule> with the following:

# Turn off support for true Proxy behaviour as we are acting as 
# a transparent proxy
ProxyRequests Off
 
# Turn off VIA header as we know where the requests are proxied
ProxyVia Off
 
# Turn on Host header preservation so that the servlet container
# can write links with the correct host and rewriting can be avoided.
ProxyPreserveHost On
 
 
# Set the permissions for the proxy
<Proxy *>
  AddDefaultCharset off
  Order deny,allow
  Allow from all
</Proxy>
 
# Turn on Proxy status reporting at /status
# This should be better protected than: Allow from all
ProxyStatus On
<Location /status>
  SetHandler server-status
  Order Deny,Allow
  Allow from all
</Location>

6) Next, add the following line to the default site in /etc/apache2/sites-available and make sure it is enabled (via a2ensite default):

ProxyPass /fuseki/AnnotationStore/query http://127.0.0.1:3030/AnnotationStore/query

(If your dataset is not named AnnotationStore, replace the name in both urls.)

7) Restart apache and check that outside access to the query interface is enabled (http://hostname/fuseki/AnnotationStore/query should give an error from fuseki since no query is specified in the request. If you get an error from apache instead this means that it is still not accessible)
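The checks in steps 2 and 7 can also be run from a machine outside of localhost with curl. The following is a sketch. The helper names http_status and check_proxy are made up, and the /update path is an assumption used only to show that a path without a ProxyPass line is not forwarded.

```shell
# Print only the HTTP status code for a url (curl prints 000 when no
# connection could be made).
http_status() {
    curl -s -o /dev/null -w '%{http_code}' "$1"
}

# Probe the proxied query endpoint, an unproxied path and the raw port.
check_proxy() {
    host="$1"
    echo "query via apache: $(http_status "http://$host/fuseki/AnnotationStore/query")"
    echo "update via apache: $(http_status "http://$host/fuseki/AnnotationStore/update")"
    echo "direct port 3030: $(http_status "http://$host:3030/")"
}

# Example (commented out; run from an external machine):
#   check_proxy hostname
```

With the configuration above one would expect an error status generated by Fuseki for the first line (no query was supplied), a 404 from apache for the second, and 000 for the third because the port only accepts connections from localhost.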

NOTE: With outside access no longer possible you may not be able to get at the Fuseki control panel to test queries and updates. You could run the ruby scripts (located in your FUSEKI_HOME) provided by the Jena Fuseki project on the localhost instead:

See http://jena.apache.org/documentation/serving_data/index.html#script-control for how to use the scripts. Also see http://jena.apache.org/documentation/serving_data/soh.html for more info.

Integration

The following sections provide information about how to integrate filteredpush with a symbiota deployment. The first section contains instructions for applying a patch containing the fp integration changes to trunk symbiota and the second provides more detailed information about integration via the use of fp php libraries.

Symbiota Patch

I have created a patch against the most recent trunk of symbiota from the sourceforge svn (rev 913):

wget http://sourceforge.net/projects/filteredpush/files/Release_1/symbscan-2012-09-17.patch

This can be used with the scan deployment of symbiota to integrate the changes required to submit annotations to filteredpush. Currently this patch will hook into the new determination form only and does not include the annotation tab. I'll hold off on that part until we have finished adding the annotation generation and submission to all of the desired forms (i.e. new georeference annotations from edits on the occurrence record form and an edit determination annotation on the edit determinations form).

If you open the .patch file in a text editor it will give you a summary of all the changes that will be applied. To apply the patch go to the trunk directory for symbiota (/var/www/html/symbiota/scan/trunk), use svn update to make sure symbiota is at revision 913, and use the patch command:

cd /var/www/html/symbiota/scan/trunk/
svn update -r 913
patch -p0 -i symbscan-2012-09-17.patch

Use the svn diff command to see the changes that were applied. One configuration variable has been added to symbini_template.php in config (look for $fpEnabled = false;). This allows you to switch the integration on or off; the default value of false keeps all the filteredpush changes disabled (no submit to filteredpush checkbox and no submission ever occurs). In the deployment symbini.php config, set this variable to true.

The php libraries that are invoked by the changes are located on symbiota2 in /usr/share/php/fp. I have already configured the libraries with the FP-Lite deployment on symbiota2 so everything should just work provided that the patch was applied successfully.

Test the patched deployment by submitting a new determination (if $fpEnabled in symbini.php is true you will see the submit to fp checkbox on the form). You should see the status "Determination added successfully and submitted to Filtered Push!" in symbiota on the occurrence editor form following submission. Now check to see if the annotation is present in the triple-store by using the following command on symbiota2 to query fuseki:

/usr/share/fuseki/jena-fuseki-0.2.4/s-query --service=http://localhost:3030/AnnotationStore/query 'SELECT ?o {?s ?p ?o}' --output=text


Integration via PHP-Libraries

The libraries for client integration with FilteredPush can be checked out via SVN at the following repository location:

svn co https://filteredpush.svn.sourceforge.net/svnroot/filteredpush/FP-Tools/FP-PHP-Library/

On symbiota2, I have this checked out in /home/shared/FP-PHP-Library and configured for use on symbiota2.

Edit FPConfig.php (FP-PHP-Library/fp/FPConfig.php) and set the appropriate values for the constants in this class.

1) After configuring the libraries, copy the fp directory into your php includes directory.

cp -r FP-PHP-Library/fp /usr/share/php/

The Annotation generation web service (hosted at http://symbiota2.acis.ufl.edu:8080/annotationws/rest/generate/ on symbiota2) is a RESTful web service that can produce RDF/XML from XML or JSON representations of a data model. Objects in the model and properties of those objects can be mapped to terms in any given vocabulary via a configuration file that resides with the annotation generation web service. Once configured, the annotation generation service can be invoked via http POST and annotation RDF/XML is returned in the http response.

This frees the client programmer from having to worry about the details of the semantic web technologies or the vocabularies used and allows for very straight-forward generation of annotations using either multi-dimensional arrays or PHP objects. Additionally, reconfiguring the web service for use with a different model takes only minutes.

The annotation generation web service hosted on symbiota2 has been preconfigured to produce annotation RDF/XML suitable for use with filteredpush. If the value of RDFHANDLER_ENDPOINT in fp/FPConfig.php is set to point to the annotation generation webservice url, then the FPNetworkFactory (fp/FPNetworkFactory) php class can be used to obtain a proxy object for the annotation generation web service. The service can then be invoked using the generateRdfXml($annotation) method of the object returned by the factory. In the example presented below, the value of the $annotation argument is an array that contains the data for a new determination and the variable $rdf at the end contains the response (as rdf/xml) from the annotation generation webservice:

// load the factory; assumes /usr/share/php is on the include path and that the
// class lives in fp/FPNetworkFactory.php (file name assumed from the class name)
require_once 'fp/FPNetworkFactory.php';

// prepare new determination annotation for fp
$annotator = array();
$annotator['name'] = 'Robert A. Morris';

// the specimen being annotated
$subject = array();
$subject['catalogNumber'] = '00107080';
$subject['collectionCode'] = 'A';
$subject['institutionCode'] = 'HUH';

// the new determination
$topic = array();
$topic['dateIdentified'] = '1990-11-23';
$topic['identifiedBy'] = 'C.H. Stirton';
$topic['scientificName'] = 'Ateleia gummifera';
$topic['scientificNameAuthorship'] = 'D. Dietrich';

$evidence = array();
$evidence['asText'] = 'Written on the sheet along with the annotation text "Flora Neotropica"';

$annotation = array('annotator' => $annotator, 'subject' => $subject, 'topic' => $topic, 'evidence' => $evidence);

// generate rdf/xml
$generator = FPNetworkFactory::getAnnotationGenerator();
$rdf = $generator->generateRdfXml($annotation);

Behind the scenes the generateRdfXml method of the generator uses json_encode on the object passed in (either an array or PHP object) to produce the JSON that the annotation generation webservice is configured to interpret. It then uses curl to post this JSON to the webservice url (which it gets from the FPConfig.php file).
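For debugging it can be useful to reproduce that request without PHP. The sketch below posts JSON equivalent to the example array with the curl command line tool. The helper names are made up, the Content-Type header is an assumption about what the webservice expects, and RDFHANDLER_ENDPOINT stands for the same url that is configured in FPConfig.php.

```shell
# Print JSON equivalent to json_encode($annotation) for the example data.
example_annotation_json() {
    cat <<'EOF'
{"annotator":{"name":"Robert A. Morris"},
 "subject":{"catalogNumber":"00107080","collectionCode":"A","institutionCode":"HUH"},
 "topic":{"dateIdentified":"1990-11-23","identifiedBy":"C.H. Stirton",
          "scientificName":"Ateleia gummifera","scientificNameAuthorship":"D. Dietrich"},
 "evidence":{"asText":"Written on the sheet along with the annotation text \"Flora Neotropica\""}}
EOF
}

# POST the JSON read from standard input to the url given as $1 and print
# the response body (the generated rdf/xml if the request succeeds).
post_annotation_json() {
    curl -s -X POST -H 'Content-Type: application/json' --data-binary @- "$1"
}

# Example (commented out):
#   example_annotation_json | post_annotation_json "$RDFHANDLER_ENDPOINT"
```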

The integration of the annotation generation in the changes applied by the patch is mostly contained in symbiotahelper.php and is invoked by OccurrenceEditorDeterminations.php (see /home/shared/FP-PHP-Library/fp/includes/symbiotahelper.php). I plan to add other annotation types to this for future patches.

Once you have invoked the annotation generation webservice and have an rdf/xml representation of your data, you can obtain a proxy object to the currently configured network node via FPNetworkFactory::getNetworkFacade(). FPConfig.php on symbiota2 is already configured to use the instance of SparqlPuSH hosted on that vm. The proxy object returned by the factory will allow you to inject an rdf/xml annotation into the network via its injectIntoFP($rdf) function:

// inject annotation into fp
$network = FPNetworkFactory::getNetworkFacade();
$response = $network->injectIntoFP($rdf);

Depending on the configuration in FPConfig.php, the correct network implementation (FP-Medium or FP-Lite) and the correct url will be determined automatically. The use of a factory object that returns a facade means that when we are ready to switch from the FP-Lite implementation to the FP-Medium one, the change only needs to be made in FPConfig.php and not in any client code. FPNetworkFactory::getNetworkFacade() will always return a proxy object that represents the currently configured implementation. Changing or creating new filteredpush network node implementations is a matter of reconfiguration and should not affect client code.

Some improvements I plan to make:

  • Documentation of the different annotation types currently supported by the system that describes the structure of the expected json so that client programmers know what kind of array to construct.
  • The Annotation Generation webservice should perform some sort of validation to ensure that the annotations it is generating conform to our rules and return the result of that in the HTTP response (return value of $generator->generateRdfXml).
  • Put the config somewhere besides a php class containing constants, create a setup script for setting the configuration values and add more exceptions to the php library code rather than relying on error_log.