ChuckAtIdigBioHackathon

From Filtered Push Wiki

Deadline: End of work, Tuesday, December 10. iDigBio is asking for mini-talks, and if this isn't going to work, we need at least a couple of days to find something that will.

TODO: Stand-alone REST client: accepts new RDF/XML as input, supports named queries for output

Assertion: This already exists.

TODO: Put the jar somewhere, and put usage documentation on the page that links to it.

  • Download the jar.
  • Generate a keypair as instructed.
  • Chuck logs in to fp1 and adds the key.
  • Start the jar, providing a port number and the URL for fp1.
  • There is now a localhost HTTP service which accepts POSTs of new RDF/XML.
  • There is also now a localhost HTTP service which supports simple GETs of named queries.
  • Someone else who starts the jar in the same way should also be able to get the data out, because it is actually stored at fp1.
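The two localhost services described above could be exercised from Python roughly as follows. This is only a sketch: the port (8080) and the paths `/annotations` and `/queries/<name>` are placeholders invented for illustration, since the jar's actual URL layout is not documented here. The functions only build the requests; nothing is sent until one is passed to `urllib.request.urlopen`.

```python
from typing import Optional
from urllib.parse import quote, urlencode
from urllib.request import Request

# Hypothetical values: the port is whatever was given when starting the jar,
# and both paths below are placeholders, not the jar's documented routes.
BASE_URL = "http://localhost:8080"


def build_post_request(rdf_xml: str, base_url: str = BASE_URL) -> Request:
    """Prepare a POST that carries a new RDF/XML document."""
    return Request(
        url=f"{base_url}/annotations",
        data=rdf_xml.encode("utf-8"),
        headers={"Content-Type": "application/rdf+xml"},
        method="POST",
    )


def build_query_request(
    name: str, params: Optional[dict] = None, base_url: str = BASE_URL
) -> Request:
    """Prepare a simple GET for a named query, with optional parameters."""
    query_string = f"?{urlencode(params)}" if params else ""
    return Request(
        url=f"{base_url}/queries/{quote(name, safe='')}{query_string}",
        method="GET",
    )
```

Once the jar is running, something like `urllib.request.urlopen(build_query_request("some-query-name"))` would perform the GET, assuming the placeholder path is replaced with the real one.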

TODO: Chuck knows how to maintain queries on fp1

Chuck needs to be able to:

  • Log in to fp1.
  • Add new keys for new clients.
  • Find the DomainConfiguration.xml.
  • Edit it. (Should I need to sudo, or should I be able to edit it as myself?)
  • Restart server / reload settings: whatever is necessary.
  • Hit a URL that corresponds to one of the given queries, and parse the results.
  • Understand how query-time security would be implemented: right now, I think the queries are open to anyone, but if there were protected info in there, how would it be protected?
  • Hit a URL that returns the original document. (Is an ID returned at the time of the original POST?)
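For the "parse the results" step, one possibility is sketched below. It assumes the query URL returns the W3C SPARQL 1.1 Query Results JSON format; whether fp1 actually returns that format (rather than XML or something else) is not known from this page, so treat it as an assumption to verify. The sample variable names in the usage note are made up.

```python
import json


def parse_sparql_json(text):
    """Flatten a SPARQL JSON results document into a list of plain dicts.

    Assumes the standard layout: {"head": {"vars": [...]},
    "results": {"bindings": [{var: {"type": ..., "value": ...}}]}}.
    Variables left unbound in a row are reported as None.
    """
    doc = json.loads(text)
    variables = doc["head"]["vars"]
    rows = []
    for binding in doc["results"]["bindings"]:
        row = {}
        for var in variables:
            row[var] = binding[var]["value"] if var in binding else None
        rows.append(row)
    return rows
```

A result with variables `annotation` and `target` would come back as a list of dicts keyed by those names, which is easy to loop over or dump into a table.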

Chuck also needs to know what the most common errors look like:

  • What if the DomainConfiguration.xml is bad XML?
  • What if the SPARQL is bad SPARQL?
  • What if a name is reused?
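Two of these errors could be caught locally before restarting the server. The sketch below checks that the file is well-formed XML and that no query name appears twice. The element name `query` and attribute `name` are hypothetical, because the real schema of DomainConfiguration.xml is not described here; adjust both to match the actual file. It does not validate the SPARQL itself, and it says nothing about how the server reports these errors.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def check_config(xml_text, element="query", attribute="name"):
    """Return a list of problems found in the configuration text.

    `element` and `attribute` are placeholders for wherever the real
    configuration stores query names.
    """
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as err:
        # Bad XML: nothing else can be checked, so report and stop.
        return [f"not well-formed XML: {err}"]

    # Count every name that appears on a matching element.
    names = Counter(
        el.get(attribute)
        for el in root.iter(element)
        if el.get(attribute) is not None
    )
    return [
        f"name {name!r} is used {count} times"
        for name, count in names.items()
        if count > 1
    ]
```

An empty list means neither problem was detected; it does not mean the server will accept the file.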

TODO: Chuck can construct plausible annotation RDF, and apply it to the crowd sourcing story.

Region identification

Identification of labels or barcodes.

QA/consensus on region identification

Sub-region identification

Identification of lines of text / fields of data.

QA/consensus on sub-region identification

OCR or transcription

Transcribe an image of a single line to characters.

QA/consensus on transcription

Semantic markup

Connect text with terms from a controlled vocabulary.

QA/consensus on markup
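As a starting point for "plausible annotation RDF", here is a sketch of what a region-identification annotation might look like when generated as RDF/XML. It assumes the Open Annotation / W3C Web Annotation vocabulary (`oa:`) with a media-fragment selector (`xywh=`) for the image region; this page does not say which vocabulary fp1 expects, so that choice, the function name, and the example values are all illustrative.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
OA = "http://www.w3.org/ns/oa#"  # assumed vocabulary, not confirmed for fp1


def region_annotation(image_url, x, y, w, h, label):
    """Build RDF/XML saying 'this rectangle of this image is a <label>'."""
    ET.register_namespace("rdf", RDF)
    ET.register_namespace("oa", OA)

    root = ET.Element(f"{{{RDF}}}RDF")
    annotation = ET.SubElement(root, f"{{{OA}}}Annotation")

    # Body: a plain-text label such as "label" or "barcode".
    body = ET.SubElement(annotation, f"{{{OA}}}bodyValue")
    body.text = label

    # Target: a specific region of the source image.
    target = ET.SubElement(annotation, f"{{{OA}}}hasTarget")
    specific = ET.SubElement(target, f"{{{OA}}}SpecificResource")
    ET.SubElement(
        specific, f"{{{OA}}}hasSource", {f"{{{RDF}}}resource": image_url}
    )
    selector = ET.SubElement(specific, f"{{{OA}}}hasSelector")
    fragment = ET.SubElement(selector, f"{{{OA}}}FragmentSelector")
    value = ET.SubElement(fragment, f"{{{RDF}}}value")
    value.text = f"xywh={x},{y},{w},{h}"  # media-fragment rectangle

    return ET.tostring(root, encoding="unicode")
```

The same shape could plausibly carry the later stages: a sub-region annotation would use a smaller rectangle, a transcription would put the transcribed text in the body, and semantic markup would point the body at a controlled-vocabulary term instead of a literal.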