Database Interop Hackathon/Target Projects

From Evolutionary Informatics Working Group
Revision as of 11:37, 24 February 2009

Criteria to consider for deliverables

  • a successful outcome would enhance (or showcase) interoperability
  • the work involves two or more hackathon participants
  • project is reasonably expected to produce tangible outcomes (even if it falls short of some goals)
  • project outcomes can be re-used by others for further interop improvements

Suggested deliverables with no plan (yet)

  • The One Database that Rules them All
  • an experimental NESCent portal that would provide a common interface to several different resources.
  • a common provenance standard for identifying a data resource as the source of something (name, url, citation)
  • your idea here

Suggested deliverables with a plan

Data retrieval, based on an identifier

This simple reference service returns phylogenetic data identified by a GUID, such as a ToLWeb accession number. The service follows the PhyloWS/REST proposal, has a CDAO-annotated service description and emits NeXML.

Describing the interface

The first step is to describe the interface formally. In general terms, PhyloWS/REST proposes that data retrieval services be exposed through a URL API of the form /phylows/${dataType}/${nameSpace}:${identifier}, where ${dataType} is something like "Tree", "Matrix", etc., ${nameSpace} is a naming authority such as ToLWeb, and ${identifier} is unique within ${nameSpace} (so that the ${nameSpace}:${identifier} pair is globally unique). This implies URLs such as /phylows/Tree/ToLWeb:16299, which, when accessed using the HTTP GET method, returns a representation of tree 16299.
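As a minimal sketch of the URL convention above, the following Python function splits a request path into its three parts. The regular expression, the `PhyloWSRequest` type and the `parse_phylows_path` name are illustrative assumptions, not part of the PhyloWS/REST proposal; the pattern also assumes that no component contains a slash and that the namespace contains no colon.

```python
import re
from typing import NamedTuple, Optional

# Hypothetical pattern for the PhyloWS/REST retrieval URL:
# /phylows/${dataType}/${nameSpace}:${identifier}
PHYLOWS_PATH = re.compile(
    r"^/phylows/(?P<data_type>[^/]+)/(?P<namespace>[^/:]+):(?P<identifier>[^/]+)$"
)


class PhyloWSRequest(NamedTuple):
    data_type: str   # e.g. "Tree" or "Matrix"
    namespace: str   # naming authority, e.g. "ToLWeb"
    identifier: str  # unique within the namespace, e.g. "16299"

    @property
    def guid(self) -> str:
        # The namespace:identifier pair is what makes the ID globally unique.
        return f"{self.namespace}:{self.identifier}"


def parse_phylows_path(path: str) -> Optional[PhyloWSRequest]:
    """Split a PhyloWS/REST path into its parts, or return None if it does not match."""
    match = PHYLOWS_PATH.match(path)
    if match is None:
        return None
    return PhyloWSRequest(**match.groupdict())


if __name__ == "__main__":
    req = parse_phylows_path("/phylows/Tree/ToLWeb:16299")
    print(req)       # PhyloWSRequest(data_type='Tree', namespace='ToLWeb', identifier='16299')
    print(req.guid)  # ToLWeb:16299
```

Returning `None` rather than raising leaves the choice of error response to the caller, which matters because error handling is still an open issue.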

A standard way to describe this behaviour is to express it in WSDL 2.0; there is a useful example of WSDL generation and annotation here. At the time of writing, the best free editor for WSDL files is the one that comes with the Web Tools Platform (WTP) extension for Eclipse. The end result is a file such as this one.


[Figure: Graphical representation of this service description.]

Implementing the service

A service for the interface described in the previous section can be implemented as an MVC-like application. The controller part of the service first needs to find out which Tree ID was requested. Depending on the implementation language (and on whether a web application framework is used), the Tree ID is either part of the ${PATH_INFO} environment variable or encapsulated in some kind of request object. However the Tree ID is retrieved from the request, the next step is to look up the record it refers to, typically with a database query. The goal is to collect all the information needed to populate a model object (in this case a tree) that can be serialized to the right return format. If the return format is NeXML, libraries that supply such model objects are available for Perl, Python, Java and C++.
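The controller steps described above (read the ID from `PATH_INFO`, query the database, populate a model object) might look roughly like this in Python. This is a sketch under stated assumptions: the `node` table and its columns, the `Tree`/`Node` classes, the toy data and the helper names are all hypothetical; `PATH_INFO` is assumed to look like `/Tree/ToLWeb:16299` (i.e. the script is mounted at `/phylows`); and a real service would more likely populate the model objects supplied by one of the NeXML libraries.

```python
import sqlite3
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    node_id: int
    label: Optional[str]       # None for unlabelled internal nodes
    parent_id: Optional[int]   # None for the root


@dataclass
class Tree:
    tree_id: str
    nodes: List[Node] = field(default_factory=list)


def tree_id_from_environ(environ: Dict[str, str]) -> Optional[str]:
    """Extract the Tree ID from a CGI/WSGI PATH_INFO such as '/Tree/ToLWeb:16299'."""
    parts = environ.get("PATH_INFO", "").strip("/").split("/")
    if len(parts) != 2 or parts[0] != "Tree":
        return None
    namespace, sep, identifier = parts[1].partition(":")
    if not sep or namespace != "ToLWeb" or not identifier:
        return None
    return identifier


def load_tree(conn: sqlite3.Connection, tree_id: str) -> Optional[Tree]:
    """Query the (hypothetical) node table and populate a Tree model object."""
    rows = conn.execute(
        "SELECT node_id, label, parent_id FROM node WHERE tree_id = ? ORDER BY node_id",
        (tree_id,),
    ).fetchall()
    if not rows:
        return None  # unknown ID; how to report this is left to the caller
    return Tree(tree_id, [Node(*row) for row in rows])


def make_demo_db() -> sqlite3.Connection:
    """In-memory stand-in database holding one toy tree: a root with two tips."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE node (tree_id TEXT, node_id INTEGER, label TEXT, parent_id INTEGER)")
    conn.executemany(
        "INSERT INTO node VALUES (?, ?, ?, ?)",
        [("16299", 1, None, None), ("16299", 2, "A", 1), ("16299", 3, "B", 1)],
    )
    return conn


if __name__ == "__main__":
    conn = make_demo_db()
    tree_id = tree_id_from_environ({"PATH_INFO": "/Tree/ToLWeb:16299"})
    tree = load_tree(conn, tree_id) if tree_id else None
    print(tree)
```

Keeping ID extraction and database lookup in separate functions mirrors the controller/model split and makes each step testable without a web server.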

Once the model objects are populated, the controller creates a view from them. In the simplest case, for web applications, this boils down to printing the XML string representations of the model objects, preceded by the correct response code (e.g. 200 OK) and MIME type (e.g. application/xml). In more complex web application architectures, the string representations of the model objects may be passed to a response object (which in turn is serialized and returned to the client), or the objects may be passed into a template (JSP, Template Toolkit, PHP) where they are stringified.
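A minimal sketch of the simplest view described above, written as a Python WSGI application using only the standard library: it serializes a tree to XML and returns it with a `200 OK` status and an `application/xml` content type. The output is only loosely NeXML-like (no namespace, no `otus` block) and would not validate against the NeXML schema. The in-memory `TREES` dictionary is a hypothetical stand-in for the populated model objects, and answering unknown IDs with `404 Not Found` is an assumption, since error handling is still an outstanding issue.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple

# Hypothetical stand-in for populated model objects:
# tree ID -> list of (node_id, label, parent_id)
TREES = {"16299": [(1, None, None), (2, "A", 1), (3, "B", 1)]}


def tree_to_xml(tree_id: str, nodes: List[Tuple[int, Optional[str], Optional[int]]]) -> bytes:
    """Serialize a tree as simplified, NeXML-like XML (nodes first, then edges)."""
    root = ET.Element("nexml")
    trees = ET.SubElement(root, "trees")
    tree = ET.SubElement(trees, "tree", id=f"tree{tree_id}")
    for node_id, label, _parent in nodes:
        attrs = {"id": f"n{node_id}"}
        if label is not None:
            attrs["label"] = label
        ET.SubElement(tree, "node", attrs)
    for node_id, _label, parent_id in nodes:
        if parent_id is not None:
            ET.SubElement(tree, "edge", id=f"e{node_id}",
                          source=f"n{parent_id}", target=f"n{node_id}")
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)


def application(environ, start_response):
    """WSGI view: print the model's XML preceded by a status line and MIME type."""
    path = environ.get("PATH_INFO", "")
    prefix = "/Tree/ToLWeb:"
    tree_id = path[len(prefix):] if path.startswith(prefix) else None
    nodes = TREES.get(tree_id) if tree_id else None
    if nodes is None:
        body = b"Not Found\n"
        start_response("404 Not Found", [("Content-Type", "text/plain"),
                                         ("Content-Length", str(len(body)))])
        return [body]
    body = tree_to_xml(tree_id, nodes)
    start_response("200 OK", [("Content-Type", "application/xml"),
                              ("Content-Length", str(len(body)))])
    return [body]
```

To try it locally, `wsgiref.simple_server.make_server("", 8000, application).serve_forever()` would serve the application, after which a GET request for `http://localhost:8000/Tree/ToLWeb:16299` should return the XML document.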

Outstanding issues

  • Dearth of support for PHP
  • How to deal with errors (e.g. response codes)
  • Query interface
  • CDAO integration
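One possible way to approach the error-handling question is to map each kind of failure to an HTTP status code, for example 400 for URLs that do not follow the PhyloWS/REST pattern and 404 for well-formed URLs whose identifier is unknown. The sketch below only illustrates that idea: the exception classes and the mapping are hypothetical and not part of PhyloWS/REST.

```python
from http import HTTPStatus
from typing import Tuple


class MalformedRequest(Exception):
    """The URL does not follow /phylows/${dataType}/${nameSpace}:${identifier}."""


class UnknownIdentifier(Exception):
    """The URL is well formed, but no record exists for the identifier."""


# Hypothetical mapping from failure kinds to HTTP status codes.
ERROR_STATUS = {
    MalformedRequest: HTTPStatus.BAD_REQUEST,   # 400
    UnknownIdentifier: HTTPStatus.NOT_FOUND,    # 404
}


def status_line(status: HTTPStatus) -> str:
    """Format a status as used in a response line, e.g. '404 Not Found'."""
    return f"{status.value} {status.phrase}"


def error_response(exc: Exception) -> Tuple[str, bytes]:
    """Return (status line, plain-text body); unexpected errors become 500."""
    status = ERROR_STATUS.get(type(exc), HTTPStatus.INTERNAL_SERVER_ERROR)
    return status_line(status), f"{status.phrase}: {exc}\n".encode("utf-8")
```

Keeping the mapping in one table makes it easy to revise once the working group settles on a convention.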