PhyloWS/REST

From Evolutionary Informatics Working Group
Revision as of 16:11, 10 March 2009 by Hlapp (talk) (Phylodb: Phylogenetic Tree Database)

Specification for a REST-based instantiation of the PhyloWS API.

Disclaimer: This is a pre-alpha work in progress and far from finished. Comments are highly appreciated.

Principles

HTTP method    CRUD operation
POST           Create
GET            Read
PUT            Create, Update
DELETE         Delete
(From the Wikipedia entry on REST.)
  1. RESTful queries should be stateless, so all required input must be provided in a single call, rather than "accumulating" the input (if there are multiple pieces of input) over multiple calls.
  2. The architecture builds on the principle of viewing data and operations as resources, which get created, modified (delete, update), or retrieved.
  3. The allowable HTTP method (GET, POST, PUT, DELETE) depends on the type of CRUD operation being represented by the resource call. For example, resource calls that simply retrieve data without also creating a resource should use the GET method, and not POST.
    Note that a resource may be created virtually; for example, calling a resource might start a calculation and return the results directly (rather than returning a URL to the results, which would be more appropriate). Nonetheless, the calculation is still being created.
  4. The API should also be described as a WSDL document. In fact, WSDL 2.0 allows binding to HTTP methods and supports a RESTful interface.
  5. We try to limit the invention of new standards as much as possible. For example, we try to reuse SRU to the extent that this is possible and useful, rather than reinventing a query syntax that would simply be a subset of SRU in a different syntax.
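As an illustration of principle 3, the HTTP/CRUD table above can be written down as a lookup that a client or server might use to check whether a method fits an operation. This is a minimal sketch; the names CRUD_TO_HTTP and is_allowed are hypothetical and not part of the specification.

```python
# CRUD operation -> HTTP methods, following the HTTP/CRUD table above.
# The dictionary and function names are illustrative only.
CRUD_TO_HTTP = {
    "create": ("POST", "PUT"),
    "read": ("GET",),
    "update": ("PUT",),
    "delete": ("DELETE",),
}


def is_allowed(operation, method):
    """Return True if `method` may represent the given CRUD operation."""
    return method.upper() in CRUD_TO_HTTP[operation.lower()]


print(is_allowed("read", "GET"))   # True
print(is_allowed("read", "POST"))  # False
```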

Basic structure

  1. All resource URIs start with BASE_URL/phylows/. BASE_URL is specific to the implementing service.
  2. Optional resources may not be implemented.
  3. The first path element designates the type of data the resource points to, or the operation. (Shouldn't these data types be governed by CDAO? -rav)
    • Examples for data: /phylows/tree, /phylows/clade, /phylows/node
    • Examples for operation: /phylows/aggregate, /phylows/find
  4. The second path element depends on whether the resource points to a data resource or to an operation:
    • Data resources: The second path element gives the unique identifier of the data resource. This need not (and arguably should not) be the primary key of the resource in the provider database, but could also be an accession number-type identifier.
    • Operation resources: For operations that act on or return possibly multiple resources (data elements), the second path element gives the data type being acted on, or being returned. Subsequent path elements specify the query path, and the trailing URL parameters give the query parameters.
      • Examples: /phylows/find/tree, with additional parameters specifying the query, e.g.: /phylows/find/tree/?name=Primates
  5. Operations acting on a single identified data resource use the URI of the data resource, and express the action as the HTTP method, possibly in combination with input parameters.
    • Examples: /phylows/tree/TreeBASE:S123455 with HTTP method DELETE is a request to delete the respective tree. /phylows/node/urn:lsid:phylodb.org:node:123456 with HTTP method DELETE is a request to prune the respective node from the tree it is in.
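The URI layout described above could be assembled on the client side roughly as follows. This is a sketch under stated assumptions: BASE_URL is a made-up host, the two helper functions are hypothetical, and keeping colons unescaped in identifiers is a readability choice rather than something the specification mandates.

```python
from urllib.parse import quote, urlencode

BASE_URL = "http://example.org"  # hypothetical; specific to the implementing service


def data_resource_uri(data_type, identifier):
    """URI of a single data resource: BASE_URL/phylows/<type>/<identifier>."""
    # Identifiers such as LSIDs contain colons; leave those readable and
    # percent-encode everything else that is not URL-safe.
    return f"{BASE_URL}/phylows/{data_type}/{quote(identifier, safe=':')}"


def operation_uri(operation, data_type, **params):
    """URI of an operation resource: BASE_URL/phylows/<operation>/<type>/?<params>."""
    uri = f"{BASE_URL}/phylows/{operation}/{data_type}/"
    return f"{uri}?{urlencode(params)}" if params else uri


print(data_resource_uri("tree", "TreeBASE:S123455"))
print(data_resource_uri("node", "urn:lsid:phylodb.org:node:123456"))
print(operation_uri("find", "tree", name="Primates"))
```

The same data resource URI serves for reading and deleting; only the HTTP method sent with it differs.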

Specification

Conventions:

  • <value> is a placeholder and must be replaced with an actual value when accessing the resource (i.e., typically a mandatory parameter)
  • {val1|val2|val3} must be replaced with exactly one of the values (literals) between the curly braces, delimited by the vertical bar
  • [param=<value>] or [param] denote optional parameters (with or without value, respectively)
  • [param={val1|val2|val3}] denotes an optional parameter with the default value in bold font.

Phylodb: Phylogenetic Tree Database

Retrieve:

Task: Retrieve tree and tree metadata

  • Query URI: /phylows/tree/<identifier>/?[metadata={true|false}]&[topology={true|false}]&[{recordSchema|format}=<format>]
  • identifier is a valid and unique identifier of the tree, for example a namespace:ID specification, a primary key, or an LSID
  • If metadata is true (the default), all metadata of the tree will be returned.
  • If topology is false, the topology (nodes and edges) of the tree will not be returned, and neither will data that are only required if the tree nodes were included (such as <otu> elements).
  • format designates the desired response format. Example formats are nhx (New Hampshire Extended) and nexml (default). If the data provider doesn't support the requested format, an error will result.
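A client might build the tree retrieval URI as sketched below. The function name, the example host, and the decision to omit parameters the caller did not set (so that the server applies its own defaults) are assumptions made for illustration.

```python
from urllib.parse import quote, urlencode


def tree_query_uri(base_url, identifier, metadata=None, topology=None, fmt=None):
    """Build /phylows/tree/<identifier>/ with only the parameters the caller set.

    Parameters left as None are omitted, leaving the defaults to the server.
    """
    params = {}
    if metadata is not None:
        params["metadata"] = "true" if metadata else "false"
    if topology is not None:
        params["topology"] = "true" if topology else "false"
    if fmt is not None:
        params["format"] = fmt  # the specification also accepts the name recordSchema
    uri = f"{base_url}/phylows/tree/{quote(identifier, safe=':')}/"
    return f"{uri}?{urlencode(params)}" if params else uri


# Ask for the metadata only (no nodes or edges), serialized as NeXML.
print(tree_query_uri("http://example.org", "TreeBASE:S123455", topology=False, fmt="nexml"))
```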

Task: Retrieve all metadata for a node

  • Query URI: /phylows/node/<identifier>/?[{recordSchema|format}=<format>]
  • identifier is a valid and unique identifier of the node, for example a namespace:ID specification, a primary key, or an LSID
  • format designates the desired response format. Example formats are nhx (New Hampshire Extended) and nexml (default). If the data provider doesn't support the requested format, an error will result.

Task: Find trees by metadata

  • Query URI: /phylows/find/tree/?[query=<CQL query>]&[recordSchema=<format>]&[operation=searchRetrieve]&[version=1.2]
  • This aims to be an SRU 1.2-compliant query service (though it may not be entirely).
  • CQL query is a valid CQL v1.2 query specification. A PhyloWS server is expected to support at least Level 0. Possible metadata elements are at least nexml.treeName and nexml.treeNamespace, and any other metadata elements that the data provider may have recorded.
  • format designates the desired response format. At least nexml should be supported, and is the default (note that SRU allows the server to determine the default format). If the data provider doesn't support the requested format, an error will result.
  • operation=searchRetrieve is mandated by SRU 1.2, but is optional here. The same goes for version=1.2 (versions lower than 1.1 are not supported).
  • Unsupported queries need to result in a properly formatted diagnostic response.
  • The response will only return the trees with names and identifiers, not all metadata or the topology. The response body is formatted according to the SRU Response format, and records are returned within the body in the requested recordSchema (provided it is supported).
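The query URI for a metadata search could be put together as in the following sketch. The function name, the sru_strict flag, and the example host are hypothetical. The CQL string uses the nexml.treeName element named above in an index/relation/term clause, which goes beyond a bare Level 0 term and so may not be accepted by every server.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def find_trees_uri(base_url, cql_query, record_schema=None, sru_strict=False):
    """Build a /phylows/find/tree/ query URI.

    With sru_strict=True the operation and version parameters that SRU 1.2
    mandates are sent as well; PhyloWS treats them as optional.
    """
    params = {}
    if sru_strict:
        params["operation"] = "searchRetrieve"
        params["version"] = "1.2"
    params["query"] = cql_query
    if record_schema is not None:
        params["recordSchema"] = record_schema
    # urlencode percent-encodes the spaces, '=' and quotes inside the CQL string.
    return f"{base_url}/phylows/find/tree/?{urlencode(params)}"


uri = find_trees_uri("http://example.org", 'nexml.treeName = "Primates"', record_schema="nexml")
print(uri)

# Decoding the query string recovers the original CQL query unchanged.
print(parse_qs(urlsplit(uri).query)["query"][0])
```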

Task: Discovery of data provider capabilities

  1. Discovery of supported tree metadata elements
    • Query URI: /phylows/provider/metadata/tree
    • Response is an XML-formatted list of supported metadata elements. This needs to be defined. Reuse NeXML? Or another standard?
  2. Discovery of supported node metadata elements
    • Query URI: /phylows/provider/metadata/node
    • Response is an XML-formatted list of supported metadata elements. This needs to be defined. Reuse NeXML? Or another standard?
  3. Discovery of supported tree and clade formats
    • Query URI: /phylows/provider/formats/tree
    • Response is an XML-formatted list of supported formats for trees (and clades). This needs to be defined. Reuse NeXML? Or another standard?


Create, Update, Delete (only for databases supporting write-access)

Task: Create one or more trees in the database

  • Query URI: /phylows/create
  • Uses POST as the HTTP method.
  • Input: a NeXML file with one or more trees to be created. If the file contains metadata (for the tree, nodes, or edges), the metadata will be stored, too.
  • Output: success status and list of URIs that were created
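A create request might be prepared as sketched here using Python's standard library. The request is only constructed, not sent. The function name, the example host, and the Content-Type header value are assumptions, since the specification does not state a media type, and the payload is a placeholder rather than a valid NeXML document.

```python
import urllib.request


def build_create_request(base_url, nexml_bytes):
    """Prepare (but do not send) a POST of a NeXML document to /phylows/create."""
    return urllib.request.Request(
        url=f"{base_url}/phylows/create",
        data=nexml_bytes,
        method="POST",
        headers={"Content-Type": "application/xml"},  # assumed media type
    )


# Placeholder payload; a real call would carry a NeXML file with one or more trees.
payload = b"<!-- NeXML document with one or more trees goes here -->"
req = build_create_request("http://example.org", payload)
print(req.get_method(), req.full_url)
```

Passing the prepared request to urllib.request.urlopen would send it; per the task description, the response is expected to carry the success status and the list of created URIs.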

Task: Update existing trees in the database

  • Query URI: /phylows/tree/<identifier>/?[removeObsolete={true|false}]
  • Uses PUT as the HTTP method.
  • Input: a NeXML file with the tree to be updated. If there is more than one tree in the input, the one whose identifier matches that of the tree at the given query URI will be used; if none matches, an error will result.
  • Each attribute of the tree will supplant the value in the database (if any). Nodes and edges not yet in the database will be added, and those that are present will have their attributes and metadata updated.
  • If removeObsolete is true, nodes present in the database but not in the input document will be treated as obsolete and deleted from the database.
  • If the file contains metadata (for the tree, nodes, or edges), each metadata item will update existing metadata that has the same metadata term and is attached to the same object, and will be added otherwise.
  • Output: success status
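The update rules above can be illustrated with an in-memory sketch. It assumes trees are plain dictionaries of attributes and nodes (edges and metadata are left out for brevity), and it reads removeObsolete as deleting nodes that are stored but absent from the input document. The function and the data layout are hypothetical, not a prescribed implementation.

```python
def update_tree(stored, incoming, remove_obsolete=False):
    """Merge `incoming` into `stored` following the PUT rules above.

    Both trees are dicts: {"attributes": {...}, "nodes": {node_id: {...}}}.
    Returns a new dict; the inputs are not modified.
    """
    merged = {
        # Incoming attribute values supplant stored ones; the rest are kept.
        "attributes": {**stored["attributes"], **incoming["attributes"]},
        "nodes": {},
    }
    for node_id, node in stored["nodes"].items():
        if node_id in incoming["nodes"]:
            # Node exists on both sides: update its attributes.
            merged["nodes"][node_id] = {**node, **incoming["nodes"][node_id]}
        elif not remove_obsolete:
            # Node is missing from the input but obsolete nodes are kept.
            merged["nodes"][node_id] = dict(node)
        # Otherwise the node is treated as obsolete and dropped.
    for node_id, node in incoming["nodes"].items():
        if node_id not in stored["nodes"]:
            # Node is new: add it.
            merged["nodes"][node_id] = dict(node)
    return merged


stored = {"attributes": {"label": "old"},
          "nodes": {"n1": {"label": "A"}, "n2": {"label": "B"}}}
incoming = {"attributes": {"label": "new"},
            "nodes": {"n1": {"label": "A2"}, "n3": {"label": "C"}}}

print(sorted(update_tree(stored, incoming)["nodes"]))                        # ['n1', 'n2', 'n3']
print(sorted(update_tree(stored, incoming, remove_obsolete=True)["nodes"]))  # ['n1', 'n3']
```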

Task: Delete a tree from the database

  • Query URI: /phylows/tree/<identifier>
  • Uses DELETE as the HTTP method.
  • Output: success status

Task: Delete a node from the database

  • Query URI: /phylows/node/<identifier>
  • Uses DELETE as the HTTP method.
  • Output: success status

Phylodb: Phylogenetic Data Matrix Database

Retrieve:

Task: Retrieve matrix and matrix metadata

  • Query URI: /phylows/matrix/<identifier>/?[metadata={true|false}]&[data={true|false}]&[{recordSchema|format}=<format>]
  • identifier is a valid and unique identifier of the matrix, for example a namespace:ID specification, a primary key, or an LSID
  • If metadata is true (the default), all metadata of the matrix (i.e., for characters, character states, and OTUs) will be returned.
  • If data is false, the values of character states (the cells of the matrix) will not be returned, nor will metadata for any matrix cells.
  • format designates the desired response format. NeXML is the default. If the data provider doesn't support the requested format, an error will result.
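On the server side, the optional parameters of the matrix query might be resolved as in the sketch below. The defaults for metadata (true) and format (nexml) follow the text above. The default of true for data, the precedence of format over recordSchema when both are given, and the function names are assumptions. The format name nexus in the usage example is purely illustrative and not listed in this specification.

```python
from urllib.parse import parse_qs


def _parse_bool(value):
    """Accept only the literals 'true' and 'false', as in the query URI template."""
    if value == "true":
        return True
    if value == "false":
        return False
    raise ValueError(f"expected 'true' or 'false', got {value!r}")


def parse_matrix_options(query_string):
    """Resolve the optional matrix query parameters, applying defaults."""
    raw = parse_qs(query_string)

    def first(name, default):
        # parse_qs maps each name to a list of values; take the first one.
        return raw.get(name, [default])[0]

    return {
        "metadata": _parse_bool(first("metadata", "true")),
        "data": _parse_bool(first("data", "true")),  # assumed default
        # Either parameter name may carry the format; `format` wins if both are set.
        "format": first("format", first("recordSchema", "nexml")),
    }


print(parse_matrix_options("data=false"))
print(parse_matrix_options("recordSchema=nexus&metadata=false"))
```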

Task: Find matrices by metadata

  • Query URI: /phylows/find/matrix/?[query=<CQL query>]&[recordSchema=<format>]&[operation=searchRetrieve]&[version=1.2]
  • This aims to be an SRU 1.2-compliant query service (though it may not be entirely).
  • CQL query is a valid CQL v1.2 query specification. A PhyloWS server is expected to support at least Level 0. Possible metadata elements are at least nexml.characters.label, and any other metadata elements that the data provider may have recorded.
  • format designates the desired response format. At least nexml should be supported, and is the default (note that SRU allows the server to determine the default format). If the data provider doesn't support the requested format, an error will result.
  • operation=searchRetrieve is mandated by SRU 1.2, but is optional here. The same goes for version=1.2 (versions lower than 1.1 are not supported).
  • Unsupported queries need to result in a properly formatted diagnostic response.
  • The response will only return the matrices (formally, the "characters block", as defined by the /nexml/characters element) with names and identifiers, not all metadata or the actual matrix data. The response body is formatted according to the SRU Response format, and records are returned within the body in the requested recordSchema (provided it is supported).

Task: Discovery of data provider capabilities

  1. Discovery of supported data matrix metadata elements
    • Query URI: /phylows/provider/metadata/matrix
    • Response is an XML-formatted list of supported metadata elements. This needs to be defined. Reuse NeXML? Or another standard?
  2. Discovery of supported data matrix formats
    • Query URI: /phylows/provider/formats/matrix
    • Response is an XML-formatted list of supported formats for data matrices ("character blocks"). This needs to be defined. Reuse NeXML? Or another standard?


PhyloConv: Phyloinformatics Conversion Services

Task: Convert phylogenetic data