PhyloWS/REST

From Evolutionary Informatics Working Group

Specification for a REST-based instantiation of the PhyloWS API.

Disclaimer: Note that this is pre-alpha, work in progress, and not even nearly finished. Any comments highly appreciated.

Principles

HTTP method   CRUD operation
POST          Create
GET           Read
PUT           Create, Update
DELETE        Delete

(From the Wikipedia entry on REST.)
  1. RESTful queries should be stateless, hence all needed input needs to be provided in a single call, rather than "accumulating" the input (if there are multiple pieces of input) over multiple calls.
  2. The architecture builds on the principle of viewing data and operations as resources, which get created, modified (delete, update), or retrieved.
  3. The allowable HTTP method (GET, POST, PUT, DELETE) depends on the type of CRUD operation being represented by the resource call. For example, resource calls that simply retrieve data without also creating a resource should use the GET method, and not POST.
    Note that a resource may be created virtually; for example, calling a resource might start a calculation and return the results directly (rather than returning a URL to the results, which would be more appropriate). Nonetheless, the calculation is still being created.
  4. The API should also be described as a WSDL document. In fact, WSDL 2.0 allows binding to HTTP methods and supports a RESTful interface.
  5. We try to limit the invention of new standards as much as possible. For example, we try to reuse SRU to the extent that this is possible and useful, rather than reinventing a query syntax that would simply be a subset of SRU in a different syntax.

Basic structure

  1. All resource URIs start with BASE_URL/phylows/. BASE_URL is specific to the implementing service.
  2. Optional resources may not be implemented.
  3. The first path element designates the type of data the resource points to, or the operation. (Shouldn't these data types be governed by CDAO? -rav)
    • Examples for data: /phylows/tree, /phylows/clade, /phylows/node
    • Examples for operation: /phylows/aggregate, /phylows/find
  4. The second path element depends on whether the resource points to a data resource or to an operation:
    • Data resources: The second path element gives the unique identifier of the data resource. This need not (and arguably should not) be the primary key of the resource in the provider database, but could also be an accession number-type identifier.
    • Operation resources: For operations that act on or return possibly multiple resources (data elements), the second path element gives the data type being acted on, or being returned. Subsequent path elements specify the query path, with final parameters giving query parameters.
      • Examples: /phylows/find/tree, with additional parameters specifying the query, e.g.: /phylows/find/tree/?name=Primates
  5. Operations acting on a single identified data resource use the URI of the data resource, and express the action as the HTTP method, possibly in combination with input parameters.
    • Examples: /phylows/tree/TreeBASE:S123455 with HTTP method DELETE is a request to delete the respective tree. /phylows/tree/TreeBASE:S123455/node/urn:lsid:phylodb.org:node:123456 with HTTP method DELETE is a request to prune the respective node from the respective tree.
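
The path rules above can be sketched as two small URI builders, one for data resources and one for operations. This is only an illustration: the BASE_URL value and the function names data_uri and operation_uri are assumptions made for this example and are not part of the specification.

```python
from urllib.parse import quote, urlencode

BASE_URL = "http://example.org"  # hypothetical; specific to the implementing service


def data_uri(*path_elements: str) -> str:
    """Build a data-resource URI such as BASE_URL/phylows/tree/<identifier>."""
    # Percent-encode each path element, but keep ':' readable as in the examples.
    encoded = [quote(element, safe=":") for element in path_elements]
    return f"{BASE_URL}/phylows/" + "/".join(encoded)


def operation_uri(operation: str, data_type: str, **params: str) -> str:
    """Build an operation URI such as BASE_URL/phylows/find/tree/?name=Primates."""
    uri = f"{BASE_URL}/phylows/{quote(operation)}/{quote(data_type)}/"
    return f"{uri}?{urlencode(params)}" if params else uri


tree_uri = data_uri("tree", "TreeBASE:S123455")
node_uri = data_uri("tree", "TreeBASE:S123455",
                    "node", "urn:lsid:phylodb.org:node:123456")
find_uri = operation_uri("find", "tree", name="Primates")
```

Sending DELETE to tree_uri would then request deletion of the tree, and DELETE to node_uri would request pruning of the node, with the action carried by the HTTP method rather than by the path.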

PhyloWS REST Specification

Conventions:

  • <value> is a placeholder and must be replaced with an actual value when accessing the resource (i.e., typically a mandatory parameter)
  • {val1|val2|val3} must be replaced with exactly one of the values (literals) between the curly braces, delimited by the vertical bar
  • [param=<value>] or [param] denote optional parameters (with or without value, respectively)
  • [param={val1|val2|val3}] denotes an optional parameter with the default value in bold font.

PhyloDB: Phylogenetic Tree Database

Retrieve:

Note that the API definition below specifically excludes authentication and authorization from its scope. Authenticating a user, and authorizing her for the requested data, need to use established standards for that, if read access to the data is not public.

Task: Retrieve tree and tree metadata

  • Query URI: /phylows/tree/<identifier>/?[metadata={true|false}]&[topology={true|false}]&[{recordSchema|format}=<format>]
  • identifier is a valid and unique identifier of the tree, for example a namespace:ID specification, a primary key, or an LSID
  • If metadata is true (the default), all metadata of the tree will be returned.
  • If topology is false, the topology (nodes and edges) of the tree will not be returned, and neither will data that are only required if the tree nodes were included (such as <otu> elements).
  • format designates the desired response format. Example formats are nhx (New Hampshire Extended) and nexml (default). If the data provider doesn't support the requested format, an error will result.
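
As a sketch of how a client might assemble this query, the helper below emits only the optional parameters the caller sets explicitly, so that the server's defaults apply to everything else. The function name tree_query_uri and its keyword arguments are hypothetical, and the example uses the format spelling of the {recordSchema|format} parameter.

```python
from typing import Dict, Optional
from urllib.parse import quote, urlencode


def tree_query_uri(identifier: str,
                   metadata: Optional[bool] = None,
                   topology: Optional[bool] = None,
                   fmt: Optional[str] = None) -> str:
    """Return the relative query URI for retrieving one tree."""
    params: Dict[str, str] = {}
    # Booleans are serialized as the literals 'true' and 'false'.
    if metadata is not None:
        params["metadata"] = "true" if metadata else "false"
    if topology is not None:
        params["topology"] = "true" if topology else "false"
    if fmt is not None:
        params["format"] = fmt
    uri = f"/phylows/tree/{quote(identifier, safe=':')}/"
    return f"{uri}?{urlencode(params)}" if params else uri


# Metadata only, in New Hampshire Extended format:
uri = tree_query_uri("TreeBASE:S123455", topology=False, fmt="nhx")
# uri == "/phylows/tree/TreeBASE:S123455/?topology=false&format=nhx"
```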

Task: Retrieve a clade (subtree) of a tree

  • Query URI: /phylows/tree/<identifier>/clade/<nodeID>?[metadata={true|false}]&[{recordSchema|format}=<format>]
  • identifier is a valid and unique identifier of the tree of which a subtree is to be returned, for example a namespace:ID specification, a primary key, or an LSID
  • nodeID is the root node of the clade (subtree) that should be returned.
  • If metadata is true (the default), all metadata of the tree, and those metadata of nodes that are within the subtree will be returned.
  • format designates the desired response format. Example formats are nhx (New Hampshire Extended) and nexml (default). If the data provider doesn't support the requested format, an error will result.

Task: Retrieve a clade (subtree) of a tree defined by MRCA

  • Query URI: /phylows/tree/<identifier>/clade/mrca/?includes=<nodeID1,nodeID2,...>&[excludes=<nodeID1,nodeID2,...>]&[metadata={true|false}]&[{recordSchema|format}=<format>]
  • identifier is a valid and unique identifier of the tree of which a subtree is to be returned, for example a namespace:ID specification, a primary key, or an LSID
  • nodeIDs are node identifiers
  • The includes parameter specifies which nodes are descended from the MRCA, and excludes specifies which ones are not descended from it (if any).
  • If metadata is true (the default), all metadata of the tree, and those metadata of nodes that are within the subtree will be returned.
  • format designates the desired response format. Example formats are nhx (New Hampshire Extended) and nexml (default). If the data provider doesn't support the requested format, an error will result.
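
To illustrate what a server might compute for this query, here is a minimal sketch that finds the MRCA of the included nodes in a tree stored as a child-to-parent map. It assumes a strict tree (one parent per node) and treats excludes as a constraint to be checked, which is only one possible reading of the parameter. The names ParentMap, ancestors, mrca and mrca_clade_root are made up for this example.

```python
from typing import Dict, List, Optional, Sequence

ParentMap = Dict[str, Optional[str]]  # node ID -> parent ID (None for the root)


def ancestors(parent: ParentMap, node: str) -> List[str]:
    """Return the path from node up to the root, starting with node itself."""
    path: List[str] = []
    current: Optional[str] = node
    while current is not None:
        path.append(current)
        current = parent[current]
    return path


def mrca(parent: ParentMap, includes: Sequence[str]) -> str:
    """Return the most recent common ancestor of the included nodes."""
    if not includes:
        raise ValueError("includes must name at least one node")
    common = ancestors(parent, includes[0])
    for node in includes[1:]:
        lineage = set(ancestors(parent, node))
        common = [n for n in common if n in lineage]
    return common[0]  # the first shared node on the way up is the most recent


def mrca_clade_root(parent: ParentMap,
                    includes: Sequence[str],
                    excludes: Sequence[str] = ()) -> str:
    """Return the clade root, rejecting it if an excluded node descends from it."""
    root = mrca(parent, includes)
    for node in excludes:
        if root in ancestors(parent, node):
            raise ValueError(f"excluded node {node!r} descends from the MRCA")
    return root


example = {"root": None, "A": "root", "B": "A", "C": "A", "D": "root"}
clade_root = mrca_clade_root(example, ["B", "C"], excludes=["D"])  # "A"
```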

Task: Retrieve all trees related to a data matrix

  • Query URI: /phylows/matrix/<matrixID>/trees/?[metadata={true|false}]&[topology={true|false}]&[{recordSchema|format}=<format>]
  • matrixID is a valid and unique identifier of the data matrix for which related trees are to be returned, for example a namespace:ID specification, a primary key, or an LSID
  • If metadata is true (the default), all metadata of the tree, and those metadata of nodes that are within the subtree will be returned.
  • It is up to the provider to decide what makes a tree connected to a data matrix. If the provider does not support such a connection (or doesn't support data matrices), then this should result in an empty result set, or in an error.
  • If there are no matching trees (but the operation is otherwise supported), this should result in a result document with an empty <trees> element.
  • format designates the desired response format. Example formats are nhx (New Hampshire Extended) and nexml (default). If the format is not NeXML, it has to support multiple trees in a single document. If the data provider doesn't support the requested format, an error will result.

Task: Retrieve all metadata for a node

  • Query URI: /phylows/tree/<identifier>/node/<nodeID>/?[{recordSchema|format}=<format>]
  • identifier is a valid and unique identifier of the tree, for example a namespace:ID specification, a primary key, or an LSID
  • nodeID is a valid and unique identifier of the node, for example a namespace:ID specification, a primary key, or an LSID
  • format designates the desired response format. Example formats are nhx (New Hampshire Extended) and nexml (default). If the data provider doesn't support the requested format, an error will result.

Task: Find trees by metadata

  • Query URI: /phylows/find/tree/?[query=<CQL query>]&[recordSchema=<format>]&[operation=searchRetrieve]&[version=1.2]
  • This aims to be an SRU 1.2 compliant query service (though it may not be entirely).
  • CQL query is a valid CQL v1.2 query specification. A PhyloWS server is expected to support at least Level 0. Possible metadata elements are at least nexml.treeName and nexml.treeNamespace, and any other metadata elements that the data provider may have recorded. Note that when used in REST form, the query must be URL-encoded, so a CQL query of "dc.title=lemur" would be encoded as "dc.title%3Dlemur".
  • format designates the desired response format. Servers are required to support at least two formats: dc (Dublin Core) and nexml. Dublin Core is the default. A Dublin Core response will only return tree labels and identifiers, not all metadata or the topology. A Nexml response will contain the entire tree along with complete metadata. If the data provider doesn't support the requested format, an error will result.
  • operation=searchRetrieve is mandated by SRU 1.2, but is optional here. The same goes for version=1.2 (versions lower than 1.1 are not supported).
  • Unsupported queries need to result in a properly formatted diagnostic response.

The response body is formatted according to the SRU Response format, and records are returned within the body in the requested recordSchema (provided it is supported).
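
The URL-encoding requirement for the CQL query can be sketched with the Python standard library as follows. The function name find_tree_uri is an assumption made for this example; only the query and recordSchema parameters are shown.

```python
from urllib.parse import quote, urlencode


def find_tree_uri(cql_query: str, record_schema: str = "") -> str:
    """Return the relative URI of a find-trees request with an encoded CQL query."""
    params = {"query": cql_query}
    if record_schema:
        params["recordSchema"] = record_schema
    # quote_via=quote percent-encodes spaces as %20 instead of '+'.
    return "/phylows/find/tree/?" + urlencode(params, quote_via=quote)


encoded = quote("dc.title=lemur", safe="")
# encoded == "dc.title%3Dlemur"

uri = find_tree_uri("dc.title=lemur", record_schema="nexml")
# uri == "/phylows/find/tree/?query=dc.title%3Dlemur&recordSchema=nexml"
```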

Task: Discovery of data provider capabilities

  1. Discovery of supported tree metadata elements
    • Query URI: /phylows/provider/metadata/tree
    • Response is an XML-formatted list of supported metadata elements. This needs to be defined. Reuse NexML? Or another standard?
  2. Discovery of supported node metadata elements
    • Query URI: /phylows/provider/metadata/node
    • Response is an XML-formatted list of supported metadata elements. This needs to be defined. Reuse NexML? Or another standard?
  3. Discovery of supported tree and clade formats
    • Query URI: /phylows/provider/formats/tree
    • Response is an XML-formatted list of supported formats for trees (and clades). This needs to be defined. Reuse NexML? Or another standard?

Find/search examples:

  1. Task: Find trees by nodes
    • Input: a list of node specifiers, and a designation of what the specifiers should match (node label, sequence ID, taxon, gene name)
  2. Task: Find trees by clade
    • Input: clade specification (phylocode)
  3. Task: Find, or filter trees matching a query topology.
    • The query topology might have polytomies, of which matching trees may be a specialization.
    • Input: A database (or result set) of trees, a query tree, and a distance metric
    • Output: The matching trees (names, identifiers), or alternatively the subtrees of matching trees projected onto the query topology

More advanced queries (not covered in the API yet)

  1. Task: Project tree to subtree induced by a set of nodes
    • Input: specifications of nodes (labels, identifiers) that induce a subtree
    • Output: the subtree induced by the specified nodes, with all other nodes pruned
  2. Task: Aggregate (summarize) trees
    • Input: a list of identifiers of trees, and an aggregation operation (#nodes, #internal nodes, #tips, length, height, balance, stemness, resolution)
    • Output: for each tree the requested aggregation result(s)
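
Since the aggregate operation is not yet specified, the following is only a rough sketch of what a few of the listed summaries could look like for a tree stored as a child-to-parent map. The function name tree_summary and the result keys are hypothetical, and height is taken here to mean the number of edges on the longest root-to-tip path, ignoring branch lengths.

```python
from typing import Dict, List, Optional


def tree_summary(parent: Dict[str, Optional[str]]) -> Dict[str, int]:
    """Compute simple aggregates for a tree given as node ID -> parent ID."""
    # Invert the parent map so that every node has a (possibly empty) child list.
    children: Dict[str, List[str]] = {node: [] for node in parent}
    for node, p in parent.items():
        if p is not None:
            children[p].append(node)

    tips = [node for node, kids in children.items() if not kids]

    def depth(node: str) -> int:
        """Count the edges between node and the root."""
        edges = 0
        while parent[node] is not None:
            node = parent[node]
            edges += 1
        return edges

    return {
        "nodes": len(parent),
        "tips": len(tips),
        "internal_nodes": len(parent) - len(tips),
        "height": max(depth(tip) for tip in tips),
    }


summary = tree_summary({"root": None, "A": "root", "B": "A", "C": "A", "D": "root"})
# summary == {"nodes": 5, "tips": 3, "internal_nodes": 2, "height": 2}
```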

Create, Update, Delete (only for databases supporting write-access)

Note that the API definition below specifically excludes authentication and authorization from its scope. Authenticating a user, and authorizing her for the requested operation, need to use established standards for that.

Task: Create one or more trees in the database

  • Query URI: /phylows/create/tree
  • Uses POST as the HTTP method.
  • Input: a NeXML file with one or more trees to be created. If the file contains metadata (for the tree, nodes, or edges), the metadata will be stored, too.
  • Output: success status and list of URIs that were created

Task: Update existing trees in the database

  • Query URI: /phylows/tree/<identifier>/?[removeObsolete={true|false}]
  • Uses PUT as the HTTP method.
  • identifier is a valid and unique identifier of the tree, for example a namespace:ID specification, a primary key, or an LSID
  • Input: a NeXML file with one tree to be updated. If there is more than one tree in the input, the one with the identifier matching the identifier of the tree at the given query URI will be used, and an error will result otherwise.
  • Each attribute of the tree will supplant the value in the database (if any). Nodes and edges not yet in the database will be added, and those that are present will have their attributes and metadata updated.
  • If removeObsolete is true, nodes not in the input document but present in the database will be treated as obsolete and deleted from the database.
  • If the file contains metadata (for the tree, nodes, or edges), the metadata will update existing metadata with the same metadata terms and objects they are attached to, and be added otherwise.
  • Output: success status
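
The update semantics described above can be sketched on a simplified in-memory model in which each node is a dictionary of attributes. The function name update_nodes is hypothetical, and the default of False for remove_obsolete is an assumption, since the default of removeObsolete is not stated here.

```python
from typing import Dict

NodeAttrs = Dict[str, str]


def update_nodes(stored: Dict[str, NodeAttrs],
                 incoming: Dict[str, NodeAttrs],
                 remove_obsolete: bool = False) -> Dict[str, NodeAttrs]:
    """Merge the nodes of an input document into the stored nodes of a tree."""
    # Work on a copy so that the stored state is not modified in place.
    result = {node_id: dict(attrs) for node_id, attrs in stored.items()}
    for node_id, attrs in incoming.items():
        # New nodes are added; attributes of existing nodes are supplanted.
        result.setdefault(node_id, {}).update(attrs)
    if remove_obsolete:
        # Nodes that are absent from the input are treated as obsolete.
        for node_id in list(result):
            if node_id not in incoming:
                del result[node_id]
    return result


stored = {"n1": {"label": "Homo", "support": "0.9"}, "n2": {"label": "Pan"}}
incoming = {"n1": {"label": "Homo sapiens"}, "n3": {"label": "Gorilla"}}

merged = update_nodes(stored, incoming)
pruned = update_nodes(stored, incoming, remove_obsolete=True)
```

In this example merged keeps n2 and the untouched support attribute of n1, whereas pruned drops n2 because it does not appear in the input.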

Task: Delete a tree from the database

  • Query URI: /phylows/tree/<identifier>
  • Uses DELETE as the HTTP method.
  • identifier is a valid and unique identifier of the tree, for example a namespace:ID specification, a primary key, or an LSID
  • Output: success status

Task: Delete a node in the database

  • Query URI: /phylows/tree/<identifier>/node/<nodeID>/?[simplify={true|false}]&[subtree={true|false}]
  • Uses DELETE as the HTTP method.
  • identifier is a valid and unique identifier of the tree, for example a namespace:ID specification, a primary key, or an LSID
  • nodeID is a valid and unique identifier of the node within the tree, for example a namespace:ID specification, a primary key, or an LSID.
  • If subtree is true all descendants of the node will be recursively deleted, too.
  • Deleting a node will cause each of its outgoing edges (if any, and if subtree is false) to be updated to come from its ancestor instead, and its incoming edges to be deleted. If a node has multiple ancestors, each outgoing edge will be duplicated for each ancestor.
  • If simplify is true, if deleting the node causes its parent node(s) to have only a single descendent, the deletion will recursively cascade to the parent node (but with subtree set to false).
  • Output: success status
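
The interaction of subtree and simplify may be easier to follow in code. Below is a minimal sketch on a child-to-parent map; it assumes every node has at most one parent (so the multiple-ancestor case is not modelled) and that both flags default to false, which the text above does not state. The class name Tree and its methods are made up for this example.

```python
from typing import Dict, List, Optional


class Tree:
    """A tree stored as node ID -> parent ID (None for the root)."""

    def __init__(self, parent: Dict[str, Optional[str]]) -> None:
        self.parent = dict(parent)

    def children(self, node: str) -> List[str]:
        return [n for n, p in self.parent.items() if p == node]

    def delete_node(self, node: str,
                    subtree: bool = False,
                    simplify: bool = False) -> None:
        parent = self.parent[node]
        for child in self.children(node):
            if subtree:
                # Recursively delete all descendants as well.
                self.delete_node(child, subtree=True)
            else:
                # Re-attach the outgoing edge to the deleted node's ancestor.
                self.parent[child] = parent
        del self.parent[node]
        # Cascade upwards if the parent is left with a single child.
        if simplify and parent is not None and len(self.children(parent)) == 1:
            self.delete_node(parent, subtree=False, simplify=True)


base = {"root": None, "A": "root", "B": "A", "C": "A", "D": "root"}

plain = Tree(base)
plain.delete_node("A")                  # B and C become children of root

whole = Tree(base)
whole.delete_node("A", subtree=True)    # A, B and C are all removed

tidy = Tree(base)
tidy.delete_node("B", simplify=True)    # A is left with only C, so A goes too
```

Note that deleting the root with subtree set to false would leave its children without a parent in this sketch; a real implementation would need to decide how to handle that case.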

Task: Graft a subtree in the database

  • Query URI: /phylows/tree/<identifier>/node/<nodeID>/?[simplify={true|false}]&targetTree=<identifier>&targetNode=<nodeID>
  • Uses DELETE as the HTTP method.
  • identifier is a valid and unique identifier of the tree, for example a namespace:ID specification, a primary key, or an LSID
  • nodeID is a valid and unique identifier of the node within the tree, for example a namespace:ID specification, a primary key, or an LSID.
  • Prunes the subtree rooted at and including the node identified by the URI and grafts it as a child of the node identified by the targetTree and targetNode parameters.
  • If simplify is true, if pruning the subtree causes the node's parent node(s) to have only a single descendent, the deletion will recursively cascade to the parent node (but with subtree set to false).
  • Output: success status.
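
As a rough sketch of the prune-and-graft step, the function below moves a subtree within a single tree stored as a child-to-parent map. Grafting into a different target tree and the simplify option are not modelled, and the function name graft is an assumption made for this example.

```python
from typing import Dict, Optional


def graft(parent: Dict[str, Optional[str]],
          node: str,
          target_node: str) -> Dict[str, Optional[str]]:
    """Return a new parent map with the subtree at node moved under target_node."""
    # Refuse to graft a subtree beneath one of its own nodes (this would form a cycle).
    current: Optional[str] = target_node
    while current is not None:
        if current == node:
            raise ValueError("target node lies inside the subtree being grafted")
        current = parent[current]
    result = dict(parent)
    # Descendants keep pointing at node, so the whole subtree moves with it.
    result[node] = target_node
    return result


before = {"root": None, "A": "root", "B": "A", "C": "A", "D": "root"}
after = graft(before, "A", "D")
# after == {"root": None, "A": "D", "B": "A", "C": "A", "D": "root"}
```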

PhyloDB: Phylogenetic Data Matrix Database

Retrieve:

Note that the API definition below specifically excludes authentication and authorization from its scope. Authenticating a user, and authorizing her for the requested data, need to use established standards for that, if read access to the data is not public.

Task: Retrieve matrix and matrix metadata

  • Query URI: /phylows/matrix/<identifier>/?[metadata={true|false}]&[data={true|false}]&[{recordSchema|format}=<format>]
  • identifier is a valid and unique identifier of the matrix, for example a namespace:ID specification, a primary key, or an LSID
  • If metadata is true (the default), all metadata of the matrix (i.e., for characters, character states, and OTUs) will be returned.
  • If data is false, the values of character states (the cells of the matrix) will not be returned, nor will metadata for any matrix cells.
  • format designates the desired response format. NeXML is the default. If the data provider doesn't support the requested format, an error will result.

Task: Retrieve all data matrices related to a tree

  • Query URI: /phylows/tree/<treeID>/matrices/?[metadata={true|false}]&[topology={true|false}]&[{recordSchema|format}=<format>]
  • treeID is a valid and unique identifier of the tree for which related data matrices are to be returned, for example a namespace:ID specification, a primary key, or an LSID
  • If metadata is true (the default), all metadata of the data matrix (i.e., for characters, character states, and OTUs) will be returned.
  • It is up to the provider to decide what makes a data matrix connected to a tree. If the provider does not support such a connection (or doesn't support trees), then this should result in an error.
  • If there are no matching data matrices (but the operation is otherwise supported), this should result in a result document with an empty <characters> element.
  • format designates the desired response format. Example formats are nhx (New Hampshire Extended) and nexml (default). If the format is not NeXML, it has to support multiple data matrices in a single document. If the data provider doesn't support the requested format, an error will result.

Task: Find matrices by metadata

  • Query URI: /phylows/find/matrix/?[query=<CQL query>]&[recordSchema=<format>]&[operation=searchRetrieve]&[version=1.2]
  • This aims to be an SRU 1.2 compliant query service (though it may not be entirely).
  • CQL query is a valid CQL v1.2 query specification. A PhyloWS server is expected to support at least Level 0. Possible metadata elements are at least nexml.characters.label, and any other metadata elements that the data provider may have recorded.
  • format designates the desired response format. At least nexml should be supported, and is the default (note that SRU allows the server to determine the default format). If the data provider doesn't support the requested format, an error will result.
  • operation=searchRetrieve is mandated by SRU 1.2, but is optional here. The same goes for version=1.2 (versions lower than 1.1 are not supported).
  • Unsupported queries need to result in a properly formatted diagnostic response.
  • The response will only return the matrices (formally, the "characters block", as defined by the /nexml/characters element) with names and identifiers, not all metadata or the actual matrix data. The response body is formatted according to the SRU Response format, and records are returned within the body in the requested recordSchema (provided it is supported).

Task: Discovery of data provider capabilities

  1. Discovery of supported data matrix metadata elements
    • Query URI: /phylows/provider/metadata/matrix
    • Response is an XML-formatted list of supported metadata elements. This needs to be defined. Reuse NexML? Or another standard?
  2. Discovery of supported data matrix formats
    • Query URI: /phylows/provider/formats/matrix
    • Response is an XML-formatted list of supported formats for data matrices ("character blocks"). This needs to be defined. Reuse NexML? Or another standard?

Create, Update, Delete (only for databases supporting write-access)

Note that the API definition below specifically excludes authentication and authorization from its scope. Authenticating a user, and authorizing her for the requested operation, need to use established standards for that.

Task: Create one or more data matrices in the database

  • Query URI: /phylows/create/matrix
  • Uses POST as the HTTP method.
  • Input: a NeXML file with one or more data matrices to be created. If the file contains metadata (for the matrix, characters, or character states), the metadata will be stored, too.
  • Output: success status and list of URIs that were created

Task: Update existing matrix in the database

  • Query URI: /phylows/matrix/<identifier>/?[removeObsolete={true|false}]
  • Uses PUT as the HTTP method.
  • identifier is a valid and unique identifier of the matrix, for example a namespace:ID specification, a primary key, or an LSID
  • Input: a NeXML file with one data matrix to be updated. If there is more than one data matrix in the input, the one with the identifier matching the identifier of the matrix at the given query URI will be used, and an error will result otherwise.
  • Each attribute of the data matrix will supplant the value in the database (if any). Characters, character states, and OTUs not yet in the database will be added (but new OTUs will not be added to a tree using the matrix), and those that are present will have their attributes and metadata updated.
  • If removeObsolete is true, characters and OTUs not in the input matrix but present in the database will be treated as obsolete and their associations with the matrix will be deleted from the database.
  • If the file contains metadata (for the character, character states, or OTUs), the metadata will update existing metadata with the same metadata terms and objects they are attached to, and be added otherwise.
  • Output: success status

Task: Delete a data matrix from the database

  • Query URI: /phylows/matrix/<identifier>
  • Uses DELETE as the HTTP method.
  • identifier is a valid and unique identifier of the matrix, for example a namespace:ID specification, a primary key, or an LSID
  • Output: success status

Task: Delete a character from the data matrix in the database

  • Query URI: /phylows/matrix/<identifier>/character/<charID>
  • Uses DELETE as the HTTP method.
  • identifier is a valid and unique identifier of the matrix, for example a namespace:ID specification, a primary key, or an LSID
  • charID is the identifier of a character in the given data matrix.
  • The character may be used in other data matrices; in this case, this query will not delete the character from the database or any other data matrix.
  • Output: success status

Task: Delete an OTU from the data matrix in the database

  • Query URI: /phylows/matrix/<identifier>/otu/<otuID>
  • Uses DELETE as the HTTP method.
  • identifier is a valid and unique identifier of the matrix, for example a namespace:ID specification, a primary key, or an LSID
  • otuID is the identifier of an OTU in the given data matrix.
  • The OTU may be used in other data matrices; in this case, this query will not delete the OTU from the database or any other data matrix.
  • Output: success status

PhyloConv: Phyloinformatics Conversion Services

Task: Convert phylogenetic data