Difference between revisions of "PhyloWS/REST"

From Evolutionary Informatics Working Group
<!--
# '''Task: Find trees by name or identifier'''
#* Input: one or more (partial) names, or identifiers, and optionally a namespace of matching trees
#* Output: names and identifiers of matching trees
#* ''Q: Should this also return metadata for each tree?''
# '''Task: Find trees by nodes'''
#* Input: a list of node specifiers, and a designation of what the specifiers should match (node label, sequence ID, taxon, gene name)
#* Output: names and identifiers of trees that each contain nodes matching the node specifiers
#* ''Q: Should this use a convention for encoding the type of specifier (such as namespace:value)?''
# '''Task: Find trees by clade'''
#* Input: clade specification (phylocode)
#* Output: names and identifiers of trees that each contain nodes with each of the labels
# '''Task: Find trees by metadata'''
#* Input: metadata constraints as (attribute, operator, value) structures
#* Output: names and identifiers of matching trees
#* ''Q: Should this also return complete metadata for each tree? Or only the metadata element by which it matched?''
#* ''Q: Should this borrow from or be based on [http://www.loc.gov/standards/sru/ SRU]? Or [http://www.opensearch.org/ OpenSearch]?''
# '''Task: Retrieve subtree or root node for matching clades'''
#* Input:
#** clade specification (identifier or label of clade root, phylocode specification)
#** whether to only return the root of the clade (MRCA query)
#** optionally, filter by namespace and name(s) (or identifier(s)) of trees
#* Output: matching clades as subtrees (with complete structure)
#* ''Q: Should this also return all metadata of all nodes in the clade, or would that require a separate request?''
# '''Task: Project tree to subtree induced by a set of nodes'''
#* Input: specifications of nodes (labels, identifiers) that induce a subtree
#* Output: the subtree induced by the specified nodes, with all other nodes pruned
#* ''Q: Should this also return all metadata of all nodes in the clade, or would that require a separate request?''
# '''Task: Find, or filter trees matching a query topology.'''
#* The query topology might have polytomies, of which matching trees may be a specialization.
#* Input: A database (or result set) of trees, a query tree, and a distance metric
#* Output: The matching trees (names, identifiers), or alternatively the subtrees of matching trees projected onto the query topology
# '''Task: Aggregate (summarize) trees'''
#* Input: a list of identifiers of trees, and an aggregation operation (#nodes, #internal nodes, #tips, length, height, balance, stemness, resolution)
#* Output: for each tree the requested aggregation result(s)

'''Create:''' (only for databases supporting write-access)
# '''Task: Create a tree in the database'''
#* Input: tree with metadata, nodes, node metadata, and structure
#* Output: success status
#* ''Q: should the input be in NexML format? Or NH? Or NHX? Or all of these?''

'''Update:''' (only for databases supporting write-access)
# '''Task: Prune clade from tree in the database'''
#* Input: identifier of root node of clade to be pruned, optionally identifier of node where to graft the clade
#* Output: success status
# '''Task: Reroot tree'''
#* Input: identifier of node that is to become the new root of its tree
#* Output: success status

'''Delete:''' (only for databases supporting write-access)
# '''Task: Delete tree from database'''
#* Input: list of 1 or more identifier(s) of trees to be deleted
#* Output: success status
-->

Revision as of 05:08, 14 February 2008

Specification for a REST-based instantiation of the PhyloWS API.

Disclaimer: This is a pre-alpha work in progress and far from finished. Comments are highly appreciated.

Principles

HTTP      CRUD
POST      Create, Update, Delete
GET       Read
PUT       Create, Update
DELETE    Delete

From the Wikipedia entry on REST.

  1. RESTful queries should be stateless, so all required input must be provided in a single call rather than accumulated over multiple calls (if there are multiple pieces of input).
  2. The architecture builds on the principle of viewing data and operations as resources, which get created, modified (deleted, updated), or retrieved.
  3. The allowable HTTP method (GET, POST, PUT, DELETE) depends on the type of CRUD operation that the resource call represents. For example, resource calls that simply retrieve data without also creating a resource should use the GET method, and not POST.
    Note that a resource may be created virtually: calling a resource might start a calculation and return the results directly (rather than returning a URL to the results, which would be more appropriate). Even so, the calculation is still a resource being created. Question: is the mapping PUT=create, GET=retrieve, POST=update, DELETE=delete? If so, what is PUT used for? Is a message body being submitted? I don't think so...
  4. The API should also be described as a WSDL document. In fact, WSDL 2.0 allows binding to HTTP methods and supports a RESTful interface.
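The HTTP/CRUD mapping listed above can be sketched as a small lookup, which makes principle 3 checkable in code. This is only an illustration: the names HTTP_TO_CRUD, methods_for, and is_allowed are hypothetical and not part of PhyloWS, and the mapping itself is simply the one quoted from Wikipedia, which the open question above may still revise.

```python
# Allowed CRUD operations per HTTP method, copied from the mapping listed above.
# All names here are hypothetical helpers, not part of the PhyloWS specification.
HTTP_TO_CRUD = {
    "POST": {"create", "update", "delete"},
    "GET": {"read"},
    "PUT": {"create", "update"},
    "DELETE": {"delete"},
}


def methods_for(crud_operation):
    """Return the sorted HTTP methods that may represent a CRUD operation."""
    op = crud_operation.lower()
    return sorted(method for method, ops in HTTP_TO_CRUD.items() if op in ops)


def is_allowed(http_method, crud_operation):
    """Check whether an HTTP method may be used for a CRUD operation."""
    allowed = HTTP_TO_CRUD.get(http_method.upper(), set())
    return crud_operation.lower() in allowed
```

Under this mapping, a pure retrieval is allowed only via GET, which matches the example given in principle 3.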

Basic structure

  1. All resource URIs start with BASE_URL/phylows/. BASE_URL is specific to the implementing service.
  2. Optional resources need not be implemented by every service.
  3. The first path element designates the type of data the resource points to, or the operation.
    • Examples for data: /phylows/tree, /phylows/clade, /phylows/node
    • Examples for operation: /phylows/aggregate, /phylows/find
  4. The second path element depends on whether the resource points to a data resource or to an operation:
    • Data resources: The second path element gives the unique identifier of the data resource. This need not (and arguably should not) be the primary key of the resource in the provider database, but could also be an accession number-type identifier.
    • Operation resources: For operations that act on or return possibly multiple resources (data elements), the second path element gives the data type being acted on or returned. Subsequent path elements specify the query path, and the query itself is given as trailing query parameters.
      • Example: /phylows/find/tree, with additional parameters specifying the query, e.g.: /phylows/find/tree/?name=Primates
  5. Operations acting on a single identified data resource use the URI of the data resource, and express the action as the HTTP method, possibly in combination with input parameters.
    • Examples: /phylows/tree/TreeBASE:S123455 with HTTP method DELETE is a request to delete the respective tree. /phylows/node/urn:lsid:phylodb.org:node:123456 with HTTP method DELETE is a request to prune the respective node from the tree it is in.
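The URI rules above could be sketched as follows, using only the Python standard library. BASE_URL and the helper names (data_uri, operation_uri, delete_request) are assumptions made for illustration; the sketch builds the request object but does not send it, and leaving colons unescaped in identifiers is a choice made here to match the examples, not something the text prescribes.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request

BASE_URL = "http://example.org"  # hypothetical; specific to the implementing service


def data_uri(data_type, identifier, base_url=BASE_URL):
    """URI of a single data resource: BASE_URL/phylows/<type>/<identifier>."""
    # Keep ':' unescaped so namespace:ID and LSID identifiers stay readable.
    return f"{base_url}/phylows/{data_type}/{quote(identifier, safe=':')}"


def operation_uri(operation, data_type, base_url=BASE_URL, **params):
    """URI of an operation resource, with the query given as parameters."""
    uri = f"{base_url}/phylows/{operation}/{data_type}/"
    if params:
        uri += "?" + urlencode(params)
    return uri


def delete_request(data_type, identifier, base_url=BASE_URL):
    """Build (but do not send) a DELETE request for one identified resource."""
    return Request(data_uri(data_type, identifier, base_url), method="DELETE")
```

For example, operation_uri("find", "tree", name="Primates") corresponds to the /phylows/find/tree/?name=Primates example, and delete_request("tree", "TreeBASE:S123455") expresses the delete action purely through the HTTP method, as rule 5 describes.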

Specification

Conventions:

  • <value> is a placeholder and must be replaced with an actual value when accessing the resource (i.e., typically a mandatory parameter)
  • {val1|val2|val3} must be replaced with exactly one of the values (literals) between the curly braces, delimited by the vertical bar
  • [param=<value>] or [param] denote optional parameters (with or without value, respectively)

Phylodb: Phylogenetic Tree Database

Retrieve:

Task: Retrieve tree and tree metadata

  • Query URI: /phylows/tree/<identifier>/?metadata={true|false}&topology={true|false}[&format=<format>]
  • identifier is a valid and unique identifier of the tree, for example a namespace:ID specification, a primary key, or an LSID
  • If metadata is true, all metadata of the tree will be returned. If topology is true, the topology (structure) of the tree will be returned.
  • format designates the desired response format. Example formats are nhx (New Hampshire Extended) and nexml (default). If the data provider doesn't support the requested format, an error will result.
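As a rough sketch of how a client might assemble the tree query URI described above: the function name tree_query_uri and the base URL in the usage note are hypothetical, and the format parameter is omitted entirely when not given, on the assumption that the provider then falls back to its default (nexml).

```python
from urllib.parse import quote, urlencode


def tree_query_uri(base_url, identifier, metadata=True, topology=True, fmt=None):
    """Build the query URI for retrieving a tree and/or its metadata."""
    params = [
        ("metadata", "true" if metadata else "false"),
        ("topology", "true" if topology else "false"),
    ]
    # format is optional; leave it out so the provider can apply its default.
    if fmt is not None:
        params.append(("format", fmt))
    path = f"{base_url}/phylows/tree/{quote(identifier, safe=':')}/"
    return path + "?" + urlencode(params)
```

Calling tree_query_uri("http://example.org", "TreeBASE:S123455", topology=False, fmt="nhx") would request only the metadata of that tree in NHX format.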

Task: Retrieve all metadata for one or more nodes

  • Query URI: /phylows/node/<identifier>/?[format=<format>]
  • identifier is a valid and unique identifier of the node, for example a namespace:ID specification, a primary key, or an LSID
  • format designates the desired response format. Example formats are nhx (New Hampshire Extended) and nexml (default). If the data provider doesn't support the requested format, an error will result.
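The node query and the format rule can be sketched in the same way. Everything named here (node_query_uri, resolve_format, UnsupportedFormatError, DEFAULT_FORMAT) is hypothetical; in particular, how a real provider signals an unsupported format is not specified above, so the exception merely stands in for "an error will result".

```python
from urllib.parse import quote, urlencode

DEFAULT_FORMAT = "nexml"  # the default named in the text above


class UnsupportedFormatError(ValueError):
    """Stand-in for the unspecified error a provider would return."""


def resolve_format(requested, supported):
    """Pick the response format, falling back to the default if none is given."""
    fmt = DEFAULT_FORMAT if requested is None else requested
    if fmt not in supported:
        raise UnsupportedFormatError(f"format not supported: {fmt}")
    return fmt


def node_query_uri(base_url, identifier, fmt=None):
    """Build the query URI for retrieving the metadata of one node."""
    uri = f"{base_url}/phylows/node/{quote(identifier, safe=':')}/"
    if fmt is not None:
        uri += "?" + urlencode({"format": fmt})
    return uri
```

A provider that supports only nexml would, under this sketch, accept a request without a format parameter and reject one asking for nhx.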