PhyloWS

From Evolutionary Informatics Working Group
Revision as of 18:41, 10 February 2008 by Hlapp

Phyloinformatics Web Services API: Overview

At present there is no standard web-service API for phylogenetic data that would allow phylogenetic data and service providers to be integrated into the programmable web. As a result, current approaches to integrating data and services into workflows are highly specific to the integration platform (CIPRES, BioPerl, Bio::Phylo, Kepler) and nearly unusable in other environments.

A web-service API standard would overcome this problem and make phylogenetic data and services universally available to any client application that supports the API. Reference implementations of the client API could simplify and promote its adoption.

Rather than proposing a particular implementation, this page gathers the requirements and use cases that such an API would have to fulfill.

Scope

If we define phyloinformatics as the informatics of managing, querying, and manipulating phylogenetic data, we can define the scope of PhyloWS along two axes.

  1. Possible scopes of operations:
    • Managing: storing (create), updating, deleting phylogenetic data
    • Querying: retrieval only
    • Manipulating: transforming query results (pruning, concatenating, on-the-fly supertree construction)
  2. Possible scopes of phylogenetic data types:
    • Phylogenetic trees
    • Character matrices (discrete, continuous, DNA, RNA, protein)
    • Transition models

Use Cases

Phylogenetic trees

Topological queries

  1. Find the most recent common ancestor (MRCA) of two or more leaf nodes in the specified tree
  2. Find the minimum spanning clade for two or more leaf nodes in the specified tree
  3. Obtain trees whose length is shorter or longer than that of an input tree, or than a given length
  4. Given a tip (or internal) node, find the tree with the shortest (or longest) root-to-tip (or root-to-node) distance
    • Alternatively, obtain the distribution of root-to-tip (or root-to-node) distances
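
The Python sketch below gives a rough illustration of queries 1 and 4. It computes the MRCA of a set of labelled nodes and the distribution of root-to-tip distances on a small in-memory tree. The Node class, its fields and the function names are hypothetical and not part of any PhyloWS specification. The minimum spanning clade of query 2 is simply the subtree rooted at the returned MRCA.

```python
# Illustrative sketch only: the node model and function names are hypothetical.
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)  # identity-based equality and hashing, so nodes can live in sets
class Node:
    name: str
    length: float = 0.0  # length of the branch leading to this node (assumed field)
    children: list[Node] = field(default_factory=list)
    parent: Optional[Node] = field(default=None, repr=False)

    def add(self, child: Node) -> Node:
        """Attach a child and return self, so calls can be chained."""
        child.parent = self
        self.children.append(child)
        return self


def find(root: Node, name: str) -> Node:
    """Depth-first search for the node carrying the given label."""
    stack = [root]
    while stack:
        node = stack.pop()
        if node.name == name:
            return node
        stack.extend(node.children)
    raise KeyError(name)


def path_to_root(node: Node) -> list[Node]:
    """The node itself, then each ancestor up to and including the root."""
    path = []
    current: Optional[Node] = node
    while current is not None:
        path.append(current)
        current = current.parent
    return path


def mrca(root: Node, names: list[str]) -> Node:
    """Deepest node that lies on the root path of every named node."""
    paths = [path_to_root(find(root, n)) for n in names]
    shared = set(paths[0]).intersection(*paths[1:])
    # paths[0] runs tip-to-root, so the first shared node is the deepest one
    return next(node for node in paths[0] if node in shared)


def root_distance(node: Node) -> float:
    """Sum of branch lengths between the node and the root."""
    return sum(n.length for n in path_to_root(node)[:-1])  # exclude the root itself


def root_to_tip_distances(root: Node) -> dict[str, float]:
    """Distribution of root-to-tip distances, keyed by leaf label."""
    stack, result = [root], {}
    while stack:
        node = stack.pop()
        if node.children:
            stack.extend(node.children)
        else:
            result[node.name] = root_distance(node)
    return result


# Example tree ((A:1,B:2)X:1,C:3)R
R, X = Node("R"), Node("X", 1.0)
R.add(X).add(Node("C", 3.0))
X.add(Node("A", 1.0)).add(Node("B", 2.0))
print(mrca(R, ["A", "B"]).name, root_to_tip_distances(R))
```

Parent pointers keep the MRCA lookup to a walk up from each tip. A service backed by a database would more likely use a precomputed index for this, such as nested-set or path labels.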

Character-based queries

  1. Find all clades in which all nodes have a given character
  2. Given characters X and Y, find the trees that support character X evolving before (or after) character Y
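
Query 2 presupposes that character states are known at internal nodes, for example from an ancestral-state reconstruction. The minimal Python sketch below makes that assumption and treats characters as simply present or absent. It counts a tree as supporting "X before Y" when X is already present in the parent of every node where Y first appears. The Node class, its traits field and the supports_before function are illustrative only and are not a proposed data model.

```python
# Illustrative sketch only: the node model and function names are hypothetical.
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)
class Node:
    name: str
    traits: frozenset[str] = frozenset()  # characters present here (assumed already known)
    children: list[Node] = field(default_factory=list)


def supports_before(root: Node, x: str, y: str) -> bool:
    """True if x is already present wherever y is gained (y present, parent lacks y)."""
    verdicts: list[bool] = []

    def visit(node: Node, parent: Optional[Node]) -> None:
        gained_y = y in node.traits and (parent is None or y not in parent.traits)
        if gained_y:
            # x must predate the branch on which y arose
            verdicts.append(parent is not None and x in parent.traits)
        for child in node.children:
            visit(child, node)

    visit(root, None)
    return bool(verdicts) and all(verdicts)  # no gain of y at all means no support


tree = Node("R", frozenset(), [
    Node("N1", frozenset({"X"}), [
        Node("A", frozenset({"X", "Y"})),
        Node("B", frozenset({"X"})),
    ]),
    Node("C", frozenset()),
])
print(supports_before(tree, "X", "Y"), supports_before(tree, "Y", "X"))
```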

Tree and node annotation queries

  1. Find all clades in which all nodes have a given annotation
    • For example, find all Drosophila species occurring in Hawaii
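
The Python sketch below shows the annotation query. It assumes each node carries a dictionary of key/value annotations and applies the test to the leaves of each clade, since the Hawaii example is about species. It reports only maximal clades, so the sub-clades of a matching clade are not listed again. The annotation key "location" and the species labels are hypothetical.

```python
# Illustrative sketch only: annotation keys, labels and function names are hypothetical.
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


@dataclass(eq=False)
class Node:
    name: str
    annotations: dict[str, str] = field(default_factory=dict)
    children: list[Node] = field(default_factory=list)


def all_leaves_match(node: Node, pred: Callable[[Node], bool]) -> bool:
    if not node.children:
        return pred(node)
    return all(all_leaves_match(c, pred) for c in node.children)


def maximal_clades(node: Node, pred: Callable[[Node], bool]) -> list[Node]:
    """Largest clades whose every leaf satisfies pred; their sub-clades are not repeated."""
    if all_leaves_match(node, pred):
        return [node]
    return [clade for c in node.children for clade in maximal_clades(c, pred)]


def in_hawaii(node: Node) -> bool:
    return node.annotations.get("location") == "Hawaii"


tree = Node("root", children=[
    Node("H1", children=[
        Node("Drosophila_a", {"location": "Hawaii"}),
        Node("Drosophila_b", {"location": "Hawaii"}),
    ]),
    Node("Drosophila_c", {"location": "Africa"}),
    Node("Drosophila_d", {"location": "Hawaii"}),
])
print([c.name for c in maximal_clades(tree, in_hawaii)])
```

To require the annotation on internal nodes as well, the predicate would also be applied to each internal node inside all_leaves_match.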

Filtering trees

  1. Filter a set of trees by topology: given a query topology, return all trees that match (or do not match) it under some distance metric.
    • The query topology may contain polytomies, in which case matching trees may be more resolved specializations of it
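
One common choice of distance metric is the Robinson-Foulds distance. It counts the clusters (leaf sets of internal nodes) that occur in one tree but not the other. The Python sketch below assumes rooted trees on the same leaf set, written as nested tuples of leaf names. The function names and the max_dist threshold are illustrative. Plain Robinson-Foulds counts a resolved polytomy as a difference, so matching the specializations of a polytomous query needs a one-sided comparison instead.

```python
# Illustrative sketch only: tree encoding and function names are hypothetical.
from __future__ import annotations

from typing import Union

Tree = Union[str, tuple]  # a leaf name, or a tuple of child subtrees


def clusters(tree: Tree) -> set[frozenset[str]]:
    """Leaf sets of all internal nodes except the root."""
    found: set[frozenset[str]] = set()

    def leaf_set(t: Tree) -> frozenset[str]:
        if isinstance(t, str):
            return frozenset([t])
        s = frozenset().union(*(leaf_set(c) for c in t))
        found.add(s)
        return s

    found.discard(leaf_set(tree))  # the root cluster is shared by every tree on these taxa
    return found


def rf_distance(a: Tree, b: Tree) -> int:
    """Number of clusters present in exactly one of the two trees."""
    return len(clusters(a) ^ clusters(b))


def filter_trees(trees: dict[str, Tree], query: Tree,
                 max_dist: int = 0, invert: bool = False) -> list[str]:
    """Names of trees within max_dist of the query (or beyond it, if invert)."""
    return [name for name, t in trees.items()
            if (rf_distance(t, query) <= max_dist) != invert]


query = (("A", "B"), "C", "D")
trees = {
    "t1": (("A", "B"), ("C", "D")),
    "t2": (("A", "C"), "B", "D"),
    "t3": (("A", "B"), "C", "D"),
}
print(filter_trees(trees, query, max_dist=1))
```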

Functions on trees

Modifying functions:

  • Pruning clades (hierarchical subsetting)
  • Rerooting trees
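
The minimal Python sketch below covers both modifying functions. It assumes uniquely labelled nodes and ignores branch lengths. prune removes the clade rooted at a given label, and reroot re-orients the tree so that a given node becomes the root. In both cases, internal nodes left with a single child are suppressed. The names are illustrative only.

```python
# Illustrative sketch only: node model and function names are hypothetical.
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)
class Node:
    name: str  # assumed unique within the tree
    children: list[Node] = field(default_factory=list)


def newick(n: Node) -> str:
    """Topology-only Newick string (without the trailing semicolon)."""
    if not n.children:
        return n.name
    return "(" + ",".join(newick(c) for c in n.children) + ")" + n.name


def prune(node: Node, clade: str) -> Optional[Node]:
    """Copy of the tree without the clade rooted at `clade`."""
    if node.name == clade:
        return None
    kept = [k for k in (prune(c, clade) for c in node.children) if k is not None]
    if node.children and not kept:
        return None  # everything below this node was removed
    if len(kept) == 1:
        return kept[0]  # suppress the now-unary node
    return Node(node.name, kept)


def reroot(root: Node, new_root: str) -> Node:
    """Treat the tree as undirected and orient every edge away from `new_root`."""
    neighbours: dict[str, list[str]] = defaultdict(list)

    def index(n: Node) -> None:
        for c in n.children:
            neighbours[n.name].append(c.name)
            neighbours[c.name].append(n.name)
            index(c)

    def orient(name: str, came_from: Optional[str]) -> Node:
        kids = [orient(nb, name) for nb in neighbours[name] if nb != came_from]
        if len(kids) == 1 and came_from is not None:
            return kids[0]  # e.g. the old root of a bifurcating tree
        return Node(name, kids)

    index(root)
    if new_root != root.name and new_root not in neighbours:
        raise KeyError(new_root)
    return orient(new_root, None)


tree = Node("R", [Node("X", [Node("A"), Node("B")]), Node("C")])
print(newick(prune(tree, "A")), newick(reroot(tree, "X")))
```

Both functions return new trees and leave the input untouched, which suits a read-only query service.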

Aggregating functions:

  • Counting functions (the number of matching trees, number of nodes in matching trees)
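
Counting can be sketched as a single pass over a result set. This would let a service report totals without returning the trees themselves. The nested-tuple tree encoding, the predicate and the result keys below are assumptions made for the example.

```python
# Illustrative sketch only: tree encoding, predicate and result keys are hypothetical.
from __future__ import annotations

from typing import Callable, Iterable, Union

Tree = Union[str, tuple]  # a leaf name, or a tuple of child subtrees


def count_nodes(tree: Tree) -> int:
    """Leaves count as one node; an internal node counts itself plus its subtrees."""
    if isinstance(tree, str):
        return 1
    return 1 + sum(count_nodes(c) for c in tree)


def leaf_names(tree: Tree) -> set[str]:
    if isinstance(tree, str):
        return {tree}
    return set().union(*(leaf_names(c) for c in tree))


def aggregate(trees: Iterable[Tree], matches: Callable[[Tree], bool]) -> dict[str, int]:
    """Number of matching trees and total number of nodes in them."""
    n_trees = n_nodes = 0
    for t in trees:
        if matches(t):
            n_trees += 1
            n_nodes += count_nodes(t)
    return {"matching_trees": n_trees, "nodes_in_matching_trees": n_nodes}


result_set = [(("A", "B"), "C"), ("A", "D"), ("C", "D")]
print(aggregate(result_set, lambda t: "A" in leaf_names(t)))
```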

Supertree functions:

  • Automated pruning-grafting supertree method
  • Min-cut supertree method
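
The min-cut supertree method extends the BUILD algorithm of Aho et al. BUILD assembles a rooted tree from rooted triplets (ab|c: a and b are closer to each other than either is to c) and fails when the triplets conflict. The Python sketch below implements plain BUILD only, not min-cut. At the point where BUILD gives up, min-cut would instead split the triplet graph by removing edges selected through minimum-weight cuts. Trees are nested tuples of leaf names, and the function names are illustrative.

```python
# Illustrative sketch of BUILD (not min-cut); encoding and names are hypothetical.
from __future__ import annotations

from collections import defaultdict
from itertools import combinations
from typing import Optional, Union

Tree = Union[str, tuple]  # a leaf name, or a tuple of child subtrees
Triplet = tuple[str, str, str]  # (a, b, c) encodes ab|c


def triplets(tree: Tree) -> set[Triplet]:
    """All rooted triplets displayed by a tree."""
    out: set[Triplet] = set()

    def walk(t: Tree) -> set[str]:
        if isinstance(t, str):
            return {t}
        below = [walk(c) for c in t]
        for i, inside in enumerate(below):
            outside = set().union(*(s for j, s in enumerate(below) if j != i))
            for a, b in combinations(sorted(inside), 2):
                out.update((a, b, c) for c in outside)
        return set().union(*below)

    walk(tree)
    return out


def build(taxa: set[str], trips: set[Triplet]) -> Optional[Tree]:
    """BUILD: a tree displaying every triplet, or None if they conflict."""
    if len(taxa) == 1:
        return next(iter(taxa))
    relevant = {t for t in trips if set(t) <= taxa}
    parent = {x: x for x in taxa}  # union-find over the triplet graph

    def find(x: str) -> str:
        while parent[x] != x:
            x = parent[x]
        return x

    for a, b, _ in relevant:
        parent[find(a)] = find(b)  # edge a-b: a and b must share a subtree
    groups: dict[str, set[str]] = defaultdict(set)
    for x in taxa:
        groups[find(x)].add(x)
    if len(groups) == 1:
        return None  # min-cut supertree would cut the graph here instead
    subtrees = [build(g, relevant) for g in groups.values()]
    return None if any(s is None for s in subtrees) else tuple(subtrees)


def canonical(t: Tree) -> Tree:
    """Sort children so equal topologies compare equal."""
    if isinstance(t, str):
        return t
    return tuple(sorted((canonical(c) for c in t), key=repr))


inputs = [(("A", "B"), "C"), (("C", "D"), "B")]
all_trips = set().union(*(triplets(t) for t in inputs))
taxa = {"A", "B", "C", "D"}
print(canonical(build(taxa, all_trips)))
```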

PhyloWS Requirements

Phylogenetic Tree API

  1. Task: Find trees
    • Input: one or more (partial) tree names or identifiers, and optionally a namespace
    • Output: names and identifiers of matching trees
  2. Task: Find trees by nodes
    • Input: a list of node labels
    • Output: names and identifiers of trees that contain a node with each of the labels
  3. Task: Find trees by clade
    • Input: clade specification (PhyloCode)
    • Output: names and identifiers of trees that contain the specified clade
  4. Task: Retrieve tree
    • Input: identifier of tree to be retrieved
    • Output: the tree (with complete structure)
  5. Task: Retrieve subtrees, or only the root nodes, of matching clades
    • Input:
      • clade specification (identifier or label of the clade root, or PhyloCode specification)
      • whether to return only the root of the clade (MRCA query)
      • optionally, filter by namespace and name(s) (or identifier(s)) of trees
    • Output: matching clades as subtrees (with complete structure)
  6. Task: Project tree to subtree induced by a set of nodes
    • Input: specifications of nodes (labels, identifiers) that induce a subtree
    • Output: the subtree induced by the specified nodes, with all other nodes pruned
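
Tasks 2 and 6 can be illustrated against a hypothetical in-memory store. The Python sketch below is not a proposed wire format. The store contents, tree identifiers and function names are made up, trees are nested tuples in which only leaves carry labels, and projection drops the unwanted leaves and then suppresses internal nodes left with a single child.

```python
# Illustrative sketch only: store layout, identifiers and function names are hypothetical.
from __future__ import annotations

from typing import Optional, Union

Tree = Union[str, tuple]  # a leaf label, or a tuple of child subtrees

# Hypothetical store: tree identifier -> tree
STORE: dict[str, Tree] = {
    "tree:1": ((("A", "B"), "C"), ("D", "E")),
    "tree:2": (("A", "C"), "F"),
}


def labels(tree: Tree) -> set[str]:
    if isinstance(tree, str):
        return {tree}
    return set().union(*(labels(c) for c in tree))


def find_trees_by_nodes(store: dict[str, Tree], wanted: list[str]) -> list[str]:
    """Task 2: identifiers of trees containing a node for every requested label."""
    return [tid for tid, t in store.items() if set(wanted) <= labels(t)]


def project(tree: Tree, keep: set[str]) -> Optional[Tree]:
    """Task 6: subtree induced by `keep`; all other leaves are pruned."""
    if isinstance(tree, str):
        return tree if tree in keep else None
    kids = [p for p in (project(c, keep) for c in tree) if p is not None]
    if not kids:
        return None
    return kids[0] if len(kids) == 1 else tuple(kids)  # suppress unary nodes


hits = find_trees_by_nodes(STORE, ["A", "C"])
print(hits, project(STORE["tree:1"], {"A", "C", "D"}))
```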

Tree Matching

  1. Task: Find or filter trees matching a query topology.
    • The query topology may contain polytomies, in which case matching trees may be more resolved specializations of it.
    • Input: A database (or result set) of trees, a query tree, and a distance metric
    • Output: The matching trees (names, identifiers), or alternatively the subtrees of matching trees projected onto the query topology
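
For the polytomy case, one way to test whether a candidate tree specializes the query works in two steps. First, restrict the candidate to the query's leaves. Then check that every cluster of the query also occurs in that restriction. For rooted trees, the clusters of the restricted tree are exactly the non-empty intersections of the candidate's clusters with the query's leaf set. The Python sketch below assumes nested-tuple trees with labelled leaves, and its function names are illustrative.

```python
# Illustrative sketch only: tree encoding and function names are hypothetical.
from __future__ import annotations

from typing import Union

Tree = Union[str, tuple]  # a leaf name, or a tuple of child subtrees


def leaf_set(tree: Tree) -> frozenset[str]:
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(c) for c in tree))


def all_clusters(tree: Tree) -> set[frozenset[str]]:
    """Leaf sets of every node, including single leaves and the root."""
    if isinstance(tree, str):
        return {frozenset([tree])}
    out = {leaf_set(tree)}
    for child in tree:
        out |= all_clusters(child)
    return out


def refines(candidate: Tree, query: Tree) -> bool:
    """True if the candidate, restricted to the query's leaves, resolves every query cluster."""
    q = leaf_set(query)
    if not q <= leaf_set(candidate):
        return False
    restricted = {c & q for c in all_clusters(candidate)} - {frozenset()}
    return all_clusters(query) <= restricted


def match_trees(trees: dict[str, Tree], query: Tree) -> list[str]:
    return [name for name, t in trees.items() if refines(t, query)]


query = (("A", "B"), ("C", "D", "E"))  # polytomy among C, D and E
trees = {
    "t1": ((("A", "B"), "X"), (("C", "D"), "E")),  # extra taxon X, polytomy resolved
    "t2": (("A", "C"), ("B", "D", "E")),
    "t3": (("A", "B"), ("C", "D")),  # lacks E
}
print(match_trees(trees, query))
```

Returning the projected subtrees instead of names would only require also building the induced subtree for each matching candidate.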

Tree Functions

PhyloCode

  • PhyloCode as phylogenetic clade query specification
  • Distinction between identifiers and specifiers
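
The Python sketch below illustrates PhyloCode definitions used as clade queries. It resolves two definition types on a single tree. A node-based definition gives the least inclusive clade containing all internal specifiers, which is the clade of their MRCA. A branch-based definition gives the most inclusive clade containing the internal specifiers but none of the external specifiers. Clades are returned as leaf sets. The nested-tuple encoding and the function names are assumptions, and apomorphy-based definitions are not covered.

```python
# Illustrative sketch only: tree encoding and function names are hypothetical.
from __future__ import annotations

from typing import Iterable, Optional, Union

Tree = Union[str, tuple]  # a leaf name, or a tuple of child subtrees


def all_clusters(tree: Tree) -> list[frozenset[str]]:
    """Leaf set of every node (any two of these sets are nested or disjoint)."""
    out: list[frozenset[str]] = []

    def walk(t: Tree) -> frozenset[str]:
        if isinstance(t, str):
            s = frozenset([t])
        else:
            s = frozenset().union(*(walk(c) for c in t))
        out.append(s)
        return s

    walk(tree)
    return out


def node_based(tree: Tree, internal: Iterable[str]) -> Optional[frozenset[str]]:
    """Least inclusive clade containing every internal specifier."""
    spec = set(internal)
    hits = [c for c in all_clusters(tree) if spec <= c]
    return min(hits, key=len, default=None)


def branch_based(tree: Tree, internal: Iterable[str],
                 external: Iterable[str]) -> Optional[frozenset[str]]:
    """Most inclusive clade containing the internal specifiers and no external ones."""
    inc, exc = set(internal), set(external)
    hits = [c for c in all_clusters(tree) if inc <= c and not (exc & c)]
    return max(hits, key=len, default=None)


tree = ((("A", "B"), "C"), ("D", "E"))
print(sorted(node_based(tree, ["A", "C"])), sorted(branch_based(tree, ["A"], ["C"])))
```

Because the clusters containing a given set of specifiers form a nested chain, the smallest and the largest qualifying clade are each unique.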

Example Resources

  • Phylota (~80,000 trees)
  • PhyloCode:
    • Regnum database (Sweden)
    • Sorano database (Chicago)