Talk:CDAO

From Evolutionary Informatics Working Group

discussions on ontology issues

tree topology

Francisco:

This would be the part corresponding to Brandon's tree (http://www.cs.nmsu.edu/~bchisham/ontology/doc/generated/owl/tree/), which seems to be one of the less developed parts of the ontology. We want to describe an evolutionary tree in ontology terms.

Regarding this aspect, we first thought of using standard graph-theory concepts to describe the tree, such as making each OLT a node related to the others by the connectors is_parent and is_child. After working through some examples, we found that these concepts do not allow the representation of unrooted evolutionary trees.
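The limitation can be illustrated with a minimal Python sketch. The node names (`A`, `n1`, and so on) and the helper functions are hypothetical and are not part of the ontology. A parent/child map always singles out one node with no parent, whereas an unrooted tree is more naturally stored as undirected edges.

```python
from collections import defaultdict

# Rooted view: each key is a child, each value its parent (is_parent / is_child).
parent_of = {"A": "n1", "B": "n1", "n1": "n2", "C": "n2"}

def roots(parent_map):
    """Return the nodes that have no parent in a child -> parent map."""
    nodes = set(parent_map) | set(parent_map.values())
    return sorted(node for node in nodes if node not in parent_map)

# Unrooted view: undirected edges, with no direction and therefore no root.
unrooted_edges = [("A", "n1"), ("B", "n1"), ("n1", "n2"), ("C", "n2"), ("D", "n2")]

def neighbors(edges):
    """Build a symmetric adjacency map from undirected edges."""
    adjacency = defaultdict(set)
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)
    return dict(adjacency)

print(roots(parent_of))                # the parent map implies a root
print(neighbors(unrooted_edges)["n1"]) # the edge list only records adjacency
```

Any attempt to express the second structure with is_parent and is_child alone requires choosing a direction for every edge, which amounts to choosing a root.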

So we suggest using the term *clade* to represent a cluster of two TUs (taxonomic units). All the TUs are linked to each other and to the clades they represent by relationships defined as is_sister_group. We believe that with only these terms it is possible to describe any consistent tree topology.

We can also add other related links, such as is_root or is_outgroup for a rooted tree, and something like is_polytomic when we don't know the exact phylogeny.
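As a rough illustration of the proposal, the following Python sketch lists is_sister_group pairs for a small tree. It assumes, purely for convenience, that the tree is supplied as nested 2-tuples of TU labels; the functions `label` and `sister_pairs` are hypothetical names, and "cluster" here follows the proposal's wording rather than the standard definition of a clade.

```python
def label(node):
    """Return a printable name for a TU (string) or a cluster (2-tuple)."""
    if isinstance(node, str):
        return node
    return "(" + ",".join(label(child) for child in node) + ")"

def sister_pairs(node):
    """List (left, right) is_sister_group pairs for a nested 2-tuple tree."""
    if isinstance(node, str):
        return []  # a single TU contributes no pair of its own
    left, right = node
    pairs = sister_pairs(left) + sister_pairs(right)
    pairs.append((label(left), label(right)))
    return pairs

tree = (("A", "B"), "C")
for left, right in sister_pairs(tree):
    print(f"{left} is_sister_group {right}")
```

Note that the nested input already fixes a grouping order, so this sketch does not by itself show that the scheme handles unrooted trees.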

Arlin:

I'm afraid that does not solve the problem, because you are using "clade" in a non-standard way.

In fact, you're right. The PhyloCode definition is: "a clade is an ancestor (an organism, population, or species) and all of its descendants" (http://www.ohiou.edu/phylocode/art1-3.html).

inferred character-state mappings

Francisco: For each character defined in our ontology, we may also define its evolutionary status for the clades under analysis. The statuses would be:

  • Apomorphy: A derived or specialised character.
  • Plesiomorphy: An ancestral or primitive character.
  • Synapomorphy: An apomorphy (derived or specialised character) shared by two or more groups which originated in their last common ancestor.
  • Symplesiomorphy: A character shared by a number of groups, but inherited from ancestors older than the last common ancestor.
  • Homoplasy: Character convergence not linked by vertical descent.

Arlin: That's fine (except the definition of symplesiomorphy) but I think this is venturing into an area that has low utility and is not fundamental.

transformation details

Francisco: Regarding molecular data, in order to describe how one sequence changes into another during the evolutionary process, we need to define which information we should store. This means adding data to Brandon's transformation ontology [1] saying, for example, that a duplication needs to have a size and a sequence of bases or amino acids, or something like this. In any case, it is necessary to define concepts (classes) to store information about these transformations.

In this sense we'll also need to define putative ancestral sequences (do we need use cases for this?) for each clade, in order to define the putative molecular events that transform the ancestor into the current sequence.

The transformation data will be attached to the information about the branches in the trees, describing exactly what kind of transformation changes a node into a leaf (or into another node).
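One way to picture this is the Python sketch below, which attaches transformation records to a branch. The class names, field names, and the tandem-duplication rule are all assumptions made for illustration; they are not taken from Brandon's transformation ontology.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Transformation:
    kind: str       # hypothetical label, e.g. "duplication"
    position: int   # 0-based start of the affected segment
    size: int       # number of bases or amino acids involved
    sequence: str   # the bases or amino acids involved

@dataclass
class Branch:
    ancestor: str    # name of the (possibly putative) ancestral node
    descendant: str  # name of the leaf or node the branch leads to
    transformations: List[Transformation] = field(default_factory=list)

def apply_duplication(seq: str, t: Transformation) -> str:
    """Insert a copy of the segment directly after itself (tandem duplication)."""
    end = t.position + t.size
    return seq[:end] + seq[t.position:end] + seq[end:]

dup = Transformation(kind="duplication", position=1, size=2, sequence="CG")
branch = Branch(ancestor="n1", descendant="A", transformations=[dup])
print(apply_duplication("ACGT", dup))
```

Storing the size and sequence on the record, as suggested above, makes each event self-describing even when the ancestral sequence is only hypothetical.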

generalizing the OTU concept

Francisco: "Besides these three points, we have also been discussing the concepts of OTU, OLT, etc. Since the evolution of genes may happen in parallel with the evolution of species, and since we might want to use the ontology to understand *gene family evolution* (instead of organisms' evolution), we suggest that a sequence may be considered an OTU (instead of an organism).

In the same way as an organism, a sequence has characters and character states, and it may be understood as an OTU when analyzing gene family evolution. Moreover, the ancestral sequences may be understood as HTUs (Hypothetical TUs), and they should also enter the analysis when working with the *transformation* sub-ontology."

Arlin: Yes, I agree that is a necessary generalization. It's already used in NEXUS files, for instance, where a "taxon" (the NEXUS word for an OTU) can be a gene, a protein, a species, a population, etc.
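The generalization discussed above might be modeled along these lines. This is only a sketch: `EntityKind`, `TaxonomicUnit`, and the example labels are hypothetical names, not terms from the ontology or from NEXUS.

```python
from dataclasses import dataclass
from enum import Enum

class EntityKind(Enum):
    SPECIES = "species"
    POPULATION = "population"
    GENE = "gene"
    PROTEIN = "protein"

@dataclass(frozen=True)
class TaxonomicUnit:
    label: str
    kind: EntityKind
    hypothetical: bool = False  # True for an inferred ancestor (HTU)

    @property
    def role(self) -> str:
        """Return "HTU" for hypothetical units and "OTU" for observed ones."""
        return "HTU" if self.hypothetical else "OTU"

units = [
    TaxonomicUnit("Homo sapiens", EntityKind.SPECIES),
    TaxonomicUnit("globin_gene_1", EntityKind.GENE),
    TaxonomicUnit("globin_ancestor", EntityKind.GENE, hypothetical=True),
]
for unit in units:
    print(unit.label, unit.kind.value, unit.role)
```

Keeping the kind of entity separate from the OTU/HTU role lets the same tree machinery serve both species phylogenies and gene family analyses.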