Concept Glossary

From Evolutionary Informatics Working Group
Revision as of 10:44, 27 September 2007 by Arlin.stoltzfus@nist.gov (talk)

Concept Glossary for the Domain of Evolutionary Analysis

Suggestions on providing feedback to improve the glossary

  • directly modify a definition to improve it (be prepared to defend your change by citing precedents)
  • define a term in the "undefined" table, then move it to the "defined" table
  • describe connections between terms (e.g., "see also"; "synonyms")
  • please help us by making your changes ATOMIC (one item at a time) and sticking to principles (next section)

Principles governing content

The principles that we aspire to follow in constructing the glossary are listed below. Comments on this part are welcome (e.g., "that principle is not relevant", "you aren't really following this principle", and so on).

  1. What is included
    • Terms that denote general concepts, not instances (e.g., Phylogeny Inference Method, but not "PAUP*")
    • Terms whose meaning is domain-specific (e.g., "integer" means the same to mathematicians and phylogeneticists, but "tree" does not)
    • Composite terms (terms made of other terms) ONLY when the meaning is unexpected
  2. How the definition is determined
    • By studying usage in journal articles and, in particular, books (e.g., Nei and Kumar; Li and Graur; Felsenstein)
    • By consulting domain experts and by soliciting feedback
    • By studying the use of terms in software and data interfaces
  3. How synonyms, ambiguities and overlaps are handled
    • In some cases we make a term domain-specific by qualifying it, as in "Phylogenetic tree" (not just "tree") or "Organismal taxonomy" (not just "taxonomy")
    • Where synonyms exist, we prefer the term
      • that is most widely used by domain experts
      • that conflicts least with familiar extra-domain meanings
    • So far we are sticking to major meanings and not paying much attention to minor meanings

Defined Concepts

Concept   Definition
Bifurcation, Bifurcating See Dichotomy, Dichotomous
Bipartition (Split) A partition of all OTUs (in the current analytical context) into two sets. Every Branch in a Phylogenetic Tree defines a Bipartition. Bipartitions are used when comparing tree topologies to assign Bootstrap Support Values or to identify shared features of Topology. Synonyms: Split
Bootstrap (Bootstrap confidence value; Bootstrap support value) Bootstrapping is a Resampling Method used to create pseudo-replicate data sets by drawing (with replacement) from the available data set. The fraction of bootstrap-resampled data sets in which an outcome is obtained is the Bootstrap Support Value for that outcome.
Branch (Edge) "Branch" is the typical domain-specific term for the edges of a Phylogenetic Tree. Branches may have properties such as length and degree of statistical support. Each Branch in a tree defines a Split or Bipartition; therefore Bipartition support values may be assigned to Branches. Synonyms: Edge (rare)
Character A character is a set of features related by homology, or (usually indistinguishably) it is the archetype or Platonic form underlying the "same" feature observed in different instances. In a sequence alignment or a Character-State Data Matrix, a character is a column. If the character has discrete states (e.g., present vs. absent; T, C, A and G), then it is a "discrete character" (likewise, a "continuous character" has continuous states). See also: Character-State; Character-State Data Matrix. Synonyms: Column (in some contexts)
Character-State The state of a Character for a given OTU. For instance, if Sequence 1 has a "G" in the 10th column of a sequence alignment, then "G" is the Character-State of the 10th Character for Sequence 1. Typically Character-States are observed values. However, the values of unobserved states, including Ancestral states as well as missing data, can be inferred using a Transition Model applied to a Phylogenetic Tree. Typically a Character-State is treated as a single definite value; however, in some instances it may be conceived as a set of values present in a population, a distribution of values, and so on (as allowed in the NEXUS definition of a Character-State Data Matrix).
Character-State Data Matrix A matrix of observed Character-State data. Synonyms: Character Data Matrix
Clade A Clade is a set of species (and by extension, a set of OTUs of any type) that includes all of the descendants of their Most Recent Common Ancestor. Every node in a Dichotomous Phylogenetic Tree defines a Clade of descendants. A Clade is not the same as a subtree (a clade is a set of OTUs; a subtree is a part of a tree). Synonyms: Monophyletic-holophyletic group (Systematics)
Dichotomy, Dichotomous A 2-fold branching. A Phylogenetic Tree has Dichotomous branching if each parent node has exactly two children. Synonyms: Bifurcation, Bifurcating
Distance Matrix A matrix of pairwise distances between OTUs, typically used in distance-based Phylogeny Inference Methods. A Distance matrix is not the same as a Character-State matrix.
Edge see Branch
Evolutionary Transition An evolutionary change. In the context of character analysis, an evolutionary transition is a change in the state of a character along a branch.
Fully Resolved A Phylogenetic Tree is said to be Fully Resolved if all its branchings are dichotomous. Trees with polytomies are said to be unresolved.
Gap The concept of a "gap" is ambiguous and is tied to the use of a "gap character" (often the hyphen "-") in text representations of sequence alignments. In general, the "gap" represents the absence of any positively diagnosed Character-State. The gap may be interpreted as an additional Character-State, as the absence of the Character, or as a missing or otherwise unknown value. See: Indel, Missing Data
General Time-Reversible Model (GTR) A Transition Model for nucleotide states that allows a separate parameter for each reversible rate N_i to N_j, i.e., one for each of the six unordered pairs of nucleotides (also called the 6-parameter model).
Homology Relationship of similarity due to inheritance from a common ancestor. A relationship of similarity that is due not to common ancestry but to Convergence is called Analogy.
Leaf node see Terminal Node
Likelihood Method A Phylogeny Inference Method in which the objective function used to find a most likely Phylogenetic Tree (and Transition Model) is the likelihood of the observed data conditional on a tree and a model.
Most Recent Common Ancestor The MRCA of a set of two or more species (or, by extension, any kind of OTU) is the most proximal ancestor (ancestral node on a Phylogenetic Tree) of which the species are descendants.
Neighbor-Joining Algorithm A distance-based Phylogeny Inference Method.
Operational Taxonomic Unit (OTU) The entities from which Character States are observed and taken as ground truths. In some cases the OTU may be a composite of data drawn from several sources. Note that the use of "taxon" for both an OTU and for a class in Organismal Taxonomy is a cause of confusion.
Organismal Taxonomy A classification of organismal species (and sub-species) consisting of a nested hierarchy of classes. Traditional Organismal Taxonomy includes named ranks of Kingdom, Phylum, Class, Order, Family, Genus, Species. Species are named by Genus and Species, thus Homo sapiens is the sapiens species of the genus Homo.
Outgroup 1. When used as a unary modifier, i.e., when a set of one or more OTUs is designated as "the outgroup", the outgroup is a set of OTUs assumed on prior grounds to be a phylogenetic outlier from the complementary "ingroup" consisting of all the other OTUs, that is, the ingroup and the outgroup are sister clades that represent two separate paths of descent from a common ancestor. Typically such an outgroup is designated for the purpose of Rooting a Phylogenetic Tree.
2. A secondary usage is to describe the relation of two sets A and B given a tree topology in which A and B are non-overlapping clades.
Parsimony Method A method for finding the minimum transitions to account for a character given a tree, and by extension, a character-based Phylogeny Inference Method in which the inferred tree (the "maximum parsimony tree") is the tree that minimizes transitions over all characters.
Phylogeny Broadly speaking, a phylogeny is the evolutionary history of some set of characters or OTUs. More narrowly, it is merely the Phylogenetic Tree representing paths of descent.
Phylogeny Inference Method A method of inferring an evolutionary history. Phylogeny Inference Methods may generate a Phylogenetic Tree as well as Reconstructed Ancestral Character-States, using inputs based on observed data. They fall into two broad classes: distance methods that use a Distance Matrix as input, and character-based methods that use a Character-State Data Matrix as input. Of the character-based methods, some are rule-based (Parsimony, Invariants), while others are probabilistic and depend on a Transition Model that must be specified explicitly (Likelihood, Bayesian).
Reconcile Tree (Reconciled Tree) The Phylogenetic Tree for a gene family may conflict with the tree for the implicated species due to duplications and deletions that occurred in its history. A Reconcile Tree (Reconciled Tree) is a special kind of Phylogenetic Tree that reconciles a gene tree with a species tree by means of hypothesized duplications and deletions. In principle, the concept of a "Reconcile Tree" might be extended to include the case of lateral gene transfer, but this has not been attempted.
Replacement In one sense, a synonym for Evolutionary Transition, and in another sense, a sub-class. In some contexts it is common for Evolutionary Transitions to be called "substitutions" when they refer to nucleotide states, and "replacements" when they refer to amino acid states (at one time, this was the official editorial policy of the journal Molecular Biology and Evolution).
Root The root node of a Phylogenetic Tree, i.e., the node with no parent.
State see Character-State
Step Matrix A matrix of positive integers representing the number of evolutionary "steps" between Character-States i and j. Used in the Phylogeny Inference Method called Parsimony, a counting method. Compare: Transition Model
Supertree A Phylogenetic Tree derived from a set of partially overlapping, smaller, "source" Phylogenetic Trees. Supertree methods are used when character methods on the complete set of data would be too computationally intensive, or when missing data would prevent an analysis using the same characters for all OTUs.
Taxon A class defined in a taxonomy. The domain-specific use implicates a class of organismal species, e.g., mammals and birds are animal taxa. Plural: Taxa.
Taxonomy see Organismal Taxonomy
Terminal Node A node of a tree that has no children. Typically terminal nodes in a tree correspond to OTUs with their observable properties, while internal nodes correspond to ancestors. However, in the case of simulations or evolution-in-the-lab, an internal node may be associated with known properties. Some Reconcile Trees have terminal nodes that represent inferred deletions. Synonyms: Leaf node
Topology (Phylogenetic Tree Topology) Typically the term "topology" applied to a Phylogenetic Tree is a reference to the connectivity of nodes in the tree, disregarding branch properties such as length.
Transition, disambiguation
  1. A purine-to-purine or pyrimidine-to-pyrimidine change in nucleotides: see Nucleotide Transition
  2. An Evolutionary Transition, a change in the state of a character: see also Transition Model
Transition Model A model of rates or probabilities of Evolutionary Transitions, typically defined for use in a first-order Markov transition model.
Tree (Phylogenetic Tree) A tree that represents evolutionary paths of descent-with-modification from common ancestors. Typically a Phylogenetic Tree is assumed to be a connected, directed graph in which nodes have no more than one parent and the directionality of each edge is from the root toward the terminal nodes. When domain scientists wish to relax these restrictions due to conditions of not knowing the root, or of allowing for multiple parentage (e.g., lateral transfer), they favor the term "network", though this again does NOT correspond to the graph-theory meaning of the same term.
Unresolved A tree with Polytomies is often said to be Unresolved, on the assumption that the true tree must be Dichotomous.
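
Several of the tree-related definitions above (Clade, Most Recent Common Ancestor, Bipartition, Fully Resolved) can be illustrated with a small sketch. It is not taken from any particular phylogenetics package: the four-OTU tree, the child-to-parent dictionary PARENT, and all function names are hypothetical choices made for illustration only.

```python
# Hypothetical toy tree ((A,B),(C,D)), stored as a child -> parent map.
# The node labels "n1", "n2" and "root" are assumptions for illustration.
PARENT = {"A": "n1", "B": "n1", "C": "n2", "D": "n2", "n1": "root", "n2": "root"}
OTUS = frozenset(["A", "B", "C", "D"])


def children(node):
    """Direct children of a node (empty for a terminal node)."""
    return sorted(c for c, p in PARENT.items() if p == node)


def clade(node):
    """The set of OTUs descended from a node (the node itself if terminal)."""
    kids = children(node)
    if not kids:
        return frozenset([node])
    return frozenset().union(*(clade(k) for k in kids))


def ancestors(node):
    """Path from a node up to the root, starting with the node itself."""
    path = [node]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def mrca(otus):
    """Most proximal node of which every OTU in otus is a descendant."""
    paths = [ancestors(o) for o in otus]
    common = set(paths[0]).intersection(*paths[1:])
    # Walking up from the first OTU, the first shared node is the MRCA.
    return next(n for n in paths[0] if n in common)


def bipartitions():
    """One split per branch: the OTUs below the branch vs. all other OTUs."""
    splits = set()
    for child in PARENT:  # each child identifies the branch to its parent
        below = clade(child)
        splits.add(frozenset([below, OTUS - below]))
    return splits


def is_fully_resolved():
    """True if every internal node has exactly two children."""
    return all(len(children(n)) == 2 for n in set(PARENT.values()))
```

In this sketch a clade is returned as a set of OTU labels rather than as a piece of the tree, which mirrors the distinction drawn above between a Clade and a subtree. Note also that the two branches attached to the root define the same Bipartition, so the six branches of the toy tree yield five distinct splits.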

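The matrix-related definitions above (Character-State Data Matrix, Bootstrap, Distance Matrix) can be sketched in the same spirit. The three-sequence alignment, the use of a simple proportion-of-differences distance, and the example outcome function are all assumptions chosen for illustration; real analyses would normally use an inferred tree feature as the outcome and a model-based distance.

```python
import random

# Hypothetical toy alignment: rows are OTUs, columns are Characters.
MATRIX = {
    "seq1": "ACGTAC",
    "seq2": "ACGTTC",
    "seq3": "TCGAAC",
}


def column(matrix, i):
    """Character-States of the i-th Character (0-based), one per OTU."""
    return {otu: states[i] for otu, states in matrix.items()}


def bootstrap_replicate(matrix, rng):
    """Pseudo-replicate data set: columns drawn with replacement."""
    n = len(next(iter(matrix.values())))
    picks = [rng.randrange(n) for _ in range(n)]
    return {otu: "".join(s[i] for i in picks) for otu, s in matrix.items()}


def p_distance(a, b):
    """Fraction of columns at which two aligned sequences differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def distance_matrix(matrix):
    """Pairwise distances between all OTUs, keyed by (otu_i, otu_j)."""
    otus = sorted(matrix)
    return {(i, j): p_distance(matrix[i], matrix[j]) for i in otus for j in otus}


def bootstrap_support(matrix, outcome, replicates=200, seed=0):
    """Fraction of bootstrap replicates in which outcome(replicate) holds."""
    rng = random.Random(seed)
    hits = sum(
        bool(outcome(bootstrap_replicate(matrix, rng))) for _ in range(replicates)
    )
    return hits / replicates


def seq1_closer_to_seq2(matrix):
    """Example outcome (an assumption): seq1 is nearer to seq2 than to seq3."""
    d = distance_matrix(matrix)
    return d[("seq1", "seq2")] < d[("seq1", "seq3")]
```

Calling bootstrap_support(MATRIX, seq1_closer_to_seq2) returns a value between 0 and 1. The outcome here is a distance comparison only to keep the sketch short; in practice the outcome is usually the presence of a particular Bipartition in the tree inferred from each replicate.
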
Undefined concepts

Concept   Definition
Ancestor
Cladogram
Coalescent Tree
Compatibility
Consensus Tree
Contrasts see Independent Contrasts Method
Convergence (Evolutionary convergence)
Cost Matrix cf. Step Matrix
Decay Index cf. Retention index, Compatibility
Dollo Parsimony A character-based Phylogeny Inference Method
Homoplasy
Horizontal Gene Transfer see Lateral Gene Transfer
Indel
Independent Contrasts Method
Lateral Gene Transfer
Missing Data
Molecular Clock
Nucleotide Transition ambiguous
Orthology
Parallelism (Evolutionary Parallelism) See also: Homoplasy
Paralogy
Polymorphism
Polytomy
Rank see Taxonomic Rank
Relative Rate Test
Substitution see Replacement
Taxonomic Rank
Unrooted


Additional terms from other domains whose meanings are consistent

  • From statistics and applied maths: likelihood ratio test; Akaike information criterion; odds ratio; bootstrap resampling; Monte Carlo
  • From biology, systematics, genetics and evolution: homoplasy, positive selection, cladistics, development, genotype, phenotype
  • From molecular biology and bioinformatics: alignment
  • From computer science and maths: HMM (Hidden Markov Model); Dynamic programming