Concept Glossary

From Evolutionary Informatics Working Group



This is a concept glossary for evolutionary comparative analysis (sometimes called "phylogenetic comparative analysis" or just "the comparative method" in evolutionary biology). By "evolutionary comparative analysis" we refer to the methods and foundational principles for interpreting similarities and differences as the outcome of an evolutionary process. The glossary is scoped to serve as an effective resource for a student or a researcher who wishes to interpret research publications, software documentation, and interfaces, without duplicating common meanings from other fields.

How to improve the glossary

  • define a term in the "undefined" section, then move it to the "defined" section
  • directly modify a definition to improve it
  • describe relations such as disjunction, synonymy, part_of and is_a, using wiki cross-refs like this:
    Insertion is disjoint to [[#Deletion]]
  • make it easier to maintain and disseminate this list by
    • making your changes ATOMIC (one item at a time)
    • maintaining the format
    • sticking to principles (next section)

Principles governing content

(note: email feedback on principles to Arlin)

  • What is included:
    • Terms that denote general concepts (e.g., Phylogeny Inference Method, but not MrBayes).
    • Terms that have a domain-specific meaning (e.g., "tree", "taxonomy").
    • Composite terms only when the meaning is unexpected, but not when the meaning is obvious (e.g., "non-synonymous" means "not synonymous")
  • What should not be included:
    • unique instances or particulars (e.g., PAUP*; we will include these at a later stage)
    • obscure, insiders-only jargon (e.g., "omega")
    • terms with common meaning well outside of the domain (e.g., "integer")
  • How the definition is determined
    • By studying usage in articles and books (e.g., Nei and Kumar; Li and Graur; Felsenstein)
    • By consulting domain experts and by soliciting feedback
    • By studying the use of terms in software and data interfaces
  • How synonyms, ambiguities and overlaps are handled
    • we may make a term domain-specific by qualifying it, as in "Phylogenetic tree" (not just "tree") or "Organismal taxonomy" (not just "taxonomy")
    • Where synonyms exist, we may choose the term
      • that is most widely used by domain experts
      • that conflicts least with familiar extra-domain meanings
    • We can decide later not to use a term that is too ambiguous
  • In the case of disputes over meanings
    • Open the topic for discussion on the Discussion page
    • Find examples that illustrate actual usage
    • Consider ways to replace a problematic term with alternatives that cover its meanings

Defined Concepts

Allele Fixation

From population genetics, fixation of an allele is the attainment of a frequency of 1, or more rigorously, attainment of a frequency approaching some point of stability that is equal to, or nearly equal to, 1.
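
The following minimal Python sketch illustrates fixation under #Random Genetic Drift by simulating a neutral Wright-Fisher population. The model, the function name `wright_fisher_until_absorption`, and the parameter values are assumptions made for illustration; the glossary does not prescribe them. In this model, frequencies of 0 and 1 are absorbing.

```python
import random


def wright_fisher_until_absorption(n_copies, p0, seed=1):
    """Neutral Wright-Fisher drift of one allele until fixation (p = 1) or loss (p = 0).

    n_copies is the number of gene copies per generation (2N for N diploids).
    Returns (final frequency, number of generations elapsed).
    """
    rng = random.Random(seed)
    count = round(p0 * n_copies)
    generations = 0
    while 0 < count < n_copies:
        p = count / n_copies
        # Each copy in the next generation descends from a randomly chosen parent copy,
        # so the new count is a binomial(n_copies, p) sample.
        count = sum(1 for _ in range(n_copies) if rng.random() < p)
        generations += 1
    return count / n_copies, generations


final_freq, gens = wright_fisher_until_absorption(n_copies=20, p0=0.5)
print(final_freq, gens)  # final_freq is 0.0 (loss) or 1.0 (fixation)
```

The sketch introduces no new #Mutation, so once fixed the allele stays fixed.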

Analogy

In common usage, "analogy" is the relation between two things (or processes) that have a similar pattern or co-relation among their parts (or sub-processes). In evolutionary biology, "analogy" has a peculiar meaning that is restricted to cases in which the things compared are not homologous. That is, homologous structures may have the same pattern or function, but they are not called "analogous". A moth's wing and a robin's wing are said to be analogous, whereas a robin's wing and a crow's wing are not.

Ancestor

An entity from whom features were inherited by some #OTU of current interest. A #Parent is the closest form of ancestor. Ancestry is the reverse relation of descendancy (see #Descendant). The nearest ancestral node on a tree is sometimes called an "immediate" ancestor. See #Most Recent Common Ancestor. The relationship of ancestry is transitive: if A is the ancestor of B, and B is the ancestor of C, then A is the ancestor of C.
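
The transitivity of ancestry falls out of following parent links repeatedly. The Python sketch below assumes a rooted tree stored as a dictionary that maps each non-root node to its single parent. The representation and the names `ancestors` and `parent` are hypothetical.

```python
def ancestors(node, parent):
    """Return every ancestor of `node`, nearest (the parent) first.

    `parent` maps each non-root node to its single parent. Following it
    repeatedly collects grandparents and beyond, which is why ancestry is transitive.
    """
    result = []
    while node in parent:
        node = parent[node]
        result.append(node)
    return result


# Hypothetical rooted tree: root -> X -> {A, B} and root -> C
parent = {"X": "root", "A": "X", "B": "X", "C": "root"}
print(ancestors("A", parent))  # ['X', 'root']
```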

Apomorphy

A #State of an #OTU that is a #Derived #State in the current context. Disjoint with #Plesiomorphy.

Atavism

see #Reversion

Bifurcation

See #Dichotomy

Bipartition

A partition of all #OTUs (in the current analytical context) into two sets. Every #Branch in a #Phylogenetic Tree defines a Bipartition. Bipartitions are used when comparing #Phylogenetic Tree topologies, to assign Bootstrap Support Values, or to identify shared features of Topology. Synonyms: Split
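
The following Python sketch lists the bipartitions defined by the internal branches of a rooted tree. It assumes the tree is written as nested tuples of OTU labels. The helper names `leaves` and `bipartitions` are hypothetical, and trivial splits (one OTU versus all others, from terminal branches) are deliberately skipped.

```python
def leaves(tree):
    """Leaf labels of a rooted tree written as nested tuples, e.g. (("A", "B"), ("C", "D"))."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def bipartitions(tree):
    """Non-trivial splits defined by the internal branches of `tree`.

    Each split is a frozenset of two complementary frozensets of OTU labels.
    Terminal branches (one OTU versus the rest) are skipped as trivial.
    """
    all_otus = leaves(tree)
    splits = set()

    def visit(subtree):
        if isinstance(subtree, str):
            return
        side = leaves(subtree)
        if 2 <= len(side) <= len(all_otus) - 2:
            splits.add(frozenset([side, all_otus - side]))
        for child in subtree:
            visit(child)

    visit(tree)
    return splits


tree = ((("A", "B"), "C"), ("D", "E"))
for split in bipartitions(tree):
    print(sorted(sorted(side) for side in split))
```

The two branches leaving the root of a rooted tree define the same bipartition. Storing splits in a set collapses them into one, which is one reason splits are usually discussed on unrooted trees.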

Bootstrap Support

In phylogenetics, references to a "bootstrap" ("bootstrap support", "bootstrap confidence") value typically refer to a #Branch Support Value computed by bootstrap resampling (bootstrapping). Bootstrapping is a Resampling Method used to create pseudo-replicate data sets by drawing (with replacement) from the available data set. The fraction of times an outcome occurs among #Phylogenetic Trees inferred from bootstrap-resampled data sets is the Bootstrap Support value for that outcome. Thus the bootstrap support value for a #Branch is the fraction of times this #Branch is found in #Phylogenetic Trees computed from resampled data sets.
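
The Python sketch below shows the resampling step and the support fraction. Real analyses infer a #Phylogenetic Tree from each pseudo-replicate; here a hypothetical toy criterion, `a_groups_with_b` (is sequence A closer to B than to C and D by p-distance?), stands in for tree inference so that the example stays self-contained. All function names are illustrative assumptions.

```python
import random


def bootstrap_replicate(alignment, rng):
    """One pseudo-replicate: draw alignment columns with replacement, same length as the original."""
    n_sites = len(alignment[0])
    columns = [rng.randrange(n_sites) for _ in range(n_sites)]
    return ["".join(seq[c] for c in columns) for seq in alignment]


def p_distance(x, y):
    """Proportion of sites at which two aligned sequences differ."""
    return sum(a != b for a, b in zip(x, y)) / len(x)


def a_groups_with_b(alignment):
    """Toy stand-in for tree inference (hypothetical): is A closer to B than to C and D?"""
    a, b, c, d = alignment
    return p_distance(a, b) < min(p_distance(a, c), p_distance(a, d))


def bootstrap_support(alignment, outcome, n_reps=100, seed=0):
    """Fraction of pseudo-replicates in which `outcome` holds."""
    rng = random.Random(seed)
    hits = sum(outcome(bootstrap_replicate(alignment, rng)) for _ in range(n_reps))
    return hits / n_reps


alignment = ["ACGTACGTAC", "ACGTACGTTC", "TCGAACCTAG", "TGGAACCTAG"]
print(bootstrap_support(alignment, a_groups_with_b))
```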

Branch

"Branch" is the typical domain-specific term for an edge of a #Phylogenetic Tree. Branches may have properties such as length and degree of #Branch Support. See also #Split.

Branch Support

Each #Branch in a #Phylogenetic Tree defines a #Bipartition. The degree of confidence in a particular #Branch (bipartition) may be indicated by a #Branch Support value, typically a #Bootstrap Support value or a Bayesian posterior probability.

Change, disambiguation

In comparative analysis, the compared things are related by descent from a common ancestor; therefore, a difference between compared things (pattern) implicates a change (process), specifically a #Hereditary Change. Because every difference implicates a change, researchers often point to a difference and refer to it as a "change".

Character

A character is a set of features related by homology, or (usually indistinguishably), it is the archetype or Platonic form underlying the "same" feature observed in different instances. In a sequence alignment or a #Character-State Data Matrix, a character is a column. If the character has discrete states (e.g., present vs. absent; T, C, A and G), then it is a "discrete character" (likewise a "continuous character" has continuous states). See also: #Character-State; #Character-State Data Matrix; Synonyms: Column (in some contexts), Site (in some contexts)

Character-State

The state of a #Character for a given #OTU. For instance, if Sequence 1 has a "G" in the 10th column of a sequence alignment, then "G" is the Character-State of the 10th Character for Sequence 1. Typically Character-States are observed values. However, the values of unobserved states, including #Ancestral states as well as #Missing Data, can be inferred using a #Transition Model applied to a #Phylogenetic Tree. Typically a #Character-State is treated as a singular definite value; however, in some instances it may be conceived as a set of values present in a #Population, a distribution of values, and so on (as allowed in the NEXUS definition of a #Character-State Data Matrix).

Character-State Data Matrix

A matrix of observed #Character-State data. Synonyms: Character Data Matrix, Character-State Matrix

Clade

A Clade is a set of #Species (and by extension, a set of #OTUs of any type) that includes all of the #Descendants of their #Most Recent Common Ancestor. In a rooted #Phylogenetic Tree, every #Node or #Subtree defines a Clade. An alternative definition is that a Clade includes the #Ancestor and any #Descendants, whether or not they are #OTUs. Cf. #Holophyly.
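
Under the first definition above (a set of OTUs is a clade when it is exactly the set of terminal descendants of its MRCA), clade membership can be tested directly. The Python sketch below assumes a rooted tree stored as a child-to-parent dictionary. The names `path_to_root`, `mrca`, `terminal_descendants`, and `is_clade` are hypothetical.

```python
def path_to_root(node, parent):
    """`node` followed by all of its ancestors, nearest first."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path


def mrca(otus, parent):
    """Most recent common ancestor of two or more OTUs."""
    first, *rest = otus
    shared = set.intersection(*(set(path_to_root(o, parent)) for o in rest))
    # Walking up from the first OTU, the first shared node is the most recent one.
    return next(node for node in path_to_root(first, parent) if node in shared)


def terminal_descendants(node, parent):
    """Terminal nodes (nodes with no children) lying below `node`."""
    internal = set(parent.values())
    return {n for n in parent if n not in internal and node in path_to_root(n, parent)}


def is_clade(otus, parent):
    """A set of OTUs is a clade if it equals the full set of terminal descendants of its MRCA."""
    return set(otus) == terminal_descendants(mrca(otus, parent), parent)


# Hypothetical tree: root -> X -> {A, B}; root -> Y -> {C, D}
parent = {"X": "root", "A": "X", "B": "X", "Y": "root", "C": "Y", "D": "Y"}
print(is_clade(["A", "B"], parent), is_clade(["A", "C"], parent))  # True False
```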

Cladogram

A pictorial representation of a #Phylogenetic Tree that is understood to represent only what domain experts call the #Topology, meaning the connectivity of nodes, and not the lengths of #Branches between them. Nevertheless, as actual lines must have non-zero lengths, to draw a cladogram one must apply an arbitrary convention for #Branch lengths, typically either

  1. make all #Branches the same length, or
  2. adjust #Branch lengths so that #Terminal Nodes fall on a line


see #Independent Contrasts Method

Convergence

See #Convergent Evolution

Convergent Evolution

#Convergent Evolution (convergence) is a pattern in which two different #OTUs reach the same #Derived #State by different series of #Evolutionary Transitions. Subclass of #Homoplasy. Considered to be disjoint with #Parallel Evolution, though the exact line of distinction is not always clear.

CpG Bias

An enhanced rate of #Mutation at CpG dinucleotide sites, typical of mammalian genomes, arising mainly from the spontaneous deamination of methylated cytosine (5-methylcytosine) to thymine. A kind of #Mutation Bias.

Deletion

From genetics, the removal of one or more contiguous residues from a sequence. In phylogenetics, Deletion may refer either to a #Mutation or to an #Evolutionary Transition. Disjoint to #Insertion

Derived

A #State is #Derived if it is not ancestral (see #Ancestor) in the given context. The opposite of #Derived is #Primitive. Note that, because the use of #Ancestor is context-dependent (relative to some #OTUs of interest), #Derived is also context-dependent (for explanation, see #Primitive).

Descendant

An entity that inherits features from some entity of current interest. Descendancy is the reverse relation of ancestry (see #Ancestor). A #Child is the closest form of descendant. The relationship of descendancy is transitive: if A is the descendant of B, and B is the descendant of C, then A is the descendant of C.

Dichotomy

A 2-fold branching. A #Phylogenetic Tree has Dichotomous #Branching if each parent node has exactly two children. Disjoint to #Polytomy. Synonyms: Bifurcation

Distance Matrix

A matrix of pairwise distances between #OTUs, typically used in distance-based #Phylogeny Inference Methods. A Distance Matrix is not the same as a #Character-State Data Matrix.

Dollo Parsimony

A character-based #Phylogeny Inference Method that applies the Parsimony principle to presence-and-absence (i.e., 2-state) #Characters with the restriction that gain (the #Transition from absence to presence) can happen only once for each #Character.

Drift

See #Random Genetic Drift

Edge

see #Branch

Evolutionary Transition

An evolutionary change. In the context of character analysis, an evolutionary transition is a change in the #Character-State of a #Character along a #Branch; in terms of evolutionary genetics, it implicates a two-stage process: an originating #Mutation, followed by its #Fixation in a #Population (by #Selection or #Drift). Population geneticists tend to focus on reproductive sorting (selection or drift) and may refer to an evolutionary change as an allele "fixation", "substitution", or "replacement", while molecular biologists tend to focus on the originating mutation and may refer to an evolutionary change as a "mutation".

Fixation

See #Allele Fixation

Fully Resolved

A #Phylogenetic Tree is said to be Fully Resolved if all its branchings are dichotomous. Trees with #Polytomy are said to be unresolved.
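
A minimal Python check for full resolution follows. It assumes a rooted tree written as nested tuples, where a tuple with more than two elements is a #Polytomy. The function name is hypothetical. An unrooted tree written with three children at its base would be flagged by this check, so the sketch assumes rooted input.

```python
def is_fully_resolved(tree):
    """True if every internal node of a rooted nested-tuple tree has exactly two children."""
    if isinstance(tree, str):
        return True  # a terminal node imposes no constraint
    return len(tree) == 2 and all(is_fully_resolved(child) for child in tree)


print(is_fully_resolved((("A", "B"), ("C", "D"))))  # True
print(is_fully_resolved((("A", "B", "C"), "D")))    # False: contains a polytomy
```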

Gap

The concept of a "gap" is ambiguous and is tied to the use of a "gap character" (often the hyphen "-") in text representations of sequence alignments. In general, the "gap" represents the absence of any positively diagnosed #Character-State. As such, the gap may be interpreted as an additional #Character-State, as the absence of the #Character, or as an unknown value (#Missing Data).

General Time-Reversible Model (GTR)

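A #Transition Model for nucleotide #Character-States allowing a separate parameter for each reversible rate of #Evolutionary Transition between nucleotides <math>N_i</math> and <math>N_j</math>. There are six such exchangeability parameters, one per unordered pair of nucleotides (hence the name "6-parameter model"), and they are combined with the equilibrium nucleotide frequencies.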
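
The following Python sketch builds a GTR rate matrix <math>Q</math>, assuming the common parameterization <math>q_{ij} = r_{ij}\pi_j</math> for <math>i \ne j</math>. Here <math>r_{ij} = r_{ji}</math> are the six exchangeabilities and <math>\pi_j</math> the equilibrium frequencies. The matrix is then scaled so that the expected rate at equilibrium is 1. The function name and the dictionary layout are illustrative assumptions.

```python
from itertools import combinations

BASES = "ACGT"


def gtr_rate_matrix(exchangeability, freqs):
    """GTR instantaneous rate matrix Q as nested lists, rows and columns in ACGT order.

    exchangeability: the six symmetric rates r_ij, keyed by unordered base pairs
        such as ("A", "C").
    freqs: equilibrium nucleotide frequencies pi, keyed by base, summing to 1.
    """
    r = {}
    for (x, y), value in exchangeability.items():
        r[(x, y)] = r[(y, x)] = value  # r_ij = r_ji: the model is time-reversible
    q = [[0.0] * 4 for _ in range(4)]
    for i, x in enumerate(BASES):
        for j, y in enumerate(BASES):
            if i != j:
                q[i][j] = r[(x, y)] * freqs[y]  # q_ij = r_ij * pi_j
        q[i][i] = -sum(q[i])  # each row of a rate matrix sums to zero
    # Scale so the expected number of transitions per unit time at equilibrium is 1.
    mean_rate = -sum(freqs[x] * q[i][i] for i, x in enumerate(BASES))
    return [[v / mean_rate for v in row] for row in q]


equal = {pair: 1.0 for pair in combinations(BASES, 2)}
uniform = {b: 0.25 for b in BASES}
print(gtr_rate_matrix(equal, uniform)[0])  # JC69 special case: approx. [-1.0, 0.333, 0.333, 0.333]
```

Because of the final rescaling, only the ratios of the six exchangeabilities affect <math>Q</math>.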

Genetic Recombination

In classical genetics, and in some areas of evolutionary biology, "recombination" or "genetic recombination" (as distinct from #Molecular recombination) refers to any kind of genetic mixing, including #Molecular recombination as well as Mendelian re-assortment of chromosomes.

Genome Hypothesis

Grantham's hypothesis that each species has a distinctive genome-wide #Codon Usage strategy reflecting adaptation for translation efficiency.

HGT

Horizontal Gene Transfer (see #Lateral Gene Transfer).

Hereditary Change

A Hereditary Change is a change in hereditary material accounting for an observed difference. This category is not widely used by domain experts, but is necessary to generalize formal descriptions to cover both between-species and within-species problems. The classic use-case for comparative analysis is the analysis of species differences. When species differ, this is a "fixed difference" that implicates an #Evolutionary Transition. However, comparative methods also are applied to individuals within a species, where the difference implicates a Hereditary Change that is not an Evolutionary Transition.

Homology

Relationship of similarity due to inheritance from a common ancestor. A relationship of similarity that is not due to common ancestry, but to Convergent Evolution is called #Analogy.

Homoplasy

Homoplasy (Lankester, 1870) is any pattern in which evolutionary change occurs but does not increase differentiation. If the comparison is between a descendant and an ancestor, then the pattern is #Reversal or #Atavism. If the comparison is between two descendants, the pattern is either #Convergent Evolution (when the descendants become more similar) or #Parallel Evolution (when the descendants undergo identical changes).

Holophyly

Condition of a set of #Species (and by extension, #OTUs of any kind) and their #Ancestors that includes their #Most Recent Common Ancestor and all of its #Descendants. "Holophyletic group" thus is synonymous with #Clade. Subclass of #Monophyly, disjoint to #Paraphyly (sensu Ashlock).

Horizontal Gene Transfer (HGT)

see #Lateral Gene Transfer

HTU

#Hypothetical Taxonomic Unit

Hypothetical Taxonomic Unit

Hypothetical analog of an #OTU, typically representing an unobserved #Ancestor entity.

Indel

A fusion of the terms for #Insertion and #Deletion that has two meanings, one based on the logic of OR (common in phylogenetics), and the other based on the logic of AND (used in mutation research, e.g., Chuzhanova, et al., 2003):

  1. A length difference between two aligned sequences, denoting the evolutionary occurrence of either an #Insertion OR a #Deletion during their divergence from a common ancestor.
  2. A complex mutational event involving the addition of some residues and the loss of others, i.e., #Insertion AND #Deletion

Insertion

From genetics, the addition of one or more contiguous residues to a sequence. In phylogenetics, Insertion may refer either to a #Mutation or to an #Evolutionary Transition. Disjoint to #Deletion

Lateral Gene Transfer

Incorporation of genetic material from one organism into the genome of another organism that is not its reproductive offspring is called "Lateral" or "Horizontal" gene transfer. The more typical form of inheritance is the "vertical" transfer that takes place from parent to offspring during biological reproduction. Lateral Gene Transfer is sometimes abbreviated LGT.

Leaf node

see #Terminal Node

LGT

#Lateral Gene Transfer

Likelihood Method

A #Phylogeny Inference Method in which the objective function used to characterize a #Phylogenetic Tree (and #Transition Model) is the likelihood, which is the probability of the observed data conditional on the #Phylogenetic Tree and the #Transition Model.
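
As a small worked case, the Python sketch below computes the likelihood for the simplest tree: two sequences joined by a single branch of length <math>d</math>. It assumes the Jukes-Cantor (JC69) #Transition Model, under which <math>P(\text{same}) = \tfrac14 + \tfrac34 e^{-4d/3}</math>. A grid search over <math>d</math> finds the maximum-likelihood branch length. For JC69 this agrees with the closed form <math>\hat d = -\tfrac34 \ln(1 - \tfrac43 p)</math>, where <math>p</math> is the proportion of differing sites. The function name and grid are assumptions.

```python
import math


def jc69_log_likelihood(seq1, seq2, d):
    """Log-likelihood of two aligned DNA sequences separated by JC69 branch length d > 0."""
    e = math.exp(-4.0 * d / 3.0)
    p_same = 0.25 + 0.75 * e   # P(base unchanged after distance d)
    p_diff = 0.25 - 0.25 * e   # P(change to one specific other base)
    log_l = 0.0
    for a, b in zip(seq1, seq2):
        # 0.25: equilibrium probability of the first base; then the transition probability.
        log_l += math.log(0.25 * (p_same if a == b else p_diff))
    return log_l


seq1, seq2 = "ACGTACGTAC", "ACGTACGTTC"
grid = [i / 1000 for i in range(1, 3000)]
d_hat = max(grid, key=lambda d: jc69_log_likelihood(seq1, seq2, d))

# Closed-form JC69 maximum-likelihood distance from the proportion p of differing sites.
p = sum(a != b for a, b in zip(seq1, seq2)) / len(seq1)
print(d_hat, -0.75 * math.log(1 - 4 * p / 3))
```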

Lineage

A Lineage is a forward (in time) path in a #Phylogenetic Tree, representing a linear path of descent connecting an #Ancestor to a #Descendant. Lineages cannot be determined in an unrooted tree (see #Root), because the direction of descent is unknown.

Molecular Clock

The assumption that evolution is clock-like over some interval, typically in the sense that the expected number of #Evolutionary Transitions per unit of time is constant.

Molecular recombination

A kind of genetic event involving DNA (RNA) in which the starting material is rearranged so as to change the connectivity or pairing relationships of strands. This includes events of crossing-over as well as gene conversion events without crossing-over.

Monophyly

Condition of a set of #Species (and by extension, #OTUs of any kind) and their #Ancestors that includes their #Most Recent Common Ancestor. In contexts where #Monophyly is not subdivided into #Holophyly and #Paraphyly, it often is assumed to mean #Holophyly (sensu Ashlock).

Most Recent Common Ancestor

The Most Recent Common Ancestor (MRCA; also LCA or Least Common Ancestor) of a set of two or more #Species (or, by extension, any kind of #OTU) is the most recent ancestor shared among the set, corresponding to the most proximal ancestral #Node on a #Phylogenetic Tree.

MRCA

See #Most Recent Common Ancestor.

Mutation

  1. (abstract) the process by which heritable changes in the genome occur
  2. a particular heritable change in a genome (e.g., the mutation causing the most common sickle-cell allele)
  3. the altered state resulting from a mutation, i.e., the mutant state

Mutation Bias

An asymmetry or non-uniformity in the occurrence of #Mutations categorized by position, type, effect class, or some other index variable.

Natural Variation

The presence, in a single #Species or #Population, of more than one #Character-State for a given #Character.

Natural Selection

The process by which inherent asymmetries in the survival and reproduction of competing forms lead cumulatively to differences in representation of these forms. Disjoint to #Random Genetic Drift.

Neighbor-Joining Algorithm

A distance-based #Phylogeny Inference Method.

Network

A kind of #Phylogenetic Tree in which some typical restrictions are not satisfied. In some contexts, it is a synonym for an unrooted #Phylogenetic Tree (see #Root). In other contexts, it signals the presence of #Nodes with multiple parents.

Neutral

A difference that has an insignificant effect on fitness. In population genetics, a fitness difference <math>s</math> is considered insignificant if <math>|s|<<1/(2PN_e)</math>, where <math>P</math> is the ploidy (one for haploids, two for diploids), and <math>N_e</math> is the effective population size. Outside of population genetics, this term may be used more loosely to indicate a difference that is thought to be unimportant. This term may be applied to a #Polymorphism, #Mutation, or #Evolutionary Transition that represents a #Neutral difference.
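
The criterion <math>|s| \ll 1/(2PN_e)</math> is equivalent to <math>2PN_e|s| \ll 1</math>. The Python sketch below evaluates it. "Much less than" has no fixed cutoff, so the value `cutoff=0.1` and the function names are hypothetical choices made only for illustration.

```python
def scaled_selection(s, ploidy, ne):
    """2 * P * Ne * |s|: the strength of selection relative to drift."""
    return 2 * ploidy * ne * abs(s)


def is_effectively_neutral(s, ploidy, ne, cutoff=0.1):
    """|s| << 1/(2 P Ne), with "<<" read as "scaled value below `cutoff`" (hypothetical cutoff)."""
    return scaled_selection(s, ploidy, ne) < cutoff


print(is_effectively_neutral(1e-6, ploidy=2, ne=10_000))  # 2*2*10^4*10^-6 = 0.04, so True
print(is_effectively_neutral(1e-3, ploidy=2, ne=10_000))  # 40, so False
```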

Nucleotide Transition

A substitution of one purine (A or G) for another, or of one pyrimidine (C or T) for another; a kind_of #Mutation or, alternatively, a corresponding kind_of #Evolutionary Transition or #Polymorphism.

Nucleotide Transversion

A substitution of a purine (A or G) for a pyrimidine (C or T) or vice versa; a kind_of #Mutation or, alternatively, a corresponding kind_of #Evolutionary Transition or #Polymorphism.
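
The two definitions above reduce to a check on purine/pyrimidine class membership. The following is a minimal Python sketch; the function name is hypothetical.

```python
PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CT")


def classify_substitution(x, y):
    """Return "transition" or "transversion" for a change from base x to base y."""
    x, y = x.upper(), y.upper()
    if x == y or {x, y} - (PURINES | PYRIMIDINES):
        raise ValueError(f"not a substitution between two distinct bases: {x}->{y}")
    if {x, y} <= PURINES or {x, y} <= PYRIMIDINES:
        return "transition"
    return "transversion"


print(classify_substitution("A", "G"), classify_substitution("C", "A"))  # transition transversion
```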

Operational Taxonomic Unit

The entities from which #Character-States are observed and taken as ground truths. In some cases the #OTU may be a composite of data drawn from several sources. Note that the use of "taxon" for both an #OTU and for a class in #Organismal Taxonomy is a cause of confusion.

Organismal Taxonomy

A classification of organismal #Species consisting of a nested hierarchy of classes. Traditional #Organismal Taxonomy includes named #Taxonomic Ranks and is the basis for the usual way of referring to species of organism by Genus and Species (e.g., Homo sapiens is the sapiens species of the genus Homo).

Orthology

Relationship of sequences that have diverged via speciation events but not by events of #Gene Duplication. Subclass of #Homology. Disjoint to #Paralogy.

OTU

See #Operational Taxonomic Unit.

Outgroup

  1. When used as a unary modifier, i.e., when a set of one or more #OTUs is designated as "the outgroup", the outgroup is a set of #OTUs assumed on prior grounds to be a phylogenetic outlier from the complementary "ingroup" consisting of all the other #OTUs, that is, the ingroup and the outgroup are sister clades that represent two separate paths of descent from a common ancestor. Typically such an outgroup is designated for the purpose of Rooting a #Phylogenetic Tree.
  2. A secondary usage is to describe the relation of two sets A and B given a #Phylogenetic Tree in which A and B are non-overlapping clades.

Parallel Evolution

Parallel Evolution (parallelism) is a pattern in which two different #OTUs reach the same #State by the same series of Evolutionary Transitions. Parallelism is a subclass of #Homoplasy and is considered to be disjoint with #Convergence, though the exact line of distinction is not always clear.

Paralogy

Relationship of sequences that have diverged via one or more events of #Gene Duplication. Subclass of #Homology. Disjoint to #Orthology.

Paraphyly

Condition of a set of #Species (and by extension, #OTUs of any kind) and their #Ancestors that includes their #Most Recent Common Ancestor and some but not all of its #Descendants. Subclass of #Monophyly, disjoint to #Holophyly (sensu Ashlock). For instance, the traditional reptile group is paraphyletic: its common ancestor was a reptile, but some of that ancestor's descendants are birds, which are not included in the reptile group.

Parsimony Method

A method for finding the minimum number of transitions needed to account for a #Character given a #Phylogenetic Tree, and by extension, a character-based #Phylogeny Inference Method in which the inferred #Phylogenetic Tree (the "maximum parsimony tree") is the #Phylogenetic Tree that minimizes the total number of #Evolutionary Transitions over all #Characters.
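
For unordered characters with equal costs, the per-character minimum can be computed with Fitch's algorithm. This is one well-known instance of the method; weighted costs, as in a #Step Matrix, need a generalization such as Sankoff's algorithm. The Python sketch below assumes rooted binary trees as nested 2-tuples, and uses a small hypothetical alignment. It compares the total parsimony scores of two candidate trees.

```python
def fitch(tree, states):
    """Fitch parsimony for one character on a rooted binary tree.

    tree: nested 2-tuples of OTU names, e.g. (("A", "B"), ("C", "D")).
    states: observed character-state of each OTU.
    Returns (candidate state set at this node, minimum number of transitions below it).
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch(left, states)
    right_set, right_cost = fitch(right, states)
    shared = left_set & right_set
    if shared:
        return shared, left_cost + right_cost
    # No shared state: one extra transition is needed on one of the two branches.
    return left_set | right_set, left_cost + right_cost + 1


def tree_length(tree, alignment):
    """Parsimony score of a tree: minimum transitions summed over all characters (columns)."""
    n_sites = len(next(iter(alignment.values())))
    return sum(
        fitch(tree, {otu: seq[i] for otu, seq in alignment.items()})[1]
        for i in range(n_sites)
    )


alignment = {"A": "GGT", "B": "GGT", "C": "TTT", "D": "TTA"}
print(tree_length((("A", "B"), ("C", "D")), alignment))  # 3
print(tree_length((("A", "C"), ("B", "D")), alignment))  # 5
```

Of these two trees, the first would be preferred as the more parsimonious. A full search would score every candidate topology in the same way.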

Phylogenetic Tree

A Phylogenetic Tree represents evolutionary paths of descent-with-modification from common ancestors. Typically a Phylogenetic Tree is assumed to be a connected, directed, acyclic graph in which #Nodes have no more than one parent and each edge is directed from the #Root toward the #Terminal Nodes. When domain scientists wish to relax these restrictions, because the #Root is unknown or to allow for multiple parentage (cf. #Reticulate Evolution, #Lateral Gene Transfer), they favor the term "#Network", though this usage does not correspond to the meaning of "network" in graph theory.

Phylogenetic Tree Topology

Typically the term "topology" applied to a #Phylogenetic Tree is a reference to the connectivity of nodes in the #Phylogenetic Tree, disregarding #Branch properties such as length. See #Cladogram.

Phylogeny

Broadly speaking, a phylogeny is the evolutionary history of some set of #Characters or #OTUs. More narrowly, it is merely the #Phylogenetic Tree representing paths of descent.

Phylogeny Inference Method

A method of inferring an evolutionary history (#Phylogeny). Phylogeny inference methods may generate a #Phylogenetic Tree as well as #Reconstructions of #Ancestral #Character-States, using inputs based on observed data. They fall into two broad classes: distance-based methods that use a #Distance Matrix as input, and character-based methods that use a #Character-State Data Matrix as input. Of the character-based methods, some are rule-based (#Parsimony, #Invariants), while others are probabilistic and depend on a #Transition Model that must be specified explicitly and evaluated in terms of Likelihood or Bayesian Posterior Probability.

Plesiomorphy

A #State of an #OTU that is the ancestral state (i.e., #Primitive; see #Ancestor) in the current context. Disjoint with #Apomorphy.

Polymorphism

The presence, in a single #Species or #Population, of more than one #Character-State for a given #Character, or more than one #Allele for a #Locus. In the literature of population genetics, a distinction is sometimes maintained between Polymorphic and merely variable loci, in which case a genetic locus is not considered Polymorphic unless the most frequent state has a frequency below 95 % (or, in some contexts, 99 %). Polymorphism is a subclass of #Natural Variation.
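
The following Python sketch contrasts a merely variable locus with a polymorphic locus under the frequency criterion described above. The function names and the sample data are hypothetical.

```python
from collections import Counter


def is_variable(sample):
    """More than one allele (character-state) present in the sample."""
    return len(set(sample)) > 1


def is_polymorphic(sample, threshold=0.95):
    """Stricter criterion: the most frequent allele has frequency below `threshold`."""
    top_count = Counter(sample).most_common(1)[0][1]
    return top_count / len(sample) < threshold


sample = ["A"] * 97 + ["G"] * 3
print(is_variable(sample), is_polymorphic(sample), is_polymorphic(sample, threshold=0.99))
# True False True
```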

Polyphyly

Condition of a set of #Species (and by extension, #OTUs of any kind) and #Ancestors that does not include their #Most Recent Common Ancestor. Disjoint to #Monophyly (sensu Ashlock).

Polytomy

An N-fold branching of a #Phylogenetic Tree, where <math>N > 2</math>. Disjoint to #Dichotomy. Synonyms: Multifurcation (rare).

Population

A set of individuals of the same #Species, typically the inclusive set that shares some resource such as space or breeding opportunities.

Primitive

A #State is #Primitive if it is ancestral (see #Ancestor) in the given context. The opposite of #Primitive is #Derived. Note that, because the use of #Ancestor is context-dependent (relative to some #OTUs of interest), #Primitive is also context-dependent. For example, when comparing human locomotion to mouse locomotion, human bipedalism is derived (since the ancestor walked on four legs); but when comparing human locomotion and chimp locomotion, it is unclear whether bipedalism is derived, since the human-chimp ancestor might have been bipedal (in which case human bipedalism would be the primitive state, and chimp knuckle-walking the derived state).

Random Genetic Drift

The process by which stochastic asymmetries in survival and reproduction of competing forms lead cumulatively to differences in representation of these forms. Disjoint to #Natural Selection.

Rank

see #Taxonomic Rank

Recombination, disambiguation

"Recombination" may refer to #Genetic Recombination or to #Molecular recombination.

Genetic recombination includes most or all of Molecular recombination. Any molecular recombination event that includes crossing-over would qualify as a genetic recombination event. The possible exceptions would be gene-conversion events without crossing-over. Unarguably these are mechanistically allied with other "molecular recombination" events. However, if there is no crossing over, and if no new genotype is created (except by a mutational side-effect), then such events do not seem to be genetic recombination events in the way that "recombination" is used in classic Mendelian genetics.

Reconcile Tree

The #Phylogenetic Tree for a gene family may conflict with the #Phylogenetic Tree for the implicated species due to events of #Gene Duplication and #Gene Loss that occurred in its history. A Reconcile Tree (Reconciled Tree) is a special kind of #Phylogenetic Tree that reconciles a gene #Phylogenetic Tree with a species #Phylogenetic Tree by means of hypothesized events of #Gene Duplication and #Gene Loss. The concept of a "Reconcile Tree" can be extended to include events of #Lateral Gene Transfer (another source of conflicts between gene and species #Phylogenetic Trees), as in duplication-transfer-loss (DTL) reconciliation.

Reconstruction

Reconstruction refers to the process of inferring #Ancestral #Character-States. The inferred states are said to be "reconstructed" states.

Replacement

In one sense, a synonym for #Evolutionary Transition, and in another sense, an ambiguous sub-class of #Evolutionary Transition. In molecular evolution, there is a tendency to refer to #Evolutionary Transitions in sequence evolution as "substitutions" when they refer to nucleotide #Character-States, and "replacements" when they refer to amino acid #Character-States (at one time, this was the official editorial policy of the journal Molecular Biology and Evolution). In population genetics, an #Evolutionary Transition may be referred to as an "allele replacement" or "allele substitution" (e.g., Nei, 1987, p. 421), emphasizing the population-genetic mechanism in which one allele replaces another as the wild-type allele (via #Mutation and #Fixation).

Reticulate Evolution

Pattern of evolution in which separate lineages fuse (e.g., as when a new polyploid fern species arises from a fusion of the complete genomes of two previously separate fern species), corresponding to #Network in which a #Node has more than one parental node.

Reversion

An #Evolutionary Transition to an ancestral #State (see #Ancestor). Reversion is a form of #Homoplasy.

Reversal

The occurrence of an #Evolutionary Transition that is the reverse of an earlier transition. Note the distinction from #Reversion: all reversals return to an ancestral state, but not all #Reversions are the reverse of a previous transition, e.g., G to C to A to G is a #Reversion but none of the individual transitions (G to C, C to A, A to G) are reversed; G to C to G is both a #Reversion and a reversal.
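
The distinction between reversion and reversal can be made mechanical. The Python sketch below represents a lineage as a sequence of consecutive, distinct #Character-States and inspects its last transition. It assumes that "ancestral" means any state occurring earlier on the path. The function names are hypothetical.

```python
def transitions(path):
    """Consecutive (from, to) state pairs along a lineage, e.g. "GCA" -> [("G", "C"), ("C", "A")]."""
    return list(zip(path, path[1:]))


def is_reversion(path):
    """Does the last transition return to a state that occurred earlier on the path?"""
    return len(path) >= 3 and path[-1] in path[:-1]


def is_reversal(path):
    """Is the last transition the exact reverse of some earlier transition?"""
    steps = transitions(path)
    if len(steps) < 2:
        return False
    last_from, last_to = steps[-1]
    return (last_to, last_from) in steps[:-1]


for path in ("GCAG", "GCG"):
    print(path, is_reversion(path), is_reversal(path))
# GCAG True False
# GCG True True
```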

Root

The root (source) node of a #Phylogenetic Tree, the node with no parents. When the root is not known, the #Phylogenetic Tree is said to be unrooted, or is referred to as a #Network.

Selection

See #Natural Selection.

Sequence Family

A set of macromolecular sequences (DNA, RNA, or protein) related by #Descent from a #Common ancestor.

Silent

A difference that is invisible or has no phenotypic effect. Applied to a #Polymorphism, #Evolutionary Transition, or #Mutation. When applied to nucleotide differences in protein-coding genes, it has the same meaning as #Synonymous.

Species

A fundamental unit of biological classification and analysis, roughly the smallest unit with the potential to persist over an evolutionary time scale. Species often are conceived as assemblages of actually or potentially inter-breeding organisms, in spite of the fact that most organisms are asexual and thus do not "interbreed".

Species Tree

A #Phylogenetic Tree in which the #Terminal Nodes represent #Species.

Star Tree

A tree with a single ancestral node and multiple #OTUs. Star trees are rarely encountered in nature but are useful for calculations.

State

see #Character-State

Step Matrix

A matrix <math>\{S_{i,j}\}</math> giving the number of evolutionary "steps" (the cost) of a change between #Character-States <math>i</math> and <math>j</math>. Used in the character-based #Phylogeny Inference Method called #Parsimony. Cf. #Transition Model.

Substitution

see #Replacement

Subtree

A #Tree that is part of another #Tree.

Supertree

A #Phylogenetic Tree derived from a set of partially overlapping, smaller, "source" #Phylogenetic Trees. Supertree methods are used when applying character-based #Phylogeny Inference Methods to the complete set of data would be too compute-intensive, or when #Missing Data would prevent an analysis using the same #Characters for all #OTUs.

Symplesiomorphy

A #State shared among some #OTUs that is plesiomorphic (see #Plesiomorphy) in the current context. Disjoint with #Synapomorphy.

Synapomorphy

A #State shared among some #OTUs that is an apomorphic (i.e., #Derived) state (see #Apomorphy) in the current context. Disjoint with #Symplesiomorphy.

Synonymous

A #Mutation, #Polymorphism, or #Evolutionary Transition that changes a codon without changing the encoded amino acid is #Synonymous. Disjoint to one sense of the ambiguous term #Replacement.
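
Whether a codon change is synonymous depends on the genetic code in use. The Python sketch below assumes the standard nuclear genetic code; mitochondrial and some other codes differ. It encodes the code as the familiar 64-letter amino-acid string in TCAG codon order. The names `CODON_TABLE` and `is_synonymous` are hypothetical. A change between two stop codons ("*") also counts as synonymous here.

```python
BASES = "TCAG"
# Standard genetic code; codons enumerated in TCAG order (TTT, TTC, TTA, TTG, TCT, ...).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def is_synonymous(codon_before, codon_after):
    """True if the change leaves the encoded amino acid (or stop, '*') unchanged."""
    return CODON_TABLE[codon_before.upper()] == CODON_TABLE[codon_after.upper()]


print(is_synonymous("CTT", "CTC"), is_synonymous("GAA", "GAT"))  # True False
```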

Taxon

The domain-specific use implicates a class of organismal #Species defined in the #Organismal Taxonomy, e.g., mammals such as squirrels and horses are in the Taxon Mammalia. Plural: Taxa.

Taxonomic Rank

Taxonomic Rank is an ordered categorical descriptor applied to the classes of #Organismal Taxonomy. A class of a given rank may contain only classes of lower rank. The traditional ranks in descending order are: Kingdom, Phylum, Class, Order, Family, Genus and Species. In the 1980s, there were several conflicting proposals to extend this system upward to include a rank higher than Kingdom, variously called "Urkingdom" (Woese, et al.), "Domain" (Woese, et al.) or "Empire" (Cavalier-Smith).

Taxonomy

see #Organismal Taxonomy

Terminal Node

The nodes of a #Phylogenetic Tree that have no children. Typically Terminal Nodes in a #Phylogenetic Tree correspond to #OTUs with their observable properties, while internal nodes correspond to ancestors. However, in the case of simulations or evolution-in-the-lab, an internal node may be associated with known properties. Some #Reconcile Trees have terminal nodes that represent inferred events of #Gene Loss.

Topology

See #Phylogenetic Tree Topology

Trait

A feature of an organism, either abstract (#Character) or concrete (#State).

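Transition, disambiguation

"Transition" may refer to a #Nucleotide Transition (a change between two purines or between two pyrimidines) or to an #Evolutionary Transition (a change in the #Character-State of a #Character along a #Branch).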

Transition-Transversion Bias

A ratio of nucleotide transitions to nucleotide transversions that differs from a null expected ratio. Typically the comparison is of aggregate rates or events of #Mutation or #Evolutionary transition relative to the null expectation of a 1:2 aggregate ratio (the aggregate of nucleotide transversions represents twice as many paths as that of transitions). However, one might also define the bias in terms of average rates, in which case the null expectation is 1:1. If in reference to #Mutation, a #Transition-Transversion Bias is a kind of #Mutation Bias. However, in any given case, it may be unclear whether a #Transition-Transversion Bias refers to a bias in #Mutations or to a bias in #Evolutionary transitions (the latter may be caused by the former; the former typically is diagnosed by way of the latter).
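
The 1:2 null expectation comes from counting paths. Among the 12 ordered pairs of distinct nucleotides, 4 are transitions (A↔G and C↔T) and 8 are transversions. The Python sketch below derives that count and compares it with an aggregate ratio for a small, made-up list of observed changes. The function names and data are hypothetical.

```python
from itertools import permutations

PURINES = frozenset("AG")


def is_transition(x, y):
    """For two distinct bases: both purines or both pyrimidines."""
    return (x in PURINES) == (y in PURINES)


# Null expectation: count the ordered paths between the four bases.
paths = list(permutations("ACGT", 2))              # 12 ordered pairs of distinct bases
n_ts = sum(is_transition(x, y) for x, y in paths)  # A<->G and C<->T in both directions: 4
n_tv = len(paths) - n_ts                           # 8
null_ratio = n_ts / n_tv                           # 0.5, i.e. 1:2


def ts_tv_ratio(substitutions):
    """Aggregate transition/transversion ratio for observed (from, to) changes."""
    ts = sum(is_transition(x, y) for x, y in substitutions)
    tv = len(substitutions) - ts
    return ts / tv  # raises ZeroDivisionError if no transversions were observed


observed = [("A", "G")] * 6 + [("C", "T")] * 4 + [("A", "C")] * 3 + [("G", "T")] * 2
print(null_ratio, ts_tv_ratio(observed))  # 0.5 2.0
```

Here the observed ratio is four times the null ratio, which would indicate a bias toward transitions.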

Transition Model

A model of the rates or probabilities of Evolutionary Transitions between character states, typically specified as the rate (or transition-probability) matrix of a first-order Markov process.
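One of the simplest nucleotide transition models is the Jukes-Cantor (JC69) model. It assumes the same rate, here called `alpha`, for every change between two different nucleotides, and its transition probabilities over a branch of length `t` have a closed form. The sketch below computes that matrix. The parameter names are illustrative, and other models (e.g., with unequal rates or base frequencies) would need a different formula or a numerical matrix exponential.

```python
import math

BASES = "ACGT"


def jc69_transition_matrix(alpha, t):
    """Transition-probability matrix P(t) under the Jukes-Cantor model.

    Assumes every off-diagonal rate in Q equals `alpha` (diagonal -3*alpha):
        P_ii(t) = 1/4 + 3/4 * exp(-4*alpha*t)
        P_ij(t) = 1/4 - 1/4 * exp(-4*alpha*t)   for i != j
    """
    decay = math.exp(-4.0 * alpha * t)
    same = 0.25 + 0.75 * decay
    diff = 0.25 - 0.25 * decay
    return [[same if i == j else diff for j in range(4)] for i in range(4)]


P = jc69_transition_matrix(alpha=0.1, t=1.0)
for base, row in zip(BASES, P):
    print(base, [round(p, 4) for p in row])
```

Because the process is first-order Markov, the matrices compose: P(s) multiplied by P(t) equals P(s + t).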

Tree

See #Phylogenetic Tree

Unresolved

A #Phylogenetic Tree containing one or more instances of #Polytomy is often said to be Unresolved, on the assumption that every branching event in the true #Phylogenetic Tree is a #Dichotomy.
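A minimal sketch of checking resolution follows. It assumes a hypothetical dict-of-children representation of a rooted tree, so that a polytomy is any internal node with more than two children. Unrooted trees would need a different rule at the basal node.

```python
def polytomies(children):
    """Return (sorted) internal nodes with more than two children."""
    return sorted(node for node, kids in children.items() if len(kids) > 2)


def is_fully_resolved(children):
    """True if every internal node is a dichotomy (exactly two children)."""
    return all(len(kids) == 2 for kids in children.values() if kids)


resolved = {"root": ["n1", "C"], "n1": ["A", "B"]}   # ((A,B),C)
unresolved = {"root": ["A", "B", "C", "D"]}          # (A,B,C,D)
print(is_fully_resolved(resolved), is_fully_resolved(unresolved))  # True False
print(polytomies(unresolved))  # ['root']
```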

Undefined concepts




opposite of divergent, looking at a branching process on the reverse time scale


a measure of between-character consistency

Consensus Tree

tree based on combining multiple trees. misuse of "consensus". not the same as supertree.
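One widely used way of combining trees is the majority-rule consensus. It keeps every clade found in more than half of the input trees, and such clades are always mutually compatible. The sketch below assumes, hypothetically, that each rooted tree is supplied as a set of its non-trivial clades, each clade a frozenset of taxon labels. It returns the consensus clades rather than a formatted tree.

```python
from collections import Counter


def majority_rule_clades(trees, threshold=0.5):
    """Clades present in more than `threshold` of the input trees.

    Each tree is a collection of clades (frozensets of taxon labels).
    With threshold=0.5 this is the majority-rule consensus; the retained
    clades are mutually compatible and so define a (possibly unresolved) tree.
    """
    counts = Counter(clade for tree in trees for clade in set(tree))
    n = len(trees)
    return {clade for clade, c in counts.items() if c / n > threshold}


def clades(*groups):
    """Build a set of clades from strings of single-letter taxon labels."""
    return {frozenset(g) for g in groups}


trees = [
    clades("AB", "ABC"),  # (((A,B),C),D)
    clades("AB", "ABD"),  # (((A,B),D),C)
    clades("AC", "ABC"),  # (((A,C),B),D)
]
result = majority_rule_clades(trees)
print(sorted("".join(sorted(c)) for c in result))  # ['AB', 'ABC']
```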

Cost Matrix

this sounds to me like Step Matrix, with the difference that step implies non-negative integers while cost implies real numbers.
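Whether called a cost or a step matrix, such a matrix is typically consumed by weighted (Sankoff) parsimony. That method computes, for each node, the minimum cost of the subtree below it given each possible state at that node. The sketch below assumes a hypothetical dict-of-children tree and an illustrative cost matrix (transitions cost 1, transversions cost 2). Real-valued costs work unchanged.

```python
import math

STATES = "ACGT"


def sankoff(children, leaf_states, cost, node):
    """Minimum subtree cost for each state at `node` (Sankoff's algorithm).

    children: dict mapping internal nodes to their child nodes
    leaf_states: dict mapping terminal nodes to an observed state
    cost[a][b]: cost of a change from state a to state b
    """
    if node in leaf_states:
        return {s: 0.0 if s == leaf_states[node] else math.inf for s in STATES}
    totals = {s: 0.0 for s in STATES}
    for child in children[node]:
        child_cost = sankoff(children, leaf_states, cost, child)
        for s in STATES:
            totals[s] += min(cost[s][t] + child_cost[t] for t in STATES)
    return totals


def change_cost(a, b):
    """Hypothetical costs: 0 for no change, 1 for a transition, 2 otherwise."""
    if a == b:
        return 0.0
    return 1.0 if {a, b} in ({"A", "G"}, {"C", "T"}) else 2.0


cost = {a: {b: change_cost(a, b) for b in STATES} for a in STATES}
tree = {"root": ["n1", "n2"], "n1": ["w", "x"], "n2": ["y", "z"]}
leaves = {"w": "A", "x": "G", "y": "C", "z": "C"}
root_costs = sankoff(tree, leaves, cost, "root")
print(min(root_costs.values()))  # 3.0
```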

Decay Index

cf. Retention index, Compatibility


typically the distance between OTUs
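The simplest distance between two aligned sequences is the p-distance, the proportion of compared sites that differ. The sketch below excludes gapped sites, which is a common but not universal convention. Model-based distances (e.g., Jukes-Cantor) correct this value for multiple hits.

```python
def p_distance(seq1, seq2):
    """Proportion of differing sites between two aligned sequences.

    Sites where either sequence has a gap ('-') are excluded.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    compared = diffs = 0
    for a, b in zip(seq1, seq2):
        if a == "-" or b == "-":
            continue
        compared += 1
        diffs += a != b
    if compared == 0:
        raise ValueError("no comparable sites")
    return diffs / compared


print(p_distance("ACGTACGT", "ACGTTCGA"))  # 0.25
```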


ratio of non-synonymous to synonymous rates (dN/dS). usually estimated under a particular model that normalizes each rate by the number of non-synonymous or synonymous sites.
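The final step of counting methods in the style of Nei and Gojobori is sketched below. The proportions of differing non-synonymous and synonymous sites are each corrected with the Jukes-Cantor formula, and the corrected values are divided. The sketch assumes the counts of sites and differences have already been computed. That earlier step depends on the genetic code and is not shown, and the numbers used are hypothetical.

```python
import math


def jukes_cantor(p):
    """Jukes-Cantor correction of a proportion p of differing sites."""
    if p >= 0.75:
        raise ValueError("p too large for the Jukes-Cantor correction")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def dn_ds(nonsyn_diffs, syn_diffs, nonsyn_sites, syn_sites):
    """dN/dS from precomputed counts of differences and of sites."""
    dn = jukes_cantor(nonsyn_diffs / nonsyn_sites)
    ds = jukes_cantor(syn_diffs / syn_sites)
    if ds == 0:
        raise ValueError("no synonymous differences; ratio undefined")
    return dn / ds


# Hypothetical counts for one pair of aligned coding sequences.
print(round(dn_ds(nonsyn_diffs=12, syn_diffs=20, nonsyn_sites=230.5, syn_sites=69.5), 3))
```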

Equilibrium State Frequencies

distribution of character states expected at equilibrium (the stationary distribution). often used in Markov transition models.
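For a Markov transition model given as a rate matrix Q, the equilibrium frequencies are the distribution pi satisfying pi Q = 0. The sketch below finds pi by power iteration on a uniformized matrix P = I + Q / lam, assuming an irreducible Q with rows summing to zero. The function name and tolerance are illustrative choices, and a linear-algebra library would be the usual tool in practice.

```python
def stationary_distribution(Q, tol=1e-12, max_iter=100_000):
    """Equilibrium state frequencies pi of a rate matrix Q (pi Q = 0).

    P = I + Q / lam, with lam above the largest exit rate, is a stochastic
    matrix with the same stationary distribution as Q and positive diagonal,
    so power iteration converges for an irreducible Q.
    """
    n = len(Q)
    lam = 1.1 * max(-Q[i][i] for i in range(n))
    P = [[(1.0 if i == j else 0.0) + Q[i][j] / lam for j in range(n)] for i in range(n)]
    pi = [1.0 / n] * n
    for _ in range(max_iter):
        new = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(new, pi)) < tol:
            return new
        pi = new
    raise RuntimeError("power iteration did not converge")


# Two-state example: rate 1.0 from state 0 to 1, and 3.0 from 1 to 0.
print([round(p, 6) for p in stationary_distribution([[-1.0, 1.0], [3.0, -3.0]])])  # [0.75, 0.25]
```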

Family alignment

sequence alignment for a sequence family

GC/AT pressure

an asymmetry in mutation or in evolutionary changes, favoring GC over AT (or vice versa).
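One simple way to look for such an asymmetry along a single lineage is to compare the per-site rates of AT-to-GC and GC-to-AT change. The sketch below assumes an inferred ancestral sequence aligned to a descendant, and the function name is hypothetical. Each count is normalized by the number of ancestral sites able to undergo that change.

```python
def gc_at_change_rates(ancestor, descendant):
    """Per-site rates of AT->GC and GC->AT change between aligned sequences.

    AT->GC changes are divided by the number of ancestral A/T sites and
    GC->AT changes by the number of ancestral G/C sites. Changes within a
    class (e.g. A->T), gaps and ambiguity codes are ignored.
    """
    at_sites = gc_sites = at_to_gc = gc_to_at = 0
    for a, d in zip(ancestor.upper(), descendant.upper()):
        if a not in "ACGT" or d not in "ACGT":
            continue  # skip gaps and ambiguity codes
        if a in "AT":
            at_sites += 1
            at_to_gc += d in "GC"
        else:
            gc_sites += 1
            gc_to_at += d in "AT"
    rate_up = at_to_gc / at_sites if at_sites else float("nan")
    rate_down = gc_to_at / gc_sites if gc_sites else float("nan")
    return rate_up, rate_down


print(gc_at_change_rates("ATATGCAT", "GTCTGAAT"))
```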

Gene duplication

mutation or evolutionary change resulting in an additional copy of a gene

Gene loss

loss of a gene, typically implying either deletion or silencing

Gene tree

tree reflecting the evolutionary history of genes, not of the species from which they were sampled

Genetic event

Genetic Locus

Independent Contrasts Method

method of assessing covariance by reducing a phylogenetically structured pattern of trait values to a set of independent comparisons (contrasts)
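A sketch of Felsenstein's (1985) contrasts algorithm on a fully bifurcating tree follows. The tree and branch-length dictionaries are a hypothetical input format. At each internal node, the contrast is the difference between the children's values, standardized by the square root of the summed branch lengths. The node receives a weighted average value, and its branch is lengthened to account for estimation error. Contrasts computed this way for two traits are then typically correlated or regressed through the origin, which is not shown.

```python
import math


def contrasts(tree, node, values, lengths):
    """Phylogenetically independent contrasts below `node`.

    tree: dict mapping each internal node to exactly two children
    values: dict of trait values for terminal nodes
    lengths: dict mapping each node to the length of the branch above it
    Returns (estimated value at node, extra branch length, standardized
    contrasts in the subtree). The sign of each contrast is arbitrary.
    """
    if node not in tree:
        return values[node], 0.0, []
    left, right = tree[node]
    x1, extra1, c1 = contrasts(tree, left, values, lengths)
    x2, extra2, c2 = contrasts(tree, right, values, lengths)
    v1 = lengths[left] + extra1
    v2 = lengths[right] + extra2
    contrast = (x1 - x2) / math.sqrt(v1 + v2)
    x = (x1 / v1 + x2 / v2) / (1.0 / v1 + 1.0 / v2)
    extra = v1 * v2 / (v1 + v2)
    return x, extra, c1 + c2 + [contrast]


# Hypothetical tree ((A:1,B:1):1,C:2) with one trait.
tree = {"root": ["n1", "C"], "n1": ["A", "B"]}
values = {"A": 1.0, "B": 3.0, "C": 4.0}
lengths = {"n1": 1.0, "A": 1.0, "B": 1.0, "C": 2.0}
root_value, _, cs = contrasts(tree, "root", values, lengths)
print([round(c, 3) for c in cs])
```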


a phylogenetic inference method based on invariants (domain-specific application of more general term from numeric analysis)


large chromosomal region of relatively uniform base (GC) composition, characteristic of the genomes of warm-blooded vertebrates.

Minimum evolution

a distance-based phylogeny inference method that selects the tree with the smallest sum of branch lengths


assigned to transitions that have a direction (i.e., a different rate or cost in the forward vs. reverse direction); polarizing differences means determining which (if any) state is ancestral.

Positive Selection

differential reproduction (selection) viewed relative to the variant whose frequency increases


episode of rapid or repeated cladogenesis

Relative Rate Test

a simple test to correct for phylogenetic structure when assessing rate constancy, in which, given the tree (A, (B,C)), the distances (or inferred changes) from A to B are compared to those from A to C.
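The core comparison of the test can be sketched as follows, using p-distances and the outgroup A from the tree (A, (B,C)). Under rate constancy, B and C are equally distant from their common ancestor, so the expected value of d(A,B) - d(A,C) is zero. Deciding whether an observed difference is significant needs a variance estimate (as in Tajima's or Wu and Li's versions of the test), which this illustrative sketch omits.

```python
def p_distance(s1, s2):
    """Proportion of differing sites between two aligned, gap-free sequences."""
    return sum(a != b for a, b in zip(s1, s2)) / len(s1)


def relative_rate_difference(outgroup, seq_b, seq_c):
    """d(A,B) - d(A,C) for the tree (A,(B,C)), with A as the outgroup."""
    return p_distance(outgroup, seq_b) - p_distance(outgroup, seq_c)


# Hypothetical sequences: B has changed more than C since they split.
print(relative_rate_difference("AAAAAAAAAA", "AAAAAAAGGG", "AAAAAAAAAG"))
```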


UPGMA

Unweighted Pair Group Method with Arithmetic Mean, a phenetic clustering method. Arguably, a phylogeny inference method based on distances.
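UPGMA repeatedly joins the closest pair of clusters and places the new node at half their distance. The distance from the new cluster to any other is then the average of the pairwise distances between their members. The sketch below assumes distances are given as a dict keyed by unordered pairs of taxon names (a hypothetical input format) and returns a Newick string. It assumes clock-like (ultrametric) data, as UPGMA itself does.

```python
def upgma(dist):
    """Cluster taxa with UPGMA; return a Newick string with branch lengths.

    dist: dict mapping frozenset({a, b}) to the distance between taxa a and b.
    """
    clusters = {}  # cluster label (a Newick string) -> (size, height)
    for pair in dist:
        for name in pair:
            clusters[name] = (1, 0.0)
    d = dict(dist)
    while len(clusters) > 1:
        pair = min(d, key=d.get)  # closest pair of clusters
        a, b = sorted(pair)
        (na, ha), (nb, hb) = clusters.pop(a), clusters.pop(b)
        height = d.pop(pair) / 2.0  # new node sits at half the distance
        merged = f"({a}:{height - ha:g},{b}:{height - hb:g})"
        for other in clusters:
            # Average of member distances, i.e. size-weighted average.
            d[frozenset((merged, other))] = (
                na * d.pop(frozenset((a, other))) + nb * d.pop(frozenset((b, other)))
            ) / (na + nb)
        clusters[merged] = (na + nb, height)
    (root,) = clusters
    return root + ";"


dist = {frozenset(p): v for p, v in [
    (("A", "B"), 2), (("A", "C"), 6), (("B", "C"), 6),
    (("A", "D"), 10), (("B", "D"), 10), (("C", "D"), 10)]}
print(upgma(dist))  # (((A:1,B:1):2,C:3):2,D:5);
```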


relation of homology via lateral transfer

Additional terms that do not have a domain-specific meaning

Generic or vague terms (no clear domain-specific meaning)

  • Conservative (see also Radical) has a vague meaning indistinguishable from the non-technical meaning
  • Constraint (Selective Constraint, Developmental Constraint) signifies a condition or relation that must be satisfied, and this usage is not different from the significance of "constraint" used in other kinds of technical writing in the context of an optimization or satisfaction problem.
  • Radical (see also Conservative) has a vague meaning indistinguishable from the non-technical meaning
  • Stasis means "non-change" or "lack of change", as in its ordinary usage

From statistics and applied maths

From biology, systematics, genetics, and evolution

From molecular biology and bioinformatics

  • sequence
  • alignment
  • aligned sequences
  • unaligned sequences

From computer science and maths

Acknowledgements and History

  • June, 2006, Aaron Mackey started a glossary (19 domain concepts and a few other terms) for the group that spawned the NESCent evolutionary informatics working group.
  • September, 2007, Arlin Stoltzfus expanded the concept glossary to 40 defined terms and 22 undefined terms.
  • 12 October, 2007, the concept glossary was released to the NESCent evolutionary informatics working group
  • November, 2007, EvoInfo working group meeting (12th to 14th) and follow-ups
    • terms added by Julie Thompson, Enrico Pontelli, Arlin Stoltzfus
    • reformatted to wiki to allow easier cross-referencing
    • dozens of cross-references added (not complete)
    • about 50 defined terms added (Arlin)