Difference between revisions of "Concept Glossary"

From Evolutionary Informatics Working Group
(moved concept Polytomy to defined list)
(moved newly defined concept "Lineage" to defined list)
Line 6: Line 6:
 
==How to improve the glossary==
 
==How to improve the glossary==
 
* define a term in the "undefined" section, then move it to the "defined" section
 
* define a term in the "undefined" section, then move it to the "defined" section
* directly modify a definition to improve it  
+
* directly modify a definition to improve it
 
* describe relations such as disjunction, synonymy, part_of and is_a, using wiki cross-refs like this: <pre>Insertion is disjoint to [[#Deletion]]</pre>
 
* describe relations such as disjunction, synonymy, part_of and is_a, using wiki cross-refs like this: <pre>Insertion is disjoint to [[#Deletion]]</pre>
* make it easier to maintain and disseminate this list by  
+
* make it easier to maintain and disseminate this list by
** making your changes ATOMIC (one item at a time)  
+
** making your changes ATOMIC (one item at a time)
** maintaining the format  
+
** maintaining the format
 
** sticking to principles (next section)
 
** sticking to principles (next section)
  
 
==Principles governing content==
 
==Principles governing content==
(note: email feedback on principles to Arlin)  
+
(note: email feedback on principles to Arlin)
 
* What is included:
 
* What is included:
 
** Terms that denote general concepts (e.g., Phylogeny Inference Method, but not MrBayes).
 
** Terms that denote general concepts (e.g., Phylogeny Inference Method, but not MrBayes).
Line 24: Line 24:
 
** terms with common meaning well outside of the domain (e.g., "integer")
 
** terms with common meaning well outside of the domain (e.g., "integer")
 
* How the definition is determined
 
* How the definition is determined
** By studying usage in articles and books (e.g., Nei and Kumar; Li and Graur; Felsenstein)  
+
** By studying usage in articles and books (e.g., Nei and Kumar; Li and Graur; Felsenstein)
 
** By consulting domain experts and by soliciting feedback
 
** By consulting domain experts and by soliciting feedback
** By studying the use of terms in software and data interfaces  
+
** By studying the use of terms in software and data interfaces
 
* How synonyms, ambiguities and overlaps are handled
 
* How synonyms, ambiguities and overlaps are handled
 
** we may make a term domain-specific by qualifying it, as in "Phylogenetic tree" (not just "tree") or "Organismal taxonomy" (not just "taxonomy")
 
** we may make a term domain-specific by qualifying it, as in "Phylogenetic tree" (not just "tree") or "Organismal taxonomy" (not just "taxonomy")
Line 32: Line 32:
 
*** that is most widely used by domain experts
 
*** that is most widely used by domain experts
 
*** that conflicts least with familiar extra-domain meanings
 
*** that conflicts least with familiar extra-domain meanings
** We can decide later '''not''' to use a term that is too ambiguous  
+
** We can decide later '''not''' to use a term that is too ambiguous
 
* In the case of disputes over meanings
 
* In the case of disputes over meanings
 
** Open the topic for discussion on the Discussion page
 
** Open the topic for discussion on the Discussion page
** Find examples that illustrate actual usage  
+
** Find examples that illustrate actual usage
 
** Consider ways to replace a problematic term with alternatives that cover its meanings
 
** Consider ways to replace a problematic term with alternatives that cover its meanings
  
 
==Defined Concepts==
 
==Defined Concepts==
 
==== Allele Fixation ====
 
==== Allele Fixation ====
From population genetics, fixation of an allele is the attainment of a frequency of 1, or more rigorously, attainment of a frequency approaching some point of stability that is equal to, or nearly equal to, 1.
+
From population genetics, fixation of an allele is the attainment of a frequency of 1, or more rigorously, attainment of a frequency approaching some point of stability that is equal to, or nearly equal to, 1.
  
 
==== Analogy ====
 
==== Analogy ====
Line 46: Line 46:
  
 
==== Ancestor ====
 
==== Ancestor ====
An entity from whom features were inherited by some [[#OTU]] of current interest.  A [[#Parent]] is the closest form of ancestor.  Ancestry is the reverse relation of [[#Descendant descendancy]].  The nearest ancestral node on a tree is sometimes called an "immediate" ancestor.  See [[#Most Recent Common Ancestor]].  The relationship of ancestry is transitive: if A is the ancestor of B, and B is the ancestor of C, then A is the ancestor of C.  
+
An entity from whom features were inherited by some [[#OTU]] of current interest.  A [[#Parent]] is the closest form of ancestor.  Ancestry is the reverse relation of [[#Descendant descendancy]].  The nearest ancestral node on a tree is sometimes called an "immediate" ancestor.  See [[#Most Recent Common Ancestor]].  The relationship of ancestry is transitive: if A is the ancestor of B, and B is the ancestor of C, then A is the ancestor of C.
  
 
==== Apomorphy ====
 
==== Apomorphy ====
A [[#State]] of an [[#OTU]] that is a [[#Derived]] [[#State]] in the current context.  Disjoint with [[#Plesiomorphy]].  
+
A [[#State]] of an [[#OTU]] that is a [[#Derived]] [[#State]] in the current context.  Disjoint with [[#Plesiomorphy]].
  
 
==== Atavism ====
 
==== Atavism ====
Line 66: Line 66:
 
"Branch" is the typical domain-specific term for an edge of a [[#Phylogenetic Tree]].  Branches may have properties such as length and degree of [[#Branch_Support]]. See also [[#Split]].
 
"Branch" is the typical domain-specific term for an edge of a [[#Phylogenetic Tree]].  Branches may have properties such as length and degree of [[#Branch_Support]]. See also [[#Split]].
  
==== Branch Support ====  
+
==== Branch Support ====
 
Each [[#Branch]] in a [[#Phylogenetic Tree]] defines a [[#Bipartition]].  The degree of confidence in a particular [[#Branch]] (bipartition) may be indicated by a [[#Branch]] Support value, typically a [[#Bootstrap Support]] value or a Bayesian posterior probability.
 
Each [[#Branch]] in a [[#Phylogenetic Tree]] defines a [[#Bipartition]].  The degree of confidence in a particular [[#Branch]] (bipartition) may be indicated by a [[#Branch]] Support value, typically a [[#Bootstrap Support]] value or a Bayesian posterior probability.
  
Line 82: Line 82:
  
 
==== Cladogram ====
 
==== Cladogram ====
A pictorial representation of a [[#Phylogenetic Tree]] that is understood to represent only what domain experts call the [[#Topology]], meaning the connectivity of nodes, and not the lengths of [[#Branch]]es between them.  Nevertheless, as actual lines must have non-zero lengths, to draw a cladogram one must apply an arbitrary convention for [[#Branch]] lengths, typically either  
+
A pictorial representation of a [[#Phylogenetic Tree]] that is understood to represent only what domain experts call the [[#Topology]], meaning the connectivity of nodes, and not the lengths of [[#Branch]]es between them.  Nevertheless, as actual lines must have non-zero lengths, to draw a cladogram one must apply an arbitrary convention for [[#Branch]] lengths, typically either
# make all [[#Branch]]es the same length, or  
+
# make all [[#Branch]]es the same length, or
 
# adjust [[#Branch]] lengths so that [[#Terminal Node]]s fall on a line
 
# adjust [[#Branch]] lengths so that [[#Terminal Node]]s fall on a line
  
Line 96: Line 96:
  
 
==== CpG Bias ====
 
==== CpG Bias ====
An enhanced rate of [[#Mutation]] at CG dinucleotide sites typical in mammalian genomes, arising from spontaneous deamination of methylated cytosine.  A kind of [[#Mutation Bias]].
+
An enhanced rate of [[#Mutation]] at CG dinucleotide sites typical in mammalian genomes, arising from spontaneous deamination of methylated cytosine.  A kind of [[#Mutation Bias]].
  
 
==== Deletion ====
 
==== Deletion ====
Line 105: Line 105:
  
 
==== Descendant ====
 
==== Descendant ====
An entity that inherits features from some entity of current interest.  Descendancy is the reverse relation of [[#Ancestor ancestry]].  A [[#Child]] is the closest form of descendant.  The relationship of descendancy is transitive: if A is the descendant of B, and B is the descendant of C, then A is the descendant of C.  
+
An entity that inherits features from some entity of current interest.  Descendancy is the reverse relation of [[#Ancestor ancestry]].  A [[#Child]] is the closest form of descendant.  The relationship of descendancy is transitive: if A is the descendant of B, and B is the descendant of C, then A is the descendant of C.
  
 
==== Dichotomy ====
 
==== Dichotomy ====
Line 114: Line 114:
  
 
==== Dollo Parsimony ====
 
==== Dollo Parsimony ====
A character-based [[#Phylogeny Inference Method]] that applies the [[Parsimony]] principle to presence-and-absence (i.e., 2-state) [[#Character]]s with the restriction that gain (the [[#Transition]] from absence to presence) can happen only once for each [[#Character]].  
+
A character-based [[#Phylogeny Inference Method]] that applies the [[Parsimony]] principle to presence-and-absence (i.e., 2-state) [[#Character]]s with the restriction that gain (the [[#Transition]] from absence to presence) can happen only once for each [[#Character]].
  
 
==== Drift ====
 
==== Drift ====
Line 123: Line 123:
  
 
==== Evolutionary Transition ====
 
==== Evolutionary Transition ====
An evolutionary change.  In the context of character analysis, an evolutionary transition is a change in the [[#Character-State]] of a [[#Character]] along a [[#Branch]].  In some contexts, "substitution", "replacement" and even "mutation" may be used as though they were synonyms for Evolutionary Transition.
+
An evolutionary change.  In the context of character analysis, an evolutionary transition is a change in the [[#Character-State]] of a [[#Character]] along a [[#Branch]].  In some contexts, "substitution", "replacement" and even "mutation" may be used as though they were synonyms for Evolutionary Transition.
  
 
==== Fixation ====
 
==== Fixation ====
Line 132: Line 132:
  
 
==== Gap ====
 
==== Gap ====
The concept of a "gap" is ambiguous and is tied to the use of a "gap character" (often the hyphen "-") in text representations of sequence alignments.  In general, the "gap" represents the absence of any positively diagnosed [[#Character-State]].  As such, the gap may be interpreted as an additional [[#Character-State]], as the absence of the [[#Character]], or as an unknown value ([[#Missing Data]]).
+
The concept of a "gap" is ambiguous and is tied to the use of a "gap character" (often the hyphen "-") in text representations of sequence alignments.  In general, the "gap" represents the absence of any positively diagnosed [[#Character-State]].  As such, the gap may be interpreted as an additional [[#Character-State]], as the absence of the [[#Character]], or as an unknown value ([[#Missing Data]]).
  
 
==== General Time-Reversible Model (GTR) ====
 
==== General Time-Reversible Model (GTR) ====
Line 138: Line 138:
  
 
==== Genome Hypothesis ====
 
==== Genome Hypothesis ====
Grantham's hypothesis that each species has a distinctive genome-wide [[#Codon Usage]] strategy reflecting adaptation for translation efficiency.
+
Grantham's hypothesis that each species has a distinctive genome-wide [[#Codon Usage]] strategy reflecting adaptation for translation efficiency.
  
 
====HGT====
 
====HGT====
Line 150: Line 150:
  
 
==== Holophyly ====
 
==== Holophyly ====
Condition of a set of [[#Species]] (and by extension, [[#OTU]]s of any kind) and their [[#Ancestor]]s that includes their [[#Most Recent Common Ancestor]] and all of its [[#Descendant]]s.  "Holophyletic group" thus is synonymous with [[#Clade]]. Subclass of [[#Monophyly]], disjoint to [[#Paraphyly]] (sensu Ashlock).  
+
Condition of a set of [[#Species]] (and by extension, [[#OTU]]s of any kind) and their [[#Ancestor]]s that includes their [[#Most Recent Common Ancestor]] and all of its [[#Descendant]]s.  "Holophyletic group" thus is synonymous with [[#Clade]]. Subclass of [[#Monophyly]], disjoint to [[#Paraphyly]] (sensu Ashlock).
  
 
==== Horizontal Gene Transfer (HGT) ====
 
==== Horizontal Gene Transfer (HGT) ====
Line 159: Line 159:
  
 
==== Hypothetical Taxonomic Unit ====
 
==== Hypothetical Taxonomic Unit ====
Hypothetical analog of an [[#OTU]], typically representing an unobserved [[#Ancestor]] entity.
+
Hypothetical analog of an [[#OTU]], typically representing an unobserved [[#Ancestor]] entity.
  
 
==== Indel ====
 
==== Indel ====
 
A fusion of the terms for [[#Insertion]] and [[#Deletion]] that has two meanings, one based on the logic of '''OR''' (common in phylogenetics), and the other based on the logic of '''AND''' (used in mutation research, e.g., [http://www.ncbi.nlm.nih.gov/sites/entrez?Db=pubmed&Cmd=ShowDetailView&TermToSearch=12497629&ordinalpos=6&itool=EntrezSystem2.PEntrez.Pubmed.Pubmed_ResultsPanel.Pubmed_RVDocSum Chuzhanova, et al., 2003]):
 
A fusion of the terms for [[#Insertion]] and [[#Deletion]] that has two meanings, one based on the logic of '''OR''' (common in phylogenetics), and the other based on the logic of '''AND''' (used in mutation research, e.g., [http://www.ncbi.nlm.nih.gov/sites/entrez?Db=pubmed&Cmd=ShowDetailView&TermToSearch=12497629&ordinalpos=6&itool=EntrezSystem2.PEntrez.Pubmed.Pubmed_ResultsPanel.Pubmed_RVDocSum Chuzhanova, et al., 2003]):
#A length difference between two aligned sequences, denoting the evolutionary occurrence of either an [[#Insertion]]  '''OR''' a [[#Deletion]] during their divergence from a common ancestor.  
+
#A length difference between two aligned sequences, denoting the evolutionary occurrence of either an [[#Insertion]]  '''OR''' a [[#Deletion]] during their divergence from a common ancestor.
 
#A complex mutational event involving the addition of some residues and the loss of others, i.e., [[#Insertion]] '''AND''' [[#Deletion]]
 
#A complex mutational event involving the addition of some residues and the loss of others, i.e., [[#Insertion]] '''AND''' [[#Deletion]]
  
Line 170: Line 170:
  
 
==== Lateral Gene Transfer ====
 
==== Lateral Gene Transfer ====
Incorporation of genetic material from one organism into the genome of another organism that is not its reproductive offspring is called "Lateral" or "Horizontal" gene transfer.  The more typical form of inheritance is the "vertical" transfer that takes place from parent to offspring during biological reproduction.  Lateral Gene Transfer is sometimes abbreviated LGT.
+
Incorporation of genetic material from one organism into the genome of another organism that is not its reproductive offspring is called "Lateral" or "Horizontal" gene transfer.  The more typical form of inheritance is the "vertical" transfer that takes place from parent to offspring during biological reproduction.  Lateral Gene Transfer is sometimes abbreviated LGT.
  
 
==== Leaf node ====
 
==== Leaf node ====
Line 180: Line 180:
 
==== Likelihood Method ====
 
==== Likelihood Method ====
 
A [[#Phylogeny Inference Method]] in which the objective function used to characterize a [[#Phylogenetic Tree]] (and [[#Transition Model]]) is the likelihood, which is the probability of the observed data conditional on the [[#Phylogenetic Tree]] and the [[#Transition Model]].
 
A [[#Phylogeny Inference Method]] in which the objective function used to characterize a [[#Phylogenetic Tree]] (and [[#Transition Model]]) is the likelihood, which is the probability of the observed data conditional on the [[#Phylogenetic Tree]] and the [[#Transition Model]].
 +
 +
==== Lineage ====
 +
A Lineage is a forward (in time) path in a [[#Phylogenetic Tree]], representing a linear path of descent connecting an [[#Ancestor]] to a [[#Descendant]].  Lineages are not determinable in the case of an un[[#Root]]ed tree.
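
The definition above can be sketched in code.  This is only an illustration, assuming a rooted tree stored as a child-to-parent dictionary; the node names and the <code>lineage</code> helper are hypothetical and not part of the glossary.

```python
def lineage(parent, ancestor, descendant):
    """Return the forward-in-time path from ancestor to descendant, or None.

    `parent` maps each node to its parent; the root maps to None.
    """
    path = [descendant]
    node = descendant
    # Walk backward in time from the descendant toward the root.
    while node != ancestor:
        node = parent.get(node)
        if node is None:
            return None  # reached the root without meeting `ancestor`
        path.append(node)
    path.reverse()  # report the path forward in time
    return path


# Hypothetical rooted tree: root -> (n1 -> (A, B), C)
parent = {"root": None, "n1": "root", "C": "root", "A": "n1", "B": "n1"}
print(lineage(parent, "root", "A"))
print(lineage(parent, "C", "A"))
```

The walk relies on every node having a known parent, which is exactly the information an unrooted tree lacks.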
  
 
==== Molecular Clock ====
 
==== Molecular Clock ====
The assumption that evolution is clock-like over some interval, typically in the sense that the expected number of [[#Evolutionary Transition]]s per unit of time is constant.
+
The assumption that evolution is clock-like over some interval, typically in the sense that the expected number of [[#Evolutionary Transition]]s per unit of time is constant.
  
 
==== Monophyly ====
 
==== Monophyly ====
Condition of a set of [[#Species]] (and by extension, [[#OTU]]s of any kind) and their [[#Ancestor]]s that includes their [[#Most Recent Common Ancestor]].  In contexts where [[#Monophyly]] is not sub-classed to [[#Holophyly]] and [[#Paraphyly]], it often is assumed to mean [[#Holophyly]] (sensu Ashlock).  
+
Condition of a set of [[#Species]] (and by extension, [[#OTU]]s of any kind) and their [[#Ancestor]]s that includes their [[#Most Recent Common Ancestor]].  In contexts where [[#Monophyly]] is not sub-classed to [[#Holophyly]] and [[#Paraphyly]], it often is assumed to mean [[#Holophyly]] (sensu Ashlock).
  
 
====Most Recent Common Ancestor ====
 
====Most Recent Common Ancestor ====
Line 199: Line 202:
  
 
==== Mutation Bias ====
 
==== Mutation Bias ====
An asymmetry or non-uniformity in the occurrence of [[#Mutation]]s categorized by position, type, effect class, or some other index variable.  
+
An asymmetry or non-uniformity in the occurrence of [[#Mutation]]s categorized by position, type, effect class, or some other index variable.
  
 
==== Natural Selection ====
 
==== Natural Selection ====
Line 208: Line 211:
  
 
==== Network ====
 
==== Network ====
A kind of [[#Phylogenetic tree]] in which some typical restrictions are not satisfied.  In some contexts, it is a synonym for un[[#Root]]ed [[#Phylogenetic Tree]].  In other contexts, it signals the presence of [[#Node]]s with multiple parents.
+
A kind of [[#Phylogenetic tree]] in which some typical restrictions are not satisfied.  In some contexts, it is a synonym for un[[#Root]]ed [[#Phylogenetic Tree]].  In other contexts, it signals the presence of [[#Node]]s with multiple parents.
  
 
==== Neutral ====
 
==== Neutral ====
Line 214: Line 217:
  
 
==== Nucleotide Transition ====
 
==== Nucleotide Transition ====
A substitution of one purine (A or G) to another, or one pyrimidine (C or T) to another, a kind_of [[#Mutation]] or alternatively, a corresponding kind_of [[#Evolutionary Transition]] or [[#Polymorphism]].  
+
A substitution of one purine (A or G) to another, or one pyrimidine (C or T) to another, a kind_of [[#Mutation]] or alternatively, a corresponding kind_of [[#Evolutionary Transition]] or [[#Polymorphism]].
  
 
==== Nucleotide Transversion ====
 
==== Nucleotide Transversion ====
A substitution of a purine (A or G) to a pyrimidine (C or T) or vice versa, a kind_of [[#Mutation]] or alternatively, a corresponding kind_of [[#Evolutionary Transition]] or [[#Polymorphism]].  
+
A substitution of a purine (A or G) to a pyrimidine (C or T) or vice versa, a kind_of [[#Mutation]] or alternatively, a corresponding kind_of [[#Evolutionary Transition]] or [[#Polymorphism]].
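
The purine/pyrimidine rule that separates [[#Nucleotide Transition]] from Nucleotide Transversion can be sketched as follows.  The function name <code>classify_substitution</code> is a hypothetical helper chosen for this example, and only the four DNA bases A, C, G and T are handled.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(old, new):
    """Classify a single-nucleotide substitution as 'transition' or 'transversion'."""
    old, new = old.upper(), new.upper()
    bases = PURINES | PYRIMIDINES
    if old not in bases or new not in bases:
        raise ValueError("expected one of A, C, G, T")
    if old == new:
        raise ValueError("no substitution: the two bases are identical")
    # Same chemical class on both sides means a transition.
    same_class = (old in PURINES) == (new in PURINES)
    return "transition" if same_class else "transversion"


print(classify_substitution("A", "G"))
print(classify_substitution("A", "C"))
```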
  
 
==== Operational Taxonomic Unit ====
 
==== Operational Taxonomic Unit ====
Line 223: Line 226:
  
 
==== Organismal Taxonomy ====
 
==== Organismal Taxonomy ====
A classification of organismal [[Species]] consisting of a nested hierarchy of classes.  Traditional [[#Organismal Taxonomy]] includes named [[#Taxonomic Rank]]s and is the basis for the usual way of referring to species of organism by Genus and Species (e.g., Homo sapiens is the sapiens species of the genus Homo).  
+
A classification of organismal [[Species]] consisting of a nested hierarchy of classes.  Traditional [[#Organismal Taxonomy]] includes named [[#Taxonomic Rank]]s and is the basis for the usual way of referring to species of organism by Genus and Species (e.g., Homo sapiens is the sapiens species of the genus Homo).
  
 
==== Orthology ====
 
==== Orthology ====
Relationship of sequences that have diverged via speciation events but not by events of [[#Gene Duplication]]. Subclass of [[#Homology]].  Disjoint to [[#Paralogy]].  
+
Relationship of sequences that have diverged via speciation events but not by events of [[#Gene Duplication]]. Subclass of [[#Homology]].  Disjoint to [[#Paralogy]].
  
 
==== OTU ====
 
==== OTU ====
See [[#Operational Taxonomic Unit]].  
+
See [[#Operational Taxonomic Unit]].
  
 
==== Outgroup ====
 
==== Outgroup ====
Line 236: Line 239:
  
 
==== Parallel Evolution ====
 
==== Parallel Evolution ====
Parallel Evolution (parallelism) is a pattern in which two different [[#OTU]]s reach the same [[#State]] by the same series of [[Evolutionary Transition]]s.  Parallelism is a subclass of [[#Homoplasy]] and is considered to be disjoint with [[#Convergence]], though the exact line of distinction is not always clear.  
+
Parallel Evolution (parallelism) is a pattern in which two different [[#OTU]]s reach the same [[#State]] by the same series of [[Evolutionary Transition]]s.  Parallelism is a subclass of [[#Homoplasy]] and is considered to be disjoint with [[#Convergence]], though the exact line of distinction is not always clear.
  
 
==== Paralogy ====
 
==== Paralogy ====
Relationship of sequences that have diverged via one or more events of [[#Gene Duplication]]. Subclass of [[#Homology]].  Disjoint to [[#Orthology]].  
+
Relationship of sequences that have diverged via one or more events of [[#Gene Duplication]]. Subclass of [[#Homology]].  Disjoint to [[#Orthology]].
  
 
==== Paraphyly ====
 
==== Paraphyly ====
Condition of a set of [[#Species]] (and by extension, [[#OTU]]s of any kind) and their [[#Ancestor]]s that includes their [[#Most Recent Common Ancestor]] and some but not all of its [[#Descendant]]s.  Subclass of [[#Monophyly]], disjoint to [[#Holophyly]] (sensu Ashlock).  For instance, the reptile group is paraphyletic because, while its common ancestor was a reptile, some of that ancestor's descendants are birds, which are not included in the reptile group.
+
Condition of a set of [[#Species]] (and by extension, [[#OTU]]s of any kind) and their [[#Ancestor]]s that includes their [[#Most Recent Common Ancestor]] and some but not all of its [[#Descendant]]s.  Subclass of [[#Monophyly]], disjoint to [[#Holophyly]] (sensu Ashlock).  For instance, the reptile group is paraphyletic because, while its common ancestor was a reptile, some of that ancestor's descendants are birds, which are not included in the reptile group.
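
The distinction between [[#Holophyly]] and Paraphyly can be checked mechanically.  The sketch below follows the glossary wording literally and assumes a rooted tree stored as a parent-to-children dictionary; the tree, its node names, and the helper functions are hypothetical simplifications for illustration only.

```python
def descendants(children, node):
    """All nodes below `node` (children, grandchildren, and so on)."""
    found = set()
    stack = list(children.get(node, []))
    while stack:
        current = stack.pop()
        found.add(current)
        stack.extend(children.get(current, []))
    return found


def mrca(children, root, group):
    """Deepest node whose subtree (itself included) contains the whole group."""
    node = root
    while True:
        for child in children.get(node, []):
            if group <= descendants(children, child) | {child}:
                node = child
                break
        else:
            return node


def classify_group(children, root, group):
    """Return 'holophyly', 'paraphyly', or None for a set of nodes."""
    group = set(group)
    ancestor = mrca(children, root, group)
    if ancestor not in group:
        return None  # the group does not include its own MRCA
    below = descendants(children, ancestor)
    members_below = group - {ancestor}
    if members_below == below:
        return "holophyly"  # the MRCA and all of its descendants
    if members_below:
        return "paraphyly"  # the MRCA and some, but not all, descendants
    return None


# Hypothetical, heavily simplified tree for the reptile example.
children = {
    "reptile_ancestor": ["lizard", "archosaur_ancestor"],
    "archosaur_ancestor": ["crocodile", "bird"],
}
reptiles = {"reptile_ancestor", "archosaur_ancestor", "lizard", "crocodile"}
print(classify_group(children, "reptile_ancestor", reptiles))
```

Adding <code>"bird"</code> to the group makes it holophyletic, which mirrors the reptile example in the definition.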
  
 
==== Parsimony Method ====
 
==== Parsimony Method ====
Line 257: Line 260:
  
 
==== Phylogeny Inference Method ====
 
==== Phylogeny Inference Method ====
A method of inferring an evolutionary history ([[#Phylogeny]]).  Phylogeny inference methods may generate a [[#Phylogenetic Tree]] as well as [[#Reconstruction]]s of [[#Ancestral]] [[#Character-State]]s, using inputs based on observed data.  They fall into two broad classes: distance-based methods that use a [[#Distance Matrix]] as input, and character-based methods that use a [[#Character-State Data Matrix]] as input.  Of the character-based methods, some are rule-based ([[#Parsimony]], [[#Invariants]]), while others are probabilistic and depend on a [[#Transition Model]] that must be specified explicitly and evaluated in terms of Likelihood or Bayesian Posterior Probability.
+
A method of inferring an evolutionary history ([[#Phylogeny]]).  Phylogeny inference methods may generate a [[#Phylogenetic Tree]] as well as [[#Reconstruction]]s of [[#Ancestral]] [[#Character-State]]s, using inputs based on observed data.  They fall into two broad classes: distance-based methods that use a [[#Distance Matrix]] as input, and character-based methods that use a [[#Character-State Data Matrix]] as input.  Of the character-based methods, some are rule-based ([[#Parsimony]], [[#Invariants]]), while others are probabilistic and depend on a [[#Transition Model]] that must be specified explicitly and evaluated in terms of Likelihood or Bayesian Posterior Probability.
  
 
==== Plesiomorphy ====
 
==== Plesiomorphy ====
A [[#State]] of an [[#OTU]] that is the [[#Ancestor ancestral]] state (i.e., [[#Primitive]]) in the current context.  Disjoint with [[#Apomorphy]].  
+
A [[#State]] of an [[#OTU]] that is the [[#Ancestor ancestral]] state (i.e., [[#Primitive]]) in the current context.  Disjoint with [[#Apomorphy]].
  
 
==== Polymorphism ====
 
==== Polymorphism ====
Line 284: Line 287:
  
 
==== Reconstruction ====
 
==== Reconstruction ====
Reconstruction refers to the process of inferring [[#Ancestral]] [[#Character-State]]s.  The inferred states are said to be "reconstructed" states.
+
Reconstruction refers to the process of inferring [[#Ancestral]] [[#Character-State]]s.  The inferred states are said to be "reconstructed" states.
  
 
==== Replacement ====
 
==== Replacement ====
Line 290: Line 293:
  
 
==== Reversion ====
 
==== Reversion ====
An [[#Evolutionary Transition]] to an [[#Ancestor ancestral]] [[#State]].  Reversion is a form of [[#Homoplasy]].  
+
An [[#Evolutionary Transition]] to an [[#Ancestor ancestral]] [[#State]].  Reversion is a form of [[#Homoplasy]].
  
 
==== Reversal ====
 
==== Reversal ====
The occurrence of an [[#Evolutionary Transition]] that is the reverse of an earlier transition.  Note the distinction from [[#Reversion]]: all reversals return to an ancestral state, but not all [[#Reversion]]s are the reverse of a previous transition, e.g., G to C to A to G is a [[#Reversion]] but none of the individual transitions (G to C, C to A, A to G) are reversed; G to C to G is both a [[#Reversion]] and a reversal.
+
The occurrence of an [[#Evolutionary Transition]] that is the reverse of an earlier transition.  Note the distinction from [[#Reversion]]: all reversals return to an ancestral state, but not all [[#Reversion]]s are the reverse of a previous transition, e.g., G to C to A to G is a [[#Reversion]] but none of the individual transitions (G to C, C to A, A to G) are reversed; G to C to G is both a [[#Reversion]] and a reversal.
  
 
==== Root ====
 
==== Root ====
Line 299: Line 302:
  
 
==== Selection ====
 
==== Selection ====
See [[#Natural Selection]].  
+
See [[#Natural Selection]].
  
 
==== Silent ====
 
==== Silent ====
A difference that is invisible or has no phenotypic effect. Applied to a [[#Polymorphism]], [[#Evolutionary Transition]], or [[#Mutation]].  When applied to nucleotide differences in protein-coding genes, it has the same meaning as [[#Synonymous]].  
+
A difference that is invisible or has no phenotypic effect. Applied to a [[#Polymorphism]], [[#Evolutionary Transition]], or [[#Mutation]].  When applied to nucleotide differences in protein-coding genes, it has the same meaning as [[#Synonymous]].
  
 
==== Species Tree ====
 
==== Species Tree ====
Line 310: Line 313:
 
see [[#Character-State]]
 
see [[#Character-State]]
  
==== Step Matrix ====  
+
==== Step Matrix ====
 
A matrix { S<sub>i,j</sub> } representing the number of evolutionary "steps" between two [[#Character-State]]s i and j.  Used in the [[#Phylogeny Inference Method]] called [[#Parsimony]].  cf. [[#Transition Model]]

A matrix { S<sub>i,j</sub> } representing the number of evolutionary "steps" between two [[#Character-State]]s i and j.  Used in the [[#Phylogeny Inference Method]] called [[#Parsimony]].  cf. [[#Transition Model]]
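
As an illustration of how a step matrix is used, the sketch below scores a tree in which a state has already been assigned to every node.  It is not a full parsimony search, which would also minimize over the states of internal nodes.  The unit-cost matrix, the tree, and the function name <code>tree_length</code> are assumptions made for this example.

```python
STATES = "ACGT"

# Hypothetical step matrix: 0 steps to keep a state, 1 step for any change.
STEP = {i: {j: (0 if i == j else 1) for j in STATES} for i in STATES}


def tree_length(edges, state, step=STEP):
    """Sum the steps over all branches, given a state at every node.

    `edges` is a list of (parent, child) pairs; `state` maps node -> state.
    """
    return sum(step[state[parent]][state[child]] for parent, child in edges)


edges = [("root", "n1"), ("root", "C"), ("n1", "A"), ("n1", "B")]
state = {"root": "G", "n1": "G", "A": "G", "B": "A", "C": "T"}
print(tree_length(edges, state))
```

Replacing <code>STEP</code> with a matrix of unequal entries weights some changes more heavily than others without changing the scoring function.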
  
Line 317: Line 320:
  
 
==== Subtree ====
 
==== Subtree ====
A [[#Tree]] that is part of another [[#Tree]].
+
A [[#Tree]] that is part of another [[#Tree]].
  
 
==== Supertree ====
 
==== Supertree ====
Line 323: Line 326:
  
 
==== Symplesiomorphy ====
 
==== Symplesiomorphy ====
A [[#State]] shared among some [[#OTU]]s that is [[#Plesiomorphy plesiomorphic]] in the current context.    Disjoint with [[#Synapomorphy]].  
+
A [[#State]] shared among some [[#OTU]]s that is [[#Plesiomorphy plesiomorphic]] in the current context.    Disjoint with [[#Synapomorphy]].
  
 
==== Synapomorphy ====
 
==== Synapomorphy ====
A [[#State]] shared among some [[#OTU]]s that is an [[#Apomorphy apomorphic]] (i.e., [[#Derived]]) state in the current context.  Disjoint with [[#Symplesiomorphy]].  
+
A [[#State]] shared among some [[#OTU]]s that is an [[#Apomorphy apomorphic]] (i.e., [[#Derived]]) state in the current context.  Disjoint with [[#Symplesiomorphy]].
  
 
==== Synonymous ====
 
==== Synonymous ====
Line 335: Line 338:
  
 
==== Taxonomic Rank ====
 
==== Taxonomic Rank ====
Taxonomic Rank is an ordered categorical descriptor applied to the classes of [[#Organismal Taxonomy]].  A class of a given rank may contain only classes of lower rank. The traditional ranks in descending order are: Kingdom, Phylum, Class, Order, Family, Genus and Species.  In the 1980s, there were several conflicting proposals to extend this system upward to include a rank higher than Kingdom, variously called "Urkingdom" (Woese, et al.), "Domain" (Woese, et al.) or "Empire" (Cavalier-Smith).
+
Taxonomic Rank is an ordered categorical descriptor applied to the classes of [[#Organismal Taxonomy]].  A class of a given rank may contain only classes of lower rank. The traditional ranks in descending order are: Kingdom, Phylum, Class, Order, Family, Genus and Species.  In the 1980s, there were several conflicting proposals to extend this system upward to include a rank higher than Kingdom, variously called "Urkingdom" (Woese, et al.), "Domain" (Woese, et al.) or "Empire" (Cavalier-Smith).
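
The containment rule for ranks can be written as a simple ordering check.  This sketch covers only the seven traditional ranks listed above; the name <code>may_contain</code> is a hypothetical helper.

```python
# Traditional ranks in descending order, as listed in the definition.
RANKS = ["Kingdom", "Phylum", "Class", "Order", "Family", "Genus", "Species"]


def may_contain(outer_rank, inner_rank):
    """True if a class of `outer_rank` may contain a class of `inner_rank`."""
    # A lower index means a higher rank; containment requires strictly lower rank.
    return RANKS.index(outer_rank) < RANKS.index(inner_rank)


print(may_contain("Genus", "Species"))
print(may_contain("Species", "Genus"))
```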
  
 
==== Taxonomy ====
 
==== Taxonomy ====
Line 341: Line 344:
  
 
==== Terminal Node ====
 
==== Terminal Node ====
The nodes of a [[#Phylogenetic Tree]] that have no children.  Typically [[Terminal Node]]s in a [[#Phylogenetic Tree]] correspond to [[#OTU]]s with their observable properties, while internal nodes correspond to ancestors.  However, in the case of simulations or evolution-in-the-lab, an internal node may be associated with known properties.  Some [[#Reconcile Tree]]s have terminal nodes that represent inferred events of [[#Gene Loss]].
+
The nodes of a [[#Phylogenetic Tree]] that have no children.  Typically [[Terminal Node]]s in a [[#Phylogenetic Tree]] correspond to [[#OTU]]s with their observable properties, while internal nodes correspond to ancestors.  However, in the case of simulations or evolution-in-the-lab, an internal node may be associated with known properties.  Some [[#Reconcile Tree]]s have terminal nodes that represent inferred events of [[#Gene Loss]].
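
The "no children" criterion can be sketched as follows, assuming a tree stored as a parent-to-children dictionary in which terminal nodes appear only as children.  The tree and the function name <code>terminal_nodes</code> are hypothetical.

```python
def terminal_nodes(children):
    """Return the nodes that have no children, given a parent -> children map."""
    nodes = set(children)
    for kids in children.values():
        nodes.update(kids)
    # A node is terminal if it has no entry, or an empty entry, in the map.
    return sorted(n for n in nodes if not children.get(n))


children = {"root": ["n1", "C"], "n1": ["A", "B"]}
print(terminal_nodes(children))
```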
  
 
====Topology====
 
====Topology====
Line 365: Line 368:
 
==== Anagenesis ====
 
==== Anagenesis ====
 
==== Basal ====
 
==== Basal ====
pertaining to root of tree?  
+
pertaining to root of tree?
 
==== Cladogenesis ====
 
==== Cladogenesis ====
 
==== Coalescent ====
 
==== Coalescent ====
Line 372: Line 375:
 
note measures RSCU, CAI
 
note measures RSCU, CAI
 
==== Compatibility ====
 
==== Compatibility ====
a measure of between-character consistency  
+
a measure of between-character consistency
 
==== Consensus Tree ====
 
==== Consensus Tree ====
 
tree based on combining multiple trees.  misuse of "consensus".  not the same as supertree.
 
tree based on combining multiple trees.  misuse of "consensus".  not the same as supertree.
 
==== Conservative ====
 
==== Conservative ====
Applied to characters or to transitions, implicating minor changes or characters that undergo such changes.  opposite of radical.
+
Applied to characters or to transitions, implicating minor changes or characters that undergo such changes.  opposite of radical.
 
==== Constraints, disambiguation ====
 
==== Constraints, disambiguation ====
only one of these is relatively clear  
+
only one of these is relatively clear
 
* evo-devo meaning
 
* evo-devo meaning
 
* selective constraints
 
* selective constraints
* nuisance variables  
+
* nuisance variables
 
==== Cost Matrix ====
 
==== Cost Matrix ====
 
this sounds to me like Step Matrix, with the difference that step implies non-negative integers while cost implies real numbers.
 
this sounds to me like Step Matrix, with the difference that step implies non-negative integers while cost implies real numbers.
Line 390: Line 393:
 
typically the distance between otus
 
typically the distance between otus
 
==== dN/dS ====
 
==== dN/dS ====
ration of non-synonymous to synonymous rate.  usually based on a particular model for normalization.
+
ration of non-synonymous to synonymous rate.  usually based on a particular model for normalization.
 
==== Equilibrium State Frequencies ====
 
==== Equilibrium State Frequencies ====
distribution of characters expected at equilibrium.  often used in markov transition models.  
+
distribution of characters expected at equilibrium.  often used in markov transition models.
 
==== Evolutionary history ====
 
==== Evolutionary history ====
 
phylogeny in the broad sense
 
phylogeny in the broad sense
 
==== Family alignment ====
 
==== Family alignment ====
sequence alignment for a sequence family  
+
sequence alignment for a sequence family
 
==== GC/AT pressure ====
 
==== GC/AT pressure ====
an asymmetry in mutation or in evolutionary changes, favoring GC over AT (or vice versa).  
+
an asymmetry in mutation or in evolutionary changes, favoring GC over AT (or vice versa).
 
==== Gene duplication ====
 
==== Gene duplication ====
 
mutation or evolutionary change resulting in additional copy of gene
 
mutation or evolutionary change resulting in additional copy of gene
Line 409: Line 412:
 
==== Independent Contrasts Method ====
 
==== Independent Contrasts Method ====
 
method of assessing covariance by reducing phylogenetic distribution pattern into independent comparisons (contrasts)
 
method of assessing covariance by reducing phylogenetic distribution pattern into independent comparisons (contrasts)
==== Invariants ====  
+
==== Invariants ====
 
a phylogenetic inference method based on invariants (domain-specific application of more general term from numeric analysis)
 
a phylogenetic inference method based on invariants (domain-specific application of more general term from numeric analysis)
 
==== Isochore ====
 
==== Isochore ====
local compositional area in chromosomes, characteristic of warm-blooded animal genomes.  
+
local compositional area in chromosomes, characteristic of warm-blooded animal genomes.
 
==== Lineage ====
 
==== Lineage ====
 
A Lineage is a forward path in a [[#Phylogenetic Tree]], representing a linear path of descent connecting an [[#Ancestor]]s to [[#Descendant]].    Lineages are not determinable in the case of an un[[#Root]]ed tree.
 
A Lineage is a forward path in a [[#Phylogenetic Tree]], representing a linear path of descent connecting an [[#Ancestor]]s to [[#Descendant]].    Lineages are not determinable in the case of an un[[#Root]]ed tree.
Line 419: Line 422:
 
phylo inference method
 
phylo inference method
 
==== Missing Data ====
 
==== Missing Data ====
absence of character state data due to deletion or lack of information  
+
absence of character state data due to deletion or lack of information
 
==== Polarity ====
 
==== Polarity ====
assigned to transitions that have a direction (i.e., a different rate or cost in the forward vs. reverse direction); polarizing differences means determining which (if any) state is ancestral.  
+
assigned to transitions that have a direction (i.e., a different rate or cost in the forward vs. reverse direction); polarizing differences means determining which (if any) state is ancestral.
  
  
 
==== Population ====
 
==== Population ====
a reproducing population, consisting of interbreeding individuals.
+
a reproducing population, consisting of interbreeding individuals.
 
==== Positive Selection ====
 
==== Positive Selection ====
 
differential reproduction (selection) viewed relative to the variant whose frequency increases
 
differential reproduction (selection) viewed relative to the variant whose frequency increases
 
==== Radiation ====
 
==== Radiation ====
episode of rapid or repeated cladogenesis  
+
episode of rapid or repeated cladogenesis
 
==== Radical ====
 
==== Radical ====
 
opposite of conservative
 
opposite of conservative
 
==== Relative Rate Test ====
 
==== Relative Rate Test ====
a simple test to correct for phylogenetic structure when assessing rate constancy, in which, given the tree (A, (B,C)), the distances (or inferred changes) from A to B are compared to those from A to C.  
+
a simple test to correct for phylogenetic structure when assessing rate constancy, in which, given the tree (A, (B,C)), the distances (or inferred changes) from A to B are compared to those from A to C.
 
==== Reticulate Evolution ====
 
==== Reticulate Evolution ====
pattern of evolution in which separate lineages fuse again  
+
pattern of evolution in which separate lineages fuse again
 
==== Selective Constraint ====
 
==== Selective Constraint ====
relative to an imaginary case in which there is no selection, a condition of a reduced rate of change due to the probability of deleterious effects.
+
relative to an imaginary case in which there is no selection, a condition of a reduced rate of change due to the probability of deleterious effects.
 
==== Sequence Family ====
 
==== Sequence Family ====
 
gene family, protein family.  see paralogy, orthology, homology
 
gene family, protein family.  see paralogy, orthology, homology
==== Species ====
+
==== Species ====
assemblage of actually or potentially reproducing organisms, consisting of one or more populations.  
+
assemblage of actually or potentially reproducing organisms, consisting of one or more populations.
 
==== Stasis ====
 
==== Stasis ====
 
==== Star Tree ====
 
==== Star Tree ====
Line 448: Line 451:
 
is_a Mutation Bias
 
is_a Mutation Bias
 
==== Unrooted ====
 
==== Unrooted ====
Not rooted, applied to trees.  
+
Not rooted, applied to trees.
 
==== UPGMA ====
 
==== UPGMA ====
A phenetic clustering method.  Arguably, a phylogeny inference method based on using distances.
+
A phenetic clustering method.  Arguably, a phylogeny inference method based on using distances.
 
==== Variation ====
 
==== Variation ====
 
too ambiguous?  population variation is one clear usage.  others are vague
 
too ambiguous?  population variation is one clear usage.  others are vague
Line 459: Line 462:
  
 
===From statistics and applied maths===
 
===From statistics and applied maths===
* [http://en.wikipedia.org/wiki/Likelihood_ratio_test likelihood ratio test];  
+
* [http://en.wikipedia.org/wiki/Likelihood_ratio_test likelihood ratio test];
* [http://en.wikipedia.org/wiki/Akaike_Information_Criterion Akaike information criterion]; * [http://en.wikipedia.org/wiki/Odds_ratio odds ratio]; bootstrap resampling;  
+
* [http://en.wikipedia.org/wiki/Akaike_Information_Criterion Akaike information criterion]; * [http://en.wikipedia.org/wiki/Odds_ratio odds ratio]; bootstrap resampling;
* [http://en.wikipedia.org/wiki/Monte_carlo_method Monte Carlo methods];  
+
* [http://en.wikipedia.org/wiki/Monte_carlo_method Monte Carlo methods];
  
===From biology, systematics, genetics, and evolution===  
+
===From biology, systematics, genetics, and evolution===
* [http://en.wikipedia.org/wiki/Cladistics cladistics], development,  
+
* [http://en.wikipedia.org/wiki/Cladistics cladistics], development,
* [http://en.wikipedia.org/wiki/Genotype genotype], [http://en.wikipedia.org/wiki/Homoplasy homoplasy], [http://en.wikipedia.org/wiki/Kinase kinase],  
+
* [http://en.wikipedia.org/wiki/Genotype genotype], [http://en.wikipedia.org/wiki/Homoplasy homoplasy], [http://en.wikipedia.org/wiki/Kinase kinase],
* [http://en.wikipedia.org/wiki/Phenotype phenotype],  
+
* [http://en.wikipedia.org/wiki/Phenotype phenotype],
 
* [http://en.wikipedia.org/wiki/Positive_selection positive selection]
 
* [http://en.wikipedia.org/wiki/Positive_selection positive selection]
  
===From molecular biology and bioinformatics===  
+
===From molecular biology and bioinformatics===
 
* [http://en.wikipedia.org/wiki/Sequencing sequence]
 
* [http://en.wikipedia.org/wiki/Sequencing sequence]
 
* alignment
 
* alignment
Line 476: Line 479:
  
 
===From computer science and maths===
 
===From computer science and maths===
* [http://en.wikipedia.org/wiki/Hidden_Markov_model HMM (Hidden Markov Model)];  
+
* [http://en.wikipedia.org/wiki/Hidden_Markov_model HMM (Hidden Markov Model)];
* [http://en.wikipedia.org/wiki/Dynamic_programming Dynamic programming];  
+
* [http://en.wikipedia.org/wiki/Dynamic_programming Dynamic programming];
  
  
 
==Acknowledgements and History==
 
==Acknowledgements and History==
  
* June, 2006, Aaron Mackey started a glossary (19 domain concepts and a few other terms) for the group that spawned the NESCent evolutionary informatics working group.
+
* June, 2006, Aaron Mackey started a glossary (19 domain concepts and a few other terms) for the group that spawned the NESCent evolutionary informatics working group.
* September, 2007, Arlin Stoltzfus expanded the concept glossary to 40 defined terms and 22 undefined terms.
+
* September, 2007, Arlin Stoltzfus expanded the concept glossary to 40 defined terms and 22 undefined terms.
 
* 12 October, 2007, the concept glossary was released to the NESCent evolutionary informatics working group
 
* 12 October, 2007, the concept glossary was released to the NESCent evolutionary informatics working group
 
* November, 2007, EvoInfo working group meeting (12th to 14th) and follow-ups
 
* November, 2007, EvoInfo working group meeting (12th to 14th) and follow-ups

Revision as of 11:22, 27 May 2008

Scope

By "evolutionary comparative analysis" we refer to the methods and foundational principles for interpreting similarities and differences as the outcome of an evolutionary process. The scope of the glossary should be such that it would serve as an effective resource for a student or a researcher (one who wishes to interpret research publications, software documentation, and interfaces), but does not duplicate common meanings from other fields.

How to improve the glossary

  • define a term in the "undefined" section, then move it to the "defined" section
  • directly modify a definition to improve it
  • describe relations such as disjunction, synonymy, part_of and is_a, using wiki cross-refs like this:
    Insertion is disjoint to [[#Deletion]]
  • make it easier to maintain and disseminate this list by
    • making your changes ATOMIC (one item at a time)
    • maintaining the format
    • sticking to principles (next section)

Principles governing content

(note: email feedback on principles to Arlin)

  • What is included:
    • Terms that denote general concepts (e.g., Phylogeny Inference Method, but not MrBayes).
    • Terms that have a domain-specific meaning (e.g., "tree", "taxonomy").
    • Composite terms only when the meaning is unexpected, but not when the meaning is obvious (e.g., "non-synonymous" means "not synonymous")
  • What should not be included:
    • unique instances or particulars (e.g., PAUP*; we will include these at a later stage)
    • obscure, insiders-only jargon (e.g., "omega")
    • terms with common meaning well outside of the domain (e.g., "integer")
  • How the definition is determined
    • By studying usage in articles and books (e.g., Nei and Kumar; Li and Graur; Felsenstein)
    • By consulting domain experts and by soliciting feedback
    • By studying the use of terms in software and data interfaces
  • How synonyms, ambiguities and overlaps are handled
    • we may make a term domain-specific by qualifying it, as in "Phylogenetic tree" (not just "tree") or "Organismal taxonomy" (not just "taxonomy")
    • Where synonyms exist, we may choose the term
      • that is most widely used by domain experts
      • that conflicts least with familiar extra-domain meanings
    • We can decide later not to use a term that is too ambiguous
  • In the case of disputes over meanings
    • Open the topic for discussion on the Discussion page
    • Find examples that illustrate actual usage
    • Consider ways to replace a problematic term with alternatives that cover its meanings

Defined Concepts

Allele Fixation

From population genetics, fixation of an allele is the attainment of a frequency of 1, or more rigorously, attainment of a frequency approaching some point of stability that is equal to, or nearly equal to, 1.

Analogy

In common usage, "analogy" denotes the relation between two things (or processes) that have a similar pattern among their parts (or sub-processes). In evolutionary biology, "analogy" has a peculiar meaning that is restricted to cases in which the things compared are not homologous. That is, homologous structures may have the same pattern or function, but they are not called "analogous". A moth's wing and a robin's wing are said to be analogous, whereas a robin's wing and a crow's wing are not.

Ancestor

An entity from whom features were inherited by some #OTU of current interest. A #Parent is the closest form of ancestor. Ancestry is the reverse relation of #Descendant descendancy. The nearest ancestral node on a tree is sometimes called an "immediate" ancestor. See #Most Recent Common Ancestor. The relationship of ancestry is transitive: if A is the ancestor of B, and B is the ancestor of C, then A is the ancestor of C.

Apomorphy

A #State of an #OTU that is a #Derived #State in the current context. Disjoint with #Plesiomorphy.

Atavism

see #Reversion

Bifurcation

See #Dichotomy

Bipartition

A partition of all #OTUs (in the current analytical context) into two sets. Every #Branch in a #Phylogenetic Tree defines a Bipartition. Bipartitions are used when comparing #Phylogenetic Tree topologies to assign Bootstrap Support Values or to identify shared features of Topology. Synonyms: Split
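
As a minimal sketch of how every branch defines a bipartition, the following Python lists the non-trivial splits of a small tree. The nested-tuple tree encoding and the function names leaf_set and bipartitions are assumptions made for this illustration, not part of the glossary or of any particular software package.

```python
def leaf_set(tree):
    """Return the set of OTU labels under a nested-tuple tree (leaves are strings)."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def bipartitions(tree):
    """Collect the non-trivial bipartitions (splits) defined by the branches of a tree."""
    all_otus = leaf_set(tree)
    splits = set()

    def visit(node):
        if isinstance(node, str):
            return
        for child in node:
            below = leaf_set(child)    # OTUs on one side of the branch
            above = all_otus - below   # OTUs on the other side
            # Skip trivial splits that separate a single OTU from all the others.
            if len(below) > 1 and len(above) > 1:
                # A split is an unordered pair of OTU sets.
                splits.add(frozenset([below, above]))
            visit(child)

    visit(tree)
    return splits


example = (("A", "B"), ("C", ("D", "E")))
for split in bipartitions(example):
    print(sorted(sorted(side) for side in split))
```

Because each split is stored as an unordered pair, the two branches that meet at the root of the example yield the same bipartition, which is counted once.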

Bootstrap Support

In phylogenetics, references to a "bootstrap" ("bootstrap support", "bootstrap confidence") value typically refer to a #Branch Support Value computed by bootstrap resampling (bootstrapping). Bootstrapping is a Resampling Method used to create pseudo-replicate data sets by drawing (with replacement) from the available data set. The fraction of times an outcome occurs among #Phylogenetic Trees inferred from bootstrap-resampled data sets is the Bootstrap Support value for that outcome. Thus the bootstrap support value for a #Branch is the fraction of times this #Branch is found in #Phylogenetic Trees computed from resampled data sets.
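
The two steps described above (resampling with replacement, then counting how often an outcome recurs) can be sketched in Python as follows. The dictionary-based alignment, the function names, and the idea of passing in one set of outcomes per replicate tree are assumptions for illustration; the tree inference step itself is left out.

```python
import random


def bootstrap_columns(alignment, rng):
    """Build one pseudo-replicate by drawing alignment columns with replacement."""
    n_cols = len(next(iter(alignment.values())))
    picks = [rng.randrange(n_cols) for _ in range(n_cols)]
    # The same column picks are applied to every OTU so that columns stay intact.
    return {otu: "".join(seq[i] for i in picks) for otu, seq in alignment.items()}


def bootstrap_support(outcome, replicate_outcomes):
    """Fraction of replicate trees in which `outcome` (e.g., a branch) is found."""
    hits = sum(1 for outcomes in replicate_outcomes if outcome in outcomes)
    return hits / len(replicate_outcomes)


rng = random.Random(0)  # fixed seed only to make the sketch repeatable
replicate = bootstrap_columns({"A": "ACGT", "B": "ACGA"}, rng)
print(replicate)
```

In practice each pseudo-replicate would be passed to a #Phylogeny Inference Method, and the branches of the resulting trees would be compared as bipartitions.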

Branch

"Branch" is the typical domain-specific term for an edge of a #Phylogenetic Tree. Branches may have properties such as length and degree of #Branch_Support. See also #Split.

Branch Support

Each #Branch in a #Phylogenetic Tree defines a #Bipartition. The degree of confidence in a particular #Branch (bipartition) may be indicated by a #Branch Support value, typically a #Bootstrap Support value or a Bayesian posterior probability.

Character

A character is a set of features related by homology, or (usually indistinguishably), it is the archetype or Platonic form underlying the "same" feature observed in different instances. In a sequence alignment or a #Character-State Data Matrix, a character is a column. If the character has discrete states (e.g., present vs. absent; T, C, A and G), then it is a "discrete character" (likewise a "continuous character" has continuous states). See also: #Character-State; #Character-State Data Matrix; Synonyms: Column (in some contexts), Site (in some contexts)

Character-State

The state of a #Character for a given #OTU. For instance, if Sequence 1 has a "G" in the 10th column of a sequence alignment, then "G" is the Character-State of the 10th Character for Sequence 1. Typically Character-States are observed values. However, the values of unobserved states, including #Ancestral states as well as #Missing Data, can be inferred using a #Transition Model applied to a #Phylogenetic Tree. Typically a #Character-State is treated as a single definite value; however, in some instances it may be conceived as a set of values present in a #Population, a distribution of values, and so on (as allowed in the NEXUS definition of a #Character-State Data Matrix).

Character-State Data Matrix

A matrix of observed #Character-State data. Synonyms: Character Data Matrix, Character-State Matrix

Clade

A Clade is a set of #Species (and by extension, a set of #OTUs of any type) that includes all of the #Descendants of their #Most Recent Common Ancestor. In a rooted #Phylogenetic Tree, every #Node or #Subtree defines a Clade. An alternative definition is that a Clade includes the #Ancestor and any #Descendants whether or not they are #OTUs. Cf #Holophyletic (Systematics)

Cladogram

A pictorial representation of a #Phylogenetic Tree that is understood to represent only what domain experts call the #Topology, meaning the connectivity of nodes, and not the lengths of #Branches between them. Nevertheless, as actual lines must have non-zero lengths, to draw a cladogram one must apply an arbitrary convention for #Branch lengths, typically either

  1. make all #Branches the same length, or
  2. adjust #Branch lengths so that #Terminal Nodes fall on a line

Contrasts

see #Independent Contrasts Method

Convergence

See #Convergent Evolution

Convergent Evolution

#Convergent Evolution (convergence) is a pattern in which two different #OTUs reach the same #Derived #State by a different series of Evolutionary Transitions. Subclass of #Homoplasy. Considered to be disjoint with #Parallel Evolution, though the exact line of distinction is not always clear.

CpG Bias

An enhanced rate of #Mutation at CG dinucleotide sites typical in mammalian genomes, arising from spontaneous deamination of methylated cytosine. A kind of #Mutation Bias.

Deletion

From genetics, the removal of one or more contiguous residues from a sequence. In phylogenetics, Deletion may refer either to a #Mutation or to an #Evolutionary Transition. Disjoint to #Insertion

Derived

A #State is #Derived if it is not #Ancestor ancestral in the given context. The opposite of #Derived is #Primitive. Note that, because the use of #Ancestor is context-dependent (relative to some #OTUs of interest), #Derived is also context-dependent (for explanation, see #Primitive).

Descendant

An entity that inherits features from some entity of current interest. Descendancy is the reverse relation of #Ancestor ancestry. A #Child is the closest form of descendant. The relationship of descendancy is transitive: if A is the descendant of B, and B is the descendant of C, then A is the descendant of C.

Dichotomy

A 2-fold branching. A #Phylogenetic Tree has Dichotomous #Branching if each parent node has exactly two children. Disjoint to #Polytomy. Synonyms: Bifurcation

Distance Matrix

A matrix of pairwise distances between #OTUs, typically used in distance-based #Phylogeny Inference Methods. A Distance Matrix is not the same as a #Character-State Data Matrix.
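
To illustrate the difference from a #Character-State Data Matrix, the sketch below derives a Distance Matrix from aligned sequences. It uses the simple proportion of differing columns as the distance, which is only one of many possible measures; the function names and the dictionary keyed by OTU pairs are assumptions for this example, and gap characters are treated like any other state.

```python
def p_distance(seq_a, seq_b):
    """Proportion of aligned columns at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    differences = sum(1 for a, b in zip(seq_a, seq_b) if a != b)
    return differences / len(seq_a)


def distance_matrix(alignment):
    """Pairwise distances between all OTUs, keyed by (otu_x, otu_y)."""
    otus = sorted(alignment)
    return {(x, y): p_distance(alignment[x], alignment[y]) for x in otus for y in otus}


matrix = distance_matrix({"A": "ACGT", "B": "ACGA", "C": "TCGA"})
for pair, distance in sorted(matrix.items()):
    print(pair, distance)
```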

Dollo Parsimony

A character-based #Phylogeny Inference Method that applies the Parsimony principle to presence-and-absence (i.e., 2-state) #Characters with the restriction that gain (the #Transition from absence to presence) can happen only once for each #Character.

Drift

See #Random Genetic Drift

Edge

see #Branch

Evolutionary Transition

An evolutionary change. In the context of character analysis, an evolutionary transition is a change in the #Character-State of a #Character along a #Branch. In some contexts, "substitution", "replacement" and even "mutation" may be used as though they were synonyms for Evolutionary Transition.

Fixation

See #Allele Fixation

Fully Resolved

A #Phylogenetic Tree is said to be Fully Resolved if all its branchings are dichotomous. Trees with #Polytomy are said to be unresolved.
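
A minimal check for this condition, assuming a rooted tree encoded as nested tuples with strings as #Terminal Nodes (an encoding chosen only for this sketch), might look like this in Python:

```python
def is_fully_resolved(tree):
    """True if every internal node of a nested-tuple tree has exactly two children."""
    if isinstance(tree, str):
        return True  # a terminal node has no branching to resolve
    return len(tree) == 2 and all(is_fully_resolved(child) for child in tree)


print(is_fully_resolved((("A", "B"), "C")))   # dichotomous throughout
print(is_fully_resolved(("A", "B", "C")))     # the root is a polytomy
```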

Gap

The concept of a "gap" is ambiguous and is tied to the use of a "gap character" (often the hyphen "-") in text representations of sequence alignments. In general, the "gap" represents the absence of any positively diagnosed #Character-State. As such, the gap may be interpreted as an additional #Character-State, as the absence of the #Character, or as an unknown value (#Missing Data).

General Time-Reversible Model (GTR)

A #Transition Model for nucleotide #Character-States allowing a separate parameter for each reversible rate of #Evolutionary Transition between Ni and Nj (also called the 6-parameter model)

Genome Hypothesis

Grantham's hypothesis that each species has a distinctive genome-wide #Codon Usage strategy reflecting adaptation for translation efficiency.

HGT

Horizontal Gene Transfer (see #Lateral Gene Transfer).

Homology

Relationship of similarity due to inheritance from a common ancestor. A relationship of similarity that is not due to common ancestry, but to Convergent Evolution is called #Analogy.

Homoplasy

Homoplasy (Lankester, 1870) is any pattern in which evolutionary change occurs but does not increase differentiation. If the comparison is between a descendant and an ancestor, then the pattern is #Reversal or #Atavism. If the comparison is between two descendants, the pattern is either #Convergent Evolution (when the descendants become more similar) or #Parallel Evolution (when the descendants undergo identical changes).

Holophyly

Condition of a set of #Species (and by extension, #OTUs of any kind) and their #Ancestors that includes their #Most Recent Common Ancestor and all of its #Descendants. "Holophyletic group" thus is synonymous with #Clade. Subclass of #Monophyly, disjoint to #Paraphyly (sensu Ashlock).

Horizontal Gene Transfer (HGT)

see #Lateral Gene Transfer

HTU

#Hypothetical Taxonomic Unit

Hypothetical Taxonomic Unit

Hypothetical analog of an #OTU, typically representing an unobserved #Ancestor entity.

Indel

A fusion of the terms for #Insertion and #Deletion that has two meanings, one based on the logic of OR (common in phylogenetics), and the other based on the logic of AND (used in mutation research, e.g., Chuzhanova, et al., 2003):

  1. A length difference between two aligned sequences, denoting the evolutionary occurrence of either an #Insertion OR a #Deletion during their divergence from a common ancestor.
  2. A complex mutational event involving the addition of some residues and the loss of others, i.e., #Insertion AND #Deletion

Insertion

From genetics, the addition of one or more contiguous residues to a sequence. In phylogenetics, Insertion may refer either to a #Mutation or to an #Evolutionary Transition. Disjoint to #Deletion

Lateral Gene Transfer

Incorporation of genetic material from one organism into the genome of another organism that is not its reproductive offspring is called "Lateral" or "Horizontal" gene transfer. The more typical form of inheritance is the "vertical" transfer that takes place from parent to offspring during biological reproduction. Lateral Gene Transfer is sometimes abbreviated LGT.

Leaf node

see #Terminal Node

LGT

#Lateral Gene Transfer

Likelihood Method

A #Phylogeny Inference Method in which the objective function used to characterize a #Phylogenetic Tree (and #Transition Model) is the likelihood, which is the probability of the observed data conditional on the #Phylogenetic Tree and the #Transition Model.

Lineage

A Lineage is a forward (in time) path in a #Phylogenetic Tree, representing a linear path of descent connecting an #Ancestor to a #Descendant. Lineages are not determinable in the case of an un#Rooted tree.

Molecular Clock

The assumption that evolution is clock-like over some interval, typically in the sense that the expected number of #Evolutionary Transitions per unit of time is constant.

Monophyly

Condition of a set of #Species (and by extension, #OTUs of any kind) and their #Ancestors that includes their #Most Recent Common Ancestor. In contexts where #Monophyly is not sub-classed to #Holophyly and #Paraphyly, it often is assumed to mean #Holophyly (sensu Ashlock).

Most Recent Common Ancestor

The Most Recent Common Ancestor (MRCA; also LCA or Least Common Ancestor) of a set of two or more #Species (or, by extension, any kind of #OTU) is the most recent ancestor shared among the set, corresponding to the most proximal ancestral #Node on a #Phylogenetic Tree.
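
One way to locate the most proximal shared node is sketched below in Python. It assumes a rooted tree stored as a child-to-parent dictionary; that representation, the node names, and the function names are hypothetical choices for this example.

```python
def path_to_root(node, parent):
    """List the node and its ancestors, from the node itself up to the root."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path


def mrca(otus, parent):
    """Return the most recent node shared by the root-ward paths of all OTUs."""
    otus = list(otus)
    shared = set(path_to_root(otus[0], parent))
    for otu in otus[1:]:
        shared &= set(path_to_root(otu, parent))
    # Walking up from any one OTU, the first shared node is the most recent one.
    for node in path_to_root(otus[0], parent):
        if node in shared:
            return node
    return None


# Hypothetical tree: ((A, B)n1, C)root
parent = {"A": "n1", "B": "n1", "n1": "root", "C": "root"}
print(mrca(["A", "B"], parent))
print(mrca(["A", "C"], parent))
```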

MRCA

See #Most Recent Common Ancestor.

Mutation

  1. (abstract) the process by which heritable changes in the genome occur
  2. a particular heritable change in a genome (e.g., the mutation causing the most common sickle-cell allele)
  3. the altered state resulting from a mutation, i.e., the mutant state

Mutation Bias

An asymmetry or non-uniformity in the occurrence of #Mutations categorized by position, type, effect class, or some other index variable.

Natural Selection

The process by which inherent asymmetries in the survival and reproduction of competing forms lead cumulatively to differences in representation of these forms. Disjoint to #Random Genetic Drift.

Neighbor-Joining Algorithm

A distance-based #Phylogeny Inference Method.

Network

A kind of #Phylogenetic tree in which some typical restrictions are not satisfied. In some contexts, it is a synonym for un#Rooted #Phylogenetic Tree. In other contexts, it signals the presence of #Nodes with multiple parents.

Neutral

A difference that has an insignificant effect on fitness. In population genetics, a fitness difference <math>s</math> is considered insignificant if <math>|s| \ll 1/(2PN_e)</math>, where <math>P</math> is the ploidy (one for haploids, two for diploids), and <math>N_e</math> is the effective population size. Outside of population genetics, this term may be used more loosely to indicate a difference that is thought to be unimportant. This term may be applied to a #Polymorphism, #Mutation, or #Evolutionary Transition that represents a #Neutral difference.
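
The population-genetic criterion can be made concrete with a short Python sketch. The definition does not say how much smaller than 1/(2 P N_e) the magnitude of s must be, so the margin parameter below (default 0.1) is an arbitrary assumption introduced for illustration, as are the function names.

```python
def neutrality_threshold(ploidy, effective_size):
    """Return 1 / (2 * P * N_e), the scale against which |s| is compared."""
    return 1.0 / (2 * ploidy * effective_size)


def is_effectively_neutral(s, ploidy, effective_size, margin=0.1):
    """True if |s| is below an assumed fraction (`margin`) of the threshold."""
    # `margin` is a stand-in for "much less than"; its value is not from the glossary.
    return abs(s) < margin * neutrality_threshold(ploidy, effective_size)


# Diploid population (P = 2) with an effective size of 1000.
print(neutrality_threshold(2, 1000))
print(is_effectively_neutral(1e-6, 2, 1000))
print(is_effectively_neutral(-1e-3, 2, 1000))
```

The absolute value is used because a slightly deleterious difference can be as insignificant as a slightly beneficial one.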

Nucleotide Transition

A substitution of one purine (A or G) to another, or one pyrimidine (C or T) to another, a kind_of #Mutation or alternatively, a corresponding kind_of #Evolutionary Transition or #Polymorphism.

Nucleotide Transversion

A substitution of a purine (A or G) to a pyrimidine (C or T) or vice versa, a kind_of #Mutation or alternatively, a corresponding kind_of #Evolutionary Transition or #Polymorphism.
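
The distinction between #Nucleotide Transition and #Nucleotide Transversion reduces to whether the two nucleotides belong to the same chemical class. A small Python sketch, with names chosen only for this example:

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(before, after):
    """Label a nucleotide change as 'transition', 'transversion', or 'none'."""
    for base in (before, after):
        if base not in PURINES | PYRIMIDINES:
            raise ValueError(f"not a DNA nucleotide: {base!r}")
    if before == after:
        return "none"
    pair = {before, after}
    # Purine-to-purine or pyrimidine-to-pyrimidine changes are transitions.
    same_class = pair <= PURINES or pair <= PYRIMIDINES
    return "transition" if same_class else "transversion"


print(classify_substitution("A", "G"))
print(classify_substitution("A", "C"))
```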

Operational Taxonomic Unit

The entities from which #Character-States are observed and taken as ground truths. In some cases the #OTU may be a composite of data drawn from several sources. Note that the use of "taxon" for both an #OTU and for a class in #Organismal Taxonomy is a cause of confusion.

Organismal Taxonomy

A classification of organismal Species consisting of a nested hierarchy of classes. Traditional #Organismal Taxonomy includes named #Taxonomic Ranks and is the basis for the usual way of referring to species of organism by Genus and Species (e.g., Homo sapiens is the sapiens species of the genus Homo).

Orthology

Relationship of sequences that have diverged via speciation events but not by events of #Gene Duplication. Subclass of #Homology. Disjoint to #Paralogy.

OTU

See #Operational Taxonomic Unit.

Outgroup

  1. When used as a unary modifier, i.e., when a set of one or more #OTUs is designated as "the outgroup", the outgroup is a set of #OTUs assumed on prior grounds to be a phylogenetic outlier from the complementary "ingroup" consisting of all the other #OTUs, that is, the ingroup and the outgroup are sister clades that represent two separate paths of descent from a common ancestor. Typically such an outgroup is designated for the purpose of Rooting a #Phylogenetic Tree.
  2. A secondary usage is to describe the relation of two sets A and B given a #Phylogenetic Tree in which A and B are non-overlapping clades.

Parallel Evolution

Parallel Evolution (parallelism) is a pattern in which two different #OTUs reach the same #State by the same series of Evolutionary Transitions. Parallelism is a subclass of #Homoplasy and is considered to be disjoint with #Convergence, though the exact line of distinction is not always clear.

Paralogy

Relationship of sequences that have diverged via one or more events of #Gene Duplication. Subclass of #Homology. Disjoint to #Orthology.

Paraphyly

Condition of a set of #Species (and by extension, #OTUs of any kind) and their #Ancestors that includes their #Most Recent Common Ancestor and some but not all of its #Descendants. Subclass of #Monophyly, disjoint to #Holophyly (sensu Ashlock). For instance, the reptile group is paraphyletic because, while its common ancestor was a reptile, some of its descendants are birds, which are not included in the reptile group.

Parsimony Method

A method for finding the minimum number of transitions needed to account for a #Character given a #Phylogenetic Tree, and by extension, a character-based #Phylogeny Inference Method in which the inferred #Phylogenetic Tree (the "maximum parsimony tree") is the #Phylogenetic Tree that minimizes #Evolutionary Transitions over all #Characters.
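
The first sense (counting the minimum transitions for one #Character on a given tree) can be sketched with the Fitch algorithm, which is one well-known way to do this when states are unordered and all changes cost the same. The nested-tuple encoding of a rooted, fully resolved tree and the function name are assumptions for this example.

```python
def fitch(tree, states):
    """Return (candidate_states, minimum_changes) for one character on a binary tree.

    `tree` is a nested tuple with string leaves; `states` maps each leaf to its state.
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree  # assumes every internal node has exactly two children
    left_set, left_cost = fitch(left, states)
    right_set, right_cost = fitch(right, states)
    common = left_set & right_set
    if common:
        # The children can share a state, so no change is needed at this node.
        return common, left_cost + right_cost
    # Otherwise at least one change is required on a branch below this node.
    return left_set | right_set, left_cost + right_cost + 1


tree = (("A", "B"), ("C", "D"))
print(fitch(tree, {"A": "G", "B": "G", "C": "T", "D": "T"})[1])
print(fitch(tree, {"A": "G", "B": "T", "C": "G", "D": "T"})[1])
```

Summing this count over all #Characters gives the score that a parsimony-based inference method would seek to minimize across candidate trees.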

Phylogenetic Tree

A Phylogenetic Tree represents evolutionary paths of descent-with-modification from common ancestors. Typically a Phylogenetic Tree is assumed to be a connected, directed, acyclic graph in which Nodes have no more than one parent and the directionality of each edge is from the #Root toward the #Terminal Nodes. When domain scientists wish to relax these restrictions due to conditions of not knowing the #Root, or of allowing for multiple parentage (cf. #Reticulate Evolution, #Lateral Gene Transfer), they favor the term "#Network", though this usage once again does not correspond to the meaning assigned to this term in the field of graph theory.

Phylogenetic Tree Topology

Typically the term "topology" applied to a #Phylogenetic Tree is a reference to the connectivity of nodes in the #Phylogenetic Tree, disregarding #Branch properties such as length. See #Cladogram.

Phylogeny

Broadly speaking, a phylogeny is the evolutionary history of some set of #Characters or #OTUs. More narrowly, it is merely the #Phylogenetic Tree representing paths of descent.

Phylogeny Inference Method

A method of inferring an evolutionary history (#Phylogeny). Phylogeny inference methods may generate a #Phylogenetic Tree as well as #Reconstructions of #Ancestral #Character-States, using inputs based on observed data. They fall into two broad classes: distance-based methods that use a #Distance Matrix as input, and character-based methods that use a #Character-State Data Matrix as input. Of the character-based methods, some are rule-based (#Parsimony, #Invariants), while others are probabilistic and depend on a #Transition Model that must be specified explicitly and evaluated in terms of Likelihood or Bayesian Posterior Probability.

Plesiomorphy

A #State of an #OTU that is the #Ancestor ancestral state (i.e., #Primitive) in the current context. Disjoint with #Apomorphy.

Polymorphism

The presence, in a single Species or #Population, of more than one #Character-State for a given #Character. In the field of population genetics, this term sometimes is restricted such that a genetic locus is not considered Polymorphic unless the most frequent state has a frequency below 95 % or 99 % (i.e., in this case, polymorphism is a restricted subclass of #Population Variation).

Polyphyly

Condition of a set of #Species (and by extension, #OTUs of any kind) and #Ancestors that does not include their #Most Recent Common Ancestor. Disjoint to #Monophyly (sensu Ashlock).

Polytomy

An N-fold branching of a #Phylogenetic Tree, where <math>N > 2</math>. Disjoint to #Dichotomy. Synonyms: Multifurcation (rare).

Primitive

A #State is #Primitive if it is #Ancestor ancestral in the given context. The opposite of #Primitive is #Derived. Note that, because the use of #Ancestor is context-dependent (relative to some #OTUs of interest), #Primitive is also context-dependent, e.g., when comparing human locomotion to mouse locomotion, human bipedalism is derived (since the ancestor walked on four legs), but when comparing human locomotion and chimp locomotion, it is unclear whether bipedalism is derived, since the human-chimp ancestor might have been bipedal (in which case, human bipedalism would be the primitive state, and chimp knuckle-walking the derived state).

Random Genetic Drift

The process by which stochastic asymmetries in survival and reproduction of competing forms lead cumulatively to differences in representation of these forms. Disjoint to #Natural Selection.

Rank

see #Taxonomic Rank

Reconcile Tree

The #Phylogenetic Tree for a gene family may conflict with the #Phylogenetic Tree for the implicated species due to events of #Gene Duplication and #Gene Loss that occurred in its history. A Reconcile Tree (Reconciled Tree) is a special kind of #Phylogenetic Tree that reconciles a gene #Phylogenetic Tree with a species #Phylogenetic Tree by means of hypothesized events of #Gene Duplication and #Gene Loss. In principle, the concept of a "Reconcile Tree" might be extended to include events of #Lateral Gene Transfer (another source of conflicts between gene and species #Phylogenetic Trees), but this has not been attempted.

Reconstruction

Reconstruction refers to the process of inferring #Ancestral #Character-States. The inferred states are said to be "reconstructed" states.

Replacement

In one sense, a synonym for #Evolutionary Transition, and in another sense, an ambiguous sub-class of #Evolutionary Transition. In molecular evolution, there is a tendency to refer to #Evolutionary Transitions in sequence evolution as "substitutions" when they refer to nucleotide #Character-States, and "replacements" when they refer to amino acid #Character-States (at one time, this was the official editorial policy of the journal Molecular Biology and Evolution). In population genetics, an #Evolutionary Transition may be referred to as an "allele replacement" or "allele substitution" (e.g., Nei, 1987, p. 421), emphasizing the population-genetic mechanism in which one allele replaces another as the wild-type allele (via #Mutation and #Fixation).

Reversion

An #Evolutionary Transition to an #Ancestor ancestral #State. Reversion is a form of #Homoplasy.

Reversal

The occurrence of an #Evolutionary Transition that is the reverse of an earlier transition. Note the distinction from #Reversion: all reversals return to an ancestral state, but not all #Reversions are the reverse of a previous transition. For example, G to C to A to G is a #Reversion, but none of the individual transitions (G to C, C to A, A to G) is reversed, whereas G to C to G is both a #Reversion and a reversal.
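
The distinction can be made concrete with a short Python sketch that labels each transition along a single lineage. The function name `classify_transitions` and the convention of writing the lineage as a string of states from ancestor to descendant are assumptions for illustration; within one lineage, "ancestral" is taken to mean "seen earlier on the path".

```python
def classify_transitions(path):
    """Label each transition along one lineage.

    path lists the states from ancestor to descendant, e.g. "GCAG".
    Returns a list of (old, new, is_reversion, is_reversal) tuples.
    """
    seen_states = {path[0]}
    seen_transitions = set()
    labels = []
    for old, new in zip(path, path[1:]):
        if old == new:
            continue  # no change, so no transition to classify
        is_reversion = new in seen_states              # back to an earlier state
        is_reversal = (new, old) in seen_transitions   # undoes an earlier change
        labels.append((old, new, is_reversion, is_reversal))
        seen_states.add(new)
        seen_transitions.add((old, new))
    return labels
```

Applied to `"GCAG"`, the last transition (A to G) is flagged as a reversion but not a reversal; applied to `"GCG"`, the last transition (C to G) is flagged as both.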

Root

The root (source) node of a #Phylogenetic Tree, the node with no parents. When the root is not known, the #Phylogenetic Tree is said to be unrooted, or is referred to as a #Network.

Selection

See #Natural Selection.

Silent

A difference that is invisible or has no phenotypic effect. Applied to a #Polymorphism, #Evolutionary Transition, or #Mutation. When applied to nucleotide differences in protein-coding genes, it has the same meaning as #Synonymous.

Species Tree

A #Phylogenetic Tree in which the #Terminal Nodes represent #Species.

State

see #Character-State

Step Matrix

A matrix <math>\{S_{i,j}\}</math> representing the number of evolutionary "steps" between two Character-States i and j. Used in the #Phylogeny Inference Method called #Parsimony. cf. #Transition Model.
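
Two simple step matrices can be sketched in Python as nested lists, where `S[i][j]` holds the number of steps from state i to state j. The function names and the choice of an "unordered" versus an "ordered" multistate character are illustrative assumptions, not a prescription for any particular parsimony program.

```python
def unordered_step_matrix(n_states):
    """Every change between two different states costs one step."""
    return [[0 if i == j else 1 for j in range(n_states)]
            for i in range(n_states)]


def ordered_step_matrix(n_states):
    """States lie on a line (0, 1, 2, ...); a change from i to j
    costs |i - j| steps because it passes through the states between."""
    return [[abs(i - j) for j in range(n_states)]
            for i in range(n_states)]
```

For three states, the unordered matrix charges one step for a change from state 0 to state 2, while the ordered matrix charges two.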

Substitution

see #Replacement

Subtree

A #Tree that is part of another #Tree.

Supertree

A #Phylogenetic Tree derived from a set of partially overlapping, smaller, "source" #Phylogenetic Trees. Supertree methods are used when applying character-based #Phylogeny Inference Methods to the complete set of data would be too compute-intensive, or when #Missing Data would prevent an analysis using the same #Characters for all #OTUs.

Symplesiomorphy

A #State shared among some #OTUs that is #Plesiomorphy plesiomorphic in the current context. Disjoint with #Synapomorphy.

Synapomorphy

A #State shared among some #OTUs that is an #Apomorphy apomorphic (i.e., #Derived) state in the current context. Disjoint with #Symplesiomorphy.

Synonymous

A #Mutation, #Polymorphism, or #Evolutionary Transition that changes a codon without changing the encoded amino acid is #Synonymous. Disjoint to one sense of the ambiguous term #Replacement.
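
A minimal Python sketch of this test is given below. The table `CODON_TO_AA` is only a small excerpt of the standard genetic code (DNA codons), included so the example is self-contained; the function name `is_synonymous` is likewise an assumption for illustration.

```python
# Small excerpt of the standard genetic code; NOT a complete table.
CODON_TO_AA = {
    "GGT": "Gly", "GGC": "Gly", "GGA": "Gly", "GGG": "Gly",
    "GAT": "Asp", "GAC": "Asp",
    "GAA": "Glu", "GAG": "Glu",
}


def is_synonymous(codon_before, codon_after, table=CODON_TO_AA):
    """Return True if the codon changes but the encoded amino acid does not."""
    if codon_before == codon_after:
        raise ValueError("codons are identical; there is no change to classify")
    return table[codon_before] == table[codon_after]
```

Here a change from GGT to GGC is synonymous (both encode glycine), while a change from GAT to GAA is not (aspartate to glutamate).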

Taxon

The domain-specific use implicates a class of organismal #Species defined in the #Organismal Taxonomy, e.g., mammals such as squirrels and horses are in the Taxon Mammalia. Plural: Taxa.

Taxonomic Rank

Taxonomic Rank is an ordered categorical descriptor applied to the classes of #Organismal Taxonomy. A class of a given rank may contain only classes of lower rank. The traditional ranks in descending order are: Kingdom, Phylum, Class, Order, Family, Genus and Species. In the 1980s, there were several conflicting proposals to extend this system upward to include a rank higher than Kingdom, variously called "Urkingdom" (Woese et al.), "Domain" (Woese et al.) or "Empire" (Cavalier-Smith).
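
The containment rule can be sketched in Python using the traditional ranks listed above. The list name `RANKS` and the function name `may_contain` are assumptions for this example, and the sketch ignores intermediate ranks such as Subfamily.

```python
# Traditional ranks in descending order, as listed in the definition.
RANKS = ["Kingdom", "Phylum", "Class", "Order", "Family", "Genus", "Species"]


def may_contain(outer_rank, inner_rank):
    """Return True if a class of outer_rank may contain a class of inner_rank,
    i.e. if inner_rank is strictly lower than outer_rank."""
    return RANKS.index(outer_rank) < RANKS.index(inner_rank)
```

For example, `may_contain("Family", "Genus")` holds, while `may_contain("Genus", "Family")` and `may_contain("Order", "Order")` do not.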

Taxonomy

see #Organismal Taxonomy

Terminal Node

The nodes of a #Phylogenetic Tree that have no children. Typically Terminal Nodes in a #Phylogenetic Tree correspond to #OTUs with their observable properties, while internal nodes correspond to ancestors. However, in the case of simulations or evolution-in-the-lab, an internal node may be associated with known properties. Some #Reconcile Trees have terminal nodes that represent inferred events of #Gene Loss.
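
As an illustrative sketch, the node with no parents (the #Root) and the Terminal Nodes can be picked out of a tree stored as a Python dictionary from each parent to its list of children. The dictionary layout and the function name `find_root_and_terminals` are assumptions made for this example.

```python
def find_root_and_terminals(children):
    """Return (roots, terminals) for a tree given as {parent: [child, ...]}.

    roots     -- nodes that never appear as anyone's child (no parents)
    terminals -- nodes that have no children
    """
    has_parent = {kid for kids in children.values() for kid in kids}
    nodes = set(children) | has_parent
    roots = nodes - has_parent
    terminals = {node for node in nodes if not children.get(node)}
    return roots, terminals
```

For the tree `{"root": ["n1", "C"], "n1": ["A", "B"]}`, the only node without a parent is `"root"`, and the terminal nodes are `"A"`, `"B"` and `"C"`.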

Topology

See #Phylogenetic Tree Topology

Trait

A feature of an organism, either abstract (#Character) or concrete (#State).

Transition, disambiguation

Transition Model

A model of rates or probabilities of Evolutionary Transitions, typically defined for use in a first-order Markov transition model.

Tree

See #Phylogenetic Tree

Unresolved

A #Phylogenetic Tree with #Polytomy is often said to be Unresolved, on the assumption that every branching event in the true #Phylogenetic Tree is a #Dichotomy.

Undefined concepts

Anagenesis

Basal

pertaining to root of tree?

Cladogenesis

Coalescent

opposite of divergent, looking at a branching process on the reverse time scale

Codon Usage

note measures RSCU, CAI

Compatibility

a measure of between-character consistency

Consensus Tree

tree based on combining multiple trees. misuse of "consensus". not the same as supertree.

Conservative

Applied to characters or to transitions, implicating minor changes or characters that undergo such changes. opposite of radical.

Constraints, disambiguation

only one of these is relatively clear

  • evo-devo meaning
  • selective constraints
  • nuisance variables

Cost Matrix

this sounds to me like Step Matrix, with the difference that step implies non-negative integers while cost implies real numbers.

Decay Index

cf. Retention index, Compatibility

Distance

typically the distance between otus

dN/dS

ratio of non-synonymous to synonymous rate. usually based on a particular model for normalization.

Equilibrium State Frequencies

distribution of characters expected at equilibrium. often used in markov transition models.

Evolutionary history

phylogeny in the broad sense

Family alignment

sequence alignment for a sequence family

GC/AT pressure

an asymmetry in mutation or in evolutionary changes, favoring GC over AT (or vice versa).

Gene duplication

mutation or evolutionary change resulting in additional copy of gene

Gene loss

loss of a gene, typically implying either deletion or silencing

Gene tree

tree reflects the evolutionary history of genes, not the species of origin

Genetic event

Independent Contrasts Method

method of assessing covariance by reducing phylogenetic distribution pattern into independent comparisons (contrasts)

Invariants

a phylogenetic inference method based on invariants (domain-specific application of more general term from numeric analysis)

Isochore

local compositional area in chromosomes, characteristic of warm-blooded animal genomes.

Lineage

A Lineage is a forward path in a #Phylogenetic Tree, representing a linear path of descent connecting an #Ancestor to a #Descendant. Lineages are not determinable in the case of an #Unrooted tree.

minimum evolution

phylo inference method

Missing Data

absence of character state data due to deletion or lack of information

Polarity

assigned to transitions that have a direction (i.e., a different rate or cost in the forward vs. reverse direction); polarizing differences means determining which (if any) state is ancestral.


Population

a reproducing population, consisting of interbreeding individuals.

Positive Selection

differential reproduction (selection) viewed relative to the variant whose frequency increases

Radiation

episode of rapid or repeated cladogenesis

Radical

opposite of conservative

Relative Rate Test

a simple test to correct for phylogenetic structure when assessing rate constancy, in which, given the tree (A, (B,C)), the distances (or inferred changes) from A to B are compared to those from A to C.
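
The comparison at the heart of the test can be sketched in Python as below. This shows only the difference being compared, not a significance test; the function name `relative_rate_difference`, the `frozenset`-keyed distance table, and the counts in the usage note are all made up for illustration.

```python
def relative_rate_difference(distances, outgroup, taxon1, taxon2):
    """For the tree (outgroup, (taxon1, taxon2)), return
    d(outgroup, taxon1) - d(outgroup, taxon2).

    distances maps frozenset({x, y}) to the distance (or number of
    inferred changes) between x and y. A value near zero is what rate
    constancy would predict; a large value suggests unequal rates.
    """
    def d(x, y):
        return distances[frozenset((x, y))]

    return d(outgroup, taxon1) - d(outgroup, taxon2)
```

With hypothetical counts of 30 changes between A and B and 25 between A and C, the function returns 5, meaning more change has accumulated on the path to B than on the path to C.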

Reticulate Evolution

pattern of evolution in which separate lineages fuse again

Selective Constraint

relative to an imaginary case in which there is no selection, a condition of a reduced rate of change due to the probability of deleterious effects.

Sequence Family

gene family, protein family. see paralogy, orthology, homology

Species

assemblage of actually or potentially reproducing organisms, consisting of one or more populations.

Stasis

Star Tree

tree with a single ancestor node that is the parent of all otus

Transition-Transversion Bias

is_a Mutation Bias

Unrooted

Not rooted, applied to trees.

UPGMA

A phenetic clustering method. Arguably, a phylogeny inference method based on using distances.

Variation

too ambiguous? population variation is one clear usage. others are vague

Xenology

relation of homology via lateral transfer

Additional terms from other domains whose meanings are consistent

From statistics and applied maths

From biology, systematics, genetics, and evolution

From molecular biology and bioinformatics

  • sequence
  • alignment
  • aligned sequences
  • unaligned sequences

From computer science and maths


Acknowledgements and History

  • June, 2006, Aaron Mackey started a glossary (19 domain concepts and a few other terms) for the group that spawned the NESCent evolutionary informatics working group.
  • September, 2007, Arlin Stoltzfus expanded the concept glossary to 40 defined terms and 22 undefined terms.
  • 12 October, 2007, the concept glossary was released to the NESCent evolutionary informatics working group
  • November, 2007, EvoInfo working group meeting (12th to 14th) and follow-ups
    • terms added by Julie Thompson, Enrico Pontelli, Arlin Stoltzfus
    • reformatted to wiki to allow easier cross-referencing
    • dozens of cross-references added (not complete)
    • about 50 defined terms added (Arlin)