From Evolutionary Informatics Working Group

How to proceed with the Concept Glossary

This discussion has been moved to the discussion page for the ConceptGlossary.

Substitution model language in relation to ontology

Sergei: Perhaps it is best if I enumerate the features I would like to have, and ontology-aware folks can tell me which tools would be able to provide them. In the context of EMDL (evolutionary model description language) I want to be able to:

  1. Define a formal ontology on models and their components, linked to the more general ontology developed by this group (e.g. codon models can refer to the concept of codons, genetic codes, nucleotides and interpretable quantities like dN/dS and branch lengths)
  2. Create arbitrary annotation fields for each model I describe. I am not sure what they will be yet, but presumably they can
  3. Have a human-readable (i.e. HTML with images) description, so that in the end we can have a browseable/searchable/hyperlinked 'library of models' document.
  4. Ideally, integration with LaTeX and BibTeX (to define rate matrices and external references) would be helpful.

I can do the last three in a wiki, but the first seems better suited to OWL. Any suggestions are welcome. I am generally averse to spending a lot of time and effort writing something that will be difficult to extract or convert to another format later on.
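
To make the wish list above more concrete, here is a minimal sketch of what one entry in such a library of models might look like if it were kept as plain structured data that is easy to convert later. Everything in it is an assumption made for illustration: the class name `ModelDescription`, its field names, the `ex:` concept identifiers and the citation key are hypothetical and do not come from EMDL or from any existing ontology. The rate matrix shown is the usual Jukes-Cantor form with equal off-diagonal rates.

```python
import json
from dataclasses import dataclass, field, asdict
from typing import Dict, List


@dataclass
class ModelDescription:
    """One entry in a hypothetical 'library of models'."""
    name: str
    # Identifiers of concepts in a shared ontology (hypothetical names).
    concepts: List[str]
    # Rate matrix written as a LaTeX fragment.
    rate_matrix_latex: str
    # BibTeX citation keys pointing at external references.
    bibtex_keys: List[str] = field(default_factory=list)
    # Free-form annotation fields that are not fixed in advance.
    annotations: Dict[str, str] = field(default_factory=dict)

    def to_json(self) -> str:
        """Serialize to JSON so the entry can be converted to other formats."""
        return json.dumps(asdict(self), indent=2, sort_keys=True)

    @classmethod
    def from_json(cls, text: str) -> "ModelDescription":
        """Rebuild an entry from the JSON produced by to_json."""
        return cls(**json.loads(text))


jc69 = ModelDescription(
    name="JC69",
    concepts=["ex:Nucleotide", "ex:BranchLength"],  # hypothetical identifiers
    rate_matrix_latex=r"q_{ij} = \mu / 4 \quad (i \neq j)",
    bibtex_keys=["jukes1969"],  # hypothetical citation key
    annotations={"state_space": "nucleotide"},
)
print(jc69.to_json())
```

A record like this does not replace a formal ontology; it only shows that items 2 to 4 can live in a format that round-trips cleanly, while the `concepts` field is the place where links into an OWL ontology could be stored.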

Mixed data sets

The issue is how to combine multiple types of data in a NEXUS file, or in some other structured document. Some possible strategies are:

  • mixed characters block (character type differs among characters, e.g., some nucleotide characters followed by some re-coded gap characters)
  • multiple characters blocks (use the TITLE and LINK commands to disambiguate as in Mesquite)
  • codons block. If the mixed data are nucleotide and amino acid sequences for the same gene, then there is a mechanism to encode this using the codons block, which can specify the location of a reading frame and the genetic code used to translate it.
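
The codons-block strategy rests on the fact that the amino acid sequence can be derived from the nucleotide sequence once a reading frame and a genetic code are known. The sketch below illustrates that relationship in Python, assuming the standard genetic code (NCBI translation table 1). It is not a NEXUS parser, and the function name `translate` and its parameters are chosen here for illustration only.

```python
BASES = "TCAG"
# Standard genetic code (NCBI translation table 1), with codons ordered
# TTT, TTC, TTA, TTG, TCT, ... and '*' marking stop codons.
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
STANDARD_CODE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(nucleotides, frame_start=0, code=STANDARD_CODE):
    """Translate a nucleotide string into amino acids.

    frame_start is the 0-based position of the first codon. An incomplete
    trailing codon is ignored, and any codon not found in the code table
    (for example one containing gaps or ambiguity symbols) becomes 'X'.
    """
    seq = nucleotides.upper().replace("U", "T")
    protein = []
    for pos in range(frame_start, len(seq) - 2, 3):
        protein.append(code.get(seq[pos:pos + 3], "X"))
    return "".join(protein)


print(translate("ATGGCCTGGTAA"))              # reading frame starts at position 0
print(translate("CATGGCC", frame_start=1))    # reading frame starts at position 1
```

Passing a different dictionary as `code` would stand in for a non-standard genetic code, which is the second piece of information the codons block is said to carry.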

To think about this further, we should start by listing the use cases where we want to analyze mixed data sets. What kind of mixtures are used?

Hierarchy of character data analysis objects

An issue that came up in discussion at the first meeting can be expressed as a question: what is the basal level of a character analysis? Is it a set of character data populated for a set of OTUs, or is it an aggregation of character data from multiple sources for the union of all implicated OTUs?
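
The second alternative can be made concrete with a small sketch. It assumes, purely for illustration, that each source is a mapping from OTU name to a string of character states, and that an OTU absent from a source is filled with the missing-data symbol `?`. The function name `aggregate` and this data layout are hypothetical, not something decided by the group.

```python
def aggregate(matrices, missing="?"):
    """Combine several character matrices over the union of their OTUs.

    Each matrix is a non-empty dict mapping an OTU name to a string of
    character states, and all rows within one matrix must have the same
    length. OTUs that a matrix does not cover are padded with `missing`.
    """
    otus = sorted({otu for matrix in matrices for otu in matrix})
    combined = {otu: "" for otu in otus}
    for matrix in matrices:
        lengths = {len(states) for states in matrix.values()}
        if len(lengths) != 1:
            raise ValueError("rows in a matrix must all have the same length")
        width = lengths.pop()
        for otu in otus:
            combined[otu] += matrix.get(otu, missing * width)
    return combined


dna = {"human": "ACGT", "chimp": "ACGA"}
morphology = {"chimp": "01", "gorilla": "11"}
print(aggregate([dna, morphology]))
```

Under the first alternative, `dna` and `morphology` would each be a basal object in its own right; under the second, the basal object is the combined result, in which every OTU has a row and the gaps in coverage are explicit.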