Difference between revisions of "CDAO"

From Evolutionary Informatics Working Group
Revision as of 13:03, 17 March 2008

Comparative Data Analysis Ontology

The material previously on this page has been moved to CDAOManuscript.

This page is for ongoing work and contains links to supporting docs, past work, and sub-topics.

Test Data Sets

Each data set comes with a tree and a character matrix in NEXUS format.

There are four different categories of character sets:

  • DNA: aligned nucleotides coded via IUPAC standard (T, C, G, A, and so on)
  • protein: aligned amino acids coded via IUPAC standard (A, C, D, E, F, G, H, I and so on)
  • continuous: numeric values of continuous characters (e.g., 0.001, 0.230)
  • morphology: discrete morphological characters with ad hoc numeric encoding (e.g., 0 = absent, 1 = present)

The DNA data are "CDS" or "coding sequence" data, meaning the sequence of nucleotide triplets in the protein-coding part of a gene.
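
As a minimal sketch of what "sequence of nucleotide triplets" means in practice, the following splits an aligned CDS into codons. The function name `split_codons` is hypothetical, and the handling of `-` (gap) and `?` (missing) is an assumption based on common NEXUS usage, not something stated on this page.

```python
# IUPAC nucleotide symbols, including ambiguity codes.
IUPAC_NUCLEOTIDES = set("ACGTURYSWKMBDHVN")
# Assumption: '-' marks an alignment gap and '?' a missing state.
GAP_SYMBOLS = set("-?")


def split_codons(cds):
    """Split an aligned coding sequence into nucleotide triplets."""
    seq = cds.upper()
    unknown = set(seq) - IUPAC_NUCLEOTIDES - GAP_SYMBOLS
    if unknown:
        raise ValueError(f"non-IUPAC symbols: {sorted(unknown)}")
    if len(seq) % 3 != 0:
        raise ValueError("length is not a multiple of 3")
    return [seq[i:i + 3] for i in range(0, len(seq), 3)]


print(split_codons("ATGGCC---TAA"))  # ['ATG', 'GCC', '---', 'TAA']
```

Rejecting sequences whose length is not a multiple of three is a design choice for this sketch; real alignments with frameshifts would need a more forgiving policy.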

There are three grades of difficulty:

  • Simplified: small number of OTUs and characters; unambiguous states; single bifurcating tree
  • Typical: may contain many OTUs, multiple trees, polytomies, and other complications
  • Demanding: may contain ambiguous characters, mixed data types, notes, assumptions, etc.
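
One of the criteria above, "single bifurcating tree" versus trees with polytomies, can be checked mechanically. The sketch below assumes a hypothetical in-memory representation (rooted trees as nested tuples, OTU labels as strings); it is not how the test files themselves store trees.

```python
def is_bifurcating(tree):
    """Return True if every internal node has exactly two children."""
    if isinstance(tree, str):  # a leaf, i.e. an OTU label
        return True
    return len(tree) == 2 and all(is_bifurcating(child) for child in tree)


resolved = (("A", "B"), ("C", "D"))  # fully resolved rooted tree
polytomy = (("A", "B", "C"), "D")    # one node with three children

print(is_bifurcating(resolved))  # True
print(is_bifurcating(polytomy))  # False
```

Note that unrooted trees are often written with three children at the top level; under this rooted convention they would be reported as non-bifurcating.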


| type          | difficulty | description                                  | comments              | NEXUS                      |
|---------------|------------|----------------------------------------------|-----------------------|----------------------------|
| CDS (DNA)     | Simplified | Subset of 10 ATPase CDSs                     | comments              | PF00137_10_cds.nex         |
| CDS (DNA)     | Typical    | Eukaryotic cytochrome C CDSs                 | comments              | PF00034_39_cds.nex         |
| CDS (DNA)     | Typical    | Eukaryotic ATPase CDSs                       | comments              | PF00137_47_cds.nex         |
| CDS (DNA)     | Demanding  | NA                                           | comments              | NA                         |
| Protein (AA)  | Simplified | Subset of 10 ATPases                         | comments              | PF00137_10_protein.nex     |
| Protein (AA)  | Typical    | Eukaryotic cytochrome Cs                     | comments              | PF00034_39_protein.nex     |
| Protein (AA)  | Typical    | Eukaryotic ATPases                           | comments              | PF00137_47_protein.nex     |
| Protein (AA)  | Demanding  | NA                                           | comments              | NA                         |
| Continuous    | Simplified | NA                                           | comments              | NA                         |
| Continuous    | Typical    | Inhibitor sensitivity data for human kinases | -log(IC50) scaled     | kinase_rescaled3_sets.nex  |
| Continuous    | Demanding  | NA                                           | comments              | NA                         |
| Morphological | Simplified | NA                                           | comments              | NA                         |
| Morphological | Typical    | Nematode vulval morphology and development   | Kiontke, et al., 2007 | Kiontke_CB_fixed.nex       |
| Morphological | Demanding  | NA                                           | comments              | NA                         |
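
For a quick look at the character matrix in one of the NEXUS files, a rough reader along the following lines may be enough for the simplified data sets. This is only a sketch: `read_nexus_matrix` is a hypothetical helper, the embedded example text is made up for illustration (it is not taken from any of the files above), and the parser assumes unquoted OTU labels without spaces and no bracketed comments inside the MATRIX command.

```python
import re

# Illustrative NEXUS text, invented for this example.
EXAMPLE = """#NEXUS
BEGIN CHARACTERS;
  DIMENSIONS NTAX=3 NCHAR=6;
  FORMAT DATATYPE=DNA GAP=-;
  MATRIX
    otu_1 ATGGCC
    otu_2 ATGGCA
    otu_3 ATG---
  ;
END;
"""


def read_nexus_matrix(text):
    """Return {otu_label: state_string} from the first MATRIX command."""
    match = re.search(r"\bmatrix\b(.*?);", text, flags=re.IGNORECASE | re.DOTALL)
    if match is None:
        raise ValueError("no MATRIX command found")
    rows = {}
    for line in match.group(1).splitlines():
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank lines
        label, states = parts[0], "".join(parts[1:])
        # Appending lets interleaved matrices accumulate per OTU.
        rows[label] = rows.get(label, "") + states
    return rows


for otu, states in read_nexus_matrix(EXAMPLE).items():
    print(otu, states)
```

For the typical and demanding files, an established NEXUS parser is likely a better choice than this regular-expression approach.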

Initial Implementation

  • A preliminary draft of the CDAO work done at NMSU is available here [1] (http://www.cs.nmsu.edu/~bchisham/ontology/). A current view of the content of the ontology is here [2] (http://www.cs.nmsu.edu/~bchisham/ontology/total_view.png). In particular:
    • MAO-Prime (web page [3], http://www.cs.nmsu.edu/~bchisham/ontology/doc/generated/owl/obo-mao-prime/): a Protege implementation of the MAO, extended with descriptions of individual nucleotides, amino acids, and meta symbols such as gap.
    • CDAO (web page [4], http://www.cs.nmsu.edu/~bchisham/ontology/doc/generated/owl/evo-info/): a fairly direct implementation of the draft ontology developed during the Fall meeting of the EvoInfo group at NESCent.
    • Transformations (web page [5]): during the Fall meeting we discussed the need to include a description of possible transformations in the ontology; this is a first attempt at that.
    • Tree (web page [6]): a draft ontology for the description of trees, drawn mostly from NEXUS and Chado.

Evaluation

Meeting Notes

Telecon, 7 March, 2007

present: Francisco Prosdocimi, Julie Thompson, Enrico Pontelli, Arlin Stoltzfus

What activities to do before the meeting? Plan for development?

  1. represent 4 simple test cases
    1. nt alignment plus tree
    2. prot alignment plus tree
    3. kinases with inhibitor sensitivity
    4. worm morphologies
  2. carry out operations with reasoning
    1. set and logic operations on characters and OTUs
    2. tree operations (clade selection, prune)
    3. other?
  3. map ontology to other representations
    1. NEXUS
    2. neXML
  4. start compiling list of concepts that are missing
    1. review Enrico's proposal
  5. look ahead to future challenges
    1. genetic encoding of characters
    2. ambiguous, multi-dimensional, or otherwise complex characters
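
To make item 2 of the plan concrete, here is a plain-Python stand-in for the kinds of operations listed (set and logic operations on characters and OTUs, clade selection, pruning). It does not use an ontology or a reasoner; the nested-tuple tree, the toy matrix, and all function names are assumptions made for illustration.

```python
def leaves(tree):
    """Set of OTU labels under a node (trees are nested tuples of strings)."""
    if isinstance(tree, str):
        return {tree}
    return set().union(*(leaves(child) for child in tree))


def select_clade(tree, otus):
    """Smallest subtree containing all of `otus`, or None if some are absent."""
    otus = set(otus)
    if isinstance(tree, str):
        return tree if otus <= {tree} else None
    for child in tree:
        if otus <= leaves(child):
            return select_clade(child, otus)
    return tree if otus <= leaves(tree) else None


def prune(tree, keep):
    """Drop OTUs not in `keep`, collapsing nodes left with a single child."""
    if isinstance(tree, str):
        return tree if tree in keep else None
    kids = [p for p in (prune(child, keep) for child in tree) if p is not None]
    if not kids:
        return None
    return kids[0] if len(kids) == 1 else tuple(kids)


def otus_with_state(matrix, char_index, state):
    """OTUs whose character at `char_index` has the given state."""
    return {otu for otu, row in matrix.items() if row[char_index] == state}


tree = (("A", "B"), ("C", "D"))
matrix = {"A": "01", "B": "11", "C": "10", "D": "00"}  # toy presence/absence data

both = otus_with_state(matrix, 0, "1") & otus_with_state(matrix, 1, "1")
either = otus_with_state(matrix, 0, "1") | otus_with_state(matrix, 1, "1")

print(sorted(both))                    # ['B']
print(select_clade(tree, {"A", "B"}))  # ('A', 'B')
print(prune(tree, either))             # (('A', 'B'), 'C')
```

The same queries expressed against the ontology would let a reasoner answer them; this sketch only pins down the expected results for small cases.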

Other issues for meeting and for paper

  • what is the scope?
  • How to integrate with other ontologies?
    • table from 'related artefacts' exercise
    • genetic code as a test case for integration
      • requires a nucleotide-to-amino-acid (nt-to-aa) mapping to specify the code
      • requires a species taxonomy to assign the code to species
      • requires a cell ontology to assign the code to the compartmental genome (nuc, mito, cp)
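
The three requirements above can be illustrated with a small sketch: a codon-to-amino-acid table, plus a lookup that picks the table from a taxon and a compartment. The two tables follow the standard code and the vertebrate mitochondrial code; the `CODE_FOR` lookup, its keys, and the `translate` function are hypothetical stand-ins for what a taxonomy and a cell ontology would supply.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
STANDARD_AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), STANDARD_AA)
}
# Vertebrate mitochondrial code differs from the standard code at four codons.
VERTEBRATE_MITO = {**STANDARD, "TGA": "W", "ATA": "M", "AGA": "*", "AGG": "*"}

# Hypothetical lookup keyed by (taxon group, compartment).
CODE_FOR = {
    ("vertebrate", "nuc"): STANDARD,
    ("vertebrate", "mito"): VERTEBRATE_MITO,
}


def translate(cds, taxon, compartment):
    """Translate an ungapped CDS using the code assigned to taxon and compartment."""
    code = CODE_FOR[(taxon, compartment)]
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    return "".join(code[cds[i:i + 3]] for i in range(0, usable, 3))


print(translate("ATGATATGA", "vertebrate", "nuc"))   # MI*
print(translate("ATGATATGA", "vertebrate", "mito"))  # MMW
```

The same nucleotide sequence yields different proteins depending on the compartment, which is why the code assignment cannot be made from the sequence alone.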

Next meeting

  • telecon, 14 March, 2:00 pm UTC
  • agenda
    • nt and prot test data sets (arlin)
    • protege demo (brandon)

telecon, 14 March, 2:00 UTC

another meeting

Related Work

  • We are working on generating an ontology directly from the Concept Glossary, and we are documenting the progress on this page [7]. Note that the page is not up to date at the moment (hopefully it will be by the end of the day or tomorrow, 3/18/2008). The eventual goal is to show that CDAO maps onto all of these concepts.