Difference between revisions of "CDAO"

From Evolutionary Informatics Working Group

Revision as of 16:24, 11 March 2008

Comparative Data Analysis Ontology

The material previously on this page has been moved to CDAOManuscript.

This page is for ongoing work and contains links to supporting docs, past work, and sub-topics.

Test Data Sets

Each data set comes with a tree and a character matrix. The data are given in two formats where possible:

  • NEXUS
  • NeXML

There are four different categories of character sets:

  • cds: aligned nucleotides coded via IUPAC standard (T, C, G, A, and so on)
  • protein: aligned amino acids coded via IUPAC standard (A, C, D, E, F, G, H, I and so on)
  • continuous: numeric values of continuous characters (e.g., 0.001, 0.230)
  • morphology: discrete morphological characters with ad hoc numeric encoding (e.g., 0 = absent, 1 = present)
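
The four categories above can be told apart, roughly, from the state symbols alone. The following is a minimal sketch, not part of the test data sets: the function name `classify_row` and the decision order are assumptions made for illustration, and the heuristic is deliberately naive (a short protein made only of A, C, G and T would be reported as nucleotide data, and it cannot check that nucleotides are actually coding).

```python
# IUPAC nucleotide codes, including the ambiguity codes (R, Y, N, ...).
IUPAC_NUCLEOTIDES = set("ACGTURYSWKMBDHVN")
# The 20 standard amino acids plus the ambiguity codes B, Z and X.
IUPAC_AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWYBZX")
# Gap and missing-data symbols are ignored when guessing the category.
GAP_AND_MISSING = {"-", "?"}


def is_float(token):
    """Return True if the token parses as a floating-point number."""
    try:
        float(token)
        return True
    except ValueError:
        return False


def classify_row(states):
    """Guess the category of one matrix row, given as a list of state tokens."""
    tokens = [s.upper() for s in states if s not in GAP_AND_MISSING]
    if not tokens:
        return "unknown"
    if all(len(t) == 1 and t in IUPAC_NUCLEOTIDES for t in tokens):
        return "cds"
    if all(len(t) == 1 and t in IUPAC_AMINO_ACIDS for t in tokens):
        return "protein"
    if all(t.isdigit() for t in tokens):
        # Integer codes such as 0 = absent, 1 = present.
        return "morphology"
    if all(is_float(t) for t in tokens):
        return "continuous"
    return "unknown"
```

For example, `classify_row(list("ATGGCC"))` is classified as `cds` and `classify_row(["0.001", "0.230"])` as `continuous`.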

There are three grades of difficulty:

  • Simplified: small number of OTUs and characters; unambiguous states; single bifurcating tree
  • Typical: may contain many OTUs, multiple trees, polytomies, and similar complications
  • Demanding: may contain ambiguous characters, mixed data types, notes, assumptions, etc.
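
Several of the properties that separate the grades can be checked mechanically. The sketch below is only an illustration under stated assumptions: trees are written as nested Python tuples rather than read from NEXUS or NeXML, the matrix is a dict of aligned nucleotide strings, and the names `leaves`, `has_polytomy` and `summarize` are hypothetical. It reports properties and does not assign a grade, since the page gives no numeric thresholds.

```python
# Nucleotide states that are not ambiguity codes.
UNAMBIGUOUS_NT = set("ACGT")


def leaves(tree):
    """Return the OTU names at the tips of a nested-tuple tree."""
    if isinstance(tree, str):
        return [tree]
    return [name for child in tree for name in leaves(child)]


def has_polytomy(tree):
    """Return True if any internal node has more than two children.

    A three-way split at the root is counted as a polytomy here.
    """
    if isinstance(tree, str):
        return False
    return len(tree) > 2 or any(has_polytomy(child) for child in tree)


def summarize(trees, matrix):
    """Collect the properties mentioned in the difficulty grades."""
    states = set("".join(matrix.values()).upper())
    return {
        "n_otus": len(matrix),
        "n_characters": len(next(iter(matrix.values()))) if matrix else 0,
        "n_trees": len(trees),
        "polytomies": any(has_polytomy(t) for t in trees),
        # Gaps ("-") are not counted as ambiguous states in this sketch.
        "ambiguous_states": bool(states - UNAMBIGUOUS_NT - {"-"}),
        "tree_matches_matrix": all(set(leaves(t)) == set(matrix) for t in trees),
    }
```

A data set with one tree, no polytomies and no ambiguous states would be a candidate for the Simplified grade; the other grades relax those conditions.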


Type          | Difficulty | Description                     | Comments | NEXUS                  | NeXML
CDS (DNA)     | Simplified | Subset of 10 ATPase CDSs        | comments | PF00137_10_cds.nex     | PF00137_10_cds.nexml
CDS (DNA)     | Typical    | Cytochrome C CDS sequences      | comments | PF00034_39_cds.nex     | PF00034_39_cds.nexml
CDS (DNA)     | Typical    | Full set eukaryotic ATPase CDSs | comments | PF00137_47_cds.nex     | PF00137_47_cds.nexml
CDS (DNA)     | Demanding  | NA                              | comments | NA                     | NA
Protein (AA)  | Simplified | Subset of 10 ATPase sequences   | comments | PF00137_10_protein.nex | PF00137_10_protein.nexml
Protein (AA)  | Typical    | Full set eukaryotic ATPases     | comments | PF00137_47_protein.nex | PF00137_47_protein.nexml
Protein (AA)  | Demanding  | NA                              | comments | NA                     | NA
Continuous    | Simplified | NA                              | comments | NA                     | NA
Continuous    | Typical    | NA                              | comments | NA                     | NA
Continuous    | Demanding  | NA                              | comments | NA                     | NA
Morphological | Simplified | NA                              | comments | NA                     | NA
Morphological | Typical    | NA                              | comments | NA                     | NA
Morphological | Demanding  | NA                              | comments | NA                     | NA
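
The file names in the table appear to follow a regular pattern. The sketch below splits a name into its parts; the reading of the fields is an inference from the table (the prefix looks like a Pfam accession, and the number matches the sequence count in descriptions such as "Subset of 10"), not something the page states. The function name `parse_name` is hypothetical.

```python
import re

# Assumed pattern: <Pfam accession>_<number of sequences>_<type>.<extension>
NAME_PATTERN = re.compile(r"^(PF\d{5})_(\d+)_(cds|protein)\.(nex|nexml)$")


def parse_name(filename):
    """Split a test data file name into its fields, or return None."""
    match = NAME_PATTERN.match(filename)
    if match is None:
        # Placeholder entries such as "NA" do not follow the pattern.
        return None
    accession, count, kind, extension = match.groups()
    return {
        "accession": accession,
        "n_sequences": int(count),
        "type": kind,
        "format": "NEXUS" if extension == "nex" else "NeXML",
    }
```

Calling `parse_name` on each NEXUS/NeXML pair in a row should give the same accession, count and type, which is a quick consistency check on the table.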

Initial Implementation

Evaluation

Meeting Notes

Telecon, 7 March, 2007

present: Francisco Prosdocimi, Julie Thompson, Enrico Pontelli, Arlin Stoltzfus

What activities to do before the meeting? Plan for development?

  1. represent 4 simple test cases
    1. nt alignment plus tree
    2. prot alignment plus tree
    3. kinases with inhibitor sensitivity
    4. worm morphologies
  2. carry out operations with reasoning
    1. set and logic operations on characters and OTUs
    2. tree operations (clade selection, prune)
    3. other?
  3. map ontology to other representations
    1. NEXUS
    2. NeXML
  4. start compiling list of concepts that are missing
    1. review Enrico's proposal
  5. look ahead to future challenges
    1. genetic encoding of characters
    2. ambiguous, multi-dimensional, or otherwise complex characters
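
The operations listed under item 2 can be made concrete with a small sketch. This is plain Python on toy data, not ontology-based reasoning: trees are nested tuples, the matrix is a dict of aligned strings, and the names `leaves`, `clade`, `prune` and `submatrix` are hypothetical. It only illustrates what set operations on characters and OTUs, clade selection, and pruning are expected to return.

```python
def leaves(tree):
    """Return the OTU names at the tips of a nested-tuple tree."""
    if isinstance(tree, str):
        return [tree]
    return [name for child in tree for name in leaves(child)]


def clade(tree, targets):
    """Return the smallest subtree containing every OTU in a non-empty set."""
    targets = set(targets)
    if isinstance(tree, str):
        return tree if targets <= {tree} else None
    for child in tree:
        if targets <= set(leaves(child)):
            return clade(child, targets)
    return tree if targets <= set(leaves(tree)) else None


def prune(tree, drop):
    """Remove the OTUs in `drop`, collapsing nodes left with one child."""
    if isinstance(tree, str):
        return None if tree in drop else tree
    kept = [p for p in (prune(child, drop) for child in tree) if p is not None]
    if not kept:
        return None
    return kept[0] if len(kept) == 1 else tuple(kept)


def submatrix(matrix, otus, columns):
    """Restrict a matrix to a set of OTUs and a set of column indices."""
    cols = sorted(columns)
    return {otu: "".join(matrix[otu][i] for i in cols) for otu in sorted(otus)}


# Toy data: five OTUs in the tree, three of them in the matrix.
tree = ((("A", "B"), "C"), ("D", "E"))
matrix = {"A": "ATGC", "B": "ATGA", "C": "TTGC"}

# Set operations: OTUs present in both tree and matrix, and columns
# that are both variable and within the first three positions.
shared_otus = set(leaves(tree)) & set(matrix)
variable = {i for i in range(4) if len({row[i] for row in matrix.values()}) > 1}
selected_columns = variable & {0, 1, 2}
```

With this toy data, `clade(tree, {"A", "C"})` selects the subtree `(("A", "B"), "C")`, and `prune(tree, {"B", "D"})` leaves `(("A", "C"), "E")`.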

Other issues for meeting and for paper

  • what is the scope?
  • How to integrate with other ontologies?
    • table from 'related artefacts' exercise
    • genetic code as a test case for integration
      • requires nt-to-aa mapping to specify the code
      • requires species taxonomy to assign code to species
      • requires cell ontology to assign code to compartmental genome (nuc, mito, cp)
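
The three requirements of the genetic code test case can be pictured as three chained lookups. The sketch below uses small hand-written dicts in place of real ontologies, so the toy taxonomy, the compartment labels, and the function names are assumptions made for illustration. The table numbers are NCBI translation table identifiers (1 = standard, 2 = vertebrate mitochondrial, 11 = bacterial, archaeal and plant plastid), and only three codons per table are listed.

```python
# Excerpt of the nt-to-aa mapping for three codons ("*" marks a stop codon).
CODE_TABLES = {
    1: {"TGA": "*", "ATA": "I", "AGA": "R"},
    2: {"TGA": "W", "ATA": "M", "AGA": "*"},
    11: {"TGA": "*", "ATA": "I", "AGA": "R"},
}

# Toy stand-in for a species taxonomy: species name to a higher taxon.
LINEAGE = {
    "Homo sapiens": "Vertebrata",
    "Arabidopsis thaliana": "Viridiplantae",
}

# Toy stand-in for the taxon and compartment assignment of codes.
# Compartments follow the abbreviations nuc, mito, cp.
CODE_BY_TAXON_AND_COMPARTMENT = {
    ("Vertebrata", "nuc"): 1,
    ("Vertebrata", "mito"): 2,
    ("Viridiplantae", "nuc"): 1,
    ("Viridiplantae", "cp"): 11,
}


def code_table_for(species, compartment):
    """Look up the translation table number for a compartmental genome."""
    taxon = LINEAGE[species]
    return CODE_BY_TAXON_AND_COMPARTMENT[(taxon, compartment)]


def translate_codon(species, compartment, codon):
    """Translate one codon with the code assigned to that genome."""
    return CODE_TABLES[code_table_for(species, compartment)][codon.upper()]
```

The same codon can then translate differently within one species: TGA is a stop codon in the human nuclear genome but encodes tryptophan in the human mitochondrial genome.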

Next meeting

  • telecon, 14 March, 2:00 pm UTC
  • agenda
    • nt and prot test data sets (Arlin)
    • Protégé demo (Brandon)

Telecon, 14 March, 2:00 pm UTC

another meeting