Difference between revisions of "CDAO"

From Evolutionary Informatics Working Group

Revision as of 08:39, 1 April 2008

Comparative Data Analysis Ontology

The material previously on this page has been moved to CDAOManuscript.

This page is for ongoing work and contains links to supporting docs, past work, and sub-topics.

Protege

  • Some slides giving a brief introduction to the use of Protege (available as HTML, PDF, and Flash)


Initial test-driven development strategy

To get started, we propose to use a test-driven strategy based on explicit tests of the basic concepts from the ConceptGlossary. Attached is the prioritized concept list, media:prioritized_concept_list.txt (1 is highest priority, 3 is lowest). Here is how it works. Imagine we have a *high-level test language*, and this is the code for testing how the ontology implements the "ancestor" concept:

load_ontology("CDAO");
load_data("ancestor_test.nex"); 
statements = { "otuA is_a ancestor_of otuB", "htuAB is_a ancestor_of otuB" }; 
truth_value = { "false", "true" }; 
evaluate( statements, truth_value );

Here is the "ancestor_test.nex" file:

#NEXUS
BEGIN TAXA;
      dimensions ntax=4;
      taxlabels otuA otuB otuC otuD;
END;
BEGIN TREES;
      tree bush = [&R] ((otuA,otuB)htuAB,(otuC,otuD)htuCD)htuABCD;
END;

I'm hoping to attach a tar file with tests for concepts, but the wiki does not accept tar files, so I can send it via email. The files come in pairs:

<concept><test_number>.nex
<concept><test_number>.tab

The first file is a NEXUS file with the data. The second file is a table of statements for evaluation, with the fields statement_number, truth_value, and statement. Right now I am using a three-valued logic (true, false, and unknown or indeterminate). For example, if the tree is not rooted, then whether an internal node is the ancestor of some other node is indeterminate.
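The evaluation of such a pair of files could be sketched as follows. This is a minimal illustration, not the group's test harness: the function names (`parse_newick`, `is_ancestor`, `evaluate`) are hypothetical, only the `ancestor_of` relation is handled, the tree must have labels on every node and no branch lengths, and any tree not marked `[&R]` is treated as unrooted.

```python
import re

def parse_newick(newick):
    """Return {child: parent} for a Newick string in which every node is labelled."""
    parent = {}
    stack = [[]]      # stack[-1] collects the children of the currently open group
    closed = None     # children of the group that was just closed by ")"
    for tok in re.findall(r"[(),;]|[^(),;\s]+", newick):
        if tok == "(":
            stack.append([])
        elif tok == ")":
            closed = stack.pop()
        elif tok in ",;":
            closed = None
        else:                        # a node label
            if closed is not None:   # it names the internal node just closed
                for child in closed:
                    parent[child] = tok
                closed = None
            stack[-1].append(tok)
    return parent

def is_ancestor(parent, a, b):
    """True if a lies on the path from b up to the root."""
    node = parent.get(b)
    while node is not None:
        if node == a:
            return True
        node = parent.get(node)
    return False

def evaluate(tree_statement, rows):
    """rows: (statement_number, truth_value, statement) tuples, as in a .tab file.

    Returns (statement_number, passed) pairs using three-valued logic.
    """
    rooted = "[&R]" in tree_statement
    newick = re.sub(r"\[[^\]]*\]", "", tree_statement.split("=", 1)[-1])
    parent = parse_newick(newick)
    results = []
    for number, expected, statement in rows:
        words = statement.split()
        subject, relation, obj = words[0], words[-2], words[-1]
        if relation != "ancestor_of":
            raise ValueError(f"unsupported relation: {relation}")
        if not rooted:
            observed = "unknown"     # ancestry is indeterminate without a root
        else:
            observed = "true" if is_ancestor(parent, subject, obj) else "false"
        results.append((number, observed == expected))
    return results

tree = "tree bush = [&R] ((otuA,otuB)htuAB,(otuC,otuD)htuCD)htuABCD;"
rows = [(1, "false", "otuA is_a ancestor_of otuB"),
        (2, "true", "htuAB is_a ancestor_of otuB")]
print(evaluate(tree, rows))
```

Keeping the expected truth value as a string rather than a boolean leaves room for the third value, "unknown", without special cases.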

More elaborate test data sets

Each data set comes with a tree and a character matrix in NEXUS format. To explore these data sets you may wish to:

There are four different categories of character sets:

  • DNA: aligned nucleotides coded via IUPAC standard (T, C, G, A, and so on)
  • protein: aligned amino acids coded via IUPAC standard (A, C, D, E, F, G, H, I and so on)
  • continuous: numeric values of continuous characters (e.g., 0.001, 0.230)
  • morphology: discrete morphological characters with ad hoc numeric encoding (e.g., 0 = absent, 1 = present)

The DNA data are "CDS" or "coding sequence" data, meaning the sequence of nucleotide triplets in the protein-coding part of a gene.
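As a small illustration of what "sequence of nucleotide triplets" means in practice, an aligned CDS can be cut into codons as sketched below. The function name `codons` is hypothetical, and the sketch assumes that gaps in the alignment were inserted in whole-codon units so that the length stays a multiple of three.

```python
def codons(cds):
    """Split a coding sequence into consecutive nucleotide triplets."""
    if len(cds) % 3 != 0:
        raise ValueError("CDS length is not a multiple of three")
    return [cds[i:i + 3] for i in range(0, len(cds), 3)]

print(codons("ATGGCC---TAA"))
```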

There are three grades of difficulty:

  • Simplified: small number of OTUs and characters; unambiguous states; single bifurcating tree
  • Typical: may contain many OTUs, multiple trees, polytomies, other stuff
  • Demanding: may contain ambiguous characters, mixed data types, notes, assumptions, etc.


type | difficulty | description | comments | NEXUS
CDS (DNA) | Simplified | Subset of 10 ATPase CDSs | comments | PF00137_10_cds.nex
CDS (DNA) | Typical | Eukaryotic cytochrome C CDSs | comments | PF00034_39_cds.nex
CDS (DNA) | Typical | Eukaryotic ATPase CDSs | comments | PF00137_47_cds.nex
CDS (DNA) | Demanding | NA | comments | NA
Protein (AA) | Simplified | Subset of 10 ATPases | comments | PF00137_10_protein.nex
Protein (AA) | Typical | Eukaryotic cytochrome Cs | comments | PF00034_39_protein.nex
Protein (AA) | Typical | Eukaryotic ATPases | comments | PF00137_47_protein.nex
Protein (AA) | Demanding | NA | comments | NA
Continuous | Simplified | NA | comments | NA
Continuous | Typical | Inhibitor sensitivity data for human kinases | -log(IC50) scaled | kinase_rescaled3_sets.nex
Continuous | Demanding | NA | comments | NA
Morphological | Simplified | NA | comments | NA
Morphological | Typical | Nematode vulval morphology and development | Kiontke, et al., 2007 | Kiontke_CB_fixed.nex
Morphological | Demanding | NA | comments | NA

Initial Implementation

  • The preliminary draft of the CDAO work done at NMSU is available here [5]. This is a current view of the content of the ontology [6]. In particular:
    • MAO-Prime (Web page [7]): a Protege implementation of the MAO, extended with descriptions of individual nucleotides, amino acids, and meta symbols such as gap.
    • CDAO (Web page [8]): a fairly direct implementation of the draft ontology developed during the Fall meeting of the EvoInfo group at NESCent.
    • Transformations (Web page [9]): during the Fall meeting we discussed the need to include in the ontology a description of possible transformations; this is an attempt at that.
    • Tree (Web page [10]): a draft ontology for the description of trees, mostly drawn from Nexus and from Chado.


Evaluation

  • Some preliminary considerations:
    • Comparison of NeXML elements with ontology concepts (Updated Feb. 18, 2008) [11]
    • Comparison of Nexus elements with ontology concepts (Updated Mar. 1, 2008) [12]
    • Comparison of CHADO (Phylogeny Module) elements with ontology concepts (Added Feb. 25, 2008) [13]

Meeting Notes

Working meeting March 24-April 4

Day 4, Thursday Mar. 27, 2008

1. We have now produced a more consistent version of the ontology that presents almost all priority 1 concepts and some priority 2 ones. We have left out just a couple of priority 1 concepts that we did not understand very well; we will be able to add them to the ontology next week, after discussion with the other members of the group.

2. We believe this version of the ontology is much clearer and that the relationships between classes are better described. Some concepts previously treated as classes are now represented as object_properties or datatype_properties (and vice versa). We have also restricted some of the datatype_properties to a limited set of values, which avoids misrepresentation of data. We think we have found a good representation of some difficult inter-related concepts, such as the relationship between transformation, branch modification, character, OTU and character-state modification. I hope that we can refine this representation further during the next week.

3. Brandon has finished a preliminary version of his algorithm that reads and interprets the NEXUS files, and tomorrow morning he will add to it the module that reads our new ontology. He has an idea of some modules to use and does not expect any problem with this. We hope to finish the day tomorrow with a complete, validated-by-hand representation of at least 2 NEXUS files in an ontology XML format.

Day 3, Wednesday Mar. 26, 2008

1. Optimizing the ontology

Today, we began by discussing the two versions of a simplified ontology we made yesterday (each of us made our own simplified version). In the end, we realized that the original ontology made on Monday, which contained all the concepts, was not very well encapsulated, and we preferred to start a new one. We compared the descriptions each of us had made and produced a cleaner and more consistent ontology using the best description of each concept. We also discussed differences in data representation and came to an agreement on the best way to represent different kinds of data.

Although the new ontology is cleaner and more understandable, and its concepts are inter-related in a better way, it still lacks some synonyms and some important concepts. We will add them by importing from the original complete ontology step by step, testing each concept and its relationships before adding the next one.


2. Automating the representation of test sets

We spent a lot of time yesterday afternoon and this morning trying to represent 3 OTUs with 5 characters manually in the Protégé ontology, so in the afternoon we decided that we needed at least a very preliminary algorithm to read the input test files made by Arlin and translate them into a file that can be read and checked inside Protégé.

Brandon spent the afternoon producing this algorithm (he has not finished it yet, but he has made good progress). In the meantime, Francisco continued to examine the simplified ontologies we have made and to add new concepts to them. Although they still lack many of Arlin's priority 1 concepts, we think that these new ontologies, which we built from scratch, are more internally consistent and will allow better representation of the data than the original one produced on Monday.
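To give a feel for why the manual route is slow, the translation step can be sketched as below. This is not Brandon's algorithm: the function name `matrix_to_triples`, the class name `OTU`, and the property names `belongs_to_otu`, `belongs_to_character` and `has_state` are all hypothetical placeholders, not terms taken from the ontology. The sketch only turns a small character matrix into subject-predicate-object statements, one group per matrix cell.

```python
def matrix_to_triples(matrix):
    """matrix: {otu_label: state_string}. Returns (subject, predicate, object) triples."""
    triples = []
    for otu, states in matrix.items():
        triples.append((otu, "rdf:type", "OTU"))              # hypothetical class name
        for index, state in enumerate(states, start=1):
            cell = f"{otu}_char{index}"                        # one individual per matrix cell
            triples.append((cell, "belongs_to_otu", otu))      # hypothetical property
            triples.append((cell, "belongs_to_character", f"char{index}"))
            triples.append((cell, "has_state", state))
    return triples

# 3 OTUs with 5 characters each, as in the manual exercise.
matrix = {"otuA": "ACGTA", "otuB": "ACGTC", "otuC": "ATGTC"}
triples = matrix_to_triples(matrix)
print(len(triples))
```

Even this toy matrix produces several dozen statements, each of which would otherwise be entered by hand.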

Day 2, Tuesday Mar. 25, 2008

1. Revisions

We added synonyms to the ontology, in the needed places. 

We also separated characters and their related classes and properties into a separate ontology in order to better encapsulate these elements so that they could be refined in isolation without disturbing the other parts. This additionally helps to reduce confusion between properties while working.

2. Examples

We started work encoding the examples provided on the Wiki page. 

This encoding is not yet complete, but we are making progress, and have identified and made a few necessary changes to fix earlier errors, such as relating traits/characters to edges rather than to OTUs.

3. Testing and Protege Training

As part of the testing process we each made simplified versions of the ontology and worked on encoding the examples, so that we could identify the critical components, transfer knowledge about Protege, and work out problem spots in a simple environment where they would most likely be easier to fix. Additionally, the import system has proven to be somewhat brittle, so while the encapsulation is desirable, it is easier to work with the sub-parts as a single ontology file until each of them is stable.

4. Updates available

We have uploaded the current versions of our work to http://www.cs.nmsu.edu/~bchisham/ontology/ and they are now available as both OWL and Protege Project files.

Day 1, Monday

We began by checking the concepts in the prioritized_concept_list and trying to make them available in the current version of the ontology. Most of the concepts were added to the tree subsection, although a number of them turned out to be redundant or better represented by other terms and relationships. In the context of an OWL representation, we also converted some terms from classes to properties and others from properties to classes.

First we grouped the terms in related groups:

  • Group 1 - TU related
    • Descendant
    • Ancestor
    • HTU
    • Hypothetical Taxonomic Unit
    • Most Recent Common Ancestor
    • MRCA
    • Operational Taxonomic Unit
    • OTU
    • Outgroup
    • Leaf node
    • Terminal node
    • Root
    • Basal
  • Group 2 - Tree related
    • Branch support
    • Tree
    • Unresolved
    • Cladogram
    • Dichotomy
    • Edge
    • Fully resolved
    • Monophyly
    • Network
    • Bifurcation
    • Phylogenetic Tree
    • Phylogenetic Tree Topology
    • Bipartition
    • Bootstrap support
    • Branch
    • Subtree
    • Lineage
    • Topology
    • Polytomy
    • Unrooted
  • Group 3 - Character related
    • Trait
    • Character
    • Character-state
    • Character-State Data Matrix
    • Derived
    • Apomorphy
    • Primitive
    • State
    • Missing data
  • Group 4 - Others
    • Gap
    • Indel
    • Homology
    • Polymorphism
    • Taxon
    • Taxonomic Rank

Then, we defined the synonymous usage of terms. When the terms are synonymous concepts or representation, we chose just one of them to present.

  • Group 1
    • HTU = Hypothetical Taxonomic Unit = Ancestor
    • Leaf node = OTU = Operational Taxonomic Unit = Terminal node
    • Descendant = Child
    • Root
    • Outgroup
    • Most Recent Common Ancestor = MRCA
    • Basal

(These two concepts may be derived from an algorithm reading the ontology-annotated file, but they are not explicitly defined in the ontology itself. The information is there, but no specific concept is provided. If we choose to represent all the MRCA of all OTU/HTU and which TUs are more or less basal than other ones, we think the representation file would be very big.)
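As a sketch of how such a derivation could work, the MRCA of any two TUs can be computed on demand from child-parent links alone, instead of storing one statement for each of the n(n-1)/2 pairs of TUs. The function names and the `parent` dictionary are illustrative assumptions, not part of the ontology; the node labels are taken from the ancestor test tree.

```python
def ancestors(parent, node):
    """Path from node up to the root, starting with node itself."""
    path = [node]
    while path[-1] in parent:
        path.append(parent[path[-1]])
    return path

def mrca(parent, a, b):
    """Most recent common ancestor of a and b, or None if they share no ancestor."""
    seen = set(ancestors(parent, a))
    for node in ancestors(parent, b):   # walk upward from b; first shared node wins
        if node in seen:
            return node
    return None

# Child -> parent links for ((otuA,otuB)htuAB,(otuC,otuD)htuCD)htuABCD
parent = {
    "otuA": "htuAB", "otuB": "htuAB",
    "otuC": "htuCD", "otuD": "htuCD",
    "htuAB": "htuABCD", "htuCD": "htuABCD",
}
print(mrca(parent, "otuA", "otuB"), mrca(parent, "otuA", "otuC"))
```

Because each path includes the starting node, the MRCA of a node and one of its own descendants is the node itself.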

  • Group 2
    • Tree = Cladogram = Network = Phylogenetic Tree
    • Dichotomy = Fully resolved = Bifurcation = Monophyly = Bipartition
    • Edge = Branch
    • Polytomy = Unresolved
    • Unrooted
    • Subtree = Lineage
    • Branch confidence level = Branch support = Bootstrap support

(here we used confidence level as it can support any confidence analysis, even if bootstrap is the most used)

    • Topology = Phylogenetic Tree Topology

(the topology is something we need in order to build ontology-based representations; it is imported from the NEXUS file and can be retrieved from the ontology file through child-parent relationships)
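Retrieving the topology from child-parent relationships could look like the sketch below, which rebuilds a Newick string from a child-to-parent mapping. The function name `to_newick` and the dictionary layout are assumptions made for illustration; sibling order follows the order in which the links are listed.

```python
def to_newick(parent):
    """Rebuild a labelled Newick string from {child: parent} links."""
    children = {}
    for child, par in parent.items():
        children.setdefault(par, []).append(child)
    roots = [node for node in children if node not in parent]
    if len(roots) != 1:
        raise ValueError("expected exactly one root")

    def render(node):
        kids = children.get(node, [])
        if not kids:                     # a leaf (OTU)
            return node
        return "(" + ",".join(render(kid) for kid in kids) + ")" + node

    return render(roots[0]) + ";"

parent = {
    "otuA": "htuAB", "otuB": "htuAB",
    "otuC": "htuCD", "otuD": "htuCD",
    "htuAB": "htuABCD", "htuCD": "htuABCD",
}
print(to_newick(parent))
```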

  • Group 3 - Character related
    • Trait: Defined as any characteristic of the TU that the annotator would like to describe
    • Character: Defined as the characteristics used for evolutionary classification
    • State = Character-state
    • Derived = Apomorphy
    • Primitive
    • Missing data
    • Character-State Data Matrix

(This would be in the input file of the ontology and could also be retrieved from the ontology-annotated file by algorithms)


  • Group 4 - Others
    • Gap = defined in the transformation
    • Indel = defined in the transformation
    • Homology
    • Polymorphism = we didn't understand what it means
    • Taxon = defined as a property of an OTU
    • Taxonomic Rank


Once all these concepts were defined and added to the ontology, we began building a simple representation of a small hypothetical dataset. During this preliminary work we found some errors and changed some concepts from properties to classes (branch, for example). We also had some difficulty working with Protégé, which seems to be at a very early beta stage: each time we found something that would be better represented by slightly changing the concepts, we had to rebuild the test set and re-enter all of its concepts manually.

  • Question : can synonyms be represented in Protégé? I think it would be useful for scientists to be able to choose the term they want to use.

telecon, 14 March, 2:00 UTC

skipped this

Telecon, 7 March, 2007

present: Francisco Prosdocimi, Julie Thompson, Enrico Pontelli, Arlin Stoltzfus

What activities to do before the meeting? Plan for development?

  1. represent 4 simple test cases
    1. nt alignment plus tree
    2. prot alignment plus tree
    3. kinases with inhibitor sensitivity
    4. worm morphologies
  2. carry out operations with reasoning
    1. set and logic operations on characters and OTUs
    2. tree operations (clade selection, prune)
    3. other?
  3. map ontology to other representations
    1. NEXUS
    2. neXML
  4. start compiling list of concepts that are missing
    1. review Enrico's proposal
  5. look ahead to future challenges
    1. genetic encoding of characters
    2. ambiguous, multi-dimensional, or otherwise complex characters

Other issues for meeting and for paper

  • what is the scope?
  • How to integrate with other ontologies?
    • table from 'related artefacts' exercise
    • genetic code as a test case for integration
      • requires nt aa mapping to specify code
      • requires species taxonomy to assign code to species
      • requires cell ontology to assign code to compartmental genome (nuc, mito, cp)

Next meeting

  • telecon, 14 March, 2:00 pm UTC
  • agenda
    • nt and prot test data sets (arlin)
    • protege demo (brandon)

Related Work

  • We are working on generating an ontology directly from the Concept Glossary. We are documenting the progress at http://www.cs.nmsu.edu/~epontell/Glossary/index.html. Note that the page is not up to date at this moment (hopefully it will be by the end of the day or tomorrow [3/18/2008]). The goal is to eventually show that CDAO can map over all these concepts.