Database Interop Hackathon/Metadata Support

From Evolutionary Informatics Working Group
Revision as of 15:55, 22 February 2009 by Hlapp

Metadata support

NeXML has ways of representing character data and trees, whose semantics are implicitly tied to CDAO (through references to ontology terms using SAWSDL syntax). However, many use cases exist in which users want to attach other information to these objects. NeXML has a free-form facility that allows this, but mass adoption of this feature would lead to a soup of annotations lacking clearly defined semantics. Hence, we need to define how ontology-mediated annotations ('metadata' in our definition) can be used. During discussions at the pre-meeting at NESCent, we fleshed out some examples of how this would work.

Attaching a concept to an element

In addition to having the NeXML schema (which defines XML Schema classes) define the semantics by reference to the corresponding CDAO classes, use cases exist where we would need to tie a NeXML instance (in a document) to a CDAO instance. The solution we came up with is to enclose RDF inside an "any" value of a dictionary attachment. The RDF specifies the CDAO class and creates a uniquely identifiable instance on the fly.

Assigning an XML element to a type means creating an instance of that type.

<xml>
<tree>
 <node id="foo">
   <dict xmlns:cdao="http://evolutionaryontology.org/cdao">
     <any id="bar">
       <cdao:Node rdf:ID="baz"/>
     </any>
   </dict>
 </node>
</tree>
</xml>
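
A consumer of such a document has to recover the class and the instance identifier from the annotation. The following is a minimal Python sketch of that lookup using only the standard library; it is not part of any NeXML or CDAO tooling. It assumes that the rdf prefix is bound to the standard RDF namespace (the example leaves it undeclared), that the instance identifier is spelled rdf:ID as in RDF/XML, and that the tree element is properly closed. The helper names split_qname and typed_instances are hypothetical.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Well-formed variant of the node example. The xmlns:rdf declaration and the
# rdf:ID spelling are assumptions made so that a namespace-aware parser
# accepts the snippet.
DOC = """
<tree xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
 <node id="foo">
   <dict xmlns:cdao="http://evolutionaryontology.org/cdao">
     <any id="bar">
       <cdao:Node rdf:ID="baz"/>
     </any>
   </dict>
 </node>
</tree>
"""


def split_qname(tag):
    """Split ElementTree's '{namespace}local' form into (namespace, local)."""
    if tag.startswith("{"):
        namespace, _, local = tag[1:].partition("}")
        return namespace, local
    return "", tag


def typed_instances(root):
    """Yield (element id, class namespace, class name, instance id)."""
    for elem in root.iter():
        # Only look at annotations attached directly to this element.
        for typed in elem.findall("./dict/any/*"):
            namespace, local = split_qname(typed.tag)
            yield elem.get("id"), namespace, local, typed.get(f"{{{RDF_NS}}}ID")


instances = list(typed_instances(ET.fromstring(DOC)))
print(instances)
```

For this snippet, instances holds a single tuple that ties the node foo to the instance baz of the class Node in the CDAO namespace.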

Attaching a taxon identifier to an OTU through a relation

An example of why you would want to tie a NeXML instance in a document to a CDAO instance of a concept is shown below. It extends the previous example to satisfy a common use case: specifying a taxon identifier from some external resource for an otu element. Notice how this uses the CDAO concept "cdao:has_Taxonomy_Reference" to link the otu element (whose ephemeral id is foo) first to a cdao:TU instance (with id baz), which is subsequently linked to an entry in the Teleost taxonomy (whose record id is 1030219).

<xml>

 <otu id="foo">
   <dict xmlns:cdao="http://evolutionaryontology.org/cdao">
     <any id="bar">
       <cdao:TU rdf:ID="baz">
         <cdao:has_Taxonomy_Reference
             rdf:resource="http://purl.org/OBO/TTO:1030219"/>
       </cdao:TU>
     </any>
   </dict>
 </otu>

</xml>
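
The chain from otu element to TU instance to external record can be read back mechanically. The sketch below is a hedged illustration in Python (standard library only), not an official NeXML reader. It assumes the rdf prefix is bound to the standard RDF namespace and that the instance identifier is spelled rdf:ID as in RDF/XML; the function names local_name and resource_links are hypothetical.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Variant of the otu example with the rdf namespace declared (an assumption;
# the example itself leaves the prefix unbound).
DOC = """
<otu id="foo" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <dict xmlns:cdao="http://evolutionaryontology.org/cdao">
    <any id="bar">
      <cdao:TU rdf:ID="baz">
        <cdao:has_Taxonomy_Reference
            rdf:resource="http://purl.org/OBO/TTO:1030219"/>
      </cdao:TU>
    </any>
  </dict>
</otu>
"""


def local_name(tag):
    """Drop the '{namespace}' part that ElementTree puts in front of a tag."""
    return tag.rpartition("}")[2]


def resource_links(otu):
    """Return (otu id, instance id, relation, target URI) for each link."""
    links = []
    for instance in otu.findall("./dict/any/*"):
        subject = instance.get(f"{{{RDF_NS}}}ID")
        for prop in instance:
            target = prop.get(f"{{{RDF_NS}}}resource")
            if target is not None:
                links.append(
                    (otu.get("id"), subject, local_name(prop.tag), target)
                )
    return links


links = resource_links(ET.fromstring(DOC))
print(links)
```

Each tuple keeps both identifiers, so the ephemeral NeXML id (foo) and the CDAO instance id (baz) stay distinguishable downstream.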

Attaching a concept to an element through a relation

Another use case, along syntactic lines similar to the previous example, would be to tie a node in a tree to an inferred gene function from the Gene Ontology. Here we use the CDAO construct "has_function" to specify the semantics of the reference to an external resource.

<xml>
<tree>
 <node id="foo">
   <dict xmlns:cdao="http://evolutionaryontology.org/cdao">
     <any id="bar">
       <cdao:Node rdf:ID="baz">
         <cdao:has_function rdf:resource="http://purl.org/OBO/GO:034"/>
       </cdao:Node>
     </any>
   </dict>
 </node>
</tree>
</xml>

Specimens within collections

Another common use case for external references is one where a NeXML otu element is to be defined as a specimen in a museum, i.e. we want to specify an identifiable collection and the number of the specimen within it. In this case we suggest using the TDWG Darwin Core syntax, which has constructs for institutions and catalog numbers. A query of the TDWG ontological activities turned up the TDWG core ontology. However, we are unclear about the status and direction of the CoreOntology (and how to use it), so we're leaving that out for now, choosing to mix the Darwin Core syntax with semantics.

<xml>
<otu id="foo">
 <dict xmlns:cdao="http://evolutionaryontology.org/cdao"
       xmlns:dwc="http://rs.tdwg.org/dwc/dwcore">
   <any id="watever">
     <cdao:TU rdf:ID="baz">
       <cdao:has_Specimen_Reference rdf:parseType="Literal">
         <dwc:InstitutionCode rdf:datatype="xsd:anyURI">http://purl.org/obo/COLLECTION:0000403</dwc:InstitutionCode>
         <dwc:CatalogNumber rdf:datatype="xsd:string">207388</dwc:CatalogNumber>
       </cdao:has_Specimen_Reference>
     </cdao:TU>
   </any>
 </dict>
</otu>
</xml>
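
To illustrate how the collection and catalog number could be pulled out of such an annotation, here is a small Python sketch using the standard library. It is only an illustration under stated assumptions: the cdao and rdf prefixes are declared explicitly (the cdao namespace URI is copied from the earlier examples), the dict element is properly closed, the rdf:datatype attributes are left out for brevity, and the function name specimen_reference is hypothetical.

```python
import xml.etree.ElementTree as ET

# Prefix bindings used for the lookups. The cdao URI is the one used in the
# earlier examples on this page; the dwc URI is the one in the specimen example.
NS = {
    "cdao": "http://evolutionaryontology.org/cdao",
    "dwc": "http://rs.tdwg.org/dwc/dwcore",
}

# Well-formed variant of the specimen example (namespace declarations for
# rdf and cdao are assumptions; datatype attributes omitted).
DOC = """
<otu id="foo" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <dict xmlns:cdao="http://evolutionaryontology.org/cdao"
        xmlns:dwc="http://rs.tdwg.org/dwc/dwcore">
    <any id="watever">
      <cdao:TU rdf:ID="baz">
        <cdao:has_Specimen_Reference rdf:parseType="Literal">
          <dwc:InstitutionCode>http://purl.org/obo/COLLECTION:0000403</dwc:InstitutionCode>
          <dwc:CatalogNumber>207388</dwc:CatalogNumber>
        </cdao:has_Specimen_Reference>
      </cdao:TU>
    </any>
  </dict>
</otu>
"""


def specimen_reference(otu):
    """Return (institution code, catalog number), or None if not annotated."""
    ref = otu.find("./dict/any/cdao:TU/cdao:has_Specimen_Reference", NS)
    if ref is None:
        return None
    institution = ref.findtext("dwc:InstitutionCode", namespaces=NS)
    catalog_number = ref.findtext("dwc:CatalogNumber", namespaces=NS)
    return institution, catalog_number


specimen = specimen_reference(ET.fromstring(DOC))
print(specimen)
```

The catalog number is returned as a string, which matches the xsd:string datatype given in the example.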

Literature References

Use cases exist where the user would want to attach literature citation records to a phylogenetic object, for example to track the provenance of data in a meta-analysis. The syntax of the record itself could simply be the widely used Dublin Core standard. Yes, this does mean mixing syntax and semantics to some extent, but we concluded that it's a reasonable solution because DC at least implies some semantics (albeit overloaded in some cases, regrettably), and it's syntactically concise.

The Dublin Core guidelines for DC citations recommend providing authors (creators), title, and publisher, along with a string giving the bibliographic citation. However, Dublin Core does not define what a reference is. Therefore, minimally, we need a term that describes the concept of a reference.

Example 1: associate a reference with a tree (or other) element

In this example, we simply associate a reference with a tree by placing it in the tree element. In XML using NeXML conventions, this is what a literature reference would look like:

<xml>
<tree id="foo">
 <dict xmlns:cdao="http://evolutionaryontology.org/cdao"
       xmlns:dc="http://purl.org/dc/elements/1.1/"
       xmlns:dcterms="http://purl.org/dc/terms/">
   <any id="foo235">
     <cdao:Tree rdf:ID="bar">
       <cdao:has_Reference rdf:parseType="Resource">
         <dc:creator>Hill, R. V.</dc:creator>
         <dc:title>Integration of Morphological Data Sets for Phylogenetic Analysis of Amniota:
                   The Importance of Integumentary Characters and Increased Taxonomic Sampling</dc:title>
         <dcterms:bibliographicCitation>Systematic Biology 54(4):530-547, 2005</dcterms:bibliographicCitation>
       </cdao:has_Reference>
     </cdao:Tree>
   </any>
 </dict>
</tree>
</xml>
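
A reader that wants the citation back as plain fields might proceed roughly as follows. This Python sketch (standard library only) is an illustration, not a NeXML API. It assumes that the cdao and rdf prefixes are declared, that the Dublin Core namespace URIs carry their trailing slash, and that rdf:parseType is "Resource"; the function name citation_fields is hypothetical. Fields are keyed by local element name, so the sketch does not depend on which Dublin Core namespace URI a document uses.

```python
import xml.etree.ElementTree as ET

NS = {"cdao": "http://evolutionaryontology.org/cdao"}

# Well-formed variant of Example 1 (the rdf and cdao declarations are
# assumptions added so the snippet parses on its own).
DOC = """
<tree id="foo" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <dict xmlns:cdao="http://evolutionaryontology.org/cdao"
        xmlns:dc="http://purl.org/dc/elements/1.1/"
        xmlns:dcterms="http://purl.org/dc/terms/">
    <any id="foo235">
      <cdao:Tree rdf:ID="bar">
        <cdao:has_Reference rdf:parseType="Resource">
          <dc:creator>Hill, R. V.</dc:creator>
          <dc:title>Integration of Morphological Data Sets for Phylogenetic Analysis of Amniota:
              The Importance of Integumentary Characters and Increased Taxonomic Sampling</dc:title>
          <dcterms:bibliographicCitation>Systematic Biology 54(4):530-547, 2005</dcterms:bibliographicCitation>
        </cdao:has_Reference>
      </cdao:Tree>
    </any>
  </dict>
</tree>
"""


def citation_fields(tree):
    """Map each citation field's local name to its whitespace-normalized text."""
    ref = tree.find("./dict/any/cdao:Tree/cdao:has_Reference", NS)
    if ref is None:
        return {}
    fields = {}
    for child in ref:
        name = child.tag.rpartition("}")[2]  # strip '{namespace}'
        # Collapse the line break and indentation inside long titles.
        fields[name] = " ".join((child.text or "").split())
    return fields


fields = citation_fields(ET.fromstring(DOC))
for name, value in fields.items():
    print(f"{name}: {value}")
```

Normalizing whitespace matters here because the title spans two lines in the markup.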

Example 2: associate a reference with a record

<xml> <nexml> <dict xmlns:obi="http://purl.obofoundry.org/obo/obi.owl" xmlns:cdao="http://evolutionaryontology.org/cdao/cdao.owl" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:dcterms="http://purl.org/dc/terms/">

 <any id="foo235">
   <obi:IAO_0000100 rdf:ID="foo234">
     <cdao:has_Reference rdf:parseType="Resource">
       <dc:creator>Hill, R. V.</dc:creator>
       <dc:title>Integration of Morphological Data Sets for Phylogenetic Analysis of Amniota:
                     The Importance of Integumentary Characters and Increased Taxonomic Sampling</dc:title>
       <dcterms:bibliographicCitation>Systematic Biology 54(4):530-547, 2005</dcterms:bibliographicCitation>
     </cdao:has_Reference>
   </obi:IAO_0000100>
 </any>

</dict> </nexml> </xml>

Example 3: specifying that a reference represents "supporting evidence" for a NeXML element

Work in progress


OBO phenotype

<xml> <state id="x88913" label="present" symbol="1">

 <dict xmlns:cdao="http://www.evolutionaryontology.org/cdao.owl"
          xmlns:phen="http://www.bioontologies.org/obd/schema/pheno">
   <any id="foo345">
     <cdao:Categorical rdf:ID="foo456">
       <cdao:has_ExternalReference rdf:parseType="Literal">
         <phen:phenotype_character>
           <phen:description/>
           <phen:bearer>
             <phen:typeref about="TAO:0000203"/>
           </phen:bearer>
           <phen:quality>
             <phen:typeref about="PATO:0000467"/>
           </phen:quality>
         </phen:phenotype_character>
       </cdao:has_ExternalReference>
     </cdao:Categorical>
   </any>
 </dict>

</state> </xml>

Segments of character data

Use case: composite OTUs. Create a container of characters and refer to it.