Supporting Integrative Access to Cross-species Anatomy and Morphology Information
Comparative analysis is a cornerstone of biology. Currently, formal knowledge of anatomy and morphology is "siloed" in species-specific ontologies. Comparative analysis requires a mapping of corresponding parts (homologies) between species. Much of this mapping could be done with automated methods, whereas other parts require expert curation based on standards yet to be determined. In this proposal, we will establish a collaborative network to develop standards for assigning and annotating homology relations among animal anatomies, build a technology infrastructure to support community annotation of homology relations, and provide a public interface for accessing these mappings.
What are the collaborative aspects of this project? First, there is a collaboration between animal biologists working in different organism-centered research programs (e.g., fish vs. spiders), all driven by the idea of homology as the relationship that makes integrative, comparative work on phenotypes possible. We will provide the organization and infrastructure for these biologists to develop a knowledge base of homology relations in animals. The knowledge base will consist of an annotated mapping between the anatomy ontologies of different species. Facilitating this work will involve some software development to support homology annotation, but it will mainly involve developing community-based standards and training experts in software-facilitated curation of homology relations. Second, there is a collaboration between biologists and computer scientists. The expert-curated approach to annotating binary homology relations (i.e., mapping parts of one species to parts of another) does not scale to large numbers of species. Therefore, we will explore automated methods of ontology alignment for discovering homology relations, with biological experts providing feedback on the process (e.g., expert-curated reference standards).
- provide training and develop infrastructure to support expert curation of homology relations
  - training workshops
  - workshops to develop standards
  - annotation tool
  - database supporting curated relations (and views)
- align anatomy ontologies from species X, Y and Z
  - automated methods (graph alignment)
  - expert-curated methods
- develop a programmatic interface to access homology relationships
  - test on use cases
- explore scalable and long-term solutions to the homology information problem
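To give the curated-relations database and the interface for accessing homology relationships listed above a concrete shape, here is a minimal in-memory sketch. It assumes a hypothetical record layout (`HomologyAnnotation`), a hypothetical pick list of evidence codes, and a hypothetical minimal standard (an evidence code from that list plus a curator identity); none of these has been decided by the project, and a real system would sit on a database rather than a Python dict.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical evidence-code pick list; the real codes are to be set by the standards workshops.
ALLOWED_EVIDENCE = {"structural_similarity", "developmental_origin", "gene_expression", "author_statement"}


@dataclass(frozen=True)
class HomologyAnnotation:
    """One curated homology relation between anatomy terms of two species (hypothetical layout)."""
    source_species: str
    source_term: str
    target_species: str
    target_term: str
    evidence: str
    curator: str
    view: str = "default"  # lets different annotator teams keep separate mappings


class HomologyStore:
    """In-memory stand-in for the database of curated relations."""

    def __init__(self) -> None:
        self._by_source: Dict[Tuple[str, str], List[HomologyAnnotation]] = defaultdict(list)

    def add(self, ann: HomologyAnnotation) -> None:
        # Enforce the hypothetical minimal standard before accepting the annotation.
        if ann.evidence not in ALLOWED_EVIDENCE:
            raise ValueError(f"unknown evidence code: {ann.evidence}")
        if not ann.curator:
            raise ValueError("curator identity is required")
        self._by_source[(ann.source_species, ann.source_term)].append(ann)

    def query(self, species: str, term: str, target_species: str, view: str = "default") -> List[str]:
        """Return target-species terms recorded as homologous to `term` in the given view."""
        return [
            a.target_term
            for a in self._by_source.get((species, term), [])
            if a.target_species == target_species and a.view == view
        ]


store = HomologyStore()
store.add(HomologyAnnotation("Danio rerio", "pectoral fin", "Mus musculus", "forelimb",
                             "structural_similarity", "curator_1"))
print(store.query("Danio rerio", "pectoral fin", "Mus musculus"))  # ['forelimb']
```

Keeping the `view` on each record, rather than in separate tables, is one simple way to let two teams map the same pair of species independently and compare the results later.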
The following points are in no particular order.
- this proposal focuses on facilitating the integration of data from different species via homology relationships
- who are the end-users for an integrative tool like this? Expert biologists in organism-centered research programs are not necessarily interested in integrating data from their favorite taxon with data from other taxa, so they are not the most likely end-users of a system for discovering and presenting homology relations. They might become end-users if they rely heavily on a comparative approach and the system is sophisticated enough to be useful to them. It is more likely, however, that the end-users of homology-based integration tools would be researchers looking for patterns or undertaking large integrative "omics" projects (like the zfin projection or the Bgee project).
- the problem of aligning two ontologies or taxonomies can be understood as a problem of "graph alignment". This problem is of broad interest because overlapping ontologies arise in many areas. Graph alignment methods can be tailored to different problems, depending on the kind of graph being aligned. For instance, taxonomies have only "is_a" relationships, whereas other ontologies have a richer set of relationships (anatomy ontologies, for example, typically also use part_of) that can be used to improve the alignment between two ontologies.
- strategies for mapping binary homology relations are not scalable, especially if the relations are manually curated. With N species anatomy ontologies, there are N(N-1)/2 unordered pairwise mappings, so the number of mappings grows roughly with the square of the number of ontologies: doubling the number of ontologies roughly quadruples the number of mappings to create and maintain. Computer scientists will recognize this point immediately. A strategy based on curated pairwise relationships therefore does not scale; either the ontologies must be mapped to some kind of central reference ontology (which requires only N mappings), or there must be an automated way to produce the pairwise mappings.
- however, curated homology relations could serve as a reference standard for calibrating automated methods.
- Robinson-Rechavi and colleagues (Lausanne, Switzerland) have done a 4-way mapping between Xenopus, Danio, Mus and Homo for the purpose of comparing gene expression patterns. This could be a good starting point for an evaluation (i.e., employing experts to evaluate their mappings). See http://bgee.unil.ch
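The scaling argument about pairwise mappings can be checked with a few lines of arithmetic. This sketch compares the N(N-1)/2 unordered pairwise mappings with the N mappings needed if each ontology is mapped once to a central reference ontology; the function names are illustrative only.

```python
def pairwise_mappings(n: int) -> int:
    """Unordered pairwise mappings among n anatomy ontologies: n(n-1)/2."""
    return n * (n - 1) // 2


def hub_mappings(n: int) -> int:
    """Mappings needed if each ontology is mapped once to a central reference ontology."""
    return n


for n in (4, 8, 16, 32):
    print(n, pairwise_mappings(n), hub_mappings(n))
# 4 6 4
# 8 28 8
# 16 120 16
# 32 496 32
```

Each doubling of N multiplies the pairwise count by (4N-2)/(N-1), about 4.7, 4.3 and 4.1 in this table, which approaches 4 as N grows, while the reference-ontology count merely doubles. Four ontologies, as in the 4-way mapping above, already imply six pairwise mappings.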
Results from past research
Training workshops and other meetings
Some small, focused meetings could use remote collaboration technologies (e.g., WebEx).
- training workshop for expert curators mapping homology relationships
  - goals of the project to map homologies
  - problems with homology
  - standards for annotation
  - types of evidence
- focus groups to work on problem areas
  - problematic aspects of homology
- evaluation plan
  - use cross-validation of homology annotations (compare results from different annotators)
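Cross-validation between annotators could start from a simple overlap measure. The sketch below assumes each annotator's work is reduced to a set of (source term, target term) pairs; it reports the Jaccard overlap and lists the pairs on which the annotators disagree. The term identifiers are placeholders, and a chance-corrected statistic such as Cohen's kappa would be a natural refinement.

```python
from typing import List, Set, Tuple

Pair = Tuple[str, str]  # (source-species term, target-species term)


def jaccard_agreement(a: Set[Pair], b: Set[Pair]) -> float:
    """Share of all proposed homology pairs that both annotators proposed."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def disagreements(a: Set[Pair], b: Set[Pair]) -> List[Pair]:
    """Pairs proposed by only one annotator; candidates for discussion by the focus groups."""
    return sorted(a ^ b)


annotator_1 = {("src_1", "dst_1"), ("src_2", "dst_2"), ("src_3", "dst_3")}
annotator_2 = {("src_1", "dst_1"), ("src_2", "dst_2"), ("src_3", "dst_4")}
print(jaccard_agreement(annotator_1, annotator_2))  # 0.5
print(disagreements(annotator_1, annotator_2))      # [('src_3', 'dst_3'), ('src_3', 'dst_4')]
```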
The program provides little funding for software development, and some of the available funds would go toward managing RCN activities and maintaining information. It would therefore be important to partner with a group that can provide additional support for software development.
The software outputs would be mainly three: a web-based annotation environment; a web-based query and search tool for homology relations; and methods for the automated discovery of homology relations by graph alignment.
- infrastructure support for annotation
  - annotation interface with pick lists for homology evidence codes, ontology terms, etc.: the expert curator uses this tool to annotate homology relations between anatomy terms of species X and species Y
  - multi-user support, e.g., views that allow two different teams of annotators to create different mappings between the same pair of species
  - links to images
  - support for any standards developed as part of the project (e.g., if there is a minimal standard for a curated homology relationship, the tool would enforce it as a constraint on curators)
- web-based query and search tool for homology relations: how the user (researcher) finds corresponding parts between species
  - semantics-based term search (input = anatomy term from source species; output = term from destination species)
  - local graph alignment (input = local graph showing part of the source ontology; output = local graph alignment of source and destination); ideally supplemented with pictures, since the graph itself shows only connections between terms (e.g., knee_bone is_connected_to thigh_bone, thigh_bone is_connected_to hip_bone, and so on), but its nodes could carry images of the parts
  - other searches based on evidence code, annotator identity, homology definition, etc.
- graph alignment: automated method to identify correspondences (homologies) between different anatomy ontologies
  - try out existing graph alignment methods and identify their weaknesses
  - adapt an available method to anatomy ontologies (e.g., take advantage of domain-specific relations)
- evaluation plan: ways to measure how well the tools are working
  - compare automated results with expert evaluations
  - use automated discovery of potential problems (local complexities in graph alignment); this could also reveal interesting evolutionary changes, though such "discoveries" will likely not be novel
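To make the graph-alignment and evaluation items concrete, here is a toy matcher. It is not any specific published alignment algorithm and is only a sketch under stated assumptions: each ontology is a hypothetical dict mapping a term to its (relation, related term) edges; the alignment is seeded with identical labels; remaining pairs are scored by a mix of label similarity (Python's `difflib.SequenceMatcher`) and structural support (edges whose neighbors are already aligned under the same relation type, one way of taking advantage of domain-specific relations); and pairs are assigned greedily. The weight and threshold are arbitrary, the toy data reuse the knee_bone/thigh_bone/hip_bone connections mentioned for the search tool, and precision and recall against a hypothetical expert-curated reference stand in for comparing automated results with expert evaluations.

```python
from difflib import SequenceMatcher
from typing import Dict, List, Tuple

# Hypothetical ontology format: term -> list of (relation, related_term) edges.
Ontology = Dict[str, List[Tuple[str, str]]]


def _norm(term: str) -> str:
    return term.lower().replace("_", " ")


def label_sim(a: str, b: str) -> float:
    """String similarity of two term labels, in [0, 1]."""
    return SequenceMatcher(None, _norm(a), _norm(b)).ratio()


def structural_support(s: str, d: str, src: Ontology, dst: Ontology, aligned: Dict[str, str]) -> float:
    """Fraction of s's edges mirrored at d: same relation type, to the term aligned with s's neighbor."""
    s_edges = src.get(s, [])
    if not s_edges:
        return 0.0
    d_edges = set(dst.get(d, []))
    hits = sum(1 for rel, n in s_edges if n in aligned and (rel, aligned[n]) in d_edges)
    return hits / len(s_edges)


def align(src: Ontology, dst: Ontology, threshold: float = 0.6, weight: float = 0.5) -> Dict[str, str]:
    # 1. Seed with terms whose normalized labels are identical.
    dst_by_label = {_norm(t): t for t in dst}
    aligned = {s: dst_by_label[_norm(s)] for s in src if _norm(s) in dst_by_label}
    used = set(aligned.values())
    # 2. Score the remaining pairs by label similarity plus structural support.
    candidates = []
    for s in src:
        if s in aligned:
            continue
        for d in dst:
            if d in used:
                continue
            score = (1 - weight) * label_sim(s, d) + weight * structural_support(s, d, src, dst, aligned)
            if score >= threshold:
                candidates.append((score, s, d))
    # 3. Greedy one-to-one assignment, best-scoring pairs first.
    for _, s, d in sorted(candidates, reverse=True):
        if s not in aligned and d not in used:
            aligned[s] = d
            used.add(d)
    return aligned


def precision_recall(predicted: Dict[str, str], reference: Dict[str, str]) -> Tuple[float, float]:
    """Compare automated correspondences with an expert-curated reference standard."""
    pred, ref = set(predicted.items()), set(reference.items())
    tp = len(pred & ref)
    return (tp / len(pred) if pred else 0.0, tp / len(ref) if ref else 0.0)


species_x = {"hip_bone": [], "thigh_bone": [("is_connected_to", "hip_bone")],
             "knee_bone": [("is_connected_to", "thigh_bone")]}
species_y = {"hip_bone": [], "thigh_bone": [("is_connected_to", "hip_bone")],
             "kneecap": [("is_connected_to", "thigh_bone")]}
expert = {"hip_bone": "hip_bone", "thigh_bone": "thigh_bone", "knee_bone": "kneecap"}

result = align(species_x, species_y)
print(result)  # {'hip_bone': 'hip_bone', 'thigh_bone': 'thigh_bone', 'knee_bone': 'kneecap'}
print(precision_recall(result, expert))  # (1.0, 1.0)
```

Structural support is what lets knee_bone match kneecap here: their label similarity alone is 0.5, below the 0.6 threshold, but the shared is_connected_to edge to an already-aligned term lifts the combined score to 0.75. Real anatomy ontologies would call for the other relation types (e.g., part_of) and a less naive assignment step.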
- animal homologies (vertebrates, worm, fly, bee, spider, others?)
- Genotype-phenotype map
- population data on variability of parts for 2 species.
- Phenotypic evolution
- meeting costs
  - curation workshops
  - curation jamborees
  - infrastructure design workshop
- staffing costs
  - software design and implementation
  - use case testing