Supporting MIAPA

From Evolutionary Informatics Working Group
Revision as of 14:22, 28 May 2008 by (talk) (Knowledge Capture Exercise)


Domain scientists with an interest in the archiving and re-use of phylogenetic data have called for a reporting standard designated "Minimal Information for a Phylogenetic Analysis", or MIAPA (Leebens-Mack, et al. 2006). Ideally the research community would develop, and adhere to, a standard that imposes a minimal reporting burden yet ensures that the reported data can be interpreted and re-used. Such a standard might be adopted by:

  • data repositories such as TreeBASE or Dryad
  • journals that publish supplementary material for phylogenetic studies
  • granting organizations that support phylogenetic studies
  • organizations that develop taxonomic nomenclature

Currently MIAPA is aspirational and represents an open call for further work. As a starting point, Leebens-Mack, et al. suggest that a study should report objectives, sequences, taxa, alignment method, alignment, phylogeny inference method, and phylogeny.
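
As a rough illustration of what checking a submission against this starting list could look like, the sketch below treats the seven suggested items as required fields of a record and reports which ones are missing. The field names and the missing_fields helper are assumptions made for this example; MIAPA itself does not yet define a schema or a conformance policy.

```python
# Hypothetical field names derived from the seven items suggested by
# Leebens-Mack, et al.; MIAPA does not define an actual schema.
MIAPA_FIELDS = (
    "objectives",
    "sequences",
    "taxa",
    "alignment_method",
    "alignment",
    "inference_method",
    "phylogeny",
)


def missing_fields(record):
    """Return the suggested MIAPA items that are absent or empty in record."""
    return [name for name in MIAPA_FIELDS if not record.get(name)]


# A made-up, incomplete submission used only to exercise the check.
submission = {
    "objectives": "Resolve relationships among three example taxa",
    "taxa": ["taxon_A", "taxon_B", "taxon_C"],
    "alignment_method": "example aligner, default settings",
    "phylogeny": "((taxon_A,taxon_B),taxon_C);",
}

print(missing_fields(submission))
```

A real conformance policy would presumably also constrain the content of each field, not just its presence; this sketch checks presence only.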

For the evolutionary informatics working group, MIAPA matters because its development could reinforce our efforts to promote interoperability. Data re-use, which is the goal of MIAPA, is one aspect (perhaps a symptom) of interoperability. To achieve re-use through compliance with reporting standards, we need to develop technology that makes it easy to comply with the standard, e.g., a GUI that makes it easy for users to construct a MIAPA-compliant submission. To enhance re-use through data-mining or reasoning (with the data and metadata), we need a controlled vocabulary, ideally an ontology. Developing such an ontology would not only jump-start the MIAPA project; it would also contribute to our efforts to develop a language to describe Transition Models, and it would be a step toward our long-term goal of developing a domain-specific language for phylogenetic analysis.

Some thoughts on developing MIAPA

Leebens-Mack, et al. called for further work, hoping to draw attention to the idea and stimulate effort. However, there has been no further effort to develop MIAPA. The NESCent evolutionary informatics working group invited Dr. Leebens-Mack to speak, and there was general agreement on the value of developing a MIAPA standard, and on the importance of providing ways to support a MIAPA standard through nexml and CDAO.

  1. What it might mean to have an effective MIAPA standard
    • an explicit (possibly formal) description of the standard, specifying types of data and metadata
    • an explicit conformance policy
    • a controlled vocabulary for data and metadata
    • a file format for MIAPA documents
  2. What software support might entail
    • interactive software to facilitate creation of MIAPA-compliant documents
    • a relational mapping of the MIAPA standard
    • a formal taxonomy or ontology of metadata terms
  3. What it might take to get there
    • a working group with external funding
    • a consortium with representatives from data resources, publishers, researchers, and programmers
    • user testing at scientific conferences
    • multiple rounds of revision
  4. What would ease the burden on scientists (the goal behind the "minimal" in MIAPA)?
    • fewer categories of metadata
    • fewer arbitrary restrictions on format
    • familiarity of metadata concepts
    • software support for annotation with a controlled vocabulary
  5. What makes data reusable?
    • standard formats
    • validation
    • provenance, ideally provenance that can be traced automatically
    • description of methods sufficient to reproduce results from data
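
One way to picture the "relational mapping" and "controlled vocabulary" items in the list above is a small schema in which every annotation on a study must point at a term in a vocabulary table. The sketch below uses Python's built-in sqlite3 module. The table names, column names, and sample rows are all assumptions made for illustration and are not part of any MIAPA proposal.

```python
import sqlite3

# All table names, column names, and sample rows are hypothetical.
SCHEMA = """
CREATE TABLE term (
    id       INTEGER PRIMARY KEY,
    category TEXT NOT NULL,      -- e.g. 'alignment_method'
    label    TEXT NOT NULL,
    UNIQUE (category, label)
);
CREATE TABLE study (
    id         INTEGER PRIMARY KEY,
    objectives TEXT NOT NULL
);
CREATE TABLE annotation (
    study_id INTEGER NOT NULL REFERENCES study(id),
    term_id  INTEGER NOT NULL REFERENCES term(id)
);
"""


def open_store():
    """Create an in-memory database with foreign-key checks switched on."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def annotate(conn, study_id, term_id):
    """Attach a vocabulary term to a study; return False if either id is unknown."""
    try:
        conn.execute(
            "INSERT INTO annotation (study_id, term_id) VALUES (?, ?)",
            (study_id, term_id),
        )
        return True
    except sqlite3.IntegrityError:
        # The foreign-key constraint rejected a study or term that does not exist.
        return False


conn = open_store()
conn.execute(
    "INSERT INTO term (id, category, label) "
    "VALUES (1, 'alignment_method', 'example aligner')"
)
conn.execute("INSERT INTO study (id, objectives) VALUES (1, 'example study')")

print(annotate(conn, 1, 1))    # term 1 is in the vocabulary
print(annotate(conn, 1, 999))  # term 999 is not
```

The design choice here is to let the database enforce the vocabulary through foreign keys, so that validation happens at the moment an annotation is stored rather than in a separate pass.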

Knowledge Capture Exercise

We imagine a knowledge-capture-and-user-testing exercise along the lines of the experiment described in the abstract of Good, et al. 2006:

During two days at a conference focused on circulatory and respiratory health, 68 volunteers untrained in knowledge
engineering participated in an experimental knowledge capture exercise. These volunteers created a shared
vocabulary of 661 terms, linking these terms to each other and to a pre-existing upper ontology by adding 245
hyponym relationships and 340 synonym relationships. While ontology-building has proved to be an expensive
and labor-intensive process using most existing methodologies, the rudimentary ontology constructed in this
study was composed in only two days at a cost of only 3 t-shirts, 4 coffee mugs, and one chocolate moose.
The protocol used to create and evaluate this ontology involved a targeted, web-based interface. The design
and implementation of this protocol is discussed along with quantitative and qualitative assessments of the
constructed ontology.

Our plan would be to use a conference (ideally, the upcoming 2008 Evolution meeting, though it's a bit soon) to gather data and to begin developing infrastructure for a MIAPA standard:

  1. develop an initial ontology framework
    • develop a quick-and-dirty ontology for the MIAPA data and metadata categories
    • identify other artefacts (ontologies, taxonomies) that can provide needed terms
    • add (to the MIAPA ontology) a larger list of domain-specific terms for
      • phylogenetic analysis software (list from Joe F's web site or TreeTapper)
      • file formats (see list in BioPerl docs)
      • alignment software
  2. develop an interactive graphical tool for constructing MIAPA annotations
    • use an existing framework such as Phenote
    • load vocabulary terms from ontologies identified in step 1
    • provide term-completion based on the loaded vocabularies
    • provide slots for specific types of MIAPA annotations
  3. carry out a preliminary round of in-house testing and revision
  4. identify a target group of potential users, e.g.,
    • those that have published a paper with the term "phylogeny"
    • those attending a scientific meeting on phylogenetics
    • those that use a particular archive or piece of software
  5. engage the users as participants in testing and knowledge capture
    • request users to generate MIAPA-compliant annotations for actual or hypothetical data sets
    • provide incentives
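
The term-completion idea in step 2 can be sketched independently of any particular annotation framework. The example below is a minimal, case-insensitive prefix completer over a fixed list of terms. The TermCompleter class and the placeholder vocabulary are assumptions made for this example and do not describe how Phenote or any other tool implements completion.

```python
from bisect import bisect_left


class TermCompleter:
    """Case-insensitive prefix completion over a fixed vocabulary."""

    def __init__(self, terms):
        # Sort by lower-cased form so that terms sharing a prefix are adjacent.
        self._entries = sorted((t.lower(), t) for t in set(terms))
        self._keys = [key for key, _ in self._entries]

    def complete(self, prefix, limit=10):
        """Return up to `limit` terms that start with `prefix`."""
        prefix = prefix.lower()
        # Binary search for the first key that is not smaller than the prefix.
        start = bisect_left(self._keys, prefix)
        matches = []
        for key, term in self._entries[start:]:
            if not key.startswith(prefix) or len(matches) >= limit:
                break
            matches.append(term)
        return matches


# Placeholder vocabulary; a real tool would load terms from the ontologies
# identified in step 1.
vocabulary = ["NEXUS", "Newick", "nexml", "PHYLIP", "FASTA"]
completer = TermCompleter(vocabulary)

print(completer.complete("ne"))
```

Restricting input to completions drawn from a loaded vocabulary is what would keep user annotations within the controlled vocabulary while still letting participants type freely.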