Supporting MIAPA

From Evolutionary Informatics Working Group

Note: see the draft MIAPA_WhitePaper for a plan for NESCent involvement in a MIAPA project.


Domain scientists with an interest in the archiving and re-use of phylogenetic data have called for a reporting standard designated "Minimal Information for a Phylogenetic Analysis", or MIAPA (Leebens-Mack, et al. 2006). Ideally the research community would develop, and adhere to, a standard that imposes a minimal reporting burden yet ensures that the reported data can be interpreted and re-used. Such a standard might be adopted by

  • pipeline projects that generate phylogenetic data sets for downloading and re-use (e.g., TreeFam, Pandit, Hovergen, PhylomeDB)
  • repositories and databases designed to archive published data (e.g., TreeBASE, Dryad)
  • journals that publish supplementary material for phylogenetic studies (e.g., MBE, Systematic Biology)
  • granting organizations that support phylogenetic studies (e.g., NSF)
  • organizations that develop taxonomic nomenclature based on phylogenetic results

Currently MIAPA is hypothetical and aspirational, i.e., no standard has yet been developed, let alone adopted. As a starting point, Leebens-Mack, et al. suggest that a study should report objectives, sequences, taxa, alignment method, alignment, phylogeny inference method, and phylogeny.

The MIAPA concept clearly aligns with the interoperability mandate of the NESCent evolutionary informatics working group, e.g., data re-use (the primary goal of MIAPA) is a desideratum of interoperability. Development of a MIAPA standard could synergize with ongoing projects and long-term goals of the working group. To achieve re-use through compliance with reporting standards, we need to develop technology that makes it easy to comply with the standard, e.g., a nice GUI that makes it easy for users to construct a MIAPA-compliant submission. To support re-use through data-mining or reasoning (on MIAPA-compliant reports), we need a controlled vocabulary, ideally an ontology. Developing such an ontology not only would jump-start the MIAPA project, it also would contribute to our efforts to develop a language to describe Transition Models, and it would represent a step in the direction of our long-term goal of developing a domain-specific language for phylogenetic analysis.

Some thoughts on developing MIAPA

Leebens-Mack, et al. called for further work, hoping to attract attention to the idea and stimulate effort; so far, however, MIAPA has seen no further development. The NESCent evolutionary informatics working group invited Dr. Leebens-Mack to speak at our recent meeting, and there was general agreement on the value of developing a MIAPA standard, and on the importance of ensuring that the interoperability artefacts developed by the group -- nexml and CDAO -- provide a means of MIAPA compliance.

After the working group meeting officially ended, a few members (Arlin, Brandon, Hilmar) began to discuss what the further development of MIAPA would entail (below), and how we could jump-start the project with a knowledge capture exercise (next section).

  1. What qualities of a submission make it reusable? (others can replicate, re-purpose, and build on results)
    • data accessible in standard formats
    • capacity for validation
    • provenance, ideally, provenance that can be traced automatically via external references
    • description of methods sufficient to reproduce results from data
    • rich semantics in any descriptions or annotations
  2. What technology and artefacts are required for an effective MIAPA standard?
    • an explicit (possibly formal) description of the standard, specifying types of data and metadata
      • a controlled vocabulary, possibly even a formal ontology of metadata terms
      • possibly incorporation of data representation standard such as nexml or CDAO
    • explicit conformance policies (separate policies for repositories, data formats, journals, applications software)
    • a file format for MIAPA documents
    • a repository to store MIAPA-compliant entries
    • software tools and libraries to support MIAPA-compliant annotations
      • interactive software to facilitate creation of MIAPA-compliant documents
      • a relational mapping of the MIAPA standard to be used in repositories
      • a service to validate MIAPA documents
  3. What would ease the burden on scientists (i.e., the implicit goal behind the "minimal" in MIAPA)?
    • fewer categories of metadata
    • fewer arbitrary restrictions on format
    • familiarity of metadata concepts
    • flexibility in representation
    • software support for annotation
  4. What logistics might be involved in developing and promulgating the standard?
    • a working group with external funding
    • a consortium with representatives from data resources, publishers, researchers, and programmers
    • user testing at scientific conferences
    • collaboration with ontology experts at NCBO
    • multiple rounds of revision
    • workshops (to train users) and hackathons (to develop implementations)
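The reusability requirements above (standard formats, capacity for validation, sufficient description of methods) can be sketched as a minimal validation routine. This is a hypothetical illustration only: the dict-based representation, the function name, and the field names (taken from the seven reporting categories suggested by Leebens-Mack, et al.) are not part of any actual MIAPA specification.

```python
# Hypothetical sketch: checking a MIAPA submission for the seven
# reporting categories suggested by Leebens-Mack, et al. (2006).
# The dict representation and field names are illustrative only.

REQUIRED_FIELDS = [
    "objectives",
    "sequences",
    "taxa",
    "alignment_method",
    "alignment",
    "phylogeny_inference_method",
    "phylogeny",
]

def validate_submission(submission):
    """Return a list of missing or empty required fields."""
    problems = []
    for field in REQUIRED_FIELDS:
        value = submission.get(field)
        if value is None or (isinstance(value, str) and not value.strip()):
            problems.append(field)
    return problems

# Example: a submission that omits the alignment method
report = {
    "objectives": "resolve relationships among hoverfly genera",
    "sequences": "GenBank accessions AB000001-AB000040",
    "taxa": "40 species of Syrphidae",
    "alignment": "aln.fasta",
    "phylogeny_inference_method": "maximum likelihood",
    "phylogeny": "tree.nwk",
}
print(validate_submission(report))  # → ['alignment_method']
```

A validation service of this kind is what would let repositories and journals enforce conformance policies mechanically rather than by editorial inspection.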

Proof-of-concept (annotation software)

To demonstrate the potential for users to construct their own ontology-based MIAPA annotations, as in the knowledge capture experiment below, we have created a proof-of-concept application by adapting Phenote. The result is an interactive graphical tool for user-generated, ontology-bound MIAPA annotations.

For implementation details, see below.

screenshot and explanation

The screenshot (File:Phenote miapa screenshot.pdf) shows the following:

  • upper left, editor pane used to create annotations
  • upper right, ontology browser pane showing the MIAPA ontology
  • bottom pane, the table that this user is generating via the editor

resources and DIY instructions


  • config file File:Miapa.cfg.txt that sets up a table and binds it to the miapa ontology (remove ".txt" suffix before use)
  • the miapa.obo ontology (actually a translated version of the OWL source on sourceforge)
  • Phenote

To try this, you don't need to mess with the ontology itself. Just do the following:

  1. install the latest beta version of Phenote
  2. download the attached File:Miapa.cfg.txt file, change the name to miapa.cfg, and put it in your ~/.phenote/conf directory
  3. choose the miapa config under the settings menu, then re-start Phenote (this is an annoyance of the Phenote interface)

what's right and wrong about the proof-of-concept

  • (good) most fields have a restricted vocabulary bound to an ontology
    • (bad) but the fields are not bound specifically to any class within the ontology
      • we could get around this by creating separate ontologies for each field
      • (good) Phenote can implement restrictions via the use of OBO "categories" (a kind of tag; see below)
        • (bad) but this is a clumsy way of doing something that should be done by logical subsumption within the ontology
  • (good) every time Phenote launches, it will update to the latest version of the miapa ontology
  • (good) the graphical interface is sweet
    • you can drag and drop ontology terms from the ontology browser into the editor!
    • you can start typing and get term-completion!
  • (good) Phenote is portable and relatively easy to install
    • (bad) but it has to be installed (except the webstart version, which is being phased out), so that's a burden on the user
  • (good) Phenote is adaptable
  • (good) Phenote has a "data adaptor" concept (apparently) to communicate with a back-end database
  • (bad) Phenote has no way to validate anything other than ontology terms, e.g., it cannot accept an integer entry and then check whether it is a valid integer (or accept a URI entry and then check whether the URI points to a locatable resource)
  • (bad) the proof-of-concept design has no slots for top-level information (author, publication, etc)

Jim Balhoff (his email to Arlin, 6/3/08) provided the following example (from demo.cfg) to explain how to configure Phenote to use tags called "categories" in OBO.

<ns:field name="Abnormal" datatag="Tag" enable="true">
    <ns:ontology name="PATO" slim="abnormal_slim"/>
</ns:field>

In his example, it's the "slim" attribute that restricts this field named "Abnormal" to PATO (ontology) terms tagged with the "abnormal_slim" category (don't be confused by "datatag"; that's something else, irrelevant to this example). If the "slim" attribute were missing, then Phenote would accept any PATO term in this field.

To do this, of course, you first need to create a category in OBO-Edit, then assign the category to the classes that you want, then save your revised OBO ontology.
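For reference, a "category" (slim) in the OBO flat-file format is declared with a subsetdef line in the header and attached to a class with a subset tag in its [Term] stanza. A minimal hypothetical fragment (the subset definition text is invented for illustration):

```
! Hypothetical OBO fragment illustrating a category/slim.
format-version: 1.2
subsetdef: abnormal_slim "Terms describing abnormal qualities"

[Term]
id: PATO:0000460
name: abnormal
subset: abnormal_slim
```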

what could be done to make this work

We can't make this work with Phenote alone. However, we could integrate a webstart version of Phenote into a web application that fills in the gaps in its capabilities. To clarify this example, let's use the following terminology:

  • miapa-maker server that provides
    • back-end operations to validate MIAPA entries, handle term requests, store entries in database
    • web interface for the user to
      • upload data required for MIAPA compliance
      • generate workflow description sufficient for MIAPA compliance
  • miapa-workflow-Phenote: the Phenote configuration that miapa-maker will launch for the user to annotate a workflow

That is, miapa-workflow-Phenote will handle only the workflow-description part of the annotation, i.e., the series of steps (search, align, infer tree). Everything else (top-level information, making term requests, attaching data files) will be handled by the miapa-maker web site. The miapa-workflow-Phenote configuration is essentially the proof-of-concept config except that

  1. it is a lightweight webstart version of Phenote that the user does not have to install
  2. every field bound to MIAPA also may draw from a dummy term-request ontology
    • the term-request ontology is just the MIAPA ontology but with different labels, e.g., PhylogenyMethod is replaced by RequestPhylogenyMethodTerm; AlignmentFormat is replaced by RequestAlignmentFormatTerm
    • this allows the user to make category-specific term requests
    • the user also can request an unclassifiable term by invoking the top level, i.e., RequestMIAPA_ThingTerm
  3. the miapa-workflow-Phenote config has a data adaptor that allows it to export to the miapa-maker server
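The renaming scheme in point 2 is mechanical: each MIAPA class label X becomes RequestXTerm in the dummy term-request ontology. A minimal sketch (the class names are the examples from the text; the function itself is illustrative, not part of any real tool):

```python
# Hypothetical sketch of the label-rewriting rule for the dummy
# term-request ontology: each MIAPA class "X" becomes "RequestXTerm".

def term_request_label(miapa_label):
    """Map a MIAPA class label to its term-request counterpart."""
    return "Request{}Term".format(miapa_label)

for label in ["PhylogenyMethod", "AlignmentFormat", "MIAPA_Thing"]:
    print(label, "->", term_request_label(label))
# PhylogenyMethod -> RequestPhylogenyMethodTerm
# AlignmentFormat -> RequestAlignmentFormatTerm
# MIAPA_Thing -> RequestMIAPA_ThingTerm
```

Because the dummy ontology mirrors the MIAPA class hierarchy one-to-one, each term request automatically carries the category the user was searching in.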

Now, we combine miapa-workflow-Phenote with our own miapa-maker web site as follows:

  1. user enters miapa-maker web site, asks to generate annotation, is assigned session cookie
  2. user enters top-level info into miapa-maker web form (author, publication, objectives)
  3. user launches webstart miapa-workflow-Phenote to generate "Workflow steps description"
    • user enters steps in Phenote just as in the proof-of-concept
    • if the user can't find a term, the user enters a dummy term from the dummy ontology, aka "make_term_request"
    • when finished, user exits miapa-workflow-Phenote
  4. when miapa-workflow-Phenote exits, it sends "Workflow steps description" to the miapa-maker server
  5. miapa-maker receives "Workflow steps description" and responds
    1. validates and processes "Workflow steps description" data
    2. generates HTML showing workflow steps description
    3. identifies term requests and includes text boxes for user to enter term names
    4. provides check-box for user to attach output data files to any step in the workflow
  6. user responds by completing the MIAPA annotation and submitting it for processing
  7. server responds
    • stores the annotation
    • generates a receipt for the user
    • allows the user the option to download the annotation
    • allows the user the option to submit the annotation to a third-party data repository
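Steps 4-5 above amount to a small server-side routine: parse the exported workflow description, separate recognized ontology terms from term requests, and hand back a structure the web form can render. A hypothetical sketch, assuming invented data shapes (a real implementation would parse Phenote's actual export format):

```python
# Hypothetical sketch of miapa-maker's handling of an exported
# "Workflow steps description". The term set and step dicts are
# invented for illustration.

KNOWN_TERMS = {"MultipleSequenceAlignment", "MaximumLikelihoodInference"}

def process_workflow(steps):
    """Split workflow steps into validated steps and term requests."""
    validated, term_requests = [], []
    for step in steps:
        term = step["term"]
        if term.startswith("Request") and term.endswith("Term"):
            term_requests.append(step)   # user asked for a new term
        elif term in KNOWN_TERMS:
            validated.append(step)       # recognized ontology term
        else:
            raise ValueError("unrecognized term: " + term)
    return validated, term_requests

steps = [
    {"step": 1, "term": "MultipleSequenceAlignment"},
    {"step": 2, "term": "RequestTreeRootingMethodTerm"},
    {"step": 3, "term": "MaximumLikelihoodInference"},
]
ok, requests = process_workflow(steps)
print(len(ok), "validated;", len(requests), "term request(s)")
# → 2 validated; 1 term request(s)
```

In the generated HTML (step 5.2-5.3), the validated steps would be rendered read-only, while each term request would get a text box for the user to name the requested term.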

Plan for a Knowledge Capture Exercise

We imagine a knowledge-capture-and-user-testing exercise along the lines of the experiment described in the abstract of "Fast, Cheap and Out of Control: A Zero Curation Model for Ontology Development" (Good, et al. 2006: File:Good.pdf):

During two days at a conference focused on circulatory and respiratory health, 68 volunteers
untrained in knowledge engineering participated in an experimental knowledge capture exercise.
These volunteers created a shared vocabulary of 661 terms, linking these terms to each other
and to a pre-existing upper ontology by adding 245 hyponym relationships and 340 synonym
relationships. While ontology-building has proved to be an expensive and labor-intensive process
using most existing methodologies, the rudimentary ontology constructed in this study was
composed in only two days at a cost of only 3 t-shirts, 4 coffee mugs, and one chocolate moose.
The protocol used to create and evaluate this ontology involved a targeted, web-based interface.
The design and implementation of this protocol is discussed along with quantitative and qualitative
assessments of the constructed ontology.

Note that it only takes a few t-shirts and coffee mugs to stimulate potential users to log in to a web site and fill out some forms.

Our plan would be to use a conference (ideally, the upcoming 2008 Evolution meeting, though it's a bit soon) to gather data and to begin developing infrastructure for a MIAPA standard:

  1. Task 1. develop an initial controlled vocabulary
  2. Task 2. develop and test a web client for interactively constructing MIAPA annotations
    1. Develop the client-side capabilities
      • use an existing customizable framework such as Phenote
      • load vocabulary terms from ontologies identified in step 1
      • provide term-completion based on the loaded vocabularies as in Phenote
      • provide slots for specific types of MIAPA annotations
      • provide support for term requests (i.e., user requests a needed term that is not in the controlled vocabulary)
    2. Implement a system to add term requests as provisional classes
    3. carry out a preliminary round of in-house testing and revision to make sure the system works
  3. Task 3. Arrange logistics to deploy the system at the Evolution meeting
    • advertise the exercise
      • send an email to evoldir, to registered participants of the conference, and to participants of the pre-conference Ontology workshop
      • advertise the exercise in person at the Ontology workshop (Todd? Brandon?)
    • assemble a team of problem-solvers to be on call to fix problems during the conference weekend
      • set up a chat for discussions of emerging problems
      • make sure we have a sys admin contact in case of server problems
    • procure rewards (t-shirts, mugs, etc) and make a plan to distribute them at the conference
    • engage online participants in testing and knowledge capture
      • request users to generate MIAPA-compliant annotations for actual or hypothetical data sets
      • provide incentives for the most annotations, or the most new terms
      • provide users with the means to request new terms
      • provide users the means to categorize terms and specify relations
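Task 2.2 (adding term requests as provisional classes) could be sketched as emitting a provisional OBO [Term] stanza for each requested term, to be reviewed by experts in the follow-up phase. The MIAPA_PROV ID space, the default parent class, and the "provisional" subset here are all invented for illustration:

```python
# Hypothetical sketch: turn a user's term request into a provisional
# OBO [Term] stanza for expert review. The MIAPA_PROV ID space and
# the default parent class are invented, not from any real ontology.

def provisional_stanza(requested_name, serial, parent_id="MIAPA:0000001"):
    """Render a provisional OBO [Term] stanza for a requested term."""
    term_id = "MIAPA_PROV:{:07d}".format(serial)
    return "\n".join([
        "[Term]",
        "id: " + term_id,
        "name: " + requested_name,
        "is_a: " + parent_id,
        "subset: provisional  ! pending expert review",
    ])

print(provisional_stanza("bayesian tree rooting", 1))
```

Tagging the stanza with a "provisional" subset would let the expert-review step (see the follow-up section) retrieve all pending terms in one query.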

Alternatively, if we can't do this in time for the conference, we could just identify a target group of potential users and go ahead with the same kind of experiment on a longer time-scale. The targeted users might be

  • those who have published a paper with the term "phylogeny"
  • those who attend a scientific meeting on phylogenetics
  • those who use a particular archive or piece of software

Follow-up: MIAPA alpha version

  1. Follow up the knowledge capture experiment
    • expert review of the results
    • revisions and additions to ontology
    • proposal of initial version of MIAPA (alpha version)
    • proposal of initial version of file format
    • proposal of initial version of DB schema
    • manuscript describing exercise and result

Further development and promulgation of MIAPA

Supposing that the above experiment is successful and gets some attention, we could move ahead with opening up the development process and involving the research community. The medium-range goal (1 to 2 years) would be to engage the community in developing a beta version of the standard that would support compliant annotation of a majority of phylogenetic analyses. The longer-range goal would be to take the necessary steps to promulgate MIAPA so that it becomes an accepted standard.

MIAPA beta version

  1. Form a consortium
    • Recruit consortium members from diverse constituencies:
      • end-users (the ones who will submit individual reports to repositories)
      • journal editorial boards
      • repository managers
      • pipeline managers
      • software developers
    • Secure funding for consortium operations
    • Implement the infrastructure needed
      • to maintain artefacts (e.g., sourceforge project)
      • for intra-group communication and coordination (email lists, web server, conference calls)
      • for publicity (web server)
  2. Develop beta version of standard
  3. Develop plans for a second round of testing
    • Set a specific milestone for the level of support (e.g., support 75% of annotations with the existing vocabulary)
    • Include testing in several different contexts:
      • mapping the standard to a relational schema in a repository
      • automatically constructing MIAPA-compliant annotations in a pipeline context
      • interactive end-user generation of MIAPA-compliant annotations
  4. Carry out testing
  5. Publish the revised standard

Promulgating MIAPA

  1. develop long-term maintenance plan for standard (decisions, versioning, obsolescence)
  2. develop infrastructure to support long-term maintenance plan
  3. recruit partners to commit to standard