From Evolutionary Informatics Working Group
Revision as of 01:42, 30 May 2008


The working group wishes to develop a strategy to reach "critical mass" for its main artefacts, nexml and CDAO. Reaching critical mass would mean that enough operations and resources rely on an artefact that the self-interested researcher (or the lazy programmer) embraces it simply because it's useful, not because of some intangible long-term benefit of "interoperability".

In other words, we need to create "carrots", not just "sticks" (the main "stick" is that users may be forced to use nexml if it is the only way to get MIAPA compliance or to deposit data in a repository).

When we constituted the working group in 2006, we were clearly targeting developers of phylogenetics application software. This might have been the best way to start developing the technology for interoperability (artefacts such as nexml and CDAO), but we now face a different problem: promulgating these technologies. Targeting applications is not creating very many carrots for users. Targeting data resources might be more effective.

Our notes from the working group meeting

Data resource interop project

  • rationale
    • need to develop critical mass for nexml-cdao (critical mass of community involvement, use-cases, eyes on code)
    • hard to get critical mass by targeting app developers
    • hard to show value by targeting app developers
    • strategy of targeting data resources puts data in hands of users
    • however, there are also carrots for db maintainers:
      • interchange with other sources
      • common treatment of keys such as taxonomy
  • data resources to target (see Data Resources page)
  • implementation strategy
    1. recruit participants
      • possible leadership from Hilmar; other suggestions: Rod Page, Bill Piel, Encyclopedia of Life, Tree of Life (TOL)
      • other participants drawn from projects above
    2. develop project plan
      • consider implementing a coordinated db with families from many sources (similar philosophy to InterPro for protein family alignment dbs?)
      • CDAO to provide a constraining vocabulary for the nexml schema
        • disagreement over whether this is a good idea
        • to succeed, we must provide users a way to expand nexml, and thus to expand CDAO
        • the mechanism is the feature request, e.g., "I want to represent Bremer values"
          1. user asks how to represent a new type of data
          2. check whether a formal mechanism is desired but does not yet exist
          3. if so, the user is referred to the CDAO feature request (we must provide a searchable interface and a front-page link to the feature request form)
          4. CDAO developers are obliged to respond
    3. get support
      • NSF grant
      • NESCent-sponsored hackathon

More notes, from telecon with Hilmar and Arlin (5/29/08)

Outline of the basic plan

  1. expand CDAO to support more metadata
  2. map CDAO to a relational schema (CDAO already is mapped to nexml and NEXUS)
  3. develop a database interface that has some nice features
    • better query interface than TreeBASE or Pandit, because it has access to more semantics
    • output in a rich, structured format (nexml)
    • some nice integrated tools such as ATV and Jalview
  4. pick two or three data resource managers to come work with us to develop a mapping, so that the content of their resources can be uploaded automatically into the database application via nexml-CDAO
  5. finally, hold a hackathon where we bring in other data resource managers to create their own mappings
  • the goal is to develop an integration platform
    • integrate taxonomy, phylogeny, gene family databases
  • a data resource manager can take existing content, load it into the db package, and automatically get the cool features:
    • taxonomic links
    • standard services API
    • allows data submission, returns data in nexml or equivalent
    • accepts metadata and transmits it
    • reasons over metadata, e.g., validate MIAPA compliance
  • NSF grant
    • Advances in Biological Informatics (due August 12)
    • program has explicit shift in focus away from database creation . . .
    • . . . but that's ok because we won't focus on One Database That Rules Them All, but on the infrastructure that supports them all
  • questions
    • isn't this proposing the same thing as pPOD?
    • apparent overlap with pPOD is strong; need buy-in from them
    • do we need a pilot project to get preliminary results?
    • do we need to hire interface programmers to do this? A: can do it by contract, at least initially