Son of Evoinfo

From Evolutionary Informatics Working Group
Revision as of 11:14, 24 June 2009 by (talk) (Introduction and Goals: futz)

A New Working Group Proposal

This page will serve as the breeding ground for a proposal to NESCent to establish a new working group from the ashes of Evoinfo, whose work led to the establishment of the NeXML/CDAO/PhyloWS stack, and culminated in the Database Interoperability Hackathon.

Proposal deadline: July 10, 2009 (see guidelines)

Proposal details: "Proposals for working groups are short, not to exceed 5 single-spaced (12-pt type) pages (not including budgets or CVs)." (see instructions)

Proposal Sections

Here are the required sections of the proposal, as laid out in the instructions.

Title (80 characters max)

Working Group on STandards for Data and InterOperability (stdio)

  • just a suggestion... --maj

Short Title (25 characters max)

Data Standards & Interop

Name and contact information for Project Leader, and any Co-Leaders

Project Summary (250 words max)

("appropriate for public distribution on the NESCent web site")

Introduction and Goals

("What important scientific problem in evolutionary biology is being addressed?")

Every facet of evolutionary biology, whether systematics, biogeography, paleontology, or comparative genomics, has been profoundly affected and sometimes transformed by the availability of exponentially expanding online data resources. New synthetic work, impossible only a few years ago, is now being done routinely, and even automatically, through the use of software tools that organize, integrate, and analyze publicly available evolutionary data.

However, the key to the successes already achieved in synthetic evolutionary biology does not lie intrinsically with the sophisticated databases and software tools that enable the underlying analyses. Advances in evolutionary synthesis emerge from the concerted effort and intelligence of multiple scientists across disciplines focused on a single problem using a common data resource. Data standards and software tools that specify, enforce, and promulgate those standards act as powerful lenses that efficiently concentrate the intellectual efforts of many individuals on complex problems and their associated data. The goal of this working group is to uncover ways--educational, promotional, and technical--to increase the usability and community uptake of an increasingly influential set of recently developed evolutionary data standards.

From 2006 to 2009, the NESCent-sponsored Evolutionary Informatics Working Group (Evoinfo) successfully developed and implemented a workable foundation for evolutionary data exchange and web interoperability: the Evoinfo Stack. The stack consists of three tightly integrated parts: syntactic (NeXML), semantic (CDAO), and operational (PhyloWS). A series of three Working Group meetings brought together national and international experts and developers to identify key data representation requirements, brainstorm and catalog important use cases, prioritize and engage in code development, and otherwise establish the technical, documentation, and intellectual infrastructure necessary to create a working stack.

Evoinfo culminated in the Database Interoperability Hackathon, which brought together a highly diverse group of data providers and developers, representing several major branches of evolutionary biology, to create prototype software solutions based on the Evoinfo Stack. The Hackathon was by many measures a great success. In particular, working open-source resources and applications were created for five important areas of evolutionary analysis: taxonomic intelligence (syntax and protocols for resolving taxonomies); a high-level application programming interface (API) for syntax (a lightweight, high-level Java API for NeXML); a high-level API for semantics (a semantic API for CDAO, for extracting NeXML-encoded semantics into triple stores); a high-level API for implementing evolutionary web services (PHYLR, a PhyloWS-based high-level API with adaptors); and tools for visualization (a Web GUI for tree viewing and editing).

The mandate of the StDIo Working Group will be to determine, prioritize, and implement ways to put the Evoinfo Stack and these new software tools into production, and to promote wide use of the stack by major data providers and open-source bioinformatics projects. Technical refinements to the stack are still under way, and these will fall within the purview of StDIo. However, Evoinfo and the Interoperability Hackathon succeeded in bringing together a critical mass of highly motivated developers, who are continuing to work on projects begun at the Hackathon and are bringing new talent onto these projects. This will help to ensure that technical improvements continue independently. The major goals of StDIo will include a focus on the following aspects of standards uptake:

Documentation: One of the first goals must be to improve the documentation of the stack components and related tools, to lower the barrier to entry, establish user expectations, and create a "common sense" within the user community about the APIs that implement the standards. This goal encompasses both content and web access and design. StDIo will prioritize and delegate work towards the improvement of stack documentation.

Community: Momentum behind ongoing adoption and development of the Evoinfo Stack must depend on a growing and independently interacting user base. Along with documentation improvement, StDIo will explore the establishment of a responsive Help Desk facility that can field questions and funnel bug reports and enhancement requests to the right developer groups. This in itself will be a catalyst for user community development.

Dissemination: Stakeholders in the stack need to take the show on the road, to convince the larger evolutionary community of its need for these standards, and to inform it of how the Evoinfo Stack can meet that need. StDIo will work to prioritize the methods (papers, websites, abstracts) and the venues (meetings, workshops, listservs, publications) through which to direct limited resources into the promotion of the stack.

Installation: We believe the stack will reach its broadest and most complete usefulness only when a core of major data providers implements web services based on PhyloWS. This will enable the establishment of Evoinfo Stack-based computable workflows. In our opinion, some of the largest scientific advances will emerge from very large, automated analyses that are possible only via workflow technology. StDIo will encourage and outline the development of online installation and training materials, and host at least one open training workshop.

Resources: The goals of StDIo emphasize support and training, activities that require time and money. StDIo will also explore ways and means to obtain these resources from external entities for the future.

Proposed Activities

("What specific data and analytical tools will be used? How will synthesis occur?")

Names of Proposed Participants

("Indicate which individuals are already committed to participating. Not all participants need to be specified in advance; if unspecified, the type of expertise needed should be indicated. Working Groups should include some participants with appropriate analytical and/or IT expertise. If non-US participants are proposed, briefly describe why their participation is essential to the success of the project. Please review our NESCent Travel Arrangements and NSF/NESCent Travel Policy, and, in particular, information on non-US visitors and international airfare.")

  • Please add yourselves to this list; this will also serve as an indication of interest in participation in the group. Use my entry as a template. Thanks! ---maj
  1. Mark A. Jensen, PhD; Fortinbras Research/BioPerl; email
  2. Enrico Pontelli, PhD; New Mexico State University/Computer Science; email

Rationale for NESCent support

("Why can this activity be most effectively conducted at NESCent?")

Anticipated IT Needs

("Briefly describe any needs for IT support that are important to the success of the proposed project. Please indicate whether long-term maintenance of a public database will be expected.")

Proposed Timetable

("include Start Date month and year, number of meetings, and length of each meeting")

Anticipated Results

("include anticipated papers, data and software products, and anticipated public release of data and products")

Short CV of Project Leaders (2 pages for each)

("Do not include talks, society memberships, nor papers in preparation.")

  • These will be added to the final document. --maj