Son of Evoinfo

From Evolutionary Informatics Working Group
Revision as of 01:08, 25 June 2009 by (talk) (Title (80 characters max))

A New Working Group Proposal

This page will serve as the breeding ground for a proposal to NESCent to establish a new working group from the ashes of Evoinfo, whose work led to the establishment of the NeXML/CDAO/PhyloWS stack, and culminated in the Database Interoperability Hackathon.

Proposal deadline: July 10, 2009 (see guidelines)

Proposal details: "Proposals for working groups are short, not to exceed 5 single-spaced (12-pt type) pages (not including budgets or CVs)." (see instructions)

Proposal Sections

Here are the required sections of the proposal, as laid out in the instructions.

Title (80 characters max)

Working Group on STandards for Data and InterOperability (stdio)

(prior group title was "Evolutionary Informatics: Supporting Interoperability in Evolutionary Analysis")

  • just a suggestion... --maj
  • "most people are non-programmers and won't get the reference to the UN*X name for a library of core input/output methods..." --arlin

some others:

Data Interoperability and Standards (DAIS)

Data Interoperability and Standards in Evolutionary Biology (DaIS-E)

Evolutionary Data Standards and Interoperability (EDaSI)

Promoting the EvolInfo Standards Stack (PRESS or PREIS)

Short Title (25 characters max)

Data Standards & Interop

Name and contact information for Project Leader, and any Co-Leaders

Project Summary (250 words max)

("appropriate for public distribution on the NESCent web site")

Advances in evolutionary synthesis emerge from the concerted effort and intelligence of multiple scientists across disciplines, who are focused on a single problem using a common data resource. Data standards and software tools that specify, enforce, and promulgate those standards act as powerful lenses that efficiently concentrate the intellectual efforts of many individuals on complex problems and their associated data. The goal of the StDIo ("stadio") working group is to uncover and implement ways--educational, promotional, and technical--to increase the usability and community uptake of an increasingly influential set of recently developed evolutionary data standards. Together these standards form the EvoInfo Stack, which includes syntactic (NeXML), semantic (CDAO), and operational (PhyloWS) components that work together to provide large-scale, computable interoperability among evolutionary data providers and analysts.

Introduction and Goals

("What important scientific problem in evolutionary biology is being addressed?")

Every facet of evolutionary biology, whether systematics, biogeography, paleontology, or comparative genomics, has been profoundly affected and sometimes transformed by the availability of exponentially expanding online data resources. New synthetic work, impossible only a few years ago, is now being done routinely, and even automatically, through the use of software tools that organize, integrate, and analyze publicly available evolutionary data.

However, the key to the successes already achieved in synthetic evolutionary biology does not lie intrinsically with the sophisticated databases and software tools that enable the underlying analyses. Advances in evolutionary synthesis emerge from the concerted effort and intelligence of multiple scientists across disciplines focused on a single problem using a common data resource. Data standards and software tools that specify, enforce, and promulgate those standards act as powerful lenses that efficiently concentrate the intellectual efforts of many individuals on complex problems and their associated data. The goal of this working group is to uncover ways--educational, promotional, and technical--to increase the usability and community uptake of an increasingly influential set of recently developed evolutionary data standards.

From 2006 to 2009, the NESCent-sponsored Evolutionary Informatics Working Group (Evoinfo) successfully developed and implemented a workable foundation for evolutionary data exchange and web interoperability: the EvoInfo Stack. The stack consists of three tightly integrated parts: syntactic (NeXML), semantic (CDAO), and operational (PhyloWS). A series of three Working Group meetings brought together national and international experts and developers to identify key data representation requirements, brainstorm and catalog important use cases, prioritize and engage in code development, and otherwise establish the technical, documentary, and intellectual infrastructure necessary to create a working stack.

EvoInfo culminated in the Database Interoperability Hackathon, which brought together a highly diverse group of data providers and developers, representing several major branches of evolutionary biology, to create prototype software solutions based on the EvoInfo Stack. The Hackathon was by many measures a great success. In particular, it produced working open-source resources and applications in five important areas of evolutionary analysis: taxonomic intelligence (syntax and protocols for resolving taxonomies); a high-level application programming interface (API) for syntax (a lightweight, high-level Java API for NeXML); a high-level API for semantics (a semantic API for CDAO, for extracting NeXML-encoded semantics into triple stores); a high-level API for implementing evolutionary web services (PHYLR, a PhyloWS-based high-level API with adaptors); and tools for visualization (a Web GUI for tree viewing and editing).

The mandate of the StDIo Working Group will be to determine, prioritize, and implement ways to put the EvoInfo Stack and these new software tools into production, and to promote wide use of the stack by major data providers and open-source bioinformatics projects. Technical refinements to the stack are still under way, and these will fall within the purview of StDIo. However, EvoInfo and the Interoperability Hackathon succeeded in bringing together a critical mass of highly motivated developers, who are continuing to work on projects begun at the Hackathon and are bringing new talent onto these projects. That success will help to ensure that technical improvements continue independently. The major goals of StDIo will include a focus on the following aspects of standards uptake:

Documentation: One of the first goals must be to improve the documentation of the stack components and related tools, to lower the barrier to entry, establish user expectations, and create a "common sense" within the user community about the APIs that implement the standards. This goal encompasses both content and web access and design. StDIo will prioritize and delegate work towards the improvement of stack documentation.

Community: Momentum behind ongoing adoption and development of the EvoInfo Stack must depend on a growing and independently interacting user base. Along with documentation improvement, StDIo will explore the establishment of a responsive Help Desk facility that can field questions and funnel bug and enhancement requests to the right developer groups. This in itself will be a catalyst for user community development.

Dissemination: Stakeholders in the stack need to take the show on the road, to convince the larger evolutionary community of its need for these standards, and inform them of how the EvoInfo Stack can meet that need. StDIo will work to prioritize the methods (papers, websites, abstracts) and the venues (meetings, workshops, listservs, publications) through which to direct limited resources into the promotion of the stack.

Installation: We believe the stack will reach its broadest and most complete usefulness only when a core of major data providers implements web services based on PhyloWS. This will enable the establishment of EvoInfo Stack-based computable workflows. In our opinion, some of the largest scientific advances will emerge from very large, automated analyses that are possible only via workflow technology. StDIo will encourage and outline the development of online installation and training materials, and host at least one open training workshop.

Resources: The goals of StDIo include emphases on support and training, activities which require time and money. StDIo will also explore ways and means to obtain these resources from external entities for the future.

Proposed Activities

("What specific data and analytical tools will be used? How will synthesis occur?")

  • Here is where I (for one) would like to see the ideas of you potential participants. I've listed a possible structure and potential activities. Please comment and make suggestions, either here directly, or on the talk page. --maj

5-day Meetings twice a year over two years. Independent progress in the interim.

  • Meeting #1
      State of the Stack address; Dissemination priorities and plans
      EvoInfo Stack Doc-a-thon
  • Meeting #2
      State of the Stack address; ...
      EvoInfo Stack Workshop : final 2 days of the second 5-day meeting
      (along with StDIo members, some key invitees to this in "markets" we want to "penetrate")
  • Meeting #3
      State of the Stack address; ...
      EvoInfo Stack User Group : final 2-3 days of third 5-day meeting
      (previous invitees plus interested post-docs/students)
  • Meeting #4
      State of the Stack address; Future directions
      EvoInfo Stack User Group : final 2-3 days of fourth 5-day meeting
      (ditto #3)

Other possibilities:

  • Bio* Hackathon / invited/interested reps from BioPerl/Java/Python/++
  • perhaps co-sponsored by the Open Bioinformatics Foundation
  • emphasis on EIStack, but not confined to this
  • Virtual Doc-a-thon

Names of Proposed Participants

("Indicate which individuals are already committed to participating. Not all participants need to be specified in advance; if unspecified, the type of expertise needed should be indicated. Working Groups should include some participants with appropriate analytical and/or IT expertise. If non-US participants are proposed, briefly describe why their participation is essential to the success of the project. Please review our NESCent Travel Arrangements and NSF/NESCent Travel Policy, and, in particular, information on non-US visitors and international airfare.")

  • Please add yourselves to this list; this will also serve as an indication of interest in participation in the group. Use my entry as a template. Thanks! ---maj
  1. Mark A. Jensen, PhD; Fortinbras Research/BioPerl; email
  2. Enrico Pontelli, PhD; New Mexico State University/Computer Science; email

Rationale for NESCent support

("Why can this activity be most effectively conducted at NESCent?")

NESCent is a major stakeholder in the EvoInfo Stack, having supported the Evolutionary Informatics Working Group not only with money, space, and IT resources, but also with the intellectual contributions and time of its own personnel. The stack is as much a product of NESCent's unique contributions as of any of the institutions represented by EvoInfo's members, and we believe NESCent stands to gain as much as any institution by widespread adoption of the stack's standards.

NESCent's physical and IT facilities are well-suited to the kind of collaborative work StDIo will pursue, both on-site and remotely. In addition, though an important stakeholder in the EvoInfo project, NESCent remains an institution-neutral place to hold frank discussions among individuals who represent possible competitors. The compromise and successful integration of two competing Java APIs for NeXML through the dialogue and collaboration of the Database Interoperability Hackathon is one example of how NESCent is able to provide the right setting and the right resources to pave the way for synthetic work. We feel that NESCent sponsorship of StDIo would help keep up the collaborative momentum it helped establish for the EvoInfo Working Group.

Anticipated IT Needs

("Briefly describe any needs for IT support that are important to the success of the proposed project. Please indicate whether long-term maintenance of a public database will be expected.")

Proposed Timetable

Starting in spring of 2010, the StDIo group will meet twice a year for two years, with the exact schedule to be determined later. Each meeting will last for 3-5 days and include presentations, discussion, programming, and progress reports.

Anticipated Results

("include anticipated papers, data and software products, and anticipated public release of data and products")

Short CV of Project Leaders (2 pages for each)

("Do not include talks, society memberships, nor papers in preparation.")

  • These will be added to the final document. --maj