Son of Evoinfo

From Evolutionary Informatics Working Group
Revision as of 15:45, 11 July 2009 by (talk) (Proposed Activities)

A New Working Group Proposal

This page will serve as the breeding ground for a proposal to NESCent to establish a new working group from the ashes of Evoinfo, whose work led to the establishment of the NeXML/CDAO/PhyloWS stack, and culminated in the Database Interoperability Hackathon.

Proposal deadline: July 10, 2009 (see guidelines)

Proposal details: "Proposals for working groups are short, not to exceed 5 single-spaced (12-pt type) pages (not including budgets or CVs)." (see instructions)

Proposal Sections

Here are the required sections of the proposal, as laid out in the instructions.

Title (80 characters max)

Working Group on STandards for Data and InterOperability (stdio)

(prior group title was "Evolutionary Informatics: Supporting Interoperability in Evolutionary Analysis")

  • just a suggestion... --maj
  • "most people are non-programmers and won't get the reference to the UN*X name for a library of core input/output methods..." --arlin

some others:

Data Interoperability and Standards (DAIS)

Data Interoperability and Standards in Evolutionary Biology (DaIS-E)

Evolutionary Data Standards and Interoperability (EDaSI)

Promoting the EvoInfo Standards Stack (PRESS or PREIS)

Short Title (25 characters max)

Data Standards & Interop

Name and contact information for Project Leader, and any Co-Leaders

Project Summary (250 words max)

("appropriate for public distribution on the NESCent web site")

Advances in evolutionary synthesis emerge from the concerted effort and intelligence of multiple scientists across disciplines who are focused on a single problem using a common data resource. Data standards, and the software tools that specify, enforce, and promulgate those standards, act as powerful lenses that efficiently concentrate the intellectual efforts of many individuals on complex problems and their associated data. The goal of the ACRONYM working group is to uncover and implement ways--educational, promotional, and technical--to increase the usability and community uptake of an increasingly influential set of recently developed evolutionary data standards. This set is the EvoInfo Stack, which includes syntactic (NeXML), semantic (CDAO), and operational (PhyloWS) components working together to provide large-scale, computable interoperability among evolutionary data providers and analysts.

Introduction and Goals

("What important scientific problem in evolutionary biology is being addressed?")

Every facet of evolutionary biology, whether systematics, biogeography, paleontology, or comparative genomics, has been profoundly affected and sometimes transformed by the availability of exponentially expanding online data resources. New synthetic work, impossible only a few years ago, is now being done routinely, and even automatically, through the use of software tools that organize, integrate, and analyze publicly available evolutionary data.

However, the key to the successes already achieved in synthetic evolutionary biology does not lie intrinsically with the sophisticated databases and software tools that enable the underlying analyses. Advances in evolutionary synthesis emerge from the concerted effort and intelligence of multiple scientists across disciplines focused on a single problem using a common data resource. Data standards and software tools that specify, enforce, and promulgate those standards act as powerful lenses that efficiently concentrate the intellectual efforts of many individuals on complex problems and their associated data. The goal of this working group is to uncover ways--educational, promotional, and technical--to increase the usability and community uptake of an increasingly influential set of recently developed evolutionary data standards.

From 2006 to 2009, the NESCent-sponsored Evolutionary Informatics Working Group (EvoInfo) successfully developed and implemented a workable foundation for evolutionary data exchange and web interoperability: the EvoInfo Stack. The stack consists of three tightly integrated parts: syntactic (NeXML), semantic (CDAO), and operational (PhyloWS). A series of three Working Group meetings brought together national and international experts and developers to identify key data representation requirements, brainstorm and catalog important use cases, prioritize and engage in code development, and otherwise establish the technical, documentary, and intellectual infrastructure necessary to create a working stack.

EvoInfo culminated in the Database Interoperability Hackathon, which brought together a highly diverse group of data providers and developers, representing several major branches of evolutionary biology, to create prototype software solutions based on the EvoInfo Stack. The Hackathon was by many measures a great success. In particular, working open-source resources and applications were created for five important areas of evolutionary analysis: taxonomic intelligence (syntax and protocols for resolving taxonomies); a high-level application programming interface (API) for syntax (a lightweight, high-level Java API for NeXML); a high-level API for semantics (a semantic API for CDAO, for extracting NeXML-encoded semantics into triple stores); a high-level API for implementing evolutionary web services (PHYLR, a PhyloWS-based high-level API with adaptors); and tools for visualization (a Web GUI for tree viewing and editing).

The mandate of the ACRONYM Working Group will be to determine, prioritize, and implement ways to put the EvoInfo Stack and these new software tools into production, and to promote wide use of the stack by major data providers and open-source bioinformatics projects. Technical refinements to the stack are still under way, and these will fall within the purview of ACRONYM. However, EvoInfo and the Interoperability Hackathon succeeded in bringing together a critical mass of highly motivated developers, who are continuing to work on projects begun at the Hackathon and who are bringing new talent onto these projects; this will help to ensure that technical improvements continue independently. The major goals of ACRONYM will include a focus on the following aspects of standards uptake:

Documentation: One of the first goals must be to improve the documentation of the stack components and related tools, to lower the barrier to entry, establish user expectations, and create a "common sense" within the user community about the APIs that implement the standards. This goal encompasses both content and web access and design. ACRONYM will prioritize and delegate work towards the improvement of stack documentation.

Community: Momentum behind ongoing adoption and development of the EvoInfo Stack depends on a growing and independently interacting user base. Along with documentation improvement, ACRONYM will explore the establishment of a responsive Help Desk facility that can field questions and funnel bug and enhancement requests to the right developer groups. This in itself will be a catalyst for user community development. We also intend to explore and implement ways to disseminate promotional and training resources for the nascent community, including, e.g., widely accessible presentations and workshops.

Reference Implementations: We believe the stack will reach its broadest and most complete usefulness only when a core of major data providers implements web services based on PhyloWS. This will enable the establishment of EvoInfo Stack-based computable workflows. Our opinion is that some of the largest scientific advances will emerge only from very large, automated analyses that are possible only via workflow technology. To this end, we will seek to engage at least three major data and analysis providers in an effort to implement a functional, cross-provider test implementation of the EvoInfo Stack.

Proposed Activities

("What specific data and analytical tools will be used? How will synthesis occur?")

The working group will perform most of its work in five-day meetings twice a year over two years. Some of that work will be focused on planning and helping to host events in the service of the Working Group's goals. We plan to submit requests to NESCent for support for these events, as did the EvoInfo Working Group. These events are likely to include at least one visiting scholar session, to bring a small number (2-3) of specialist programmers and scientists together for 1-2 weeks to work out the foundation of an online community presence for the EvoInfo Stack. We also will seek to plan a "Doc-a-thon", to be held after the online foundation is laid, bringing 5-7 individuals together to fill in EvoInfo Stack documentation, ensure that stack components are available easily and coherently online, unify the presentation and establish protocols for documentation updates.

If funds are available, we would use at least part of a later Working Group meeting to plan and organize an EvoInfo Stack workshop. Assuming that user interest has grown, we would be in a position to invite representatives of the most active participants in the community to this event, which would consist of talks, how-to sessions, and community dialogue.

Promoting and advising reference implementations is a key goal of the Working Group. Although this tends more to 'social engineering' than software engineering, we believe we can specify concrete deliverables from our reference implementors that will provide the Working Group with definable and measurable outcomes. For example, we may require that any candidate implementor be able to accept and emit RDF-annotated NeXML under the PhyloWS protocol after EvoInfo Stack implementation. Establishing a clear and reachable benchmark like this will allow the Working Group to approach potential collaborators with a concrete plan, allow collaborators to assess the feasibility of the implementation, and provide a goal toward which work can be oriented.
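To illustrate how a benchmark of this kind could be made measurable, here is a minimal sketch in Python of an automated check on a document emitted by a candidate implementor. It is only an illustration, not an agreed conformance test: the function name check_annotated_nexml is hypothetical, the namespace URI is an assumption about the NeXML schema, and the rule "at least one meta element carrying a property or rel attribute" is an assumed stand-in for "RDF-annotated". It does not exercise the PhyloWS protocol itself.

```python
import xml.etree.ElementTree as ET

# Assumption: the namespace URI used by NeXML documents; adjust if the schema differs.
NEXML_NS = "http://www.nexml.org/2009"


def check_annotated_nexml(text, ns=NEXML_NS):
    """Return a list of problems found; an empty list means the sketch check passes."""
    try:
        root = ET.fromstring(text)
    except ET.ParseError as err:
        # Nothing else can be checked if the document does not parse.
        return [f"not well-formed XML: {err}"]

    problems = []

    # The root element should be <nexml> in the assumed NeXML namespace.
    if root.tag != f"{{{ns}}}nexml":
        problems.append(f"root element is {root.tag!r}, expected nexml in {ns}")

    # Assumed stand-in for "RDF-annotated": at least one <meta> element
    # that carries a property or rel attribute.
    metas = [
        el for el in root.iter(f"{{{ns}}}meta")
        if el.get("property") or el.get("rel")
    ]
    if not metas:
        problems.append("no meta annotations with a property or rel attribute found")

    return problems
```

A check like this could be run against the output of each reference implementation, with the list of reported problems serving as a simple progress measure.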

  • Meeting #1
      State of the Stack address; Technical priorities and plans; Dissemination priorities and plans
      Discussion and planning of reference implementations
  • Meeting #2
      State of the Stack address; ...
      EvoInfo Doc-a-thon
  • Meeting #3
      State of the Stack address; ...
      EvoInfo Stack Workshop: final 2 days of the second 5-day meeting
      (along with ACRONYM members, some key invitees to this in "markets" we want to "penetrate")
  • Meeting #4
      State of the Stack address; Reports, future directions

Other possibilities:

  • Bio* Hackathon / invited/interested reps from BioPerl/Java/Python/++
  • perhaps co-sponsored by the Open Bioinformatics Foundation
  • emphasis on EIStack, but not confined to this
  • Virtual Doc-a-thon

Names of Proposed Participants

("Indicate which individuals are already committed to participating. Not all participants need to be specified in advance; if unspecified, the type of expertise needed should be indicated. Working Groups should include some participants with appropriate analytical and/or IT expertise. If non-US participants are proposed, briefly describe why their participation is essential to the success of the project. Please review our NESCent Travel Arrangements and NSF/NESCent Travel Policy, and, in particular, information on non-US visitors and international airfare.")

  1. Mark A. Jensen, PhD; Fortinbras Research/BioPerl; email
  2. Enrico Pontelli, PhD; New Mexico State University/Computer Science; email
  3. Rutger A. Vos, PhD; University of British Columbia/NeXML; email
  4. Arlin Stoltzfus, PhD; University of Maryland/NIST; email

Rationale for NESCent support

("Why can this activity be most effectively conducted at NESCent?")

NESCent is a major stakeholder in the EvoInfo Stack, having supported the Evolutionary Informatics Working Group not only with money, space, and IT resources, but also with the intellectual contributions and time of its own personnel. The stack is as much a product of NESCent's unique contributions as of any of the institutions represented by EvoInfo's members, and we believe NESCent stands to gain as much as any institution by widespread adoption of the stack's standards.

NESCent's physical and IT facilities are well-suited to the kind of collaborative work ACRONYM will pursue, both on-site and remotely. In addition, though an important stakeholder in the EvoInfo project, NESCent remains an institution-neutral place to hold frank discussions among individuals who represent possible competitors. The compromise and successful integration of two competing Java APIs for NeXML through the dialogue and collaboration of the Database Interoperability Hackathon is one example of how NESCent is able to provide the right setting and the right resources to pave the way for synthetic work. We feel that NESCent sponsorship of ACRONYM would help keep up the collaborative momentum it helped establish for the EvoInfo Working Group.

Anticipated IT Needs

("Briefly describe any needs for IT support that are important to the success of the proposed project. Please indicate whether long-term maintenance of a public database will be expected.")

Proposed Timetable

Starting in spring of 2010, the ACRONYM group will meet twice a year for two years, with the exact schedule to be determined later. Each meeting will last for 3-5 days and include presentations, discussion, programming, and progress reports.

Anticipated Results

("include anticipated papers, data and software products, and anticipated public release of data and products")

Short CV of Project Leaders (2 pages for each)

("Do not include talks, society memberships, nor papers in preparation.")

  • These will be added to the final document. --maj