Difference between revisions of "NSF Interop Proposal"

From Evolutionary Informatics Working Group

Revision as of 16:16, 25 June 2009

Overview and talking points

Through the work of NESCent's informatics staff, the Evolutionary Informatics working group, and the participants in the recent data interop hackathon, we are in a position to apply for an NSF Interop proposal.

This funding program provides $250K per year to support interoperability projects that are multidisciplinary and that have both a community aspect and a technology aspect. The next deadline (possibly the last for this program) is July 23.

What makes us competitive:

  • our past success in developing the interop technologies nexml, CDAO and PhyloWS
  • the 3-part interop formula of data syntax (nexml), semantics (CDAO) and services (PhyloWS)
  • our past success in actual demonstration projects that show off interop technology
  • our demonstrated commitment to including diverse projects
  • our connections with a network of researchers, programmers, and data providers

Planning documents

What we need to do

The first priorities are to work out the scope of the project and the major aims, which should be consistent with

  1. who we are, what we've done and what we want to do, and
  2. what is required for a successful NSF interop grant, and what kind of support the program provides

A key concept for the proposal will be community involvement. We have developed nexml, CDAO and PhyloWS with the aim of serving community interop needs. So far, however, these tools have seen only limited use. Let's imagine some future point where they are full-fledged community resources, widely supported in the phylogenetics community (as BioPerl is now), with

  • many people involved in development (i.e., many "eyes on code")
  • documentation and training resources readily available for anyone who wants to learn
  • many people trained to use the tools
  • many research projects willing to contribute to maintaining and improving these tools
  • symposia and satellite conferences at major meetings

How do we get to this point? We need to convince people that our tools (CDAO, nexml, PhyloWS) are effective and open, and that they have sufficient "critical mass" to represent a safe technology investment, something that will continue to be supported and useful in the future. BioPerl, for instance, is used in various genome projects and, for this reason, it won't be abandoned anytime soon: it has critical mass. I think we can show that our tools are effective with demonstration projects, but to reach critical mass we need to build a community of supporters and participants, so that these tools become community resources.

The NSF Interop program can help us get to that point. It will provide support for meetings and workshops, along with a modest amount of support for technical staff. The staff support could be used to pay programmers to develop the tools that support nexml, CDAO and PhyloWS. We could focus this technical support on either

  • one or a few integrative projects that we would implement in order to showcase the technologies
  • generalized support for many projects carried out individually by members of the collaborative

There are a few advantages to the latter approach. First, we will be more focused on the standards and less on the final product. This will help to distinguish us from projects like the iPlant Tree of Life. Second, working with many projects will ensure that our solutions are generalized, rather than being biased by the choice of a few data types or providers for a showcase project.

In addition to workshops and meetings aimed at development of standards and tools, we should also aim for some training workshops:

  • for data providers (how do I share my data?)
  • for data users (what data is available and how do I find and obtain it?). This could be an independent workshop, or integrated into an existing program (Woods Hole Molecular Evolution, Bodega Bay Phylogenetics, Computational Phyloinformatics at NESCent, etc.).


  1. Karen Cranston, EOL and Field Museum of Natural History (I would be coming at this from the perspective of both a provider (PhyLoTA) and a user (EOL, Treeviz working group) of phylogenetic data. I am officially working with the EOL, and also have a connection to the iPlant Tree of Life group; both are going to need these tools.)

Senior Personnel

  1. Enrico Pontelli, New Mexico State University, Computer Science
  2. Rutger Vos, University of British Columbia, Zoology

Synopsis for collaborators

(This is the synopsis to use in requests for letters of support)

List of collaborating projects and institutions

Some of this can be drawn from the hackathon wiki

Initial draft of NSF Proposal

Project Summary

Project Description

What are the collaborative aspects of this project?



Results from past research

Research Design

training workshops and other meetings

software development

Use cases

Budget justification

  • meeting costs
  • staffing costs
    • software design and implementation
    • use case testing