Difference between revisions of "NSF Interop Proposal"

From Evolutionary Informatics Working Group
(36 intermediate revisions by 6 users not shown)
== What we need to do ==

The first priorities are to work out the scope of the project, and major aims that are consistent with
# who we are, what we've done and what we want to do, and
# what is required for a successful NSF interop grant, and what kind of support the program provides
 
 
A key concept for the proposal will be '''community involvement'''.  We have developed nexml, CDAO and phyloWS with the aim of serving community interop needs.  However, so far these tools have seen only limited use.  Let's imagine some future point where these are full-fledged community resources, widely supported in the phylogenetics community (like BioPerl is now), with
* many people involved in development (i.e., many "eyes on code")
* documentation and training resources readily available for anyone who wants to learn
* many people trained to use the tools
* many research projects willing to contribute to maintaining and improving these tools
* symposia and satellite conferences at major meetings
 
 
How do we get to this point?  We need to convince people that our tools (CDAO, nexml, phyloWS) are '''effective''' and '''open''', and that they have sufficient "critical mass" to represent a safe technology investment, something that will continue to be supported and be useful in the future.  BioPerl, for instance, is used in various genome projects and, for this reason, it won't be abandoned anytime soon; it has critical mass.  I think we can show that our tools are effective with demonstration projects, but in order to get critical mass we need to build a community of supporters and participants, so that these tools become community resources.
 
 
 
The NSF Interop program can help us get to that point.  It will provide support for meetings and workshops, along with a modest amount of support for technical staff.  The staff support could be used to pay programmers to develop the tools that support nexml, CDAO and phyloWS.  We could focus this technical support on
* one or a few integrative projects that we would implement in order to showcase the technologies
* generalized support for many projects carried out individually by members of the collaborative
  
== PIs ==

== Senior Personnel ==
# Enrico Pontelli, New Mexico State University, Computer Science

== Synopsis for collaborators ==

(This is the synopsis to use in requests for letters of support)

== List of collaborating projects and institutions ==

Some of this can be drawn from the hackathon wiki

= Initial draft of NSF Proposal =

== Project Summary ==

== Project Description ==

What are the collaborative aspects of this project?
 
 
=== Aims ===

=== Background ===

=== Results from past research ===

=== Research Design ===

==== training workshops and other meetings ====

==== software development ====

==== Use cases ====

=== Budget justification ===

* meeting costs
* staffing costs
** software design and implementation
** use case testing

Latest revision as of 11:41, 8 October 2009

As a follow-up activity to the successful Database Interop Hackathon, a group of individuals submitted a proposal to NSF for a multi-year $750,000 project to fund an EvoIO Network.

Background

Thanks to the foundation laid over several years by many people, including NESCent's informatics staff, the Evolutionary Informatics working group, and the participants in the recent data interop hackathon, we are in a position to apply to the NSF INTEROP program. This program provides up to $250K per year to support a data interoperability network. The network should be multidisciplinary, and the network proposal should have both a community aspect and a technology aspect. The deadline (possibly the last deadline for this program) is July 23.

What makes us competitive:

  • our past success in developing the interop technologies NeXML, CDAO and PhyloWS
  • the 3-part interop formula of data syntax (NeXML), semantics (CDAO) and services (PhyloWS)
  • our past success in actual demonstration projects that show off interop technology
  • our demonstrated commitment to including diverse projects
  • our connections with a network of researchers, programmers, and data providers

Over the course of 4 weeks in June and July, a group of individuals developed a proposal for a data interoperability network focused on trees and associated data and metadata. Two key features of the proposal are the use of hackathons and the use of the "EvoIO Stack" (NeXML, CDAO, PhyloWS) as a technological nucleus for growing an interop network.

The Proposal

Project Summary

INTEROP: A network for enabling community-driven standards to link evolution into the global web of data (EvoIO)

PI: Arlin Stoltzfus, Center for Advanced Research in Biotechnology, University of Maryland Biotechnology Institute (CDAO, Bio::NEXUS, Nexplorer). Co-PIs: Karen Cranston, EOL and Field Museum of Natural History; Enrico Pontelli, New Mexico State University, Computer Science (CDAO); Sheldon McKay, Cold Spring Harbor Laboratory (GMOD, modENCODE, iPlant); Hilmar Lapp, NESCent (PhyloWS, BioSQL); Nico Cellinese, University of Florida, Florida Museum of Natural History (TOLKIN, RegNum).

Intellectual Merit. Evolutionary trees (“phylogenies”) organize knowledge of biodiversity and provide a framework for rigorous methods of comparative analysis used throughout the biosciences. In the past, the scope of a tree-based analysis was limited to the researcher’s “own” small data set. The great mass of currently available data makes possible far-reaching and systematic analyses, but only if trees (and associated data and metadata) can be accessed, searched, retrieved, and repurposed; that is, only if the data are interoperable. An integrated solution to this problem requires attention to the syntax and semantics of data, metadata, and services. Over the past 3 years, an “Evolutionary Informatics” working group funded by NESCent (an NSF Center) developed an interoperability “stack” consisting of NeXML (a file format for comparative data), the Comparative Data Analysis Ontology (CDAO) and PhyloWS (a web services standard). Recently, the group staged a “hackathon” that engaged a fresh group of researcher-programmers (chosen to represent community data resources) to learn, apply, and extend the EvoIO Stack, with results that show the remarkable promise of this approach to train early-career scientists, disseminate standards, and improve interoperability. The investigators will build on this approach and on their unique technology and experience to engage a larger community in improving interoperability of trees with associated data and metadata (e.g., taxonomic affiliations, sources, character data, etc.). The EvoIO Network will organize hackathons, hold training workshops, host working groups, and implement infrastructure for community-building around emerging standards. Network staff will provide technical expertise in knowledge representation and bioinformatics, working to support standards and to build reference implementations. The resulting EvoIO community will extend broadly into systematics-biodiversity, comparative genomics, and phylogenetics, and will penetrate into key areas of community ecology, phylogenetic epidemiology and paleobiology.

Broader impacts. The research areas affected by this proposal (all those areas in which phylogenetic trees are used routinely) are diverse and currently are not unified by professional organizations, software platforms, or standards. By bringing together scientists from various disciplines, we will develop awareness of the need for standards, cohesion around preferred approaches to interoperability, and ultimately a broad consensus on specific standards. This will be accomplished by building on the momentum of work done under prior NSF funding via NESCent. The key to developing a cohesive community in the absence of pre-existing cohesion is the hackathon mechanism, which generates success stories and arms young researcher-programmers with the know-how to create further successes. Through this mechanism, user requirements will be translated into standards and specifications, and implemented in community software tools. Reference implementations (developed concurrently with standards and specifications) will be used to aid in standards development and training. Hackathons will take place in eastern, western, and central locations to maximize diversity in impact, and will include strategically selected participants as well as a large fraction of participants chosen in response to a broad solicitation in the biodiversity, systematics, genomics, and phylogenetics communities. Standards and specifications developed by the Network will be disseminated via the relevant international standards group (the TDWG Phylogenetics Standard Interest Group). Efforts will be made to integrate ideas from this project into existing educational and outreach programs, with particular focus on involving students from NMSU (a minority-serving institution).

Project Description

The Project Description is available as a PDF.