Difference between revisions of "NSF Interop Proposal"

From Evolutionary Informatics Working Group

Latest revision as of 11:41, 8 October 2009

As a follow-up to the successful Database Interop Hackathon, a group of individuals submitted a proposal to NSF for a multi-year, $750,000 project to fund an EvoIO Network.

Background

Thanks to the foundation laid over several years by many people, including NESCent's informatics staff, the Evolutionary Informatics working group, and the participants in the recent data interop hackathon, we are in a position to apply to the NSF INTEROP program (http://www.nsf.gov/pubs/2007/nsf07565/nsf07565.htm). This program provides up to $250,000 per year to support a data interoperability network. The network should be multidisciplinary, and the network proposal should have both a community aspect and a technology aspect. The deadline (possibly the last deadline for this program) is July 23.

What makes us competitive:

  • our past success in developing the interop technologies NeXML, CDAO, and PhyloWS
  • the 3-part interop formula of data syntax (NeXML), semantics (CDAO), and services (PhyloWS)
  • our past success in actual demonstration projects that show off interop technology
  • our demonstrated commitment to including diverse projects
  • our connections with a network of researchers, programmers, and data providers

Over the course of 4 weeks in June and July, a group of individuals developed a proposal for a data interoperability network focused on trees and their associated data and metadata. Two key features of the proposal are the use of hackathons and the use of the "EvoIO Stack" (NeXML, CDAO, PhyloWS) as a technological nucleus for growing an interop network.
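As a rough illustration of the syntax layer of the stack, the sketch below reads a small NeXML-like document using only Python's standard library and lists the taxon labels attached to the tips of each tree. It is an assumption-laden example, not part of the proposal: the sample data are hypothetical, the element and attribute names follow the general shape of NeXML but are not validated against the schema, and the helper name tip_labels is made up for this sketch. The semantic (CDAO) and service (PhyloWS) layers are not shown.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified NeXML-like document. The namespace URI and the
# element names are assumptions for illustration; nothing here is validated
# against the real NeXML schema.
NEXML_LIKE = """
<nexml xmlns="http://www.nexml.org/2009" version="0.9">
  <otus id="taxa1">
    <otu id="t1" label="Homo sapiens"/>
    <otu id="t2" label="Pan troglodytes"/>
  </otus>
  <trees id="trees1" otus="taxa1">
    <tree id="tree1">
      <node id="n1" root="true"/>
      <node id="n2" otu="t1"/>
      <node id="n3" otu="t2"/>
      <edge id="e1" source="n1" target="n2" length="0.1"/>
      <edge id="e2" source="n1" target="n3" length="0.2"/>
    </tree>
  </trees>
</nexml>
"""

# Prefix map used to address the namespaced elements in find/iterfind calls.
NS = {"nex": "http://www.nexml.org/2009"}


def tip_labels(xml_text):
    """Return {tree id: [taxon labels of nodes that reference an OTU]}."""
    root = ET.fromstring(xml_text.strip())
    # Map each OTU id to its human-readable label.
    labels = {
        otu.get("id"): otu.get("label")
        for otu in root.iterfind(".//nex:otu", NS)
    }
    result = {}
    for tree in root.iterfind(".//nex:tree", NS):
        # Only nodes carrying an "otu" attribute are linked to a taxon.
        result[tree.get("id")] = [
            labels[node.get("otu")]
            for node in tree.iterfind("nex:node", NS)
            if node.get("otu") in labels
        ]
    return result


if __name__ == "__main__":
    print(tip_labels(NEXML_LIKE))
```

The point of the sketch is only that a tree and its taxa live in one structured, machine-readable document, so a generic XML parser can recover the link between tree nodes and taxa without format-specific guesswork.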

The Proposal

Project Summary

INTEROP: A network for enabling community-driven standards to link evolution into the global web of data (EvoIO)

PI: Arlin Stoltzfus, Center for Advanced Research in Biotechnology, University of Maryland Biotechnology Institute (CDAO, Bio::NEXUS, Nexplorer). Co-PIs: Karen Cranston, EOL and Field Museum of Natural History; Enrico Pontelli, New Mexico State University, Computer Science (CDAO); Sheldon McKay, Cold Spring Harbor Laboratory (GMOD, modENCODE, iPlant); Hilmar Lapp, NESCent (PhyloWS, BioSQL); Nico Cellinese, University of Florida, Florida Museum of Natural History (TOLKIN, RegNum).

Intellectual Merit. Evolutionary trees (“phylogenies”) organize knowledge of biodiversity and provide a framework for rigorous methods of comparative analysis used throughout the biosciences. In the past, the scope of a tree-based analysis was limited to the researcher’s “own” small data set. The great mass of currently available data makes possible far-reaching and systematic analyses, but only if trees (and associated data and metadata) can be accessed, searched, retrieved, and repurposed; that is, only if the data are interoperable. An integrated solution to this problem requires attention to the syntax and semantics of data, metadata, and services. Over the past 3 years, an “Evolutionary Informatics” working group funded by NESCent (an NSF Center) developed an interoperability “stack” consisting of NeXML (a file format for comparative data), the Comparative Data Analysis Ontology (CDAO), and PhyloWS (a web services standard). Recently, the group staged a “hackathon” that engaged a fresh group of researcher-programmers (chosen to represent community data resources) to learn, apply, and extend the EvoIO Stack, with results that show the remarkable promise of this approach to train early-career scientists, disseminate standards, and improve interoperability. The investigators will build on this approach and on their unique technology and experience to engage a larger community in improving interoperability of trees with associated data and metadata (e.g., taxonomic affiliations, sources, and character data). The EvoIO Network will organize hackathons, hold training workshops, host working groups, and implement infrastructure for community-building around emerging standards. Network staff will provide technical expertise in knowledge representation and bioinformatics, working to support standards and to build reference implementations. The resulting EvoIO community will extend broadly into systematics-biodiversity, comparative genomics, and phylogenetics, and will penetrate into key areas of community ecology, phylogenetic epidemiology, and paleobiology.

Broader impacts. The research areas affected by this proposal (all those areas in which phylogenetic trees are used routinely) are diverse and currently are not unified by professional organizations, software platforms, or standards. By bringing together scientists from various disciplines, we will develop awareness of the need for standards, cohesion around preferred approaches to interoperability, and ultimately a broad consensus on specific standards. This will be accomplished by building on the momentum of work done under prior NSF funding via NESCent. The key to developing a cohesive community in the absence of pre-existing cohesion is the hackathon mechanism, which generates success stories and arms young researcher-programmers with the know-how to create further successes. Through this mechanism, user requirements will be translated into standards and specifications, and implemented in community software tools. Reference implementations (developed concurrently with standards and specifications) will be used to aid in standards development and training. Hackathons will take place in eastern, western, and central locations to maximize diversity in impact, and will include strategically selected participants as well as a large fraction of participants chosen in response to a broad solicitation in the biodiversity, systematics, genomics, and phylogenetics communities. Standards and specifications developed by the Network will be disseminated via the relevant international standards group (the TDWG Phylogenetics Standard Interest Group). Efforts will be made to integrate ideas from this project into existing educational and outreach programs, with particular focus on involving students from NMSU (a minority-serving institution).

Project Description

The Project Description is available as a PDF.