NSF Interop Proposal

From Evolutionary Informatics Working Group

Revision as of 14:14, 27 July 2009

Background

Due to the foundation laid (over several years) by many people, including NESCent's informatics staff, the Evolutionary Informatics working group, and the participants in the recent data interop hackathon, we are in a position to apply for an NSF Interop proposal.

This funding program provides up to $250K per year to support data interoperability networks. The networks are multidisciplinary interoperability projects; a network proposal should have both a community aspect and a technology aspect. The deadline (possibly the last deadline for this program) is July 23.

What makes us competitive:

  • our past success in developing the interop technologies NeXML, CDAO, and PhyloWS
  • the 3-part interop formula of data syntax (NeXML), semantics (CDAO), and services (PhyloWS)
  • our past success in actual demonstration projects that show off interop technology
  • our demonstrated commitment to including diverse projects
  • our connections with a network of researchers, programmers, and data providers

Over the course of 4 weeks in June and July, a group of individuals developed a proposal for a data interoperability network focused on trees and associated data and metadata. Two key features of the proposal are the use of hackathons and the use of the "EvoIO Stack" (NeXML, CDAO, PhyloWS) as a technological nucleus for growing an interop network.

The Proposal

Project Summary

INTEROP: A network for enabling community-driven standards to link evolution into the global web of data (EvoIO)

PI: Arlin Stoltzfus, Center for Advanced Research in Biotechnology, University of Maryland Biotechnology Institute (CDAO, Bio::NEXUS, Nexplorer). Co-PIs: Karen Cranston, EOL and Field Museum of Natural History; Enrico Pontelli, New Mexico State University, Computer Science (CDAO); Sheldon McKay, Cold Spring Harbor Laboratory (GMOD, modENCODE, iPlant); Hilmar Lapp, NESCent (PhyloWS, BioSQL); Nico Cellinese, University of Florida, Florida Museum of Natural History (TOLKIN, RegNum).

Intellectual Merit. Evolutionary trees (“phylogenies”) organize knowledge of biodiversity and provide a framework for rigorous methods of comparative analysis used throughout the biosciences. In the past, the scope of a tree-based analysis was limited to the researcher’s “own” small data set. The great mass of currently available data makes possible far-reaching and systematic analyses, but only if trees (and associated data and metadata) can be accessed, searched, retrieved, and repurposed; that is, only if the data are interoperable. An integrated solution to this problem requires attention to the syntax and semantics of data, metadata, and services. Over the past 3 years, an “Evolutionary Informatics” working group funded by NESCent (an NSF Center) developed an interoperability “stack” consisting of NeXML (a file format for comparative data), the Comparative Data Analysis Ontology (CDAO), and PhyloWS (a web services standard). Recently, the group staged a “hackathon” that engaged a fresh group of researcher-programmers (chosen to represent community data resources) to learn, apply, and extend the EvoIO Stack, with results that show the remarkable promise of this approach to train early-career scientists, disseminate standards, and improve interoperability. The investigators will build on this approach and on their unique technology and experience to engage a larger community in improving interoperability of trees with associated data and metadata (e.g., taxonomic affiliations, sources, and character data). The EvoIO Network will organize hackathons, hold training workshops, host working groups, and implement infrastructure for community-building around emerging standards. Network staff will provide technical expertise in knowledge representation and bioinformatics, working to support standards and to build reference implementations. The resulting EvoIO community will extend broadly into systematics-biodiversity, comparative genomics, and phylogenetics, and will penetrate into key areas of community ecology, phylogenetic epidemiology, and paleobiology.

Broader impacts. The research areas affected by this proposal (all those areas in which phylogenetic trees are used routinely) are diverse and currently are not unified by professional organizations, software platforms, or standards. By bringing together scientists from various disciplines, we will develop awareness of the need for standards, cohesion around preferred approaches to interoperability, and ultimately a broad consensus on specific standards. This will be accomplished by building on the momentum of work done under prior NSF funding via NESCent. The key to developing a cohesive community in the absence of pre-existing cohesion is the hackathon mechanism, which generates success stories and arms young researcher-programmers with the know-how to create further successes. Through this mechanism, user requirements will be translated into standards and specifications, and implemented in community software tools. Reference implementations (developed concurrently with standards and specifications) will be used to aid in standards development and training. Hackathons will take place in eastern, western, and central locations to maximize diversity in impact, and will include strategically selected participants as well as a large fraction of participants chosen in response to a broad solicitation in the biodiversity, systematics, genomics, and phylogenetics communities. Standards and specifications developed by the Network will be disseminated via the relevant international standards group (the TDWG Phylogenetics Standard Interest Group). Efforts will be made to integrate ideas from this project into existing educational and outreach programs, with particular focus on involving students from NMSU (a minority-serving institution).

Project Description

The Project Description is available as a PDF: ProjectDescription_opt.pdf