NSF Interop Proposal

From Evolutionary Informatics Working Group
= Background =
  
Thanks to the foundation laid over several years by many people, including NESCent's informatics staff, the Evolutionary Informatics working group, and the participants in the recent data interop hackathon, we are in a position to submit an NSF [http://www.nsf.gov/pubs/2007/nsf07565/nsf07565.htm INTEROP proposal].
  
This funding program provides up to $250K per year to support ''data interoperability networks''. These networks are multidisciplinary interoperability projects, and a network proposal should have both a community aspect and a technology aspect. The deadline (possibly the last deadline for this program) is July 23.
  
 
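What makes us competitive:

* our past success in developing the interop technologies nexml, CDAO and PhyloWS
* the 3-part interop formula of data syntax (nexml), semantics (CDAO) and services (phyloWS)
* our past success in actual demonstration projects that show off interop technology
* our demonstrated commitment to including diverse projects
* our connections with a network of researchers, programmers, and data providers
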
Over the course of 4 weeks in June and July, a group of individuals developed a proposal for a data interoperability network focused on trees and associated data and metadata. Two key features of the proposal are the use of hackathons and the use of the "EvoIO Stack" (nexml, CDAO, phyloWS) as a technological nucleus for growing an interop network.

= Planning documents =

== Key aspects of the INTEROP program ==

 
Here are some key points to keep in mind for INTEROP:
 
# '''community involvement AND enabling technologies'''. This is not a science or technology proposal; it's a '''network''' proposal. A successful proposal needs to focus on the network, with technology playing a supporting role. We need to show that we are ready to respond to a community's needs, and that we have the technical expertise to support standards or conventions that arise in response to those needs. If the community needs a web services standard, we need to be able to develop one. To do this, we need to build a community, using workshops, web sites, mailing lists and so on. We have been doing a lot of that, but it needs to be opened up even more. I think we are on solid ground here.
# '''cross-cutting'''. A successful proposal needs to address more than one disciplinary area. This may be a challenge for us. Our interests range from molecular evolution to species diversity, but all of this is within the life sciences. We have a computer scientist, but we might need more. What other disciplines could be involved, e.g., earth sciences, physics, behavior? The program also looks for diversity in the '''types of data''' involved. So, if we address phylogenies, taxonomic classes, and comparative data, this is much broader than if we just focus on trees.
# '''community engagement'''. We need to do more than just involve a community; we need to be proactive and responsive. "Proposals for activities not based on significant community engagement and consensus-building activities are not responsive to this solicitation and will be returned without review". We have developed nexml, CDAO and phyloWS with the aim of serving community interop needs. However, so far these tools have seen limited use. Let's imagine some future point where these are full-fledged community resources, widely supported in the phylogenetics community (like BioPerl is now), with
#* many people involved in development (i.e., many "eyes on code")
#* documentation and training resources readily available for anyone who wants to learn
#* many people trained to use the tools
#* many research projects willing to contribute to maintaining and improving these tools
#* symposia and satellite conferences at major meetings
 
 
 
The NSF INTEROP program will provide support for meetings and workshops, along with a modest amount of support for technical staff.
 
 
 
== Possible components of a proposal ==
 
 
 
=== technology approach ===
 
 
 
The approach we have discussed is sometimes called the "evoinfo stack", consisting of
 
* CDAO (semantics)
 
* nexml (syntax)
 
* phyloWS (services)
 
 
 
Together, these three represent an integrated approach to tackling interop problems.
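To make the "syntax" layer a little more concrete, here is a minimal Python sketch (standard library only) that serializes one small tree in a NeXML-like layout. It assumes the NeXML 2009 namespace, a version attribute of "0.9", and the otus/otu, trees/tree, node and edge element names as we understand them from the NeXML schema; the id scheme (otu_A, e1, ...) and the function name minimal_nexml are hypothetical. It is an illustration to check against the schema, not a validated NeXML writer, and it does not touch the semantics (CDAO) or services (phyloWS) layers.

```python
import xml.etree.ElementTree as ET

# Namespace URIs as we understand them from the NeXML 2009 schema
# (assumption: verify against the published schema before relying on them).
NEX = "http://www.nexml.org/2009"
XSI = "http://www.w3.org/2001/XMLSchema-instance"
ET.register_namespace("nex", NEX)
ET.register_namespace("xsi", XSI)


def q(tag: str) -> str:
    """Qualify a tag name with the NeXML namespace."""
    return f"{{{NEX}}}{tag}"


def minimal_nexml(edges: list[tuple[str, str, float]], leaves: set[str]) -> str:
    """Serialize one rooted tree as a NeXML-style document (hypothetical helper).

    edges:  (parent_id, child_id, branch_length) triples
    leaves: node ids that are terminal taxa; each one gets an <otu> whose
            label is the node id (a hypothetical convention for this sketch)
    """
    root = ET.Element(q("nexml"), {"version": "0.9"})

    # Taxa live in their own <otus> block, separate from the tree topology.
    otus = ET.SubElement(root, q("otus"), {"id": "otus1"})
    for leaf in sorted(leaves):
        ET.SubElement(otus, q("otu"), {"id": f"otu_{leaf}", "label": leaf})

    trees = ET.SubElement(root, q("trees"), {"id": "trees1", "otus": "otus1"})
    tree = ET.SubElement(trees, q("tree"),
                         {"id": "tree1", f"{{{XSI}}}type": "nex:FloatTree"})

    # Every id mentioned in an edge becomes a <node>; the node that is never
    # a child is flagged as the root. Insertion order is preserved.
    children = {child for _, child, _ in edges}
    node_ids: list[str] = []
    for parent, child, _ in edges:
        for n in (parent, child):
            if n not in node_ids:
                node_ids.append(n)
    for n in node_ids:
        attrs = {"id": n}
        if n in leaves:
            attrs["otu"] = f"otu_{n}"
        if n not in children:
            attrs["root"] = "true"
        ET.SubElement(tree, q("node"), attrs)

    # Edges reference node ids and carry branch lengths.
    for i, (parent, child, length) in enumerate(edges, start=1):
        ET.SubElement(tree, q("edge"), {"id": f"e{i}", "source": parent,
                                        "target": child, "length": str(length)})
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # ((A,B)n2,C)n1 with made-up branch lengths.
    print(minimal_nexml([("n1", "n2", 0.5), ("n2", "A", 1.0),
                         ("n2", "B", 1.2), ("n1", "C", 2.0)], {"A", "B", "C"}))
```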
 
 
 
=== a really good name ===
 
 
 
=== key collaborators ===
 
 
 
We have a core team composed of the PI and co-PIs; we also need collaborators.
 
 
 
The collaborators should be chosen strategically.
 
 
 
The collaborators should provide letters of commitment indicating their willingness (as appropriate) to
 
* send reps to attend meetings and workshops
 
* contribute programmer hours from their own staff
 
* offer services
 
* participate in a working group or standards group
 
 
 
=== scientific challenges ===
 
 
 
The key challenge that we discussed is ensuring the interoperability of trees with associated data and metadata. This gives the proposal a clear focus, but it can be broadened to cover almost any kind of data linked to trees.
 
 
 
=== staffing ===
 
 
 
Some money can be spent on salaries. A programmer or post-doc in bioinformatics costs $40K or more in salary; with benefits and overhead added, that comes to about $70K to $80K per year.
 
 
 
=== hackathons ===
 
 
 
We bring together selected participants to work in an open-ended way, or to work on specific interoperability objectives, e.g., a reference implementation.
 
 
 
Cost:
 
* about $1K (??) per person for 4 days at NESCent
 
* about $1200 per person for 4 days at FMNH in Chicago (incl. food, lodging and ground transportation)
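The cost figures on this page can be combined into a rough per-year check against the $250K cap. The sketch below is a hypothetical Python calculator: the per-person hackathon costs and the programmer cost come from this page (using the upper $80K estimate), while the staffing level and the number of participants per hackathon are placeholders to be replaced with real plans. It ignores the solicitation's special rules on indirect costs for participant support.

```python
# Back-of-envelope annual budget check for the proposed network.
# Figures from this page: up to $250K/year total; about $70K-$80K per
# programmer post-doc (salary + benefits + overhead); about $1K per person
# at NESCent and about $1,200 per person at FMNH for a 4-day hackathon.
# Staffing levels and hackathon sizes passed in below are HYPOTHETICAL.

ANNUAL_CAP = 250_000          # INTEROP maximum per year
PROGRAMMER_COST = 80_000      # upper end of the $70K-$80K estimate
COST_PER_PERSON = {"NESCent": 1_000, "FMNH": 1_200}  # per person, 4 days


def hackathon_cost(site: str, participants: int) -> int:
    """Participant cost of one 4-day hackathon at the given site."""
    return COST_PER_PERSON[site] * participants


def annual_budget(programmers: int,
                  hackathons: list[tuple[str, int]]) -> dict[str, int]:
    """Sum staffing and hackathon costs and report what is left under the cap."""
    staff = programmers * PROGRAMMER_COST
    events = sum(hackathon_cost(site, n) for site, n in hackathons)
    total = staff + events
    return {"staff": staff, "hackathons": events, "total": total,
            "remaining": ANNUAL_CAP - total}


if __name__ == "__main__":
    # Hypothetical year: one programmer, one 25-person hackathon at each site.
    print(annual_budget(1, [("NESCent", 25), ("FMNH", 25)]))
```

With these placeholder numbers, one programmer plus two 25-person hackathons uses a bit more than half of the annual cap, leaving room for training events, working groups and travel.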
 
 
 
=== training events ===
 
 
 
Training events could focus on developers or on end-users, e.g., on how to use nexml and nexml APIs. We could provide training
 
* at a stand-alone workshop
 
* as part of another workshop or course (Woods Hole Molecular Evolution, Bodega Bay Phylogenetics, Computational Phyloinformatics at NESCent, etc).
 
* during a conference (which ones? SSE? SMBE?)
 
* online (web course)
 
 
 
=== electronic resources ===
 
 
 
* "help desk" concept
 
* coordinate mailing lists
 
* coordinate web sites
 
* some sort of interaction portal
 
 
 
=== technical support ===
 
 
 
* developing libraries
 
* developing reference implementations
 
* help desk
 
 
 
=== documentation ===
 
 
 
* getting started with the EvoInfo stack
 
* best practices for using the EvoInfo stack
 
* making the most of EvoInfo stack implementations in TOLKIN, iPlant, EOL, etc.
 
 
 
=== demos, showcases, reference implementations ===
 
 
 
* pick one project (e.g., TOLKIN) for thorough implementation of interop technologies

* numerous small demos (e.g., like those from the hackathon)

* work with key projects (iPlant, EOL) on demo projects
 
 
 
=== working groups ===
 
 
 
We might need to form some working groups to address specific issues:
 
* metadata
 
* taxonomic identifiers
 
* scoping and conflicts between ontologies
 
 
 
=== a schedule with milestones ===
 
 
 
What are we going to accomplish in year 1?
 
 
 
What about year 2?
 
 
 
Year 3?
 
 
 
== People and institutions ==
 
 
 
Some things to keep in mind:
 
* NESCent can collaborate by providing meeting space (we pay travel and logistics overhead)
 
* this is a US domestic program, but good to have international collaboration
 
 
 
=== PI, Co-PIs, and senior project personnel ===
 
 
 
This proposal needs a '''single PI''' from the coordinating institution. However, there is no limit to the number of Co-PIs.
 
 
 
Here are the people interested so far:
 
# Karen Cranston, EOL and Field Museum of Natural History (I would be coming at this from the perspective of both a provider (PhyLoTA) and user (EOL, Treeviz working group) of phylogenetic data. I am officially working with the EOL, and also have a connection to the iPlant Tree of Life group, both of which are going to need these tools.)
# Enrico Pontelli, New Mexico State University, Computer Science
# Rutger Vos, University of British Columbia, Zoology
# Arlin Stoltzfus, University of Maryland Biotechnology Institute
# Sheldon McKay, Cold Spring Harbor Laboratory (GMOD, modENCODE, iPlant)
# Hilmar Lapp, NESCent
 
 
 
=== Network members ===
 
These are projects that we would like to engage as active members of the INTEROP network. We will need a letter of collaboration, with specific contribution goals, from each project.
 
 
 
{| class="wikitable"
|-
! Project
! Who Will Contact
! Details (who should we contact? have they committed to participate?)
|-
| EOL
| Karen
| Mark Westneat at FMNH contacted; will write a strong letter for EOL participation (includes TreeViz working group)
|-
| iPlant
| Sheldon?
|
|-
| TDWG
| Hilmar / Nico?
|
|-
| NCEAS
| Hilmar
| Jeannine Cavender-Bares has a working group?
|-
| NCEAS
| Nico
| potential collaboration with Mark Schildhauer's INTEROP group
|-
| PaleoBiology
| ?
| Someone had a contact in Chicago?
|-
| TreeBase
| ?
| Are we getting a letter of collaboration from Bill?
|-
| MOBot / APweb
| Hilmar?
|
|}
 
 
 
 
 
= Initial draft of NSF Proposal =
 
 
 
== Title ==
 
 
 
suggested titles (must begin with "INTEROP: "):
 
* INTEROP: Integration and re-use of phylogenetic and comparative data by an expanding research community
 
* INTEROP: Engaging an expanding community in developing and using the EvoInfo Stack
 
* INTEROP: As phyloGood as it phyloGets
 
  
 
== Project Summary ==
 
  
''The project summary has 3 parts:''
* ''Title, PI, Co-PIs, and senior project personnel''
* ''"a succinct summary of intellectual merit" including scope of activities (communities, data types, technologies), networking activities and mechanisms for participation, and ways of providing technical expertise''
* ''"a description of broader impacts" including interop, participation, education & training''

'''INTEROP: A network for enabling community-driven standards to link evolution into the global web of data (EvoIO)'''

PI: '''Arlin Stoltzfus''', Center for Advanced Research in Biotechnology, University of Maryland Biotechnology Institute (CDAO, Bio::NEXUS, Nexplorer). Co-PIs: '''Karen Cranston''', EOL and Field Museum of Natural History; '''Enrico Pontelli''', New Mexico State University, Computer Science (CDAO); '''Sheldon McKay''', Cold Spring Harbor Laboratory (GMOD, modENCODE, iPlant); '''Hilmar Lapp''', NESCent (PhyloWS, BioSQL); '''Nico Cellinese''', University of Florida, Florida Museum of Natural History (TOLKIN, RegNum).

'''Intellectual Merit'''. Evolutionary trees (“phylogenies”) organize knowledge of biodiversity and provide a framework for rigorous methods of comparative analysis used throughout the biosciences. In the past, the scope of a tree-based analysis was limited to the researcher’s “own” small data set. The great mass of currently available data makes far-reaching and systematic analyses possible, but only if trees (and associated data and metadata) can be accessed, searched, retrieved, and repurposed; that is, only if the data are interoperable. An integrated solution to this problem requires attention to the syntax and semantics of data, metadata, and services. Over the past 3 years, an “Evolutionary Informatics” working group funded by NESCent (an NSF Center) developed an interoperability “stack” consisting of NeXML (a file format for comparative data), the Comparative Data Analysis Ontology (CDAO) and PhyloWS (a web services standard). Recently, the group staged a “hackathon” that engaged a fresh group of researcher-programmers (chosen to represent community data resources) to learn, apply, and extend the EvoIO Stack, with results that show the remarkable promise of this approach to train early-career scientists, disseminate standards, and improve interoperability. The investigators will build on this approach and on their unique technology and experience to engage a larger community in improving the interoperability of trees with associated data and metadata (e.g., taxonomic affiliations, sources, character data). The EvoIO Network will organize hackathons, hold training workshops, host working groups, and implement infrastructure for community-building around emerging standards. Network staff will provide technical expertise in knowledge representation and bioinformatics, working to support standards and to build reference implementations. The resulting EvoIO community will extend broadly into systematics and biodiversity, comparative genomics, and phylogenetics, and will reach into key areas of community ecology, phylogenetic epidemiology and paleobiology.

'''Broader impacts'''. The research areas affected by this proposal (all those areas in which phylogenetic trees are used routinely) are diverse and currently are not unified by professional organizations, software platforms, or standards. By bringing together scientists from various disciplines, we will develop awareness of the need for standards, cohesion around preferred approaches to interoperability, and ultimately a broad consensus on specific standards. This will be accomplished by building on the momentum of work done under prior NSF funding via NESCent. The key to developing a cohesive community in the absence of pre-existing cohesion is the hackathon mechanism, which generates success stories and arms young researcher-programmers with the know-how to create further successes. Through this mechanism, user requirements will be translated into standards and specifications and implemented in community software tools. Reference implementations (developed concurrently with standards and specifications) will be used to aid in standards development and training. Hackathons will take place in eastern, western, and central locations to maximize diversity in impact, and will include strategically selected participants as well as a large fraction of participants chosen in response to a broad solicitation in the biodiversity, systematics, genomics, and phylogenetics communities. Standards and specifications developed by the Network will be disseminated via the relevant international standards group (the TDWG Phylogenetics Standard Interest Group). Efforts will be made to integrate ideas from this project into existing educational and outreach programs, with particular focus on involving students from NMSU (a minority-serving institution).
 
== Project Description ==
 
 
=== 1. Introduction and Network Objectives ===
 
 
Needs:

* an example of the kinds of data to be integrated
 
 
==== Network Objectives ====
 
 
=== 2. Background and Rationale ===
 
 
==== Results from Prior NSF Support ====
 
 
===== Evolutionary Informatics Working Group (NESCent) =====
 
 
* cohesion
 
* practical gains during hackathon
 
* the evoinfo stack
 
 
===== another NSF-supported project =====
 
 
===== another NSF-supported project =====
 
 
===== another NSF-supported project =====
 
 
===== another NSF-supported project =====
 
 
=== 3. The Network Plan ===
 
 
==== Vision and Rationale for the Network ====
 
==== Network Organization ====
 
(groups and deliverables for each group)
 
 
==== Network Activities ====
 
 
==== Responsibilities of the Network ====
 
 
How will the Network meet its dual responsibilities of engaging the community and providing expertise?
 
 
=== 4. Broader Impacts ===
 
 
==== Education, Outreach and Training ====
 
 
=== 5. Network Management Plan ===
 
 
== Notes on additional parts of application ==
 
 
=== References Cited ===
 
Indicate with an asterisk any cited publications from prior research funded by NSF for the PI or co-PIs.
 
 
=== Biographical Sketches ===
 
 
for PI, Co-PIs, senior personnel
 
 
=== Current and pending support ===
 
 
for PI, Co-PIs, senior personnel
 
 
=== Budget ===
 
 
Note
 
* most awards will be 3 years (special justification for 4 or 5 years)
 
* restrictions on indirect costs on "participant support" costs (see solicitation)
 
 
== Special Information and Supplementary Documentation ==
 
 
=== key personnel list ===
 
 
(max 3 pp): PI, Co-PIs, senior personnel, with a brief description of what each person brings to the Network
 
 
=== current activities===
 
 
Current activities and results under prior NSF support for the PI, co-PIs and senior personnel. Apparently this is the place for your '''other''' projects; projects that are relevant precursors to the Network proposal belong in another section.
 
 
=== letters of collaboration ===
 
 
Letters from individuals or entities with a direct, integral and essential role in the Network may be included. Letters should document willingness to participate in or contribute to Network activities. General letters of endorsement may not be included.
 
 
=== conflicts of interest list ===
 
 
"Provide a list, in a single alphabetized table, with the full names of all people with conflicts of interest for the PI, co-PIs and senior personnel", including
# PhD advisors or advisees
# collaborators or co-authors for the past 48 months
# any other individuals or orgs with which the investigator has financial ties.

Please specify the type of conflict for each listing in the table.
 
 
This table is available via google docs.  Please contact Arlin for the URL.
 

Revision as of 15:09, 27 July 2009
