Difference between revisions of "NSF Interop Proposal"

From Evolutionary Informatics Working Group

Revision as of 17:26, 29 June 2009

Overview and talking points

Through the work of NESCent's informatics staff, the Evolutionary Informatics working group, and the participants in the recent data interop hackathon, we are in a position to apply for an NSF Interop proposal.

This funding program provides $250K per year to support interoperability projects that are multidisciplinary and that have both a community aspect and a technology aspect. The next deadline (possibly the last deadline for this program) is July 23.

What makes us competitive:

  • our past success in developing interop technologies (nexml, CDAO and phyloWS)
  • the 3-part interop formula of data syntax (nexml), semantics (CDAO) and services (phyloWS)
  • our past success in actual demonstration projects that show off interop technology
  • our demonstrated commitment to including diverse projects
  • our connections with a network of researchers, programmers, and data providers

Planning documents

Key aspects of the INTEROP program

The first priorities are to work out the scope of the project, and major aims that are consistent with

  1. who we are, what we've done and what we want to do, and
  2. what is required for a successful NSF interop grant, and what kind of support the program provides

I think we are familiar with the first item, so let's get to the second one: what makes a successful INTEROP proposal? Here are some key distinctions to keep in mind for INTEROP:

  1. community involvement AND enabling technologies. A successful proposal needs to have both. We need to show that we are ready to respond to a community's needs, and that we have the technical expertise to support standards or conventions that arise in response to community needs. If the community needs a web services standard, we need to be able to develop one. In order to do this, we need to create a community, using workshops and web sites and mailing lists and so on. We have been doing a lot of that, but it needs to be opened up even more. I think we are on solid ground here.
  2. cross-cutting. A successful proposal needs to address more than one disciplinary area. This may be a challenge for us. We are diverse in terms of ranging from molecular evolution to species diversity, but this is all within the discipline of life sciences. We have a computer scientist, but we might need more. What other disciplines could be involved, e.g., earth sciences, physics, behavior? The program also looks for diversity in the types of data involved. So, if we address phylogenies, taxonomic classes, and comparative data, this is much broader than if we just focus on trees.
  3. community engagement. We need to do more than just involve a community; we need to be responsive. "Proposals for activities not based on significant community engagement and consensus-building activities are not responsive to this solicitation and will be returned without review". We have developed nexml, CDAO and phyloWS with the aim of serving community interop needs. However, so far these tools have seen only limited use. Let's imagine some future point where these are full-fledged community resources, widely supported in the phylogenetics community (like BioPerl is now), with
    • many people involved in development (i.e., many "eyes on code")
    • documentation and training resources readily available for anyone who wants to learn
    • many people trained to use the tools
    • many research projects willing to contribute to maintaining and improving these tools
    • symposia and satellite conferences at major meetings

The NSF Interop program will provide support for meetings and workshops, along with a modest amount of support for technical staff.

Possible components of a proposal

technology approach

The approach we have discussed is sometimes called the "evoinfo stack", consisting of

  • CDAO (semantics)
  • nexml (syntax)
  • phyloWS (services)

Together, these three represent an integrated approach to tackling interop problems.
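As a rough illustration of how the three layers divide the work, here is a minimal Python sketch. Everything concrete in it is an assumption made for illustration: the XML is a simplified, NeXML-like fragment rather than a schema-valid nexml file, the `cdao:...` labels are placeholders standing in for CDAO terms, and the `example.org` base URL and `tree/<id>` path are hypothetical, not actual phyloWS conventions.

```python
import xml.etree.ElementTree as ET
from urllib.parse import quote

# Syntax layer: a simplified, NeXML-like fragment (illustrative only,
# not a schema-valid nexml document).
DOC = """
<trees>
  <tree id="tree1">
    <node id="n1" label="root"/>
    <node id="n2" label="Homo sapiens"/>
    <node id="n3" label="Pan troglodytes"/>
    <edge id="e1" source="n1" target="n2"/>
    <edge id="e2" source="n1" target="n3"/>
  </tree>
</trees>
"""

# Semantics layer: map element names to ontology terms.
# These labels are placeholders, not verified CDAO identifiers.
SEMANTICS = {"tree": "cdao:Tree", "node": "cdao:Node", "edge": "cdao:Edge"}

# Services layer: hypothetical base URL for a phyloWS-style service.
BASE_URL = "http://example.org/phylows"


def describe(xml_text):
    """Return (element id, ontology term) pairs for every typed element."""
    root = ET.fromstring(xml_text)
    return [(el.get("id"), SEMANTICS[el.tag])
            for el in root.iter()
            if el.tag in SEMANTICS and el.get("id")]


def tree_url(tree_id):
    """Build a hypothetical service URL for retrieving one tree."""
    return f"{BASE_URL}/tree/{quote(tree_id)}"


if __name__ == "__main__":
    for element_id, term in describe(DOC):
        print(element_id, term)
    print(tree_url("tree1"))
```

The point of the sketch is the separation of concerns: the file format carries the structure, the ontology says what each piece means, and the service layer gives each object an address that other projects can request.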

a really good name

key collaborators

We have a core team composed of the PI and co-PIs, but we also need collaborators.

The collaborators should be chosen strategically.

The collaborators should provide letters of commitment indicating their willingness to commit (as appropriate) to

  • send reps to attend meetings and workshops
  • contribute programmer hours from their own staff
  • offer services
  • participate in a working group or standards group

scientific challenges

The key challenge that we discussed was to ensure interoperability of trees with data and metadata. This has a nice focus but it can be expanded to cover anything.

staffing

Some money can be spent on salaries. Programmer post-docs in bioinformatics cost $40K or more in salary, plus benefits and overhead, i.e., $70K to $80K in total.
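To see what these figures imply for planning, here is a small Python sketch of the arithmetic. It uses only numbers already on this page (40K salary, 70 to 80K loaded cost, 250K per year from the program); the function names are made up for illustration, and the actual benefit and overhead rates would have to come from the coordinating institution.

```python
SALARY_K = 40               # base salary from the text, in $K ("40K or more")
LOADED_COST_K = (70, 80)    # loaded cost range from the text, in $K
ANNUAL_AWARD_K = 250        # program funding per year, in $K


def loading_factor(salary_k, loaded_cost_k):
    """Ratio of fully loaded cost to base salary."""
    return loaded_cost_k / salary_k


def share_of_award(loaded_cost_k, award_k=ANNUAL_AWARD_K):
    """Fraction of one year's award consumed by one position."""
    return loaded_cost_k / award_k


for cost in LOADED_COST_K:
    print(f"${cost}K loaded: {loading_factor(SALARY_K, cost):.2f}x salary, "
          f"{share_of_award(cost):.0%} of the annual award")
```

In other words, the figures imply a multiplier of 1.75 to 2.0 on base salary, and a single post-doc would take roughly 28% to 32% of each year's funding.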

hackathons

We bring together selected participants to work in an open-ended way, or to work on specific interoperability objectives, e.g., a reference implementation.

Cost:

  • about $1K (??) per person for 4 days at NESCent
  • about $1200 per person for 4 days at FMNH in Chicago (incl. food, lodging and ground transportation)
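A quick way to turn these per-person figures into a meeting budget is sketched below in Python. The per-person costs are the ones listed above (the NESCent figure is still marked as uncertain); the participant count and the $25,000 budget in the usage lines are hypothetical values chosen only to show the calculation.

```python
# Per-person cost for a 4-day hackathon, in dollars (from the list above;
# the NESCent figure is an unconfirmed estimate).
PER_PERSON = {"NESCent": 1000, "FMNH": 1200}


def hackathon_cost(venue, participants):
    """Total cost in dollars of one hackathon at the given venue."""
    return PER_PERSON[venue] * participants


def max_participants(venue, budget):
    """Largest whole number of participants that fits within the budget."""
    return budget // PER_PERSON[venue]


if __name__ == "__main__":
    # Hypothetical planning numbers: 20 participants, $25,000 set aside.
    for venue in PER_PERSON:
        print(venue, hackathon_cost(venue, 20), max_participants(venue, 25_000))
```

With 20 participants, the difference between the two venues is $4,000 per event.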

training events

Training events could focus on developers or on end-users, e.g., on how to use nexml and nexml APIs. We could provide training

  • at a stand-alone workshop
  • as part of another workshop or course (Woods Hole Molecular Evolution, Bodega Bay Phylogenetics, Computational Phyloinformatics at NESCent, etc.)
  • during a conference (which ones? SSE? SMBE?)
  • online (web course)

electronic resources

  • "help desk" concept
  • coordinate mailing lists
  • coordinate web sites
  • some sort of interaction portal

technical support

  • developing libraries
  • developing reference implementations
  • help desk

documentation

  • getting started with the EvoInfo stack
  • best practices for using the EvoInfo stack
  • making the most of EvoInfo stack implementations at specific projects (TOLKIN, iPlant, EOL, etc.)

demos, showcases, reference implementations

  • pick one project (e.g., TOLKIN) for thorough implementation of interop technologies
  • numerous small demos (e.g., as from the hackathon)
  • work with key projects (iPlant, EOL) for demo projects

working groups

We might need to form working groups to address specific issues:

  • metadata
  • taxonomic identifiers
  • scoping and conflicts between ontologies

a schedule with milestones

What are we going to accomplish in year 1?

What about year 2?

Year 3?

People and institutions

Some things to keep in mind:

  • NESCent can collaborate by providing meeting space (we pay travel and logistics overhead)
  • this is a US domestic program, but good to have international collaboration

PI, Co-PIs, and senior project personnel

This proposal needs a single PI from the coordinating institution. However, there is no limit to the number of Co-PIs.

Here are the people interested so far:

  1. Karen Cranston, EOL and Field Museum of Natural History (I would be coming at this from the perspective of both a provider (PhyLoTA) and user (EOL, Treeviz working group) of phylogenetic data. I am officially working with the EOL, and also have a connection to the iPlant Tree of Life group, both of which are going to need these tools.)
  2. Enrico Pontelli, New Mexico State University, Computer Science
  3. Rutger Vos, University of British Columbia, Zoology
  4. Arlin Stoltzfus, University of Maryland Biotechnology Institute
  5. Sheldon McKay, Cold Spring Harbor Laboratory (GMOD, modENCODE, iPlant)

Potential participating projects

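Large synthesis projects:

  • EOL (http://www.eol.org)
  • iPlant (http://www.iplantcollaborative.org/)

Other data providers and collaborators:

  • TOLKIN (http://www.tolkin.org)
  • TreeBASE (http://www.treebase.org/): trees, data matrices
  • PhyLoTA (http://loco.biosci.arizona.edu/pb/): trees, data matrices
  • Morphbank (http://www.morphbank.net/)
  • PaleoDB (http://paleodb.org/cgi-bin/bridge.pl)
  • TimeTree (http://www.timetree.org/)
  • Ensembl (http://www.ensembl.org/index.html)
  • Phenoscape (https://www.phenoscape.org/wiki/Main_Page)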

Initial draft of NSF Proposal

Title

suggested titles (must begin with "INTEROP: "):

  • INTEROP: Integration and re-use of phylogenetic and comparative data by an expanding research community
  • INTEROP: As phyloGood as it phyloGets

Project Summary

the project summary has 3 parts:

  • Title, PI, Co-PIs, and senior project personnel
  • "a succinct summary of intellectual merit" including scope of activities (communities, data types, technologies), networking activities and mechanisms for participation, and ways of providing technical expertise
  • "a description of broader impacts" including interop, participation, education & training

Project Description

Aims

Background

Results from past research

Research Design

Budget justification

  • meeting costs
  • staffing costs
    • software design and implementation
    • use case testing