Dbhack2 proposal

From Evolutionary Informatics Working Group

Latest revision as of 13:29, 27 August 2009

Overview

NESCent has some funds and is open to the idea of another hackathon. Past hackathons have included 15 to 20 people from outside NESCent, along with a handful from inside NESCent.

VoCamp-related meeting plan

We decided to focus on developing vocabulary and ontology with a diverse group of stakeholders.

Background: drivers and issues

We don't know precisely what this group will choose to focus on, but we need to articulate some of the drivers and state some of the issues to be resolved.

For instance, deciding how we are going to designate taxonomic identifiers is critical. Data integration depends on linking records through common variables. In the genomics world, these are things like GenBank accession numbers, species names (or NCBI taxon ids), and sequences. For example, my colleague John Moult integrates SNP data, function annotations, and protein structure by way of sequence matches and accession numbers. In the larger world of comparative biology, the integrating variables would include (in addition to GenBank accessions and species names) other taxon identifiers, specimen (and collection) ids, geographic coordinates, and so on.
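
As a rough illustration of integrating across a common variable, here is a minimal Python sketch. It assumes each source can be reduced to a list of records that carry a shared key. The accession strings, field names, and the integrate() helper are all made up for this example and do not come from any real database.

```python
from collections import defaultdict

# Hypothetical records from two independent sources. The accession strings
# and field names are placeholders invented for this sketch.
snp_records = [
    {"accession": "ACC0001", "snp": "A123T"},
    {"accession": "ACC0002", "snp": "G45D"},
]
structure_records = [
    {"accession": "ACC0001", "structure_id": "STRUCT-A"},
    {"accession": "ACC0003", "structure_id": "STRUCT-B"},
]


def integrate(key, *sources):
    """Merge records from several sources that share the same value of `key`."""
    merged = defaultdict(dict)
    for source in sources:
        for record in source:
            # Records with the same key value are folded into one dictionary.
            merged[record[key]].update(record)
    return dict(merged)


combined = integrate("accession", snp_records, structure_records)
```

Only ACC0001 ends up with both kinds of annotation, because it is the only key present in both sources. The same pattern would apply to taxon identifiers or specimen ids, which is why agreeing on how those identifiers are designated matters so much.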

In development of the PhyloWS standard, we are working on specifying the available query terms. For example, a user may want to find:

  • subtrees descended from a given internal node
  • trees inferred using maximum likelihood
  • fully resolved (binary) trees

But we need terminology to define these concepts. Ideally, these terms would be in an external ontology. This would also allow the NeXML returned from a PhyloWS query to contain these concepts, linked as metadata through the ontology.
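
To make the three example queries concrete, here is a small Python sketch of what each concept could mean operationally. It is not PhyloWS or NeXML code. The Node and TreeRecord classes, the function names, and the free-text inference_method field are assumptions made only for illustration; in practice the method would ideally be an ontology term.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """A tree node; a node without children is a tip."""
    name: str
    children: List["Node"] = field(default_factory=list)


@dataclass
class TreeRecord:
    """A tree plus one piece of metadata (how it was inferred)."""
    root: Node
    inference_method: str  # free text here; ideally an ontology term


def subtree(root: Node, name: str) -> Optional[Node]:
    """Return the node called `name` together with its descendants, or None."""
    if root.name == name:
        return root
    for child in root.children:
        found = subtree(child, name)
        if found is not None:
            return found
    return None


def is_fully_resolved(node: Node) -> bool:
    """True if every internal node at or below `node` has exactly two children."""
    if not node.children:
        return True
    return len(node.children) == 2 and all(
        is_fully_resolved(child) for child in node.children
    )


def tip_names(node: Node) -> List[str]:
    """List the names of the tips below `node`, left to right."""
    if not node.children:
        return [node.name]
    return [name for child in node.children for name in tip_names(child)]


# Example: ((A,B)n1,(C,D,E)n2)root has a three-way split at n2.
example = TreeRecord(
    root=Node("root", [
        Node("n1", [Node("A"), Node("B")]),
        Node("n2", [Node("C"), Node("D"), Node("E")]),
    ]),
    inference_method="maximum likelihood",
)
```

Here the whole tree is not fully resolved because of the polytomy at n2, while the subtree descended from n1 is. The comparison on inference_method is a plain string match, which is exactly the kind of thing a shared vocabulary would replace.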

Feel free to expand anything in the list into its own subsection:

  • decide on an approach to representing descriptions of studies (probably based on OBI)
  • clarify relations in CDAO, following BFO and REL principles

Meeting preparation

Soliciting and choosing participants

We need to consider how we will choose participants and prepare for the meeting. We may need to pick very carefully to achieve credibility if we want to promote standards. This looks like a diverse group.

Preparing materials for the meeting

It would be a failure if the group got together and split into tiny pieces because there were not enough common interests. We may need to assemble test cases that could be used to evaluate solutions, e.g., test cases of taxonomic disambiguation or cross-mapping.
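
One possible shape for such test cases, sketched in Python: each case pairs an input name string with the identifier a correct resolver should return, and a candidate solution is scored by the fraction of cases it gets right. The names, identifiers, and helper functions below are placeholders invented for this sketch, not real taxonomic data or an existing tool.

```python
# Each hypothetical test case pairs an input name string with the identifier
# a correct resolver should return. Names and ids are placeholders.
TEST_CASES = [
    ("Name one", "taxon:1"),
    ("name  ONE", "taxon:1"),   # same taxon, different case and spacing
    ("Name two", "taxon:2"),
    ("Unknown name", None),     # should stay unresolved
]


def normalize(name):
    """Lower-case the name and collapse runs of whitespace."""
    return " ".join(name.lower().split())


def make_lookup_resolver(table):
    """Build a resolver that matches names after normalization."""
    normalized = {normalize(key): value for key, value in table.items()}

    def resolve(name):
        return normalized.get(normalize(name))

    return resolve


def score(resolver, cases):
    """Fraction of test cases for which the resolver returns the expected id."""
    correct = sum(1 for name, expected in cases if resolver(name) == expected)
    return correct / len(cases)


reference = {"Name one": "taxon:1", "Name two": "taxon:2"}
exact_match = reference.get                     # naive candidate: exact strings only
tolerant = make_lookup_resolver(reference)      # candidate that normalizes first
```

With these cases the exact-match candidate misses the variant spelling, and the normalizing candidate handles all four. A shared set of cases like this would give the group a common target even if people work on different solutions.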

Meeting plan

What will be the structure of the meeting? Open Space? Agenda?

Locations

Montpellier

  • Cost per person:
    • $ 900 from major US airport to Montpellier
    • $ 110 per night lodging
  • Meeting facilities:
    • essentially free
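  • Synergies:
    • CDAO group in Strasbourg (Julie Thompson); collaborator Pontarotti in Marseilles has developed an ontology of "genetic events" in evolution.
    • Ontologies group at U. Manchester is a cheap flight away.
    • some relevant initiatives & projects
      • TDWG Technical Architecture Group (for which ontologies are one of the three cornerstones)
      • Darwin core (http://wiki.tdwg.org/twiki/bin/view/DarwinCore/WebHome)
      • Biological Descriptions (http://www.tdwg.org/activities/bd/)
      • Taxonomic names and concepts (http://www.tdwg.org/activities/tnc/)
      • GBIF (http://www.gbif.org)
      • EOL (http://eol.org)
      • Observational Data (http://wiki.tdwg.org/Observational/) (also an NSF Interop project)
      • DataOne (http://dataone.org)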

NESCent

  • Cost per person:
  • Meeting facilities:
  • Synergies:
    • NESCent: Vision, Lapp, Balhoff, Swofford, Scherle

New Mexico

  • Cost per person:
  • Meeting facilities:
  • Synergies:
    • Tucson, AZ is 3 or 4 hr away by car (iPlant Collaborative; Maddison will have moved to Oregon by Nov; Mike Worobey in EEB does viral phylogenetics)
    • Los Alamos (LANL HIV db) is 3 hr away by car
    • NMSU is home of Pontelli, Chisham
  • Challenges or disadvantages:
    • airport is an hour away

Hackathon-related ideas

Projects focused on consolidating our gains and serving the community of users:

  1. polish up demo projects from dbhack1. The dbhack1 projects were promising but incomplete. Solid, well-documented demonstration projects are needed to expose our interop technologies.
  2. set up an EvoIO portal (translation, data set curation, etc.)
  3. develop an online course complete with information resources, demos, and assignments

Projects focused on building foundations and serving the community of developers:

  1. widen the domain by including viral phylogenetics and molecular epidemiology
  2. ontology, including sticky problems (CDAO relations and upper-level categories) and annotation support (names of programs and file formats)
  3. transition model language

phylo interop portal

The strategic focus of the portal would be interop, but the portal could support other community-building activities such as blogs, bookmarking, forums, etc.

  • objectives
    • provide users with centralized resources
    • demonstrate useful, working aspects of interop technologies
    • illustrate benefits of integrative or large-scale analyses
    • testbed for trying out new concepts and for debugging
    • increase exposure of project to increase chances of funding
  • features
    • format interconversion
    • triple store for download
    • visualization - second
    • data set integration wizard
    • annotation support for curation, metadata
    • analysis operations (implemented by reasoner)
  • how to get this done with limited resources
    • get more players involved by conceiving this broadly
    • provide hosting to some projects where mutually beneficial
    • use the hackathon mechanism to get started
    • use CREST funds to hire a graduate student
    • prepare in advance to take advantage of GSoC mechanism
    • work with NSF PIs to get interop-related supplements (e.g., MrBayes)

polish up projects

Sheldon:

phylowidget/viz improvements
clean up and generalize interface; complete modularization
integrate with other projects with an outward facing PhyloWS interface
write-back capability via PhyloWS

models

transition models

Proposal for a Phyloinformatics VoCamp

The text of the proposal is at VoCamp1 Proposal.