Difference between revisions of "Data Resources"

From Evolutionary Informatics Working Group
(TreeFam)
m (Resource Matrix)
 
(32 intermediate revisions by 5 users not shown)
Line 1: Line 1:
=Quick analysis of data resources to target for interop hackathon=
+
{{HackHead}}
 +
This page started as a quick analysis of data resources to target for the [[Database Interop Hackathon]].  It has become a hub for information about possible and participating data providers.
  
The idea is that each of these resources has comparative data that we might want to make interoperable.  
+
Each of these resources may have comparative data that we might want to make interoperable.
 +
 
 +
== Resource Matrix ==
 +
 
 +
{| class="wikitable sortable"
 +
! Resource
 +
! 6 words or less
 +
! Info
 +
! Participants
 +
|-
 +
! [http://www.biodiversitycollectionsindex.org/static/index.html Biodiversity Collections Index]
 +
| Provides GUIDs for Natural History Collections
 +
| [[BCI|Info]]
 +
| [[User:Rhyam|Roger Hyam]]
 +
|-
 +
! [http://www.datadryad.org/ Dryad]
 +
| Evolutionary publication backing data repository
 +
| [[Dryad|Info]]
 +
| [[User:Rscherle|Ryan Scherle]]
 +
|-
 +
! [http://www.eol.org Encyclopedia of Life]
 +
| Page for each species on Earth.
 +
| [[Encyclopedia of Life|Info]]
 +
| [[User:Kcranston|Karen Cranston]]
 +
|-
 +
! [http://www.ensembl.org/ Ensembl]
 +
| Genome sequences and annotations (including gene families)
 +
| [[Ensembl|info]]
 +
| [[User:Gjordan|Greg Jordan]]
 +
|-
 +
! [http://gmod.org/ GMOD]
 +
| Organism database toolkit, sequence emphasis.
 +
| [[GMOD|info]]
 +
| [[User:Dpc13|Dave Clements]], [[User:Mckays|Sheldon McKay]]
 +
|-
 +
! [http://pbil.univ-lyon1.fr/databases/hogenom.php HOGENOM]
 +
| Complete Genome Homologous Genes Families
 +
| [[#Hogenom|Info]]
 +
|
 +
|-
 +
! [http://pbil.univ-lyon1.fr/databases/hovergen.php HOVERGEN]
 +
| Homologous Vertebrate Genes Database
 +
| [[#Hovergen|Info]]
 +
|
 +
|-
 +
! [http://iplantcollaborative.org/ iPlant Collaborative]
 +
| Uncover plants' higher order principles
 +
| [[iPlant|Info]]
 +
| [[User:Kgendler|Karla Gendler]], [[User:Mckays|Sheldon McKay]]
 +
|-
 +
! [http://kepler-project.org/ Kepler]
 +
| an open-source scientific workflow system
 +
| [[Kepler|Info]]
 +
| [[User:Ialtintas|Ilkay Altintas]]
 +
|-
 +
! [http://mesquiteproject.org/ Mesquite]
 +
| Phylogenetic and population genetics analysis software.
 +
| [[Mesquite|Info]]
 +
| [[User:Pmidford|Peter Midford]]
 +
|-
 +
! [http://microbesonline.org/ MicrobesOnline]
 +
| Multispecies comparison among prokaryotes.
 +
| [[#MicrobesOnline|Info]]
 +
|
 +
|-
 +
! [http://www.modencode.org/ modENCODE]
 +
| model organism ENCyclopedia Of DNA Elements
 +
| [[modENCODE|Info]]
 +
| [[User:Mckays|Sheldon McKay]]
 +
|-
 +
! [http://www.morphbank.net/ MorphBank]
 +
| Specimen images and annotation
 +
| [[MorphBank|Info]]
 +
| [[User:Kseltmann|Katja Seltmann]]
 +
|-
 +
! [http://morphobank.geongrid.org/ MorphoBank]
 +
| Homology of phenotypes over the web
 +
| [[MorphoBank|Info]]
 +
| [[User:Lchan|Lucie Chan]]
 +
|-
 +
! [http://hymenoptera.tamu.edu/ mx]
 +
| Evolutionary systematists' descriptive taxonomy CMS
 +
| [[mx|Info]]
 +
| [[User:Myoder|Matt Yoder]]
 +
|-
 +
! [http://www.molevol.org/nexplorer Nexplorer]
 +
| Phylogenetic browsing and editing comparative data
 +
| [[Nexplorer|Info]]
 +
| [[User:Vgopalan|Vivek Gopalan]]
 +
|-
 +
! [http://paleodb.org/ PaleoDB]
 +
| Fossil taxonomy and distribution
 +
| [[PaleoDB|Info]]
 +
| [[User:Mkosnik|Matt Kosnik]]
 +
|-
 +
! [http://www.ebi.ac.uk/goldman-srv/pandit/ PANDIT]
 +
| Protein & Nucleotide Domains with Inferred Trees
 +
| [[PANDIT|Info]]
 +
| [[User:Gjordan|Greg Jordan]]
 +
|-
 +
! [http://www.pantherdb.org/ PANTHER]
 +
| Protein ANalysis THrough Evolutionary Relationships
 +
| [[#Panther|Info]]
 +
|
 +
|-
 +
! [http://www.eu-nomen.eu/pesi/ PESI]
 +
| Pan European Species Infrastructure
 +
| [[PESI|Info]]
 +
| [[User:Rhyam|Roger Hyam]]
 +
|-
 +
! [http://phenoscape.org/ Phenoscape]
 +
| Evolutionarily variable morphological characters database
 +
| [[Phenoscape|Info]]
 +
| [[User:Jbalhoff|Jim Balhoff]], [[User:Hlapp|Hilmar Lapp]], [[User:Tjv|Todd Vision]]
 +
|-
 +
! PhyloDB
 +
| ?
 +
| [[PhyloDB|Info]]
 +
| [[User:Bpiel|Bill Piel]]
 +
|-
 +
! [http://phylogenomics.berkeley.edu/phylofacts/ PhyloFacts]
 +
| Universal Proteome Explorer
 +
| [[#PhyloFacts|Info]]
 +
|
 +
|-
 +
! [http://www.phylogeny.fr Phylogeny.Fr]
 +
| Robust Phylogenetic Analysis For The Non-Specialist
 +
| [[#Phylogeny.Fr|Info]]
 +
|
 +
|-
 +
! [http://phylomedb.bioinfo.cipf.es/index.html PhylomeDB]
 +
| A database for phylomes
 +
| [[#PhylomeDB|Info]]
 +
|
 +
|-
 +
! [http://loco.biosci.arizona.edu/pb/ PhyLoTA]
 +
| Genbank nucleotide sequence taxonomic distribution
 +
| [[PhyLoTA|Info]]
 +
| [[User:Kcranston|Karen Cranston]]
 +
|-
 +
! [http://www.phylowidget.org/ PhyloWidget]
 +
| Viewing, editing, publishing phylogenetic trees online.
 +
| [[PhyloWidget|Info]]
 +
| [[User:Gjordan|Greg Jordan]]
 +
|-
 +
! [http://phylodata.seas.upenn.edu/cgi-bin/wiki/pmwiki.php pPOD]
 +
| Database technologies for integrating AToL information
 +
| [[pPOD|Info]]
 +
| [[User:Sdonnelly|Sam Donnelly]]
 +
|-
 +
! [http://www.timetree.org/ TimeTree]
 +
| Species divergence times
 +
| [[TimeTree|Info]]
 +
|
 +
|-
 +
! [http://www.speciesindex.org/ SpeciesIndex]
 +
| Just a bit of fun
 +
| [[SpeciesIndex|Info]]
 +
| [[User:Rhyam|Roger Hyam]]
 +
|-
 +
! [http://www.phylo.org/sub_sections/databases TreeBASE]
 +
| Database of Phylogenetic Knowledge
 +
| [[TreeBASE|Info]]
 +
| [[User:Bpiel|Bill Piel]]
 +
|-
 +
! [http://www.treefam.org/ TreeFam]
 +
| Tree Families Database
 +
| [[#TreeFam|Info]]
 +
|
 +
|-
 +
! [http://tolweb.org/ Tree of Life]
 +
| Everything!
 +
| [[Tree of Life|Info]]
 +
| [[User:Kschulz|Katja Schulz]]
 +
|}
 +
 
 +
Many resources have their own page (linked to from the "Info" links above).  Resources that don't have their own page may be described below.
 +
 
 +
__TOC__
  
 
==questions to ask for each resource==
 
==questions to ask for each resource==
  
==Resources==
+
* What is the scope of the resource?
 +
* Who controls it?
 +
* How can users access data?
 +
* How are data organized?  Is there an explicit data model, schema, or format description?
 +
* Is there evidence that this is an important resource (registered users; citations)?
 +
 
 +
== Phylogeny Services ==
 +
 
 +
=== Phylogeny.Fr ===
  
===Pandit===
+
http://www.phylogeny.fr
 +
 
 +
Output formats are {{NewickLink|Newick}}, NHX and Phylip, but there is apparently no means to export the alignment and tree together.
 +
 
 +
==Data Resources==
 +
 
 +
The resources below provide comparative data and are listed in order of apparent level of re-use, using literature citations as the measure of apparent data re-use.  This is a crude way of prioritizing targets because i) literature citations sometimes do not indicate data re-use; and ii) past re-use is not as important as the potential for future re-use of a resource when semantic tools are used to increase its value.
 +
 
 +
===Panther===
 +
 
 +
The [http://www.pantherdb.org/ Panther] system project is an online data resource for protein families with trees, HMMs, metabolic pathways, and other "functional" information.  According to the latest publication [http://www.ncbi.nlm.nih.gov/pubmed/17130144?ordinalpos=6&itool=EntrezSystem2.PEntrez.Pubmed.Pubmed_ResultsPanel.Pubmed_DefaultReportPanel.Pubmed_RVDocSum], "PANTHER is a freely available, comprehensive software system for relating protein sequence evolution to the evolution of specific protein functions and biological roles."
 +
 
 +
* '''279''' citations for 2003, 2005 and 2007 papers, but it's not clear how much data re-use is based on the comparative aspect of this resource, as opposed to using this resource to mine relations between individual proteins and metabolic pathway annotations.
 +
 
 +
=== Hovergen ===
 +
[http://pbil.univ-lyon1.fr//search/query_fam.php Hovergen],
 +
 
 +
* hovergen 1994 paper cited '''157''' times
  
[http://www.ebi.ac.uk/goldman-srv/pandit/ Pandit]
+
=== Hogenom===
  
* [http://www.ebi.ac.uk/goldman-srv/pandit/pandit.cgi?action=desc format description]
+
[http://pbil.univ-lyon1.fr/databases/hogenom.php hogenom]
* citations (from Web of Science):
 
** 7 for 2006 paper
 
** 16 for 2003 paper
 
  
 
===TreeBase===
 
===TreeBase===
  
[http://www.phylo.org/sub_sections/databases TreeBaseII] has the kind of granular schema that would be a good challenge to try to accommodate using cdao. What is missing from cdao (and nexml) is the notion of a "study" with various types of metadata, including publication/reference metadata (which is somewhat dc-like).
+
See [[TreeBASE]].
  
===pPOD===
+
===Ensembl===
[http://phylodata.seas.upenn.edu/cgi-bin/wiki/pmwiki.php pPOD] (not really a data resource, its a db tech project led by computer scientists)
+
 
 +
See [[Ensembl]]. The Ensembl family of databases covers much more than comparative genomics, but the Ensembl Compara schema and pipeline is becoming an increasingly important component of Ensembl.
 +
* '''100-200''' citations each for the yearly Ensembl papers
 +
* There was a paper published this year on [http://www.ncbi.nlm.nih.gov/pubmed/19029536 Ensembl Compara's Gene Trees], but it's too early to tell whether it will become highly cited.
  
 
===TreeFam===
 
===TreeFam===
Line 28: Line 232:
 
* citations for 2006 paper: 36
 
* citations for 2006 paper: 36
  
===Hovergen, hogenom===
+
===Pandit===
[http://pbil.univ-lyon1.fr//search/query_fam.php Hovergen], [http://pbil.univ-lyon1.fr/databases/hogenom.php hogenom]
+
 
 +
See [[PANDIT]]
 +
 
 +
=== Tree of Life ===
 +
 
 +
See [[Tree of Life]].
 +
 
 +
=== Microbes Online ===
 +
 
 +
The Arkin lab's [http://microbesonline.org MicrobesOnline] server has a cool [http://www.microbesonline.org/cgi-bin/treeBrowse.cgi?locus=392933 tree-based view of sequence families].
 +
 
 +
=== PhylomeDB ===
 +
 
 +
[http://phylomedb.bioinfo.cipf.es/ PhylomeDB]
 +
 
 +
=== MorphBank ===
 +
 
 +
See [[MorphBank]].
 +
 
 +
=== PhyLoTA ===
 +
 
 +
[http://loco.biosci.arizona.edu/pb/ PhyLoTA]
 +
 
 +
=== PhyloFacts ===
 +
 
 +
[http://phylogenomics.berkeley.edu/phylofacts/ PhyloFacts]
 +
 
 +
=== TimeTree ===
 +
 
 +
See [[TimeTree]].
 +
 
 +
=== MorphoBank ===
 +
 
 +
See [[MorphoBank]].
 +
 
 +
===pPOD===
  
=== organism-centered===
+
See [[pPOD]].
  
organism-centered gene-family databases, e.g., for plasmodium
+
[[Category:DB Interop Hackathon]]
 +
[[Category:Data Resources]]
 +
[[Category:Working Group]]

Latest revision as of 14:41, 15 June 2009

This page started as a quick analysis of data resources to target for the Database Interop Hackathon. It has become a hub for information about possible and participating data providers.

Each of these resources may have comparative data that we might want to make interoperable.

Resource Matrix

Resource | 6 words or less | Info | Participants
Biodiversity Collections Index | Provides GUIDs for Natural History Collections | Info | Roger Hyam
Dryad | Evolutionary publication backing data repository | Info | Ryan Scherle
Encyclopedia of Life | Page for each species on Earth. | Info | Karen Cranston
Ensembl | Genome sequences and annotations (including gene families) | Info | Greg Jordan
GMOD | Organism database toolkit, sequence emphasis. | Info | Dave Clements, Sheldon McKay
HOGENOM | Complete Genome Homologous Genes Families | Info |
HOVERGEN | Homologous Vertebrate Genes Database | Info |
iPlant Collaborative | Uncover plants' higher order principles | Info | Karla Gendler, Sheldon McKay
Kepler | an open-source scientific workflow system | Info | Ilkay Altintas
Mesquite | Phylogenetic and population genetics analysis software. | Info | Peter Midford
MicrobesOnline | Multispecies comparison among prokaryotes. | Info |
modENCODE | model organism ENCyclopedia Of DNA Elements | Info | Sheldon McKay
MorphBank | Specimen images and annotation | Info | Katja Seltmann
MorphoBank | Homology of phenotypes over the web | Info | Lucie Chan
mx | Evolutionary systematists' descriptive taxonomy CMS | Info | Matt Yoder
Nexplorer | Phylogenetic browsing and editing comparative data | Info | Vivek Gopalan
PaleoDB | Fossil taxonomy and distribution | Info | Matt Kosnik
PANDIT | Protein & Nucleotide Domains with Inferred Trees | Info | Greg Jordan
PANTHER | Protein ANalysis THrough Evolutionary Relationships | Info |
PESI | Pan European Species Infrastructure | Info | Roger Hyam
Phenoscape | Evolutionarily variable morphological characters database | Info | Jim Balhoff, Hilmar Lapp, Todd Vision
PhyloDB | ? | Info | Bill Piel
PhyloFacts | Universal Proteome Explorer | Info |
Phylogeny.Fr | Robust Phylogenetic Analysis For The Non-Specialist | Info |
PhylomeDB | A database for phylomes | Info |
PhyLoTA | Genbank nucleotide sequence taxonomic distribution | Info | Karen Cranston
PhyloWidget | Viewing, editing, publishing phylogenetic trees online. | Info | Greg Jordan
pPOD | Database technologies for integrating AToL information | Info | Sam Donnelly
TimeTree | Species divergence times | Info |
SpeciesIndex | Just a bit of fun | Info | Roger Hyam
TreeBASE | Database of Phylogenetic Knowledge | Info | Bill Piel
TreeFam | Tree Families Database | Info |
Tree of Life | Everything! | Info | Katja Schulz

Many resources have their own page (linked to from the "Info" links above). Resources that don't have their own page may be described below.

Questions to ask for each resource

  • What is the scope of the resource?
  • Who controls it?
  • How can users access data?
  • How are data organized? Is there an explicit data model, schema, or format description?
  • Is there evidence that this is an important resource (registered users; citations)?
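
One way to keep the answers comparable across resources is to record them in a small structure. The sketch below is only an illustration: the class name `ResourceSurvey` and its field names are hypothetical, chosen to mirror the five questions above, and the example values are taken from the Phylogeny.Fr notes on this page.

```python
from dataclasses import dataclass, field, fields
from typing import Dict, List


@dataclass
class ResourceSurvey:
    """Answers to the five questions for one resource (hypothetical field names)."""
    name: str
    scope: str = ""                                           # What is the scope?
    controlled_by: str = ""                                   # Who controls it?
    access_methods: List[str] = field(default_factory=list)   # How can users access data?
    data_model: str = ""                                      # Data model, schema or format description
    citations: Dict[int, int] = field(default_factory=dict)   # Evidence of importance: paper year -> count

    def unanswered(self) -> List[str]:
        """Return the names of the questions that are still empty."""
        return [f.name for f in fields(self)
                if f.name != "name" and not getattr(self, f.name)]


# Example: only what this page currently says about Phylogeny.Fr.
phylogeny_fr = ResourceSurvey(
    name="Phylogeny.Fr",
    access_methods=["http://www.phylogeny.fr"],
    data_model="Newick, NHX and Phylip output",
)
print(phylogeny_fr.unanswered())
```

Calling `unanswered()` on each record gives a quick list of what still needs to be filled in for that resource.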

Phylogeny Services

Phylogeny.Fr

http://www.phylogeny.fr

Output formats are Newick, NHX and Phylip, but there is apparently no means to export the alignment and tree together.
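
As a rough illustration of what exporting the two together could look like, the sketch below combines a separately downloaded alignment and Newick tree into a single NEXUS text with a DATA block and a TREES block. This is an assumed workaround, not a feature of Phylogeny.Fr. The function names `newick_leaf_names` and `to_nexus` are hypothetical, the alignment is assumed to be already parsed into a dict of name to aligned sequence, and the leaf-name parser handles only plain Newick (no quoted labels, no NHX comments).

```python
import re


def newick_leaf_names(newick):
    """Return leaf labels from a plain Newick string.

    Assumes unquoted labels and no bracketed comments such as NHX tags.
    A leaf label is any name that directly follows "(" or ",".
    """
    return re.findall(r"[(,]\s*([^(),:;\s]+)", newick)


def to_nexus(alignment, newick, datatype="DNA"):
    """Combine an alignment (dict: name -> aligned sequence) and a tree in one NEXUS text."""
    leaves = newick_leaf_names(newick)
    if sorted(leaves) != sorted(alignment):
        raise ValueError("tree leaves and alignment names differ")
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("sequences are not all the same length")
    nchar = lengths.pop()

    lines = [
        "#NEXUS",
        "BEGIN DATA;",
        f"  DIMENSIONS NTAX={len(alignment)} NCHAR={nchar};",
        f"  FORMAT DATATYPE={datatype} MISSING=? GAP=-;",
        "  MATRIX",
    ]
    lines += [f"    {name} {seq}" for name, seq in alignment.items()]
    lines += [
        "  ;",
        "END;",
        "BEGIN TREES;",
        f"  TREE tree1 = {newick.strip().rstrip(';')};",
        "END;",
    ]
    return "\n".join(lines) + "\n"


# Toy data, invented for illustration only.
alignment = {"A": "ACGT", "B": "ACGA", "C": "AC-T"}
tree = "((A:0.1,B:0.2):0.3,C:0.4);"
print(to_nexus(alignment, tree))
```

The name check is the useful part: it catches the case where the tree and the alignment were exported from different runs and no longer refer to the same sequences.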

Data Resources

The resources below provide comparative data and are listed in order of apparent level of re-use, using literature citations as the measure of apparent data re-use. This is a crude way of prioritizing targets because i) literature citations sometimes do not indicate data re-use; and ii) past re-use is not as important as the potential for future re-use of a resource when semantic tools are used to increase its value.
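
The ordering can be sketched as a simple sort on citation counts. The numbers below are the single counts quoted in the sections that follow. Ensembl is left out because this page gives it only as a range (100-200 per yearly paper), and the function name `rank_by_citations` is hypothetical.

```python
def rank_by_citations(citations):
    """Return (resource, count) pairs, most cited first."""
    return sorted(citations.items(), key=lambda item: item[1], reverse=True)


# Counts as quoted on this page; they cover different papers and years,
# so the ranking is only as meaningful as the caveats above allow.
citations = {
    "TreeFam": 36,     # 2006 paper
    "Panther": 279,    # 2003, 2005 and 2007 papers
    "Hovergen": 157,   # 1994 paper
}

for rank, (name, count) in enumerate(rank_by_citations(citations), start=1):
    print(f"{rank}. {name}: {count}")
```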

Panther

The Panther system project is an online data resource for protein families with trees, HMMs, metabolic pathways, and other "functional" information. According to the latest publication [1], "PANTHER is a freely available, comprehensive software system for relating protein sequence evolution to the evolution of specific protein functions and biological roles."

  • 279 citations for 2003, 2005 and 2007 papers, but it's not clear how much data re-use is based on the comparative aspect of this resource, as opposed to using this resource to mine relations between individual proteins and metabolic pathway annotations.

Hovergen

Hovergen

  • The 1994 Hovergen paper has been cited 157 times.

Hogenom

hogenom

TreeBase

See TreeBASE.

Ensembl

See Ensembl. The Ensembl family of databases covers much more than comparative genomics, but the Ensembl Compara schema and pipeline is becoming an increasingly important component of Ensembl.

  • 100-200 citations each for the yearly Ensembl papers
  • There was a paper published this year on Ensembl Compara's Gene Trees, but it's too early to tell whether it will become highly cited.

TreeFam

TreeFam

  • citations for 2006 paper: 36

Pandit

See PANDIT.

Tree of Life

See Tree of Life.

Microbes Online

The Arkin lab's MicrobesOnline server has a cool tree-based view of sequence families.

PhylomeDB

PhylomeDB

MorphBank

See MorphBank.

PhyLoTA

PhyLoTA

PhyloFacts

PhyloFacts

TimeTree

See TimeTree.

MorphoBank

See MorphoBank.

pPOD

See pPOD.