This page started as a quick analysis of data resources to target for the Database Interop Hackathon. It has become a hub for information about possible and participating data providers.
Each of these resources may have comparative data that we might want to make interoperable.
|Resource||6 words or less||Info||Participants|
|Biodiversity Collections Index||Provides GUIDs for Natural History Collections||Info||Roger Hyam|
|Dryad||Evolutionary publication backing data repository||Info||Ryan Scherle|
|Encyclopedia of Life||Page for each species on Earth.||Info||Karen Cranston|
|Ensembl||Genome sequences and annotations (including gene families)||Info||Greg Jordan|
|GMOD||Organism database toolkit, sequence emphasis.||Info||Dave Clements, Sheldon McKay|
|HOGENOM||Complete Genome Homologous Genes Families||Info|
|HOVERGEN||Homologous Vertebrate Genes Database||Info|
|iPlant Collaborative||Uncover plants' higher order principles||Info||Karla Gendler, Sheldon McKay|
|Kepler||An open-source scientific workflow system||Info||Ilkay Altintas|
|Mesquite||Phylogenetic and population genetics analysis software.||Info||Peter Midford|
|MicrobesOnline||Multispecies comparison among prokaryotes.||Info|
|modENCODE||model organism ENCyclopedia Of DNA Elements||Info||Sheldon McKay|
|MorphBank||Specimen images and annotation||Info||Katja Seltmann|
|MorphoBank||Homology of phenotypes over the web||Info||Lucie Chan|
|mx||Evolutionary systematists' descriptive taxonomy CMS||Info||Matt Yoder|
|Nexplorer||Phylogenetic browsing and editing comparative data||Info||Vivek Gopalan|
|PaleoDB||Fossil taxonomy and distribution||Info||Matt Kosnik|
|PANDIT||Protein & Nucleotide Domains with Inferred Trees||Info||Greg Jordan|
|PANTHER||Protein ANalysis THrough Evolutionary Relationships||Info|
|PESI||Pan European Species Infrastructure||Info||Roger Hyam|
|Phenoscape||Evolutionarily variable morphological characters database||Info||Jim Balhoff, Hilmar Lapp, Todd Vision|
|PhyloFacts||Universal Proteome Explorer||Info|
|Phylogeny.Fr||Robust Phylogenetic Analysis For The Non-Specialist||Info|
|PhylomeDB||A database for phylomes||Info|
|PhyLoTA||Genbank nucleotide sequence taxonomic distribution||Info||Karen Cranston|
|PhyloWidget||Viewing, editing, publishing phylogenetic trees online.||Info||Greg Jordan|
|pPOD||Database technologies for integrating AToL information||Info||Sam Donnelly|
|TimeTree||Species divergence times||Info|
|SpeciesIndex||Just a bit of fun||Info||Roger Hyam|
|TreeBASE||Database of Phylogenetic Knowledge||Info||Bill Piel|
|TreeFam||Tree Families Database||Info|
|Tree of Life||Everything!||Info||Katja Schulz|
Many resources have their own page (linked to from the "Info" links above). Resources that don't have their own page may be described below.
- 1 Resource Matrix
- 2 questions to ask for each resource
- 3 Phylogeny Services
- 4 Data Resources
questions to ask for each resource
- What is the scope of the resource?
- Who controls it?
- How can users access data?
- How are data organized? Is there an explicit data model, schema, or format description?
- Is there evidence that this is an important resource (registered users; citations)?
Output formats are Newick, NHX and Phylip, but there is apparently no means to export the alignment and tree together.
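As a rough illustration of what exporting an alignment and a tree together could look like, the sketch below bundles both into a single NEXUS document. NEXUS is our choice for the example, not a format the resource is known to support. The function name `to_nexus` and its parameters are hypothetical, and the sketch assumes taxon names are simple tokens that need no quoting.

```python
def to_nexus(alignment, newick, tree_name="tree1", datatype="DNA"):
    """Bundle an aligned matrix and a Newick tree into one NEXUS string.

    alignment: dict mapping taxon name -> aligned sequence (equal lengths).
    newick:    tree in Newick notation, with or without the trailing ';'.
    Assumption: taxon names contain no whitespace or NEXUS punctuation.
    """
    if not alignment:
        raise ValueError("alignment is empty")
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("sequences must all have the same aligned length")
    nchar = lengths.pop()
    taxa = list(alignment)
    width = max(len(name) for name in taxa)
    # Drop any trailing ';' so exactly one is written in the TREES block.
    tree = newick.strip().rstrip(";")
    lines = [
        "#NEXUS",
        "BEGIN TAXA;",
        f"  DIMENSIONS NTAX={len(taxa)};",
        "  TAXLABELS " + " ".join(taxa) + ";",
        "END;",
        "BEGIN CHARACTERS;",
        f"  DIMENSIONS NCHAR={nchar};",
        f"  FORMAT DATATYPE={datatype} MISSING=? GAP=-;",
        "  MATRIX",
    ]
    # One matrix row per taxon, names padded so the sequences line up.
    lines += [f"    {name.ljust(width)}  {alignment[name]}" for name in taxa]
    lines += [
        "  ;",
        "END;",
        "BEGIN TREES;",
        f"  TREE {tree_name} = {tree};",
        "END;",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Toy data, invented purely for the example.
    print(to_nexus({"A": "ACGT", "B": "ACGA", "C": "AC-T"}, "(A,(B,C));"))
```

Keeping the matrix and the tree in one file means the taxon labels only have to be reconciled once, which is the main practical benefit over separate Phylip and Newick downloads.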
The resources below provide comparative data and are listed in order of apparent re-use, measured by literature citations. This is a crude way of prioritizing targets because i) literature citations do not always indicate data re-use; and ii) past re-use matters less than a resource's potential for future re-use once semantic tools are used to increase its value.
The PANTHER project is an online data resource for protein families with trees, HMMs, metabolic pathways, and other "functional" information. According to the latest publication, "PANTHER is a freely available, comprehensive software system for relating protein sequence evolution to the evolution of specific protein functions and biological roles."
- 279 citations for the 2003, 2005 and 2007 papers, but it's not clear how much of this re-use is based on the comparative aspect of the resource, as opposed to using it to mine relations between individual proteins and metabolic pathway annotations.
- HOVERGEN 1994 paper cited 157 times
See Ensembl. The Ensembl family of databases covers much more than comparative genomics, but the Ensembl Compara schema and pipeline are becoming an increasingly important component of Ensembl.
- 100-200 citations each for the yearly Ensembl papers
- There was a paper published this year on Ensembl Compara's Gene Trees, but it's too early to tell whether it will become highly cited.
- citations for 2006 paper: 36
Tree of Life
See Tree of Life.