Difference between revisions of "Data Resources"

From Evolutionary Informatics Working Group
m (pPOD)
m
Line 1: Line 1:
=Quick analysis of data resources to target for interop hackathon=
+
This page started as a quick analysis of data resources to target for the [[Database Interop Hackathon]].  It has become a hub for information about possible and participating data providers.
  
The idea is that each of these resources has comparative data that we might want to make interoperable.  
+
Each of these resources may have comparative data that we might want to make interoperable.
 +
 
 +
 
 +
 
 +
== Resource Matrix ==
 +
 
 +
{| class="wikitable"
 +
! Resource
 +
! 6 words or less
 +
! Info
 +
! Participants
 +
|-
 +
! [http://www.phylogeny.fr Phylogeny.Fr]
 +
| Robust Phylogenetic Analysis For The Non-Specialist
 +
| [[#Phylogeny.Fr|Info]]
 +
|
 +
|-
 +
! [http://www.pantherdb.org/ PANTHER]
 +
| Protein ANalysis THrough Evolutionary Relationships
 +
| [[#Panther|Info]]
 +
|
 +
|-
 +
! [http://pbil.univ-lyon1.fr/databases/hovergen.php HOVERGEN]
 +
| Homologous Vertebrate Genes Database
 +
| [[#Hovergen|Info]]
 +
|
 +
|-
 +
! [http://pbil.univ-lyon1.fr/databases/hogenom.php HOGENOM]
 +
| Complete Genome Homologous Genes Families
 +
| [[#Hogenom|Info]]
 +
|
 +
|-
 +
! [http://www.phylo.org/sub_sections/databases TreeBASE II]
 +
| Database of Phylogenetic Knowledge
 +
| [[#TreeBase|Info]]
 +
|
 +
|-
 +
! [http://www.treefam.org/ TreeFam]
 +
| Tree Families Database
 +
| [[#TreeFam|Info]]
 +
|
 +
|-
 +
! [http://www.ebi.ac.uk/goldman-srv/pandit/ PANDIT]
 +
| Protein & Nucleotide Domains with Inferred Trees
 +
| [[#Pandit|Info]]
 +
|
 +
|-
 +
! [http://tolweb.org/ Tree of Life]
 +
| Everything!
 +
| [[#Tree of Life|Info]]
 +
|
 +
|-
 +
! [http://microbesonline.org/ MicrobesOnline]
 +
| Multispecies comparison among prokaryotes.
 +
| [[#MicrobesOnline|Info]]
 +
|
 +
|-
 +
! [http://phylomedb.bioinfo.cipf.es/index.html PhylomeDB]
 +
| A database for phylomes
 +
| [[#PhylomeDB|Info]]
 +
|-
 +
|}
  
 
==questions to ask for each resource==
 
==questions to ask for each resource==
  
* What is the scope of the resource?
+
* What is the scope of the resource?
* Who controls it?
+
* Who controls it?
* How can users access data?  
+
* How can users access data?
* How are data organized?  Is there an explicit data model, schema, or format description?  
+
* How are data organized?  Is there an explicit data model, schema, or format description?
 
* Is there evidence that this is an important resources (registered users; citations)?
 
* Is there evidence that this is an important resources (registered users; citations)?
  
Line 17: Line 78:
 
http://www.phylogeny.fr
 
http://www.phylogeny.fr
  
output formats are Newick, NHX and Phylip, but apparently no means to export alignment and tree together.
+
output formats are Newick, NHX and Phylip, but apparently no means to export alignment and tree together.
  
 
==Data Resources==
 
==Data Resources==
  
The resources below provide comparative data and are listed in order of apparent level of re-use, using literature citations as the measure of apparent data re-use.  This is a crude way of prioritizing targets because i) literature citations sometimes do not indicate data re-use; and ii) past re-use is not as important as the potential for future re-use of a resource when semantic tools are used to increase its value.
+
The resources below provide comparative data and are listed in order of apparent level of re-use, using literature citations as the measure of apparent data re-use.  This is a crude way of prioritizing targets because i) literature citations sometimes do not indicate data re-use; and ii) past re-use is not as important as the potential for future re-use of a resource when semantic tools are used to increase its value.
  
 
===Panther===
 
===Panther===
Line 29: Line 90:
 
* '''279''' citations for 2003, 2005 and 2007 papers, but its not clear how data re-use is based on the comparative aspect of this resource, as opposed to using this resource to mine relations between individual proteins and metabolic pathway annotations.
 
* '''279''' citations for 2003, 2005 and 2007 papers, but its not clear how data re-use is based on the comparative aspect of this resource, as opposed to using this resource to mine relations between individual proteins and metabolic pathway annotations.
  
===Hovergen, hogenom===
+
=== Hovergen ===
[http://pbil.univ-lyon1.fr//search/query_fam.php Hovergen], [http://pbil.univ-lyon1.fr/databases/hogenom.php hogenom]
+
[http://pbil.univ-lyon1.fr//search/query_fam.php Hovergen],  
  
 
* hovergen 1994 paper cited '''157''' times
 
* hovergen 1994 paper cited '''157''' times
 +
 +
=== Hogenom===
 +
 +
[http://pbil.univ-lyon1.fr/databases/hogenom.php hogenom]
  
 
===TreeBase===
 
===TreeBase===
Line 57: Line 122:
 
** 16 for 2003 paper
 
** 16 for 2003 paper
  
=== Tree o' Life ===
+
=== Tree of Life ===
 
** [http://tolweb.org Tree of Life], note example implementation, e.g. [http://nexml.org/nexml/phylows/tolweb/16299 hominidae subtree (node 16299)]
 
** [http://tolweb.org Tree of Life], note example implementation, e.g. [http://nexml.org/nexml/phylows/tolweb/16299 hominidae subtree (node 16299)]
  

Revision as of 19:11, 27 February 2009

This page started as a quick analysis of data resources to target for the Database Interop Hackathon. It has become a hub for information about possible and participating data providers.

Each of these resources may have comparative data that we might want to make interoperable.


Resource Matrix

Resource        6 words or less                                      Info  Participants
Phylogeny.Fr    Robust Phylogenetic Analysis For The Non-Specialist  Info
PANTHER         Protein ANalysis THrough Evolutionary Relationships  Info
HOVERGEN        Homologous Vertebrate Genes Database                 Info
HOGENOM         Complete Genome Homologous Genes Families            Info
TreeBASE II     Database of Phylogenetic Knowledge                   Info
TreeFam         Tree Families Database                               Info
PANDIT          Protein & Nucleotide Domains with Inferred Trees     Info
Tree of Life    Everything!                                          Info
MicrobesOnline  Multispecies comparison among prokaryotes.           Info
PhylomeDB       A database for phylomes                              Info

Questions to ask for each resource

  • What is the scope of the resource?
  • Who controls it?
  • How can users access data?
  • How are data organized? Is there an explicit data model, schema, or format description?
  • Is there evidence that this is an important resource (registered users; citations)?

Phylogeny Services

Phylogeny.Fr

http://www.phylogeny.fr

Output formats are Newick, NHX, and Phylip, but there is apparently no means to export the alignment and tree together.
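To illustrate what exporting an alignment and a tree together could look like, here is a minimal sketch that wraps both in a single NEXUS document. NEXUS is chosen here only as one familiar container for both kinds of data; it is not something Phylogeny.Fr is known to offer. The function name `to_nexus` and its parameters are hypothetical, and taxon labels are assumed to contain no whitespace or punctuation that would need quoting.

```python
def to_nexus(alignment, newick, datatype="DNA", tree_name="tree1"):
    """Combine an alignment and a Newick tree into one minimal NEXUS string.

    alignment: dict mapping taxon label -> aligned sequence (equal lengths).
    newick:    tree in Newick format, e.g. "((a,b),c);".
    Assumption: labels need no NEXUS quoting (no spaces or punctuation).
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("alignment must be non-empty with equal-length sequences")
    nchar = lengths.pop()
    width = max(len(name) for name in alignment)

    # Make sure the tree statement is terminated exactly once.
    tree = newick.strip()
    if not tree.endswith(";"):
        tree += ";"

    lines = [
        "#NEXUS",
        "",
        "BEGIN DATA;",
        f"  DIMENSIONS NTAX={len(alignment)} NCHAR={nchar};",
        f"  FORMAT DATATYPE={datatype} MISSING=? GAP=-;",
        "  MATRIX",
    ]
    for name, seq in alignment.items():
        lines.append(f"    {name.ljust(width)}  {seq}")
    lines += [
        "  ;",
        "END;",
        "",
        "BEGIN TREES;",
        f"  TREE {tree_name} = {tree}",
        "END;",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    example = {"human": "ACGT-A", "chimp": "ACGTTA", "mouse": "ACCT-A"}
    print(to_nexus(example, "((human,chimp),mouse);"))
```

The sketch keeps the sequences and the tree in one file so that the taxon labels in the matrix and in the tree can be checked against each other by any downstream tool.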

Data Resources

The resources below provide comparative data and are listed in order of apparent level of re-use, using literature citations as the measure of apparent data re-use. This is a crude way of prioritizing targets because i) literature citations sometimes do not indicate data re-use; and ii) past re-use is not as important as the potential for future re-use of a resource when semantic tools are used to increase its value.
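The ordering described above can be sketched as a simple sort on citation counts. The numbers are the ones quoted in the resource sections of this page; summing the three candidate TreeBase references into one total is an assumption made only for this illustration, and the function name `rank_by_citations` is hypothetical.

```python
# Citation counts as quoted on this page, one list per resource.
# Assumption: the three candidate TreeBase references are summed.
CITATIONS = {
    "TreeFam": [36],
    "TreeBase": [59, 29, 21],
    "Hovergen": [157],
    "Panther": [279],
}


def rank_by_citations(counts):
    """Return (resource, total) pairs sorted by total citations, highest first."""
    totals = {name: sum(values) for name, values in counts.items()}
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for name, total in rank_by_citations(CITATIONS):
        print(f"{name}: {total}")
```

As the paragraph above cautions, this ranks only past citations, which are a crude proxy for data re-use.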

Panther

The Panther system project is an online data resource for protein families with trees, HMMs, metabolic pathways, and other "functional" information. According to the latest publication [1], "PANTHER is a freely available, comprehensive software system for relating protein sequence evolution to the evolution of specific protein functions and biological roles."

  • 279 citations for the 2003, 2005 and 2007 papers, but it's not clear how much of this data re-use is based on the comparative aspect of the resource, as opposed to using it to mine relations between individual proteins and metabolic pathway annotations.

Hovergen

Hovergen

  • Hovergen 1994 paper cited 157 times

Hogenom

hogenom

TreeBase

TreeBASE II has the kind of granular schema that would be a good challenge to try to accommodate using CDAO. What is missing from CDAO (and NeXML) is the notion of a "study" with various types of metadata, including publication/reference metadata (which is somewhat dc-like).

It's not clear what reference to cite. The TreeBase "Intro" page cites about 7 different references up to 2000, including posters and papers. Some of the papers are apparently scientific studies rather than implementation papers. Some of the most likely candidates for a TreeBase citation are as follows (perhaps we should include all of these):
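To make the missing "study" notion concrete, here is a minimal sketch of what such a record might hold. Every class and field name below is hypothetical and is not taken from the TreeBASE II schema, CDAO, or NeXML; the reference fields assume that "dc-like" means Dublin Core-style elements (title, creator, date, identifier).

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Reference:
    """Publication metadata with Dublin Core-style fields (an assumption)."""
    title: str
    creators: List[str]
    date: str
    identifier: Optional[str] = None  # e.g. a DOI, if one exists


@dataclass
class Study:
    """A study groups trees and character matrices under one publication."""
    study_id: str
    reference: Reference
    trees: List[str] = field(default_factory=list)     # Newick strings
    matrices: List[str] = field(default_factory=list)  # matrix labels or ids

    def summary(self) -> str:
        authors = ", ".join(self.reference.creators)
        return (f"{self.study_id}: {self.reference.title} "
                f"({authors}, {self.reference.date}); "
                f"{len(self.trees)} tree(s), {len(self.matrices)} matrix(es)")


if __name__ == "__main__":
    ref = Reference(title="Example study title", creators=["A. Author"], date="2009")
    study = Study("S1", ref, trees=["((a,b),c);"], matrices=["matrix1"])
    print(study.summary())
```

The point of the sketch is only that trees and matrices hang off a study, and the study carries the reference metadata, rather than each tree standing alone.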

  • Sanderson, et al., 1994 (Am. Jour. Bot.), cited 59 times; can't track citing papers
  • Sanderson, et al., 1993 (Syst. Biol.), cited 29 times; most citing papers are meta-analyses (that's good)
  • Morel, 1996, cited 21 times

TreeFam

TreeFam

  • citations for 2006 paper: 36

Pandit

Pandit

Tree of Life

Microbes Online

The Arkin lab's MicrobesOnline server has a cool tree-based view of sequence families.

PhylomeDB

PhylomeDB

MorphBank

MorphBank

PhyLoTA

PhyLoTA

PhyloFacts

PhyloFacts

TimeTree

TimeTree

MorphoBank

MorphoBank

pPOD

pPOD (not really a data resource; it's a database technology project led by computer scientists)