TreeBASE Data

From Evolutionary Informatics Working Group
Revision as of 12:13, 10 March 2009 by (talk) (Dump Contents)

TreeBASE Dump

A Postgres dump for TreeBASE can be obtained [here].

Dump Contents

               List of relations
Schema |         Name         |   Type   | Owner
-------+----------------------+----------+-------
public | edges                | table    | piel
public | ncbi_names           | table    | piel
public | ncbi_nodes           | table    | piel
public | node_path            | table    | piel
public | nodes                | table    | piel
public | nodes_node_id        | sequence | piel
public | study                | table    | piel
public | study_id_seq         | sequence | piel
public | taxa                 | table    | piel
public | taxon_id_seq         | sequence | piel
public | taxon_variant_id_seq | sequence | piel
public | taxon_variants       | table    | piel
public | tb_labels            | table    | piel
public | tb_labels_id         | sequence | piel
public | trees                | table    | piel
public | trees_tree_id        | sequence | piel
(16 rows)

Each "study" record has many "trees" records. Each "trees" record has many "nodes" records, which are connected to each other through the "edges" table; the "node_path" table holds a transitive closure index over those edges. Each "tb_labels" record can point to many "nodes" records: "tb_labels" is a table of the unique taxon labels that appear across all trees. (Frankly, maybe we should delete this table and fuse it with the "nodes" table, since uniqueness is not really necessary and may even be a bad thing where homonyms are concerned.)

Each "taxon_variants" record maps to zero or more "tb_labels" records, and each "taxa" record maps to one or more "taxon_variants" records. Each "taxa" record represents a single, normalized taxon, usually a species, but possibly a subspecies or a higher taxon. Wherever possible, each "taxa" record has an ncbi_taxid, that is, the ID that NCBI uses for the taxon in its GenBank distribution. These taxids link to the "ncbi_names" table, which in turn uses the "ncbi_nodes" table as a hierarchical classification. This classification has been pre-indexed with left and right IDs so that hierarchical searching is possible.
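
The two indexes described above can be illustrated with a small sketch. It uses an in-memory SQLite database from Python as a stand-in for the Postgres dump, so it runs without the dump itself. Only the table names come from the listing above; every column name (tax_id, parent_id, left_id, right_id, child_id, ancestor_id, descendant_id), the toy data, and the helper functions are assumptions made for illustration and may not match the real schema. In particular, the sketch assumes that "node_path" stores one row per ancestor/descendant pair, which is only one of several ways to store a transitive closure.

```python
import sqlite3

# In-memory stand-in for the Postgres dump. Table names follow the listing
# above; ALL column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ncbi_nodes (tax_id INTEGER PRIMARY KEY, parent_id INTEGER,
                             left_id INTEGER, right_id INTEGER);
    CREATE TABLE edges (parent_id INTEGER, child_id INTEGER);
    CREATE TABLE node_path (ancestor_id INTEGER, descendant_id INTEGER);
""")

# Toy classification 1 -> (2 -> (4, 5), 3). left_id/right_id are assigned by
# a depth-first walk, so a descendant's interval nests inside its ancestor's.
conn.executemany("INSERT INTO ncbi_nodes VALUES (?, ?, ?, ?)", [
    (1, None, 1, 10),
    (2, 1, 2, 7),
    (4, 2, 3, 4),
    (5, 2, 5, 6),
    (3, 1, 8, 9),
])

# Toy tree 10 -> (11 -> 13, 12) stored as parent/child edges.
conn.executemany("INSERT INTO edges VALUES (?, ?)",
                 [(10, 11), (10, 12), (11, 13)])

# Build the transitive closure of edges: one row for every
# (ancestor, descendant) pair, not only for direct parent/child pairs.
conn.execute("""
    WITH RECURSIVE closure(ancestor_id, descendant_id) AS (
        SELECT parent_id, child_id FROM edges
        UNION
        SELECT c.ancestor_id, e.child_id
        FROM closure AS c JOIN edges AS e ON e.parent_id = c.descendant_id
    )
    INSERT INTO node_path (ancestor_id, descendant_id)
    SELECT ancestor_id, descendant_id FROM closure
""")


def descendants(conn, tax_id):
    """All taxa below tax_id, found with one range comparison on left/right IDs."""
    rows = conn.execute("""
        SELECT d.tax_id
        FROM ncbi_nodes AS a
        JOIN ncbi_nodes AS d
          ON d.left_id > a.left_id AND d.right_id < a.right_id
        WHERE a.tax_id = ?
        ORDER BY d.left_id
    """, (tax_id,))
    return [r[0] for r in rows]


def subtree(conn, node_id):
    """All nodes below node_id, read directly from the closure table."""
    rows = conn.execute("""
        SELECT descendant_id FROM node_path
        WHERE ancestor_id = ?
        ORDER BY descendant_id
    """, (node_id,))
    return [r[0] for r in rows]


print(descendants(conn, 2))  # taxa nested inside taxon 2
print(subtree(conn, 10))     # nodes reachable from node 10
```

In both cases the point is the same: a query for everything beneath a node becomes a single non-recursive lookup, at the cost of maintaining the extra index whenever the tree changes.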