NCBI Data Model Information

From Evolutionary Informatics Working Group
Revision as of 13:22, 13 November 2007 by Arlin.stoltzfus@nist.gov (talk)

Overview

Scope

The model covers both bibliographic information and biological information.

The biological sequences and associated information include

  • basic sequence data
  • sequence features
    • splicing
    • coding regions
    • item
  • item

Representation, status, etc

The model is specified in ASN.1. XML translations of the data are also available, though I'm not sure how the ASN.1-to-XML mapping is done.

There is an enormous computing infrastructure at NCBI that is built on top of this model.


Key features of interest

The basic biosequence concept is as follows (in ASN.1):

Bioseq ::= SEQUENCE {
   id SET OF Seq-id OPTIONAL,
   descr Seq-descr ,
   inst Seq-inst ,
   annot SET OF Seq-annot OPTIONAL }

Here SEQUENCE is an ASN.1 term indicating that the listed components occur in the order given: a set of identifiers, a description, a sequence instance, and a set of annotations. SET OF denotes an unordered collection of elements of the named type, and a component marked OPTIONAL may be absent.
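As a rough illustration only, the shape of the Bioseq definition can be mirrored in Python. This is a sketch, not NCBI code: the class below is hypothetical, plain strings stand in for the Seq-id, Seq-descr, Seq-inst and Seq-annot types, and the optional components follow the listing as shown above.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class Bioseq:
    """Hypothetical mirror of the ASN.1 Bioseq listing (strings are stand-ins)."""
    # Python requires fields with defaults to come last, so the field order
    # here differs from the ASN.1 listing (id, descr, inst, annot).
    descr: str                                # stands in for Seq-descr
    inst: str                                 # stands in for Seq-inst
    id: Optional[FrozenSet[str]] = None       # SET OF Seq-id OPTIONAL
    annot: Optional[FrozenSet[str]] = None    # SET OF Seq-annot OPTIONAL


# A record with one made-up identifier and no annotations.
seq = Bioseq(
    descr="example record",
    inst="ATGGCCTTTAAGTGA",
    id=frozenset({"local|seq1"}),
)
```

A frozenset is used for the SET OF components because ASN.1 sets, unlike sequences, carry no ordering; None marks an optional component that was omitted.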

Sequences come in various types: not only different chemical types, but also constructed sequences, partial sequences, and so on.

Annotations are attached to sequences by location. Locations on sequences can be specified in a variety of ways: as points, ranges, choices, ambiguities, sets of these, and so on.
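The idea of attaching an annotation to a sequence by location can be sketched in Python as follows. All names here (Point, Interval, Mix, Feature, extract) are hypothetical and are not the NCBI type names. The sketch assumes 0-based inclusive coordinates and covers only points, ranges, and ordered sets of these.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass(frozen=True)
class Point:
    seq_id: str
    pos: int      # 0-based position (an assumption of this sketch)


@dataclass(frozen=True)
class Interval:
    seq_id: str
    start: int    # 0-based, inclusive
    stop: int     # 0-based, inclusive


@dataclass(frozen=True)
class Mix:
    parts: tuple  # ordered pieces, e.g. the exons of a spliced coding region


Location = Union[Point, Interval, Mix]


@dataclass(frozen=True)
class Feature:
    kind: str            # e.g. "coding region"
    location: Location   # where on the sequence(s) the feature applies


def extract(loc: Location, sequences: Dict[str, str]) -> str:
    """Return the residues that a location refers to."""
    if isinstance(loc, Point):
        return sequences[loc.seq_id][loc.pos]
    if isinstance(loc, Interval):
        return sequences[loc.seq_id][loc.start:loc.stop + 1]
    if isinstance(loc, Mix):
        return "".join(extract(part, sequences) for part in loc.parts)
    raise TypeError(f"unsupported location: {loc!r}")


sequences = {"seq1": "ATGGCCTTTAAGTGA"}
cds = Feature(
    kind="coding region",
    location=Mix((Interval("seq1", 0, 2), Interval("seq1", 9, 14))),
)
spliced = extract(cds.location, sequences)  # "ATGAAGTGA"
```

Because the feature holds only a location and not a copy of the residues, the same sequence can carry many independent annotations, and a single annotation can span several pieces of a sequence, as in the spliced example.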