NCBI Data Model Information

From Evolutionary Informatics Working Group
 

Revision as of 13:25, 13 November 2007

Overview

Scope

The data model covers both bibliographic information and biological information.

The biological sequences and associated information include:

  • basic sequence stuff
  • sequence features
    • splicing
    • coding regions
    • item
  • item

Representation, status, etc

The data model is represented in ASN.1.

The implementation is supported by a programming library that is used internally at NCBI.

External users have CVS access to the library. The documentation can be browsed.

The library includes a serialization component that handles mappings between ASN.1 and XML.

There is an enormous computing infrastructure at NCBI that is built on top of this model.

Key features of interest

The basic biosequence concept is as follows (in ASN.1):

Bioseq ::= SEQUENCE {
   id SET OF Seq-id ,
   descr Seq-descr OPTIONAL ,
   inst Seq-inst ,
   annot SET OF Seq-annot OPTIONAL }

Here SEQUENCE is the ASN.1 term for an ordered collection of fields: the listed items occur in the order given, i.e., a set of identifiers, an optional description, the sequence instance, and an optional set of annotations.
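
As a rough illustration of this shape in a general-purpose language, the sketch below mirrors the four Bioseq fields as a Python dataclass. The class names, the plain-string stand-ins for Seq-id, Seq-descr and Seq-annot, and the empty-list defaults are all assumptions of the sketch. It is not the NCBI library's API, and the defaults do not restate which fields the schema marks OPTIONAL.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SeqInst:
    """Hypothetical stand-in for Seq-inst: the sequence instance itself."""
    molecule: str                   # e.g. "dna", "rna", "aa" (labels assumed)
    length: int
    residues: Optional[str] = None  # may be absent, e.g. for constructed sequences


@dataclass
class Bioseq:
    """Names mirror the ASN.1 fields id, descr, inst, annot.

    inst is listed first only because dataclasses require fields
    without defaults to precede fields with defaults.
    """
    inst: SeqInst
    ids: List[str] = field(default_factory=list)    # stand-in for SET OF Seq-id
    descr: List[str] = field(default_factory=list)  # stand-in for Seq-descr
    annot: List[str] = field(default_factory=list)  # stand-in for SET OF Seq-annot


# Build one record; the identifier string is made up for the example.
seq = Bioseq(inst=SeqInst("dna", 8, "ACGTACGT"), ids=["example|1"])
print(seq.ids, seq.inst.length)
```

Plain strings keep the sketch short; a fuller model would give identifiers, descriptors and annotations their own types, as the ASN.1 definition does.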

Sequences can be of various types, not just different chemical types, but also constructed sequences, partial sequences, and so on.

Annotations are attached to sequences by location. Locations on sequences can be specified in a variety of ways: points, ranges, choices, ambiguities, sets of these, and so on.
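
The idea of attaching an annotation by location can be sketched in Python as follows. This is a minimal illustration covering only points, ranges and sets of these. The names Point, Interval, Mix, Feature and positions are hypothetical, and the 0-based inclusive coordinates are an assumption of the sketch rather than a statement about the NCBI model.

```python
from dataclasses import dataclass
from typing import Set, Union


@dataclass(frozen=True)
class Point:
    seq_id: str
    pos: int    # 0-based position (convention assumed for this sketch)


@dataclass(frozen=True)
class Interval:
    seq_id: str
    start: int  # inclusive
    stop: int   # inclusive


@dataclass(frozen=True)
class Mix:
    parts: tuple  # any combination of Point, Interval and Mix values


Location = Union[Point, Interval, Mix]


def positions(loc: Location, seq_id: str) -> Set[int]:
    """Return the set of positions on seq_id that loc covers."""
    if isinstance(loc, Point):
        return {loc.pos} if loc.seq_id == seq_id else set()
    if isinstance(loc, Interval):
        if loc.seq_id != seq_id:
            return set()
        return set(range(loc.start, loc.stop + 1))
    if isinstance(loc, Mix):
        covered: Set[int] = set()
        for part in loc.parts:
            covered |= positions(part, seq_id)
        return covered
    raise TypeError(f"unsupported location type: {type(loc).__name__}")


@dataclass
class Feature:
    kind: str           # e.g. "coding region"
    location: Location  # where on a sequence the feature applies


# A feature spanning two separate ranges of the same sequence.
cds = Feature("coding region", Mix((Interval("seq1", 0, 2), Interval("seq1", 6, 8))))
covered = positions(cds.location, "seq1")
print(sorted(covered))
```

Because a feature refers to a sequence only through the identifier held in its location, the same location types can describe features on any sequence, and a single feature can span several ranges.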