Nucleotide
The Entrez
Nucleotide database is a collection of nucleotide sequences from GenBank,
RefSeq, and PDB.
GenBank
GenBank
is the primary sequence repository, containing the annotated sequences
as submitted by the original authors. A new release of the database is
issued every two months. GenBank is part of the International
Nucleotide Sequence Database Collaboration, which also includes the
DNA DataBank of
Japan (DDBJ) and the European
Molecular Biology Laboratory (EMBL). Data are exchanged among these
three organizations daily.
RefSeq
The Reference
Sequence (RefSeq) collection is a biologically non-redundant set of
DNA, RNA, and protein sequences. RefSeq provides one example of each
biological molecule per organism. Alternatively spliced transcripts
that share identical exons still have separate entries in RefSeq.
Common RefSeq accession prefixes
| Accession prefix | Molecule type |
| --- | --- |
| NC_ | Complete genomic molecule |
| NT_ | Genomic contig |
| NM_ | mRNA |
| XM_ | mRNA (Computed) |
| NP_ | Protein |
| XP_ | Protein (Computed) |
| NR_ | RNA |
| XR_ | RNA (Computed) |
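The prefix table above can be applied as a simple lookup. The following is a minimal Python sketch, not an NCBI tool: the names `REFSEQ_PREFIXES`, `refseq_molecule_type`, and `is_computed` are hypothetical, and the sketch assumes that the first three characters of an accession are its prefix and that only the eight prefixes listed here need to be recognised.

```python
# Mapping copied from the accession prefix table; other prefixes are not covered.
REFSEQ_PREFIXES = {
    "NC_": "Complete genomic molecule",
    "NT_": "Genomic contig",
    "NM_": "mRNA",
    "XM_": "mRNA (Computed)",
    "NP_": "Protein",
    "XP_": "Protein (Computed)",
    "NR_": "RNA",
    "XR_": "RNA (Computed)",
}


def refseq_molecule_type(accession):
    """Return the molecule type for a RefSeq accession.

    Returns None when the prefix is not one of those in the table.
    """
    prefix = accession.strip().upper()[:3]
    return REFSEQ_PREFIXES.get(prefix)


def is_computed(accession):
    """Return True for the table rows marked '(Computed)', i.e. X-prefixed accessions."""
    molecule_type = refseq_molecule_type(accession)
    return molecule_type is not None and molecule_type.endswith("(Computed)")
```

For instance, an illustrative accession string beginning with `NM_` would be reported as an mRNA, while a string with an unlisted prefix yields `None` rather than a guess.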
UniGene
UniGene
is an experimental system for automatically partitioning GenBank sequences
into a non-redundant set of gene-oriented clusters. A UniGene entry is
created by clustering Expressed Sequence Tags (ESTs) and mRNAs. UniGene
offers a unified view of the transcriptome. Each UniGene cluster contains
sequences that represent a unique gene, as well as related information
such as tissue types and map location.
HomoloGene
HomoloGene
is a resource for putative homology relationships among genes from several organisms.
MegaBlast is used to perform cross-species sequence alignment, and only the reciprocal
best matches are retained in HomoloGene.
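The reciprocal-best-match idea can be illustrated with a short Python sketch. This is not the HomoloGene pipeline: it assumes the alignment scores have already been computed and are supplied as plain dictionaries, and the function names `best_hits` and `reciprocal_best_matches` are hypothetical.

```python
def best_hits(scores):
    """Map each query gene to its highest-scoring target gene.

    scores: dict of {(query, target): alignment_score}.
    On a tied score the first pair encountered is kept.
    """
    best = {}
    for (query, target), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (target, score)
    return {query: target for query, (target, _) in best.items()}


def reciprocal_best_matches(a_vs_b, b_vs_a):
    """Return sorted (gene_a, gene_b) pairs in which each gene is the other's best hit."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted(
        (gene_a, gene_b)
        for gene_a, gene_b in best_ab.items()
        if best_ba.get(gene_b) == gene_a
    )
```

A gene from species A whose best match in species B prefers a different A gene is dropped, which is the filtering effect of keeping only reciprocal best matches.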
dbSNP
The Single Nucleotide
Polymorphism database (dbSNP) is a public-domain archive for a broad collection
of simple genetic polymorphisms, which include Single Nucleotide Polymorphisms
(SNPs), small-scale multi-base Deletion Insertion Polymorphisms (DIPs), and
Short Tandem Repeats (STRs).
UniSTS
UniSTS
is a comprehensive database of Sequence Tagged Sites (STSs) and STS-based maps.
An STS is a short, unique DNA fragment whose exact location and base sequence in
the genome are known. STSs serve as landmarks for the physical mapping and assembly of
the human genome.
dbEST
dbEST
is a large division of GenBank that contains sequence data for Expressed
Sequence Tags (ESTs) from a number of organisms. ESTs account for
two-thirds of all submissions to GenBank. ESTs are short (typically
400-600 bases) and relatively inaccurate (around 2% error) cDNA fragments,
usually taken from the 5' or 3' end of the cDNA sequence. ESTs are a gold
mine for the discovery of new genes, particularly those involved in
human disease processes.
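To put the quoted figures in perspective, the short Python sketch below works out how many erroneous bases a typical EST would be expected to contain. It assumes the 2% error rate applies uniformly along the read, and the function name `expected_errors` is hypothetical.

```python
def expected_errors(length_bases, error_percent=2):
    """Expected number of erroneous bases in a read, assuming a uniform error rate."""
    return length_bases * error_percent / 100


# Typical EST length range quoted in the text: 400 to 600 bases.
low = expected_errors(400)
high = expected_errors(600)
print(f"Expected errors per EST: {low:.0f} to {high:.0f} bases")
```

Under these assumptions a single EST carries roughly 8 to 12 incorrect bases, which is why ESTs are usually treated as evidence for a transcript rather than as a finished sequence.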
dbGSS
dbGSS
is the database for Genome Survey Sequences (GSSs). GSSs are similar to ESTs,
except that most GSSs are genomic in origin rather than derived from
cDNA (mRNA).
dbSTS
dbSTS
is a database of Sequence Tagged Sites (STSs). An STS is a short, unique
DNA fragment whose exact location and base sequence in the genome are known.
STSs serve as landmarks for the physical mapping and assembly of the human
genome.
MGC
The Mammalian Gene Collection
(MGC) is a collection of full-length open reading frame (FL-ORF) clones
for human, mouse, and rat genes. All MGC clones can be purchased from distributors
of the IMAGE consortium.
PopSet
A PopSet
is a set of DNA sequences collected to analyse the evolutionary relationships
within a population.
TPA
Third
Party Annotation (TPA) is a database designed to hold experimentally
supported annotation of existing nucleotide sequences in the primary
sequence database.