The Nucleotide database is a collection of nucleotide sequences from GenBank,
RefSeq, and PDB.
GenBank is the primary sequence repository, containing the annotated sequences
as submitted by the original authors. The database is updated every
two months. GenBank is part of the International Nucleotide Sequence
Database Collaboration, which also includes the DNA DataBank of Japan
(DDBJ) and the European Molecular Biology Laboratory (EMBL). Data are
exchanged among these three organizations daily.
Reference Sequence (RefSeq) is a biologically non-redundant collection of
DNA, RNA, and protein sequences. RefSeq provides one example for each
biological molecule per organism. Alternatively spliced transcripts
that share identical exons have separate entries in RefSeq.
The common RefSeq accession prefix
UniGene is an experimental system for automatically partitioning GenBank
sequences into a non-redundant set of gene-oriented clusters. A UniGene entry
is created by clustering Expressed Sequence Tags (ESTs) and mRNAs. UniGene
provides a unified view of the transcriptome. Each UniGene cluster contains
sequences that represent a unique gene, as well as related information
such as tissue types and map location.
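The idea of partitioning sequences into gene-oriented clusters can be illustrated with a small sketch. This is not UniGene's actual pipeline: the sequence names and the `links` list (pairs of sequences assumed to overlap significantly) are hypothetical, and a simple union-find structure stands in for whatever clustering criteria UniGene really applies.

```python
def cluster(sequences, links):
    """Group sequences into clusters: any two linked sequences share a cluster."""
    parent = {s: s for s in sequences}

    def find(x):
        # Walk up to the cluster representative, shortening the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two clusters

    clusters = {}
    for s in sequences:
        clusters.setdefault(find(s), []).append(s)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical ESTs and mRNAs; each link is an assumed significant overlap.
seqs = ["est1", "est2", "est3", "mrna1", "mrna2"]
links = [("est1", "mrna1"), ("est2", "mrna1"), ("est3", "mrna2")]
clusters = cluster(seqs, links)
print(clusters)
```

Each resulting list plays the role of one cluster: every sequence belongs to exactly one, so the set of clusters is non-redundant.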
HomoloGene is a resource for putative homology relationships among genes from several organisms.
MegaBlast is used to perform cross-species sequence alignment, and only the reciprocal
best matches are retained in HomoloGene.
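The reciprocal-best-match filter described above can be sketched as follows. This is only an illustration under stated assumptions: the gene names and scores are invented, the score dictionaries stand in for parsed alignment results, and ties are broken by whichever hit is seen first. It is not HomoloGene's actual implementation.

```python
def best_hits(scores):
    """Map each query to its single highest-scoring subject."""
    best = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_matches(a_vs_b, b_vs_a):
    """Keep pairs (a, b) where b is a's best hit and a is b's best hit."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


# Hypothetical alignment scores keyed by (query, subject).
human_vs_mouse = {
    ("hA", "mA"): 950, ("hA", "mB"): 400,
    ("hB", "mB"): 800, ("hC", "mB"): 820,
}
mouse_vs_human = {
    ("mA", "hA"): 940,
    ("mB", "hC"): 815, ("mB", "hB"): 790,
}
pairs = reciprocal_best_matches(human_vs_mouse, mouse_vs_human)
print(pairs)
```

Here `hB` is dropped: its best match is `mB`, but `mB`'s best match is `hC`, so the relationship is not reciprocal.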
The Single Nucleotide
Polymorphism database (dbSNP) is a public-domain archive for a broad collection
of simple genetic polymorphisms, which include Single Nucleotide Polymorphisms
(SNPs), small-scale multi-base Deletion Insertion Polymorphisms (DIPs), and
Short Tandem Repeats (STRs).
UniSTS is a comprehensive database of Sequence Tagged Sites (STSs) and STS-based maps.
An STS is a short, unique DNA fragment whose exact location and base sequence in
the genome are known. STSs serve as landmarks in the physical mapping and assembly of
the human genome.
dbEST is a large division of GenBank that contains sequence data for Expressed
Sequence Tags (ESTs) from a number of organisms. ESTs account for
two-thirds of all submissions to GenBank. ESTs are short (typically
400-600 bases) and relatively inaccurate (around 2% error) cDNA fragments,
typically from the 5' or 3' end of the cDNA sequence. ESTs are a gold
mine for the discovery of new genes, particularly those involved in
human disease processes.
dbGSS is the database for Genome Survey Sequences (GSSs). GSSs are similar to ESTs,
except that most GSSs are genomic in origin, rather than derived from cDNA.
dbSTS is a database of Sequence Tagged Sites (STSs). An STS is a short, unique
DNA fragment whose exact location and base sequence in the genome are known.
STSs serve as landmarks in the physical mapping and assembly of the human genome.
The Mammalian Gene Collection
(MGC) is a collection of full-length open reading frame (FL-ORF) clones
for human, mouse, and rat genes. All MGC clones can be purchased from distributors
of the IMAGE consortium.
PopSet is a set of DNA sequences used to analyse the evolutionary relationship of a population.
Third Party Annotation (TPA) is a database designed to support experimentally
derived annotation of existing nucleotide sequences in the primary sequence
database.