University of Southern California

NCBI Nucleotide Databases

Nucleotide
The Entrez Nucleotide database is a collection of nucleotide sequences drawn from GenBank, RefSeq, and PDB.

GenBank
GenBank is the primary sequence repository, containing annotated sequences as submitted by their original authors. A new GenBank release is issued every two months. GenBank is part of the International Nucleotide Sequence Database Collaboration, which also includes the DNA Data Bank of Japan (DDBJ) and the European Molecular Biology Laboratory (EMBL). The three organizations exchange data daily.

RefSeq
The Reference Sequence (RefSeq) database is a biologically non-redundant collection of DNA, RNA, and protein sequences. RefSeq provides one representative record for each biological molecule per organism. Alternatively spliced transcripts of the same gene are given separate RefSeq entries, even when they share identical exons.

The common RefSeq accession prefixes
Accession prefix    Molecule type
NC_                 Complete genomic molecule
NT_                 Genomic contig
NM_                 mRNA
XM_                 mRNA (computed)
NP_                 Protein
XP_                 Protein (computed)
NR_                 RNA
XR_                 RNA (computed)
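
Because the molecule type is encoded in the first three characters of a RefSeq accession, the table above can be turned into a simple lookup. The following is a minimal sketch, assuming only the eight prefixes listed here (RefSeq uses others, which this sketch rejects); the function names are hypothetical, and the accession strings in the usage lines are illustrations of the prefix format only.

```python
# Prefix -> molecule type, taken from the table of common RefSeq prefixes.
# RefSeq uses other prefixes as well; they are not covered by this sketch.
REFSEQ_PREFIXES = {
    "NC_": "Complete genomic molecule",
    "NT_": "Genomic contig",
    "NM_": "mRNA",
    "XM_": "mRNA (computed)",
    "NP_": "Protein",
    "XP_": "Protein (computed)",
    "NR_": "RNA",
    "XR_": "RNA (computed)",
}


def refseq_molecule_type(accession: str) -> str:
    """Return the molecule type implied by the first three characters of an accession."""
    prefix = accession.strip().upper()[:3]
    if prefix not in REFSEQ_PREFIXES:
        raise ValueError(f"unrecognized RefSeq prefix in {accession!r}")
    return REFSEQ_PREFIXES[prefix]


def is_computed(accession: str) -> bool:
    """True for the X-prefixed (computed) record types in the table."""
    return refseq_molecule_type(accession).endswith("(computed)")


if __name__ == "__main__":
    for acc in ["NM_000546", "XP_012345", "NC_000001"]:
        print(acc, "->", refseq_molecule_type(acc), "| computed:", is_computed(acc))
```

A dictionary keyed on the prefix keeps the code and the table in one-to-one correspondence, so adding a prefix means adding one line.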

UniGene
UniGene is an experimental system for automatically partitioning GenBank sequences into a non-redundant set of gene-oriented clusters. UniGene entries are created by clustering Expressed Sequence Tags (ESTs) and mRNAs, giving a unified view of the transcriptome. Each UniGene cluster contains sequences that represent a unique gene, together with related information such as the tissue types in which the gene is expressed and its map location.
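
The idea of partitioning sequences into gene-oriented clusters can be illustrated with a toy version. This is a minimal sketch and not UniGene's actual pipeline: it assumes, purely for illustration, that two sequences belong together whenever they share an exact substring of length k (a k-mer), ignores the reverse strand, and follows links transitively with a union-find structure. The function name and the example sequences are hypothetical.

```python
def cluster_by_shared_kmers(seqs: dict[str, str], k: int = 8) -> list[set[str]]:
    """Partition sequences into clusters: any two sequences sharing an exact
    k-mer are linked, and links are followed transitively (union-find).
    Simplified assumption: forward strand only, exact matches only."""
    parent = {name: name for name in seqs}

    def find(x: str) -> str:
        # Walk to the cluster root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a: str, b: str) -> None:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    # Map each k-mer to the first sequence seen containing it; any later
    # sequence containing the same k-mer is merged into that cluster.
    first_owner: dict[str, str] = {}
    for name, seq in seqs.items():
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            owner = first_owner.setdefault(seq[i:i + k], name)
            if owner != name:
                union(owner, name)

    clusters: dict[str, set[str]] = {}
    for name in seqs:
        clusters.setdefault(find(name), set()).add(name)
    return list(clusters.values())


if __name__ == "__main__":
    reads = {
        "est1": "ACGTACGTTTGA",
        "est2": "CGTTTGACCAAT",   # overlaps est1 by CGTTTGA
        "mrna1": "GGGGCCCCAAAA",  # shares no 6-mer with the others
    }
    print(cluster_by_shared_kmers(reads, k=6))
```

Union-find is used because clustering is transitive: if read A overlaps B and B overlaps C, all three end up in one cluster even when A and C share nothing directly.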

HomoloGene
HomoloGene is a resource of putative homology relationships among genes from several organisms. MegaBLAST is used to perform cross-species sequence alignment, and only reciprocal best matches are retained in HomoloGene.
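
The "reciprocal best match" rule can be shown on precomputed alignment scores. This is a minimal sketch, assuming the scores (higher means a better match) have already been produced by an aligner such as MegaBLAST; the sketch does not run any alignment, and the gene names, scores, and function names are hypothetical.

```python
from typing import Optional


def best_hit(hits: dict[str, float]) -> Optional[str]:
    """Return the target with the highest score, or None if there are no hits."""
    if not hits:
        return None
    return max(hits, key=hits.get)


def reciprocal_best_hits(
    a_to_b: dict[str, dict[str, float]],
    b_to_a: dict[str, dict[str, float]],
) -> list[tuple[str, str]]:
    """Keep pair (a, b) only if b is a's best hit AND a is b's best hit."""
    best_in_b = {a: best_hit(hits) for a, hits in a_to_b.items()}
    best_in_a = {b: best_hit(hits) for b, hits in b_to_a.items()}
    return [
        (a, b)
        for a, b in best_in_b.items()
        if b is not None and best_in_a.get(b) == a
    ]


if __name__ == "__main__":
    # Hypothetical alignment scores between species A genes (h*) and species B genes (m*).
    a_vs_b = {"h1": {"m1": 950.0, "m2": 300.0}, "h2": {"m1": 500.0}}
    b_vs_a = {"m1": {"h1": 940.0, "h2": 480.0}, "m2": {"h1": 310.0}}
    # h2's best hit is m1, but m1's best hit is h1, so only (h1, m1) is kept.
    print(reciprocal_best_hits(a_vs_b, b_vs_a))
```

Requiring the match to hold in both directions discards one-sided hits such as h2 to m1, which is what makes the retained pairs good candidates for putative homologs.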

dbSNP
The Single Nucleotide Polymorphism database (dbSNP) is a public-domain archive for a broad collection of simple genetic polymorphisms, which include Single Nucleotide Polymorphisms (SNPs), small-scale multi-base Deletion Insertion Polymorphisms (DIPs), and Short Tandem Repeats (STRs).
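
To make the difference between a SNP and a DIP concrete, a variant can be classified roughly from its reference and alternate alleles. This is a minimal sketch using simplified rules of our own, not dbSNP's classification logic: it assumes "-" marks an empty allele, and the function name is hypothetical. STRs are not detected, since recognizing a repeat needs the surrounding genomic sequence.

```python
def classify_variant(ref: str, alt: str) -> str:
    """Classify a variant as 'SNP', 'DIP', or 'other' from its two alleles.

    Simplified rules for illustration only (assumptions, not dbSNP's logic):
      * '-' denotes an empty allele (the absent side of an insertion/deletion);
      * one base replaced by one different base -> 'SNP';
      * alleles of different length             -> 'DIP';
      * anything else (e.g. 'AC' vs 'GT')       -> 'other'.
    """
    ref = "" if ref == "-" else ref.upper()
    alt = "" if alt == "-" else alt.upper()
    if ref == alt:
        raise ValueError("reference and alternate alleles are identical")
    if len(ref) == 1 and len(alt) == 1:
        return "SNP"
    if len(ref) != len(alt):
        return "DIP"
    return "other"


if __name__ == "__main__":
    for ref, alt in [("A", "G"), ("-", "T"), ("ACG", "A"), ("AC", "GT")]:
        print(f"{ref:>3} -> {alt:<3} {classify_variant(ref, alt)}")
```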

UniSTS
UniSTS is a comprehensive database of Sequence Tagged Sites (STSs) and STS-based maps. An STS is a short DNA fragment, unique within the genome, whose exact location and base sequence are known. STSs serve as landmarks for the physical mapping and assembly of the human genome.
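
What makes an STS useful as a landmark is that its sequence occurs exactly once in the genome. The check below is a minimal sketch of that uniqueness test, assuming exact string matching against a genome held in memory and counting both strands; real STSs are defined and assayed by PCR primer pairs, which this sketch does not model. The function names and sequences are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string (A<->T, C<->G)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def count_occurrences(genome: str, fragment: str) -> int:
    """Count exact (possibly overlapping) matches of fragment on both strands."""
    genome = genome.upper()
    # A set, so a palindromic fragment (equal to its reverse complement) is searched once.
    probes = {fragment.upper(), reverse_complement(fragment)}
    total = 0
    for probe in probes:
        start = genome.find(probe)
        while start != -1:
            total += 1
            start = genome.find(probe, start + 1)
    return total


def is_unique_sts(genome: str, fragment: str) -> bool:
    """True if the candidate STS occurs exactly once in the genome."""
    return count_occurrences(genome, fragment) == 1


if __name__ == "__main__":
    print(is_unique_sts("CCCGATTACACCC", "GATTACA"))      # one forward match
    print(is_unique_sts("GATTACAGGTGTAATC", "GATTACA"))   # forward + reverse-strand match
```

Searching the reverse complement as well matters because a fragment found on the opposite strand would make a landmark ambiguous just as much as a second forward copy would.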

dbEST
dbEST is a large division of GenBank that contains sequence data for Expressed Sequence Tags (ESTs) from many organisms. ESTs account for two-thirds of all submissions to GenBank. ESTs are short (typically 400-600 bases), relatively error-prone (around 2% error) cDNA fragments, usually sequenced from the 5' or 3' end of a cDNA. They are a gold mine for the discovery of new genes, particularly those involved in human disease processes.
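
The figures above imply how noisy a single EST is: at a 2% error rate, a 400-base EST carries about 8 miscalled bases on average and a 600-base EST about 12. The sketch below reproduces that arithmetic. It assumes, as a simplification, that errors are independent and occur at a uniform per-base rate; the function names are hypothetical.

```python
def expected_errors(length: int, error_rate: float = 0.02) -> float:
    """Expected number of miscalled bases: length x per-base error rate."""
    return length * error_rate


def prob_error_free(length: int, error_rate: float = 0.02) -> float:
    """Chance that every base is correct, ASSUMING independent, uniform errors."""
    return (1.0 - error_rate) ** length


if __name__ == "__main__":
    for n in (400, 500, 600):
        print(f"{n} bases: ~{expected_errors(n):.0f} expected errors, "
              f"P(no errors) = {prob_error_free(n):.1e}")
```

Under the same assumption, the chance that a 500-base EST contains no errors at all is well below 0.01%, which is why ESTs are normally used for gene discovery and clustering rather than as finished sequence.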

dbGSS
dbGSS is the database for Genome Survey Sequences (GSSs). GSSs are similar to ESTs, except that most GSSs are genomic in origin rather than derived from cDNA (mRNA).

dbSTS
dbSTS is a database of Sequence Tagged Sites (STSs). An STS is a short DNA fragment, unique within the genome, whose exact location and base sequence are known. STSs serve as landmarks for the physical mapping and assembly of the human genome.

MGC
The Mammalian Gene Collection (MGC) is a collection of full-length open reading frame (FL-ORF) clones for human, mouse, and rat genes. All MGC clones can be purchased from distributors of the IMAGE consortium.

PopSet
A PopSet is a set of DNA sequences collected to analyze the evolutionary relationships within a population.

TPA
Third Party Annotation (TPA) is a database of experimentally supported annotation that third parties have added to nucleotide sequences already deposited in the primary sequence databases.



©2009  Norris Medical Library   2003 Zonal Ave,   Los Angeles, CA 90089-9130    (323) 442-1116   medlib@usc.edu