This document describes the data available in public user space, and how to access information at other sites through Web browsers, ftp and similar methods. Comments on this document are always welcome. Please contact the ITS Statistics Consultant at the phone or e-mail address given at the end of this document, if you have suggestions or other information.
Some parts of the 1980 and 1990 Census data bases are available on the RCF. The files are ASCII-coded raw data, in the format specified in the Census Data Dictionary. The 1990 Census Data Dictionaries for STF1 and STF3 also are available in the same location, as shown elsewhere in this document. The 1980 Technical Documentation is not available under UNIX, and must be obtained from the Bureau of the Census (see address below) or from the VKC Library. (This latter method may involve a per-page charge, the amount of which will depend on what part[s] of the documentation you need.)
ITS has not received Census data directly from the U.S. Census Bureau. We received parts of the data from USC departments and researchers who obtained the data themselves. Since the data are public domain, we have made these files available to all users.
There are no publicly available Census files on ITS computers from any decade prior to 1980.
To see a list of the 1980 Census files, enter the following command in an RCF UNIX session:
ls ~datastor/census/census1980
(These files are also viewable, and downloadable, from the ITS datastor Web site at www-rcf.usc.edu/~datastor/census/census1980/)
The "stf" files correspond to files with similar names used by the Bureau of the Census in designating subfiles of the data base. "stf" stands for "Summary Tape File". This term, and the organization of the Census data in general, are explained in the "Tabulation and Publication Program" booklet shown in the list of References below.
The Technical Documentation for the 1980 Census files listed here is located in the VKC Library. The documents also may be obtained from other sources, including the Census Bureau in Washington (the address of which is at the end of this document).
All files in this group (1980 Census) contain data from the State of California only.
The file "census80.pumsa" is a Public Use Microdata Sample provided by faculty in the School of Policy, Planning and Development. It is a 5% sample (results multiplied by 20 would approximate the total population), and, like the other files in this 1980 group, contains data for the State of California only. Please contact the ITS Statistics Consultant (see the end of this document) if you have questions regarding this file.
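As a rough illustration of that scaling, the Python sketch below turns a count taken from a uniform sample into an approximate population total. The function name is hypothetical, and the result is only the approximation described above, not a weighted Census estimate.

```python
def estimate_total(sample_count: int, sample_percent: float = 5.0) -> float:
    """Approximate a population total from a count in a uniform sample.

    In a 5% sample each sampled person stands for 100 / 5 = 20 people,
    so the count is multiplied by 20.
    """
    if not 0 < sample_percent <= 100:
        raise ValueError("sample_percent must be between 0 and 100")
    return sample_count * 100.0 / sample_percent
```

For example, 1,000 matching persons in the 5% file suggest roughly 20,000 in the population. For a 1% sample, pass sample_percent=1.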
The table below shows some of the important characteristics of each file:
                         Physical  Records   Bytes per    Observations  Physical records
File Name                LRECL*    per obs.  observation  in file       in file
----------------------------------------------------------------------------------------
census80.stf1a           1638      2         3276         34130         68260
census80.stf1b           1638      2         3276         222772        445544
census80.stf2a           1956      6         11736        60995         365970
census80.stf3a           2016      6         12096        34130         204780
census80.pumsa           193       1         193          1679223       1679223
----------------------------------------------------------------------------------------
*LRECL, as specified in most statistics packages, must be the physical record length shown here, even though Census documentation sometimes uses the term "Logical Record Length" for the length of one entire observation (physical record length times the number of records per observation).
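One observation spans several physical records (two in the STF1 files, six in the 1980 STF2A and STF3A files). A program therefore has to read that many consecutive records before it has one complete observation. The Python sketch below shows the idea. It assumes each physical record is stored as one newline-terminated line, which you should confirm for the file you use. The function name is hypothetical.

```python
from typing import Iterable, Iterator, List

def group_observations(lines: Iterable[str], recs_per_obs: int) -> Iterator[List[str]]:
    """Yield lists of `recs_per_obs` consecutive physical records.

    Assumes one newline-terminated line per physical record.
    Raises ValueError if the input ends partway through an observation.
    """
    group: List[str] = []
    for line in lines:
        group.append(line.rstrip("\r\n"))  # drop only the line terminator
        if len(group) == recs_per_obs:
            yield group
            group = []
    if group:
        raise ValueError(f"incomplete observation: {len(group)} of {recs_per_obs} records")
```

For census80.stf1a you would call group_observations on the open file with recs_per_obs=2 and receive the two records of each observation together. A ValueError means the file ends partway through an observation, which is itself worth investigating.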
Please note: the accuracy and integrity of the files listed above are assumed, but cannot be guaranteed. Users should run their own verification tests to satisfy themselves that the data are complete and appropriate for their needs.
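One simple verification test compares a file against the physical record length and record count in the table above. The Python sketch below assumes one newline-terminated line per physical record, and the function name is hypothetical. An empty result means both checks passed. If trailing blanks were stripped from the records at some point, the length check will report short records even though the data may be intact.

```python
from typing import Iterable, List

def check_records(lines: Iterable[str], lrecl: int, expected_count: int) -> List[str]:
    """Return a list of problems found; an empty list means the checks passed."""
    problems: List[str] = []
    count = 0
    for count, line in enumerate(lines, start=1):
        length = len(line.rstrip("\r\n"))  # length without the line terminator
        if length != lrecl:
            problems.append(f"record {count}: length {length}, expected {lrecl}")
    if count != expected_count:
        problems.append(f"found {count} records, expected {expected_count}")
    return problems
```

For census80.stf1a, for example, open the file in a with block and pass the file object together with 1638 and 68260.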
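To see a list of the 1990 Census files, enter the following command in an RCF UNIX session:
ls ~datastor/census/census1990
(These files are also viewable, and downloadable, from the ITS datastor Web site at www-rcf.usc.edu/~datastor/census/census1990/)
The file called "census90.stf1a.datadict" is the Census Data Dictionary for the STF1A files. It is an ASCII file that can be printed (about 35 pages) directly from a UNIX prompt: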
lpr -P<printer-code> ~datastor/census/census1990/census90.stf1a.datadict
The STF3A Data Dictionary (about 80 pages) also may be printed in this way.
The Technical Documentation for the 1990 Census files listed here is located in the VKC Library. The documents also may be obtained from other sources, including the Census Bureau in Washington.
Files whose names include ".ca" contain data for California only.
The file called census90.stf1a.lacounty contains data for Los Angeles County only and consists of 129 observations (258 physical records). It is a convenient "small" data set for testing your programs to make sure they run without errors before you run them on the entire stf1a.ca file.
The STF3A sample is in two parts because of its size. If you wish to use the entire STF3A California sample, you will need to read each part (pt1, pt2) separately, make a subset from each part that contains only the variables and observations you need for your analysis, and then combine the two subsets. Because pt1 and pt2 hold different observations, the subsets should be concatenated (stacked one after the other) rather than matched observation by observation.
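The subset-then-stack approach can be sketched in Python. This is a minimal illustration. It assumes one newline-terminated line per physical record. The layout, the keep-test and the function names are hypothetical placeholders, to be replaced with positions from the STF3A Data Dictionary.

```python
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

# Hypothetical layout: variable -> (physical record within the observation
# (0 = first), first column, last column). Replace with real positions
# from the STF3A Data Dictionary.
Layout = Dict[str, Tuple[int, int, int]]
Row = Dict[str, str]

def subset_part(lines: Iterable[str], recs_per_obs: int, layout: Layout,
                keep: Callable[[Row], bool]) -> Iterator[Row]:
    """Read one part file, keeping only the listed variables and wanted observations."""
    group: List[str] = []
    for line in lines:
        group.append(line.rstrip("\r\n"))
        if len(group) == recs_per_obs:
            # Columns are 1-based and inclusive, hence the start - 1 slice.
            row = {name: group[rec][start - 1:end].strip()
                   for name, (rec, start, end) in layout.items()}
            if keep(row):
                yield row
            group = []

def combine_parts(*parts: Iterable[Row]) -> List[Row]:
    """Concatenate the per-part subsets; pt1 and pt2 hold different observations."""
    combined: List[Row] = []
    for part in parts:
        combined.extend(part)
    return combined
```

For the real files, recs_per_obs would be 4 (see the table below), with one subset_part call for census90.stf3a.ca.pt1 and one for census90.stf3a.ca.pt2. In SAS, the analogous step is a DATA step whose SET statement names both subset data sets, which stacks them.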
The "pumsa.ca" file is a Public Use Microdata Sample for the State of California. It is a 5% sample. The "pumsa.eq" file is a special PUMS file which describes what geography is coded into each PUMA area identifier in the PUMS file. Census Bureau documentation explains the PUMS files further.
(Note: ITS maintains PUMS 1% and 5% samples for the entire nation in the datastor RCF account and on our datastor Web site.)
Information about all Census files at USC is on the Web at:
http://www.usc.edu/its/doc/statistics/help/census/
and information about Census and other data bases is in the ITS-maintained Data Base Clearinghouse at:
http://www.usc.edu/its/doc/statistics/databases/
For reasons of economy and disk-space allocation, users may not keep their own copies of the Census data. See the explanation in "Copies and Subsets of the Census Data", below.
The table below shows some of the important characteristics of each file:
                         Physical  Records   Bytes per    Observations  Physical records
File Name                LRECL*    per obs.  observation  in file       in file
----------------------------------------------------------------------------------------
census90.stf1a.ca        4805      2         9610         68949         137898
census90.stf1a.lacounty  4805      2         9610         129           258
census90.stf3a.ca.pt1    7925      4         31700        37675         150700
census90.stf3a.ca.pt2    7925      4         31700        32940         131760
census90.pumsa.ca        231       1         231          2037765       2037765
----------------------------------------------------------------------------------------
*LRECL, as specified in most statistics packages, must be the physical record length shown here, even though Census documentation sometimes uses the term "Logical Record Length" for the length of one entire observation (physical record length times the number of records per observation).
Please note: the accuracy and integrity of the files listed above are assumed, but cannot be guaranteed. Users should run their own verification tests to satisfy themselves that the data are complete and appropriate for their needs.
atlas90 subdirectory of the ITS RCF account.
Details on accessing and using these files are found in this
section.
To create a map from the TIGER files provided by SAS, enter the following command on the SAS Command Line:
AF C=SASHELP.GISIMP.TIGERCD.FRAME
NOTE: If you don't have a Command Line, or don't know where one is, you can click Globals>Options>Command Line to request one.
This opens a SAS "Frame" called the SAS/GIS Map Extraction Utility Window in which you will be prompted to enter the path to your 'Tiger CD-ROM' (this terminology is used in the UNIX environment and on the User Rooms Windows machines as well, even though no CD-ROM is needed there). If you are using SAS for Windows or Macintosh on a system installed and/or maintained by you, do as requested by the message SAS gives you.
In SAS for UNIX on the RCF, enter:
/usr/usc/sas/default/TGRMAPS
In SAS for Windows in the ITS User Areas, enter:
g:\programs\stat\saswin\TGRMAPS
Note that SAS appears to treat this path as case-sensitive, so TGRMAPS must be in all capitals.
If you do want your map files (or some of them) to be saved permanently, you can choose SASUSER for the LIBRARY field or you can use any LIBREF that you have already created (prior to your SAS/GIS session) using a LIBNAME statement.
The TIGER maps referenced above are specially compressed and processed by SAS for use with SAS/GIS through the "TigerCD" Frame. For this reason, they can be accessed only by the AF command as shown above, which calls a special access Frame. Of course, any appropriate GIS-type maps and data, including the original TIGER files distributed by the Census Bureau, can be imported into SAS/GIS using the usual File>Import menu choices within SAS/GIS.
The ATLAS files contain raw coordinate data, along with Census Tract identifying numbers, but some processing is necessary before SAS (or any other software) can make a map from these coordinates.
To see a list of the ATLAS files, enter the following command in an RCF UNIX session:
ls ~datastor/census/atlas90
or visit the following Web page:
www-rcf.usc.edu/~datastor/census/atlas90/
Each file is named with the format ATLAS90.xxxT, where "xxx" is a three-character abbreviation of the county represented in that file. For example, ATLAS90.LOST (!) is the file for Los Angeles County.
In addition, the census/atlas90 directory has two files pertinent to the Atlas files and to Census Tract boundaries in general. CT708090.CORRESP contains details about the changes in Census Tract boundaries over the last three censuses (1970, 1980 and 1990). CT708090.DATA contains aggregated raw data such as total population, population by certain ethnicities, population by age categories, marital status and so forth. NOTE: all the file names in the census/atlas90 directory are UPPER CASE, as shown in the examples, while the directory name itself is lower case.
The VKC (Applied Social Sciences) Library has a 10-page "code book" that outlines the contents of all the ATLAS90 files.
Each ATLAS file contains X and Y map coordinates for each Census Tract in the county. The coordinates for each tract begin with a "title line" that contains the State, County and Census Tract numbers and other information. As an illustration, the first few lines of data in ATLAS90.LOST for Census Tract 1011 look like:
"060371011.00",91
-118.302500,34.273900
-118.301500,34.274200
-118.301083,34.274425
-118.300200,34.274900
-118.298900,34.275200
-118.298600,34.275400
(etc.)
The title line contains, inside double quotes, the State Code (06), the County Code (037), and the Census Tract number (1011.00). After the closing double quote and a comma comes the number of coordinate pairs listed for that tract (91 here).
Each line after the "title line" contains two numbers, separated by a comma. These are the X and Y coordinates for drawing the boundaries of that particular Census Tract.
ITS provides a sample program for reading in the Atlas coordinate data and mapping the Census Tracts using SAS/GRAPH. The same SAS data set created in this program can be imported into SAS/GIS as a GENPOLY (Generic POLYgon) file. The complete sample program is in:
http://www.usc.edu/its/doc/statistics/help/census/programs/atlasmap1.sas
and the elements of that program are explained here.
The graph that results from the program can be viewed at:
http://www.usc.edu/its/doc/statistics/help/census/miscellaneous/atlas.gif
Reading the Coordinates into a SAS Data Set.
The sample program makes a "permanent" SAS data set in the /tmp area, as shown in the following LIBNAME statement. You can specify any location you like, or use temporary SAS data sets for this operation if you prefer. The raw data are stored on the RCF and are specified in the sample program using the following FILENAME statement:
libname atlasdta '/tmp/';
filename inatlas '~datastor/census/atlas90/ATLAS90.LOST';
The first step in mapping the coordinates found in the ATLAS files is to create a SAS data set with the variables TRACT, X and Y. The STATE, COUNTY and COORDNUM variables are not necessary for basic maps. Because they are present in the data, however, this sample program reads them in as well.
/*** THESE FIRST LINES GIVE SAS THE LIST OF DESIRED TRACTS ***/
%let traclist=%str(
2218,2219,2226,2227,2246,2247,2311,2312
);
/*** THIS IS THE MAIN DATA STEP TO CREATE THE SAS DATA SET ***/
data atlasdta.atlas1;
  infile inatlas;
  keep tract coordnum x y;
  retain state county tract coordnum;
  input @1 test $1. @;            /* look at column 1 and hold the line */
  if test='"' then do;            /* title line: start of a new tract   */
    state=.; county=.; tract=.; coordnum=.;
    /* list input for COORDNUM, so counts of 100 or more are not cut off */
    input @2 state 2. @4 county 3. @7 tract 4. @16 coordnum;
  end;
  else input @1 x 11. @13 y;      /* coordinate line: one X,Y pair      */
  format x y 12.6;
  if x=. and y=. then delete;     /* drops the title lines themselves   */
  if tract in (&traclist.);
run;
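In the program above, the list of Census Tract numbers is assigned to a "symbolic variable" (a macro variable) called traclist in the %LET statement. It is then used in the later IF statement as &traclist. Note that TRACT is read with the 4. informat, so only the first four digits of the tract number are kept (1011 from 1011.00). Tracts that differ only in their suffix would therefore share one TRACT value.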
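If you want to process the ATLAS files outside SAS, the same logic can be sketched in Python: a line starting with a double quote begins a new tract, and every other line adds one X,Y pair. This minimal sketch assumes the layout shown in the ATLAS90.LOST excerpt above, and read_atlas is a hypothetical name. Unlike the SAS program, it keys tracts by the full number (for example 1011.00), so tracts with different suffixes stay separate.

```python
from typing import Dict, Iterable, List, Optional, Set, Tuple

Polygon = List[Tuple[float, float]]

def read_atlas(lines: Iterable[str],
               wanted: Optional[Set[str]] = None) -> Dict[str, Polygon]:
    """Parse ATLAS90 text into {tract number: [(x, y), ...]}.

    Title lines look like '"060371011.00",91': 2-digit state, 3-digit county
    and tract number inside the quotes, then the coordinate count.
    Other non-blank lines are 'x,y' pairs for the most recent title line.
    If `wanted` is given (e.g. {"2218", "2219"}), only tracts whose number
    before the decimal point is in that set are kept.
    """
    polygons: Dict[str, Polygon] = {}
    current: Optional[str] = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith('"'):
            ident = line[1:].split('"', 1)[0]   # e.g. "060371011.00"
            tract = ident[5:]                   # skip state + county -> "1011.00"
            base = tract.split(".", 1)[0]       # "1011"
            if wanted is None or base in wanted:
                current = tract
                polygons[current] = []
            else:
                current = None                  # skip this tract's coordinates
        elif current is not None:
            x_text, y_text = line.split(",")
            polygons[current].append((float(x_text), float(y_text)))
    return polygons
```

Comparing the length of each returned polygon with the count on its title line is an easy extra consistency check.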
Annotating the Map with Customized Labels. Using Annotate data sets in SAS/GRAPH is a difficult but ultimately rewarding process, because it allows nearly complete flexibility in labeling a graph to your specifications. The sample program contains an Annotate data set that adds labels to the map. Since the Annotate facility is relatively complex, however, the details are left to the SAS/GRAPH documentation, which you should consult if you are interested.
The annotations used in the sample program for this particular map place the text 'USC Campus', a palm-tree symbol, and the Census Tract numbers at the appropriate places on the map surface. The SAS code for the Annotate DATA step looks like this:
data atanno; length text $ 9 hsys xsys ysys $ 1;
input style $ hsys $ xsys $ ysys $ function $ text $ x y size;
cards;
cartog 1 1 1 label L 45 50 6
centb 1 1 1 label USC 45 45 4
centb 1 1 1 label Campus 44 40 4
centb 1 1 1 label 2218 29 91 4
centb 1 1 1 label 2219 29 75 4
centb 1 1 1 label 2226 12 52 4
centb 1 1 1 label 2227 34 53 4
centb 1 1 1 label 2246 79 54 4
centb 1 1 1 label 2247 56 73 4
centb 1 1 1 label 2312 25 17 4
centb 1 1 1 label 2311 64 23 4
;
Once the Annotate data set (ATANNO) is created, it is referenced using the ANNO= option of the PROC GMAP statement, as shown in the next section.
Creating the Map of the Census Tracts. You are now ready to create the map itself. This is done with a PROC GMAP statement and an appropriate CHORO statement, as shown in the following portion of the sample program:
pattern1 value=mempty repeat=5000;
proc gmap map=atlasdta.atlas1 data=atlasdta.atlas1 anno=atanno;
title1 h=2 f=centb 'Census Tracts in USC Neighborhood';
title2 h=1.5 f=centb 'Los Angeles, California';
footnote1 h=1.3 f=centb 'Source: 1990 Census ATLAS files';
id tract; choro tract / discrete nolegend coutline=black;
run;
quit;
Many of the statements and features in this sample are optional, including the TITLE and FOOTNOTE statements. The PATTERN statement ensures that each of the Census Tract areas will be empty; otherwise, SAS will fill them with patterns. Its REPEAT= option reuses that empty pattern for up to 5000 map areas.
The display of the map is automatic, unless you are directing the map information to a Graphics Stream File (GSF) for later printing and/or saving. See the SAS/GRAPH User's Guide and the ITS online information on SAS/GRAPH at
http://www.usc.edu/its/doc/statistics/sas/
for more information on how to use GSFs.
http://www.usc.edu/ucs/userserv/statistics/databases/
Data sets listed there include:
http://www.usc.edu/its/doc/statistics/help/census/
You can read, save or download the files from there.
It is possible, however, to make and keep subsets of the data for your repeated use. The method of subsetting data will vary, depending on the requirements of the statistics software you use to analyze the data. In most packages, you will be able to specify the variables you want at the time you read the raw data. In addition, you can use KEEP or DROP to exclude unwanted variables when you make your permanent system files (SPSS save files or SAS data sets, for example). These procedures are explained in the manuals for your statistics package. If you have trouble using the commands or statements for your package, the consultants can help you. (NOTE that while the consultants do not help people write their programs, they do help users deal with any errors and warnings that their programs generate, and can answer specific, individual questions about syntax.)
Raw (ASCII) Subsets for use in most Statistics Packages
ASCII (plain text) subsets for use in any statistics package can be made relatively easily using SPSS or SAS. For example, when the data you want are in a SAS data set, a simple SAS PUT statement can write the data out in "raw" (ASCII) form to be read into whatever package you are going to use. A sample SAS program that reads a subset of Census variables from the 1990 STF1A file and outputs raw data is shown here. (This and other sample programs for 1980 and 1990 STF1A and STF3A data, plus a short explanation, are also available online on an ITS Statistics Web page at:
http://www.usc.edu/its/doc/statistics/help/census/
That way, you won't have to retype the program yourself.)
For the 1990 STF1A example, we assume you are using seven Census variables, five from the first record and two from the second. (Each "logical" record of the STF1A data is split into two physical records of 4805 bytes each; the Census documentation calls these "segments".) We also assume that a regular "free format" raw data arrangement is acceptable to the stat package you will be using. An example of the raw data output follows the program.
FILENAME RAW
  '~datastor/census/census1990/census90.stf1a.ca';
DATA CENSUS;
  INFILE RAW LRECL=4805;
  /* Record 1 of each observation, then "/" moves to record 2.      */
  /* SUMLEV is read as character so leading zeros (040) are kept.   */
  INPUT FILEID $ 1-8 STUSAB $ 9-10 SUMLEV $ 11-13
        GEOCOMP $ 14-15 CHARITER 16-18 /
        H13_1 2389-2397 H19 2767-2775;
RUN;
FILENAME SUBSET 'my.raw.census.subset';
DATA _NULL_; SET CENSUS; FILE SUBSET;
  PUT @1 FILEID @10 STUSAB @15 SUMLEV @20 GEOCOMP
      @25 CHARITER @30 H13_1 @40 H19;
RUN;
To use the program above for your own work on the STF1A data, you need change only the INPUT statement and the PUT statement. For the INPUT statement, simply list the variables and column specifications (with the $ if you're reading an alphanumeric variable) from the Census code book for those variables you want to read. For the PUT statement, just list the same variable names that you used in your INPUT statement, and add appropriate column numbers.
As mentioned elsewhere, there are sample programs (like the example shown here) for 1980 and 1990 files, both STF1A and STF3A, at the URL shown above for the ITS Census information Web site.
To run a subset program, type it into a UNIX file or download it from the ITS Census information Web URL, then enter:
sas <file.name>
Part of the resulting raw data file looks like this:
STF1A    CA   040  00   0    457885    29008161
STF1A    CA   040  40   0    764       45312
STF1A    CA   040  42   0    0         0
STF1A    CA   040  43   0    0         0
STF1A    CA   050  00   0    17950     1242068
STF1A    CA   060  00   0    890       68633
STF1A    CA   070  00   0    890       68633
STF1A    CA   080  00   0    8         3288
STF1A    CA   091  00   0    5         1190
STF1A    CA   091  00   0    0         813
You can then use this raw file as input to the stat package of your choice.
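Outside SAS, the same subset can be pulled with ordinary string slicing, since each variable occupies fixed columns. The Python sketch below uses the column positions from the INPUT statement above. It assumes one newline-terminated line per 4805-byte physical record. STF1A_FIELDS and the function names are hypothetical. Unlike the SAS program, it keeps every value as text, so CHARITER stays "000" rather than becoming 0.

```python
from typing import Iterable, Iterator, List, Tuple

# (variable, physical record within the observation (0 = first),
#  first column, last column), copied from the SAS INPUT statement.
STF1A_FIELDS: List[Tuple[str, int, int, int]] = [
    ("FILEID", 0, 1, 8), ("STUSAB", 0, 9, 10), ("SUMLEV", 0, 11, 13),
    ("GEOCOMP", 0, 14, 15), ("CHARITER", 0, 16, 18),
    ("H13_1", 1, 2389, 2397), ("H19", 1, 2767, 2775),
]

def subset_stf1a(lines: Iterable[str],
                 fields: List[Tuple[str, int, int, int]] = STF1A_FIELDS,
                 recs_per_obs: int = 2) -> Iterator[List[str]]:
    """Yield the selected fields of each observation as stripped strings."""
    group: List[str] = []
    for line in lines:
        group.append(line.rstrip("\r\n"))
        if len(group) == recs_per_obs:
            # Columns are 1-based and inclusive, so start - 1:end is the slice.
            yield [group[rec][start - 1:end].strip() for _, rec, start, end in fields]
            group = []

def to_free_format(rows: Iterable[List[str]]) -> Iterator[str]:
    """Join fields with single blanks; an empty field becomes '.' so no column is lost."""
    for row in rows:
        yield " ".join(value if value else "." for value in row)
```

To write the output file, open census90.stf1a.ca and my.raw.census.subset in a with block. Then pass each line produced by to_free_format(subset_stf1a(source)), followed by a newline, to the output file's writelines method.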
Subsets of Particular Census Tract Groupings
ITS has provided a SAS Macro that provides aggregate data for any group of Census Tracts you specify (within a particular county), for 1990 and 1980 data, both STF1A and STF3A. For details, go to the ITS Census information Web page (URL shown above) or directly to the tract subset information page at:
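The ITS macro's own interface is documented on that page. Purely to illustrate what aggregating a tract group means, here is a minimal Python sketch that sums count variables over a chosen set of tracts. The field names (TRACT, P1, P2) and the function are hypothetical. Only additive counts can be summed this way. Medians and averages, such as those in STF3A, cannot be combined by simple addition.

```python
from typing import Dict, Iterable, List, Mapping, Set

def aggregate_tracts(rows: Iterable[Mapping[str, str]], tracts: Set[str],
                     count_fields: List[str],
                     tract_field: str = "TRACT") -> Dict[str, int]:
    """Sum additive count fields over the rows whose tract is in `tracts`."""
    totals = {name: 0 for name in count_fields}
    for row in rows:
        if row[tract_field] in tracts:
            for name in count_fields:
                totals[name] += int(row[name])
    return totals
```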
http://www.usc.edu/its/doc/statistics/help/census/tractsubsets/
The "logical record length" (lrecl) of Census data files is usually quite large, and some programs (e.g., SAS) require that you specify the lrecl in order to override the program's default record length. For example, the STF1A file has physical records that are 4805 bytes long. In order to read the file in SAS, you must specify the lrecl in an infile statement like the following:
infile foobar lrecl = 4805 ;
Specifying an lrecl that is longer than the actual record length is acceptable.
Packages other than SAS that require the specification of the lrecl value will have different ways of doing so. Consult the official Census documentation to verify the lrecl value for the particular data set you are using, and consult the documentation for your statistics package to learn how to specify the lrecl in your program.
When your browser is started, click on "File" and choose the menu item that lets you enter a Uniform Resource Locator (URL). To connect to the Census Bureau's server, type in the following URL:
http://www.census.gov/
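To connect to the Census Bureau's server by anonymous ftp instead, enter the following at the UNIX prompt:
UNIX Prompt>ftp ftp.census.gov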
The next prompt should be from the Census server, asking who you are. Type:
anonymous
The server will then prompt you for a password; type your complete e-mail address. For example, if you are user 'abcde' here at USC, you would type
abcde@usc.edu
When you get logged in, type 'cd pub' to get to the public directory. From there, you can type 'dir' to see the files and directories available to you. For help with anonymous ftp, type 'help ftp' or contact ITS consulting as described at the end of this document.
SPSSx Processing of U.S. Census Data (copyright 1984). SPSS Inc., 444 North Michigan Avenue, Chicago IL 60611; telephone 312.329.2400.
Analysis With Local Census Data: Portraits of Change (1992), by Dowell Myers. Academic Press; see particularly Chapter 4.
Some documentation regarding Census files may be available in the USC Library Government Documents Department.
Consulting. The ITS Consultants may be familiar with the language and general operations of this software, but it may be necessary to make an appointment with a member of the full time staff in order to receive detailed help. Support of statistics software is the responsibility of the ITS Statistics Consultant with the participation of other full-time ITS staff. These people may be contacted through the ITS Customer Support Center as shown here.
Customer Support. USC students, staff or faculty who would like information about ITS Workshops or site-licensed software, or who have other computing-related questions, should visit the Customer Support Center in Leavey Library Lower Commons, call 213.740.5555, or send e-mail to <consult@usc.edu>
Documentation. This document, and many others on a variety of topics, are available in the ITS Customer Support Help System, available on the World Wide Web at:
http://www.usc.edu/its/
You can find Statistics Software Help Documents through the search engine at this same URL, or go directly to them at:
http://www.usc.edu/its/doc/statistics/help/
Newsgroups. Another source of information, this one providing the opportunity to exchange thoughts with other users, is the newsgroup 'usc.comp.all.stat.users'. For more information about reading news, subscribing to newsgroups, and related topics, visit
http://www.usc.edu/its/doc/internet/news/