University of Southern California

ITS Information Technology Services

A division of the Office of the Chief Information Officer

Changing SAS Data

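This document shows you how to make the most common changes to your SAS data, including correcting data entry errors, changing the scale of the data, and summing data to calculate a total score.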

Correcting Data Entry Errors

If one or two records have incorrect data, you may wish to edit the SAS dataset using the Explorer Window. However, what about those cases where there are a lot of data entry errors? Take this common example:

You have distributed a questionnaire to several hundred people. One of the items asks whether the respondent owns a car. Most people answered "yes" or "no", but about 75 answered "Y", "N", or some other variation. To a human being it is obvious that "Y", "yes", "YES" and "Yes" all mean the same thing, but the computer treats them as four different responses. How can you fix this without searching for each of the errors and changing each one manually?

This is done in two steps. First, the following statement converts every value of the variable "car" to uppercase.

car = upcase(car);

The next statements change responses of "Y" or "N" to "YES" or "NO".

If car = "Y" then car = "YES" ;
Else if car = "N" then car = "NO" ;
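
The two steps above can be put together in a complete DATA step. This is a minimal sketch, not a definitive implementation: the dataset names survey and survey_clean, the id variable and the five sample answers are made up for illustration. It also assumes that "car" is a character variable at least three characters long, because a shorter variable would cut off the "YES" that is assigned to it.

```sas
/* Hypothetical survey data; dataset and variable names are illustrative. */
data survey;
    length car $ 3;              /* wide enough to hold "YES" */
    input id car $;
    datalines;
1 yes
2 Y
3 No
4 n
5 YES
;
run;

data survey_clean;
    set survey;
    car = upcase(car);                     /* yes, Yes, YES -> YES */
    if car = "Y" then car = "YES";         /* expand the one-letter answers */
    else if car = "N" then car = "NO";
run;

/* Count the answers to confirm that only YES and NO remain. */
proc freq data=survey_clean;
    tables car;
run;
```

With these five sample records, the frequency table should list only two values, YES (3 records) and NO (2 records). Any answer that is not some form of yes, no, Y or N is left as it is (in uppercase), so it will show up in the table and can be reviewed by hand.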

Changing Scale of the Data

Changing from one scale to another is very simple. It requires only one statement. The most common example is given below:

Marsha wants to determine if living in a "quiet dorm" with extra study hours, a regular dorm or off-campus makes a difference in student grades. At the end of the first month, she has several hundred students who have taken an Introduction to Psychology exam with 47 points possible. She wants to report the results for each group, but she thinks it would be a lot easier for the audience to understand if the results were in percentages.

The following statement solves Marsha's problem. Dividing by the 47 points possible gives the proportion correct, and multiplying by 100 turns that proportion into a percentage.

grade = grade / 47 * 100 ;

Let's get a little more complicated. Assume the students took three different tests because they were in three different sections of Introduction to Psychology. Section A had a test with 51 points possible, Section B with 50 and Section C with 47. Marsha would like to combine the results from all sections, using percentage correct as her dependent variable. The following statements convert the grade for each section to a percentage, using the total points possible for that section as the denominator.

if section = "A" then grade = grade / 51 * 100 ;
else if section = "B" then grade = grade / 50 * 100 ;
else if section = "C" then grade = grade / 47 * 100 ;
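
The section-by-section conversion can also be written so that the original score is kept. The sketch below is one possible arrangement, with assumptions that are not part of the original example: the dataset names exams and exams_pct, the variables student, points and pct, and the four sample records are all hypothetical. It stores the result in a new variable and multiplies by 100 so that the value reads as a percentage rather than a proportion.

```sas
/* Hypothetical exam data; names and values are illustrative. */
data exams;
    input student section $ grade;
    datalines;
1 A 51
2 B 25
3 C 47
4 D 40
;
run;

data exams_pct;
    set exams;
    /* Look up the points possible for the student's section. */
    if section = "A" then points = 51;
    else if section = "B" then points = 50;
    else if section = "C" then points = 47;
    /* points stays missing for any other section, and so does pct. */
    if not missing(points) then pct = grade / points * 100;
run;

proc print data=exams_pct;
run;
```

Keeping "grade" unchanged and writing the percentage to "pct" means the step can be rerun without dividing an already converted score a second time. The student in the unexpected section D gets a missing "pct" instead of a wrong value.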

Summing Data to Calculate a Total Score

Commonly, a SAS user will want to add the answers to several different items. These may be sales results for quarters, points for questions on a test or responses to a survey. One simple solution is as follows:

variablename = variable1 + variable2 ;

TotalSales = quarter1 + quarter2 + quarter3 + quarter4 ;

NOTE: With this method, if any one of the variables has a missing value for an observation, SAS cannot calculate a total for that observation. The result will be a missing value.

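Another option is to use the SUM function as shown below. SUM ignores missing values and adds only the nonmissing ones, so a missing item contributes nothing to the total, just as if it were zero. (If every item is missing, the result is still a missing value.) In some cases, this may be desired. If a student did not answer a question, it is usual to assign zero points.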

Total = sum(question1,question2,question3,question4) ;
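
The difference between the plus sign and the SUM function is easiest to see side by side. The following is a small sketch; the dataset names sales and totals, the result variables and the sample figures are invented for illustration. A period in the data lines stands for a missing value.

```sas
/* Hypothetical quarterly sales; a period marks a missing value. */
data sales;
    input quarter1-quarter4;
    datalines;
10 20 30 40
10 .  30 40
.  .  .  .
;
run;

data totals;
    set sales;
    /* Missing as soon as any one quarter is missing. */
    total_plus = quarter1 + quarter2 + quarter3 + quarter4;
    /* Adds only the nonmissing quarters. */
    total_sum = sum(quarter1, quarter2, quarter3, quarter4);
    /* The extra 0 gives 0 instead of missing when every quarter is missing. */
    total_zero = sum(0, quarter1, quarter2, quarter3, quarter4);
run;

proc print data=totals;
run;
```

For the first record all three totals are 100. For the second record total_plus is missing while the other two are 80. For the third record only total_zero has a value, which is 0.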

If there are a large number of questions numbered q1, q2, etc., it is not necessary to type out every question. The following form can be used. The keyword OF tells SAS that q1 - q50 is a list of variables.

TotalScore = sum(of q1 - q50) ;
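
Here is the same idea on a smaller scale, as a sketch with five questions instead of fifty. The dataset names quiz and quiz_scored, the variable Answered and the sample answers are assumptions made for the example.

```sas
/* Hypothetical quiz data: 1 = correct, 0 = incorrect, period = not answered. */
data quiz;
    input q1-q5;
    datalines;
1 0 1 1 0
1 1 . 1 1
;
run;

data quiz_scored;
    set quiz;
    /* OF is required: q1 - q5 on its own would be read as q1 minus q5. */
    TotalScore = sum(of q1-q5);
    /* N counts how many of the questions have a nonmissing answer. */
    Answered = n(of q1-q5);
run;

proc print data=quiz_scored;
run;
```

The first student gets a TotalScore of 3 with 5 questions answered. The second gets a TotalScore of 4 with 4 answered, because the unanswered q3 is skipped.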

Last updated: February 24, 2010
