University of Southern California

ITS Information Technology Services

A division of the Office of the Chief Information Officer

Merging Datasets Using SAS

To merge two datasets on a key variable, such as student ID, name or social security number, the two datasets must first be sorted on that variable. (Merging without a key variable is possible using SAS but it is generally a bad practice, can lead to unexpected results and is not recommended). The commands for merging two datasets on a key variable are:

Data dataset-name ;
Merge dataset1 dataset2 ;
By sort-variable(s) ;

Datasets can be sorted and merged by more than one variable. For example, to merge the datasets "Psych101" and "Psych102" by students' last name and first name, the following code would be used ;

Proc sort data = psych101 ;
by lastname firstname ;

Proc sort data = psych102 ;
by lastname firstname ;

Data students ;
Merge psych101 psych102 ;
By lastname firstname ;

A word of caution when merging files, you should always check your SAS log. If you see a warning message that says "Warning: Multiple lengths have been specified for the BY variables, this may produce unexpected results," you should check your results carefully. Let's say the Psych101 dataset had a variable length of five for last name and the Psych102 dataset had a variable length of ten. In the Psych101 dataset, there is a student named "Amy Harris". In the Psych102 dataset, there is an "Amy Harrison". Because the variable length has been set as five, the last name for "Amy Harrison" will be truncated to five characters and she will be merged with "Amy Harris". This is what is meant by unexpected results.

If only one variable on the By statement has differing lengths across datasets, the solution is simple. Since SAS sets the length the first time it finds encounters the variable, simply change the order of the datasets on the By statement to have the dataset with the longer length come first in order. In the example above, if the dataset "Psych102" was listed first, the variable length would be set as ten, no truncation would occur. Problem solved.

For more information on possible problems with merging datasets, and solutions, see the paper by Robert Virgile Danger: Merge ahead .

Last updated:
September 17, 2008

SAS

The use of all USC computing resources is governed by the USC Computing Policies.