What to do when data files have variable length records
-------------------------------------------------------
Symptoms can include:
    SAS "LOST CARD" message
    SAS "Went to a new line..." message
    SPSS "wrong number records" message
    or "unexpected end of file" reached
Stat packages can have trouble reading a file when a line ends (that is,
a carriage return appears) before the "input" operation has finished
reading that line. For example, say you have a SAS program with the
following INPUT statement:
input var1 1-5 var2 20-25;
and your data look like this
    5   10   15   20   25
----|----|----|----|----|
74835   43   23    234x
84935   23   34    471x
where 'x' is a carriage return in the data file. SAS will not read var2
as intended here, because each line ends before column 25. (Other
packages will behave similarly.) The SAS symptom will be the log message
"SAS went to a new line when INPUT statement reached past the end of a
line." You can pad the ends of such lines with spaces rather easily with
the following SAS job. (Note that column 80 is chosen here as a standard
line length; when you use this program, you should choose a column number
beyond the last column your INPUT statement will read. For example, if
your INPUT or similar statement is going to read into column 123, you
should enter @124 or higher in this program.)
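data _null_;
   infile 'var.length.file';   /* original file with short lines */
   file 'fixed.length.file';   /* padded copy to be written */
   input;                      /* read the whole line into _infile_ */
   put @80 ' ' @1 _infile_;    /* blank in column 80, then the line from column 1 */
run;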
Since this program simply reads in and writes out raw data, the resulting
data file (referred to in the example as 'fixed.length.file') can be used
as input to any stat package that was previously having trouble with the
variable length records, not only SAS.
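If SAS is not at hand, the same padding can be done in almost any
scripting language. What follows is only a minimal sketch in Python, not
part of the SAS solution above. The function names pad_line, longest_line,
and pad_file are made up for this illustration, and the width of 80 simply
mirrors the @80 used in the SAS job.

```python
def pad_line(line, width=80):
    """Strip the end-of-line characters and right-pad with spaces to `width`."""
    return line.rstrip("\r\n").ljust(width)


def longest_line(lines):
    """Length of the longest line, not counting end-of-line characters."""
    return max((len(line.rstrip("\r\n")) for line in lines), default=0)


def pad_file(src_path, dst_path, width=80):
    """Copy src_path to dst_path with every line padded to `width` columns."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        for line in src:
            dst.write(pad_line(line, width) + "\n")


# Hypothetical usage, with the file names from the SAS example:
# pad_file('var.length.file', 'fixed.length.file', width=80)
```

As in the SAS job, a line already longer than the chosen width is copied
unchanged, so the width should still reach the last column you intend
to read. The longest_line helper can be run over the file first if you
are unsure how long its lines are.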
NOTE: If you are actually using SAS to read the data for analysis, you
may be able to use the TRUNCOVER option of the INFILE statement, which
was added in Release 6.07. It is documented in Technical Report P-222
beginning on page 30. Of course, the padding solution shown above will
also work, in any release of SAS.
Here is an example of how the TRUNCOVER option works (taken from SAS
Technical Report P-222, page 32):
Imagine you have a file with variable-length records that has data
as follows:
----+----10---+----20
1
22
333
4444
55555
and the end-of-line (EOL) character is in column 2 in line 1, column 3
in line 2, etc.; in other words, each line ends immediately after its
last character.
A normal INPUT statement trying to read a variable with the informat 5.
(that is, of width 5) would see the end of line too soon (except in the
fifth line), and would not read the data as you desire. Even the MISSOVER
option would not fix the problem, as it would simply set the variable to
missing for the too-short lines.
To fix this, the TRUNCOVER option tells SAS, in effect, to read whatever
is there, even if the EOL comes before it is expected. The following
program
data trunc;
   infile 'data.above' truncover;
   input testvar 5.;
proc print;
run;
reads the data correctly and gives the following output:
OBS    TESTVAR
 1          1
 2         22
 3        333
 4       4444
 5      55555
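The "read whatever is there" idea can also be pictured outside SAS. The
Python sketch below is only an analogy and says nothing about how SAS
implements TRUNCOVER. The helper names read_field and to_number are
invented for this illustration. It takes columns 1-5 of each line and
accepts a shorter value when the line ends early.

```python
def read_field(line, start, width):
    """Return columns start..start+width-1 (1-based), truncated at end of line."""
    text = line.rstrip("\r\n")
    return text[start - 1:start - 1 + width]


def to_number(field):
    """Convert a field to int, or None if it is blank (a 'missing' value)."""
    field = field.strip()
    return int(field) if field else None


# The five variable-length records from the example above.
lines = ["1\n", "22\n", "333\n", "4444\n", "55555\n"]
testvar = [to_number(read_field(line, 1, 5)) for line in lines]
print(testvar)  # [1, 22, 333, 4444, 55555]
```

A field that lies entirely past the end of its line comes back empty and
is treated as missing here, which is a choice made for this sketch.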