University of Southern California

ITS Information Technology Services

A division of the Office of the Chief Information Officer

Creating Categories with Stata

There are many situations when you might want to break numeric data into categories. Examples include sorting student numeric grades into A through F, identifying businesses as small, medium or large based on sales, or sorting census regions into categories from rural through major metropolitan areas.

The first step is to create a new variable. From the Stata menu, select Data then Create or change variables then Create new variable. The window shown below will pop up.

In this example, you are going to be assigning letter grades, so be sure you select str (string) as the new variable type. Otherwise, Stata will only let you assign numeric values, which is definitely not what you want.

The example above creates a new variable named Grade and assigns it the value "A". Note two things. First, the letter "A" is shown in quotes. If you omit the quotes, Stata will look for a variable in your dataset named A. Second, you don't want to assign everyone an "A" grade, so click on the if/in tab at the top of this window to restrict who receives it. Another window will pop up.

In this window, under If: (expression), type Average > 92, then click OK. In your Stata results window, you will see a message that says something like:

. generate str Grade = "A" if Average > 92
(131 missing values generated)

This tells you that you have created a variable named Grade and given it a value of "A" wherever the value of Average is greater than 92. However, 131 of your students have no value for Grade at all. Let's take care of them next.
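The same step can also be typed in Stata's Command window or saved in a do-file instead of going through the menus. The sketch below is illustrative only: the three Average values are made up, and clear discards whatever data is in memory, so try it in a fresh session rather than on your real gradebook.

```stata
* Build a tiny made-up dataset (hypothetical scores, not from the tutorial).
clear
input Average
95
88
70
end

* Same command the dialog box generates: "A" only where Average exceeds 92.
generate str Grade = "A" if Average > 92

* Count the observations that still have no grade (an empty string).
* Here that is the two students with 88 and 70.
count if Grade == ""
```

One caution when adapting this to real data: Stata treats a missing numeric value as larger than any number, so a student whose Average is missing would also satisfy Average > 92. Adding & !missing(Average) to the condition is one way to guard against that.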

From the Stata menu, you again select Data then Create or change variables but this time you select Change contents of variable.

From the drop-down variable list, select Grade as the variable name. In the New contents box, type "B". Don't forget the quotation marks!

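Click on the if/in tab at the top of this window. In the pop-up window, under If: (expression), type Average > 84 & Average < 93, then click OK. (This pair of conditions assumes Average holds whole numbers. If averages can be fractional, type Average > 84 & Average <= 92 instead, so that a student with 92.5 keeps the "A" assigned in the first step.) In your Stata results window, you will see a message that says something like: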

. replace Grade = "B" if Average > 84 & Average < 93
(27 real changes made)

Repeat these steps to set the grades for C, D and F.
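As a rough sketch, the whole sequence might look like the do-file below. The tutorial only gives the cutoffs for "A" and "B"; the cutoffs used here for "C", "D" and "F" (76 and 68) are assumptions that simply continue the same eight-point spacing, and the five scores are made up. Like the steps above, it assumes Average holds whole numbers.

```stata
* Made-up scores, one per letter grade (hypothetical data).
* Note: clear discards any data currently in memory.
clear
input Average
95
88
80
72
50
end

* Steps from the tutorial.
generate str Grade = "A" if Average > 92
replace Grade = "B" if Average > 84 & Average < 93

* Remaining grades. These cutoffs are assumed, not taken from the tutorial.
replace Grade = "C" if Average > 76 & Average < 85
replace Grade = "D" if Average > 68 & Average < 77
replace Grade = "F" if Average < 69

* Check the result: every observation should now have a grade.
tabulate Grade
count if Grade == ""
```

Because every condition has both a lower and an upper bound (except the first and last), the replace commands do not overwrite one another and could be run in any order.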

Mistakes to avoid
  1. A very common mistake is to enter only Average > 84 in the second If condition. That won't produce the desired results. Instead, it will give a grade of "B" to everyone who has an average above 84, including those people who scored 93 and up and were given an "A" grade in the first step.
  2. You must use an ampersand (&) in an If condition; you cannot use the word "and". Typing Average > 84 and Average < 93 in the If condition box will result in the error message invalid 'and' with return code r(198).
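Both mistakes can be reproduced on a small scale. The sketch below uses two made-up scores and two hypothetical helper variables, Wrong and Right, to compare the open-ended condition with the bounded one; it then uses capture so that the failing "and" version sets a return code instead of stopping the do-file.

```stata
* Two made-up scores: one "A" student and one "B" student.
* Note: clear discards any data currently in memory.
clear
input Average
95
88
end

generate str Grade = "A" if Average > 92

* Mistake 1: no upper bound, so the "A" student is overwritten with "B".
generate str Wrong = Grade
replace Wrong = "B" if Average > 84

* Bounded version from the tutorial: the "A" student is left alone.
generate str Right = Grade
replace Right = "B" if Average > 84 & Average < 93

list Average Wrong Right

* Mistake 2: the word "and" is not an operator. capture suppresses the
* error and stores its code in _rc, which should be the 198 quoted above.
capture replace Right = "B" if Average > 84 and Average < 93
display _rc
```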

Last updated:
December 03, 2008

