...

Open source softwares - Regression

Lesson #1520 - Regression Dummy Variables


Dummy Variables in Regression

A dummy variable (also called an indicator variable) is a numeric variable that represents categorical data, such as gender, race, or political affiliation.

Technically, dummy variables are dichotomous, quantitative variables. Their range of values is small: they can take on only two quantitative values. As a practical matter, regression results are easiest to interpret when dummy variables are limited to two specific values, 1 or 0. Generally, 1 represents the presence of a qualitative attribute, and 0 represents its absence.
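As a minimal sketch of the 1/0 convention, the snippet below turns a two-valued attribute into a dummy variable. The attribute (`owns_home`), its labels, and the helper name `to_dummy` are hypothetical and chosen only for illustration.

```python
def to_dummy(values, present_label):
    """Return 1 where the attribute is present, 0 where it is absent."""
    return [1 if v == present_label else 0 for v in values]


# Hypothetical attribute: whether each respondent owns a home.
owns_home = ["yes", "no", "yes", "yes", "no"]

# 1 marks presence of the attribute ("yes"), 0 marks absence ("no").
home_dummy = to_dummy(owns_home, present_label="yes")
print(home_dummy)  # [1, 0, 1, 1, 0]
```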

How Many Dummy Variables?

The number of dummy variables needed to represent a particular categorical variable depends on the number of values that the categorical variable can assume. To represent a categorical variable that can assume k different values, a researcher would need to define k - 1 dummy variables.

For example, suppose we are interested in political affiliation, a categorical variable that might assume three values: Republican, Democrat, or Independent. We could represent political affiliation with two dummy variables:

X1 = 1, if Republican; X1 = 0, else.

X2 = 1, if Democrat; X2 = 0, else.

In this example, notice that we do not need to create a dummy variable to represent the "Independent" category of political affiliation. If X1 equals zero and X2 equals zero, we know the voter is neither Republican nor Democrat. Therefore, the voter must be Independent.
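The k - 1 coding for the three-category example can be sketched as follows. The helper names `dummy_code` and `decode` are hypothetical; the point is that two dummies (X1 for Republican, X2 for Democrat) are enough to recover all three categories, with (0, 0) standing for Independent.

```python
CATEGORIES = ["Republican", "Democrat", "Independent"]
REFERENCE = "Independent"  # the category that gets no dummy of its own


def dummy_code(affiliation):
    """Return (X1, X2): X1 flags Republican, X2 flags Democrat."""
    if affiliation not in CATEGORIES:
        raise ValueError(f"unknown category: {affiliation!r}")
    x1 = 1 if affiliation == "Republican" else 0
    x2 = 1 if affiliation == "Democrat" else 0
    return x1, x2


def decode(x1, x2):
    """Recover the category from the two dummies; (0, 0) means Independent."""
    if (x1, x2) == (1, 0):
        return "Republican"
    if (x1, x2) == (0, 1):
        return "Democrat"
    if (x1, x2) == (0, 0):
        return REFERENCE
    raise ValueError("X1 and X2 cannot both be 1")


# k = 3 categories need only k - 1 = 2 dummy variables.
for party in CATEGORIES:
    codes = dummy_code(party)
    print(party, codes, decode(*codes))
```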

Avoid the Dummy Variable Trap

When defining dummy variables, a common mistake is to define too many variables. If a categorical variable can take on k values, it is tempting to define k dummy variables. Resist this urge. Remember, you only need k - 1 dummy variables.

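A kth dummy variable is redundant: it carries no new information, because its value is fully determined by the other k - 1 dummy variables. In a model that includes an intercept, it also creates perfect multicollinearity, so the regression coefficients cannot be uniquely estimated. Using k dummy variables when only k - 1 are needed is known as the dummy variable trap. Avoid this trap!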
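To see why the kth dummy causes trouble, the sketch below builds two design matrices for a handful of hypothetical voters (labels only, no real data) and checks their rank with NumPy. With an intercept column plus all three dummies, the dummies sum to the intercept column, so the matrix loses full column rank; with k - 1 dummies it keeps it.

```python
import numpy as np

# Hypothetical sample of six voters, used only to build example columns.
parties = ["Republican", "Democrat", "Independent",
           "Democrat", "Independent", "Republican"]

intercept = np.ones(len(parties))
rep = np.array([p == "Republican" for p in parties], dtype=float)
dem = np.array([p == "Democrat" for p in parties], dtype=float)
ind = np.array([p == "Independent" for p in parties], dtype=float)

# Intercept + k - 1 = 2 dummies: columns are linearly independent.
X_ok = np.column_stack([intercept, rep, dem])

# Intercept + k = 3 dummies: rep + dem + ind reproduces the intercept column.
X_trap = np.column_stack([intercept, rep, dem, ind])

print(np.linalg.matrix_rank(X_ok), X_ok.shape[1])      # 3 3 -> full column rank
print(np.linalg.matrix_rank(X_trap), X_trap.shape[1])  # 3 4 -> rank deficient
print(np.allclose(rep + dem + ind, intercept))          # True
```

Because `X_trap` is rank deficient, the matrix that ordinary least squares must invert is singular, which is the multicollinearity problem described above.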

How to Interpret Dummy Variables

Once a categorical variable has been recoded as a dummy variable, the dummy variable can be used in regression analysis just like any other quantitative variable.

For example, suppose we want to assess the relationship between household income and political affiliation (i.e., Republican, Democrat, or Independent). The regression equation might be:

Income = b0 + b1X1 + b2X2

where b0, b1, and b2 are regression coefficients, and X1 and X2 are dummy variables defined as:

X1 = 1, if Republican; X1 = 0, else.

X2 = 1, if Democrat; X2 = 0, else.

The value of the categorical variable that isn't represented explicitly by a dummy variable is called the reference group. In this illustration, the reference group consists of Independent voters.
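Substituting each group's dummy values into the regression equation Income = b0 + b1X1 + b2X2 shows what each coefficient represents. This is a short worked derivation using only the definitions of X1 and X2 given above:

```latex
\begin{aligned}
\text{Republican } (X_1 = 1,\ X_2 = 0):&\quad \widehat{\text{Income}} = b_0 + b_1 \\
\text{Democrat } (X_1 = 0,\ X_2 = 1):&\quad \widehat{\text{Income}} = b_0 + b_2 \\
\text{Independent } (X_1 = 0,\ X_2 = 0):&\quad \widehat{\text{Income}} = b_0
\end{aligned}
```

So b0 is the predicted income for the reference group (Independents), b1 is the predicted difference between Republicans and Independents, and b2 is the predicted difference between Democrats and Independents.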

In the analysis, each dummy variable is compared with the reference group. In this example, a positive regression coefficient means that income is higher for the political affiliation represented by that dummy variable than for the reference group; a negative regression coefficient means that income is lower. If the regression coefficient is statistically significant, the income difference with the reference group is also statistically significant.
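A minimal sketch of this interpretation, assuming NumPy is available and using invented income figures (in thousands of dollars) that are not from any real survey: fitting Income = b0 + b1X1 + b2X2 by ordinary least squares gives b0 equal to the Independent mean, and b1 and b2 equal to each group's mean minus that reference mean.

```python
import numpy as np

# Hypothetical toy data, invented purely for illustration (income in $1000s).
parties = ["Republican", "Republican", "Democrat", "Democrat",
           "Independent", "Independent"]
income = np.array([62.0, 58.0, 55.0, 51.0, 50.0, 46.0])

# k - 1 dummy coding with Independent as the reference group.
x1 = np.array([p == "Republican" for p in parties], dtype=float)
x2 = np.array([p == "Democrat" for p in parties], dtype=float)
X = np.column_stack([np.ones(len(parties)), x1, x2])

# Ordinary least squares fit of Income = b0 + b1*X1 + b2*X2.
(b0, b1, b2), *_ = np.linalg.lstsq(X, income, rcond=None)
print(round(float(b0), 2), round(float(b1), 2), round(float(b2), 2))  # 48.0 12.0 5.0

# Each slope is a difference from the reference-group mean.
ref_mean = income[(x1 == 0) & (x2 == 0)].mean()
print(np.isclose(b1, income[x1 == 1].mean() - ref_mean))  # True
print(np.isclose(b2, income[x2 == 1].mean() - ref_mean))  # True
```

This sketch only estimates the coefficients; judging statistical significance also requires standard errors and p-values, which a statistics package such as statsmodels reports for an OLS fit.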