2. Basic probability distributions in econometrics

In the previous chapter we studied the basics of probability distributions and how to use them when calculating probabilities. There exist a number of different probability distributions for discrete and continuous random variables, but some are more commonly used than others. In regression analysis we primarily use continuous probability distributions. For that reason we need to know something about the most basic probability functions related to continuous random variables. In this chapter we are going to work with the normal distribution, the t-distribution, the chi-square distribution and the F-distribution. Knowledge of their properties will enable us to construct most of the tests required to make statistical inference within regression analysis.

2.1. The normal distribution

The single most important probability distribution for a continuous random variable in statistics and econometrics is the so called normal distribution. It is a symmetric and bell shaped distribution. Its Probability Density Function (PDF) and the corresponding Cumulative Distribution Function (CDF) are pictured in Figure 2.1.

Figure 2.1 The normal PDF and CDF

For notational convenience, we express a normally distributed random variable X as X ~ N(μ_X, σ²_X), which says that X is normally distributed with the expected value given by μ_X and the variance given by σ²_X. The mathematical expression for the normal density function is given by:

f(x) = (1 / (σ_X √(2π))) exp(−(x − μ_X)² / (2σ²_X))

which should be used in order to determine the corresponding CDF:

F(c) = P(X ≤ c) = ∫ from −∞ to c of f(x) dx

Unfortunately this integral has no closed form solution and needs to be solved numerically. For that reason most basic textbooks in statistics and econometrics have statistical tables in their appendix giving the probability values for different values of c.
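As an illustration (not part of the original text), modern statistical software evaluates this integral numerically; for instance, Python's standard library exposes the normal CDF through `statistics.NormalDist`:

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1
Z = NormalDist(mu=0, sigma=1)

# The values a statistical table would list for c = 0 and c = 1.96
p0 = Z.cdf(0.0)    # 0.5 by symmetry
p1 = Z.cdf(1.96)   # approximately 0.975
print(p0, p1)
```

These are exactly the numbers found in the standard normal table in the appendix.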

Properties of the normal distribution

1. The normal distribution curve is symmetric around its mean, μ_X, as shown in Figure 2.1a.

2. Approximately 68% of the area below the normal curve is covered by the interval of plus/minus one standard deviation around its mean: μ_X ± σ_X.

3. Approximately 95% of the area below the normal curve is covered by the interval of plus/minus two standard deviations around its mean: μ_X ± 2σ_X.

4. Approximately 99.7% of the area below the normal curve is covered by the interval of plus/minus three standard deviations around its mean: μ_X ± 3σ_X.

5. A linear combination of two or more normal random variables is also normal.

Example 2.1

If X and Y are normally distributed variables, then Z = aX + bY will also be a normally distributed random variable, where a and b are constants.

6. The skewness of a normal random variable is zero.

7. The kurtosis of a normal random variable is three.

8. A standard normal random variable has a mean equal to zero and a standard deviation equal to one.

9. Any normal random variable X with mean μ_X and standard deviation σ_X can be transformed into a standard normal random variable Z using the formula Z = (X − μ_X) / σ_X.
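Properties 2 to 4 can be verified numerically. A short sketch (not from the original text) using Python's `statistics.NormalDist`:

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1

# Area under the curve within k standard deviations of the mean
coverage = {k: Z.cdf(k) - Z.cdf(-k) for k in (1, 2, 3)}
print(coverage)  # roughly 0.683, 0.954, 0.997
```

The three numbers reproduce the 68%, 95% and 99.7% rules stated above.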

Example 2.2

Since any normally distributed random variable can be transformed into a standard normal random variable we do not need an infinite number of tables for all combinations of means and variances, but just one table that corresponds to the standard normal random variable.

Assume a normal random variable X with expected value equal to 4 and a standard deviation equal to 8.

Using this information we may transform X into a standard normal random variable using the following transformation: Z = (X − 4)/8. It is now easy to show that Z has a mean equal to 0 and a variance equal to 1. That is, we have

E[Z] = (E[X] − 4)/8 = (4 − 4)/8 = 0 and V[Z] = V[X]/8² = 64/64 = 1.
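The same result can be checked by simulation. A minimal sketch (an illustration, not part of the original text), drawing from N(4, 8²) and standardizing each draw:

```python
import random
from statistics import mean, pstdev

random.seed(42)  # reproducible illustration

# Draw from X ~ N(4, 8^2) and standardize each observation
x = [random.gauss(4, 8) for _ in range(200_000)]
z = [(xi - 4) / 8 for xi in x]

print(round(mean(z), 3), round(pstdev(z), 3))  # close to 0 and 1
```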

Example 2.3

Assume a normal random variable X with mean 4 and variance 9, and calculate the probability that X is less than 3.5. Standardizing gives P(X < 3.5) = P(Z < (3.5 − 4)/3) = P(Z < −0.167). We now have that Z should be lower than a negative value, and the table only contains positive values. We therefore need to transform our problem so that it fits the table we have access to. In order to do that, we need to recognize that the standard normal distribution is symmetric around its zero mean and that the area under the PDF equals 1. That implies that P(Z < −0.167) = P(Z > 0.167) and that P(Z > 0.167) = 1 − P(Z < 0.167). In the last expression we have something that we will be able to find in the table. Hence, the solution is:

P(X < 3.5) = 1 − P(Z < 0.167)
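The table lookup can be checked numerically. A short sketch (not in the original text) computing the probability both directly and via the symmetry argument:

```python
from statistics import NormalDist

X = NormalDist(mu=4, sigma=3)   # variance 9 means sigma = 3
p = X.cdf(3.5)                  # P(X < 3.5) computed directly

# The table route: P(Z < -0.167) = 1 - P(Z < 0.167)
Z = NormalDist()
p_table = 1 - Z.cdf((4 - 3.5) / 3)
print(round(p, 4), round(p_table, 4))  # both about 0.434
```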

Example 2.4

Assume the same random variable as in the previous example and calculate the following probability: P(3.5 < X < 4.5). Whenever dealing with intervals we need to split the probability expression into two parts using the same logic as in the previous example. Hence, the probability may be rewritten in the following way:

P(3.5 < X < 4.5) = P(X < 4.5) − P(X < 3.5) = P(Z < 0.167) − P(Z < −0.167) = P(Z < 0.167) − [1 − P(Z < 0.167)] = 2P(Z < 0.167) − 1

In order to find the probability for this last equality we simply use the technique from the previous example.
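A numerical check of the interval probability (an illustration, not part of the original text):

```python
from statistics import NormalDist

X = NormalDist(mu=4, sigma=3)
p = X.cdf(4.5) - X.cdf(3.5)     # P(3.5 < X < 4.5) computed directly

# Equivalent table form: 2*P(Z < 0.167) - 1
Z = NormalDist()
p_table = 2 * Z.cdf(0.5 / 3) - 1
print(round(p, 4))              # about 0.132
```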

The sampling distribution of the sample mean

Another very important concept in statistics and econometrics is the idea of a distribution of an estimator, such as the mean or the variance. It is essential when dealing with statistical inference. This issue will be discussed substantially in later chapters and then in relation to estimators of the regression parameters.

The idea is quite simple. Whenever we use a sample to estimate a population parameter, we receive a different estimate for each sample we use. This happens because of sampling variation. Since we are using different observations in each sample, it is unlikely that the sample mean will be exactly the same for each sample taken. By calculating sample means from many different samples, we will be able to form a distribution of mean values. The question is whether it is possible to say something about this distribution without having to take a large number of samples and calculate their means. The answer to that question is yes!


In statistics we have a very important theorem that goes under the name The Central Limit Theorem.

It says:

If X1, X2, ..., Xn is a sufficiently large random sample from a population with any distribution, with mean μ_X and variance σ²_X, then the distribution of the sample mean X̄ will be approximately normal with

E[X̄] = μ_X and variance V[X̄] = σ²_X / n.

A basic rule of thumb says that if the sample is larger than 30 the shape of the distribution will be sufficiently close, and if the sample size is 100 or larger it will be more or less exactly normal. This basic theorem will be very helpful when carrying out tests related to sample means.
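The theorem can be illustrated by simulation. A minimal sketch (not from the original text), using a uniform population, which is far from normal, so that the result is driven by the theorem alone:

```python
import random
from statistics import mean, pvariance

random.seed(1)  # reproducible illustration

# Population: uniform on [0, 1], with mean 0.5 and variance 1/12
n = 40                                   # sample size above the rule of thumb
reps = 20_000
means = [mean(random.random() for _ in range(n)) for _ in range(reps)]

print(round(mean(means), 3))             # near the population mean 0.5
print(round(pvariance(means), 5))        # near (1/12)/n, about 0.00208
```

A histogram of `means` would show the familiar bell shape, even though the underlying population is flat.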

Basic steps in hypothesis testing

Assume that we would like to know if the mean of a random variable has changed from one year to another. In the first year we have population information about the mean and the variance. In the following year we would like to carry out a statistical test using a sample to see if the population mean has changed, as an alternative to collecting the whole population yet another time. In order to carry out the statistical test we have to go through the following steps:

1) Set up the hypothesis

In this step we have to form a null hypothesis that corresponds to the situation of no change, and an alternative hypothesis that corresponds to a situation of change. Formally we may write this in the following way:

H0: μ = μ0
H1: μ ≠ μ0

where μ0 is the known population mean from the first year.

In general we would like to express the hypothesis in such a way that we can reject the null hypothesis. If we do that we will be able to say something with statistical certainty. If we are unable to reject the null hypothesis we can only conclude that we do not have enough statistical material to say anything about the matter. The hypothesis given above is a so-called two-sided test, since the alternative hypothesis is expressed with a "not equal to". The alternative would be to express the alternative hypothesis with an inequality, such as larger than (>) or smaller than (<), which would result in a one-sided test. In most cases you should prefer a two-sided test over a one-sided test, unless you are absolutely sure that it is impossible for the random variable to be smaller or larger than the given value in the null hypothesis.

2) Form the test function

In this step we will use the ideas that come from the Central Limit Theorem. Since we have taken a sample and calculated a mean, we know that the mean can be seen as a random variable that is normally distributed. Using this information we will be able to form the following test function:

Z = (X̄ − μ0) / (σ_X / √n)

We transform the sample mean using the population information according to the null hypothesis. That will give us a new random variable, our test function Z, that is distributed according to the standard normal distribution. Observe that this is true only if our null hypothesis is true. We will discuss this issue further below.
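This standardization is easy to express as a small helper. A sketch (the function name and the example numbers are hypothetical, chosen only for illustration):

```python
from math import sqrt

def z_statistic(sample_mean, mu0, sigma, n):
    """Standardize the sample mean under H0: mu = mu0."""
    return (sample_mean - mu0) / (sigma / sqrt(n))

# Hypothetical numbers: sample mean 5.3, H0 mean 5, sigma 1.2, n = 49
z = z_statistic(5.3, 5.0, 1.2, 49)
print(round(z, 2))  # 1.75
```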

3) Choose the level of significance for the test and conclude

At this point we have a random variable Z, and if the sample size is larger than 100, we know how it is distributed for certain. The fewer number of observations we have, the less we know about the distribution of Z, and the more likely it is to make a mistake when performing the test. In the following discussion we will assume that the sample size is sufficiently large so that the normal distribution is a good approximation.

Since we know the distribution of Z, we also know that realizations of Z take values between -1.96 and 1.96 in 95% of the cases (you should confirm this using Table A1 in the appendix). That is, if we take 100 samples and calculate the sample mean and the corresponding test value for each sample, on average 95% of the test values will fall within this interval if our null hypothesis is correct. This knowledge can now be applied using only one sample.

If we take a sample, calculate a test value and find that the test value appears outside the interval, we say that this event is so unlikely (less than 5 percent in the example above) that it cannot possibly come from the distribution stated in the null hypothesis (it cannot have the mean stated in the null hypothesis). We therefore say that we reject the null hypothesis in favor of the alternative hypothesis.

In this discussion we have chosen the interval [-1.96; 1.96], which covers 95% of the probability distribution. We therefore say that we have chosen a 5% significance level for our test, and the end points of this interval are referred to as critical values. Put differently, with a significance level of 5% there is a 5% chance that we will receive a value that is located outside the interval even when the null hypothesis is true. Hence there is a 5% chance of making a mistake. If we believe this is a large probability, we may choose a lower significance level such as 1% or 0.1%. It is our choice as a test maker.
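Instead of reading critical values from a table, they can be obtained from the inverse of the standard normal CDF. A short sketch (not part of the original text):

```python
from statistics import NormalDist

Z = NormalDist()

# Two-sided critical value: the point leaving alpha/2 in each tail
crit = {alpha: Z.inv_cdf(1 - alpha / 2) for alpha in (0.05, 0.01, 0.001)}
for alpha, c in crit.items():
    print(alpha, round(c, 3))   # 1.96, 2.576, 3.291
```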

Example 2.5

Assume that you have taken a random sample of 10 observations from a normally distributed population and found that the sample mean equals 6. You happen to know that the population variance equals 2. You would like to know if the mean value of the population equals 5, or if it is different from 5.

You start by formulating the relevant null hypothesis and alternative hypothesis. For this example we have:

H0: μ = 5
H1: μ ≠ 5

You know that according to the central limit theorem the sampling distribution of sample means has a normal distribution. We may therefore form the following test function:

Z = (X̄ − μ0) / (σ / √n) = (6 − 5) / (√2 / √10) = √5 ≈ 2.236

We know that our test function follows the standard normal distribution (has a mean equal to zero) if the null hypothesis is true. Assume that we choose a significance level of 1%. A significance level of 1% means that there is a 1% chance that we will reject the null hypothesis even though the null hypothesis is correct. The critical values for a significance level of 1% are [-2.576; 2.576]. Since our test value is located within this interval we cannot reject the null hypothesis. We have to conclude that the mean value of the population might be 5. We cannot say that it is significantly different from 5.
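The whole example can be carried out in a few lines of code (an illustration, not part of the original text):

```python
from math import sqrt
from statistics import NormalDist

x_bar, mu0, variance, n = 6, 5, 2, 10

z = (x_bar - mu0) / sqrt(variance / n)         # about 2.236
crit = NormalDist().inv_cdf(1 - 0.01 / 2)      # about 2.576 at the 1% level

print(round(z, 3), round(crit, 3))
print("reject H0" if abs(z) > crit else "cannot reject H0")
```

Since 2.236 < 2.576 the code prints "cannot reject H0", matching the conclusion above.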
