Regression and the Least Squares Method: Where Do Parameter Estimates Come From?


Joanna Grochowska-Angielczyk

Econometrics – Lecture 6

Topic: Regression analysis. Estimation of model parameters
using the Least Squares Method.

In this lecture, I will explain what linear regression is and how the Least Squares Method works in detail. You will therefore learn where the formulas for estimating the structural parameters of an econometric model come from.

Welcome!

The main goal of econometrics is to examine and explain the behavior of one economic variable as a function of the behavior of other variables. Of course, these variables must be related in some way. For example: whether and how a household’s expenditures depend on its income; or whether the increase in food expenditures is faster or slower depending on income growth.

In one of the previous lectures, I discussed a type of statistical dependence known as correlation. As we remember, the concept of correlation concerns the STRENGTH and DIRECTION of the relationship under study.

Besides correlation analysis, there is another type of analysis that can be performed—namely REGRESSION. It is a branch of statistics concerned with studying relationships and dependencies between the distributions of two or more examined characteristics in the general population.

However, the term regression refers to the SHAPE of the relationship between characteristics. We distinguish between linear and nonlinear regression analysis.

The graph of linear regression for two variables, as the name suggests, is a straight line. In nonlinear regression analysis, the graphical representation of the relationship consists of higher-order curves, e.g., a parabola.

It is enough to look at the point clouds below. Based on them, one can conclude that the closer the values of the correlation coefficient are to 1 (in absolute value), the more linearly the points are arranged on the chart. The third row shows examples of nonlinear plots.

Source: https://pl.wikipedia.org/wiki/Zależność_zmiennych_losowych, accessed on 12 June 2018.

Regression and correlation analysis may concern not only two, but also a larger number of variables. In such cases, we speak of so-called multiple analysis.

I will now proceed to discuss both the simple and the multiple case of linear regression.

Linear regression – introductory information

The term linear regression comes from the fact that the assumed model of dependence between dependent and independent variables is a linear function or a linear transformation.

The line determined by the equation of a linear model is the regression line, and the model itself is a linear regression model. One can speak of a regression line only in the case of a model with a constant term and one explanatory variable. In the multidimensional case, i.e., multiple regression, we speak of a regression hyperplane.

Before more detailed charts appear, it is worth recalling the general form of an econometric model:

$$Y = \alpha_0 + \alpha_1 X_1 + \alpha_2 X_2 + \ldots + \alpha_k X_k + \varepsilon$$

Taking into account realizations of the variables, it is often written as follows:

$$y_t = \alpha_0 + \alpha_1 x_{1t} + \alpha_2 x_{2t} + \ldots + \alpha_k x_{kt} + \varepsilon_t$$

where:

$Y$; $y_t$ – the explained (dependent, endogenous) variable; its realizations in period $t$,

$X_k$; $x_{kt}$ – the explanatory (independent) variables; their realizations in period $t$,

$\varepsilon$ – the random (error) term (you can read more about it in the article),

$t$ – the index of successive realizations (observations), $t \in \{1, \ldots, n\}$.

The origin of the term regression is quite interesting.

In everyday language, regression means: going backward, decline, disappearance. So you might wonder—how did this word end up in statistics?

The term was first used by Francis Galton, Charles Darwin’s half-cousin. In 1886, he studied the relationship between the height of parents and the height of their children. He observed that tall parents tend to have, on average, tall children. However, the children of exceptionally tall parents tend to be closer to the average height than their parents are. Galton called this tendency to return toward the mean “regression toward mediocrity.” His conclusions can therefore be expressed using the following linear model:

$$\text{children's height} = \alpha_0 + \alpha_1 \cdot \text{parents' height} + \varepsilon_t$$

where the value of the parameter standing next to the explanatory variable satisfies $0 < \alpha_1 < 1$. This means that each centimeter of the parents’ height translates, on average, into less than one centimeter of the children’s height.

The dependent variable and the explanatory variables in a regression model are not symmetric. In his research, Galton not only assumed that parents’ height affects children’s height, but also that the effect does not work in the opposite direction—i.e., parents’ height does not depend on children’s height.

Therefore, it is worth noting that in economic theory it is very important to know the direction of the cause-and-effect relationship.

Simple linear regression

Simple linear regression concerns the case of two variables: the dependent variable Y and one explanatory variable X.

A straight line can be described using a formula you have known for years (since middle school): $Y = aX + b$. This is the simplest notation. So if we already know what the letters $Y$ and $X$ mean, then the remaining two represent:

$a$ – the parameter associated with the explanatory variable; this is the slope coefficient of the regression, i.e., the tangent of the angle $\gamma$ between the line and the OX axis,

$b$ – the intercept (constant term), i.e., the coordinate of the point where the line intersects the OY axis.

Most people probably remember how, in high school (or earlier), you found the equation of a line passing through two points $A(x_A, y_A)$ and $B(x_B, y_B)$. Of course, there was a specific formula for this, available for example in exam formula sheets: $(y - y_A)(x_B - x_A) - (y_B - y_A)(x - x_A) = 0$. Equally well, it was enough to solve a system of two linear equations obtained by substituting the coordinates of the points for $X$ and $Y$:

$$\begin{cases} y_A = a \cdot x_A + b \\ y_B = a \cdot x_B + b \end{cases}$$

and from that compute the unknown parameter values $a$ and $b$.
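As a quick sanity check, the two-point system above can be solved in a couple of lines. This is a minimal sketch; the coordinates of $A$ and $B$ are made up for illustration.

```python
# Finding the line y = a*x + b through two points A and B
# (illustrative coordinates; any two points with x_A != x_B work).
x_A, y_A = 1.0, 3.0
x_B, y_B = 4.0, 9.0

a = (y_B - y_A) / (x_B - x_A)   # slope: rise over run
b = y_A - a * x_A               # intercept: plug A back into y = a*x + b

print(a, b)
```

With these numbers the slope is $(9-3)/(4-1) = 2$ and the intercept $3 - 2 \cdot 1 = 1$.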

It is worth remembering that in econometrics, however, we do NOT deal with a functional relationship (often called deterministic), i.e., one in which each value $x_i$ corresponds to one and only one value $y_i$.

In econometrics, we study stochastic (random, probabilistic) relationships between the variables $X$ and $Y$. In this case, EACH value $x_i$ corresponds to an entire set of values $y_i$ forming a certain distribution. Hence, the typical equation of a simple regression line is as follows:

$$Y = \alpha_0 + \alpha_1 X + \varepsilon$$

This situation can be illustrated as follows:

If the variables $X$ and $Y$ follow a joint (bivariate) normal distribution, then the regression of $Y$ on $X$ is linear.

Let’s get to the point. Drawing a straight line through two points seems easy enough. However, with many points it won’t be that simple. In practice, it is almost never the case that one line passes through every point marked on the chart.

In such a situation, we need to choose a method that allows us to find the optimal line. It should be properly “fitted,” i.e., drawn in a way that best reflects the relationship between X and Y.

Such a line (that is, the regression equation in its THEORETICAL form) is written as:

$$\hat{Y} = a_0 + a_1 X$$

If we take into account specific realizations of the variables in subsequent periods t, we can also write it as:

$$\hat{y}_t = a_0 + a_1 x_t$$

In this case, $a_0$ is an estimator, i.e., the estimated value of the intercept. Meanwhile, $a_1$ represents the estimated value of the regression coefficient. It determines the influence of variable $X$ on variable $Y$.

The question is: how do we express numerically the values of the parameters $a_0$ and $a_1$? Should I draw, for example, the line $\hat{y}_t = 2.8 + 4x_t$, or rather a line with a slightly different slope, for example $\hat{y}_t = 2.52 + 4.6x_t$?

In the next part of the lecture, I will explain mathematically how to derive the formulas for the best estimates of $a_0$ and $a_1$, as well as the remaining parameters $a_2, a_3, \ldots, a_k$ for the case of a model with many explanatory variables.

Estimation of the parameters of an econometric model using the Least Squares Method – the case with one explanatory variable.

There are many methods for estimating model parameters. Perhaps you have already heard of maximum likelihood, median regression, or the two-point method. Nevertheless, among all these methods the most popular one is the Least Squares Method (often abbreviated as OLS, for ordinary least squares).

It requires certain assumptions, which I will discuss in more detail in the next lecture. The most important of them concern the properties of the random error term $\varepsilon_t$ in the model $y_t = \alpha + \beta x_t + \varepsilon_t$:

  • the expected value of the error term equals zero: $\forall t\;\; E(\varepsilon_t) = 0$;
  • the error term has a constant, finite variance: $\forall t\;\; D^2(\varepsilon_t) = \sigma^2$;
  • there is no autocorrelation of the error term, i.e., no dependence between error terms from different time periods: $\forall t \neq s\;\; \operatorname{cov}(\varepsilon_t, \varepsilon_s) = 0$.

For the sake of consistent notation, in the article above and throughout my entire course I use the symbols $a_0$ and $a_1$. These are least-squares estimators of the parameters $\alpha_0$ and $\alpha_1$ from the model:

$$y_t = \alpha_0 + \alpha_1 x_t + \varepsilon_t$$

In your classes, you may have used a more general notation, such as:

$$y_t = \alpha + \beta x_t + \varepsilon_t$$

So you were looking for parameter estimates in the theoretical form:

$$\hat{y}_t = a + b x_t$$

In the literature or during classes you may also encounter the reversed notation:

$y_t = \beta + \alpha x_t + \varepsilon_t$, so the theoretical equation of the line would be:

$$\hat{y}_t = b + a x_t$$

The most important thing, however, is to understand which letter in the equation denotes the intercept and which denotes the slope coefficient (the one multiplying X).

At this point, go back for a moment to the chart shown earlier. As I mentioned, the red line does not perfectly match all the blue points. Some of them lie below the line, others above it.

The econometric model will be better fitted the smaller the distance between the theoretical values $\hat{y}_t$ and the observed values $y_t$.

Each of these vertical (burgundy) segments represents the difference between the actual values of the variable $y_t$ and the theoretical values $\hat{y}_t$ computed from the regression line. These are the model’s so-called residuals. We denote them by:

$$e_t = y_t - \hat{y}_t$$

The relationship between residuals, observations, and estimated parameters can be written as:

$$y_t = \hat{y}_t + e_t = a_0 + a_1 x_t + e_t$$

This implies that the residuals $e_t$ are estimates of the random terms $\varepsilon_t$ from the model $y_t = \alpha_0 + \alpha_1 x_t + \varepsilon_t$, but they are not equal to them!

Some differences between actual and theoretical values are positive (the point lies above the regression line), others are negative (the point lies below it). Therefore, if our goal is to make these segments as small as possible, it makes no sense to simply add the raw differences $e_1 + e_2 + \ldots + e_n$: positive and negative values would cancel, and the result would not be meaningful. The fit is better the smaller the absolute values of these deviations are.

Example 1

In a certain model, the differences between the actual and the theoretical values are: $3,\ 4,\ -1,\ 0,\ -5,\ -4,\ 2,\ -3$. These are distances between two points. You can compare them, for example, to a temperature reading on a thermometer and its distance from zero—sometimes it is positive, sometimes negative. Or to steps taken forward (the positive ones) and backward (the negative ones).

If you need to compute the overall difference, i.e. add up all distances, a simple sum of the numbers will not be meaningful: $3 + 4 - 1 + 0 - 5 - 4 + 2 - 3 = -4$. Many numbers simply cancel out. But you did not take only four steps backward. That is why you should use the absolute value of each number, i.e. its distance from zero on the number line. It’s like counting steps—forward and backward—together.

$$|3| + |4| + |-1| + |0| + |-5| + |-4| + |2| + |-3| = \mathbf{22}$$

Now everything matches 🙂
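The arithmetic of Example 1 can be checked in a few lines of Python:

```python
# Residuals from Example 1: a raw sum cancels out, absolute values do not.
e = [3, 4, -1, 0, -5, -4, 2, -3]

raw_sum = sum(e)                  # positives and negatives cancel to -4
abs_sum = sum(abs(v) for v in e)  # total "distance", like counting all steps: 22

print(raw_sum, abs_sum)
```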

A natural criterion to minimize in order to obtain the best fit is therefore the sum of the absolute values of all residuals:

$$\sum_{t=1}^{n} \left| y_t - \hat{y}_t \right| = \sum_{t=1}^{n} |e_t|$$

To find the smallest values—i.e. the local extrema of a multivariable function—we need derivatives (you can find more about them in Krystian’s courses).

However, this function is inconvenient to work with, because the absolute value function is not differentiable at zero. As a result, the sum $\sum_{t=1}^{n} |e_t|$ cannot be minimized using standard analytical methods.

The Least Squares Method comes to the rescue. As the name suggests, it allows us to search for a minimum of the sum of squared differences between observed values and theoretical values (computed from the model equation):

$$\sum_{t=1}^{n} \left( y_t - \hat{y}_t \right)^2 = \sum_{t=1}^{n} e_t^2 \;\rightarrow\; \min$$

Substituting the theoretical model equation, we obtain:

$$\sum_{t=1}^{n} \left( y_t - (a_0 + a_1 x_t) \right)^2 = \sum_{t=1}^{n} \left( y_t - a_0 - a_1 x_t \right)^2 = S(a_0, a_1) \;\rightarrow\; \min$$

We now need to find the minimum of the sum-of-squares function $S(a_0, a_1)$. That is, we choose the estimates $a_0$ and $a_1$ so that this sum is as small as possible.
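To see what minimizing $S(a_0, a_1)$ means in practice, here is a minimal sketch on made-up data: the least-squares estimates (computed with the closed-form formulas derived later in this lecture) give a smaller sum of squares than any nearby pair of coefficients.

```python
# Illustrative data (x_t, y_t); the numbers are invented for this sketch.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.1, 9.8]
n = len(x)

def S(a0, a1):
    """Sum of squared residuals for the line y_hat = a0 + a1*x."""
    return sum((yt - a0 - a1 * xt) ** 2 for xt, yt in zip(x, y))

# Closed-form OLS estimates (the formulas derived later in the lecture):
a1_hat = (n * sum(xt * yt for xt, yt in zip(x, y)) - sum(x) * sum(y)) / \
         (n * sum(xt ** 2 for xt in x) - sum(x) ** 2)
a0_hat = sum(y) / n - a1_hat * sum(x) / n

print(S(a0_hat, a1_hat))           # the minimal sum of squares
print(S(a0_hat + 0.1, a1_hat))     # any perturbation makes it larger
```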

Using calculus, we can find an extremum of a function. Here it is enough to compute the partial derivatives with respect to the parameters and set them equal to zero. (Since $S$ is a convex quadratic function of $a_0$ and $a_1$, the stationary point found this way is indeed the minimum.) For the function $S(a_0, a_1)$ these conditions can be written as a system of equations:

$$\begin{cases} \dfrac{\partial S(a_0, a_1)}{\partial a_0} = 0 \\[2mm] \dfrac{\partial S(a_0, a_1)}{\partial a_1} = 0 \end{cases}$$

To compute the partial derivatives, we can of course expand the squared expression in parentheses, i.e.:

$$S(a_0, a_1) = \sum_{t=1}^{n} \left( y_t - a_0 - a_1 x_t \right)^2 = \sum_{t=1}^{n} \left( y_t - a_0 - a_1 x_t \right)\left( y_t - a_0 - a_1 x_t \right) = \sum_{t=1}^{n} \left( y_t^2 - 2 y_t a_0 - 2 y_t a_1 x_t + 2 a_0 a_1 x_t + a_0^2 + a_1^2 x_t^2 \right)$$

By computing the partial derivatives and using the basic derivative rules, namely $(a \cdot f(x))' = a \cdot (f(x))'$, $(x^n)' = n x^{n-1}$, $(x)' = 1$, and $(\text{const})' = 0$, I obtain:

$$\frac{\partial S(a_0, a_1)}{\partial a_0} = \sum_{t=1}^{n} \left( 0 - 2 y_t - 0 + 2 a_1 x_t + 2 a_0 + 0 \right) = \sum_{t=1}^{n} \left( -2 y_t + 2 a_1 x_t + 2 a_0 \right) = \mathbf{2 \left[ -\sum_{t=1}^{n} y_t + a_1 \sum_{t=1}^{n} x_t + n \cdot a_0 \right]}$$
$$\frac{\partial S(a_0, a_1)}{\partial a_1} = \sum_{t=1}^{n} \left( 0 - 0 - 2 y_t x_t + 2 a_0 x_t + 0 + 2 a_1 x_t^2 \right) = \sum_{t=1}^{n} \left( -2 y_t x_t + 2 a_0 x_t + 2 a_1 x_t^2 \right) = \mathbf{2 \left[ -\sum_{t=1}^{n} y_t x_t + a_0 \sum_{t=1}^{n} x_t + a_1 \sum_{t=1}^{n} x_t^2 \right]}$$
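The two derivative formulas can be checked numerically against finite differences. This is a sketch on made-up data; the evaluation point $(a_0, a_1)$ is arbitrary.

```python
# Numeric check of the partial derivatives of S(a0, a1)
# against central finite differences, on illustrative data.
x = [1.0, 2.0, 3.0, 5.0]
y = [1.5, 2.4, 3.7, 5.6]
n = len(x)

def S(a0, a1):
    return sum((yt - a0 - a1 * xt) ** 2 for xt, yt in zip(x, y))

def dS_da0(a0, a1):
    # 2 * [ -sum(y_t) + a1*sum(x_t) + n*a0 ]
    return 2 * (-sum(y) + a1 * sum(x) + n * a0)

def dS_da1(a0, a1):
    # 2 * [ -sum(y_t*x_t) + a0*sum(x_t) + a1*sum(x_t^2) ]
    return 2 * (-sum(xt * yt for xt, yt in zip(x, y))
                + a0 * sum(x)
                + a1 * sum(xt ** 2 for xt in x))

h = 1e-6
a0, a1 = 0.3, 1.1  # arbitrary evaluation point
num0 = (S(a0 + h, a1) - S(a0 - h, a1)) / (2 * h)
num1 = (S(a0, a1 + h) - S(a0, a1 - h)) / (2 * h)

print(dS_da0(a0, a1), num0)
print(dS_da1(a0, a1), num1)
```

Since $S$ is quadratic, the central differences agree with the analytic derivatives up to floating-point rounding.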

Setting the computed derivatives equal to zero, I get:

$$\begin{cases} 2 \left[ -\sum\limits_{t=1}^{n} y_t + a_1 \sum\limits_{t=1}^{n} x_t + n \cdot a_0 \right] = 0 \quad \Big/ : 2 \\[3mm] 2 \left[ -\sum\limits_{t=1}^{n} y_t x_t + a_0 \sum\limits_{t=1}^{n} x_t + a_1 \sum\limits_{t=1}^{n} x_t^2 \right] = 0 \quad \Big/ : 2 \end{cases}$$
$$\begin{cases} -\sum\limits_{t=1}^{n} y_t + a_1 \sum\limits_{t=1}^{n} x_t + n \cdot a_0 = 0 \\[3mm] -\sum\limits_{t=1}^{n} y_t x_t + a_0 \sum\limits_{t=1}^{n} x_t + a_1 \sum\limits_{t=1}^{n} x_t^2 = 0 \end{cases}$$

$$\begin{cases} n \cdot a_0 + a_1 \sum\limits_{t=1}^{n} x_t = \sum\limits_{t=1}^{n} y_t \\[3mm] a_0 \sum\limits_{t=1}^{n} x_t + a_1 \sum\limits_{t=1}^{n} x_t^2 = \sum\limits_{t=1}^{n} y_t x_t \end{cases} \qquad (1)$$

Our task is to solve the above system of equations and compute the values of $a_0$ and $a_1$.

I will use here elimination (the method of opposite coefficients).

$$\begin{cases} n \cdot a_0 + a_1 \sum\limits_{t=1}^{n} x_t = \sum\limits_{t=1}^{n} y_t \quad \Big/ \cdot \left( -\sum\limits_{t=1}^{n} x_t \right) \\[3mm] a_0 \sum\limits_{t=1}^{n} x_t + a_1 \sum\limits_{t=1}^{n} x_t^2 = \sum\limits_{t=1}^{n} y_t x_t \quad \Big/ \cdot n \end{cases}$$
$$\begin{cases} -n\, a_0 \sum\limits_{t=1}^{n} x_t - a_1 \left( \sum\limits_{t=1}^{n} x_t \right)^2 = -\sum\limits_{t=1}^{n} x_t \sum\limits_{t=1}^{n} y_t \\[3mm] n\, a_0 \sum\limits_{t=1}^{n} x_t + n\, a_1 \sum\limits_{t=1}^{n} x_t^2 = n \sum\limits_{t=1}^{n} y_t x_t \end{cases}$$

Adding both equations together, we get:

$$n\, a_1 \sum_{t=1}^{n} x_t^2 - a_1 \left( \sum_{t=1}^{n} x_t \right)^2 = n \sum_{t=1}^{n} y_t x_t - \sum_{t=1}^{n} x_t \sum_{t=1}^{n} y_t$$

Hence we can compute the value of the estimator $a_1$:

$$a_1 \left[ n \sum_{t=1}^{n} x_t^2 - \left( \sum_{t=1}^{n} x_t \right)^2 \right] = n \sum_{t=1}^{n} y_t x_t - \sum_{t=1}^{n} x_t \sum_{t=1}^{n} y_t$$
$$a_1 = \frac{n \sum\limits_{t=1}^{n} y_t x_t - \sum\limits_{t=1}^{n} x_t \sum\limits_{t=1}^{n} y_t}{n \sum\limits_{t=1}^{n} x_t^2 - \left( \sum\limits_{t=1}^{n} x_t \right)^2}$$

After a few small transformations, we finally get (for readability I won’t write the summation indices):

$$a_1 = \frac{n \sum y_t x_t - \sum x_t \sum y_t}{n \sum x_t^2 - \left( \sum x_t \right)^2} = \frac{n \sum y_t x_t - n^2 \cdot \frac{1}{n^2} \sum x_t \sum y_t}{n \sum x_t^2 - n^2 \cdot \frac{1}{n^2} \left( \sum x_t \right)^2} = \frac{n \sum y_t x_t - n^2 \cdot \frac{\sum x_t}{n} \cdot \frac{\sum y_t}{n}}{n \sum x_t^2 - n^2 \cdot \left( \frac{1}{n} \sum x_t \right)^2} = \frac{n \left( \sum y_t x_t - n\, \bar{x}\, \bar{y} \right)}{n \left( \sum x_t^2 - n\, \bar{x}^2 \right)}$$
$$\boxed{\,a_1 = \frac{\sum y_t x_t - n\, \bar{x}\, \bar{y}}{\sum x_t^2 - n\, \bar{x}^2}\,}$$

It remains to estimate the parameter a subscript 0. For this purpose, I will use the first equation from system (1).

$$\begin{cases} n\,a_0+a_1\sum x_t=\sum y_t \\[4pt] a_1=\dfrac{\sum y_t x_t-n\,\bar{x}\,\bar{y}}{\sum x_t^2-n\,\bar{x}^2} \end{cases}$$

$$n\,a_0=\sum y_t-a_1\sum x_t \qquad /:n$$

$$a_0=\frac{\sum y_t}{n}-a_1\frac{\sum x_t}{n}$$

Hence, finally:

$$a_0=\bar{y}-a_1\,\bar{x}$$

where $\bar{x}$ and $\bar{y}$ are the arithmetic means of the variables X and Y, respectively.

After some transformations, one can also use another form of the formula for the parameter a_1 in front of X. Both forms are correct and can be used interchangeably.

$$a_1=\frac{\sum\left(x_t-\bar{x}\right)\left(y_t-\bar{y}\right)}{\sum\left(x_t-\bar{x}\right)^2}$$
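Both forms of the formula can be checked against each other numerically. Below is a minimal sketch in Python; the data vectors are made-up illustration values, not taken from the lecture:

```python
# Compare the two equivalent formulas for a1 and compute a0 = y_bar - a1 * x_bar.
# The data below are made-up values, purely for illustration.
x = [1, 2, 3, 4, 5, 6]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# First form: a1 = (sum(y_t * x_t) - n * x_bar * y_bar) / (sum(x_t^2) - n * x_bar^2)
a1_first = (sum(xi * yi for xi, yi in zip(x, y)) - n * x_bar * y_bar) \
    / (sum(xi ** 2 for xi in x) - n * x_bar ** 2)

# Second form: a1 = sum((x_t - x_bar)(y_t - y_bar)) / sum((x_t - x_bar)^2)
a1_second = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) \
    / sum((xi - x_bar) ** 2 for xi in x)

a0 = y_bar - a1_first * x_bar
print(a1_first, a1_second, a0)
```

Both values of $a_1$ agree to machine precision, which is a handy sanity check when implementing the formulas by hand.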

This is how we derive the formulas for estimating the structural parameters of an econometric model. 🙂

For those who are more mathematically advanced: how do we know that the values computed in this way actually minimize the function $S(a_0,a_1)$? In mathematical analysis, to confirm this one typically starts by computing the second-order partial derivatives of the function with respect to the parameters. I remember that the first derivatives looked as follows: $\frac{\partial S}{\partial a_0}=-2\sum\left(y_t-a_0-a_1 x_t\right)$ and $\frac{\partial S}{\partial a_1}=-2\sum\left(y_t-a_0-a_1 x_t\right)x_t$.

$$\frac{\partial^2 S}{\partial a_0^2}=2n$$
$$\frac{\partial^2 S}{\partial a_1^2}=2\sum x_t^2$$
$$\frac{\partial^2 S}{\partial a_0\,\partial a_1}=\frac{\partial^2 S}{\partial a_1\,\partial a_0}=2\sum x_t$$

I arrange them into the so-called Hessian, i.e. the matrix of second-order derivatives:

$$H=\begin{bmatrix} 2n & 2\sum x_t \\ 2\sum x_t & 2\sum x_t^2 \end{bmatrix}$$

A function of two variables has an extremum at a point when two conditions hold:

  • a local maximum when the determinant of the matrix at the point $(a_0,a_1)$ is positive, i.e. $\det(H)>0$, and $\frac{\partial^2 S}{\partial a_0^2}<0$;
  • a local minimum when the determinant of the matrix at the point $(a_0,a_1)$ is positive, i.e. $\det(H)>0$, and $\frac{\partial^2 S}{\partial a_0^2}>0$.

I am looking for the value that minimizes the function S(a0,a1).

The second condition for a local minimum is satisfied because $\frac{\partial^2 S}{\partial a_0^2}=2n>0$. Therefore, I check the determinant of the Hessian to make sure it is indeed positive. For a 2×2 matrix $\begin{bmatrix} a & b \\ c & d \end{bmatrix}$, the determinant is easy to compute: $ad-bc$. Hence:

$$\det(H)=2n\cdot 2\sum x_t^2-2\sum x_t\cdot 2\sum x_t=4\left(n\sum x_t^2-\left(\sum x_t\right)^2\right)=4n\sum\left(x_t-\bar{x}\right)^2>0$$

provided the values $x_t$ are not all identical.
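For the more hands-on reader, this determinant can also be checked numerically. A small sketch (the x values are arbitrary illustration data):

```python
# Hessian of S(a0, a1):  H = [[2n, 2*sum(x)], [2*sum(x), 2*sum(x^2)]]
# Its determinant equals 4 * (n * sum(x^2) - (sum(x))^2), which is positive
# unless all x_t are identical (in which case no regression line exists anyway).
x = [1.0, 2.0, 4.0, 7.0]  # arbitrary illustration data

n = len(x)
s_x = sum(x)
s_x2 = sum(xi ** 2 for xi in x)

det_H = (2 * n) * (2 * s_x2) - (2 * s_x) * (2 * s_x)
print(det_H)
```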

Therefore, the derived estimators of the parameters a0 and a1 minimize the function S(a0,a1).

Interpretation of the slope coefficient and the intercept

Once you compute the parameters of the linear regression equation $\hat{y}_t=a_0+a_1 x_t$, it is worth knowing what they actually mean.

We interpret the estimated value of the slope coefficient a1 as follows:

An increase (ALWAYS an increase) of the explanatory variable X by 1 unit implies a change (an increase or a decrease) in the explained variable Y, on average, by the value of the estimated parameter $a_1$.

The intercept a0 tells us what value of Y we should expect when X equals zero. However, this interpretation is not always meaningful. I mentioned this in my course.

Example 2

In a certain group of students, the relationship between the number of points obtained as an exam score and the number of hours spent studying for that exam was examined. After calculations, the following model was estimated: $\hat{y}_t=128+37x_t$ (values rounded). The interpretation of the model parameters is as follows:

– if the number of hours spent studying for the exam increases by one hour, then the number of points obtained on the exam increases on average by about 37 points;

– the intercept $a_0=128$ is not interpretable. After all, it makes no sense to say that if a student does not study for the exam at all (spends 0 hours studying), they will obtain as many as 128 points on the exam…

Estimation of econometric model parameters by the Least Squares Method – the case of multiple explanatory variables.

A moment ago, I explained how searching for a line works in the case of two variables X and Y. A linear model with an intercept and one explanatory variable is a special case of a model with k explanatory variables. So how does the Least Squares Method work when we have more than one X variable? In this case, finding a solution becomes relatively straightforward when we use matrix algebra.

The general econometric model with an intercept has the form:

$$y_t=\alpha_0+\alpha_1 x_{1t}+\alpha_2 x_{2t}+\dots+\alpha_k x_{kt}+\varepsilon_t,\qquad t=1,2,\dots,n$$

In matrix–vector notation, it can be written as:

$$\mathbf{y}=\mathbf{X}\boldsymbol{\alpha}+\boldsymbol{\varepsilon}\qquad(2)$$

Hence:

$$\mathbf{y}=\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix},\qquad
\mathbf{X}=\begin{bmatrix} 1 & x_{11} & \dots & x_{k1} \\ 1 & x_{12} & \dots & x_{k2} \\ \vdots & \vdots & & \vdots \\ 1 & x_{1n} & \dots & x_{kn} \end{bmatrix},\qquad
\boldsymbol{\alpha}=\begin{bmatrix} \alpha_0 \\ \alpha_1 \\ \vdots \\ \alpha_k \end{bmatrix},\qquad
\boldsymbol{\varepsilon}=\begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{bmatrix}$$

With this notation, the column vector y contains all observations of the dependent (explained) variable. In the matrix X, successive columns contain observations of the explanatory variables in the model. Typically, X is a rectangular matrix with many more rows than columns, because the number of observations is usually much greater than the number of explanatory variables. A rectangular matrix X cannot be inverted (only square matrices are invertible), so equation (2) cannot be solved for the parameters by purely algebraic manipulations.

After estimating the structural parameters $\alpha_i$, the econometric model will take the form:

$$\hat{\mathbf{y}}=\mathbf{X}\mathbf{a}$$

Now I will show how to derive the estimator $\mathbf{a}$ of the parameters using the Least Squares Method.

The principle is the same as before. The idea of OLS comes down to choosing the estimated values $a_0,a_1,\dots,a_k$ of the structural parameters $\alpha_0,\alpha_1,\dots,\alpha_k$ so that the sum of squared differences between the observed values $y_t$ and the theoretical values $\hat{y}_t$ computed from the model equation is as small as possible.

$$S(a_0,a_1,\dots,a_k)=\sum_{t=1}^{n}\left(y_t-\hat{y}_t\right)^2\longrightarrow\min$$

As before, after substituting the theoretical model equation, I obtain:

$$S(a_0,a_1,\dots,a_k)=\sum_{t=1}^{n}\left(y_t-a_0-a_1 x_{1t}-\dots-a_k x_{kt}\right)^2$$

The solution of the system in matrix form will be the vector $\mathbf{a}=\left[a_0,a_1,\dots,a_k\right]^T$.

The function $S(\mathbf{a})$, using the properties of matrix operations, can be written in matrix form as follows:

$$S(\mathbf{a})=\left(\mathbf{y}-\mathbf{X}\mathbf{a}\right)^T\left(\mathbf{y}-\mathbf{X}\mathbf{a}\right)=\mathbf{y}^T\mathbf{y}-\mathbf{y}^T\mathbf{X}\mathbf{a}-\mathbf{a}^T\mathbf{X}^T\mathbf{y}+\mathbf{a}^T\mathbf{X}^T\mathbf{X}\mathbf{a}$$

The sum of squared residuals $S(\mathbf{a})$ is a single specific number, in other words a scalar. Therefore, each term in the obtained sum is also just an ordinary number. Transposing a scalar does not change it, so $\mathbf{y}^T\mathbf{X}\mathbf{a}=\left(\mathbf{y}^T\mathbf{X}\mathbf{a}\right)^T=\mathbf{a}^T\mathbf{X}^T\mathbf{y}$. As a result, we get:

$$S(\mathbf{a})=\mathbf{y}^T\mathbf{y}-2\,\mathbf{a}^T\mathbf{X}^T\mathbf{y}+\mathbf{a}^T\mathbf{X}^T\mathbf{X}\mathbf{a}$$

The function S(a) attains a minimum if its first derivative with respect to the vector a equals the zero vector, and the second derivative is positive definite.

$$\frac{\partial S(\mathbf{a})}{\partial\mathbf{a}}=-2\,\mathbf{X}^T\mathbf{y}+2\,\mathbf{X}^T\mathbf{X}\mathbf{a}$$

Setting the derivative $\frac{\partial S(\mathbf{a})}{\partial\mathbf{a}}$ equal to the zero vector, I obtain:

$$-2\,\mathbf{X}^T\mathbf{y}+2\,\mathbf{X}^T\mathbf{X}\mathbf{a}=\mathbf{0}$$
$$\mathbf{X}^T\mathbf{X}\mathbf{a}=\mathbf{X}^T\mathbf{y}$$
$$\left(\mathbf{X}^T\mathbf{X}\right)^{-1}\mathbf{X}^T\mathbf{X}\mathbf{a}=\left(\mathbf{X}^T\mathbf{X}\right)^{-1}\mathbf{X}^T\mathbf{y}$$

Using the property of matrix multiplication for a matrix A and its inverse A^{-1}, we obtain the identity matrix, i.e. A^{-1}A = AA^{-1} = I. This is the matrix corresponding simply to the number one.

Hence, we finally obtain the formula for the estimates of the unknown structural parameters $\alpha_i$ in vector form:

$$\mathbf{a}=\left(\mathbf{X}^T\mathbf{X}\right)^{-1}\mathbf{X}^T\mathbf{y}$$

X^T denotes the transpose of the matrix X, while (X^T X)^{-1} denotes the inverse of the matrix X^T X.
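This vector formula translates directly into code. Below is a minimal sketch in Python with NumPy, using made-up data; note that in practice `np.linalg.lstsq` (or solving the normal equations with `np.linalg.solve`) is preferred over forming the explicit inverse, for numerical stability:

```python
import numpy as np

# Made-up data: n = 5 observations, k = 2 explanatory variables.
# The first column of ones corresponds to the intercept a0.
X = np.array([
    [1.0, 2.0, 1.0],
    [1.0, 3.0, 0.0],
    [1.0, 5.0, 2.0],
    [1.0, 7.0, 1.0],
    [1.0, 8.0, 3.0],
])
y = np.array([4.0, 5.0, 9.0, 11.0, 14.0])

# The OLS estimator derived above: a = (X^T X)^(-1) X^T y
a = np.linalg.inv(X.T @ X) @ X.T @ y

# Cross-check with NumPy's built-in least-squares solver.
a_check, *_ = np.linalg.lstsq(X, y, rcond=None)
print(a)
```

The first entry of the resulting vector is the intercept estimate, and the remaining entries are the slope coefficients for the successive columns of X.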

So far, we have considered the necessary condition for the existence of an extremum. We must now verify whether the extremum we found is indeed a minimum of the function S(a). The sufficient condition for an extremum is that the Hessian matrix (the matrix of second derivatives) is positive definite. In this case, it takes the form:

$$\frac{\partial^2 S(\mathbf{a})}{\partial\mathbf{a}\,\partial\mathbf{a}^T}=2\,\mathbf{X}^T\mathbf{X}$$

The formula above shows that the sufficient condition reduces to the positive definiteness of $\mathbf{X}^T\mathbf{X}$. This matrix is always positive semidefinite; it is positive definite whenever the columns of X are linearly independent, which is equivalent to $\det\left(\mathbf{X}^T\mathbf{X}\right)\neq 0$. In that case the Hessian is positive definite and the extremum we found is indeed a minimum.
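This condition is easy to check numerically for a concrete, made-up X with linearly independent columns: the symmetric matrix X^T X is positive definite exactly when all of its eigenvalues are strictly positive.

```python
import numpy as np

# Positive definiteness check for X^T X, with a made-up full-column-rank X:
# all eigenvalues of the symmetric matrix X^T X should be strictly positive.
X = np.array([
    [1.0, 2.0],
    [1.0, 3.0],
    [1.0, 5.0],
    [1.0, 6.0],
])
eigenvalues = np.linalg.eigvalsh(X.T @ X)
print(eigenvalues)
```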

Interpretation of the coefficients

As in the case of an equation with a single explanatory variable, it is also useful to know how to interpret the parameters of the linear regression equation $\hat{y}_t=a_0+a_1 x_{1t}+\dots+a_k x_{kt}$.

In this case, our line of thinking should follow the previously discussed single-variable case very closely. The difference is that we must add the phrase indicating that the remaining variables (not being interpreted at a given moment) are held constant. Consequently, the estimated value of the coefficient $a_i$, associated with the variable $X_i$, where $i\in\{1,\dots,k\}$, is interpreted as follows:

An increase (ALWAYS an increase) in the explanatory variable $X_i$ by 1 unit implies a change (an increase or a decrease) in the explained variable Y, on average, by the value of the estimated parameter $a_i$, assuming the other variables remain constant (ceteris paribus).

As before, the intercept a0 tells us what value of Y we should expect when all explanatory variables are equal to zero. However, for economic variables this interpretation is not always meaningful, as shown in Example 2.

This is where the formulas for estimating the parameters of the econometric model using the Least Squares Method come from, and this is how they were derived.

Fun fact – Anscombe’s quartet

Everything that has upsides also has its downsides. One example illustrating the limitations of applying the Least Squares Method blindly, without inspecting the data, is Anscombe’s quartet: a specially constructed set of four datasets that have nearly identical statistical indicators (the means and variances in the X and Y directions, the correlation coefficient, and the regression line), despite having a markedly different structure when viewed graphically.
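This is easy to verify numerically. The sketch below fits the regression line, using the formulas derived in this lecture, to the first two of the datasets published by Anscombe (1973); both give practically the same estimates, even though a scatter plot of the second set follows a curve rather than a line:

```python
# Two of Anscombe's four datasets (Anscombe, 1973). Both yield almost the same
# OLS line despite looking completely different when plotted.
x  = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
y1 = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]
y2 = [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]

def ols(x, y):
    """OLS estimates (a0, a1) from the formulas derived in this lecture."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    a1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / \
         sum((xi - x_bar) ** 2 for xi in x)
    return y_bar - a1 * x_bar, a1

print(ols(x, y1))  # ~ (3.0, 0.5)
print(ols(x, y2))  # ~ (3.0, 0.5)
```

Plotting the two datasets makes the contrast obvious: the fitted line is the same, but only for the first set is a straight line an adequate description of the data.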





Source: Anscombe.svg by Schutz; derivative work (labels using subscripts) by Avenue, CC BY-SA 3.0, https://commons.wikimedia.org/w/index.php?curid=9838454, accessed: 15 June 2018.

At the very end of this article, I will mention one more important property. Using the Least Squares Method, the resulting estimators a of the model parameters have the following properties: they are linear, consistent, unbiased, and most efficient. But more on all of that in the next lectures.

Summary

In the lecture above, I presented the concept of “regression” and the operation of the most widely used estimation method in econometrics, known as the Least Squares Method. It is precisely with this method that, by estimating the unknown parameters of the model, we obtain estimates for which the model best fits the presented data.

I hope that from now on the formulas used will no longer be a mystery to you.

If you want to apply this knowledge in practice, I encourage you to take a look at my course, especially lesson no. 3.

THE END


Click to review what the correlation coefficient is and how strongly variables are related (previous lecture) <–

Click to see what the assumptions of the classical least squares method are (next lecture) ->

Click to return to the page with Econometrics lectures

