Here are those AP STATers again. Each is holding a copy of The Practice of Statistics, by Yates, Moore, and McCabe. Considering the textbooks as points, what characteristic would the least-squares regression line possess?
(A) It would be approximately y = x.
(B) It would be approximately horizontal.
(C) It would be approximately vertical.
- OVERVIEW: If a scatterplot shows a linear relationship between two quantitative variables, least-squares regression is a method for finding a line that summarizes the relationship between the two variables, at least within the domain of the explanatory variable, x. The least-squares regression line (LSRL) is a mathematical model for the data.
Regression line: A straight line that describes how a response variable y changes as an explanatory variable x changes. It can sometimes be used to predict the value of y for a given value of x.
A residual is the difference between an observed value of y and the value predicted by the regression line: residual = y - ŷ.
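As a minimal sketch (assuming Python 3 and only the standard `statistics` module), the least-squares slope and intercept and the residuals can be computed straight from their definitions: b = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)^{2} and a = ȳ - b·x̄. The helper names `lsrl` and `residuals` are illustrative, not from any library. The data are the three points of this page's r^{2} example (x = 2, 4, 6 and y = 6, 12, 15, recovered from that example's fitted values and residuals).

```python
from statistics import mean


def lsrl(xs, ys):
    """Hypothetical helper: return (a, b) for the least-squares line y-hat = a + b*x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx          # slope
    a = y_bar - b * x_bar  # intercept: the line passes through (x_bar, y_bar)
    return a, b


def residuals(xs, ys, a, b):
    """Hypothetical helper: residual = observed y minus predicted y."""
    return [y - (a + b * x) for x, y in zip(xs, ys)]


xs = [2, 4, 6]
ys = [6, 12, 15]
a, b = lsrl(xs, ys)
print(a, b)                     # 2.0 2.25
print(residuals(xs, ys, a, b))  # [-0.5, 1.0, -0.5]
```

The fitted line matches the example's ŷ = 2 + 2.25x, and each residual is simply the vertical distance from a point to that line.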
Important facts about the least squares regression line.
r^{2} in regression: The coefficient of determination, r^{2}, is the fraction of the variation in the values of y that is explained by the least-squares regression of y on x.
Calculation of r^{2} for a simple example:
$$r^{2} = \frac{SSM - SSE}{SSM}$$
where
$$SSM = \sum (y - \bar{y})^{2} \qquad \text{(sum of squares about the mean of } y\text{)}$$
$$SSE = \sum (y - \hat{y})^{2} \qquad \text{(sum of squares of residuals)}$$
In this example, ŷ = 2 + 2.25x, the mean of x is 4, and the mean of y is 11.
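| x | y | (y - 11)^{2} | ŷ | residual = y - ŷ | (residual)^{2} |
|---|---|---|---|---|---|
| 2 | 6 | 25 | 6.5 | -0.5 | 0.25 |
| 4 | 12 | 1 | 11.0 | 1.0 | 1.00 |
| 6 | 15 | 16 | 15.5 | -0.5 | 0.25 |
| TOTALS | | 42 = SSM | | 0.0 | 1.50 = SSE |

So r^{2} = (42 - 1.50)/42 ≈ 0.964: about 96% of the variation in y is explained by the least-squares regression of y on x.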
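The r^{2} calculation in this example can be checked with a short Python sketch. It assumes the data points x = 2, 4, 6 and y = 6, 12, 15 (the values implied by the table's fitted values and residuals) and uses the example's line ŷ = 2 + 2.25x; variable names such as `ssm` and `sse` simply mirror the page's notation.

```python
from statistics import mean

xs = [2, 4, 6]
ys = [6, 12, 15]
y_hat = [2 + 2.25 * x for x in xs]  # fitted values from the example's line

y_bar = mean(ys)                                      # mean of y (11)
ssm = sum((y - y_bar) ** 2 for y in ys)               # sum of squares about the mean
sse = sum((y - yh) ** 2 for y, yh in zip(ys, y_hat))  # sum of squared residuals
r_squared = (ssm - sse) / ssm                         # fraction of variation explained

print(ssm, sse, round(r_squared, 4))  # 42 1.5 0.9643
```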
THINGS TO NOTE:
- Sum of deviations from mean = 0.
- Sum of residuals = 0.
- r^{2} > 0 does not mean r > 0. If x and y are negatively associated, then r < 0.
Outlier: A point that lies outside the overall pattern of the other points in a scatterplot. (It can be an outlier in the x direction, in the y direction, or in both directions.)
Influential point: A point that, if removed, would considerably change the position of the regression line. (Points that are outliers in the x direction are often influential.)
NOTE: Do not confuse the slope b of the LSRL with the correlation r. The two are related by the formula b = r(s_{y}/s_{x}). If you are working with normalized data, then b does equal r, since s_{y} = s_{x} = 1. (When you normalize a data set, the normalized data have mean 0 and standard deviation 1.) With normalized data, the regression line has the simple form y_{n} = rx_{n}, where x_{n} and y_{n} are the normalized x and y values, respectively. Since the regression line always passes through the point (mean of x, mean of y), and normalized data have a mean of 0, the regression line for normalized x and y values passes through (0, 0).
PHACS (Procedure, Hypothesis, Assumptions, Calculations, Summarize)