Sanderson M. Smith

INFERENCE FOR SLOPE OF LEAST-SQUARES REGRESSION LINE
and

This paper is put together simply to demonstrate how to read computer output and how to do a bit of inference work with the slope of a least-squares regression line. Material of this kind could well appear on the Advanced Placement Statistics Examination. What appears below is, I believe, certainly "fair game" for the Exam.

I will use the data from exercise 14.8 (page 761, Yates text) for this demonstration, but it is not necessary for you to look there.

Good runners take more steps per second as they speed up. Here is the average number of steps per second for a group of top female runners at different speeds.

X  Speed (ft./sec.)    15.86   16.88   17.50   18.62   19.97   21.06   22.11
Y  Steps per second     3.05    3.12    3.17    3.25    3.36    3.46    3.55

Now, on the AP exam, one could be faced with a computer printout relating to this data. It might look something like this:

R squared = 99.8%
s = 0.0091 with 7 - 2 = 5 degrees of freedom

Variable    Coefficient    s.e. of coeff    t-ratio    prob.
Constant    1.76608        0.0307           57.6       <0.0001
Speed       0.080284       0.0016           49.7       <0.0001

OK, one should be able to glean from this computer output that the least-squares line calculated from the data is

STEPS = 1.76608 + 0.080284(SPEED)

with r = sqrt(.998) = .998999 (Note that r would be positive here since the slope of the LSRL is positive.)
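
As a check on the printout, its entries can be reproduced directly from the seven data points. What follows is only a minimal sketch in plain Python, using the standard least-squares formulas; the variable names are my own and are not taken from the Yates text or from any particular statistics package.

```python
import math

# Data from the table above
speed = [15.86, 16.88, 17.50, 18.62, 19.97, 21.06, 22.11]  # x, ft./sec.
steps = [3.05, 3.12, 3.17, 3.25, 3.36, 3.46, 3.55]         # y, steps per second

n = len(speed)
x_bar = sum(speed) / n
y_bar = sum(steps) / n

# Sums of squares and cross-products about the means
sxx = sum((x - x_bar) ** 2 for x in speed)
syy = sum((y - y_bar) ** 2 for y in steps)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(speed, steps))

slope = sxy / sxx                   # "Speed" coefficient
intercept = y_bar - slope * x_bar   # "Constant" coefficient
r_squared = sxy ** 2 / (sxx * syy)

# Residual standard deviation s, with n - 2 degrees of freedom
sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(speed, steps))
s = math.sqrt(sse / (n - 2))

# Standard error and t-ratio for the slope
se_slope = s / math.sqrt(sxx)
t_ratio = slope / se_slope

print(f"slope = {slope:.6f}, intercept = {intercept:.5f}")
print(f"R squared = {100 * r_squared:.1f}%, s = {s:.4f}")
print(f"s.e. of slope = {se_slope:.4f}, t-ratio = {t_ratio:.1f}")
```

The printed values should agree with the printout to the number of digits shown there; small differences in the last digit of the intercept come from rounding.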

Realize that the slope (.080284) and the y-intercept (1.76608) are statistics. They are not population parameters. If we collected another set of similar data and calculated a least-squares regression line, these numbers would undoubtedly be different.

There is (in theory at least) a least-squares regression line for all of the possible data points {(SPEED,STEPS)} that one could collect. If one could calculate the equation of that line, its slope and intercept would be parameters.

What one can do is use our statistics to calculate confidence intervals for these parameters. Note that the standard errors for the statistics are provided in the computer printout. For the sample slope of 0.080284, the calculated standard error is 0.0016. (The formula for the standard error computation appears on page 762 of the Yates text, but it is not needed here.)

There are 7 data points, so there are 7 - 2 = 5 degrees of freedom (as stated in the computer printout). If one wants a 95% confidence interval for the true slope of the LSRL, the critical t value (from a t-distribution table) is t = 2.571. The desired confidence interval is

.080284 plus/minus (2.571)(0.0016) = (.0761704,.0843976)
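
The arithmetic for this interval can be sketched in a few lines of Python. The three inputs are simply the values quoted above (slope and standard error from the printout, t = 2.571 from the table); the variable names are my own.

```python
# Values read from the printout and the t table in the text above
slope = 0.080284     # sample slope of the LSRL
se_slope = 0.0016    # standard error of the slope
t_star = 2.571       # 95% critical t value with 5 degrees of freedom

# Confidence interval: estimate plus/minus (critical value)(standard error)
margin = t_star * se_slope
lower = slope - margin
upper = slope + margin

print(f"margin of error = {margin:.7f}")
print(f"95% CI for the slope: ({lower:.7f}, {upper:.7f})")
```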

What if one is asked to interpret what this interval means? It could be said that we are 95% confident that each additional unit (1 ft./sec.) of speed is associated with an increase in the average number of steps per second of between .0762 and .0844.

Just a note on writing and interpreting slope of a least-squares regression line....

The LSRL above has the form

y(hat) = 1.76608 + 0.080284x

The slope of this line is m = .080284. Using the algebraic identity

m = 1/(1/m)

we could write .080284 = 1/(1/.080284) = 1/12.46 (approximately)

OK, so it would be reasonable to write the LSRL as

y(hat) = 1.76608 + (1/12.46)x

Why would one want to do this? Well, it does make it a bit easier to interpret the slope, something that students are sometimes asked to do on the AP Examination. For instance, returning to the problem above and writing the LSRL as

STEPS = 1.76608 + (1/12.46)(SPEED)

one could put this interpretation on the slope: For each additional step per second taken by a female runner, her speed would increase by approximately 12.46 ft./sec.

(This is, I believe, a reasonable interpretation, although in a real life situation it would be tough to actually take one complete additional step per second. Note use of the word approximately in the explanation. This is important.)
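
As a quick numerical check on the 1/12.46 form of the slope, here is a small Python sketch. The function name and the starting speed of 16 ft./sec. are my own illustrative choices, and the check is purely algebraic: adding 12.46 ft./sec. to any speed in the table goes well beyond the observed data.

```python
# Coefficients of the LSRL from the printout
intercept = 1.76608
slope = 0.080284

# Reciprocal of the slope, used to write the slope as 1/12.46
reciprocal = 1 / slope

def predicted_steps(speed):
    """Predicted steps per second from the LSRL (hypothetical helper)."""
    return intercept + slope * speed

# Illustrative starting speed; because the line is straight, any value
# gives the same change in predicted steps per second.
base_speed = 16.0
change = predicted_steps(base_speed + reciprocal) - predicted_steps(base_speed)

print(f"1/slope = {reciprocal:.2f} ft./sec.")
print(f"change in predicted steps per second = {change:.4f}")
```

A change in speed of about 12.46 ft./sec. along the line corresponds to a change of one step per second, which is all that writing the slope as 1/12.46 expresses.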