Sanderson M. Smith

Example of Problem Involving
INFERENCE ON SLOPE OF REGRESSION LINE

(Problem #6 on 2001 AP Statistics Examination)

This is an analysis of Problem #6 in Section II on the 2001 Advanced Placement Statistics Examination.

The College Board does not allow reproduction of the problem statement here. If you do not have the actual problem statement in front of you, you can get it from the College Board site.

=============================================

Comment: This problem involves reading generic computer output. There is no need to input the actual data values provided into a calculator.

Here is a solution:

(a) One way to compare the data in the two groups would be to construct back-to-back dot plots:

 Did complete Ph.D. Program    GPA    Did not complete Ph.D. Program
                          *    2.9    *
                          *    3.0
                               3.1    *
                               3.2
                               3.3    *
                          *    3.4
                      * * *    3.5    *
                          *    3.6    * *
                          *    3.7
                          *    3.8
                        * *    3.9    *
                        * *    4.0

As indicated by the display above, those who did complete the Ph.D. program tended to have a higher undergraduate GPA in statistics and mathematics than those who did not. Here is a five-number ("big 5") summary. With the exception of the MIN values, each value for those who "completed" is greater than the corresponding value for those who "did not complete."

 GPA                     Did complete Ph.D. Program    Did not complete Ph.D. Program
 MIN                     2.9                           2.9
 Q1 = 25th percentile    3.45                          3.1
 MEDIAN                  3.6                           3.5
 Q3 = 75th percentile    3.9                           3.6
 MAX                     4.0                           3.9
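The summary values can be checked with a short Python sketch. The GPA lists below are an assumption: they are read off the dot plot rather than quoted from the exam, and the helper name five_number_summary is made up for this illustration. The quartiles use the rank (n + 1)p convention (Python's "exclusive" method), which appears to match the values in the table.

```python
import statistics

# GPA values read off the dot plot (an assumption about how the display
# lines up; they are not quoted from the exam itself).
completed = [2.9, 3.0, 3.4, 3.5, 3.5, 3.5, 3.6, 3.7, 3.8, 3.9, 3.9, 4.0, 4.0]
not_completed = [2.9, 3.1, 3.3, 3.5, 3.6, 3.6, 3.9]

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max), with quartiles at rank (n + 1) * p."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="exclusive")
    return min(values), q1, median, q3, max(values)

for label, values in [("Did complete", completed),
                      ("Did not complete", not_completed)]:
    summary = [round(v, 2) for v in five_number_summary(values)]
    print(label, summary)
```

Other quartile conventions can give slightly different Q1 and Q3 values for small samples, so a mismatch with another software package would not necessarily indicate an error.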

(b) Using the computer output, the regression equation for those who completed the program is

(Credit hours) = 23.514 - 2.7555(GPA)

The premise states that all assumptions for inference were reasonable. We can therefore run a hypothesis test (t test) on b, the true slope of the regression line.

H0: b = 0.

Ha: b is not 0.

Degrees of freedom (df) = 13 - 2 = 11.

The computer output shows t = -5.90 with a two-sided P-value reported as 0.0000, so H0 would be rejected at virtually all levels of significance. For example, at the 1% level with 11 df, the critical region for t is t > 3.106 or t < -3.106, and t = -5.90 falls well inside it. For those who completed the Ph.D. program, we conclude that there is a significant linear relationship between GPA and the number of credit hours per semester.
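For readers who want to see where such a P-value comes from, here is a rough sketch that integrates the t density numerically using only the Python standard library. The function names t_pdf and two_sided_p are invented for this illustration, and Simpson's rule is simply one convenient way to approximate the area; statistical software uses more refined methods, so the last digits may differ from the rounded value in the computer output.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def two_sided_p(t, df, steps=2000):
    """Approximate two-sided P-value via Simpson's rule on [0, |t|].

    steps must be even.
    """
    a = abs(t)
    h = a / steps
    total = t_pdf(0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_half = total * h / 3          # P(0 < T < |t|)
    return 2 * (0.5 - central_half)       # both tails beyond |t|

df = 13 - 2                               # n - 2 for simple linear regression
p_value = two_sided_p(-5.90, df)
print(f"df = {df}, two-sided P-value = {p_value:.6f}")

# Sanity check against the 1% critical value quoted in the text.
print(f"P-value at t = 3.106: {two_sided_p(3.106, df):.4f}")
```

The second print statement is a check on the method: the area beyond t = 3.106 in both tails should come out close to 0.01.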

(c) We are interested in the point (3.5, 14).

For those who completed the Ph.D. program, the calculated regression line is

(Credit hours) = 23.514 - 2.7555(GPA).

Substituting GPA = 3.5, we obtain 23.514 - 2.7555(3.5) = 13.86975.

For those who did not complete the program, the calculated regression line is

(Credit hours) = 24.200 - 3.485(GPA).

Substituting GPA = 3.5, we obtain 24.200 - 3.485(3.5) = 12.0025.

Since (3.5, 14) is much closer to the model for those who completed the program (a vertical distance of about 0.13 credit hours, versus about 2.00 for the other model), it is reasonable to think that an applicant with GPA = 3.5 and a mean of 14.0 credit hours per semester will successfully complete the program.
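The comparison in (c) can be sketched in a few lines of Python. The coefficients are the ones quoted above; the function and variable names are chosen only for this illustration, and "closer" is measured here as vertical distance (the residual) from each fitted line.

```python
def predicted_credit_hours(intercept, slope, gpa):
    """Fitted value from a line of the form intercept + slope * GPA."""
    return intercept + slope * gpa

gpa, hours = 3.5, 14.0                    # the applicant's point (3.5, 14)

# (intercept, slope) for each group, as given in the solution above.
models = {
    "completed": (23.514, -2.7555),
    "did not complete": (24.200, -3.485),
}

residuals = {}
for group, (intercept, slope) in models.items():
    fitted = predicted_credit_hours(intercept, slope, gpa)
    residuals[group] = hours - fitted     # observed minus fitted
    print(f"{group}: fitted = {fitted:.5f}, residual = {residuals[group]:.5f}")

# The group whose line passes nearest the applicant's point.
closest = min(residuals, key=lambda g: abs(residuals[g]))
print("Closest model:", closest)
```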

Additional notes relating to this problem:

The required statistical value of t = -5.90 is provided in the computer output. Note that it can be calculated by the formula t = b/SEb = -2.7555/0.4668 = -5.90.

One could respond to (b) by constructing a confidence interval for b. For instance, using the slope and standard error from the computer output and the critical value t* = 2.201 for 11 df, the 95% confidence interval is

-2.7555 ± 2.201(0.4668) = (-3.783, -1.728)

Since this interval does not contain 0, there is evidence to reject H0 at the 5% level of significance.
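The arithmetic in these notes is easy to verify with a short Python sketch. The numbers are those quoted above; the critical value 2.201 is entered by hand (from a t table for 11 df) rather than computed, and the variable names are chosen only for this illustration.

```python
slope = -2.7555        # estimated slope b from the computer output
se_slope = 0.4668      # standard error of b from the computer output
t_star = 2.201         # 95% two-sided critical value for 11 df (from a t table)

# Test statistic: the estimate divided by its standard error.
t_stat = slope / se_slope

# Confidence interval: estimate plus/minus critical value times standard error.
margin = t_star * se_slope
lower, upper = slope - margin, slope + margin
contains_zero = lower <= 0 <= upper

print(f"t = {t_stat:.2f}")
print(f"95% CI for the slope: ({lower:.3f}, {upper:.3f})")
print("Interval contains 0:", contains_zero)
```

Checking whether 0 lies in the 95% interval is equivalent to the two-sided t test at the 5% level, so the two approaches lead to the same decision.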