"Research is the process of going up alleys to see if they are blind."

Marston Bates

12.2 COMPARING TWO PROPORTIONS (Pages 678 - 689)

OVERVIEW: This section, although not lengthy in terms of pages, has lots of "meat" in it. There are two standard error formulas that are commonly used when comparing proportions from two independent samples. Both formulas are listed on the formula sheet that is provided to those who take the Advanced Placement Statistics Examination. On page 679, the authors remind readers that "variances add" when you are talking about differences of independent quantities. This thought is important in understanding the standard error formula listed on page 681. Just a reminder... the "variances add" idea goes back to page 400. Review, if necessary.
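For reference, here is a sketch of the two standard errors in conventional notation. It is written from the standard definitions rather than copied from the textbook or the formula sheet, so the symbols are assumptions: n_1 and n_2 are the sample sizes, x_1 and x_2 are the success counts, and \hat{p} is the pooled proportion.

```latex
% "Variances add" for the difference of two independent sample proportions
\operatorname{Var}(\hat{p}_1 - \hat{p}_2)
  = \operatorname{Var}(\hat{p}_1) + \operatorname{Var}(\hat{p}_2)

% Unpooled standard error (used for the confidence interval in Example 1)
SE = \sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}

% Pooled standard error (used for the test of H_0: p_1 = p_2 in Example 2)
\hat{p} = \frac{x_1 + x_2}{n_1 + n_2}, \qquad
SE_{\text{pooled}} = \sqrt{\hat{p}(1-\hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}
```

The variances are added even though the proportions are subtracted, which is why both formulas have a plus sign under the square root.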

I'll attempt to summarize this important section with two examples:

Example 1 (In which we do not assume that the samples come from populations with equal variances.) We will use the formulas shown on page 681.

Do voters want State Proposition A? Assume that the following represents the results found when examining two random samples from two different towns in the state.

             Sample Size    # wanting Prop. A    Sample Proportion
Sample #1    N1 = 93        74                   p1(hat) = .796
Sample #2    N2 = 87        54                   p2(hat) = .621

SE = sqrt[(.796)(1-.796)/93 + (.621)(1-.621)/87] = 0.06672

Since the sampling distribution of p1(hat) - p2(hat) is approximately normal, the 95% confidence interval for p1 - p2 is

(.796 - .621) plus/minus 1.96(0.06672) = 0.175 plus/minus 0.1308 = [4.42%, 30.58%].

Since this is a 95% confidence interval, about 19 out of 20 confidence intervals constructed in this fashion would contain the true difference in proportions. In other words, it is statistically reasonable to assume that the actual value of the parameter p1 - p2 is between about 4% and 31%. Since this interval does not contain 0%, it is reasonable to conclude that these samples came from populations that differ in their support for Proposition A.
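The Example 1 arithmetic can be checked with a short Python sketch. The function name two_proportion_ci and its parameters are hypothetical, chosen only for illustration; the critical value 1.96 is the one used in the example. Because the sketch works from the raw counts rather than the rounded proportions, its results may differ from the hand calculation in the last digit.

```python
import math


def two_proportion_ci(x1, n1, x2, n2, z_star=1.96):
    """Unpooled confidence interval for p1 - p2 (illustrative helper).

    x1, x2 are success counts; n1, n2 are sample sizes;
    z_star is the normal critical value (1.96 for 95%).
    """
    p1_hat = x1 / n1
    p2_hat = x2 / n2
    # "Variances add": sum the two estimated variances, then take the root.
    se = math.sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
    diff = p1_hat - p2_hat
    margin = z_star * se
    return diff, se, (diff - margin, diff + margin)


# Data from Example 1: 74 of 93 and 54 of 87 voters want Proposition A.
diff, se, (low, high) = two_proportion_ci(74, 93, 54, 87)
print(f"difference = {diff:.4f}, SE = {se:.5f}")
print(f"95% CI: [{low:.2%}, {high:.2%}]")
print("interval excludes 0:", low > 0 or high < 0)
```

Passing counts instead of proportions avoids carrying rounding error into the standard error.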

Example 2 (In which we initially assume that the samples come from populations with equal variances.) We will use the formulas shown on page 684.

Assume we have the following statistical results from a research project.

                 Sample Size    # successes    Sample Proportion
Test group       N1 = 1,865     103            p1(hat) = .0552
Placebo group    N2 = 1,712     79             p2(hat) = .0461

Null hypothesis H0: p1 = p2

Alternate hypothesis Ha: p1 is not equal to p2.

Pooling the data, we obtain p(hat) = (103+79)/(1865+1712) = .0509.

z = (.0552 - .0461)/sqrt[(.0509)(1-.0509)(1/1865 + 1/1712)] = 1.24.

At the 5% level of significance (2-tail), the critical region is z < -1.96 or z > 1.96. Since our calculated z of 1.24 is not in this region, we would not reject H0.
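The pooled test statistic and the critical-region decision can be sketched in Python as follows. The function name pooled_two_proportion_z is hypothetical and is not taken from the textbook. The sketch uses unrounded proportions, so its z may differ in the last digit from the hand calculation, which uses rounded values.

```python
import math


def pooled_two_proportion_z(x1, n1, x2, n2):
    """Pooled z statistic for H0: p1 = p2 (illustrative helper)."""
    p1_hat = x1 / n1
    p2_hat = x2 / n2
    # Under H0 both groups share one proportion, so pool the successes.
    p_pool = (x1 + x2) / (n1 + n2)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    return p_pool, (p1_hat - p2_hat) / se


# Data from Example 2: 103 of 1,865 (test group) and 79 of 1,712 (placebo group).
p_pool, z = pooled_two_proportion_z(103, 1865, 79, 1712)
reject_h0 = abs(z) > 1.96  # two-tailed critical region at the 5% level
print(f"pooled p(hat) = {p_pool:.4f}, z = {z:.2f}, reject H0: {reject_h0}")
```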

Alternate approach: Using the TI-83, normalcdf(1.24,1E99,0,1) = .1074877610. Since this is a 2-tail situation, the P-value is 2(.1074877610), or about 21.5%. There is not strong evidence to reject H0.
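Without a TI-83, the same P-value can be approximated with Python's standard library. This is a minimal sketch: the helper names normal_upper_tail and two_tailed_p_value are hypothetical, and the upper-tail area is computed from the complementary error function math.erfc, which plays the role of normalcdf(z,1E99,0,1).

```python
import math


def normal_upper_tail(z):
    """Area under the standard normal curve to the right of z."""
    return 0.5 * math.erfc(z / math.sqrt(2))


def two_tailed_p_value(z):
    """Two-tailed P-value: double the tail area beyond |z|."""
    return 2 * normal_upper_tail(abs(z))


z = 1.24  # test statistic from Example 2
print(f"upper tail = {normal_upper_tail(z):.4f}")
print(f"two-tailed P-value = {two_tailed_p_value(z):.3f}")
```

Taking the absolute value of z means the function gives the same P-value whichever group is labeled first.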