The binomial distributions

"I abhor averages. I like the individual case. A man may have six meals one day and none the next, making an average of three meals per day, but that is not a good way to live."

Louis D. Brandeis, A Free Man's Life

12.1 INFERENCE FOR A POPULATION PROPORTION (Pages 658 - 674)
OVERVIEW: This section deals with tests and confidence intervals for a population proportion p, based on the sample proportion p(hat). When the sample size is large, the distribution of p(hat) is approximately normal with mean p and standard deviation sqrt(p(1-p)/n). The inference procedures are reasonably accurate if the population is at least ten times larger than the sample, and if the sample size is such that np and n(1-p) are both at least 10.
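The two rules of thumb in the overview are easy to check mechanically. The following is a minimal Python sketch, not part of the textbook; the function name conditions_met and the sample numbers are made up for illustration. In practice p is unknown, so one would plug in p(hat) (for intervals) or the hypothesized p₀ (for tests).

```python
def conditions_met(n, p, population_size=None):
    """Check the rules of thumb for normal-approximation inference on a proportion."""
    enough_successes = n * p >= 10        # np is at least 10
    enough_failures = n * (1 - p) >= 10   # n(1-p) is at least 10
    # The population size is often unknown; skip that check when it is not supplied.
    population_ok = population_size is None or population_size >= 10 * n
    return enough_successes and enough_failures and population_ok


# Hypothetical numbers, chosen only to exercise each condition.
print(conditions_met(1000, 0.5, population_size=50000))  # all conditions hold
print(conditions_met(20, 0.05))                          # np = 1, too few successes
print(conditions_met(1000, 0.5, population_size=5000))   # population too small
```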
A confidence interval for a population proportion p, based on the sample proportion p(hat), has the form
p(hat) plus/minus z* sqrt[p(hat)(1-p(hat))/n],
where z* is the critical value for the desired confidence level.
The expression SE = sqrt[p(hat)(1-p(hat))/n] is called the standard error of p(hat).
Tests of a null hypothesis H₀: p = p₀ are based on the statistic
z = (p(hat) - p₀)/sqrt((p₀(1-p₀)/n))
Example:
A random sample of 867 registered voters found that 63% favored Proposition A. Calculate a 95% confidence interval for the proportion of voters in the population who favor Proposition A.
Response: The desired confidence interval is
.63 plus/minus 1.96 sqrt(.63(1-.63)/867) = .63 plus/minus .0321 = [.5979, .6621].
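The interval above can be reproduced with a short Python sketch. This is only an illustration using the standard library; the function name proportion_confidence_interval is an assumption of this sketch, and the critical value comes from statistics.NormalDist rather than a table, so it is 1.95996... instead of the rounded 1.96.

```python
import math
from statistics import NormalDist


def proportion_confidence_interval(p_hat, n, confidence=0.95):
    """Large-sample z interval for a proportion: returns (lower, upper, margin)."""
    # Critical value z* with area `confidence` between -z* and +z*.
    z_star = NormalDist().inv_cdf(0.5 + confidence / 2)
    standard_error = math.sqrt(p_hat * (1 - p_hat) / n)
    margin = z_star * standard_error
    return p_hat - margin, p_hat + margin, margin


lower, upper, margin = proportion_confidence_interval(0.63, 867)
print(f"margin = {margin:.4f}, interval = [{lower:.4f}, {upper:.4f}]")
```

Up to rounding, the printed values should match the hand calculation for Proposition A.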
We see this type of thing (indirectly) quite frequently in newspapers and magazines when they report the results of surveys. For instance, a survey might report that 56% of a sample of 1,446 voters support candidate Herkimer, with a margin of error of 3%. Those who are statistically literate know that the 95% confidence interval is [53%, 59%]. The margin of error reported is approximately 2 standard errors of p(hat). In this example, one standard error is sqrt(.56(.44)/1446) = .013, and 2(.013) = .026, which is rounded up to 3%.
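The arithmetic behind the reported margin of error can be sketched in a few lines of Python. This assumes, as the text does, that the pollster uses 2 standard errors and rounds up to a whole percent; actual news organizations may follow a different convention.

```python
import math

p_hat, n = 0.56, 1446  # the hypothetical Herkimer survey from the text

standard_error = math.sqrt(p_hat * (1 - p_hat) / n)
margin = 2 * standard_error                     # "2 standard errors" rule
reported_percent = math.ceil(margin * 100)      # rounded up to a whole percent

print(f"SE = {standard_error:.3f}, 2 SE = {margin:.3f}, reported = {reported_percent}%")
```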
Example:
A random sample of 1700 voters from a large population found that 1250 favored Proposition B. Test the hypothesis that at most 70% of the voters in the population favor Proposition B.
Analysis:
Null hypothesis H₀: p = .7
Alt. hypothesis H_a: p > .7
Sample proportion p(hat) = 1250/1700 = .7353
z = (.7353-.7)/sqrt(.7(.3)/1700) = 3.18
P-value = normalcdf(3.18,1E99,0,1) = .0007 = .07%
Conclusion: Reject H₀. If the population proportion were 70% or less, a sample proportion this large would be highly unlikely, so the data give strong evidence that more than 70% of the voters favor Proposition B.
Another approach:
The standard error, SE = sqrt(.7(.3)/1700) = .011114
P-value = normalcdf(.7353,1E99,.70,.011114) = .0007, which agrees with the calculations above.
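For readers without a TI calculator, both approaches can be sketched in Python with the standard library. Here statistics.NormalDist plays the role of normalcdf; the function name upper_tailed_proportion_test is an assumption of this sketch. Because the code keeps full precision instead of rounding z to 3.18, the P-value differs slightly in later decimal places but still rounds to .0007.

```python
import math
from statistics import NormalDist


def upper_tailed_proportion_test(successes, n, p0):
    """z test of H0: p = p0 against Ha: p > p0; returns (p_hat, se, z, p_value)."""
    p_hat = successes / n
    standard_error = math.sqrt(p0 * (1 - p0) / n)  # uses p0, not p_hat
    z = (p_hat - p0) / standard_error
    p_value = 1 - NormalDist().cdf(z)              # area to the right of z
    return p_hat, standard_error, z, p_value


p_hat, standard_error, z, p_value = upper_tailed_proportion_test(1250, 1700, 0.70)

# "Another approach": area to the right of p_hat under Normal(p0, SE).
p_value_direct = 1 - NormalDist(mu=0.70, sigma=standard_error).cdf(p_hat)

print(f"p_hat = {p_hat:.4f}, SE = {standard_error:.6f}, z = {z:.2f}")
print(f"P-value = {p_value:.4f} (standardized), {p_value_direct:.4f} (direct)")
```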
There is a subsection of this section called Choosing the Sample Size, which begins on page 669. This subsection is well written and easy to follow. I'm going to attempt to put a real-life perspective on this. Here's a hypothetical (but realistic) situation.
Suppose you are heading a scientific polling organization. Candidate Herkimer wants you to poll a group of voters to determine his popularity, and he would like a margin of error of 2%. The question now becomes: how large a sample is needed to get a 2% margin of error?
Sounds simple enough. We are interested in a 95% confidence interval, which involves the z-value 1.96. For simplicity, let's just use 2 instead of 1.96. We need only solve the equation
.02 = 2 sqrt(p(1-p)/N)
Easily done: We get (.02)² = 4p(1-p)/N ==> N = 4p(1-p)/(.02)². But we have a Catch-22 situation. We don't have a value of p, and we can't get an approximation for p before we take a sample. But it is the sample size we are trying to determine, etc., etc.
Note that, as a function of p, the equation N = 4p(1-p)/(.02)² represents a parabola that opens downward. If you graph this downward-opening parabola, you will find that N attains its maximum value when p = 0.5. Substituting p = 0.5 into the equation, we get N = 4(.5)(1-.5)/(.02)² = 1/(.02)² = 2,500. Hence, to get the margin of error requested by candidate Herkimer, we would need an SRS of at least 2,500 voters.
In general, if we are interested in a specific margin of error = m_e for a 95% confidence interval, a sample of size 1/(m_e)², rounded up to a whole number, is large enough whatever the value of p turns out to be.
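The worst-case (p = 0.5) sample size can be wrapped in a small Python helper. This is a sketch under the text's simplification z* = 2; the function name conservative_sample_size and the small tolerance used to absorb floating-point noise before rounding up are choices made for this illustration.

```python
import math


def conservative_sample_size(margin, z_star=2.0):
    """Smallest n with z* sqrt(p(1-p)/n) <= margin in the worst case p = 0.5."""
    # At p = 0.5, p(1-p) = 1/4, so n = (z* / (2 * margin))**2.
    raw = (z_star / (2 * margin)) ** 2
    # Subtract a tiny tolerance so values like 2500.0000000001 do not round up to 2501.
    return math.ceil(raw - 1e-9)


print(conservative_sample_size(0.02))               # z* = 2, as in the text
print(conservative_sample_size(0.02, z_star=1.96))  # the unrounded 95% critical value
```

With z* = 2 the helper reduces to 1/(m_e)²; passing z_star=1.96 shows how much smaller the sample can be when the critical value is not rounded.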
Let's revisit the problem: Suppose candidate Herkimer wants a margin of error of 1.5% or less. Basically, he wants a 95% confidence interval that is 3 percentage points wide. The sample size needed is 1/(.015)² = 4444.4, which rounds up to 4445. Let's now assume that a sample of this size shows that 43% of the voters sampled would vote for Herkimer. This translates to a 95% confidence interval [41.5%, 44.5%]. Putting the proper interpretation on this interval, and assuming that Herkimer is involved in a 2-person political race, the polls indicate that he needs to do something to attract voters. Why? Simply because the entire interval lies below 50%, so the statistics suggest he would lose the election if it were held today.
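As a quick check of the Herkimer scenario, the following Python sketch recomputes the interval from the sample itself, again using 2 standard errors. The actual margin comes out slightly under 1.5% because p(hat) = .43 is not the worst case p = .5.

```python
import math

n, p_hat = 4445, 0.43  # sample size and sample proportion from the scenario

margin = 2 * math.sqrt(p_hat * (1 - p_hat) / n)
lower, upper = p_hat - margin, p_hat + margin

print(f"margin = {margin:.4f}, interval = [{lower:.1%}, {upper:.1%}]")
print("entire interval below 50%:", upper < 0.5)
```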
