Sanderson M. Smith

SAMPLE SIZE NEEDED FOR
SPECIFIED MARGIN OF ERROR

The general formula for a confidence interval is

$\text{estimate} \pm (\text{critical value}) \times (\text{standard deviation of the estimate})$

A 95% confidence interval for proportions has the form

$\hat{p} \pm 1.96\sqrt{\hat{p}(1-\hat{p})/N}$

where N is the sample size and $\hat{p}$ is the sample proportion.

Since 1.96 is approximately 2, we will use 2 in what follows to simplify computations.

If the population proportion parameter is p, the margin of error, m, for a 95% confidence interval can be calculated using the formula

$m = 2\sqrt{p(1-p)/N}$

When sampling, p is replaced by $\hat{p}$, the sample proportion, to compute m.
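The margin-of-error formula can be sketched in a few lines of Python. The function name `margin_of_error` and the sample figures (40% of 1,000 respondents) are illustrative assumptions, not part of the original discussion; the default critical value of 2 follows the rounding of 1.96 used here.

```python
import math

def margin_of_error(p, n, z=2.0):
    """Margin of error m = z * sqrt(p(1-p)/n) for a proportion p.

    z is the critical value; the text rounds 1.96 up to 2.
    """
    return z * math.sqrt(p * (1 - p) / n)

# Hypothetical sample: 40% of 1,000 respondents answer "yes".
m = margin_of_error(0.40, 1000)
print(round(m, 4))  # 0.031
```

Passing `z=1.96` gives the slightly smaller, unrounded 95% margin.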

What sample size is needed if one wants a specific margin of error?

Solving the above equation for N yields $m^2/4 = p(1-p)/N$, so $N = 4p(1-p)/m^2$.

Yikes! We face a "Catch-22" situation. We want N, and we know m, but we don't know a value for p, and we can't estimate one until we actually take a sample.

We get around this dilemma by finding the value of p that will maximize N, which gives a sample size large enough for any p. Since 4 and $m^2$ are known constants, we need only maximize $y = p(1-p) = p - p^2$. This is simply a parabola that opens downward, so we need only find the vertex. We can take a derivative and note that $dy/dp = 1 - 2p$, which has a value of 0 when p = 1/2. In other words, looking at the equation

$N = 4p(1-p)/m^2$

we will get the largest possible value of N when we substitute p = 1/2. Note that if the substitution is made, we get $N = 4(1/2)(1/2)/m^2 = 1/m^2$, a very simple formula. In other words, if we want a 95% confidence interval and know the margin of error m, we can determine the sample size needed for the specified m. For instance, if we want a margin of error of 2%, then the sample size required is $1/(0.02)^2 = 2{,}500$.
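As a rough check on this reasoning, the sketch below computes the required N for several values of p at a fixed margin of error. The function name `sample_size` and the list of trial p values are assumptions made for illustration; rounding up to a whole respondent is also a choice made here, not something stated in the text.

```python
import math

def sample_size(m, p=0.5, z=2.0):
    """Smallest whole-number N with z * sqrt(p(1-p)/N) <= m.

    The default p = 0.5 is the worst case, giving N = 1/m^2 when z = 2.
    """
    n = z**2 * p * (1 - p) / m**2
    # Round away tiny floating-point noise before rounding up.
    return math.ceil(round(n, 6))

# Required N for a 2% margin of error at several trial values of p.
for p in (0.1, 0.3, 0.5, 0.7, 0.9):
    print(p, sample_size(0.02, p))

print(sample_size(0.02))  # 2500
```

Among the trial values, p = 0.5 produces the largest N, and the values for p and 1 - p match because p(1-p) is symmetric about 1/2.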

The box below shows a published survey, conducted some years ago, related to the Persian Gulf War.

Would you support or oppose U.S. forces resuming action to force Saddam from power?

54% Support
37% Oppose

For this Newsweek Poll, the Gallup Organization interviewed a national sample of 751 adults by telephone April 4-5. The margin of error is plus or minus 4 percentage points. Some "Don't Know" and other responses not shown.

Let's do some computations:

If we were to compute the margin of error using 54%, we would get $2\sqrt{(0.54)(0.46)/751} = 0.0363736$. Rounding "out" (that is, up) to the nearest integer percent, we would get the 4% stated in the survey results. If one calculates the margin of error using 37%, one obtains $2\sqrt{(0.37)(0.63)/751} = 0.0352356$. Again, if we round "out," we get 4%.
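These two computations can be reproduced with a short script. The variable names are illustrative, and "rounding out" is interpreted here as rounding up to the next whole percentage point, which is an assumption about the intended meaning.

```python
import math

N = 751  # sample size reported in the poll

# Margin of error for each reported share, using the critical value 2.
margins = {}
for share in (0.54, 0.37):
    margins[share] = 2 * math.sqrt(share * (1 - share) / N)

for share, m in margins.items():
    # "Rounding out" is taken to mean rounding up to a whole percent.
    print(f"{share:.0%}: m = {m:.7f}, rounded out = {math.ceil(m * 100)}%")
```

Both shares give a margin between 3% and 4%, so both round out to the 4 percentage points quoted in the poll.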

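If we wanted a margin of error of 4%, the sample size needed would be $1/(0.04)^2 = 625$. A margin of error of 3% would require a sample size of $1/(0.03)^2 \approx 1{,}111$. The survey's sample of 751 falls between these two values, so its margin of error lies between 3% and 4%. What is reported in the survey "jibes" with these calculations.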

While published surveys such as the one above do not generally mention a 95% confidence interval, the reported margin of error does relate to such an interval, as has been demonstrated. Using the information provided in the survey above, the 95% confidence interval for those who support using action to remove Saddam from power is [50%, 58%]. The corresponding 95% confidence interval for those who oppose is [33%, 41%].
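The intervals above are simply the reported percentage plus or minus the published margin of error. A minimal sketch, working in whole percentage points (the function name `confidence_interval` is a hypothetical helper, not from the survey):

```python
def confidence_interval(share_pct, margin_pct):
    """Return (low, high) in percentage points: share plus or minus margin."""
    return (share_pct - margin_pct, share_pct + margin_pct)

print(confidence_interval(54, 4))  # (50, 58)  support
print(confidence_interval(37, 4))  # (33, 41)  oppose
```

Because the two intervals do not overlap, the gap between support and opposition is larger than the reported margin of error can account for.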

"Numbers rule the universe."

-PYTHAGORAS (around 550 B.C.)