Sanderson M. Smith

SOMETIMES A CONFIDENCE INTERVAL MAKES NO SENSE

It is sometimes useful to show that a statistical computation or construction can make no sense. Below are two samples of size 12. For each sample, we will construct a 95% confidence interval for the population mean. In one case the construction makes sense, and in the other case it does not!

Since the sample sizes are small, the t distribution would be used. Here are some general rules for using a t distribution.

• It is used for small sample sizes (generally less than 30) when the population standard deviation is not known.
• If sample size is less than 15, the t distribution can be used if the data are close to normal.
• If sample size is at least 15, the t distribution can be used except in the presence of outliers or strong skewness.
• If sample size is large (40 or more), the t distribution can be used even for clearly skewed distributions.
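The rules of thumb above can be combined into a small decision function. This is a minimal Python sketch with a hypothetical helper name, `t_interval_reasonable`; it does not judge the data itself. Whether a sample looks close to normal, or has outliers or strong skewness, still has to be decided by looking at the data.

```python
def t_interval_reasonable(n: int, close_to_normal: bool,
                          outliers_or_strong_skew: bool) -> bool:
    """Hypothetical helper encoding the rules of thumb for the t distribution.

    The two boolean arguments are judgments made by inspecting the data
    (for example with a dotplot or boxplot); this function only combines them.
    """
    if n < 15:
        # Small samples: the data should be close to normal.
        return close_to_normal
    if n < 40:
        # Moderate samples: usable except with outliers or strong skewness.
        return not outliers_or_strong_skew
    # Large samples (40 or more): usable even for clearly skewed data.
    # The rule as stated mentions only skewness, so this sketch follows it literally.
    return True


# A sample of 12 that looks roughly normal passes the check.
print(t_interval_reasonable(12, close_to_normal=True, outliers_or_strong_skew=False))
# A sample of 12 that is far from normal and has an outlier does not.
print(t_interval_reasonable(12, close_to_normal=False, outliers_or_strong_skew=True))
```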

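Using these guidelines, a confidence interval construction makes sense for Sample #1, whose values look roughly normal, but not for Sample #2, which is far from normal and contains an extreme outlier (a single 0 among eleven values of 100). However, we will construct 95% confidence intervals for both samples.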

                                  SAMPLE #1          SAMPLE #2
Data values                       98                 0
                                  78                 100
                                  88                 100
                                  77                 100
                                  88                 100
                                  89                 100
                                  82                 100
                                  92                 100
                                  93                 100
                                  88                 100
                                  84                 100
                                  86                 100
N = sample size                   12                 12
Mean                              86.917             91.667
s = sample standard deviation     6.067              28.868
SE = standard error = s/√N        1.751              8.333
Degrees of freedom                11                 11
t value (95% CI)                  2.201              2.201
95% CI: Mean ± t*(SE)             83.062 to 90.772   73.325 to 110.008

Now, assume that I was not aware of the actual sample values, but I knew that [83.062, 90.772] was a 95% confidence interval for a mean calculated from a sample of size 12. Assuming that everything was done properly, and that the conditions for use of the t distribution were met, it would not be unreasonable to think that the mean of the population from which the sample came was, for example, 85. Note that I am not suggesting the mean is 85. I'm simply saying that 85 would not be an unreasonable guess for the mean, whereas a value such as 78, which lies outside the interval, would not be statistically reasonable.

Now consider Sample #2. Assume I was not aware of the actual sample values, but I knew that [73.325, 110.008] was a 95% confidence interval for the mean calculated from a sample of size 12. If I made the (incorrect) assumption that the conditions for use of the t distribution were met, then it would not be unreasonable to think that the mean of the population from which the sample came was 105. However, one need only look at the sample values, none of which exceeds 100, to realize that 105 is not a reasonable estimate of the population mean.
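The intervals for both samples can be recomputed from the listed data values. This is a minimal Python sketch using only the standard library. The critical value 2.201 (11 degrees of freedom, 95% confidence) is copied from the table because the standard library has no t quantile function. The helper names `t_confidence_interval` and `iqr_outliers` are hypothetical, and the 1.5×IQR outlier rule is a common rule of thumb that the text above does not itself use.

```python
import math
import statistics

# Data values for the two samples in the table.
SAMPLE_1 = [98, 78, 88, 77, 88, 89, 82, 92, 93, 88, 84, 86]
SAMPLE_2 = [0] + [100] * 11

# t value for 11 degrees of freedom at 95% confidence, taken from the table.
T_CRIT_DF11 = 2.201


def t_confidence_interval(data, t_crit):
    """Return (mean, s, se, lower, upper) for the interval mean ± t*(SE)."""
    n = len(data)
    mean = statistics.mean(data)
    s = statistics.stdev(data)   # sample standard deviation (divisor n - 1)
    se = s / math.sqrt(n)        # standard error = s / sqrt(N)
    margin = t_crit * se
    return mean, s, se, mean - margin, mean + margin


def iqr_outliers(data):
    """Return values outside the 1.5×IQR fences (a common outlier rule of thumb)."""
    q1, _, q3 = statistics.quantiles(data, n=4)
    step = 1.5 * (q3 - q1)
    return [x for x in data if x < q1 - step or x > q3 + step]


for name, data in [("Sample #1", SAMPLE_1), ("Sample #2", SAMPLE_2)]:
    mean, s, se, lower, upper = t_confidence_interval(data, T_CRIT_DF11)
    print(f"{name}: mean={mean:.3f} s={s:.3f} SE={se:.3f} "
          f"95% CI=[{lower:.3f}, {upper:.3f}] largest value={max(data)} "
          f"IQR outliers={iqr_outliers(data)}")
```

For Sample #2, the upper limit of the interval lands above 100, the largest value in the sample, and the IQR rule flags the lone 0. The arithmetic runs just as smoothly for both samples; only a look at the data reveals that the second interval is meaningless.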

The moral of the story is that meaningful use of statistical methods often depends on certain conditions being met. If these conditions are not met, then statistical computations, while they can be done, can be very misleading or inaccurate.

"He uses statistics like a drunken man uses a lamppost... for support rather than illumination."

ANDREW LANG