Sanderson M. Smith

Example of Problem Involving OUTLIERS
(Problem #1 on 2001 AP Statistics Examination)

This is an analysis of Problem #1 in Section II on the 2001 Advanced Placement Statistics Examination.

The College Board does not allow a reproduction of the problem statement here. If you do not have a statement of the actual problem in front of you, you can get it from the College Board site.

=============================================

Comment: An outlier is an unusually small or large data value. There are a number of "reasonable" ways to define an outlier. Two acceptable solutions to this problem are demonstrated below. The abbreviation IQR stands for "interquartile range."

Solution #1:

(a) An observation is an outlier if it is more than 1.5 IQRs above Q3 or more than 1.5 IQRs below Q1. In the data provided, 1.5(IQR) = 1.5(19.250 - 9.680) = 1.5(9.570) = 14.355. An observation is an outlier if it is not in the interval (9.680 - 14.355, 19.250 + 14.355) = (-4.675, 33.605).

(b) Yes, there is at least one outlier. The data value 38.180 is not in the interval above, so it is an outlier. There may be other outliers on the high side, but these can't be determined from the information provided.

(c) The fact that the word "only" is underlined suggests that the news media considers 10 inches of rainfall to be unusual. However, we can note that 10 inches is not an outlier, and, in fact, falls just above Q1, the 25th percentile. We can also note that 10 is within one standard deviation of the mean, again suggesting that it is not an unusual value.

Solution #2:

(a) An outlier is an observation that is more than 3 standard deviations from the mean. Using the data provided, an outlier would be any observation that is not in the interval (14.941 - 3(6.747), 14.941 + 3(6.747)) = (-5.300, 35.182).

(b) Yes, there is at least one outlier. The data value 38.180 is not in the interval above, so it is an outlier. There may be other outliers on the high side, but these can't be determined from the information provided.

(c) The fact that the word "only" is underlined suggests that the news media considers 10 inches of rainfall to be unusual. However, we can note that 10 inches is not an outlier, and, in fact, falls just above Q1, the 25th percentile. We can also note that 10 is within one standard deviation of the mean, again suggesting that it is not an unusual value.
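The arithmetic in both solutions can be checked with a short Python sketch. The summary statistics are the ones quoted in the solutions; the function and variable names are illustrative only and are not part of the problem.

```python
# Summary statistics quoted in the solutions above.
Q1, Q3 = 9.680, 19.250
MEAN, SD = 14.941, 6.747
MAX_VALUE = 38.180


def iqr_fences(q1, q3, k=1.5):
    """Return (lower, upper) fences: k IQRs below Q1 and above Q3."""
    step = k * (q3 - q1)
    return q1 - step, q3 + step


def sd_interval(mean, sd, k=3):
    """Return (lower, upper) bounds: k standard deviations from the mean."""
    return mean - k * sd, mean + k * sd


def is_outlier(x, bounds):
    """A value is an outlier if it lies strictly beyond either bound."""
    lower, upper = bounds
    return x < lower or x > upper


for label, bounds in [("1.5 IQR rule", iqr_fences(Q1, Q3)),
                      ("3 SD rule", sd_interval(MEAN, SD))]:
    print(f"{label}: ({bounds[0]:.3f}, {bounds[1]:.3f})")
    print("  38.180 is an outlier:", is_outlier(MAX_VALUE, bounds))
    print("  10 is an outlier:", is_outlier(10, bounds))
```

Under either rule, 38.180 is flagged and 10 is not, which matches parts (b) and (c) of both solutions.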

Additional notes relating to this problem:

There are other "reasonable" ways to define an outlier. For instance, one might consider an outlier to be any observation that is more than two standard deviations from the mean.
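As a quick check of how the two-standard-deviation definition would play out with the mean and standard deviation quoted in Solution #2, here is a minimal Python sketch. The variable names are illustrative only.

```python
# Mean and standard deviation quoted in Solution #2.
MEAN, SD = 14.941, 6.747

lower, upper = MEAN - 2 * SD, MEAN + 2 * SD
print(f"2 SD interval: ({lower:.3f}, {upper:.3f})")

for value in (38.180, 10):
    flagged = value < lower or value > upper
    print(f"{value} is an outlier under the 2 SD rule: {flagged}")
```

With these values the interval works out to (1.447, 28.435), so 38.180 is still an outlier and 10 inches still is not.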

There are outlier definitions that would fall in the "unreasonable" category. For instance, if an outlier is defined to be any value that is more than one IQR above the MAX, or more than one IQR below the MIN, then no data set could have an outlier, because no observation can lie above the MAX or below the MIN of its own data set.
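The following Python sketch illustrates why this definition can never flag anything. The data set is made up for the illustration and has nothing to do with the exam problem; the quartiles come from the standard library's statistics.quantiles, which is only one of several quartile conventions.

```python
import statistics


def unreasonable_outliers(data):
    """Flag values more than one IQR beyond the data's own MIN or MAX."""
    q1, _, q3 = statistics.quantiles(data, n=4)
    iqr = q3 - q1
    low_cut = min(data) - iqr
    high_cut = max(data) + iqr
    return [x for x in data if x < low_cut or x > high_cut]


# Hypothetical data with one value far from the rest.
sample = [1, 2, 3, 4, 1000]
print(unreasonable_outliers(sample))
```

Even the value 1000 is not flagged, since it is the MAX and the cutoff sits at or above the MAX by construction.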

Finally, while Solution #2 is in the "reasonable" category, beware of responding to (c) by saying something like "about 95% of data values will be within 2 standard deviations of the mean," or "about 99.7% of data values will be within 3 standard deviations of the mean." Doing so assumes that the data provided have a normal distribution, which is not stated in the premise. In fact, a boxplot constructed from the data would suggest that the distribution is skewed to the right.
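One rough way to see the right skew from the summary statistics alone is to compare the two tails of the boxplot. The Python sketch below uses only the values quoted in the solutions and assumes that a rainfall total cannot be negative, so the MIN is at least 0; that lower bound is an assumption made here, not a figure from the problem.

```python
# Summary statistics quoted in the solutions above.
Q1, Q3, MAX_VALUE = 9.680, 19.250, 38.180

# Assumption: a rainfall total cannot be negative, so MIN >= 0.
MIN_LOWER_BOUND = 0.0

upper_tail = MAX_VALUE - Q3                # distance from Q3 to MAX
lower_tail_at_most = Q1 - MIN_LOWER_BOUND  # largest possible Q1 - MIN

print(f"Q3 to MAX: {upper_tail:.3f}")
print(f"Q1 to MIN: at most {lower_tail_at_most:.3f}")
print("Upper tail longer than lower tail:", upper_tail > lower_tail_at_most)
```

The distance from Q3 to the MAX is 18.930, while the distance from Q1 to the MIN can be no more than 9.680, so the upper tail is roughly twice as long as the lower tail could possibly be.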