Sanderson M. Smith

# 2000 AP Statistics Exam

This summary was constructed by Diann C. Resnick of Bellaire High School in Houston, Texas. It is presented here in Herkimer's Hideaway with permission from Diann.
There were some common mistakes that students made during this exam that were not unique to one specific question or subsection.

The students often:

• did not reread their answers for missing words or incomplete sentences.
• abbreviated so many words that the grader did not know what was being written.
• did not answer the questions in the context of the problem. This is always necessary for a complete answer.
• gave long philosophical answers to questions. Students seemed caught up in writing about things that had no bearing on the answer to the question and were not statistical in nature.

In general, whenever students conduct a test of inference, there are specific steps to follow. A complete answer includes all four of the following:

1. State the hypotheses in the context of the problem.

2. Name the test used and why it was used. Check - not just name - the conditions or assumptions for the test used.

3. Carry out the mechanics of the test and give the test statistic, p-value, and degrees of freedom, if applicable.

4. Write a conclusion. A student MUST link the test statistic to the conclusion. This can be similar to: since the p-value is so small (p < alpha = .05), I reject the null hypothesis and conclude that there is a difference in the mean mental skill score of babies who use walkers and babies who do not use walkers.
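
To illustrate steps 1 and 3 with the walker example, one way to write the hypotheses and the test statistic is sketched below. This assumes a two-sample t procedure with a two-sided alternative, which this summary does not specify. The symbols are notation introduced here for illustration: the subscripts W and N stand for babies who use walkers and babies who do not, the two μ's are the population mean mental skill scores, and x̄, s, and n are the corresponding sample means, sample standard deviations, and sample sizes.

```latex
\[
H_0:\ \mu_W = \mu_N \qquad H_a:\ \mu_W \neq \mu_N
\]
\[
t = \frac{\bar{x}_W - \bar{x}_N}{\sqrt{\dfrac{s_W^2}{n_W} + \dfrac{s_N^2}{n_N}}}
\]
```

Note that the hypotheses are stated about the population parameters, not about the sample means.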

The Free Response section does not mean that students have to write everything in paragraph form. Symbolic and algebraic work are permissible. All that is required is that the reader be able to follow the work and explanations.

Question 1

Many students:

• did not give a value or range of values for the pain relief of drug A. [part (a)]
• did not accurately describe the graph. They failed to explain that drug B had practically no effect from 210 mg to 280 mg and then after 280 mg, the drug had a positive linear increase to almost complete relief of pain at 400 mg. Some students indicated that there was a positive correlation between pain relief and dosage, but failed to state that at low dosage levels there was little or no pain relief. [part (b)]
• did not completely explain their reasoning for choice of drug. They indicated that they chose drug A, but not why it was a better choice than drug B. [part (c)]
• did not understand that the expression "comparison" meant to compare and contrast.
• often added to the complexity of the problem by giving information that was not asked for.
• did not understand that when writing in context, numbers and words are both important. Good communication was lacking.

Question 2

Many students:

• did not understand what was meant by the basic assumptions for a confidence interval. Students were not able to identify the target population of interest and then could not determine if inference could be made for this population.
• seemed to think that the population referred to in the problem was the footprints in the cave, not the adults who used the cave. Students often wrote a template statement that the footprints in the cave needed to be randomly selected and that this was justified because it was given in the stem of the problem. They never seemed to make the distinction that the random sample was of footprints in the cave and not a random sample of the adult population who used the cave.
• did not recognize that there were problems in the sample. Many did not make the connection that because the footprints in the cave could have been those of children, or many footprints of the same person, the sample was not appropriate to use for making inference about the adults who used the cave (population).
• did not seem to understand that saying "the data" or the "distribution" was normal was not correct. The assumption that was necessary for the confidence interval was that the population of adults using the cave was normal or approximately normal. They did not understand that the distribution of the sample data is what is used to check the assumption of the normality of the population; it is not the assumption itself.
• used counts as a condition for the confidence interval. Their assumption was that if n < 40, no outliers and only a little skewness were allowed. Again they did not recognize that the sample size and the shape of the sample distribution are not assumptions, but a way to check the underlying normality of the population.
• drew a box plot from the summary statistics of the sample data and then said the data were skewed right. They failed to make a conclusion based on this fact. They never stated whether or not they concluded if the population was normal.
• drew a box plot from the summary statistics of the sample data and then said that the population was not normal. The student failed to explain why they made this conclusion. The linkage from the graph to the conclusion was incomplete or missing.
• could not correctly determine the direction of skewness of a box plot.
• stated as a necessary condition for the confidence interval that the sample (or data or distribution) needed to be normal or approximately normal. They did not seem to understand that the population is what needed to be normal and that the sample is used to test this assumption.
• did not write in context of the problem. They just said normality or random sample. If the student had mentioned the context of the problem, then perhaps they would have been able to make a distinction between sample and population.
• had difficulty in their written communication skills. Their reasoning was difficult to follow.
• listed all the assumptions they ever used or knew (np > 10 and n(1-p) > 10) even though these assumptions had no meaning in this problem.
• thought that the population was not skewed because skewness was not mentioned in the problem, or stated that the population was normal because the problem did not say otherwise.
• used the argument that since the mean is close or equal to the median, the population was normal. They also said that since there were no outliers, the population was normal. This is an incorrect statement. They did not seem to understand that samples from a normally distributed population could contain outliers and have the mean not equal to the median.
• said that since the sample mean, s, and n were given, one could construct a confidence interval. The student did not seem to understand what the question was asking.
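
The point that samples from a normally distributed population can still contain outliers can be checked with a small simulation. The sketch below is illustrative only: the sample size of 30, the number of trials, the seed, and the 1.5 × IQR box plot rule for flagging outliers are all assumptions chosen here, not values taken from the exam.

```python
import random
import statistics


def has_outlier(sample):
    """Return True if any value lies outside the 1.5 * IQR box plot fences."""
    q1, _, q3 = statistics.quantiles(sample, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return any(x < low or x > high for x in sample)


def outlier_rate(n=30, trials=2000, seed=1):
    """Fraction of simulated standard normal samples of size n with at least one flagged outlier."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(0, 1) for _ in range(n)]
        if has_outlier(sample):
            hits += 1
    return hits / trials


print(f"Share of normal samples showing a box plot outlier: {outlier_rate():.2f}")
```

A noticeable share of the simulated samples show at least one outlier even though every sample comes from a normal population, so the absence or presence of outliers in one sample does not settle whether the population is normal.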

Question 3

Many students:

• did not seem to remember the three big features to look for in descriptive statistics: center, spread, and shape. Students seemed to remember center or shape but forgot to describe the spread of the data.
• described the shapes of the graphs but failed to compare the two graphs.
• used words that have statistical meaning, such as correlation, association, and frequency, in non-statistical ways.
• gave parallel answers to the problem, but failed to cross out one of the answers. The worse of the two answers is what was graded.
• wrote extensively on the details of the graphs, but failed to mention trends in the data.
• confused the word symmetric with the word uniform and described a graph as being normal as opposed to approximately normal or normal-like in shape.
• wrote and wrote and wrote. More is not necessarily better. Students did not reread what they wrote.
• failed to provide linkage from the description of the graphs to the conclusions they were supposed to make.
• did not write a scale, labels, or a title on their graphs.
• wrote that most men have a flexibility rating of 6. This is incorrect. What they were describing was the mode, but that is not what they said. Students were not careful with their choice of words.
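
The difference between "the mode is 6" and "most men have a rating of 6" can be seen with a quick count. The ratings below are made up for illustration and are not the exam data.

```python
from collections import Counter


def mode_and_share(values):
    """Return the most common value and the fraction of observations equal to it."""
    counts = Counter(values)
    mode, count = counts.most_common(1)[0]
    return mode, count / len(values)


# Hypothetical flexibility ratings (not from the exam)
ratings = [4, 5, 5, 6, 6, 6, 6, 7, 7, 8, 8, 9]
mode, share = mode_and_share(ratings)
print(f"mode = {mode}, share of observations at the mode = {share:.2f}")
```

Here 6 is the most common rating, yet only a third of the observations equal 6, so "most" would be the wrong word.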

Question 4

Many students in part (a):

• did not name the test or the assumptions they used for the problem.
• did not completely check assumptions for the hypothesis test. A check mark beside an assumption is not sufficient to show that an assumption was checked.
• used the sample mean, x̄, not the population parameter μ, in stating the hypotheses.
• failed to explain what μ1 represented in the problem. The student did not explain their notation.
• did not write on their papers the values they got in working the problem. Depending on the test or method they used, they needed to write down the t/z value, the p-value or critical region rejection value, and the degrees of freedom.
• wrote on their papers that p = 2.098, leaving off the scientific notation shown on their calculator. They did not seem to catch that p cannot be a number greater than 1.
• wrote two solutions to their problem, often in conflict with each other. They never indicated which one was correct, or recognized that the conclusions were not the same.
• failed to write their answer in context of the problem or failed to give linkage for their conclusion.
• wrote the conclusion, based on the p-value, backwards. They wrote that small p meant for them to fail to reject the null hypothesis.
• could not determine whether the hypothesis test was a one- or two-tailed test.
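
Two of the mistakes above, dropping the exponent from a p-value and reading a small p-value backwards, can be guarded against with a simple check. The following is a minimal sketch. The function name `decide`, the significance level of 0.05, and the calculator display "2.098E-4" are assumptions made for illustration; the summary does not say what exponent the students left off.

```python
def decide(p_value, alpha=0.05):
    """Link a p-value to a decision about the null hypothesis."""
    if not 0.0 <= p_value <= 1.0:
        # A probability can never be outside [0, 1]; a value such as 2.098
        # usually means an exponent like E-4 was dropped when copying.
        raise ValueError(f"p-value must be between 0 and 1, got {p_value}")
    if p_value < alpha:
        return "reject H0"  # small p-value: evidence against the null hypothesis
    return "fail to reject H0"  # large p-value: not enough evidence against it


# Hypothetical calculator display, read with its scientific notation intact
p = float("2.098E-4")  # 0.0002098
print(p, decide(p))
```

The decision still has to be followed by a conclusion written in the context of the problem.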

Many students in part (b):

• confused the ideas of regression, correlation, and association. They thought those terms meant the same thing as a "cause and effect" relationship.
• thought that a confounding variable was a reason for not being able to establish "cause and effect". They did not understand that confounding variables always exist in an experiment and that an experiment can control for them.
• tried to give physiological, not statistical, reasons for this answer.

Question 5

Many students in part (a):

• failed to randomize subjects into treatment and control groups.
• were not specific in the description of their treatment group(s).
• did not explain what was being measured in the experiment.
• did not explain that measurements should be taken before and after the experiment and that the mean measurements should then be compared and analyzed.
• said to take an SRS of subjects. They did not seem to know the difference between randomly assigning subjects to treatments and taking an SRS of subjects from a population.
• did not describe a good randomization scheme so that the treatment and control groups would be fairly balanced in size.
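
One randomization scheme that keeps the two groups balanced in size is to shuffle the list of subjects and split it in half. The sketch below assumes two groups and uses made-up subject numbers; the function name `assign_groups` is introduced here for illustration.

```python
import random


def assign_groups(subjects, seed=None):
    """Randomly split subjects into treatment and control groups whose sizes differ by at most one."""
    rng = random.Random(seed)
    shuffled = list(subjects)
    rng.shuffle(shuffled)  # every ordering of the subjects is equally likely
    half = len(shuffled) // 2
    return {"treatment": shuffled[:half], "control": shuffled[half:]}


# Hypothetical subjects numbered 1 to 20
groups = assign_groups(range(1, 21), seed=0)
print(len(groups["treatment"]), len(groups["control"]))
```

This is random assignment of the available subjects to treatments, which is different from taking an SRS of subjects from a population. Flipping a coin for each subject is also random assignment, but it can leave the groups quite unequal in size.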

Many students in part (b):

• did not know what blocking was or did not describe it well. They often mistook blocking to mean:
  o a double blind experiment
  o a control/placebo treatment experiment
  o an extra layer of randomization
• failed to explain why they chose the blocking variable they did.
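
Blocking means grouping subjects by a variable expected to affect the response and then randomizing to treatments separately within each block. A minimal sketch follows, with a made-up blocking variable ("low" and "high") and a hypothetical function name `assign_within_blocks`; the exam's actual blocking variable is not given in this summary.

```python
import random
from collections import defaultdict


def assign_within_blocks(subjects, block_of, seed=None):
    """Randomly assign treatment/control separately inside each block."""
    rng = random.Random(seed)
    blocks = defaultdict(list)
    for subject in subjects:
        blocks[block_of(subject)].append(subject)  # group subjects by block label

    assignment = {}
    for members in blocks.values():
        rng.shuffle(members)  # randomize only within the block
        half = len(members) // 2
        for subject in members[:half]:
            assignment[subject] = "treatment"
        for subject in members[half:]:
            assignment[subject] = "control"
    return assignment


# Hypothetical subjects: (id, block label)
subjects = [(i, "low" if i < 8 else "high") for i in range(16)]
result = assign_within_blocks(subjects, block_of=lambda s: s[1], seed=0)
print(sum(1 for s, g in result.items() if s[1] == "low" and g == "treatment"))
```

Each block contributes equally to both groups, so the blocking variable cannot be the reason the groups differ.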

Many students in part (c):

• gave philosophical, not statistical, reasons for the inability to double blind in this experiment.
• confused double blind with single blind.
• thought that it would be unethical to double blind in this situation. They did not understand the concept of informed consent in drug testing.

Question 6

In part (a) of the problem many students:

• failed to list conditions/assumptions and then substitute numbers to see if the conditions were met.
• had difficulty in correctly interpreting the meaning of a confidence interval. They often made incorrect probability statements such as:
  o The probability that "..." is in this interval is 95%.
  o Ninety-five percent of the time procedures conducted like this will have the "..." in the interval.
• tried to give the meaning of a 95% confidence interval and then made a mistake doing this.
• failed to state the test they used and/or the formula to calculate the confidence interval.
• did not state their conclusion in context of the problem. A template answer was not sufficient.

In part (b) of the problem many students:

• tried to use a hypothesis test to determine the probability. They did not know how to calculate the mean and variance/standard deviation of the difference of two random variables.
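
For the difference D = X - Y of two independent random variables, the means subtract and the variances add. The sketch below applies this under the additional assumption that X and Y are normal. The function name and the numbers in the usage line are hypothetical and are not the values from the exam.

```python
from math import sqrt
from statistics import NormalDist


def prob_x_exceeds_y(mean_x, sd_x, mean_y, sd_y):
    """P(X > Y) for independent normal X and Y, computed from D = X - Y."""
    mean_d = mean_x - mean_y  # mean of the difference
    sd_d = sqrt(sd_x**2 + sd_y**2)  # variances add; standard deviations do not
    return 1 - NormalDist(mean_d, sd_d).cdf(0)  # P(D > 0)


# Hypothetical values: X has mean 70 and SD 3, Y has mean 65 and SD 4
print(round(prob_x_exceeds_y(70, 3, 65, 4), 4))
```

With these made-up numbers D has mean 5 and standard deviation 5, so the probability is about 0.84. No hypothesis test is involved; this is a direct probability calculation.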

In part (c) of the problem many students:

• tried to give reasons that were not "statistical" to determine independence.
• did not try to answer part (c) if they did not get part (b). Many students did get part (c) correct by guessing at an answer in part (b) and then using it in part (c).
• used the point estimate, instead of the confidence interval, to try to determine whether wives' and husbands' heights were independent.

In part (d) of the problem many students:

• did not correctly center the ellipse at (mean x, mean y).
• did not correctly size the ellipse to lie within ±2 or 3 standard deviations of (mean x, mean y).
• did not recognize that the line y = x could be used as the dividing line for when the heights of wives and husbands were equal.