NOTES ON INFERENCE

This is an e-mail note sent to students in Advanced Placement Statistics.

Hi AP STATers...

This is an attempt to give you a better feel for the power of statistics and to summarize some of the sophisticated topics we have experienced in the last few weeks.

=====================

* You take random samples and examine a statistic from the sample in an attempt to obtain some useful information about the population from which the sample came. Generally, you are attempting to gain some knowledge about an unknown population parameter.

* In a properly conducted survey, you often work with proportions. While you can use sample counts, proportions are perhaps easier to read and interpret. For instance, if 823 voters from a random sample of 2,054 favor Herkimer in an upcoming election, the 823 is a sample count, and one could work with it. However, it is usually easier to discuss and interpret the proportion 823/2054, which is about .40, or 40%.

This situation is modeled by a binomial distribution, but it can be approximated by a normal distribution. We can, for instance, easily determine the approximate probability that a sample of size 2054 would contain more than 830 Herkimer supporters, or fewer than 38% Herkimer supporters. (The first situation involves a count; the second involves a proportion. They are, of course, related. You simply must be consistent in your analysis.)
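
Here is a minimal sketch of the two calculations just described, using only Python's standard library. To compute either probability you need a value for the true population proportion, which the example does not give, so the p = 0.40 below is purely an assumption for illustration. The variable names are also my own.

```python
from math import sqrt
from statistics import NormalDist

n = 2054   # sample size from the Herkimer example
p = 0.40   # ASSUMED true population proportion (hypothetical, not given above)

# Count scale: the number of supporters is approximately normal
# with mean n*p and standard deviation sqrt(n*p*(1-p)).
count_dist = NormalDist(mu=n * p, sigma=sqrt(n * p * (1 - p)))
prob_more_than_830 = 1 - count_dist.cdf(830)

# Proportion scale: the sample proportion is approximately normal
# with mean p and standard deviation sqrt(p*(1-p)/n).
prop_dist = NormalDist(mu=p, sigma=sqrt(p * (1 - p) / n))
prob_less_than_38pct = prop_dist.cdf(0.38)

print("P(more than 830 supporters) is about", round(prob_more_than_830, 3))
print("P(sample proportion below 38%) is about", round(prob_less_than_38pct, 3))
```

Notice that each calculation stays on one scale from start to finish: counts with the count mean and standard deviation, proportions with the proportion mean and standard deviation.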

You can construct a 95% confidence interval which is quite meaningful for a properly obtained sample proportion. This is what is reported (very indirectly) in political polls. If a properly conducted survey has Herkimer favored by 57% of the voters in a sample with a margin of error of 4%, then the 95% CI is 53% to 61%. Interpretation (assuming all the sampling was done properly): The method used to calculate the interval produces an interval containing the true proportion of Herkimer voters for 95% of all possible samples from the population. So it is reasonable to conclude that the population parameter is between 53% and 61%. Note that you are providing statistical information about an unknown population parameter. And, you are not saying anything definite. You have used the statistic 57% to make an inference about a population parameter.
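
As a rough sketch of where a margin of error comes from, the code below builds a 95% confidence interval from a sample count and a sample size using the usual normal approximation. The poll above does not state its sample size, so the example reuses the earlier Herkimer numbers (823 out of 2,054). The function name proportion_ci is my own choice, not a library function.

```python
from math import sqrt
from statistics import NormalDist

def proportion_ci(successes, n, confidence=0.95):
    """Normal-approximation confidence interval for a population proportion."""
    p_hat = successes / n
    # Critical value z*: about 1.96 for 95% confidence.
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z_star * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

# Earlier example: 823 Herkimer supporters in a random sample of 2,054.
low, high = proportion_ci(823, 2054)
print("95% CI is roughly", round(low, 3), "to", round(high, 3))
```

The interval is just the sample proportion plus or minus the margin of error, the same arithmetic as 57% plus or minus 4% giving 53% to 61%.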

* In a somewhat different vein, you have what I like to call quality control situations. Sometimes you examine sample means and make inferences from them. This is done in business, industry, research, etc. In this situation, you have a population with known mean m and standard deviation s. You take a sample of size n, calculate its mean (a statistic), and attempt to determine if you would statistically conclude that it came from the described population. You set up a null hypothesis, H0, and an alternative hypothesis, Ha.

H0: Sample came from population with mean m.

Ha: Sample did not come from population with mean m.

In industry, if you reject H0, this might suggest your equipment needs repair, adjustment, etc. In research, if you want to establish that you have done something that makes a difference, you would hope to reject H0. Note that we are running a test just on means, and making no attempt to account for a possible change in standard deviation or variance.

OK, you have a sample mean, x(bar). We know that this single statistic is one value from a sampling distribution that is approximately normal if H0 is true. Again, if H0 is true, then, by the Central Limit Theorem (for a reasonably large n), x(bar) should be a value from an approximately normal distribution with mean m and standard deviation s/sqrt(n).

We ask "How likely is it that we would get x(bar) if H0 is true?" We calculate a P-value. This is the probability that we would get a statistic at least as extreme as the one we got if H0 is true. If the P-value is small, we might be tempted to reject H0. If we reject H0, then the sample statistic is statistically significant. That is, it is farther from m than we could reasonably expect from chance alone if H0 is true.
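
To make the P-value concrete, here is a small sketch of the calculation for a sample mean. All of the numbers (m = 100, s = 15, n = 36, x(bar) = 105) are made up for illustration, and two_sided_p_value is a name I chose. The test is two-sided because Ha says only that the sample did not come from a population with mean m.

```python
from math import sqrt
from statistics import NormalDist

def two_sided_p_value(x_bar, m, s, n):
    """P-value for H0: the sample came from a population with mean m."""
    standard_error = s / sqrt(n)        # standard deviation of x(bar)
    z = (x_bar - m) / standard_error    # how many standard errors from m
    # Probability of a z at least this far from 0, in either direction.
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Hypothetical numbers, not from any real data set.
p_value = two_sided_p_value(x_bar=105, m=100, s=15, n=36)
print("P-value is about", round(p_value, 4))
```

With these made-up numbers, x(bar) sits two standard errors above m, so the P-value comes out a little under 5%.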

But, what is statistically significant?

Common levels of significance are 5% and 1%. If you test at the 5% level, you are saying that H0 will be rejected if the P-value is less than 5%. That is, if there is less than a 5% probability that you would get a statistic at least as extreme as the one you did if H0 is true, then you will reject H0. In essence, you are saying that there is strong evidence to suggest that this sample did not come from a population with mean m. Of course, even when H0 is true, there is a 5% chance that you will get such a sample and incorrectly reject H0. When you incorrectly reject a null hypothesis, you are making a Type I Error. When you set a level of significance at 5%, you are allowing a 5% chance of making a Type I error when H0 is true.
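
If you want to see the 5% Type I error rate happen, a simulation along these lines may help. It draws many samples from a population where H0 really is true and counts how often the test rejects anyway. The population values (m = 100, s = 15), the sample size, the number of trials, and the function name are all assumptions made for this sketch. The population is also assumed to be normal.

```python
import random
from math import sqrt
from statistics import NormalDist

def type_i_error_rate(m=100, s=15, n=36, alpha=0.05, trials=2000, seed=1):
    """Fraction of samples from a population where H0 is TRUE that get rejected."""
    rng = random.Random(seed)           # fixed seed so the run is repeatable
    standard_error = s / sqrt(n)
    rejections = 0
    for _ in range(trials):
        # Draw a sample from the population described by H0.
        sample = [rng.gauss(m, s) for _ in range(n)]
        x_bar = sum(sample) / n
        z = (x_bar - m) / standard_error
        p_value = 2 * (1 - NormalDist().cdf(abs(z)))
        if p_value < alpha:
            rejections += 1             # H0 is true, so this is a Type I error
    return rejections / trials

rate = type_i_error_rate()
print("Fraction of true-H0 samples rejected:", rate)
```

The fraction printed should land close to 0.05, which is exactly what setting the level of significance at 5% allows.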

OK, STATers... Lots to digest here. But, it will come together if you stick with it.

STATISTICAL POWER AND REASONING IS AWESOME.