Simpson's Paradox: A Simple Example

Sanderson M. Smith

Z-SCORES ...WHAT ARE THEY?

OK, good people. I want you to go off to college really understanding what STATISTICS is all about.

We will be doing some incredibly sophisticated things (college level and graduate school level) during the second marking period.

The concept of a z-score (often called a standardized score) is incredibly important.

Any set of numerical data has a mean and a standard deviation.

The mean is a measure of center.

The standard deviation is a measure of spread.

Together they give you some tremendous insight into the nature of the data.

But, hang tight here.... If I told you the mean of a data set was 60 and the standard deviation was 25, would you conclude that the data was pretty well "spread out?" ("Spread out".... using the phrase loosely here.)

Well...maybe! If, for instance, the STATISTICS class took a test and the mean was 60 and the standard deviation was 25, then (if test scores could range from 0 to 100), the distribution of the scores is indeed "spread out."

However, if I have a set of numbers that can range from -1000 to 1000, and that set has the same mean of 60 and standard deviation of 25, then the distribution really isn't "spread out" at all. In fact, it is pretty well "condensed" in relation to the range of the data.

WHAT'S THE POINT? The mean and standard deviation are useful statistics. But, unless you know the range of the data values, you really don't know to what extent the data is distributed over the range of possible values.
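
Here is a rough sketch of that comparison in Python. The helper name spread_fraction is made up for this illustration, and dividing the standard deviation by the width of the possible range is just one informal way to put a number on "spread out"; it is not a standard statistic.

```python
# Hypothetical helper (not a standard statistic): how big is one standard
# deviation compared with the width of the range of possible values?
def spread_fraction(sd, lowest, highest):
    """Return the standard deviation as a fraction of the possible range."""
    return sd / (highest - lowest)

# Same standard deviation (25), two different ranges of possible values.
test_scores = spread_fraction(25, 0, 100)       # test scores from 0 to 100
wide_range = spread_fraction(25, -1000, 1000)   # numbers from -1000 to 1000

print(test_scores)  # 0.25
print(wide_range)   # 0.0125
```

One standard deviation covers a quarter of the possible test scores, but only a little over 1% of the range from -1000 to 1000.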

Now...

Consider any numerical data set, S. The set S has a mean m and a standard deviation s.

You can create another data set S₁ by subtracting the mean ( m ) from each score.

The set S₁ has mean = 0 and standard deviation s.

Now, take this set and divide each number by the standard deviation s.

Now... you have a set S₂ with mean = 0 and standard deviation = 1. Each number in S₂ is the z-score of the corresponding score x in S. In symbols, z = (x - m)/s.
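
If you want to watch this happen, here is a minimal sketch in Python. The five scores are made up for illustration, and the sketch assumes the population standard deviation (pstdev in Python's statistics module); the same idea works with the sample standard deviation as long as you use it consistently.

```python
from statistics import mean, pstdev

# Made-up scores, used only for illustration.
S = [52, 61, 70, 76, 91]

m = mean(S)      # mean of S
s = pstdev(S)    # population standard deviation of S (an assumption here)

S1 = [x - m for x in S]    # subtract the mean from each score
S2 = [x / s for x in S1]   # divide each result by the standard deviation

# S1: mean is 0, standard deviation is still s (up to rounding error).
print(mean(S1), pstdev(S1), s)

# S2: mean is 0, standard deviation is 1 (up to rounding error).
print(mean(S2), pstdev(S2))
```

Because computers store decimals approximately, the printed values may differ from 0 and 1 by a tiny amount.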

In many statistical situations, if a data item is more than 2 standard deviations from the mean, this is considered to be statistically significant. A z-score (standardized score) is very useful because it tells you immediately how many standard deviations a score is below or above the mean. Quality control (which we will study) relies heavily on z-scores. It is to your advantage to know what they represent.
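
As a small sketch of the "more than 2 standard deviations" rule of thumb, the Python below picks out the items in a data set whose z-score is bigger than 2 in absolute value. The measurements are made up, and the population standard deviation is assumed.

```python
from statistics import mean, pstdev

# Made-up measurements; the last one is deliberately far from the rest.
data = [10, 11, 9, 10, 12, 10, 9, 11, 10, 20]

m = mean(data)     # mean of the data
s = pstdev(data)   # population standard deviation (an assumption here)

# Keep each value whose z-score is more than 2 in absolute value.
unusual = [x for x in data if abs((x - m) / s) > 2]

print(unusual)  # [20]
```

Only the value 20 is flagged; every other measurement is within 1 standard deviation of the mean.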

What follows should make sense:

Consider a population of scores with mean 70 and standard deviation 6.

A raw score of 70 would have a z-score of 0.

A raw score of 82 would have a z-score of 2.

A raw score of 61 would have a z-score of -1.5.

A z-score of 1.6 would have a raw score of 70 + 1.6(6) = 79.6.

A z-score of -2.2 would have a raw score of 70 + (-2.2)(6) = 56.8.
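
The conversions above can be checked with a short Python sketch. The function names z_score and raw_score are made up for this illustration; the mean of 70 and standard deviation of 6 come from the example.

```python
MEAN = 70  # population mean from the example
SD = 6     # population standard deviation from the example

def z_score(raw, mean=MEAN, sd=SD):
    """How many standard deviations the raw score is above (+) or below (-) the mean."""
    return (raw - mean) / sd

def raw_score(z, mean=MEAN, sd=SD):
    """The raw score that sits z standard deviations from the mean."""
    return mean + z * sd

# Raw score to z-score.
for raw in (70, 82, 61):
    print(raw, z_score(raw))

# z-score back to raw score, rounded to one decimal place.
for z in (1.6, -2.2):
    print(z, round(raw_score(z), 1))
```

The two functions undo each other: raw_score(z_score(82)) gives back 82.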

MATH POWER TO ALL (and STATISTICAL POWER to those in STATISTICS).

"I gather, young man, that you wish to be a Member of Parliament. The first lesson you must learn is, when I call for statistics about the rate of infant mortality, what I want is proof that fewer babies died when I was Prime Minister than when anyone else was Prime Minister. That is a political statistic."

-WINSTON CHURCHILL
