OVERVIEW: We can see relations between two or more categorical variables by setting up tables. Up to this point, we have studied relations in which at least the response variable was quantitative.
A two-way table of counts describes the relationship between two categorical variables: the row variable and the column variable. The row totals and column totals give the marginal distributions of the two variables separately, but do not give any information about the relationships between the variables. Probabilities, including conditional probabilities, can be calculated from two-way tables.
...Prob(X) is the probability that X is true.
...Prob(X|Y) is the probability that X is true, given that Y is true.
Two hundred employees of a company are classified according to the following 2-by-3 table, where A, B, and C are mutually exclusive properties.
           A     B     C   Total
FEMALE    20    40    60     120
MALE      30    10    40      80
Total     50    50   100     200
o What is the probability that a randomly chosen person is female?
Ans. Prob(F) = 120/200 = 60%.
o What is the probability that a randomly chosen person has property A?
Ans. Prob(A) = 50/200 = 25%.
o If a randomly chosen person is female, what is the probability that she has property B?
Ans. Prob(B|F) = 40/120 = 33 1/3% [= Prob(B and F)/Prob(F).]
o If a randomly chosen person has property C, what is the probability that the individual is a male?
Ans. Prob(M|C) = 40/100 = 40% [= Prob(C and M)/Prob(C).]
o If a randomly chosen person has property B or C, what is the probability that the person is a male?
Ans. Prob(M|B or C) = (10 + 40)/(50 + 100) = 50/150 = 33 1/3%.
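The answers above can be checked mechanically. The following is a minimal Python sketch, not part of the original text; the names counts, total, and prob are illustrative assumptions. It stores the table of counts and computes each probability as a ratio of counts, using exact fractions.

```python
from fractions import Fraction

# Counts from the employee table: rows are sex, columns are properties A, B, C.
counts = {
    "F": {"A": 20, "B": 40, "C": 60},
    "M": {"A": 30, "B": 10, "C": 40},
}

ALL_ROWS = ("F", "M")
ALL_COLS = ("A", "B", "C")


def total(rows, cols):
    """Number of employees whose sex is in rows and whose property is in cols."""
    return sum(counts[r][c] for r in rows for c in cols)


def prob(rows=ALL_ROWS, cols=ALL_COLS, given_rows=ALL_ROWS, given_cols=ALL_COLS):
    """Prob(event | condition); event and condition are sets of rows and columns.

    Computed as count(event and condition) / count(condition).
    """
    joint_rows = [r for r in rows if r in given_rows]
    joint_cols = [c for c in cols if c in given_cols]
    return Fraction(total(joint_rows, joint_cols), total(given_rows, given_cols))


p_f = prob(rows=("F",))                                      # 120/200
p_a = prob(cols=("A",))                                      # 50/200
p_b_given_f = prob(cols=("B",), given_rows=("F",))           # 40/120
p_m_given_c = prob(rows=("M",), given_cols=("C",))           # 40/100
p_m_given_b_or_c = prob(rows=("M",), given_cols=("B", "C"))  # 50/150

for name, value in [
    ("Prob(F)", p_f),
    ("Prob(A)", p_a),
    ("Prob(B|F)", p_b_given_f),
    ("Prob(M|C)", p_m_given_c),
    ("Prob(M|B or C)", p_m_given_b_or_c),
]:
    print(f"{name} = {value} = {float(value):.4f}")
```

With no condition supplied, prob divides by the grand total of 200, so the same function covers both the marginal and the conditional cases.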
An example of Simpson's paradox:
Here are the batting averages of two baseball players for both halves of a season.
[Batting average is simply the ratio of hits to times at bat.]
[Table: each player's hits and times at bat for the first and second halves of the season.]
Here are the batting averages for the entire season.
Caldwell: 110/400 = .275
Wilson: 30/105 = .286
Caldwell, despite having a better average than Wilson for both halves of the season, ends up with an overall average that is less than that of Wilson. Using percentages, one can construct numerous examples of Simpson's paradox.
From an algebraic standpoint:
If a/b > c/d and p/q > r/s, then
...it is true that a/b + p/q > c/d + r/s.
...it is not necessarily true that (a+p)/(b+q) > (c+r)/(d+s).
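A small numerical check makes the algebraic point concrete. The values below are hypothetical, chosen only for illustration and not taken from the text. In this Python sketch both individual inequalities hold and so does the inequality for the sums, yet the pooled ratios reverse.

```python
from fractions import Fraction

# Hypothetical values chosen for illustration; they are not from the text.
a, b = 4, 10      # a/b = 0.40
c, d = 35, 100    # c/d = 0.35
p, q = 25, 100    # p/q = 0.25
r, s = 2, 10      # r/s = 0.20

# The two hypotheses: a/b > c/d and p/q > r/s.
first_holds = Fraction(a, b) > Fraction(c, d)
second_holds = Fraction(p, q) > Fraction(r, s)

# Adding the two inequalities always preserves the direction.
sum_holds = Fraction(a, b) + Fraction(p, q) > Fraction(c, d) + Fraction(r, s)

# Pooling numerators and denominators does not have to preserve it.
pooled_left = Fraction(a + p, b + q)    # 29/110
pooled_right = Fraction(c + r, d + s)   # 37/110
pooled_holds = pooled_left > pooled_right

print(first_holds, second_holds, sum_holds, pooled_holds)
```

The reversal arises because the pooled ratio is a weighted average of the two individual ratios, and here the two sides put their weight on different terms: the left side is dominated by its lower ratio p/q (100 of 110 in the denominator), while the right side is dominated by its higher ratio c/d.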