THEORY OF STATISTICS.
If we use A to denote “heads” in the first toss, and B “heads” in
the second, we have from the above (A) = 44, (B) = 53. Hence
(A)(B)/N = 23.32, while actually (AB) is 26. Hence
there is a positive association, in the given record, between
the result of the first throw and the result of the second. But it
is fairly certain, from the nature of the case, that such association
cannot indicate any real connection between the results of the
two throws; it must therefore be due merely to such a complex
system of causes, impossible to analyse, as leads, for example, to
differences between small samples drawn from the same material.
The conclusion is confirmed by the fact that, of a number of such
records, some give a positive association (like the above), but
others a negative association.
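The arithmetic of this paragraph, and the claim that repeated records give associations of either sign, can be checked with a small simulation. This is a sketch under stated assumptions, not the author's procedure. N is taken as 100 pairs of throws, which is not stated in this passage but is the value consistent with 44 × 53 / N = 23.32. The coin is assumed fair. The function names are hypothetical.

```python
import random


def independence_value(a, b, n):
    """Return (A)(B)/N, the value (AB) would take if A and B were independent."""
    return a * b / n


def simulate_record(pairs, rng):
    """Simulate one record of `pairs` double throws of a (hypothetically) fair coin.

    A = "heads" on the first throw, B = "heads" on the second.
    Returns the class frequencies (A), (B) and (AB).
    """
    a = b = ab = 0
    for _ in range(pairs):
        first = rng.random() < 0.5
        second = rng.random() < 0.5
        if first:
            a += 1
        if second:
            b += 1
        if first and second:
            ab += 1
    return a, b, ab


def association_signs(records, pairs, seed):
    """Count records whose (AB) lies above, below, or exactly at (A)(B)/N."""
    rng = random.Random(seed)
    signs = {"positive": 0, "negative": 0, "none": 0}
    for _ in range(records):
        a, b, ab = simulate_record(pairs, rng)
        # Compare (AB)*N with (A)(B) in integers to avoid rounding in the division.
        excess = ab * pairs - a * b
        if excess > 0:
            signs["positive"] += 1
        elif excess < 0:
            signs["negative"] += 1
        else:
            signs["none"] += 1
    return signs


# The record discussed above, assuming N = 100 pairs of throws.
A, B, AB, N = 44, 53, 26, 100
expected = independence_value(A, B, N)
print(f"independence-value {expected:.2f}, actual (AB) {AB}")
# prints: independence-value 23.32, actual (AB) 26

signs = association_signs(records=1000, pairs=N, seed=1)
print(signs)
```

With a fair coin the counts of positive and negative records should come out roughly equal; the exact split depends on the seed.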
8. An event due, like the above occurrence of positive associa-
tion, to an extremely complex system of causes of the general
nature of which we are aware, but of the detailed operation of
which we are ignorant, is sometimes said to be due to chance, or
better to the chances or fluctuations of sampling.
A little consideration will suggest that such associations due to
the fluctuations of sampling must be met with in all classes of
statistics. To quote, for instance, from § 1, the two illustrations
there given of independent attributes, we know that in any
actual record we would not be likely to find exactly the same
proportion of abnormally wet seasons in leap years as in ordinary
years, nor exactly the same proportion of male births when the
moon is waxing as when it is waning. But so long as the diver-
gence from independence is not well marked we must regard such
attributes as practically independent, or dependence as at least
unproved.
The discussion of the question, how great the divergence must
be before we can consider it as “well marked,” must be postponed
to the chapters dealing with the theory of sampling. At present
the attention of the student can only be directed to the existence
of the difficulty, and to the serious risk of interpreting a “chance
association” as physically significant.
9. The definition of § 5 suggests that we are to test the
existence or the intensity of association between two attributes
by a comparison of the actual value of (AB) with its independence-value
(as it may be termed) (A)(B)/N. The procedure is from the
theoretical standpoint perhaps the most natural, but it is more
usual, and is simplest and best in practice, to compare proportions,
e.g. the proportion of A’s amongst the B’s with the proportion
amongst the β’s. Such proportions are usually expressed in the
form of percentages or proportions per thousand.
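For the coin-tossing record above, the comparison of proportions can be sketched as follows. The sketch assumes N = 100 (inferred from the arithmetic earlier in the passage, not stated here) and writes β for the absence of B, following the passage's notation. The function name and its `scale` parameter are hypothetical.

```python
def proportions(ab, a, b, n, scale=100):
    """Proportion of A's amongst the B's and amongst the β's (not-B's).

    (Aβ) = (A) - (AB) and (β) = N - (B) follow from the class totals.
    scale=100 gives percentages; scale=1000 gives proportions per thousand.
    """
    a_beta = a - ab  # frequency of A without B
    beta = n - b     # frequency of not-B
    return scale * ab / b, scale * a_beta / beta


# Same record as in the text, assuming N = 100.
among_b, among_beta = proportions(ab=26, a=44, b=53, n=100)
print(f"A's amongst the B's: {among_b:.1f}%")     # 49.1%
print(f"A's amongst the β's: {among_beta:.1f}%")  # 38.3%
```

If A and B were independent the two percentages would coincide. Here A is more frequent amongst the B's than amongst the β's, which is the same positive association found by comparing (AB) with (A)(B)/N.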