346 THEORY OF STATISTICS.
than the other. If two samples be drawn quite independently
from different universes, indefinitely large samples from which
exhibit the standard-deviations $\sigma_1$ and $\sigma_2$, the standard error of
the difference of their means will be given by
$$\epsilon_{12}^{2} = \frac{\sigma_1^{2}}{n_1} + \frac{\sigma_2^{2}}{n_2} \qquad (7)$$
This is, indeed, the formula usually employed for testing the
significance of the difference between two means in any case:
seeing that the standard error of the mean depends on the
standard-deviation only, and not on the mean, of the distribution,
we can inquire whether the two universes from which samples
have been drawn differ in mean apart from any difference in
dispersion.
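
As a minimal sketch of formula (7), the calculation might be coded as below. It assumes that the standard-deviations observed in the two samples are used in place of the values for the universes, which is reasonable only when both samples are large. The function name and the figures are hypothetical, chosen for illustration and not taken from the text.

```python
import math


def se_diff_means(sd1, n1, sd2, n2):
    """Standard error of the difference between two independent sample means.

    sd1, sd2: standard-deviations of the two universes (or large-sample estimates).
    n1, n2:   numbers of observations in the two samples.
    """
    if n1 <= 0 or n2 <= 0:
        raise ValueError("sample sizes must be positive")
    # Squared standard errors of independent means add.
    return math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)


# Hypothetical figures: sd 3.0 with 900 observations, sd 4.0 with 1600.
se = se_diff_means(3.0, 900, 4.0, 1600)
print(round(se, 4))
```

An observed difference between the two means would then be judged by comparing it with this standard error.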
If two quite independent samples be drawn from the same
universe, but instead of comparing the mean of the one with the
mean of the other we compare the mean $m_1$ of the first with the
mean $m_0$ of both samples together, the use of (6) or (7) is not
justified, for errors in the mean of the one sample are correlated
with errors in the mean of the two together. Following precisely
the lines of the similar problem in § 13, Chap. XIII, case III, we
find that this correlation is $\sqrt{n_1/(n_1+n_2)}$, and hence
$$\epsilon_{01}^{2} = \sigma^{2}\,\frac{n_2}{n_1(n_1+n_2)} \qquad (8)$$
(For a complete treatment of this problem in the case of samples
drawn from two different universes cf. ref. 22.)
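
The step from the correlation to (8) is not written out above. A sketch of it follows, assuming that both samples come from one universe with standard-deviation $\sigma$ and that the usual formula for the standard error of a difference between two correlated quantities applies. The symbols $\epsilon_1$ and $\epsilon_0$, for the standard errors of the mean of the first sample and of the mean of both samples together, and $r$ for their correlation, are introduced here for illustration only.

```latex
\begin{align*}
\epsilon_1^{2} &= \frac{\sigma^{2}}{n_1}, \qquad
\epsilon_0^{2} = \frac{\sigma^{2}}{n_1+n_2}, \qquad
r = \sqrt{\frac{n_1}{n_1+n_2}}, \\
\epsilon_{01}^{2} &= \epsilon_1^{2} + \epsilon_0^{2} - 2\,r\,\epsilon_1\,\epsilon_0 \\
&= \frac{\sigma^{2}}{n_1} + \frac{\sigma^{2}}{n_1+n_2} - \frac{2\,\sigma^{2}}{n_1+n_2} \\
&= \sigma^{2}\,\frac{n_2}{n_1\,(n_1+n_2)} .
\end{align*}
```

The product $r\,\epsilon_1\,\epsilon_0$ reduces to $\sigma^{2}/(n_1+n_2)$, which is why the last two terms combine so simply.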
13. The distribution of means of samples drawn under the
conditions of simple sampling will always be more symmetrical
than the distribution of the original record, and the symmetry
will be the greater the greater the number of observations in the
sample. Further, the distribution of means (and therefore also of
the differences between means) tends to become not merely sym-
metrical but normal. We can only illustrate, not prove, the
point here; but if the student will refer to § 13, Chap. XV., he will
see that the genesis of the normal curve in this case is in accord-
ance with what we then stated, viz. that the distribution tends to
be normal whenever the variable may be regarded as the sum
(or some slightly more complex function) of a number of other
variables. In the present instance this condition is strictly
fulfilled. The mean of the sample of n observations is the sum of
the values in the sample each divided by n, and we should expect
the distribution to be the more nearly normal the larger n. As
an illustration of the approach to symmetry even for small values