5 THEORY OF STATISTICS.
measurement. Instead of assigning to any observation its true
value $X$, we assign to it the value $X_1$ corresponding to the centre
of the class-interval, thereby making an error $\delta$, where
$$X_1 = X + \delta.$$
To deduce from this equation a formula showing the nature of
the influence of grouping on the standard-deviation we must know
the correlation between the error $\delta$ and $X$ or $X_1$. If the original
distribution were a histogram, $X_1$ and $\delta$ would be uncorrelated,
the mean value of $\delta$ being zero for every value of $X_1$: further, the
square of the standard-deviation of $\delta$ would be $c^2/12$, where $c$ is
the class-interval (Chap. VIII § 12, eqn. (10)). Hence, if $\sigma_1$ be the
standard-deviation of the grouped values $X_1$ and $\sigma$ the
standard-deviation of the true values $X$,
$$\sigma^2 = \sigma_1^2 + \frac{c^2}{12}.$$
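Written out, the step from the error equation to this result is short. The following is only a sketch in modern notation: $\operatorname{Cov}$ and $\sigma_\delta$ are symbols introduced here, not taken from the text, and the error is assumed to enter as $X_1 = X + \delta$.

```latex
\begin{align*}
X &= X_1 - \delta, \\
\sigma^2 &= \sigma_1^2 + \sigma_\delta^2 - 2\operatorname{Cov}(X_1, \delta) \\
         &= \sigma_1^2 + \frac{c^2}{12}
           \qquad \text{when } \operatorname{Cov}(X_1, \delta) = 0
           \text{ and } \sigma_\delta^2 = \frac{c^2}{12}.
\end{align*}
```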
But the true frequency distribution is rarely or never a
histogram, and trial on any frequency distribution approximating
to the symmetrical or slightly asymmetrical forms of fig. 5, p. 89,
or fig. 9 (a), p. 92, shows that grouping tends to increase rather
than reduce the standard-deviation. If we assume, as in § 3, that
the correlation between $\delta$ and $X$, instead of $\delta$ and $X_1$, is appreciably
zero and that the square of the standard-deviation of $\delta$ may be taken as $c^2/12$,
as before (the values of $\delta$ being to a first approximation uniformly
distributed over the class-interval when all the intervals are
considered together), then we have
$$\sigma^2 = \sigma_1^2 - \frac{c^2}{12}. \qquad (4)$$
This is a formula of correction for grouping (Sheppard’s correc-
tion, refs. 1 to 4) that is very frequently used, and that trial
(ref. 1) shows to give very good results for a curve approximating
closely to the form of fig. 5, p. 89. The strict proof of the
formula lies outside the scope of an elementary work : it is based
on two assumptions: (1) that the distribution of frequency is
continuous, (2) that the frequency tapers off gradually to zero
in both directions. The formula would not give accurate results
in the case of such a distribution as that of fig. 9 (b), p. 92, or
fig. 14, p. 97, neither is it applicable at all to the more divergent
forms such as those of figs. 15, et seq.
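The behaviour of the correction can be checked by a small numerical trial. The sketch below is illustrative only and is not taken from the text: the sample size, the class-interval $c = 1$, the placing of the interval boundaries at whole multiples of $c$, and the two test distributions are all assumptions. A normal sample stands in for a curve tapering gradually to zero in both directions, and an exponential sample for one that does not taper at one end. The helper names `group_to_centres` and `compare` are hypothetical.

```python
import math
import random
import statistics


def group_to_centres(values, c):
    """Replace each value by the centre of its class-interval of width c.

    Intervals are taken as [k*c, (k+1)*c); this origin is an assumption.
    """
    return [(math.floor(v / c) + 0.5) * c for v in values]


def compare(values, c):
    """Return the squared standard-deviations (true, grouped, corrected)."""
    grouped = group_to_centres(values, c)
    var_true = statistics.pvariance(values)
    var_grouped = statistics.pvariance(grouped)
    # Correction for grouping: subtract c^2/12 from the grouped value.
    var_corrected = var_grouped - c * c / 12.0
    return var_true, var_grouped, var_corrected


rng = random.Random(1)  # fixed seed so the trial is repeatable
n = 50_000              # assumed sample size
c = 1.0                 # assumed class-interval

# Tapers to zero in both directions.
normal_result = compare([rng.gauss(0.0, 1.0) for _ in range(n)], c)
# Starts abruptly at zero: does not taper at the lower end.
skew_result = compare([rng.expovariate(1.0) for _ in range(n)], c)

for name, (t, g, s) in (("normal", normal_result), ("exponential", skew_result)):
    print(f"{name:12s} true {t:.4f}  grouped {g:.4f}  corrected {s:.4f}")
```

Under these assumptions the corrected value for the normal sample should lie close to the true one. For the exponential sample grouping already lowers the squared standard-deviation, so subtracting $c^2/12$ should carry the estimate further from the truth.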
5. If certain observations be repeated so that we have in every
case two measures $x_1$ and $x_2$ of the same deviation $x$, it is possible
to obtain the true standard-deviation $\sigma_x$, if the further assumption
is legitimate that the errors $\delta_1$ and $\delta_2$ are uncorrelated with each
other. On this assumption
2192