THEORY OF STATISTICS.
however, be chosen, for simplicity in classification, so that no
limit corresponds exactly to any recorded value (cf. § 8 below). In
some exceptional cases, moreover, the observations exhibit a marked
clustering round certain values, e.g. tens, or tens and fives. This
is generally the case, for instance, in age returns, owing to the
tendency to state a round number where the true age is unknown.
Under such circumstances, the values round which there is a
marked tendency to cluster should preferably be made mid-values
of intervals, in order to avoid sensible error in the assumption that
the mid-value is approximately representative of the values in the
class. Thus, in the case of ages, since the clustering is chiefly round
tens, “25 and under 35,” “35 and under 45,” etc., the classification
of the English census, is a better grouping than “20 and under
30,” “30 and under 40,” and so on (cf. the Census of England and
Wales, 1911, vol. vii., and also ref. 5, in which a different view is
taken). When there is any probability of a clustering of this kind
occurring, it is as well to subject the raw material to a close
examination before finally fixing the classification.
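
The effect of the choice of limits on clustered material may be illustrated by a short sketch in Python. The ages below are invented solely for illustration and are not census figures, and the function names are hypothetical. The sketch counts the final digits to reveal the clustering, and then compares, for each grouping, the mean of the values in a class with the mid-value of that class.

```python
from collections import Counter

def final_digit_counts(values):
    """Count how often each final digit 0..9 occurs among integer values."""
    counts = Counter(v % 10 for v in values)
    return [counts.get(d, 0) for d in range(10)]

def group(values, start, width):
    """Sort values into classes 'lower and under lower + width'."""
    classes = {}
    for v in values:
        lower = start + ((v - start) // width) * width
        classes.setdefault((lower, lower + width), []).append(v)
    return dict(sorted(classes.items()))

def mean_minus_mid_value(classes):
    """For each class, the mean of its values less the class mid-value."""
    return {
        (lo, hi): sum(vals) / len(vals) - (lo + hi) / 2
        for (lo, hi), vals in classes.items()
    }

# Invented ages with a marked clustering round the tens (illustration only).
ages = [28, 30, 30, 30, 31, 33, 38, 40, 40, 40, 41, 44, 50, 50, 50, 52]

digit_counts = final_digit_counts(ages)

# Tens as mid-values: 25 and under 35, 35 and under 45, ...
census_style = group(ages, start=25, width=10)
# Tens as lower limits: 20 and under 30, 30 and under 40, ...
round_style = group(ages, start=20, width=10)

census_error = mean_minus_mid_value(census_style)
round_error = mean_minus_mid_value(round_style)
```

With these invented figures the class means lie within half a year of the mid-values under the first grouping, but three years or more from them under the second, which is the kind of error the text warns against.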
7. Classification.—The scale of intervals having been fixed, the
observations may be classified. If the number of observations is
not large, it will be sufficient to mark the limits of successive
intervals in a column down the left-hand side of a sheet of paper,
and transfer the entries of the original record to this sheet by
marking a 1 on the line corresponding to any class for each entry
assigned thereto. It saves time in subsequent totalling if each
fifth entry in a class is marked by a diagonal across the preceding
four, or by leaving a space.
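
The tallying process just described might be sketched as follows. This is a minimal illustration under stated assumptions: the limits and values are invented, the limits are placed at half-units so that no recorded value falls on one, and the names `tally` and `tally_marks` are hypothetical.

```python
from bisect import bisect_right

def tally(values, limits):
    """Count the values in each interval; interval k runs from
    limits[k] up to, but not including, limits[k + 1]."""
    counts = [0] * (len(limits) - 1)
    for v in values:
        k = bisect_right(limits, v) - 1
        if not 0 <= k < len(counts):
            raise ValueError(f"{v} lies outside the scale of intervals")
        counts[k] += 1
    return counts

def tally_marks(n):
    """Render a count as strokes, leaving a space after every fifth."""
    fives, rest = divmod(n, 5)
    groups = ["1" * 5] * fives + (["1" * rest] if rest else [])
    return " ".join(groups)

# Invented observations and limits (illustration only).
limits = [9.5, 19.5, 29.5, 39.5]
values = [12, 15, 21, 22, 22, 25, 27, 28, 29, 31, 14, 23]

counts = tally(values, limits)
sheet = [
    f"{limits[k]} and under {limits[k + 1]}: {tally_marks(c)}"
    for k, c in enumerate(counts)
]
```

Each entry of `sheet` corresponds to one line of the tally sheet; the grouping in fives serves only to ease the subsequent totalling.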
The disadvantage in this process is that it offers no facilities for
checking: if a repetition of the classification leads to a different
result, there is no means of tracing the error. If the number of
observations is at all considerable and accuracy is essential, it is
accordingly better to enter the values observed on cards, one to
each observation. These are then dealt out into packs according
to their classes, and the whole work checked by running through
the pack corresponding to each class, and verifying that no cards
have been wrongly sorted.
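
The advantage of the card method, namely that an error can be traced, may be seen in the following sketch. It is an illustration only: the `Card` record, its serial number, and the invented values are assumptions made for the example and form no part of the method as described.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Card:
    serial: int    # hypothetical serial number identifying the observation
    value: float   # the value observed

def class_of(value, limits):
    """Index of the interval 'limits[k] and under limits[k + 1]' holding value."""
    for k in range(len(limits) - 1):
        if limits[k] <= value < limits[k + 1]:
            return k
    raise ValueError(f"{value} lies outside the scale of intervals")

def deal(cards, limits):
    """Deal the cards out into packs, one pack to each class."""
    packs = [[] for _ in range(len(limits) - 1)]
    for card in cards:
        packs[class_of(card.value, limits)].append(card)
    return packs

def wrongly_sorted(packs, limits):
    """Run through every pack and list the cards that do not belong to it."""
    return [
        (k, card)
        for k, pack in enumerate(packs)
        for card in pack
        if class_of(card.value, limits) != k
    ]

# Invented cards and limits (illustration only).
limits = [9.5, 19.5, 29.5, 39.5]
cards = [Card(1, 12), Card(2, 21), Card(3, 25), Card(4, 31)]

packs = deal(cards, limits)
check_before = wrongly_sorted(packs, limits)

# Simulate a slip of the hand: one card is put on the wrong pack.
packs[2].append(packs[1].pop())
check_after = wrongly_sorted(packs, limits)
```

Because each card carries its own value, the check names the offending card and the pack in which it was found, whereas a discrepancy between two tally sheets gives no such indication.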
8. In some cases difficulties may arise in classifying, owing to
the occurrence of observed values corresponding to class-limits.
Thus, in compiling Table I., some districts will have been noted
with death-rates entered in the Registrar-General’s returns as
16.5, 17.5, or 18.5, any one of which might at first sight have
been apparently assigned indifferently to either of two adjacent
classes. In such a case, however, where the original figures for
numbers of deaths and population are available, the difficulty may
be readily surmounted by working out the rate to another place