IX.— CORRELATION. 177
Hence the sons of fathers of deviation $x$ from the mean of all fathers
have an average deviation of only 0·52$x$ from the mean of all sons;
i.e. they step back or “regress” towards the general mean, and 0·52
may be termed the “ratio of regression.” In general, however,
the idea of a “stepping back” or “regression” towards a more
or less stationary mean is quite inapplicable—obviously so where
the variables are different in kind, as in Tables V. and VI.—
and the term “coefficient of regression” should be regarded simply
as a convenient name for the coefficients $b_1$ and $b_2$. RR and CC
are generally termed the “lines of regression,” and equations (6)
the “regression equations.” The expressions “characteristic lines,”
“characteristic equations” (Yule, ref. 8) would perhaps be better.
Where the actual means of arrays appear to be given, to a satis-
factory degree of approximation, by straight lines, we may say
that the regression is linear. It is not safe, however, to assume
that such linearity extends beyond the limits of observation.
14. The two standard deviations
$s_x = \sigma_x \sqrt{1 - r^2}, \qquad s_y = \sigma_y \sqrt{1 - r^2}$
are of considerable importance. It follows from (7) that $s_x$ is the
standard deviation of $(x - b_1 y)$, and similarly $s_y$ is the standard
deviation of $(y - b_2 x)$. Hence we may regard $s_x$ and $s_y$ as the
standard errors (root mean square errors) made in estimating $x$
from $y$ and $y$ from $x$ by the respective characteristic relations
$x = b_1 y, \qquad y = b_2 x.$
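The step from the regression coefficient to the standard error can be written out in full. This is a sketch which assumes that $x$ and $y$ are measured as deviations from their respective means, that the mean product is $p = \Sigma(xy)/N = r\sigma_x\sigma_y$, and that $b_1 = r\sigma_x/\sigma_y$. These definitions are taken to be those of the earlier equations and are not restated on this page; $N$, the number of observations, is notation introduced here for illustration.

```latex
\begin{aligned}
\frac{1}{N}\,\Sigma (x - b_1 y)^2
  &= \sigma_x^2 - 2 b_1 p + b_1^2 \sigma_y^2 \\
  &= \sigma_x^2 - 2 r^2 \sigma_x^2 + r^2 \sigma_x^2 \\
  &= \sigma_x^2 \,(1 - r^2) = s_x^2 .
\end{aligned}
```

Since the mean of $(x - b_1 y)$ is zero under these assumptions, the root mean square value is also its standard deviation. The corresponding result for $s_y$ follows by interchanging $x$ and $y$.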
$s_x$ may also be regarded as a kind of average standard deviation of
a row about RR, and $s_y$ as an average standard deviation of a
column about CC. In an ideal case, where the regression is
truly linear and the standard deviations of all parallel arrays are
equal, a case to which the distribution of Table III. is a rough
approximation, $s_x$ is the standard deviation of the $x$-array and $s_y$
the standard deviation of the $y$-array (cf. Chap. X. § 19 (3)).
Hence $s_x$ and $s_y$ are sometimes termed the “standard deviations
of arrays.”
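The sense in which $s_x$ is an "average" standard deviation of the rows can be checked numerically. The following is a minimal sketch in Python, with wholly hypothetical data (not taken from any table in the text): each distinct value of y is treated as one row, the mean square deviation of x about the line x = b1·y is found for every row, and these are pooled with the row frequencies as weights. The names `pairs`, `rows` and `pooled` are illustrative only.

```python
import math
from collections import defaultdict

# Hypothetical (x, y) observations; y takes only three values,
# so each value of y defines one "row" (an array of x's).
pairs = [(1, 1), (2, 1), (3, 1),
         (2, 2), (3, 2), (4, 2), (5, 2),
         (4, 3), (5, 3), (6, 3)]

n = len(pairs)
mean_x = sum(x for x, _ in pairs) / n
mean_y = sum(y for _, y in pairs) / n

# Squared standard deviations and the mean product p.
var_x = sum((x - mean_x) ** 2 for x, _ in pairs) / n
var_y = sum((y - mean_y) ** 2 for _, y in pairs) / n
p = sum((x - mean_x) * (y - mean_y) for x, y in pairs) / n

r = p / math.sqrt(var_x * var_y)
b1 = p / var_y                      # same as r * sigma_x / sigma_y
s_x = math.sqrt(var_x * (1 - r * r))

# Deviations of x from the line x = b1 * y, collected row by row.
rows = defaultdict(list)
for x, y in pairs:
    rows[y].append((x - mean_x) - b1 * (y - mean_y))

# Mean square deviation about the line within each row.
row_ms = {y: sum(d * d for d in devs) / len(devs) for y, devs in rows.items()}

# Pool the rows, weighting each by its frequency, and take the root.
pooled = math.sqrt(sum(len(rows[y]) * row_ms[y] for y in rows) / n)

print(s_x, pooled)
```

The two printed values should agree to within rounding, which is simply the statement that $s_x$ is the root mean square deviation about RR taken over the whole table. Whether each separate row has the same standard deviation is a further question, depending on the data.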
15. Proceeding now to the arithmetical work, the only new
expression that has to be calculated in order to determine $r$, $b_1$, $b_2$,
$s_x$ and $s_y$ is the product sum $\Sigma(xy)$ or the mean product $p$. As in
the cases of means and standard deviations, the form of the
arithmetic is slightly different according as the observations are
few and ungrouped, or sufficient to justify the formation of a
correlation-table. In the first case, as in Example i. below, the
work is quite straightforward.
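For a few ungrouped observations the arithmetic may be sketched as below. This is an illustration only: the figures are hypothetical and are not those of Table VII., the function name `correlation_constants` is invented here, and the relations $r = p/(\sigma_x\sigma_y)$, $b_1 = r\sigma_x/\sigma_y$, $b_2 = r\sigma_y/\sigma_x$ are assumed to be the definitions given earlier in the chapter.

```python
import math

def correlation_constants(X, Y):
    """Return r, b1, b2, s_x, s_y for paired ungrouped observations.

    Standard deviations are taken about the mean with divisor N.
    """
    n = len(X)
    mean_x = sum(X) / n
    mean_y = sum(Y) / n

    # Deviations from the respective means.
    x = [v - mean_x for v in X]
    y = [v - mean_y for v in Y]

    sigma_x = math.sqrt(sum(v * v for v in x) / n)
    sigma_y = math.sqrt(sum(v * v for v in y) / n)

    # The only new quantity: the mean product p = (product sum) / N.
    p = sum(a * b for a, b in zip(x, y)) / n

    r = p / (sigma_x * sigma_y)
    b1 = r * sigma_x / sigma_y      # coefficient in x = b1 * y
    b2 = r * sigma_y / sigma_x      # coefficient in y = b2 * x

    # Guard against a tiny negative value from rounding when |r| is near 1.
    root = math.sqrt(max(0.0, 1.0 - r * r))
    return {"r": r, "b1": b1, "b2": b2,
            "s_x": sigma_x * root, "s_y": sigma_y * root,
            "x": x, "y": y}

# Hypothetical observations, for illustration only.
X = [2.0, 4.0, 5.0, 7.0, 9.0, 12.0]
Y = [1.0, 3.0, 2.0, 6.0, 7.0, 8.0]

c = correlation_constants(X, Y)
for name in ("r", "b1", "b2", "s_x", "s_y"):
    print(name, round(c[name], 4))
```

A useful check on the work is that the product of `b1` and `b2` should equal the square of `r`.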
Example i., Table VII.—The variables are (1) X—the estimated